A regex cheatsheet of the patterns I actually use weekly
Regex in Production: Practical Patterns, Pitfalls, and Performance for Modern TypeScript
Current Situation Analysis
Regular expressions remain one of the most misunderstood tools in a developer's arsenal. The industry pain point is not a lack of patterns, but a lack of context regarding when to use regex, how to implement it safely, and where it fails. Many teams copy-paste patterns from forums without understanding the performance implications or security boundaries, leading to subtle bugs, denial-of-service vulnerabilities, and false negatives in user input.
This problem is often overlooked because regex is treated as a "magic string" rather than a computational model. Developers frequently attempt to solve structural parsing problems (like HTML sanitization or RFC-compliant email validation) with regex, despite these being provably difficult or impossible tasks for finite state automata.
Data-backed evidence from industry practice:
- RFC 5322 Complexity: The official email specification allows for quoted strings, comments, and IP address literals. Regexes that attempt full coverage of the address grammar run to thousands of characters (the well-known Perl pattern for its predecessor, RFC 822, exceeds 6,000), and such patterns are widely considered impractical for production validation.
- Security Risks: Using regex to strip HTML tags for security purposes is a known anti-pattern. Attackers can bypass simple tag-stripping regexes using malformed tags or encoding tricks, leading to Cross-Site Scripting (XSS) vulnerabilities.
- Performance Degradation: Poorly constructed regex patterns can trigger catastrophic backtracking, the root cause of Regular Expression Denial of Service (ReDoS). Execution time grows exponentially (or, for some patterns, polynomially) with input length, which can block the Node.js event loop or a browser's main thread.
WOW Moment: Key Findings
The critical insight for production engineering is distinguishing between format validation and semantic validation. Regex excels at checking structure (format) but should rarely be used for logic or security-critical transformations. The table below contrasts naive regex approaches with production-ready strategies.
| Scenario | Naive Regex Approach | Production-Ready Strategy | Risk / Impact |
|---|---|---|---|
| HTML Sanitization | `<[^>]+>` | Use DOMPurify or sanitize-html | Critical: XSS vulnerability via malformed tags. |
| Email Validation | RFC 5322 Regex | Loose Regex + Confirmation Email | High: Strict regex blocks valid users; loose regex requires verification. |
| UUID Verification | `^[0-9a-f-]+$` | Version-specific Regex | Low: Loose regex accepts invalid UUIDs; specific regex ensures version/variant. |
| Date Parsing | `^\d{4}-\d{2}-\d{2}$` | Regex Format Check + Date.parse() | Medium: Regex matches 2024-13-99; library validates calendar logic. |
| Text Extraction | `".*"` | `"(.*?)"` | Medium: Greedy matching captures across multiple fields; lazy matching isolates data. |
Core Solution
The following implementation provides a TypeScript utility module for common production patterns. Each pattern is wrapped in a typed function with clear documentation on scope and limitations. Capturing groups are used only where a sub-match is actually extracted, and comments explain the structural constraints of each pattern.
1. Identity and IDs
UUID v4 Validation
This pattern enforces the version marker (4) and the variant bits (8, 9, a, or b). It ensures the string is a valid v4 UUID, not just any hex string with dashes.
```typescript
/**
 * Validates a string against the UUID v4 specification.
 * Enforces version '4' and variant bits '89ab'.
 * The `i` flag accepts uppercase hex, which RFC 4122 treats as equivalent on input.
 */
export const isUUIDv4 = (input: string): boolean => {
  // Structure: 8-4-4-4-12 hex chars
  // Group 3 starts with '4' (version)
  // Group 4 starts with '8', '9', 'a', or 'b' (variant)
  const uuidV4Pattern = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
  return uuidV4Pattern.test(input);
};
```
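A quick way to sanity-check the pattern is to feed it UUIDs that are known to be version 4. The sketch below assumes Node.js 14.17 or later, where `randomUUID` is exported from `node:crypto`, and gives the pattern the `i` flag because RFC 4122 treats hex digits as case-insensitive on input. The fixed sample UUIDs are illustrative.

```typescript
import { randomUUID } from 'node:crypto';

// v4 pattern with the `i` flag so uppercase hex digits are accepted.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
const isUUIDv4 = (input: string): boolean => UUID_V4.test(input);

// randomUUID() generates version 4 UUIDs, so every sample should pass.
const generated = Array.from({ length: 5 }, () => randomUUID());
console.log(generated.every(isUUIDv4)); // true

// Version 1 UUID (third group starts with '1'): rejected.
console.log(isUUIDv4('6ba7b810-9dad-11d1-80b4-00c04fd430c8')); // false
// Nil UUID: wrong version and variant, rejected.
console.log(isUUIDv4('00000000-0000-0000-0000-000000000000')); // false
// Uppercase v4 passes thanks to the `i` flag.
console.log(isUUIDv4('F47AC10B-58CC-4372-A567-0E02B2C3D479')); // true
```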
2. Text Transformation
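Slug Generation
This utility converts arbitrary text into URL-safe slugs. It lowercases the input, collapses each run of non-alphanumeric characters into a single hyphen, and trims leading and trailing hyphens. Only ASCII letters and digits survive, so accented characters act as separators.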
```typescript
/**
 * Converts a string to a URL-safe slug.
 * Replaces non-alphanumeric characters with hyphens and trims boundaries.
 */
export const generateSlug = (text: string): string => {
  return text
    .toLowerCase()
    // Replace any sequence of non-alphanumeric chars with a single hyphen
    .replace(/[^a-z0-9]+/g, '-')
    // Remove leading or trailing hyphens
    .replace(/^-+|-+$/g, '');
};
```
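If titles may contain accented Latin letters, one option is to decompose them before the ASCII filter runs. The hypothetical `generateSlugUnicode` below sketches that idea with `String.prototype.normalize('NFD')`, which splits a letter such as é into e plus a combining accent that can then be removed. Letters with no decomposition (such as ß or ø) are still dropped, and non-Latin scripts would need a transliteration library.

```typescript
/**
 * Hypothetical Unicode-aware variant of generateSlug.
 * NFD splits accented letters into base letter + combining mark; the marks
 * (U+0300 to U+036F) are removed before the ASCII-only replacement runs.
 */
export const generateSlugUnicode = (text: string): string =>
  text
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse separators into one hyphen
    .replace(/^-+|-+$/g, ''); // trim hyphens at the edges

console.log(generateSlugUnicode('Crème Brûlée: A Recipe!')); // "creme-brulee-a-recipe"
console.log(generateSlugUnicode('  --Hello, World--  ')); // "hello-world"
```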
HTML Preview Extraction
Warning: This is strictly for generating plain-text previews. Never use this for sanitizing user-generated content.
```typescript
/**
 * Strips HTML tags to extract plain text for preview purposes.
 * NOT SECURE for sanitization. Use DOMPurify for security-critical contexts.
 */
export const extractPlainTextPreview = (html: string): string => {
  // Matches opening and closing tags, including attributes.
  // A '>' inside a quoted attribute value ends the match early.
  return html.replace(/<[^>]+>/g, '');
};
```
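The preview helper assumes well-formed markup. This sketch reuses the same pattern to show where that assumption breaks: a `>` inside a quoted attribute ends the tag match early, and HTML entities pass through undecoded.

```typescript
// Same pattern as extractPlainTextPreview above.
const extractPlainTextPreview = (html: string): string => html.replace(/<[^>]+>/g, '');

console.log(extractPlainTextPreview('<p>Hello <b>world</b></p>')); // "Hello world"
// The '>' inside the quoted title ends the first match, leaking part of the attribute.
console.log(extractPlainTextPreview('<a title="x>y">link</a>')); // 'y">link'
// Entities are left as-is rather than decoded.
console.log(extractPlainTextPreview('<p>Fish &amp; chips</p>')); // "Fish &amp; chips"
```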
3. Contact and Dates
Loose Email Format Check
RFC-compliant validation is impractical. This pattern checks for the essential structure: non-whitespace characters, an @ symbol, a domain, and a TLD. Real validation requires sending a confirmation link.
```typescript
/**
 * Performs a loose validation of email format.
 * Checks for basic structure: local-part@domain.tld
 * Does not validate RFC 5322 compliance.
 */
export const hasValidEmailFormat = (input: string): boolean => {
  // ^[^\s@]+ : Start, one or more chars that are not whitespace or @
  // @[^\s@]+ : @ symbol, followed by domain chars
  // \.[^\s@]+$ : Dot, followed by TLD chars, end of string
  const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return emailPattern.test(input);
};
```
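Because the check is deliberately loose, the code around the pattern matters as much as the pattern. The hypothetical `looksLikeEmail` wrapper below trims pasted whitespace before testing. Note that the pattern also rejects dotless domains such as `user@localhost`, which may matter for intranet tools.

```typescript
const EMAIL_LOOSE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

/**
 * Hypothetical wrapper: trims surrounding whitespace, then applies the loose check.
 * A true result only means "worth sending a confirmation email to".
 */
export const looksLikeEmail = (raw: string): boolean => EMAIL_LOOSE.test(raw.trim());

console.log(looksLikeEmail('  jane.doe@example.com ')); // true (surrounding spaces trimmed)
console.log(looksLikeEmail('jane@example')); // false (no dot after @)
console.log(looksLikeEmail('jane@@example.com')); // false (second @ not allowed)
console.log(looksLikeEmail('jane doe@example.com')); // false (inner whitespace)
```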
Lenient US Phone Number
This pattern accommodates various formatting styles common in user input, including optional country codes, parentheses, and separators.
```typescript
/**
 * Validates a US phone number with lenient formatting.
 * Accepts formats like: 555-555-5555, (555) 555-5555, +1 555.555.5555
 */
export const isUSPhoneLenient = (input: string): boolean => {
  // Optional +1 or 1 prefix
  // Optional separators [-.\s]
  // Optional parentheses around area code; each is optional on its own,
  // so unbalanced input such as "(555 555-5555" also passes
  const phonePattern = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
  return phonePattern.test(input);
};
```
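Lenient acceptance pairs naturally with strict normalization: once the pattern passes, the formatting characters carry no information. The hypothetical `toE164US` helper below is a sketch that keeps only the digits and emits E.164 form (`+1` followed by ten digits). It does not check North American Numbering Plan rules, for example that area codes cannot start with 0 or 1.

```typescript
const US_PHONE = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

/**
 * Hypothetical helper: validates leniently, then returns E.164 form (+1XXXXXXXXXX) or null.
 */
export const toE164US = (input: string): string | null => {
  if (!US_PHONE.test(input)) return null;
  const digits = input.replace(/\D/g, ''); // drop +, spaces, dots, dashes, parentheses
  // The pattern allows exactly 10 digits, or 11 when a leading country code 1 is present.
  const national = digits.length === 11 ? digits.slice(1) : digits;
  return national.length === 10 ? `+1${national}` : null;
};

console.log(toE164US('(555) 555-0100')); // "+15555550100"
console.log(toE164US('+1 555.555.0100')); // "+15555550100"
console.log(toE164US('555-0100')); // null (too few digits)
```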
ISO Date Format Check
Validates the YYYY-MM-DD structure. Note that this does not verify calendar validity (e.g., 2024-13-99 will pass). Use a date library for semantic validation.
```typescript
/**
 * Checks if a string matches the ISO 8601 date format (YYYY-MM-DD).
 * Validates structure only; does not check for valid calendar dates.
 */
export const isISODateFormat = (input: string): boolean => {
  return /^\d{4}-\d{2}-\d{2}$/.test(input);
};
```
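One way to add the missing calendar check without a dependency is a round trip through `Date`: build a UTC date from the parsed parts and confirm the parts come back unchanged, since `Date` silently rolls invalid values over (month 13 becomes January of the following year). The hypothetical `isValidISODate` below sketches this; it treats the input as a UTC calendar date and ignores time zones.

```typescript
const ISO_DATE_PARTS = /^(\d{4})-(\d{2})-(\d{2})$/;

/**
 * Hypothetical semantic check: the regex validates the shape, then a Date round trip
 * validates the calendar. Interprets the input as a UTC date; time zones are ignored.
 */
export const isValidISODate = (input: string): boolean => {
  const match = ISO_DATE_PARTS.exec(input);
  if (!match) return false;
  const [, year, month, day] = match.map(Number);

  const date = new Date(0);
  // setUTCFullYear (unlike Date.UTC) does not map years 0-99 onto 1900-1999.
  date.setUTCFullYear(year, month - 1, day);

  // Out-of-range parts roll over (2024-02-30 becomes March 1), so compare each part.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
};

console.log(isValidISODate('2024-02-29')); // true (leap year)
console.log(isValidISODate('2023-02-29')); // false
console.log(isValidISODate('2024-13-99')); // false
```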
4. Parsing and Extraction
Markdown Link Extraction
Captures link text and URL from Markdown syntax. Uses capturing groups to extract the components.
```typescript
interface MarkdownLink {
  text: string;
  url: string;
}

/**
 * Extracts the text and URL from a Markdown link pattern [text](url).
 * Returns null if no match is found.
 */
export const parseMarkdownLink = (text: string): MarkdownLink | null => {
  // \[ and \] match literal brackets
  // ([^\]]+) captures text inside brackets (Group 1)
  // \( and \) match literal parentheses
  // ([^)]+) captures URL inside parentheses (Group 2)
  const linkPattern = /\[([^\]]+)\]\(([^)]+)\)/;
  const match = linkPattern.exec(text);
  if (!match) return null;
  return {
    text: match[1],
    url: match[2],
  };
};
```
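`parseMarkdownLink` returns only the first match. A sketch for extracting every link, assuming an ES2020 target so that `String.prototype.matchAll` is available, adds the `g` flag, which `matchAll` requires. The helper name `parseAllMarkdownLinks` is hypothetical. The same pattern also matches the bracketed part of image syntax such as `![alt](src)` and truncates URLs that contain `)`.

```typescript
interface MarkdownLink {
  text: string;
  url: string;
}

// matchAll throws a TypeError for non-global regexes. It iterates over an internal
// copy, so lastIndex on this shared constant is never advanced.
const LINK_PATTERN_GLOBAL = /\[([^\]]+)\]\(([^)]+)\)/g;

/** Hypothetical helper: extracts every [text](url) pair, not just the first. */
export const parseAllMarkdownLinks = (markdown: string): MarkdownLink[] =>
  Array.from(markdown.matchAll(LINK_PATTERN_GLOBAL), (m) => ({ text: m[1], url: m[2] }));

const doc = 'See [MDN](https://developer.mozilla.org) and [Node](https://nodejs.org/).';
console.log(parseAllMarkdownLinks(doc));
// [ { text: 'MDN', url: 'https://developer.mozilla.org' },
//   { text: 'Node', url: 'https://nodejs.org/' } ]
```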
Pitfall Guide
Production regex requires vigilance against common traps. The following pitfalls are derived from real-world debugging sessions and security audits.
1. Catastrophic Backtracking (ReDoS)
Explanation: Certain patterns can cause the regex engine to enter an exponential time complexity loop when processing malicious input. This is known as Regular Expression Denial of Service.
Example: ^(a+)+$ applied to aaaaaaaaaaaaaaaaaaaaaaaaaaaaX causes the engine to try every possible combination of a+ groups before failing.
Fix: Avoid nested quantifiers over overlapping character classes; ^(a+)+$ accepts exactly the same strings as ^a+$, which fails in linear time. Possessive quantifiers and atomic groups help in engines that support them, but JavaScript's RegExp supports neither, so in TypeScript/JS also cap input length before matching: if (input.length > 1000) return false;.
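A minimal sketch of both defenses, using a hypothetical `safeTest` wrapper: cap the input length first, then run a pattern without nested quantifiers. The vulnerable pattern is left commented out on purpose, since running it against the attack string can stall the process; the 1,000-character default is an assumption to tune per field.

```typescript
/**
 * Hypothetical guard: refuses to run a pattern on oversized input.
 * The default limit is an assumption; set it to the longest legitimate value you expect.
 */
export const safeTest = (pattern: RegExp, input: string, maxLength = 1000): boolean =>
  input.length <= maxLength && pattern.test(input);

// Vulnerable: nested quantifiers over the same characters backtrack exponentially on failure.
// const vulnerable = /^(a+)+$/;

// Same language without nesting: a failing match backtracks linearly.
const linear = /^a+$/;

const attack = 'a'.repeat(30) + 'X';
console.log(safeTest(linear, attack)); // false, returns immediately
console.log(safeTest(linear, 'a'.repeat(5000))); // false, rejected by the length guard
console.log(safeTest(linear, 'aaaa')); // true
```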
2. The HTML Sanitization Fallacy
Explanation: Attempting to remove malicious scripts using regex like <script[^>]*>(.*?)</script> is insecure. Attackers can use malformed tags, encoding, or CSS expressions to bypass regex filters.
Fix: Never use regex for security sanitization. Use a battle-tested library like DOMPurify for DOM contexts or sanitize-html for Node.js. These libraries parse the HTML structure and whitelist safe elements.
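To make the fallacy concrete, the sketch below applies the script-stripping regex from the explanation above as a one-pass filter (the name `naiveStripScripts` is hypothetical). Removing the empty inner `<script></script>` pairs splices the surrounding fragments into a working tag, while uppercase tags or a newline inside the script body skip the filter entirely. These are a few illustrative bypasses, not an exhaustive list.

```typescript
// Naive filter: removes <script>...</script> blocks in a single global pass.
const naiveStripScripts = (html: string): string =>
  html.replace(/<script[^>]*>(.*?)<\/script>/g, '');

// Removing the inner, empty pairs reassembles the outer fragments into a live tag.
const payload = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';
console.log(naiveStripScripts(payload)); // "<script>alert(1)</script>"

// No `i` flag: uppercase tags are not matched at all.
console.log(naiveStripScripts('<SCRIPT>alert(1)</SCRIPT>')); // unchanged
// `.` does not match "\n", so a multi-line script body is not matched either.
console.log(naiveStripScripts('<script>\nalert(1)\n</script>')); // unchanged
```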
3. Greedy vs. Lazy Quantifiers
Explanation: The * and + quantifiers are greedy by default, matching as much text as possible. This causes issues when extracting data between delimiters.
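Example: Given the input "foo" and "bar", the lazy pattern `"(.*?)"` matches "foo" and "bar" separately, while the greedy `"(.*)"` produces a single match from the first quote to the last, swallowing the middle content.
Fix: Use lazy quantifiers (`*?`, `+?`) when extracting content between known delimiters, or, often better, a negated character class such as `"([^"]*)"`, which cannot run past the closing delimiter.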
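The difference is easiest to see by collecting every capture from the same input with the three candidate patterns. This sketch uses `matchAll` (ES2020) on a made-up attribute string.

```typescript
const input = 'name="foo" id="bar"';

// Collects capture group 1 from every match of a global pattern.
const captures = (pattern: RegExp): string[] =>
  Array.from(input.matchAll(pattern), (m) => m[1]);

console.log(captures(/"(.*)"/g)); // [ 'foo" id="bar' ]  greedy: one match spanning both values
console.log(captures(/"(.*?)"/g)); // [ 'foo', 'bar' ]  lazy: stops at the first closing quote
console.log(captures(/"([^"]*)"/g)); // [ 'foo', 'bar' ]  negated class: cannot cross a quote
```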
4. Anchor Misuse in Multiline Strings
Explanation: By default, ^ and $ match the start and end of the entire string, not individual lines. This leads to false negatives when validating line-based input.
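Fix: Use the m (multiline) flag when you need ^ and $ to match at line boundaries, for example /^pattern$/m. Do not add m to whole-string validation: /^\d+$/m accepts "123\n<script>" because a single line matches.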
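The sketch below shows both sides of the `m` flag on a made-up log string: it is what makes line-oriented extraction work, and it is also what lets a multi-line payload slip past a validation pattern.

```typescript
const log = 'INFO start\nERROR disk full\nINFO retry\nERROR timeout';

// Without `m`, ^ only matches at index 0, where the text is "INFO", so nothing is found.
console.log(log.match(/^ERROR .*$/g)); // null
// With `m`, ^ and $ match at line boundaries; `.` stops at "\n", so each match is one line.
console.log(log.match(/^ERROR .*$/gm)); // [ 'ERROR disk full', 'ERROR timeout' ]

// Validation pitfall: with `m`, one valid line is enough for the whole string to pass.
console.log(/^\d+$/m.test('123\n<script>')); // true
console.log(/^\d+$/.test('123\n<script>')); // false
```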
5. Capturing Group Overhead
Explanation: Parentheses () create capturing groups, which store matched text in memory. If you only need grouping for alternation or repetition, capturing adds unnecessary overhead.
Fix: Use non-capturing groups (?:...) when you don't need to extract the sub-match. Example: (?:https?|ftp):// is more efficient than (https?|ftp):// if you only care about the protocol presence.
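Comparing the arrays returned by `exec` shows what a capturing group actually stores. The sketch also includes a named group (ES2018), which reads more clearly when the sub-match is genuinely needed.

```typescript
const url = 'https://example.com';

const capturing = /(https?|ftp):\/\//.exec(url);
const nonCapturing = /(?:https?|ftp):\/\//.exec(url);

// exec returns the full match plus one entry per capturing group.
console.log(capturing?.length); // 2  -> [ 'https://', 'https' ]
console.log(nonCapturing?.length); // 1  -> [ 'https://' ]

// A named group still captures, but the value is read by name instead of index.
const named = /(?<scheme>https?|ftp):\/\//.exec(url);
console.log(named?.groups?.scheme); // "https"
```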
6. Character Class Escaping Confusion
Explanation: Inside square brackets [], most special characters lose their meaning and do not require escaping. However, ], \, ^ (at start), and - (between chars) still need care.
Fix: Remember that `[.*+?]` matches a literal dot, asterisk, plus sign, or question mark without backslashes. Only escape `]`, `\`, `^` (at the start), and `-` (between characters) as needed.
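The sketch below exercises these escaping rules. The `escapeRegExp` helper follows the widely used snippet from MDN's regular expressions guide for turning arbitrary text into a literal pattern; it is a convenience function rather than a built-in, and its own character class escapes only `]` and `\`.

```typescript
// Inside [...], '.', '*', '+' and '?' are literals: this strips all four.
console.log('a.b*c+d?e'.replace(/[.*+?]/g, '')); // "abcde"

// '-' is literal when it is the last character in the class.
console.log(/^[\w-]+$/.test('my-slug_1')); // true
// ']' and '\' must be escaped inside a class; this matches the backslash.
console.log(/[\]\\]/.test('path\\to')); // true

// Escapes regex syntax characters so arbitrary text matches literally (outside a class).
const escapeRegExp = (text: string): string => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

console.log(escapeRegExp('1+1=2?')); // "1\+1=2\?"
console.log(new RegExp(`^${escapeRegExp('a.b')}$`).test('axb')); // false: the dot is literal now
```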
7. RFC Perfectionism Trap
Explanation: Trying to validate emails, phone numbers, or URLs against full RFC specifications often results in patterns that reject valid inputs or are too slow.
Fix: Adopt a "loose validation, strict verification" approach. Use a simple regex to catch obvious typos, then verify the resource exists (e.g., send a confirmation email). This improves user experience and reduces maintenance burden.
Production Bundle
Action Checklist
- Define Scope: Determine if regex is appropriate. Use parsers for complex structures (HTML, JSON, SQL).
- Security Review: Ensure regex is not used for sanitization. Verify inputs against ReDoS patterns.
- Performance Test: Benchmark regex against large inputs. Check for catastrophic backtracking.
- Type Safety: Wrap regex in typed functions with clear interfaces and documentation.
- Edge Cases: Test with empty strings, Unicode characters, and malformed inputs.
- Documentation: Comment patterns with explanations of structure and limitations.
- Library Check: Prefer established libraries for complex tasks (e.g., uuid, date-fns, DOMPurify).
Decision Matrix
Use this matrix to choose the right approach for input handling.
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple Format Check | Regex | Fast, lightweight, sufficient for structure. | Low |
| HTML Sanitization | DOMPurify / sanitize-html | Regex cannot handle edge cases securely. | Medium (Dependency) |
| UUID Generation/Check | uuid library | Handles versioning, variants, and RFC compliance. | Low |
| Date Parsing | date-fns / dayjs | Regex cannot validate calendar logic or timezones. | Low |
| Complex Extraction | Parser / AST | Regex fails on nested or recursive structures. | High (Complexity) |
| Email Validation | Loose Regex + Verification | RFC regex is impractical; verification ensures validity. | Low |
Configuration Template
A reusable TypeScript module structure for regex utilities.
```typescript
// regex-utils.ts
/**
 * Production-ready regex utilities.
 * All patterns are documented with scope and limitations.
 */

// --- Types ---
export interface MarkdownLink {
  text: string;
  url: string;
}

// --- Patterns ---
// The global (g) patterns are only passed to String.prototype.replace, which resets
// lastIndex on every call. Do not call .test() or .exec() on them: lastIndex would
// carry over between calls and cause intermittent false negatives.
const PATTERNS = {
  UUID_V4: /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i,
  EMAIL_LOOSE: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  SLUG_CHARS: /[^a-z0-9]+/g,
  SLUG_TRIM: /^-+|-+$/g,
  HTML_TAGS: /<[^>]+>/g,
  ISO_DATE: /^\d{4}-\d{2}-\d{2}$/,
  MARKDOWN_LINK: /\[([^\]]+)\]\(([^)]+)\)/,
  US_PHONE: /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/,
} as const;

// --- Functions ---
export const isUUIDv4 = (input: string): boolean => PATTERNS.UUID_V4.test(input);
export const hasValidEmailFormat = (input: string): boolean => PATTERNS.EMAIL_LOOSE.test(input);
export const generateSlug = (text: string): string =>
  text.toLowerCase().replace(PATTERNS.SLUG_CHARS, '-').replace(PATTERNS.SLUG_TRIM, '');
export const extractPlainTextPreview = (html: string): string => html.replace(PATTERNS.HTML_TAGS, '');
export const isISODateFormat = (input: string): boolean => PATTERNS.ISO_DATE.test(input);
export const isUSPhoneLenient = (input: string): boolean => PATTERNS.US_PHONE.test(input);
export const parseMarkdownLink = (text: string): MarkdownLink | null => {
  const match = PATTERNS.MARKDOWN_LINK.exec(text);
  return match ? { text: match[1], url: match[2] } : null;
};
```
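One subtlety with a shared pattern table: a regex with the `g` (or `y`) flag keeps a `lastIndex` cursor, and `test()`/`exec()` resume from it. The template is safe because its global patterns only go through `String.prototype.replace`, which resets `lastIndex`, but the sketch below shows what happens if a global pattern is later reused with `test()`. The pattern names here are hypothetical.

```typescript
// A shared global regex keeps state in lastIndex between test()/exec() calls.
const HAS_DIGIT_GLOBAL = /\d/g;
console.log(HAS_DIGIT_GLOBAL.test('a1')); // true, lastIndex is now 2
console.log(HAS_DIGIT_GLOBAL.test('b2')); // false: search starts at index 2, past the digit
console.log(HAS_DIGIT_GLOBAL.test('b2')); // true: the failed call reset lastIndex to 0

// Without `g` (or `y`), test() always starts at index 0 and is safe to share.
const HAS_DIGIT = /\d/;
console.log(HAS_DIGIT.test('a1'), HAS_DIGIT.test('b2')); // true true
```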
Quick Start Guide
- Copy Template: Add `regex-utils.ts` to your project's utility directory.
- Import Functions: Use named imports in your code: `import { isUUIDv4, generateSlug } from './regex-utils';`
- Validate Inputs: Replace ad-hoc regex calls with the utility functions for consistency.
- Run Tests: Verify patterns against your specific edge cases. Add unit tests for each function.
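- Monitor Performance: In high-throughput scenarios, benchmark regex usage. The template already hoists every pattern into a module-level constant so each literal is created once; avoid building new RegExp objects from strings inside hot loops.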
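For the "Run Tests" step, a table-driven layout keeps empty, malformed, and valid inputs side by side. The sketch below uses only `node:assert`; the functions are inlined copies so the snippet runs on its own (in a project you would import them from `./regex-utils`), and the case lists are illustrative rather than exhaustive.

```typescript
import { strict as assert } from 'node:assert';

// Local copies for a self-contained sketch; import from './regex-utils' in a real project.
const isUUIDv4 = (input: string): boolean =>
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i.test(input);
const generateSlug = (text: string): string =>
  text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');

// Each row is [input, expected]; edge cases sit next to the happy path.
const uuidCases: Array<[string, boolean]> = [
  ['f47ac10b-58cc-4372-a567-0e02b2c3d479', true],
  ['', false],
  ['f47ac10b-58cc-1372-a567-0e02b2c3d479', false], // version 1
  ['f47ac10b-58cc-4372-c567-0e02b2c3d479', false], // invalid variant nibble
  ['f47ac10b58cc4372a5670e02b2c3d479', false], // dashes missing
];

const slugCases: Array<[string, string]> = [
  ['Hello, World!', 'hello-world'],
  ['', ''],
  ['---', ''],
  ['Already-a-slug', 'already-a-slug'],
];

for (const [input, expected] of uuidCases) {
  assert.equal(isUUIDv4(input), expected, `isUUIDv4(${JSON.stringify(input)})`);
}
for (const [input, expected] of slugCases) {
  assert.equal(generateSlug(input), expected, `generateSlug(${JSON.stringify(input)})`);
}
console.log('all cases passed');
```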