A regex cheatsheet of the patterns I actually use weekly

Regex in Production: Practical Patterns, Pitfalls, and Performance for Modern TypeScript

Current Situation Analysis

Regular expressions remain one of the most misunderstood tools in a developer's arsenal. The industry pain point is not a lack of patterns, but a lack of context regarding when to use regex, how to implement it safely, and where it fails. Many teams copy-paste patterns from forums without understanding the performance implications or security boundaries, leading to subtle bugs, denial-of-service vulnerabilities, and false negatives in user input.

This problem is often overlooked because regex is treated as a "magic string" rather than a computational model. Developers frequently attempt to solve structural parsing problems (like HTML sanitization or RFC-compliant email validation) with regex, despite these being provably difficult or impossible tasks for finite state automata.

Data-backed evidence from industry practice:

RFC 5322 Complexity: The official email specification allows for quoted strings, comments, and IP address literals. A fully compliant regex for RFC 5322 exceeds 6,000 characters and is widely considered impractical for production validation.
Security Risks: Using regex to strip HTML tags for security purposes is a known anti-pattern. Attackers can bypass simple tag-stripping regexes using malformed tags or encoding tricks, leading to Cross-Site Scripting (XSS) vulnerabilities.
Performance Degradation: Poorly constructed regex patterns can cause Catastrophic Backtracking (ReDoS), where execution time grows exponentially with input length, potentially freezing Node.js event loops or browser main threads.

WOW Moment: Key Findings

The critical insight for production engineering is distinguishing between format validation and semantic validation. Regex excels at checking structure (format) but should rarely be used for logic or security-critical transformations. The table below contrasts naive regex approaches with production-ready strategies.

Scenario	Naive Regex Approach	Production-Ready Strategy	Risk / Impact
HTML Sanitization	`<[^>]+>`	Use `DOMPurify` or `sanitize-html`	Critical: XSS vulnerability via malformed tags.
Email Validation	RFC 5322 Regex	Loose Regex + Confirmation Email	High: Strict regex blocks valid users; loose regex requires verification.
UUID Verification	`^[0-9a-f-]+$`	Version-specific Regex	Low: Loose regex accepts invalid UUIDs; specific regex ensures version/variant.
Date Parsing	`^\d{4}-\d{2}-\d{2}$`	Regex Format Check + Date Library	Medium: Regex matches `2024-13-99`; library validates calendar logic.
Text Extraction	`".*"`	`"(.*?)"`	Medium: Greedy matching captures across multiple fields; lazy matching isolates data.

Core Solution

The following implementation provides a TypeScript utility module for common production patterns. Each pattern is wrapped in a typed function with clear documentation on scope and limitations. The code uses capturing groups only where a sub-match is actually extracted, and includes comments explaining the structural constraints.

1. Identity and IDs

UUID v4 Validation: This pattern enforces the version marker (4) and the variant bits (8, 9, a, or b). It ensures the string is a valid v4 UUID, not just any hex string with dashes.

/**
 * Validates a string against the UUID v4 specification.
 * Enforces version '4' and variant bits '89ab'.
 */
export const isUUIDv4 = (input: string): boolean => {
  // Structure: 8-4-4-4-12 hex chars
  // Group 3 starts with '4' (version)
  // Group 4 starts with '8', '9', 'a', or 'b' (variant)
  // The 'i' flag accepts uppercase hex digits, which are valid on input
  const uuidV4Pattern = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
  return uuidV4Pattern.test(input);
};
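When callers need to know which version a UUID carries rather than only whether it is v4, the same 8-4-4-4-12 structure can capture the version digit. The following is a minimal sketch under stated assumptions: `getUUIDVersion` is a hypothetical helper that is not part of the module in this article, and the accepted version range of 1 to 8 is an assumption about which versions you want to allow.

```typescript
// Hypothetical helper: returns the version digit of a well-formed UUID, or null.
const getUUIDVersion = (input: string): number | null => {
  // Same 8-4-4-4-12 layout as the v4 check.
  // Group 1 captures the version digit (1-8 is an assumed allow-list).
  // The 'i' flag accepts uppercase hex digits.
  const pattern =
    /^[0-9a-f]{8}-[0-9a-f]{4}-([1-8])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
  const match = pattern.exec(input);
  if (!match) return null;
  return Number(match[1]);
};

const v4 = getUUIDVersion('f47ac10b-58cc-4372-a567-0e02b2c3d479'); // 4
const v1 = getUUIDVersion('6ba7b810-9dad-11d1-80b4-00c04fd430c8'); // 1
const none = getUUIDVersion('not-a-uuid');                         // null
```

A capturing group is justified here because the sub-match is actually read; the boolean v4 check needs none.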

2. Text Transformation

Slug Generation: This utility converts arbitrary text into URL-safe slugs. It lowercases the text, replaces each run of non-alphanumeric characters with a single hyphen, and trims leading and trailing hyphens.

/**
 * Converts a string to a URL-safe slug.
 * Replaces non-alphanumeric characters with hyphens and trims boundaries.
 */
export const generateSlug = (text: string): string => {
  return text
    .toLowerCase()
    // Replace any sequence of non-alphanumeric chars with a single hyphen
    .replace(/[^a-z0-9]+/g, '-')
    // Remove leading or trailing hyphens
    .replace(/^-+|-+$/g, '');
};
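One limitation worth knowing: because the character class is ASCII-only, accented letters are treated as separators, so "Crème Brûlée!" becomes "cr-me-br-l-e". If that matters for your content, one possible approach is to decompose the text with Unicode normalization and drop the combining marks first. This is a sketch, not the article's method; `generateSlugFolded` is a hypothetical name, and the combining-mark range used covers only the basic Latin diacritics block.

```typescript
// Hypothetical variant of the slug helper that folds common Latin diacritics.
const generateSlugFolded = (text: string): string => {
  return text
    // Decompose accented letters into base letter + combining mark
    .normalize('NFKD')
    // Drop combining diacritical marks (U+0300 to U+036F)
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    // Replace any sequence of non-alphanumeric chars with a single hyphen
    .replace(/[^a-z0-9]+/g, '-')
    // Remove leading or trailing hyphens
    .replace(/^-+|-+$/g, '');
};

// Escapes are used so the example does not depend on file encoding.
const folded = generateSlugFolded('Cr\u00e8me Br\u00fbl\u00e9e!'); // 'creme-brulee'
```

Scripts that have no ASCII decomposition (for example CJK or Cyrillic) still collapse to hyphens under this approach and need a transliteration step or a different slug policy.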

HTML Preview Extraction: This is strictly for generating plain-text previews. Never use it to sanitize user-generated content.

/**
 * Strips HTML tags to extract plain text for preview purposes.
 * NOT SECURE for sanitization. Use DOMPurify for security-critical contexts.
 */
export const extractPlainTextPreview = (html: string): string => {
  // Matches opening and closing tags, including attributes
  return html.replace(/<[^>]+>/g, '');
};
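To make the "preview only" scope concrete, the sketch below runs the same tag-stripping pattern on three inputs. The local name `stripTags` is only a stand-in for the function above so that the snippet is self-contained. It suggests two limitations: a `>` inside an attribute value ends the match early, and HTML entities are left undecoded.

```typescript
// Local copy of the tag-stripping pattern, for illustration only.
const stripTags = (html: string): string => html.replace(/<[^>]+>/g, '');

// Well-formed markup behaves as expected.
const simple = stripTags('<p>Hello <b>world</b></p>');
// simple === 'Hello world'

// A '>' inside an attribute value terminates the match too early,
// leaving attribute debris in the preview.
const withAngleInAttribute = stripTags('<a title="a > b">link</a>');
// withAngleInAttribute === ' b">link'

// Entities are not decoded; the preview still contains '&amp;'.
const withEntity = stripTags('<p>Fish &amp; chips</p>');
// withEntity === 'Fish &amp; chips'
```

For previews this is usually tolerable. For anything rendered back into a page as HTML, it is not.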

3. Contact and Dates

Loose Email Format Check: RFC-compliant validation is impractical. This pattern checks for the essential structure: non-whitespace characters, an @ symbol, a domain, and a TLD. Real validation requires sending a confirmation link.

/**
 * Performs a loose validation of email format.
 * Checks for basic structure: local-part@domain.tld
 * Does not validate RFC 5322 compliance.
 */
export const hasValidEmailFormat = (input: string): boolean => {
  // ^[^\s@]+ : Start, one or more chars that are not whitespace or @
  // @[^\s@]+ : @ symbol, followed by domain chars
  // \.[^\s@]+$ : Dot, followed by TLD chars, end of string
  const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return emailPattern.test(input);
};
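In practice the format check is usually paired with trimming and a length cap before the pattern runs. The sketch below shows one way to do that and lists what the loose pattern accepts and rejects. `checkEmailInput` is a hypothetical wrapper, and the 254-character cap is an assumption based on the commonly cited practical limit for an address.

```typescript
const EMAIL_LOOSE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const MAX_EMAIL_LENGTH = 254; // assumption: commonly cited practical upper bound

// Hypothetical wrapper: trim, cap the length, then run the loose format check.
const checkEmailInput = (raw: string): boolean => {
  const input = raw.trim();
  return input.length <= MAX_EMAIL_LENGTH && EMAIL_LOOSE.test(input);
};

const accepted = [
  'user@example.com',
  '  user@example.com  ', // surrounding whitespace is trimmed first
  'user@example..com',    // loose by design: consecutive dots still pass
].map(checkEmailInput);   // [true, true, true]

const rejected = [
  'user@localhost',        // no dot in the domain part
  'user@@example.com',     // second '@'
  'user name@example.com', // whitespace inside the address
].map(checkEmailInput);    // [false, false, false]
```

The third accepted case is the reason the confirmation email remains the real validation step.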

Lenient US Phone Number: This pattern accommodates various formatting styles common in user input, including optional country codes, parentheses, and separators.

/**
 * Validates a US phone number with lenient formatting.
 * Accepts formats like: 555-555-5555, (555) 555-5555, +1 555.555.5555
 */
export const isUSPhoneLenient = (input: string): boolean => {
  // Optional +1 or 1 prefix
  // Optional separators [-.\s]
  // Optional parentheses around area code
  const phonePattern = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
  return phonePattern.test(input);
};
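Because every piece of the lenient pattern is independently optional, it also accepts a few inputs that are probably unintended, such as an unbalanced parenthesis in "(555 555-5555" or a bare plus sign in "+555-555-5555". If that matters, a slightly stricter variant can tie the optional pieces together with non-capturing groups. This is a sketch under that assumption; `isUSPhoneBalanced` is a hypothetical name and not part of the module.

```typescript
// Hypothetical stricter variant of the lenient US phone check.
const isUSPhoneBalanced = (input: string): boolean => {
  // (?:\+?1[-.\s]?)?          optional country code; '+' is only allowed before a '1'
  // (?:\(\d{3}\)|\d{3})       area code with both parentheses or neither
  // [-.\s]?\d{3}[-.\s]?\d{4}  exchange and line number with optional separators
  const pattern = /^(?:\+?1[-.\s]?)?(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}$/;
  return pattern.test(input);
};

const stillAccepted = [
  '555-555-5555',
  '(555) 555-5555',
  '+1 555.555.5555',
  '5555555555',
].map(isUSPhoneBalanced); // [true, true, true, true]

const nowRejected = [
  '(555 555-5555', // unbalanced parenthesis
  '+555-555-5555', // '+' without the country code
].map(isUSPhoneBalanced); // [false, false]
```

Like the lenient version, this checks shape only. It does not verify that the area code or exchange is assignable.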

ISO Date Format Check: Validates the YYYY-MM-DD structure. Note that this does not verify calendar validity (e.g., 2024-13-99 will pass). Use a date library for semantic validation.

/**
 * Checks if a string matches the ISO 8601 date format (YYYY-MM-DD).
 * Validates structure only; does not check for valid calendar dates.
 */
export const isISODateFormat = (input: string): boolean => {
  return /^\d{4}-\d{2}-\d{2}$/.test(input);
};
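If adding a date library is not an option, calendar validity can be approximated with a round-trip check: build a date from the captured parts and confirm that nothing rolled over. The sketch below assumes plain Gregorian calendar dates interpreted in UTC; `isRealISODate` is a hypothetical helper, not part of the module.

```typescript
// Hypothetical helper: format check plus a calendar round-trip in UTC.
const isRealISODate = (input: string): boolean => {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input);
  if (!match) return false;

  const year = Number(match[1]);
  const month = Number(match[2]); // 1-12 as written in the string
  const day = Number(match[3]);

  // setUTCFullYear avoids the two-digit-year remapping of the Date constructor.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);

  // Out-of-range parts roll over (for example month 13 becomes January of the
  // next year), so any mismatch means the input was not a real calendar date.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
};

const leapDay = isRealISODate('2024-02-29');    // true
const notLeapDay = isRealISODate('2023-02-29'); // false
const nonsense = isRealISODate('2024-13-99');   // false
```

The round-trip is used instead of handing the string to the built-in date parser because the comparison states explicitly what counts as valid.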

4. Parsing and Extraction

Markdown Link Extraction: Captures link text and URL from Markdown syntax, using capturing groups to extract the two components.

interface MarkdownLink {
  text: string;
  url: string;
}

/**
 * Extracts the text and URL from a Markdown link pattern [text](url).
 * Returns null if no match is found.
 */
export const parseMarkdownLink = (text: string): MarkdownLink | null => {
  // \[ and \] match literal brackets
  // ([^\]]+) captures text inside brackets (Group 1)
  // \( and \) match literal parentheses
  // ([^)]+) captures URL inside parentheses (Group 2)
  const linkPattern = /\[([^\]]+)\]\(([^)]+)\)/;
  const match = linkPattern.exec(text);
  
  if (!match) return null;
  
  return {
    text: match[1],
    url: match[2],
  };
};
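The function above returns only the first link. To collect every link in a document, the same pattern can be run with the g flag in an exec loop. The following is a sketch; `parseAllMarkdownLinks` and the local `LinkParts` type are hypothetical names. Note that the pattern also matches the bracketed part of image syntax, `![alt](src)`, and does not handle nested brackets or parentheses inside URLs.

```typescript
interface LinkParts {
  text: string;
  url: string;
}

// Hypothetical helper: returns every [text](url) pair found in the input.
const parseAllMarkdownLinks = (text: string): LinkParts[] => {
  // The regex is created per call so the stateful lastIndex of a 'g' pattern
  // is never shared between invocations.
  const pattern = /\[([^\]]+)\]\(([^)]+)\)/g;
  const links: LinkParts[] = [];
  let match: RegExpExecArray | null;

  // Every match consumes at least four characters, so the loop always advances.
  while ((match = pattern.exec(text)) !== null) {
    links.push({ text: match[1] ?? '', url: match[2] ?? '' });
  }
  return links;
};

const found = parseAllMarkdownLinks(
  'See [docs](https://example.com/docs) and [home](https://example.com).'
);
// found has two entries: 'docs' and 'home', in document order
```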

Pitfall Guide

Production regex requires vigilance against common traps. The following pitfalls are derived from real-world debugging sessions and security audits.

1. Catastrophic Backtracking (ReDoS)

Explanation: Certain patterns force a backtracking regex engine to explore an exponential number of match paths on crafted input. This is known as Regular Expression Denial of Service (ReDoS). Example: ^(a+)+$ applied to aaaaaaaaaaaaaaaaaaaaaaaaaaaaX makes the engine try every way of splitting the run of a characters among the nested groups before it fails. Fix: Avoid nested quantifiers over overlapping character classes; here, ^a+$ accepts the same strings in linear time. Possessive quantifiers and atomic groups prevent this backtracking in engines that support them, but JavaScript's RegExp does not. Limiting input length before matching, for example if (input.length > 1000) return false;, bounds the damage from polynomial patterns, but an exponential pattern can stall on a few dozen characters, so the pattern itself must be fixed.
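The two mitigations can be sketched together as follows. `safeTest` and `MAX_INPUT_LENGTH` are hypothetical names chosen for illustration, and the limit of 1000 is an assumption to tune per field. The vulnerable pattern is declared for reference only and is never run against the hostile input.

```typescript
const MAX_INPUT_LENGTH = 1000; // assumption: tune per field

// Hypothetical wrapper: rejects oversized input before the engine runs.
// Pass patterns without the 'g' or 'y' flag, since test() is stateful with those.
const safeTest = (
  pattern: RegExp,
  input: string,
  maxLength: number = MAX_INPUT_LENGTH
): boolean => input.length <= maxLength && pattern.test(input);

// Nested quantifier: exponential on inputs like 'aaaa...aX'.
const vulnerable = /^(a+)+$/;
// Accepts exactly the same strings without nested repetition.
const linear = /^a+$/;

const hostile = 'a'.repeat(5000) + 'X';

// The length guard short-circuits, so the vulnerable regex never executes.
const rejectedByGuard = safeTest(vulnerable, hostile); // false

// The rewritten pattern fails in linear time even without a guard.
const rejectedByPattern = linear.test(hostile); // false
```

The guard is defense in depth. Rewriting the pattern is the actual fix, because a hostile input well under any reasonable length cap is enough to stall the nested version.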

2. The HTML Sanitization Fallacy

Explanation: Attempting to remove malicious scripts using regex like <script[^>]*>(.*?)</script> is insecure. Attackers can use malformed tags, encoding, or CSS expressions to bypass regex filters. Fix: Never use regex for security sanitization. Use a battle-tested library like DOMPurify for DOM contexts or sanitize-html for Node.js. These libraries parse the HTML structure and whitelist safe elements.
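One classic bypass is worth seeing once. In the sketch below, a single-pass script-stripping regex removes the inner tags of a nested payload, and the leftover fragments join into a working script tag. `naiveStripScripts` is a hypothetical function written only to demonstrate the failure; it is not something to deploy.

```typescript
// Deliberately naive: do NOT use this for sanitization.
const naiveStripScripts = (html: string): string =>
  html.replace(/<script[^>]*>.*?<\/script>/gi, '');

// The attacker splits the real tag around a decoy that the regex will remove.
const payload = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';

const result = naiveStripScripts(payload);
// result === '<script>alert(1)</script>'
```

Running the replacement repeatedly until the output stops changing closes this particular hole but not the broader class of problems, which is why the fix above is a parser-based library rather than a better regex.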

3. Greedy vs. Lazy Quantifiers

Explanation: The * and + quantifiers are greedy by default, matching as much text as possible. This causes issues when extracting data between delimiters. Example: Given the input "foo" and "bar", the pattern "(.*?)" matches "foo" and "bar" separately, while "(.*)" produces a single match that runs from the first quote to the last and swallows the middle content. Fix: Use lazy quantifiers (*?, +?) when extracting content between known delimiters, or a negated character class such as "([^"]*)", which cannot cross a delimiter at all.
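The difference is easiest to see side by side. In this sketch, `captureAll` is a hypothetical helper that collects the first capture group of every match; the three patterns are the greedy, lazy, and negated-class forms of the same extraction.

```typescript
// Hypothetical helper: collects capture group 1 from every match of `source`.
const captureAll = (source: string, text: string): string[] => {
  const pattern = new RegExp(source, 'g');
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    results.push(match[1] ?? '');
    // Guard against zero-length matches stalling the loop.
    if (match[0] === '') pattern.lastIndex++;
  }
  return results;
};

const line = '"foo" and "bar"';

const greedy = captureAll('"(.*)"', line);     // ['foo" and "bar']
const lazy = captureAll('"(.*?)"', line);      // ['foo', 'bar']
const negated = captureAll('"([^"]*)"', line); // ['foo', 'bar']
```

The lazy and negated forms agree here. The negated class is often the safer default because it states the constraint directly and gives the engine less to backtrack over.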

4. Anchor Misuse in Multiline Strings

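Explanation: By default, ^ and $ match the start and end of the entire string, not individual lines. This leads to false negatives when validating line-based input. Fix: Use the m (multiline) flag to make ^ and $ match line boundaries. Example: /^pattern$/m. The reverse mistake is just as common: adding m to a whole-string validator lets an input pass as soon as any single line matches, so keep m out of patterns that must cover the entire input.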
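A short sketch of how the m flag changes anchor behavior, using a digits-only pattern as the example. The helper name `allLinesNumeric` is hypothetical.

```typescript
const wholeString = /^\d+$/; // anchors bind to the start and end of the input
const perLine = /^\d+$/m;    // anchors bind to the start and end of each line

const mixed = '123\nabc';

const strict = wholeString.test(mixed); // false: the input as a whole is not all digits
const loose = perLine.test(mixed);      // true: the first line alone satisfies the pattern

// Extraction is where 'm' belongs: pull out the lines that match.
const numericLines = '10\nabc\n20'.match(/^\d+$/gm); // ['10', '20']

// To require that every line matches, validate line by line instead.
const allLinesNumeric = (text: string): boolean =>
  text.split('\n').every((l) => wholeString.test(l));

const everyLineOk = allLinesNumeric('123\n456'); // true
const oneLineBad = allLinesNumeric(mixed);       // false
```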

5. Capturing Group Overhead

Explanation: Parentheses () create capturing groups, which store matched text in memory. If you only need grouping for alternation or repetition, capturing adds unnecessary overhead. Fix: Use non-capturing groups (?:...) when you don't need to extract the sub-match. Example: (?:https?|ftp):// is more efficient than (https?|ftp):// if you only care about the protocol presence.
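The visible effect of the two group types is the shape of the match result, as this small sketch shows. In most code the practical benefit of the non-capturing form is a cleaner result array and stable group numbering rather than a measurable speedup.

```typescript
const url = 'https://example.com';

// Capturing group: the protocol is stored as element 1 of the result.
const capturing = /(https?|ftp):\/\//.exec(url);
// capturing has 2 elements: 'https://' and 'https'

// Non-capturing group: same match, nothing extra stored.
const nonCapturing = /(?:https?|ftp):\/\//.exec(url);
// nonCapturing has 1 element: 'https://'

const capturingLength = capturing ? capturing.length : 0;          // 2
const nonCapturingLength = nonCapturing ? nonCapturing.length : 0; // 1
```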

6. Character Class Escaping Confusion

Explanation: Inside square brackets [], most special characters lose their meaning and do not require escaping. However, ], \, ^ (at start), and - (between chars) still need care. Fix: Remember that [.+*?] matches a literal dot, plus, asterisk, or question mark without backslashes. Only escape ], \, ^, and - as needed.
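The rules above can be checked with a handful of small patterns. This is an illustrative sketch; the variable names are arbitrary.

```typescript
// Metacharacters are literal inside a class: no backslashes needed.
const punctuation = /^[.+*?]+$/;
const matchesPunctuation = punctuation.test('.+*?'); // true
const rejectsLetter = punctuation.test('a');         // false

// ']' and '\' are escaped; '^' is literal when not first; '-' is literal when last.
const needsCare = /^[\]\\^-]+$/;
const matchesSpecials = needsCare.test(']\\^-'); // true (the string is: ] \ ^ -)

// '-' between two characters forms a range...
const range = /^[a-c]+$/;
const rangeMatchesB = range.test('b');        // true
const rangeRejectsHyphen = range.test('a-c'); // false

// ...while '-' at the end of the class is a literal hyphen.
const literalHyphen = /^[ac-]+$/;
const hyphenMatches = literalHyphen.test('a-c'); // true
const hyphenRejectsB = literalHyphen.test('b');  // false
```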

7. RFC Perfectionism Trap

Explanation: Trying to validate emails, phone numbers, or URLs against full RFC specifications often results in patterns that reject valid inputs or are too slow. Fix: Adopt a "loose validation, strict verification" approach. Use a simple regex to catch obvious typos, then verify the resource exists (e.g., send a confirmation email). This improves user experience and reduces maintenance burden.

Production Bundle

Action Checklist

Define Scope: Determine if regex is appropriate. Use parsers for complex structures (HTML, JSON, SQL).
Security Review: Ensure regex is not used for sanitization. Audit every pattern that touches untrusted input for ReDoS-prone constructs such as nested quantifiers.
Performance Test: Benchmark regex against large inputs. Check for catastrophic backtracking.
Type Safety: Wrap regex in typed functions with clear interfaces and documentation.
Edge Cases: Test with empty strings, unicode characters, and malformed inputs.
Documentation: Comment patterns with explanations of structure and limitations.
Library Check: Prefer established libraries for complex tasks (e.g., uuid, date-fns, DOMPurify).

Decision Matrix

Use this matrix to choose the right approach for input handling.

Scenario	Recommended Approach	Why	Cost Impact
Simple Format Check	Regex	Fast, lightweight, sufficient for structure.	Low
HTML Sanitization	DOMPurify / sanitize-html	Regex cannot handle edge cases securely.	Medium (Dependency)
UUID Generation/Check	`uuid` library	Handles versioning, variants, and RFC compliance.	Low
Date Parsing	`date-fns` / `dayjs`	Regex cannot validate calendar logic or timezones.	Low
Complex Extraction	Parser / AST	Regex fails on nested or recursive structures.	High (Complexity)
Email Validation	Loose Regex + Verification	RFC regex is impractical; verification ensures validity.	Low

Configuration Template

A reusable TypeScript module structure for regex utilities.

// regex-utils.ts

/**
 * Production-ready regex utilities.
 * All patterns are documented with scope and limitations.
 */

// --- Types ---
export interface MarkdownLink {
  text: string;
  url: string;
}

// --- Patterns ---
const PATTERNS = {
  // 'i' flag: uppercase hex digits are valid on input
  UUID_V4: /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i,
  EMAIL_LOOSE: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  SLUG_CHARS: /[^a-z0-9]+/g,
  SLUG_TRIM: /^-+|-+$/g,
  HTML_TAGS: /<[^>]+>/g,
  ISO_DATE: /^\d{4}-\d{2}-\d{2}$/,
  MARKDOWN_LINK: /\[([^\]]+)\]\(([^)]+)\)/,
  US_PHONE: /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/,
} as const;

// --- Functions ---
export const isUUIDv4 = (input: string): boolean => PATTERNS.UUID_V4.test(input);
export const hasValidEmailFormat = (input: string): boolean => PATTERNS.EMAIL_LOOSE.test(input);
export const generateSlug = (text: string): string => text.toLowerCase().replace(PATTERNS.SLUG_CHARS, '-').replace(PATTERNS.SLUG_TRIM, '');
export const extractPlainTextPreview = (html: string): string => html.replace(PATTERNS.HTML_TAGS, '');
export const isISODateFormat = (input: string): boolean => PATTERNS.ISO_DATE.test(input);
export const isUSPhoneLenient = (input: string): boolean => PATTERNS.US_PHONE.test(input);

export const parseMarkdownLink = (text: string): MarkdownLink | null => {
  const match = PATTERNS.MARKDOWN_LINK.exec(text);
  return match ? { text: match[1], url: match[2] } : null;
};
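One caveat applies when patterns are shared at module level, as in the template above: a RegExp with the g flag keeps a lastIndex between calls to test() and exec(). The template only passes its global patterns to replace(), which resets that state, so it is unaffected. The sketch below shows what could go wrong if a shared global pattern were later reused with test(). `SHARED_GLOBAL` is a hypothetical name.

```typescript
// A module-level pattern with the 'g' flag carries state between calls.
const SHARED_GLOBAL = /\d+/g;

const first = SHARED_GLOBAL.test('42');  // true; lastIndex is now 2
const second = SHARED_GLOBAL.test('42'); // false; the search started at index 2
const third = SHARED_GLOBAL.test('42');  // true; lastIndex was reset after the failure

// replace() starts from index 0 regardless of the stored lastIndex.
const replaced = 'a1b22'.replace(SHARED_GLOBAL, '#'); // 'a#b#'

// For boolean checks, a pattern without 'g' is stateless and safe to share.
const SHARED_PLAIN = /\d+/;
const stable = [SHARED_PLAIN.test('42'), SHARED_PLAIN.test('42')]; // [true, true]
```

Keeping validation patterns free of the g flag and reserving g for replace-style helpers avoids this class of bug.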

Quick Start Guide

Copy Template: Add regex-utils.ts to your project's utility directory.
Import Functions: Use named imports in your code: import { isUUIDv4, generateSlug } from './regex-utils';.
Validate Inputs: Replace ad-hoc regex calls with the utility functions for consistency.
Run Tests: Verify patterns against your specific edge cases. Add unit tests for each function.
Monitor Performance: In high-throughput scenarios, benchmark regex usage and consider caching compiled patterns if necessary.
