us typos without rejecting valid but unusual addresses.
2. UUID v4 Verification
UUIDs have a specific structure defined by RFC 4122. Version 4 UUIDs include a version marker and a variant field that must be validated to ensure the identifier is genuine.
const UUID_V4_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
export function isUUIDv4(identifier: string): boolean {
return UUID_V4_PATTERN.test(identifier);
}
- Rationale: The pattern enforces the hex structure. The
4 at the 13th character position confirms the version. The character class [89ab] at the start of the fourth group validates the variant bits. The i flag allows case-insensitive matching, as UUIDs are often represented in uppercase.
3. String Slugification
Generating URL-safe slugs requires normalizing text to lowercase, replacing non-alphanumeric characters with hyphens, and trimming edge hyphens.
export function generateSlug(rawText: string): string {
return rawText
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, '');
}
- Rationale: The first replacement
/[^a-z0-9]+/g matches any sequence of non-alphanumeric characters and replaces them with a single hyphen. The second replacement /^-+|-+$/g removes leading or trailing hyphens. This chain ensures a clean, SEO-friendly slug.
4. ISO Date Format Detection
Regex can verify the YYYY-MM-DD format but cannot validate calendar logic (e.g., February 30th). Use regex for format checking and a date library for semantic validation.
const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;
export function matchesISODateFormat(value: string): boolean {
return ISO_DATE_PATTERN.test(value);
}
- Rationale: The pattern
^\d{4}-\d{2}-\d{2}$ checks for four digits, a hyphen, two digits, a hyphen, and two digits. It matches 2024-13-99, which is structurally correct but semantically invalid. Always follow this check with new Date(value).toISOString() or a library like date-fns to confirm validity.
5. US Phone Number Matching
US phone numbers appear in various formats. A lenient pattern accommodates country codes, parentheses, spaces, and dashes.
const US_PHONE_PATTERN = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
export function matchesUSPhoneFormat(input: string): boolean {
return US_PHONE_PATTERN.test(input);
}
- Rationale: The pattern allows an optional
+ and 1 for the country code. It permits optional separators [-.\s] and optional parentheses around the area code. This flexibility reduces user friction while ensuring the core digit structure is present.
6. Markdown Link Extraction
Extracting link text and URLs from Markdown requires capturing groups. The pattern isolates the text within brackets and the URL within parentheses.
const MD_LINK_PATTERN = /\[([^\]]+)\]\(([^)]+)\)/g;
export function extractMarkdownLinks(content: string): Array<{ text: string; url: string }> {
const results: Array<{ text: string; url: string }> = [];
let match;
while ((match = MD_LINK_PATTERN.exec(content)) !== null) {
results.push({ text: match[1], url: match[2] });
}
return results;
}
- Rationale: The pattern
\[([^\]]+)\]\(([^)]+)\) uses two capturing groups. Group 1 captures text excluding ], and Group 2 captures the URL excluding ). The global flag g and exec loop allow extraction of multiple links from a single string.
7. HTML Tag Removal for Previews
Regex can strip tags to generate plain-text previews. This must never be used for sanitization, as it cannot handle malicious payloads or nested structures reliably.
const HTML_TAG_PATTERN = /<[^>]+>/g;
export function stripHtmlForPreview(html: string): string {
return html.replace(HTML_TAG_PATTERN, '');
}
- Rationale: The pattern
<[^>]+> matches any string starting with < and ending with >. This removes tags but leaves text content. For security-critical applications, use a library like DOMPurify to parse and sanitize the DOM structure.
Pitfall Guide
Understanding regex limitations is as important as knowing the patterns. The following pitfalls are common sources of bugs and vulnerabilities in production code.
1. The HTML Sanitization Trap
- Explanation: Developers sometimes use regex to remove
<script> tags or attributes to prevent XSS. This is insecure because attackers can use encoding, nested tags, or browser quirks to bypass regex filters.
- Fix: Never use regex for HTML sanitization. Always use a dedicated library like DOMPurify that implements a proper HTML parser and whitelist approach.
2. Greedy Quantifiers Breaking Captures
- Explanation: The
* quantifier is greedy by default, matching as much text as possible. In patterns like "(.*?)", using .* instead of .*? can cause the regex to match across multiple quoted strings on the same line, capturing unintended content.
- Fix: Use lazy quantifiers
*? or +? when capturing content between delimiters. For example, use "(.*?)" to match individual quoted strings.
3. Anchors in Multiline Strings
- Explanation: By default,
^ and $ match the start and end of the entire string, not individual lines. When processing log files or multiline input, this can cause patterns to miss matches or fail to anchor correctly.
- Fix: Add the
m (multiline) flag to the regex. This changes ^ and $ to match line boundaries, allowing line-by-line processing.
4. Over-Escaping Character Classes
- Explanation: Inside character classes
[...], most special characters lose their meaning and do not require escaping. Developers often escape characters like ., +, *, and ?, which is unnecessary and reduces readability.
- Fix: Only escape
], \, ^ (if at the start), and - (if between characters). For example, [.+*?] matches literal dot, plus, asterisk, and question mark without backslashes.
5. Capturing Group Memory Bloat
- Explanation: Using capturing groups
(...) for every alternation or grouping stores the matched text in memory. This increases overhead and can slow down regex execution, especially in loops or large texts.
- Fix: Use non-capturing groups
(?:...) when you only need grouping for repetition or alternation. For example, use (?:https?|ftp):// instead of (https?|ftp):// if you don't need to extract the protocol separately.
6. False Confidence in Date Validation
- Explanation: Regex patterns for dates like
^\d{4}-\d{2}-\d{2}$ verify format but not calendar validity. They will match impossible dates like 2024-13-99, leading to errors downstream if developers assume the date is valid.
- Fix: Use regex only for format checking. Follow up with semantic validation using
Date.parse(), new Date(), or a library like date-fns to ensure the date exists in the calendar.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| User Signup Form | Loose Regex + Email Confirmation | Balances UX speed with security; prevents fake emails. | Low regex cost; adds email delivery latency. |
| Log Parsing | Regex with m flag | Efficiently isolates lines and extracts fields. | Minimal CPU overhead; high throughput. |
| XSS Prevention | DOMPurify | Guarantees security by parsing DOM structure. | Increases bundle size; requires library dependency. |
| UUID Verification | Strict Regex Pattern | Validates version/variant bits for integrity. | Negligible performance cost. |
| Date Input | Regex Format + Date Library | Ensures format correctness and calendar validity. | Two-step validation; robust error handling. |
Configuration Template
Copy this TypeScript module to integrate production-ready regex utilities into your project.
// regex-utils.ts
export const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
export const UUID_V4_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
export const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;
export const US_PHONE_PATTERN = /^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
export const MD_LINK_PATTERN = /\[([^\]]+)\]\(([^)]+)\)/g;
export const HTML_TAG_PATTERN = /<[^>]+>/g;
export function isValidEmailFormat(candidate: string): boolean {
return EMAIL_PATTERN.test(candidate);
}
export function isUUIDv4(identifier: string): boolean {
return UUID_V4_PATTERN.test(identifier);
}
export function matchesISODateFormat(value: string): boolean {
return ISO_DATE_PATTERN.test(value);
}
export function matchesUSPhoneFormat(input: string): boolean {
return US_PHONE_PATTERN.test(input);
}
export function generateSlug(rawText: string): string {
return rawText
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, '');
}
export function extractMarkdownLinks(content: string): Array<{ text: string; url: string }> {
const results: Array<{ text: string; url: string }> = [];
let match;
while ((match = MD_LINK_PATTERN.exec(content)) !== null) {
results.push({ text: match[1], url: match[2] });
}
return results;
}
export function stripHtmlForPreview(html: string): string {
return html.replace(HTML_TAG_PATTERN, '');
}
Quick Start Guide
- Import Utilities: Add the
regex-utils.ts module to your project and import the required functions.
- Validate Input: Use
isValidEmailFormat or matchesUSPhoneFormat for pre-validation in forms.
- Process Text: Apply
generateSlug for URL generation or extractMarkdownLinks for content parsing.
- Handle Results: Implement error handling for invalid formats and follow up with semantic validation where necessary (e.g., sending confirmation emails).
- Test Edge Cases: Verify patterns against edge cases like
2024-13-99 or malformed UUIDs to ensure robust behavior.