y. This enables compile-time validation, consistent flag application, and easy mocking for tests.
interface PatternDefinition {
source: string;
flags: string;
description: string;
version: string;
}
interface ExtractionResult<T extends string> {
matched: boolean;
groups: Record<T, string | undefined>;
raw: string | null;
}
class PatternRegistry {
private cache: Map<string, RegExp> = new Map();
register(id: string, def: PatternDefinition): void {
if (this.cache.has(id)) {
throw new Error(`Pattern ${id} is already registered.`);
}
this.cache.set(id, new RegExp(def.source, def.flags));
}
getCompiled(id: string): RegExp {
const pattern = this.cache.get(id);
if (!pattern) throw new Error(`Pattern ${id} not found.`);
return pattern;
}
}
Architecture Rationale: Using new RegExp() instead of literals allows dynamic flag injection and runtime validation: an invalid source or flag string throws a SyntaxError at registration rather than at first use. The Map cache gives average O(1) lookup and prevents recompilation. Throwing on duplicate registration catches configuration drift early.
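To make the runtime-validation point concrete, here is a minimal sketch of what happens when a malformed source or flag string reaches `new RegExp()`. The helper name `tryCompile` and the `CompileOutcome` type are hypothetical and not part of the registry; they only illustrate that the constructor throws, which a registry can surface at startup.

```typescript
// Hypothetical helper: compile a pattern and return the failure reason
// instead of throwing, so bad configuration can be reported in one place.
type CompileOutcome =
  | { ok: true; regex: RegExp }
  | { ok: false; reason: string };

function tryCompile(source: string, flags: string): CompileOutcome {
  try {
    return { ok: true, regex: new RegExp(source, flags) };
  } catch (err) {
    // new RegExp() throws a SyntaxError for an invalid source or flag string.
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}

const good = tryCompile(String.raw`^\d{4}$`, 'u');
const badSource = tryCompile('(unclosed', 'u'); // unterminated group
const badFlags = tryCompile('abc', 'gg');       // duplicate flag

console.log(good.ok, badSource.ok, badFlags.ok); // true false false
```

Whether a registry should throw or collect such failures is a design choice; throwing keeps a misconfigured service from starting at all.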
Numeric capture groups (match[1], match[2]) break when patterns are refactored. ES2018 named groups solve this by attaching semantic keys directly to the match result.
interface LogEntry {
timestamp: string;
severity: string;
service: string;
message: string;
}
const logPattern: PatternDefinition = {
source: String.raw`^\[(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\]\s+(?<severity>INFO|WARN|ERROR)\s+\[(?<service>\w+)\]\s+(?<message>.+)$`,
flags: 'u',
description: 'Structured application log line',
version: '1.0.0'
};
function parseLogLine(registry: PatternRegistry, line: string): ExtractionResult<keyof LogEntry> {
const compiled = registry.getCompiled('appLog');
const match = line.match(compiled);
if (!match || !match.groups) {
return { matched: false, groups: {} as Record<keyof LogEntry, string>, raw: null };
}
return {
matched: true,
groups: match.groups as Record<keyof LogEntry, string>,
raw: match[0]
};
}
Architecture Rationale: With the u flag the engine matches by code point, so a surrogate pair is treated as a single character, and Unicode property escapes become available. Returning a typed ExtractionResult gives callers an explicit matched flag to check before reading groups, which avoids TypeError: Cannot read properties of null crashes.
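If the goal is for the compiler itself to reject unchecked access to groups, one possible variation is a discriminated union. The names `StrictResult`, `DateKey`, and `parseDate` below are hypothetical and only sketch the idea; they are not part of the interfaces defined earlier.

```typescript
// Hypothetical variant of ExtractionResult: groups exist only on the
// matched branch, so callers must narrow on `matched` first.
type StrictResult<T extends string> =
  | { matched: true; groups: Record<T, string>; raw: string }
  | { matched: false };

type DateKey = 'year' | 'month' | 'day';

// Built with new RegExp so the named groups live in a plain string.
const datePattern = new RegExp(
  String.raw`^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$`,
  'u'
);

function parseDate(input: string): StrictResult<DateKey> {
  const match = datePattern.exec(input);
  if (!match || !match.groups) return { matched: false };
  return {
    matched: true,
    groups: match.groups as Record<DateKey, string>,
    raw: match[0],
  };
}

const parsed = parseDate('2026-05-23');
// Reading parsed.groups directly is a compile error until `matched` is checked.
const year = parsed.matched ? parsed.groups.year : null;
console.log(year); // "2026"
```

The trade-off is that every call site needs a branch, which is the point: the failure case cannot be silently ignored.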
Search-and-replace operations frequently mutate global regex state or fail on edge cases. A dedicated transformer isolates side effects and enforces idempotency.
class RegexTransformer {
static normalizeWhitespace(input: string): string {
return input.replace(/\s+/g, ' ').trim();
}
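static maskSensitive(input: string, preserveTail: number = 4): string {
if (!Number.isInteger(preserveTail) || preserveTail < 0) {
throw new RangeError('preserveTail must be a non-negative integer');
}
if (input.length <= preserveTail) return input;
// A regex literal cannot interpolate preserveTail, so build the pattern from a template string.
// {n,} masks every character that still has at least n characters after it.
return input.replace(new RegExp(`.(?=.{${preserveTail},}$)`, 'g'), '*');
}
static convertCase(input: string, target: 'snake' | 'camel'): string {
if (target === 'snake') {
return input.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`).replace(/^_/, '');
}
return input.replace(/_([a-z])/g, (_, char) => char.toUpperCase());
}
}
Architecture Rationale: Static methods keep the transformer stateless, so no regex state is shared between calls. The masking pattern is built with new RegExp() because a regex literal cannot interpolate preserveTail. Its lookahead (?=.{4,}$) matches only where at least four characters remain, so the tail is never consumed and stays intact. Snake-case conversion strips the leading underscore produced by an initial capital to prevent malformed identifiers.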
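The lookahead-based masking is easy to get subtly wrong, so a standalone check helps. The sketch below uses a hypothetical free function `maskTail` rather than the class method. Note the open-ended quantifier `{n,}`: a fixed `{n}` followed by `$` would match only the single character sitting exactly n + 1 positions from the end.

```typescript
// Hypothetical standalone helper illustrating lookahead-based masking.
function maskTail(input: string, keep: number = 4): string {
  if (!Number.isInteger(keep) || keep < 0) {
    throw new RangeError('keep must be a non-negative integer');
  }
  if (input.length <= keep) return input;
  // Replace each character that still has at least `keep` characters after it.
  const pattern = new RegExp(`.(?=.{${keep},}$)`, 'g');
  return input.replace(pattern, '*');
}

console.log(maskTail('4111111111111111')); // "************1111"
console.log(maskTail('abcdefgh', 2));       // "******gh"
console.log(maskTail('abc'));               // "abc" (shorter than the tail)

// Masking an already masked value changes nothing, so the operation is idempotent.
console.log(maskTail(maskTail('abcdefgh', 2), 2)); // "******gh"
```

For long inputs a slice-based approach avoids rescanning the remainder at every position; the regex form is shown here because it is the technique under discussion.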
Step 4: Wire It Together
const registry = new PatternRegistry();
registry.register('appLog', logPattern);
const sample = '[2026-05-23T10:30:15Z] ERROR [AuthService] Token validation failed';
const result = parseLogLine(registry, sample);
if (result.matched) {
console.log(result.groups.severity); // "ERROR"
console.log(result.groups.message); // "Token validation failed"
}
This architecture decouples pattern definition from execution, enforces type safety, and provides a single point for logging, metrics, and version control.
Pitfall Guide
1. Catastrophic Backtracking
Explanation: Nested quantifiers like (a+)+ or (.*?)+ create exponential state exploration when the input fails to match. A 30-character string can trigger millions of backtracking steps, freezing the event loop.
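Fix: Flatten nested quantifiers, or restructure the pattern to match specific character classes instead of wildcards. Possessive quantifiers and atomic groups help in engines that support them, but JavaScript's RegExp does not currently support either. Add explicit length limits: ^[a-zA-Z]{1,50}$.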
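As a rough illustration of flattening, the sketch below places a nested pattern next to a bounded, flat one that accepts the same strings up to 50 characters. The function `isLowercaseWord` and its `maxLength` parameter are hypothetical. Only short inputs are run through the nested pattern, on purpose.

```typescript
// Nested quantifier: a run of letters can be split between the inner and
// outer '+' in many ways, so a failing input may backtrack exponentially.
const nested = /^([a-z]+)+$/;

// Flattened form with an explicit upper bound on length.
const flat = /^[a-z]{1,50}$/;

// Hypothetical guard: reject oversized input before any regex runs.
function isLowercaseWord(input: string, maxLength: number = 50): boolean {
  if (input.length > maxLength) return false;
  return flat.test(input);
}

// Short inputs only: both patterns agree and the nested one stays cheap.
const samples = ['abc', 'abc1', '', 'z'];
const agree = samples.every((s) => nested.test(s) === flat.test(s));

console.log(agree);                          // true
console.log(isLowercaseWord('hello'));       // true
console.log(isLowercaseWord('a'.repeat(51))); // false
```

The length check costs almost nothing and caps the worst case regardless of how the pattern is later edited.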
2. Global Flag State Mutation
Explanation: Regex objects with the g flag maintain internal lastIndex state. Calling .test() or .exec() repeatedly on the same instance advances the index, causing alternating true/false results or missed matches.
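Fix: Never share a g-flagged regex across independent .test() or .exec() calls. Use a pattern without the g flag for boolean checks, reset lastIndex to 0 before reuse, create a fresh instance per operation, or use .match(), which resets lastIndex before iterating.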
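The alternating results are easy to reproduce. This short sketch (variable names are illustrative) calls `.test()` three times on one g-flagged instance and compares it with a pattern that has no g flag.

```typescript
const globalRe = /error/g;

const first = globalRe.test('error');  // true: match found, lastIndex becomes 5
const second = globalRe.test('error'); // false: search starts at 5, then lastIndex resets to 0
const third = globalRe.test('error');  // true again

// Without the g flag, test() always searches from index 0.
const plainRe = /error/;
const stable = [plainRe.test('error'), plainRe.test('error')];

console.log(first, second, third); // true false true
console.log(stable.every(Boolean)); // true
```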
3. Over-Validation with Regex
Explanation: Attempting to validate RFC-compliant emails, URLs, or phone numbers with regex leads to fragile patterns that reject valid inputs or accept malformed ones. Specifications evolve; regex does not.
Fix: Use regex for structural sanitization (e.g., stripping invalid characters) and delegate full validation to dedicated libraries (zod, validator, libphonenumber-js).
4. Greedy vs Lazy Mismatch in Nested Structures
Explanation: .+ consumes everything up to the last possible match. When parsing HTML-like tags or quoted strings, greedy matching swallows intermediate delimiters, returning incorrect boundaries.
Fix: Use lazy quantifiers .+? for minimal matching, or explicitly exclude delimiters: [^"]* instead of .*? for quoted content.
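A small comparison makes the boundary difference visible. The sample string is made up for illustration.

```typescript
const text = 'say "a" and "b"';

// Greedy: runs to the last quote in the string.
const greedy = text.match(/".+"/);

// Lazy: stops at the first closing quote.
const lazy = text.match(/".+?"/);

// Negated class: cannot cross a quote, and also allows empty strings.
const excluded: string[] = text.match(/"[^"]*"/g) ?? [];

console.log(greedy?.[0]);    // "a" and "b"
console.log(lazy?.[0]);      // "a"
console.log(excluded.length); // 2
```

The negated class states the intent directly (anything except a quote), which is why it tends to hold up better than a lazy wildcard when the pattern is extended later.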
5. Ignoring Unicode Boundaries
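Explanation: \b and \w are defined over ASCII word characters ([A-Za-z0-9_]), and the u flag alone does not change that. Strings containing accented characters, emojis, or non-Latin scripts break word boundary detection and character class matching.
Fix: Use the u flag together with Unicode property escapes, which require it: \p{L} for letters, \p{N} for numbers, \p{Sc} for currency symbols. Where a Unicode-aware word boundary is needed, build it from lookarounds over these classes instead of relying on \b.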
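The following sketch shows that `\w` stays ASCII-only even when the u flag is set, while a class built from property escapes matches accented letters. The patterns are constructed with `new RegExp` and `String.raw` purely for illustration.

```typescript
const word = 'caf\u00e9'; // "café" with a precomposed é

// \w is [A-Za-z0-9_] with or without the u flag.
const asciiWord = new RegExp(String.raw`^\w+$`, 'u');

// Property escapes need the u flag and cover letters in any script.
const unicodeWord = new RegExp(String.raw`^[\p{L}\p{N}_]+$`, 'u');

console.log(asciiWord.test(word));   // false
console.log(unicodeWord.test(word)); // true

// Tokenizing with the Unicode-aware class keeps accented words whole.
const tokenPattern = new RegExp(String.raw`[\p{L}\p{N}_]+`, 'gu');
const tokens: string[] = 'na\u00efve caf\u00e9'.match(tokenPattern) ?? [];
console.log(tokens.length); // 2
```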
6. Inline Compilation in Hot Paths
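Explanation: Calling new RegExp(source) inside loops or request handlers parses the pattern and allocates a new object on every iteration. A regex literal such as /pattern/.test(input) also evaluates to a new RegExp object each time it runs, although engines typically cache the compiled form. Either way, the repeated work adds avoidable CPU overhead and GC pressure.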
Fix: Extract patterns to module scope or pre-compile them in a registry. Reuse the compiled RegExp instance across all invocations.
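A minimal sketch of the module-scope approach, with a hypothetical `countNumeric` function standing in for a hot path:

```typescript
// Created once when the module loads and reused by every call.
const DIGITS_ONLY = /^\d+$/;

function countNumeric(values: readonly string[]): number {
  let count = 0;
  for (const value of values) {
    // No g flag, so test() carries no lastIndex state between calls.
    if (DIGITS_ONLY.test(value)) count += 1;
  }
  return count;
}

console.log(countNumeric(['42', 'x1', '007'])); // 2
```

Leaving off the g flag matters here: a shared module-level instance with g would reintroduce the lastIndex problem described in pitfall 2.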
7. Assuming .match() Returns an Array
Explanation: When using the g flag, .match() returns null if no matches exist, not an empty array. Destructuring or chaining .map() without null checks throws runtime errors.
Fix: Always apply nullish coalescing: const matches = text.match(pattern) ?? []; or use Array.from(text.matchAll(pattern)) for consistent iteration. Note that matchAll() throws a TypeError unless the pattern has the g flag.
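As a short illustration of the null case, the hypothetical `extractNumbers` helper below stays safe to chain whether or not anything matches.

```typescript
function extractNumbers(text: string): number[] {
  // match() with the g flag yields null, not [], when nothing matches.
  const found: string[] = text.match(/\d+/g) ?? [];
  return found.map((digits) => Number(digits));
}

console.log(extractNumbers('a1 b22 c333')); // [1, 22, 333]
console.log(extractNumbers('no digits'));   // []
```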
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Input sanitization before DB storage | Pre-compiled registry with strict character classes | Prevents injection, ensures consistent formatting | Low (one-time setup) |
| RFC-compliant email/URL validation | Dedicated validation library (zod, validator) | Regex cannot reliably enforce evolving specs | Medium (dependency) |
| Log parsing in high-throughput pipeline | Named-group extraction + pre-compiled cache | Minimizes CPU, enables structured logging | Low |
| Dynamic pattern generation from user config | new RegExp() with strict allowlist + timeout wrapper | Prevents injection and backtracking | High (requires sandboxing) |
| Simple string replacement in UI | Inline literal with replace() | Low overhead, readable for trivial cases | Negligible |
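For the dynamic-pattern row, one common building block is to escape user input so it can only ever be matched literally. The sketch below assumes the user supplies a plain search term; `escapeRegExp`, `buildSearchPattern`, and `MAX_TERM_LENGTH` are hypothetical names, and the timeout wrapper mentioned in the table is not implemented here.

```typescript
// Escape every character that has special meaning inside a pattern.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical limit for user-supplied search terms.
const MAX_TERM_LENGTH = 64;

function buildSearchPattern(userTerm: string): RegExp {
  if (userTerm.length === 0 || userTerm.length > MAX_TERM_LENGTH) {
    throw new RangeError('search term length out of bounds');
  }
  // The escaped term is matched literally, so it cannot add quantifiers or groups.
  return new RegExp(escapeRegExp(userTerm), 'iu');
}

const searchPattern = buildSearchPattern('a+b (draft)');
console.log(searchPattern.test('See A+B (draft) v2')); // true
console.log(searchPattern.test('aab draft'));          // false
```

If users must supply real patterns rather than literal terms, escaping is not enough, which is where the allowlist and sandboxing cost in the table comes from.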
Configuration Template
// regex.config.ts
import { PatternRegistry } from './PatternRegistry';
export const patternRegistry = new PatternRegistry();
patternRegistry.register('timestamp', {
source: String.raw`(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})`,
flags: 'u',
description: 'ISO 8601 date component',
version: '1.0.0'
});
patternRegistry.register('hexColor', {
source: String.raw`^(?<full>#(?<r>[0-9a-f]{2})(?<g>[0-9a-f]{2})(?<b>[0-9a-f]{2}))$`,
flags: 'iu',
description: '6-digit hexadecimal color code',
version: '1.0.0'
});
patternRegistry.register('token', {
source: String.raw`^(?<prefix>[a-z]{2})_(?<payload>[A-Za-z0-9_-]{16,64})$`,
flags: 'u',
description: 'Application authentication token',
version: '2.1.0'
});
export type PatternId = Parameters<typeof patternRegistry.getCompiled>[0];
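Because getCompiled accepts a plain string, the PatternId alias above resolves to string and does not catch misspelled IDs. If narrower typing is wanted, one possible approach is to derive the ID union from a const object. Everything in this sketch (`patternSources`, `KnownPatternId`, `getKnown`) is hypothetical and sits outside the registry class.

```typescript
// Hypothetical: keep sources in one object so its keys form a union type.
const patternSources = {
  timestamp: String.raw`(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})`,
  hexColor: String.raw`^#[0-9a-f]{6}$`,
} as const;

type KnownPatternId = keyof typeof patternSources; // "timestamp" | "hexColor"

// Compile every source once, keyed by the narrowed ID type.
const compiledById = new Map<KnownPatternId, RegExp>();
for (const id of Object.keys(patternSources) as KnownPatternId[]) {
  compiledById.set(id, new RegExp(patternSources[id], 'iu'));
}

function getKnown(id: KnownPatternId): RegExp {
  const regex = compiledById.get(id);
  if (!regex) throw new Error(`Pattern ${id} not found.`);
  return regex;
}

console.log(getKnown('hexColor').test('#FFAA00')); // true
// getKnown('hexColour') would be rejected by the compiler.
```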
Quick Start Guide
- Initialize the Registry: Import PatternRegistry and register your patterns at application startup. Use String.raw to avoid double-escaping backslashes.
- Compile & Cache: Call registry.getCompiled('patternId') once per module or service. Store the returned RegExp instance in a constant or closure.
- Extract Safely: Use .match() with nullish coalescing or .matchAll() for iteration. Always check the matched flag before accessing groups.
- Validate Structure, Not Spec: Use regex to enforce format boundaries and strip invalid characters. Delegate semantic validation to type-safe libraries.
- Profile Under Load: Run benchmarks with realistic input volumes. If execution time exceeds 5ms per 10k calls, refactor quantifiers or switch to a parser combinator.
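For the profiling step, a rough timing harness can look like the sketch below. The function `timeRegex` and its parameters are hypothetical, `Date.now()` has only millisecond resolution, and the sample pattern and input are invented, so treat the numbers as a coarse signal rather than a precise benchmark.

```typescript
// Hypothetical micro-benchmark: best-of-N wall time for `calls` executions.
function timeRegex(
  regex: RegExp,
  input: string,
  calls: number = 10_000,
  rounds: number = 5
): number {
  let best = Number.POSITIVE_INFINITY;
  for (let round = 0; round < rounds; round += 1) {
    const start = Date.now();
    for (let i = 0; i < calls; i += 1) {
      regex.lastIndex = 0; // keeps g- or y-flagged patterns from carrying state
      regex.test(input);
    }
    best = Math.min(best, Date.now() - start);
  }
  return best; // milliseconds, at Date.now() resolution
}

const elapsedMs = timeRegex(
  /^\[\d{4}-\d{2}-\d{2}\]/,
  '[2026-05-23] ERROR something failed'
);
console.log(`${elapsedMs} ms per 10k calls`);
```

Taking the best of several rounds reduces noise from JIT warm-up and garbage collection; realistic inputs, including ones that fail to match, matter more than the exact harness.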