ter scanner provides deterministic control over syntax boundaries. The scanner tracks three states: static text, named parameters, and inline constraints.
interface RouteToken {
type: 'static' | 'param' | 'wildcard';
value: string;
optional?: boolean;
constraint?: string;
}
function tokenizePattern(pattern: string): RouteToken[] {
const tokens: RouteToken[] = [];
let buffer = '';
let i = 0;
while (i < pattern.length) {
const char = pattern[i];
if (char === ':') {
if (buffer) {
tokens.push({ type: 'static', value: buffer });
buffer = '';
}
const param = parseParameter(pattern, i);
tokens.push(param);
i += param.value.length + 1; // skip ':'
} else if (char === '*') {
if (buffer) {
tokens.push({ type: 'static', value: buffer });
buffer = '';
}
tokens.push({ type: 'wildcard', value: '*' });
i++;
} else {
buffer += char;
i++;
}
}
if (buffer) tokens.push({ type: 'static', value: buffer });
return tokens;
}
Architecture Rationale: Buffering static text reduces regex fragmentation. Emitting discrete tokens allows precise control over how each segment contributes to the final pattern.
Static segments must be escaped before injection into the regex. The escapeRegex utility ensures dots, brackets, and quantifiers match literally.
const REGEX_META_CHARS = /[.*+?^${}()|[\]\\]/g;
function escapeRegex(str: string): string {
return str.replace(REGEX_META_CHARS, '\\$&');
}
function buildRegex(tokens: RouteToken[]): { pattern: RegExp; keys: string[] } {
let regexStr = '^';
const keys: string[] = [];
for (const token of tokens) {
if (token.type === 'static') {
regexStr += escapeRegex(token.value);
} else if (token.type === 'wildcard') {
regexStr += '(.*)';
keys.push('wild');
} else if (token.type === 'param') {
const segment = token.constraint ? `(${token.constraint})` : '([^/]+)';
if (token.optional) {
// Absorb preceding slash into optional group
if (regexStr.endsWith('\\/')) {
regexStr = regexStr.slice(0, -2) + `(?:\\/${segment})?`;
} else {
regexStr += `(?:${segment})?`;
}
} else {
regexStr += `\\/${segment}`;
}
keys.push(token.value);
}
}
regexStr += '$';
return { pattern: new RegExp(regexStr), keys };
}
Why this works: The optional slash absorption logic (regexStr.endsWith('\\/')) solves the /users/:id? mismatch. By wrapping the slash and parameter in a non-capturing optional group, both /users and /users/42 resolve correctly. Inline constraints bypass the default ([^/]+) and inject the developer's regex directly, but only after validating parenthesis balance during tokenization.
Runtime matching must sanitize the input URL before applying the compiled regex. Query strings and fragments are stripped, and captured values are decoded with explicit error handling.
interface MatchResult {
matched: boolean;
params: Record<string, string | null>;
query: string;
}
function sanitizeUrl(rawUrl: string): { path: string; query: string } {
const hashIndex = rawUrl.indexOf('#');
const withoutHash = hashIndex === -1 ? rawUrl : rawUrl.slice(0, hashIndex);
const queryIndex = withoutHash.indexOf('?');
return {
path: queryIndex === -1 ? withoutHash : withoutHash.slice(0, queryIndex),
query: queryIndex === -1 ? '' : withoutHash.slice(queryIndex + 1)
};
}
function executeMatch(compiled: { pattern: RegExp; keys: string[] }, rawUrl: string): MatchResult {
const { path, query } = sanitizeUrl(rawUrl);
const match = path.match(compiled.pattern);
if (!match) {
return { matched: false, params: {}, query };
}
const params: Record<string, string | null> = {};
for (let i = 0; i < compiled.keys.length; i++) {
const rawValue = match[i + 1];
if (rawValue === undefined) {
params[compiled.keys[i]] = null;
} else {
try {
params[compiled.keys[i]] = decodeURIComponent(rawValue);
} catch {
// Fallback to raw value on malformed percent-encoding
params[compiled.keys[i]] = rawValue;
}
}
}
return { matched: true, params, query };
}
Architecture Decision: Separating sanitization from matching prevents regex injection and ensures the pattern only evaluates the path segment. The try/catch around decodeURIComponent is non-negotiable in production; malformed input like /users/foo%bar would otherwise throw an unhandled URIError and crash the request handler.
Pitfall Guide
Explanation: Writing /api/v1.0/users without escaping the dot generates /api/v1.0/users in regex, where . matches any character. This silently routes /api/v1X0/users to the same handler.
Fix: Always run static segments through a regex escape utility before concatenation. Never assume framework routing handles this automatically.
2. The Optional Segment Slash Trap
Explanation: Compiling /resource/:id? to ^\/resource\/([^/]+)?$ matches /resource/ but fails on /resource. The leading slash remains mandatory.
Fix: Detect the ? modifier and conditionally wrap the preceding \/ and parameter in an optional non-capturing group: (?:\/([^/]+))?.
3. Unbalanced Inline Regex Groups
Explanation: Using indexOf(')') to find the end of :id(\d+) breaks on nested constraints like :date((\d{4})-(\d{2})). It also fails when the constraint contains escaped parentheses like :code(\\)).
Fix: Implement a depth counter that increments on ( and decrements on ), skipping characters immediately following a backslash. Return the index only when depth reaches zero.
4. Query/Fragment Contamination
Explanation: Passing /users/42?sort=desc directly into a path regex causes the matcher to fail or capture query parameters as part of the segment.
Fix: Strip # and ? segments before matching. Extract the query string separately and attach it to the result object for downstream middleware.
5. Unsafe URL Decoding
Explanation: decodeURIComponent('%E5%B1%B1') works, but decodeURIComponent('%') throws URIError. Production traffic frequently contains truncated or malformed percent-encoding.
Fix: Wrap decoding in a try/catch. Return the raw captured string on failure rather than crashing the request pipeline.
6. Greedy Wildcard Overreach
Explanation: Using (.*) for /* captures everything, including slashes. If placed before a static segment, it consumes the entire path and prevents subsequent matches.
Fix: Anchor wildcards to the end of the pattern, or use non-greedy quantifiers (.+?) when multiple wildcards exist in the same route definition.
7. Framework Version Drift
Explanation: Upgrading a web framework often swaps the underlying routing library (e.g., path-to-regexp@6 β @7). Optional segment parsing and constraint syntax change without warning.
Fix: Abstract routing compilation behind a stable interface. Run integration tests against known edge-case paths after every framework upgrade.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Standard CRUD API | Framework default router | Optimized, battle-tested, zero boilerplate | Low (maintenance) |
| Complex constraint routing (nested groups, custom validators) | Custom compiled pipeline | Explicit control over regex generation and error boundaries | Medium (initial dev) |
| High-throughput edge routing (CDN, reverse proxy) | Pre-compiled regex cache | Eliminates runtime parsing overhead, deterministic matching | Low (runtime CPU) |
| Legacy system migration | Manual regex with wrapper | Bridges old string patterns to new execution model without refactoring entire stack | High (migration effort) |
Configuration Template
// route-compiler.ts
export interface CompiledRoute {
regex: RegExp;
paramKeys: string[];
sourcePattern: string;
}
export interface RouteMatch {
success: boolean;
parameters: Record<string, string | null>;
queryString: string;
}
export class RouteCompiler {
private cache = new Map<string, CompiledRoute>();
compile(pattern: string): CompiledRoute {
if (this.cache.has(pattern)) {
return this.cache.get(pattern)!;
}
const tokens = this.tokenize(pattern);
const { regex, keys } = this.generateRegex(tokens);
const compiled: CompiledRoute = { regex, paramKeys: keys, sourcePattern: pattern };
this.cache.set(pattern, compiled);
return compiled;
}
match(compiled: CompiledRoute, url: string): RouteMatch {
const { path, query } = this.sanitize(url);
const result = path.match(compiled.regex);
if (!result) {
return { success: false, parameters: {}, queryString: query };
}
const params: Record<string, string | null> = {};
for (let i = 0; i < compiled.paramKeys.length; i++) {
const raw = result[i + 1];
params[compiled.paramKeys[i]] = raw === undefined
? null
: this.safeDecode(raw);
}
return { success: true, parameters: params, queryString: query };
}
private sanitize(url: string): { path: string; query: string } {
const hash = url.indexOf('#');
const clean = hash === -1 ? url : url.slice(0, hash);
const qIndex = clean.indexOf('?');
return {
path: qIndex === -1 ? clean : clean.slice(0, qIndex),
query: qIndex === -1 ? '' : clean.slice(qIndex + 1)
};
}
private safeDecode(value: string): string {
try {
return decodeURIComponent(value);
} catch {
return value;
}
}
// Tokenization and regex generation methods omitted for brevity
// (Implement tokenize(), generateRegex() per Core Solution)
}
Quick Start Guide
- Initialize the compiler: Instantiate
RouteCompiler at application startup. Avoid creating new instances per request to leverage the internal cache.
- Compile your routes: Call
compiler.compile('/users/:id?') during configuration phase. Store the returned CompiledRoute objects in a routing table or array.
- Execute matches in middleware: Pass incoming request URLs to
compiler.match(compiledRoute, req.url). Check success flag before proceeding to handler logic.
- Validate edge cases: Run a test suite against
/users, /users/42, /users/42?fields=name, and /users/%ZZ to verify optional segments, query stripping, and safe decoding.
- Deploy with monitoring: Log compilation warnings for unbalanced parentheses or unescaped meta-characters during startup. Route mismatches should be tracked separately from 404s to identify pattern drift.