oundaries and handle negative offsets predictably.
```typescript
interface LogEntry {
  timestamp: string;
  level: string;
  message: string;
}

function parseLogLine(raw: string): LogEntry | null {
  // Expected shape: "[<timestamp>] <LEVEL> <message>"
  const delimiterIndex = raw.indexOf(']');
  if (delimiterIndex === -1) return null;

  // Skip the leading '[' and stop before the closing ']'
  const timestamp = raw.slice(1, delimiterIndex);
  // Skip the ']' and the space that follows it
  const remainder = raw.slice(delimiterIndex + 2);

  const levelEnd = remainder.indexOf(' ');
  // Without this guard, slice(0, -1) would silently drop the last character
  if (levelEnd === -1) return null;

  const level = remainder.slice(0, levelEnd);
  const message = remainder.slice(levelEnd + 1);
  return { timestamp, level, message };
}

// Usage
const entry = parseLogLine('[2024-03-15T10:22:01Z] ERROR Database connection timeout');
// { timestamp: '2024-03-15T10:22:01Z', level: 'ERROR', message: 'Database connection timeout' }
```
Architecture Rationale: slice() is preferred over substring() because it interprets negative indices as offsets from the end of the string, whereas substring() clamps them to 0. This eliminates conditional branching when calculating dynamic boundaries. indexOf() is a linear scan that stops at the first match; for a fixed delimiter it is typically at least as fast as a regex and easier to read.
String transformation should prioritize immutability, explicit replacement scopes, and Unicode normalization. JavaScript strings cannot be mutated in place; every method returns a new string, so always capture and return the result.
```typescript
function sanitizeAndFormat(input: string): string {
  // Normalize Unicode to prevent visual spoofing
  const normalized = input.normalize('NFC');

  // Replace all occurrences explicitly
  const cleaned = normalized
    .trim()
    .replaceAll(/\s+/g, ' ')
    .replaceAll(/[<>]/g, '');

  // Pad for fixed-width display alignment
  const padded = cleaned.padEnd(40, '.');
  return padded;
}

// Usage
const formatted = sanitizeAndFormat(' User <script>alert(1)</script> ');
// 'User scriptalert(1)/script..............'
```
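Architecture Rationale: replaceAll() makes global replacement explicit. With a string pattern it replaces every occurrence, and with a regex it throws a TypeError unless the g flag is present, so a forgotten flag cannot silently limit the replacement to the first match. normalize('NFC') composes characters into their canonical form, so visually identical strings built from different code point sequences compare equal; it does not protect against look-alike characters from different scripts. trim() removes only leading and trailing whitespace, which is safer than aggressive character stripping.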
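To make the normalization point concrete, here is a minimal sketch comparing two encodings of the same visible character. The variable names are illustrative and not part of the original code.

```typescript
// Two ways to encode "é": a single precomposed code point,
// or a plain "e" followed by a combining acute accent.
const composedForm = '\u00E9';
const decomposedForm = 'e\u0301';

// They render the same but are different code unit sequences.
const equalBeforeNormalize = composedForm === decomposedForm; // false
const codeUnitLengths = [composedForm.length, decomposedForm.length]; // [1, 2]

// After NFC normalization both collapse to the precomposed form.
const equalAfterNormalize =
  composedForm.normalize('NFC') === decomposedForm.normalize('NFC'); // true
```

Normalizing before comparison or storage is what keeps lookups such as "does this username already exist" consistent across input methods.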
Step 3: Interpolation & Structured Output
Template literals solve multi-line formatting and expression embedding, but they require discipline to avoid injection and maintain readability.
```typescript
type UserProfile = {
  username: string;
  role: string;
  lastLogin: Date;
};

function generateAccessBadge(user: UserProfile): string {
  const dateStr = user.lastLogin.toISOString().split('T')[0];
  return `
    <div class="badge">
      <span class="user">${user.username}</span>
      <span class="role">${user.role}</span>
      <span class="date">${dateStr}</span>
    </div>
  `.trim();
}
```
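```typescript
// Tagged template for structured escaping
function safeHtml(strings: TemplateStringsArray, ...values: unknown[]): string {
  return strings.reduce((acc, str, i) => {
    // strings has one more element than values, so the final
    // iteration falls back to an empty value
    const val = values[i] ?? '';
    const escaped = String(val)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
    // Static segment first, then the escaped value that follows it
    return acc + str + escaped;
  }, '');
}

// Usage (assumes a UserProfile named `user` is in scope)
const badge = safeHtml`<div class="badge"><span>${user.username}</span></div>`;
```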
Architecture Rationale: Tagged templates hand the tag function the static text and the dynamic values as separate arguments, enabling centralized sanitization. This pattern prevents accidental injection when interpolating user data into HTML; the same structure works for SQL fragments or log formats, provided the tag applies an escaper appropriate to that format. The reduce call interleaves each static segment with the escaped value that follows it in a single pass.
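A small sketch of what a tag function actually receives may help here. The `inspectParts` tag below is a hypothetical helper written only for illustration; it returns its inputs instead of building a string.

```typescript
// Hypothetical tag that exposes the static and dynamic parts it is given.
function inspectParts(strings: TemplateStringsArray, ...values: unknown[]) {
  return { statics: Array.from(strings), dynamics: values };
}

const visitor = 'ada';
const parts = inspectParts`Hello, ${visitor}! You have ${3} alerts.`;

// parts.statics  -> ['Hello, ', '! You have ', ' alerts.']
// parts.dynamics -> ['ada', 3]
```

The static array always has exactly one more element than the values array, which is why a tag can emit `strings[i]` followed by the escaped `values[i]` and finish with the last static segment.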
Pitfall Guide
1. The substring Negative Index Trap
Explanation: substring() treats negative arguments as 0. This causes silent data corruption when calculating dynamic offsets.
Fix: Always use slice() for boundary calculations. It correctly interprets negative indices as length + index.
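The difference is easiest to see side by side. This is a minimal sketch using an illustrative filename:

```typescript
const filename = 'report-2024.csv';

// slice(-4) counts back from the end: the last four code units.
const extViaSlice = filename.slice(-4); // '.csv'

// substring(-4) clamps the negative argument to 0 and returns everything.
const extViaSubstring = filename.substring(-4); // 'report-2024.csv'

// slice(0, -4) drops the extension; substring(0, -4) becomes substring(0, 0).
const stemViaSlice = filename.slice(0, -4); // 'report-2024'
const stemViaSubstring = filename.substring(0, -4); // ''
```

Neither call throws, which is what makes the substring() results easy to miss in review.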
2. replace Silent Single-Match Behavior
Explanation: String.prototype.replace() only replaces the first occurrence unless the pattern is a regex with the g flag. Developers often pass a string pattern expecting global replacement.
Fix: Use replaceAll() for literal strings. Reserve replace() with regex only when pattern matching or capture groups are required.
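A short sketch of the three behaviors, using an illustrative input. It assumes a runtime and TypeScript `lib` setting that include `replaceAll()` (ES2021 or later).

```typescript
const slashed = 'a/b/c/d';

// A string pattern with replace() touches only the first occurrence.
const dottedOnce = slashed.replace('/', '.'); // 'a.b/c/d'

// replaceAll() with a string pattern replaces every occurrence.
const dottedAll = slashed.replaceAll('/', '.'); // 'a.b.c.d'

// replace() with a global regex is equivalent, but needs escaping and the g flag.
const dottedRegex = slashed.replace(/\//g, '.'); // 'a.b.c.d'

// replaceAll() rejects a regex without the g flag instead of guessing.
let nonGlobalRegexThrows = false;
try {
  slashed.replaceAll(/\//, '.');
} catch (err) {
  nonGlobalRegexThrows = err instanceof TypeError;
}
```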
3. Unicode & Surrogate Pair Blind Spots
Explanation: length, charAt(), and charCodeAt() operate on UTF-16 code units, not visible characters. Characters outside the Basic Multilingual Plane, which includes most emoji, occupy two code units (a surrogate pair), and combining marks add extra code points to a single visible character. Both break length checks and character extraction.
Fix: Use codePointAt() for code points, and Intl.Segmenter for grapheme cluster iteration. [...str].length or Array.from(str).length counts code points, which handles surrogate pairs correctly but still overcounts combining sequences and multi-code-point emoji.
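The sketch below contrasts code units with code points for two illustrative strings. It deliberately stops at code points; counting user-perceived characters would additionally need Intl.Segmenter, as the fix above notes.

```typescript
const thumbsUp = '\u{1F44D}'; // one code point outside the BMP
const accentedE = 'e\u0301';  // 'e' plus a combining acute accent

// length counts UTF-16 code units.
const thumbsUpUnits = thumbsUp.length;   // 2
const accentedUnits = accentedE.length;  // 2

// Array.from iterates by code point, so the surrogate pair counts once...
const thumbsUpPoints = Array.from(thumbsUp).length;  // 1
// ...but the combining sequence is still two code points for one visible character.
const accentedPoints = Array.from(accentedE).length; // 2

// charCodeAt sees only the high surrogate; codePointAt sees the full value.
const highSurrogateHex = thumbsUp.charCodeAt(0).toString(16);          // 'd83d'
const codePointHex = (thumbsUp.codePointAt(0) ?? 0).toString(16);      // '1f44d'
```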
4. Forgetting String Immutability
Explanation: Strings are primitive and immutable. Methods like trim(), toUpperCase(), and replace() return new strings and leave the original untouched. Calling one without using the result is a silent no-op, and reassigning a variable does not update copies already captured by closures or state managers, which then hold stale data.
Fix: Treat string transformations as pure functions. Always assign results to explicitly named variables or return them directly. Avoid in-place mutation assumptions.
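Both failure modes fit in a few lines. The names below are illustrative only:

```typescript
let draftLabel = '  draft  ';

// The return value is discarded, so nothing changes.
draftLabel.trim();
const afterDiscardedCall = draftLabel; // still '  draft  '

// A copy taken earlier (as a closure or state store would hold) keeps the old value.
const capturedLabel = draftLabel;

// Reassignment binds the variable to a brand-new string.
draftLabel = draftLabel.trim().toUpperCase();

// draftLabel    -> 'DRAFT'
// capturedLabel -> '  draft  '
```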
5. Template Literal Injection Risks
Explanation: Unescaped interpolation in template literals can inject malicious content into HTML, SQL, or shell commands. Developers often assume template literals are inherently safe.
Fix: Never interpolate raw user input into structured formats. Use tagged templates with sanitization functions, or escape values before interpolation. Validate and whitelist expected formats.
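As a rough illustration for the HTML case, the sketch below pairs an escaper with an allow-list check. Both `escapeHtml` and `isValidUsername` are hypothetical helpers, and the username rule is an assumption chosen for the example rather than a recommendation.

```typescript
// Hypothetical helper: escapes the five HTML-significant characters.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical allow-list: 3 to 20 ASCII letters, digits, or underscores.
function isValidUsername(value: string): boolean {
  return /^[A-Za-z0-9_]{3,20}$/.test(value);
}

const hostileInput = '<img src=x onerror=alert(1)>';

// Raw interpolation passes the markup straight through.
const unsafeMarkup = `<span>${hostileInput}</span>`;

// Escaping first turns it into inert text.
const safeMarkup = `<span>${escapeHtml(hostileInput)}</span>`;
// '<span>&lt;img src=x onerror=alert(1)&gt;</span>'

// Validation rejects the input before it reaches any template.
const accepted = isValidUsername(hostileInput); // false
```

Escaping is format-specific: this helper targets HTML text content and is not a substitute for parameterized SQL queries or shell argument arrays.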
6. matchAll Iterator Exhaustion
Explanation: matchAll() returns an iterator, not an array. Consuming it once (e.g., in a for...of loop) exhausts it. Subsequent attempts to iterate or convert to an array yield empty results.
Fix: Convert to an array immediately if multiple passes are needed: const matches = [...str.matchAll(/pattern/g)];. Alternatively, store extracted data in a plain object or array during the first iteration.
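A minimal sketch of the exhaustion behavior and the fix, using an illustrative input string:

```typescript
const queryText = 'id=7 id=42 id=108';

// One iterator, consumed twice.
const idIterator = queryText.matchAll(/id=(\d+)/g);
const firstPass = Array.from(idIterator, (m) => m[1]);  // ['7', '42', '108']
const secondPass = Array.from(idIterator, (m) => m[1]); // [] because the iterator is spent

// Materialize once, then reuse the array as often as needed.
const idMatches = Array.from(queryText.matchAll(/id=(\d+)/g));
const ids = idMatches.map((m) => Number(m[1])); // [7, 42, 108]
const matchCount = idMatches.length;            // 3
```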
7. Over-Reliance on Regex for Simple Delimiters
Explanation: Using regex for fixed delimiters (commas, pipes, newlines) adds parsing overhead and reduces readability. Regex engines also introduce backtracking risks with complex patterns.
Fix: Use split() for literal delimiters. Reserve regex for pattern matching, validation, or complex extraction. Benchmark if performance is critical; split() is typically faster for fixed strings.
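The sketch below shows the literal case, a readability hazard with the regex form, and a case where regex is the right tool. The sample rows are illustrative.

```typescript
const pipeRow = 'alice|admin|2024-03-15';

// Literal delimiter: no escaping, intent is obvious.
const fields = pipeRow.split('|'); // ['alice', 'admin', '2024-03-15']

// The regex form must escape '|', which is the alternation operator.
const fieldsViaRegex = pipeRow.split(/\|/); // same result

// Forgetting the escape matches the empty string and splits into single characters.
const accidentalSplit = pipeRow.split(/|/);

// Regex earns its place when the delimiter genuinely varies.
const messyRow = 'a, b;c ;  d';
const tokens = messyRow.split(/\s*[,;]\s*/); // ['a', 'b', 'c', 'd']
```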
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Fixed delimiter splitting | split(',') | Faster, no regex compilation overhead | Lower CPU, predictable memory |
| Global literal replacement | replaceAll('a', 'b') | Explicit intent, avoids regex g flag mistakes | Same performance, higher maintainability |
| Unicode-aware character iteration | Intl.Segmenter (or [...str] for code points only) | Segmenter handles emoji, combining marks, and surrogate pairs; spread handles surrogate pairs only | Slight memory overhead, prevents visual/logic bugs |
| Multi-line structured output | Tagged template + sanitizer | Separates static/dynamic content, enables centralized escaping | Negligible runtime cost, major security gain |
| Dynamic boundary extraction | slice(start, end) | Supports negative indices, predictable behavior | Zero extra cost, eliminates off-by-one bugs |
Configuration Template
```typescript
// string-utils.ts
export class StringProcessor {
  static normalize(input: string): string {
    return input.normalize('NFC').trim();
  }

  static extractBetween(source: string, startMarker: string, endMarker: string): string | null {
    const startIdx = source.indexOf(startMarker);
    if (startIdx === -1) return null;

    const contentStart = startIdx + startMarker.length;
    // Search for the end marker only after the start marker
    const endIdx = source.indexOf(endMarker, contentStart);
    if (endIdx === -1) return null;

    return source.slice(contentStart, endIdx);
  }

  static safeInterpolate(strings: TemplateStringsArray, ...values: unknown[]): string {
    return strings.reduce((acc, str, i) => {
      // strings has one more element than values, so the final
      // iteration falls back to an empty value
      const val = values[i] ?? '';
      const escaped = String(val)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
      // Static segment first, then the escaped value that follows it
      return acc + str + escaped;
    }, '');
  }

  static countGraphemes(input: string): number {
    if (typeof Intl !== 'undefined' && 'Segmenter' in Intl) {
      const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
      return Array.from(segmenter.segment(input)).length;
    }
    // Fallback counts code points, which overcounts combining sequences
    return [...input].length;
  }
}
```
Quick Start Guide
- Install/Import: Copy the StringProcessor class into your utility module. No external dependencies required.
- Normalize Input: Pass all user-generated or external strings through StringProcessor.normalize() before validation or storage.
- Extract Boundaries: Use extractBetween() for parsing logs, CSV fragments, or markup. It returns null when either marker is missing.
- Interpolate Safely: Replace raw template literals with StringProcessor.safeInterpolate when rendering HTML, emails, or structured logs.
- Validate Length: Use countGraphemes() instead of .length when enforcing character limits for UI inputs or API payloads.