e pair complications in early JavaScript engines while maintaining a flat lookup table.
const VS_LOW_START = 0xFE00;
const VS_HIGH_START = 0xE0100;
const VS_LOW_COUNT = 16;
function byteToCodePoint(byte: number): number {
if (byte < VS_LOW_COUNT) {
return VS_LOW_START + byte;
}
return VS_HIGH_START + (byte - VS_LOW_COUNT);
}
function codePointToByte(cp: number): number | null {
if (cp >= VS_LOW_START && cp < VS_LOW_START + VS_LOW_COUNT) {
return cp - VS_LOW_START;
}
if (cp >= VS_HIGH_START && cp < VS_HIGH_START + 240) {
return (cp - VS_HIGH_START) + VS_LOW_COUNT;
}
return null;
}
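As a quick sanity check on the mapping, the sketch below restates both functions under hypothetical names (`toSelector`, `fromSelector`, `mappingIsBijective`) so it runs on its own, then confirms that all 256 byte values round-trip to distinct code points. The constant `HIGH_COUNT` is introduced here for illustration in place of the literal 240.

```typescript
// Restated from the listing above so the snippet runs standalone.
const LOW_START = 0xfe00;   // VS1..VS16
const HIGH_START = 0xe0100; // VS17..VS256
const LOW_COUNT = 16;
const HIGH_COUNT = 240;     // illustrative name for the literal 240

function toSelector(byte: number): number {
  return byte < LOW_COUNT ? LOW_START + byte : HIGH_START + (byte - LOW_COUNT);
}

function fromSelector(cp: number): number | null {
  if (cp >= LOW_START && cp < LOW_START + LOW_COUNT) return cp - LOW_START;
  if (cp >= HIGH_START && cp < HIGH_START + HIGH_COUNT) return cp - HIGH_START + LOW_COUNT;
  return null;
}

// Hypothetical helper: true when every byte maps to a unique selector and back.
function mappingIsBijective(): boolean {
  const seen = new Set<number>();
  for (let byte = 0; byte < 256; byte++) {
    const cp = toSelector(byte);
    if (fromSelector(cp) !== byte) return false;
    seen.add(cp);
  }
  return seen.size === 256;
}

console.log(mappingIsBijective());         // true
console.log(toSelector(15).toString(16));  // fe0f
console.log(toSelector(16).toString(16));  // e0100
console.log(toSelector(255).toString(16)); // e01ef
```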
Step 2: Encode Binary Payload into Text
The encoder accepts a Uint8Array, maps each byte to its corresponding Variation Selector, and appends the sequence to a base character. The base character can be any printable Unicode scalar value. Using a single base character minimizes string length and simplifies decoding.
export function encodePayload(payload: Uint8Array, baseChar: string = 'β'): string {
const vsSequence = Array.from(payload, byteToCodePoint)
.map(cp => String.fromCodePoint(cp))
.join('');
return baseChar + vsSequence;
}
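To see what the encoder actually produces, it can help to list the code points of its output. The following is a minimal sketch: `encodeBytes` restates the encoder under a different name so the snippet is self-contained, and `describe` is a hypothetical debugging helper, not part of the original listing.

```typescript
// Encoder restated (same mapping as Step 1) so the snippet runs standalone.
function encodeBytes(payload: Uint8Array, baseChar = 'β'): string {
  const selectors = Array.from(payload, (b) =>
    String.fromCodePoint(b < 16 ? 0xfe00 + b : 0xe0100 + (b - 16)));
  return baseChar + selectors.join('');
}

// Hypothetical debugging helper: renders each code point as a U+XXXX label.
function describe(text: string): string {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, '0')).join(' ');
}

console.log(describe(encodeBytes(new Uint8Array([0x00, 0x0f, 0x10, 0xff]))));
// U+03B2 U+FE00 U+FE0F U+E0100 U+E01EF
```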
Step 3: Decode and Reconstruct Binary
The decoder iterates over the string using proper Unicode iteration (for...of or Array.from), extracts code points, filters for VS ranges, and reconstructs the original byte array. This approach avoids regex pitfalls with surrogate pairs and ensures accurate boundary detection.
export function decodePayload(stegoText: string): Uint8Array | null {
const bytes: number[] = [];
const codePoints = Array.from(stegoText);
for (const char of codePoints) {
const cp = char.codePointAt(0)!;
const byte = codePointToByte(cp);
if (byte !== null) {
bytes.push(byte);
}
}
if (bytes.length === 0) return null;
return new Uint8Array(bytes);
}
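The sketch below exercises the same filtering logic on a carrier embedded in ordinary text, and also shows one limitation worth keeping in mind: a variation selector that legitimately belongs to the carrier (here the emoji sequence U+2764 U+FE0F) is indistinguishable from payload. `extractBytes` is a hypothetical restatement of the decoder that returns a plain array for easy inspection.

```typescript
// Decoder restated from Step 3 (same ranges) so the snippet runs standalone.
function extractBytes(text: string): number[] {
  const bytes: number[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (cp >= 0xfe00 && cp <= 0xfe0f) bytes.push(cp - 0xfe00);
    else if (cp >= 0xe0100 && cp <= 0xe01ef) bytes.push(cp - 0xe0100 + 16);
  }
  return bytes;
}

// 'β' followed by the selectors for bytes 0x68 and 0x69 ("hi" in ASCII).
const carrier = 'β' + String.fromCodePoint(0xe0100 + 0x68 - 16, 0xe0100 + 0x69 - 16);

console.log(extractBytes(`see ${carrier} for details`)); // [ 104, 105 ]
console.log(extractBytes('plain text'));                 // []
// Caveat: an ordinary emoji presentation sequence decodes as byte 15.
console.log(extractBytes('\u2764\uFE0F'));               // [ 15 ]
```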
Step 4: Safe Execution Boundary
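Reconstructed bytes should never be passed directly to eval() or innerHTML. Instead, create a Blob with the appropriate MIME type, generate an object URL, and load it via a script tag or module import. A Blob URL is not a sandbox: the script still runs in the page's global context, and under a Content Security Policy it loads only if script-src permits blob: sources. Validate the decoded content before executing it, and revoke the object URL once loading has finished or failed.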
export async function executePayload(decoded: Uint8Array, type: 'text/javascript' | 'application/json' = 'text/javascript'): Promise<void> {
  const blob = new Blob([decoded], { type });
  const url = URL.createObjectURL(blob);
  try {
    if (type === 'text/javascript') {
      // Resolve once the script has loaded; reject if the browser or CSP blocks it.
      await new Promise<void>((resolve, reject) => {
        const script = document.createElement('script');
        script.src = url;
        script.onload = () => resolve();
        script.onerror = () => reject(new Error('Failed to load script from Blob URL'));
        document.head.appendChild(script);
      });
    } else {
      const response = await fetch(url);
      const data = await response.json();
      console.log('Decoded configuration:', data);
    }
  } finally {
    // Release the object URL on success and on failure.
    URL.revokeObjectURL(url);
  }
}
Architecture Decisions and Rationale
- Single Base Character: Appending all selectors to one character reduces string fragmentation and simplifies extraction. Interleaving selectors across multiple characters increases decoding complexity without meaningful security gains.
- Explicit Code Point Iteration: JavaScript's String.prototype.charCodeAt() operates on UTF-16 code units, which breaks for supplementary planes (U+10000+). Using Array.from() or for...of ensures correct handling of U+E0100+ ranges.
- Blob-Based Execution: Direct evaluation requires 'unsafe-eval' under a Content Security Policy and invites injection vulnerabilities. Blob URLs give a revocable loading boundary with an explicit MIME type, but they do not sandbox the script, and the policy must still allow blob: in script-src.
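- Deterministic Mapping: The 1:1 byte-to-VS relationship eliminates ambiguity during decoding. No length prefixes or delimiters are required because the decoder filters exclusively for valid VS code points. The trade-off is that any variation selector already present in the carrier text (for example U+FE0F after an emoji) is decoded as a payload byte, so the carrier should contain no selectors of its own.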
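The code-unit versus code-point distinction behind the iteration choice can be checked directly. This short sketch uses only standard String methods; `codeUnitsHex` is a hypothetical helper added for illustration.

```typescript
const selector = String.fromCodePoint(0xe0100); // VS17, outside the BMP

// Hypothetical helper: hex value of every UTF-16 code unit in the string.
function codeUnitsHex(text: string): string[] {
  const units: string[] = [];
  for (let i = 0; i < text.length; i++) units.push(text.charCodeAt(i).toString(16));
  return units;
}

console.log(selector.length);                       // 2 (two UTF-16 code units)
console.log(codeUnitsHex(selector));                // [ 'db40', 'dd00' ] (a surrogate pair)
console.log(selector.codePointAt(0)!.toString(16)); // e0100
console.log(Array.from(selector).length);           // 1 (one code point)
```

Because charCodeAt() sees only the surrogate halves, neither value falls inside the U+E0100 range check, which is why the decoder iterates by code point.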
Pitfall Guide
1. Unicode Normalization Stripping
Explanation: Text processing pipelines often normalize or sanitize input. The standard normalization forms (NFC, NFD, NFKC, NFKD) leave Variation Selectors unchanged, but sanitizers that strip invisible or default-ignorable code points remove them, and some platforms treat Variation Selectors as ignorable diacritics.
Fix: Pre-normalize the carrier text using String.prototype.normalize('NFC') before encoding so that later normalization does not alter it. Test target platforms explicitly, as stripping behavior varies between browsers, OS clipboards, and messaging apps.
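A small experiment can show which part of a carrier normalization actually touches. The sketch below assumes a runtime with full Unicode normalization support (as in current browsers and Node.js); `formsThatChange` is a hypothetical helper.

```typescript
const NORMALIZATION_FORMS = ['NFC', 'NFD', 'NFKC', 'NFKD'] as const;

// Hypothetical helper: names the normalization forms that alter the given text.
function formsThatChange(text: string): string[] {
  return NORMALIZATION_FORMS.filter((form) => text.normalize(form) !== text);
}

// A base character followed by selectors from both ranges is left untouched.
const withSelectors = 'β' + String.fromCodePoint(0xfe00, 0xe0100, 0xe01ef);
console.log(formsThatChange(withSelectors)); // []

// A decomposed base is a different matter: 'e' + U+0301 composes to 'é'.
console.log(formsThatChange('e\u0301'));     // [ 'NFC', 'NFKC' ]
```

This is why pre-normalizing the carrier matters more than worrying about the selectors themselves: the visible text is the part a normalizer may rewrite.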
2. Font Rendering Fallback Artifacts
Explanation: Not all fonts support the full Variation Selector range. Unsupported selectors may render as missing glyph boxes, tofu characters, or visible placeholders.
Fix: Use a widely supported base character, such as a Latin letter or an emoji that is not already followed by U+FE0F (the decoder would read that selector as a payload byte). Test rendering across target environments. If artifacts appear, switch to a base character with broader font coverage.
3. Payload Size and String Length Limits
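Explanation: Each byte becomes one code point, not one byte of text. Selectors in U+FE00 to U+FE0F take 3 bytes in UTF-8 and one UTF-16 code unit; selectors in U+E0100 to U+E01EF take 4 bytes in UTF-8 and two UTF-16 code units. A 10KB payload therefore expands to roughly 40KB of UTF-8 text, with a JavaScript string length of up to about 20,000. Some platforms impose message length limits or truncate long strings.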
Fix: Compress payloads using CompressionStream before encoding. Chunk large payloads across multiple base characters if platform limits require it. Monitor string length against target platform constraints.
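Since platform limits may be counted in code points, UTF-16 units, or UTF-8 bytes, it is worth measuring all three. A minimal sketch follows; `encodeSelectors` restates the encoder and `measure` is a hypothetical helper, both named here for illustration only.

```typescript
// Encoder restated (same mapping as Step 1) so the snippet runs standalone.
function encodeSelectors(payload: Uint8Array, baseChar = 'β'): string {
  return baseChar + Array.from(payload, (b) =>
    String.fromCodePoint(b < 16 ? 0xfe00 + b : 0xe0100 + (b - 16))).join('');
}

// Hypothetical helper: the three sizes a platform limit might be expressed in.
function measure(text: string): { codePoints: number; utf16Units: number; utf8Bytes: number } {
  return {
    codePoints: Array.from(text).length,
    utf16Units: text.length,
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// Four payload bytes: two map into U+FE00.., two into U+E0100...
console.log(measure(encodeSelectors(new Uint8Array([0x00, 0x0f, 0x10, 0xff]))));
// { codePoints: 5, utf16Units: 7, utf8Bytes: 16 }
```

The base character 'β' contributes one code point, one UTF-16 unit, and two UTF-8 bytes to those totals.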
4. Decoding Performance on Long Strings
Explanation: Scanning long strings for code points in JavaScript can become CPU-intensive if implemented naively. Regular expressions without the u flag match UTF-16 code units rather than code points, so they mishandle the supplementary-plane selectors.
Fix: Iterate by code point with Array.from(), for...of, or codePointAt(). If you do use a regular expression, set the u flag. Pre-allocate typed arrays when possible to reduce garbage collection pressure.
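One way to apply the pre-allocation advice is to walk the string by index with codePointAt() and write into a buffer sized from the string length. This is a sketch of that idea under the same selector ranges as above; `decodeFast` is a hypothetical name, and no benchmark is implied.

```typescript
function decodeFast(text: string): Uint8Array {
  // Upper bound: each payload byte needs at least one UTF-16 code unit.
  const out = new Uint8Array(text.length);
  let written = 0;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    i += cp > 0xffff ? 2 : 1; // skip both halves of a surrogate pair
    if (cp >= 0xfe00 && cp <= 0xfe0f) out[written++] = cp - 0xfe00;
    else if (cp >= 0xe0100 && cp <= 0xe01ef) out[written++] = cp - 0xe0100 + 16;
  }
  return out.slice(0, written); // copy only the bytes actually written
}

const fastSample = 'β' + String.fromCodePoint(0xfe00, 0xe0100, 0xe01ef);
console.log(Array.from(decodeFast(fastSample))); // [ 0, 16, 255 ]
```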
5. Clipboard Corruption on Mobile OS
Explanation: iOS and Android clipboard handlers sometimes strip non-ASCII combining characters or normalize text aggressively during paste operations.
Fix: Validate clipboard behavior on target devices. Provide a fallback transport method (e.g., QR code or short URL) for mobile-first workflows. Test with actual paste events, not just simulated inputs.
6. Security Scanner False Positives
Explanation: DLP tools and repository scanners may flag unusual Unicode sequences as obfuscation attempts, triggering alerts or blocking commits.
Fix: Accept that this technique is for covert transport, not evasion of advanced static analysis. Use it in controlled environments (internal docs, chat, configuration files) rather than public repositories. Document the encoding scheme internally to prevent false positives.
7. Execution Context Violations
Explanation: Injecting decoded payloads directly into the DOM or using eval() introduces XSS vectors, breaks module boundaries, and, under a strict CSP, works only if the policy is weakened with 'unsafe-inline' or 'unsafe-eval'.
Fix: Always use Blob URLs with explicit MIME types. Revoke object URLs after execution. Prefer import() for ES modules or fetch() for JSON payloads. Never trust decoded content without validation.
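For JSON payloads, the validation step can be as simple as a strict decode followed by shape checks before anything else touches the data. The sketch below assumes a hypothetical `TransportConfig` shape with `endpoint` and `retries` fields; a real schema would differ.

```typescript
// Hypothetical configuration shape, used only for illustration.
interface TransportConfig {
  endpoint: string;
  retries: number;
}

function parseConfig(decoded: Uint8Array): TransportConfig {
  // fatal: true makes malformed UTF-8 throw instead of inserting U+FFFD.
  const text = new TextDecoder('utf-8', { fatal: true }).decode(decoded);
  const value: unknown = JSON.parse(text);
  if (typeof value !== 'object' || value === null) {
    throw new Error('Config must be a JSON object');
  }
  const record = value as Record<string, unknown>;
  const endpoint = record['endpoint'];
  const retries = record['retries'];
  if (typeof endpoint !== 'string') throw new Error('endpoint must be a string');
  if (typeof retries !== 'number' || !Number.isInteger(retries) || retries < 0) {
    throw new Error('retries must be a non-negative integer');
  }
  // Return only the validated fields; unknown keys are dropped.
  return { endpoint, retries };
}

const rawConfig = new TextEncoder().encode('{"endpoint":"https://example.test","retries":3}');
console.log(parseConfig(rawConfig)); // { endpoint: 'https://example.test', retries: 3 }
```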
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Internal configuration delivery | Unicode VS Embedding | Survives chat/email, invisible to users, zero infrastructure | Low (client-side decode only) |
| Public repository watermarking | Base64 in Comments | Visible, auditable, version-control friendly | None |
| High-security payload transport | Encrypted Blob + URL | CSP-compliant, revocable, scanner-friendly | Medium (requires hosting) |
| Mobile-first data sharing | QR Code + Short URL | Bypasses clipboard normalization, native support | Low (third-party SDK) |
Configuration Template
// stego-transport.ts
export interface StegoConfig {
baseChar: string;
compression: boolean;
executionMode: 'blob' | 'module';
}
export const DEFAULT_CONFIG: StegoConfig = {
baseChar: 'β',
compression: true,
executionMode: 'blob'
};
export async function transportPayload(
raw: Uint8Array,
config: Partial<StegoConfig> = {}
): Promise<string> {
const merged = { ...DEFAULT_CONFIG, ...config };
let payload = raw;
if (merged.compression) {
const cs = new CompressionStream('deflate-raw');
const writer = cs.writable.getWriter();
writer.write(payload);
writer.close();
const compressed = await new Response(cs.readable).arrayBuffer();
payload = new Uint8Array(compressed);
}
return encodePayload(payload, merged.baseChar);
}
export async function receivePayload(
stegoText: string,
config: Partial<StegoConfig> = {}
): Promise<Uint8Array> {
const merged = { ...DEFAULT_CONFIG, ...config };
const decoded = decodePayload(stegoText);
if (!decoded) throw new Error('No valid payload detected');
if (merged.compression) {
const ds = new DecompressionStream('deflate-raw');
const writer = ds.writable.getWriter();
writer.write(decoded);
writer.close();
const decompressed = await new Response(ds.readable).arrayBuffer();
return new Uint8Array(decompressed);
}
return decoded;
}
Quick Start Guide
- Prepare your payload: Convert your configuration, script, or data into a Uint8Array. If using JSON, serialize first, then encode to UTF-8 bytes.
- Encode to text: Call transportPayload(payload) with your desired base character. The function returns a single string containing the invisible selector sequence.
- Transport safely: Copy the resulting string into your target platform (chat, email, documentation). Verify it appears as a single character.
- Decode on receipt: Pass the received string to receivePayload(stegoText). The function reconstructs the original Uint8Array, handling decompression if enabled.
- Execute securely: Use the provided executePayload() utility to create a Blob URL and inject the content without violating CSP or exposing raw evaluation vectors.
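The first four steps can be sketched end to end for a JSON value. This is a simplified, synchronous illustration that omits compression and execution; `hide` and `reveal` are hypothetical names, and the base character is assumed to be the default 'β'.

```typescript
const BASE = 'β'; // assumed base character, matching the default configuration

// Steps 1 and 2: serialize to JSON, encode as UTF-8, map bytes to selectors.
function hide(value: unknown): string {
  const bytes = new TextEncoder().encode(JSON.stringify(value));
  return BASE + Array.from(bytes, (b) =>
    String.fromCodePoint(b < 16 ? 0xfe00 + b : 0xe0100 + (b - 16))).join('');
}

// Step 4: collect selector bytes from the received text and parse them back.
function reveal(text: string): unknown {
  const bytes: number[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (cp >= 0xfe00 && cp <= 0xfe0f) bytes.push(cp - 0xfe00);
    else if (cp >= 0xe0100 && cp <= 0xe01ef) bytes.push(cp - 0xe0100 + 16);
  }
  if (bytes.length === 0) throw new Error('No valid payload detected');
  return JSON.parse(new TextDecoder().decode(new Uint8Array(bytes)));
}

const message = hide({ theme: 'dark', retries: 3 });
console.log(Array.from(message)[0]); // β
console.log(reveal(message));        // { theme: 'dark', retries: 3 }
```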