I hid an entire webpage inside a cat face

By Codcompass Team·2026-05-19·8 min read

Covert Payload Delivery via Unicode Variation Selectors

Current Situation Analysis

Modern data transport pipelines treat plain text as a benign, human-readable medium. Security scanners, DLP (Data Loss Prevention) systems, and content filters routinely allow unrestricted text flow because it lacks executable structure. This assumption creates a blind spot: plain text can carry arbitrary binary payloads without altering its visual appearance or triggering standard inspection heuristics.

The problem is consistently overlooked because developers assume Unicode combining characters are either stripped during normalization, rendered as missing glyphs, or corrupted by clipboard handlers. In reality, the Unicode standard explicitly defines Variation Selectors (VS) as invisible modifiers that attach to base characters without changing their visual representation. Two specific ranges exist for this purpose: U+FE00–U+FE0F (16 code points) and U+E0100–U+E01EF (240 code points). Together, they provide exactly 256 distinct values, mapping perfectly to a single byte (0–255).

This mapping enables a deterministic steganographic channel. Any byte sequence can be translated into a string of invisible selectors appended to a visible base character. The resulting text survives copy-paste operations across Slack, Discord, iMessage, email clients, and documentation platforms. Rendering engines treat the selectors as zero-width modifiers, while JavaScript string APIs preserve them as distinct code points. The technique transforms ordinary text into a transport vehicle for executable payloads, configuration blobs, or encrypted data, bypassing filters that only inspect visible characters or whitespace.
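To make the gap between what a reader sees and what the string contains concrete, here is a minimal sketch that counts code points and hidden selectors. The helper names `isVariationSelector` and `inspect` are illustrative assumptions, not part of any library.

```typescript
// Hypothetical helper: true if the code point is in either Variation Selector range.
function isVariationSelector(cp: number): boolean {
  return (cp >= 0xFE00 && cp <= 0xFE0F) || (cp >= 0xE0100 && cp <= 0xE01EF);
}

// Hypothetical helper: counts all code points and how many of them are selectors.
function inspect(text: string): { codePoints: number; selectors: number } {
  let codePoints = 0;
  let selectors = 0;
  for (const ch of text) { // for...of walks code points, not UTF-16 code units
    codePoints++;
    if (isVariationSelector(ch.codePointAt(0)!)) selectors++;
  }
  return { codePoints, selectors };
}

// "A" followed by U+FE01 and U+E0100.
const sample = "A" + String.fromCodePoint(0xFE01, 0xE0100);
const report = inspect(sample);
// report.codePoints is 3, report.selectors is 2, sample.length is 4 (UTF-16 code units)
```

The same counting logic is what a reviewer or filter would need in order to notice that a short visible string carries extra data.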

WOW Moment: Key Findings

The following comparison demonstrates why Unicode Variation Selector embedding outperforms traditional covert channels in text-heavy environments:

Approach	Visual Detectability	Transport Resilience	Payload Density	Decoding Overhead
Image Steganography	High (requires image upload)	Medium (compression strips LSB data)	Low (requires large carrier)	High (requires canvas/image parsing)
Base64 in Comments	High (visible string)	High (survives most pipelines)	Medium (33% size overhead)	Low (native decode)
Unicode VS Embedding	Zero (invisible to humans)	Very High (survives copy-paste, DLP, email)	High in code points (one selector per byte), lower in storage (3–4 UTF-8 bytes per selector)	Medium (requires code point mapping)

This finding matters because it decouples payload delivery from file uploads, network requests, or visible text modifications. Developers can embed configuration, feature flags, or initialization scripts directly into documentation, chat messages, or README files. The payload remains invisible to readers, survives platform normalization, and requires only a lightweight decoder to reconstruct the original binary. This enables zero-footprint data transport, secure configuration injection, and resilient watermarking without altering repository structure or triggering security alerts.

Core Solution

The implementation relies on three phases: byte-to-VS mapping, string construction, and runtime decoding. The architecture prioritizes deterministic encoding, Unicode-safe iteration, and safe execution boundaries.

Step 1: Define the Variation Selector Mapping

The 256-byte space maps directly to the two VS ranges. The first 16 bytes (0–15) map to U+FE00–U+FE0F. The remaining 240 bytes (16–255) map to U+E0100–U+E01EF. Only the first range lies in the Basic Multilingual Plane; every code point in the second range occupies a surrogate pair in a JavaScript string, so the mapping below works on whole code points and stays a flat arithmetic lookup.

const VS_LOW_START = 0xFE00;
const VS_HIGH_START = 0xE0100;
const VS_LOW_COUNT = 16;

function byteToCodePoint(byte: number): number {
  if (byte < VS_LOW_COUNT) {
    return VS_LOW_START + byte;
  }
  return VS_HIGH_START + (byte - VS_LOW_COUNT);
}

function codePointToByte(cp: number): number | null {
  if (cp >= VS_LOW_START && cp < VS_LOW_START + VS_LOW_COUNT) {
    return cp - VS_LOW_START;
  }
  if (cp >= VS_HIGH_START && cp < VS_HIGH_START + 240) {
    return (cp - VS_HIGH_START) + VS_LOW_COUNT;
  }
  return null;
}
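A quick way to gain confidence in the mapping is to round-trip all 256 byte values and look at the boundary cases. The sketch below restates the two mapping functions in compact form so it runs on its own; `checkRoundTrip` is an illustrative name.

```typescript
function byteToCodePoint(byte: number): number {
  return byte < 16 ? 0xFE00 + byte : 0xE0100 + (byte - 16);
}

function codePointToByte(cp: number): number | null {
  if (cp >= 0xFE00 && cp <= 0xFE0F) return cp - 0xFE00;
  if (cp >= 0xE0100 && cp <= 0xE01EF) return cp - 0xE0100 + 16;
  return null;
}

// Hypothetical check: every byte maps to a unique selector and back again.
function checkRoundTrip(): boolean {
  const seen = new Set<number>();
  for (let b = 0; b < 256; b++) {
    const cp = byteToCodePoint(b);
    if (codePointToByte(cp) !== b) return false;
    seen.add(cp);
  }
  return seen.size === 256;
}

const roundTripOk = checkRoundTrip();
// Boundary bytes 0, 15, 16, 255 land on the edges of the two ranges.
const boundary = [0, 15, 16, 255].map(b => byteToCodePoint(b).toString(16));
// boundary is ["fe00", "fe0f", "e0100", "e01ef"]
```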

Step 2: Encode Binary Payload into Text

The encoder accepts a Uint8Array, maps each byte to its corresponding Variation Selector, and appends the sequence to a base character. The base character can be any printable Unicode scalar value. Using a single base character minimizes string length and simplifies decoding.

export function encodePayload(payload: Uint8Array, baseChar: string = '◉'): string {
  const vsSequence = Array.from(payload, byteToCodePoint)
    .map(cp => String.fromCodePoint(cp))
    .join('');
  return baseChar + vsSequence;
}
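As a small worked example, encoding the two ASCII bytes of "Hi" shows what the encoder actually emits. The block restates the encoder in compact form so it is self-contained; the variable names are illustrative.

```typescript
function byteToCodePoint(byte: number): number {
  return byte < 16 ? 0xFE00 + byte : 0xE0100 + (byte - 16);
}

function encodePayload(payload: Uint8Array, baseChar: string = '◉'): string {
  return baseChar + Array.from(payload, b => String.fromCodePoint(byteToCodePoint(b))).join('');
}

// 0x48 ("H") maps to U+E0138 and 0x69 ("i") maps to U+E0159.
const encoded = encodePayload(new Uint8Array([0x48, 0x69]));
const hex = Array.from(encoded, ch => ch.codePointAt(0)!.toString(16));
// hex is ["25c9", "e0138", "e0159"]
// encoded.length is 5: one UTF-16 unit for the base, two for each selector
```

Note that `encoded.length` counts UTF-16 code units, which is why two payload bytes add four to the length rather than two.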

Step 3: Decode and Reconstruct Binary

The decoder iterates over the string using proper Unicode iteration (for...of or Array.from), extracts code points, filters for VS ranges, and reconstructs the original byte array. This approach avoids regex pitfalls with surrogate pairs and ensures accurate boundary detection.

export function decodePayload(stegoText: string): Uint8Array | null {
  const bytes: number[] = [];
  const codePoints = Array.from(stegoText);
  
  for (const char of codePoints) {
    const cp = char.codePointAt(0)!;
    const byte = codePointToByte(cp);
    if (byte !== null) {
      bytes.push(byte);
    }
  }
  
  if (bytes.length === 0) return null;
  return new Uint8Array(bytes);
}
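The following sketch exercises the decoder on a message that mixes visible text with a carrier character, to show that ordinary characters are skipped and that clean text yields `null`. The functions are restated compactly so the block runs on its own; the sample message is made up.

```typescript
function codePointToByte(cp: number): number | null {
  if (cp >= 0xFE00 && cp <= 0xFE0F) return cp - 0xFE00;
  if (cp >= 0xE0100 && cp <= 0xE01EF) return cp - 0xE0100 + 16;
  return null;
}

function decodePayload(stegoText: string): Uint8Array | null {
  const bytes: number[] = [];
  for (const ch of stegoText) {
    const byte = codePointToByte(ch.codePointAt(0)!);
    if (byte !== null) bytes.push(byte);
  }
  return bytes.length === 0 ? null : new Uint8Array(bytes);
}

// Selectors for bytes 0, 15, 16 and 255, hidden behind the base character.
const hidden = String.fromCodePoint(0xFE00, 0xFE0F, 0xE0100, 0xE01EF);
const message = "Release notes " + "◉" + hidden + " see thread";

const recovered = decodePayload(message);    // bytes 0, 15, 16, 255
const nothing = decodePayload("plain text"); // null: no selectors present
```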

Step 4: Safe Execution Boundary

Reconstructed bytes should never be passed directly to eval() or innerHTML. Instead, create a Blob with the appropriate MIME type, generate an object URL, and load it via a script tag or module import. This works under a Content Security Policy only if script-src permits blob: URLs, and the loaded script still runs in the page's main context with full DOM access, so the payload must be trusted or validated before execution.

export async function executePayload(decoded: Uint8Array, type: 'text/javascript' | 'application/json' = 'text/javascript'): Promise<void> {
  const blob = new Blob([decoded], { type });
  const url = URL.createObjectURL(blob);

  try {
    if (type === 'text/javascript') {
      // Resolve once the script has loaded; reject if loading fails (for example, blocked by CSP).
      await new Promise<void>((resolve, reject) => {
        const script = document.createElement('script');
        script.src = url;
        script.onload = () => resolve();
        script.onerror = () => reject(new Error('Payload script failed to load'));
        document.head.appendChild(script);
      });
    } else {
      const response = await fetch(url);
      const data = await response.json();
      console.log('Decoded configuration:', data);
    }
  } finally {
    // Release the Blob on every path, including failures.
    URL.revokeObjectURL(url);
  }
}

Architecture Decisions and Rationale

Single Base Character: Appending all selectors to one character reduces string fragmentation and simplifies extraction. Interleaving selectors across multiple characters increases decoding complexity without meaningful security gains.
Explicit Code Point Iteration: JavaScript's String.prototype.charCodeAt() operates on UTF-16 code units, which breaks for supplementary planes (U+10000+). Using Array.from() or for...of ensures correct handling of U+E0100+ ranges.
Blob-Based Execution: Direct evaluation is blocked by any Content Security Policy that omits 'unsafe-eval' and invites injection bugs. Blob URLs are revocable and carry an explicit MIME type, but they are not a sandbox: the script runs with the page's privileges, and the policy must allow blob: in script-src.
Deterministic Mapping: The 1:1 byte-to-VS relationship eliminates ambiguity during decoding. No length prefixes or delimiters are required because the decoder filters exclusively for valid VS code points.
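One consequence of filtering for every selector in a string is worth illustrating: emoji presentation sequences such as U+2764 U+FE0F already contain a Variation Selector, so a decoder that collects all selectors will pick up a stray byte from surrounding text. The sketch below contrasts that behavior with a stricter variant that reads only the run directly after the base character. `decodeAll` and `decodeAfterBase` are hypothetical names, and the stricter variant is an assumption about how one might harden the scheme, not the article's method.

```typescript
function codePointToByte(cp: number): number | null {
  if (cp >= 0xFE00 && cp <= 0xFE0F) return cp - 0xFE00;
  if (cp >= 0xE0100 && cp <= 0xE01EF) return cp - 0xE0100 + 16;
  return null;
}

// Collects every selector in the text, wherever it appears.
function decodeAll(text: string): number[] {
  const bytes: number[] = [];
  for (const ch of text) {
    const b = codePointToByte(ch.codePointAt(0)!);
    if (b !== null) bytes.push(b);
  }
  return bytes;
}

// Hypothetical stricter decoder: reads only the selector run right after baseChar.
function decodeAfterBase(text: string, baseChar: string): number[] {
  const start = text.indexOf(baseChar);
  if (start === -1) return [];
  const bytes: number[] = [];
  let i = start + baseChar.length;
  while (i < text.length) {
    const cp = text.codePointAt(i)!;
    const b = codePointToByte(cp);
    if (b === null) break;       // first non-selector ends the payload
    bytes.push(b);
    i += cp > 0xFFFF ? 2 : 1;    // supplementary code points span two UTF-16 units
  }
  return bytes;
}

// Payload bytes 1, 200, 3 behind the base, preceded by a heart emoji with U+FE0F.
const payload = String.fromCodePoint(0xFE01, 0xE0100 + (200 - 16), 0xFE03);
const text = "I \u2764\uFE0F this \u25C9" + payload + " ok";

const loose = decodeAll(text);                  // [15, 1, 200, 3]: stray byte from the emoji
const strict = decodeAfterBase(text, "\u25C9"); // [1, 200, 3]
```

If carrier text may contain emoji, anchoring on the base character (or adding a length prefix) is one way to keep the decoded bytes unambiguous.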

Pitfall Guide

1. Unicode Normalization Stripping

Explanation: Standard NFC and NFD normalization leaves Variation Selectors in place, because they have no decomposition mappings. The real risk is sanitization: some pipelines strip default-ignorable or non-printing code points, or rebuild text from visible glyphs only, and the payload disappears with them. Fix: Test each target platform explicitly, since behavior varies between browsers, OS clipboards, and messaging apps. Normalizing the visible base text with String.prototype.normalize('NFC') before encoding keeps the carrier stable, but it does not protect the selectors from a filter that removes them.
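A simple way to check how JavaScript's own normalization treats the selectors is to build a carrier with all 256 values and compare it against each normalization form. This is a minimal sketch; it only covers `String.prototype.normalize`, not whatever a given chat client or clipboard does.

```typescript
function byteToCodePoint(byte: number): number {
  return byte < 16 ? 0xFE00 + byte : 0xE0100 + (byte - 16);
}

// Carrier holding one selector for every possible byte value.
const allBytes = Array.from({ length: 256 }, (_, i) => i);
const carrier = "◉" + allBytes.map(b => String.fromCodePoint(byteToCodePoint(b))).join("");

// Compare the carrier with each of the four standard normalization forms.
const forms = ["NFC", "NFD", "NFKC", "NFKD"] as const;
const unchanged = forms.map(form => carrier.normalize(form) === carrier);
// unchanged is [true, true, true, true]
```

If a platform still drops the payload, the cause is more likely a sanitizer that removes invisible code points than Unicode normalization itself.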

2. Font Rendering Fallback Artifacts

Explanation: Not all fonts support the full Variation Selector range. Unsupported selectors may render as missing glyph boxes, tofu characters, or visible placeholders. Fix: Use widely supported base characters (e.g., standard emoji or Latin letters). Test rendering across target environments. If artifacts appear, switch to a base character with broader font coverage.

3. Payload Size and String Length Limits

Explanation: Each byte becomes one code point, but not one byte of text. Selectors in U+E0100–U+E01EF take two UTF-16 code units and four UTF-8 bytes, so a 10 KB payload produces a string of roughly 20,000 UTF-16 code units and about 40 KB on the wire. Some platforms impose message length limits or truncate long strings. Fix: Compress payloads using CompressionStream before encoding. Chunk large payloads across multiple base characters if platform limits require it. Monitor string length against target platform constraints.
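The size arithmetic can be sketched directly from the encoding rules: selectors below U+10000 cost one UTF-16 unit and three UTF-8 bytes, while the supplementary ones cost two units and four bytes. `encodedSizes` is an illustrative helper and assumes the default base character '◉' (U+25C9).

```typescript
// Hypothetical helper: predicts the size of the encoded string for a payload.
function encodedSizes(payload: Uint8Array): { codePoints: number; utf16Units: number; utf8Bytes: number } {
  let utf16Units = 1; // base character '◉' is one UTF-16 code unit
  let utf8Bytes = 3;  // and three bytes in UTF-8
  for (let i = 0; i < payload.length; i++) {
    if (payload[i] < 16) {
      utf16Units += 1; // U+FE00..U+FE0F sit in the Basic Multilingual Plane
      utf8Bytes += 3;
    } else {
      utf16Units += 2; // U+E0100..U+E01EF need a surrogate pair
      utf8Bytes += 4;
    }
  }
  return { codePoints: payload.length + 1, utf16Units, utf8Bytes };
}

// A 10 KiB payload of ASCII "A" bytes (0x41), all of which map to the high range.
const tenKiB = new Uint8Array(10240).fill(0x41);
const sizes = encodedSizes(tenKiB);
// sizes.codePoints is 10241, sizes.utf16Units is 20481, sizes.utf8Bytes is 40963
```

Platform limits are usually expressed in characters or bytes, so it helps to check which of these three numbers a given limit actually counts.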

4. Decoding Performance Degradation

Explanation: Scanning long strings for code points in JavaScript can become CPU-intensive if implemented naively. Regular expressions without the u flag match UTF-16 code units and therefore cannot address U+E0100–U+E01EF correctly. Fix: Decode in a single pass with for...of or codePointAt(). If you use a regex, add the u flag. Pre-allocate typed arrays when possible to reduce garbage collection pressure.
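As a sketch of the single-pass approach, the decoder below writes into a pre-allocated `Uint8Array` sized by the string length (an upper bound, since each selector uses at least one UTF-16 unit) and trims it at the end. `decodeFast` is an illustrative name. A regex with the `u` flag is included only as a cross-check of the selector count.

```typescript
// Hypothetical single-pass decoder with a pre-allocated output buffer.
function decodeFast(text: string): Uint8Array {
  const out = new Uint8Array(text.length); // upper bound on the number of payload bytes
  let n = 0;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    i += cp > 0xFFFF ? 2 : 1; // skip both halves of a surrogate pair
    if (cp >= 0xFE00 && cp <= 0xFE0F) {
      out[n++] = cp - 0xFE00;
    } else if (cp >= 0xE0100 && cp <= 0xE01EF) {
      out[n++] = cp - 0xE0100 + 16;
    }
  }
  return out.subarray(0, n); // view over the filled part, no copy
}

const input = "a\u25C9" + String.fromCodePoint(0xFE00, 0xE0100, 0xE01EF) + "z";
const fast = decodeFast(input); // bytes 0, 16, 255

// With the u flag, a character class can address supplementary code points.
const selectorPattern = /[\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;
const regexCount = (input.match(selectorPattern) || []).length; // 3
```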

5. Clipboard Corruption on Mobile OS

Explanation: iOS and Android clipboard handlers sometimes strip non-ASCII combining characters or normalize text aggressively during paste operations. Fix: Validate clipboard behavior on target devices. Provide a fallback transport method (e.g., QR code or short URL) for mobile-first workflows. Test with actual paste events, not just simulated inputs.

6. Security Scanner False Positives

Explanation: DLP tools and repository scanners may flag unusual Unicode sequences as obfuscation attempts, triggering alerts or blocking commits. Fix: Accept that this technique is for covert transport, not evasion of advanced static analysis. Use it in controlled environments (internal docs, chat, configuration files) rather than public repositories. Document the encoding scheme internally to prevent false positives.
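To see what a scanner might look for, here is a hedged sketch of a heuristic that flags runs of consecutive Variation Selectors. A single selector after an emoji is normal; a run of several is unusual. The function name, the `SelectorRun` shape, and the default threshold of 2 are assumptions made for illustration, not a description of any particular DLP product.

```typescript
interface SelectorRun {
  index: number;  // UTF-16 offset where the run starts
  length: number; // number of selectors in the run
}

// Hypothetical heuristic: report runs of at least minRun consecutive selectors.
function findSelectorRuns(text: string, minRun: number = 2): SelectorRun[] {
  const runs: SelectorRun[] = [];
  let runStart = -1;
  let runLen = 0;
  let i = 0;
  while (i <= text.length) {
    // Use -1 as an end-of-text sentinel so the final run is flushed.
    const cp = i < text.length ? text.codePointAt(i)! : -1;
    const isVs = (cp >= 0xFE00 && cp <= 0xFE0F) || (cp >= 0xE0100 && cp <= 0xE01EF);
    if (isVs) {
      if (runLen === 0) runStart = i;
      runLen++;
    } else {
      if (runLen >= minRun) runs.push({ index: runStart, length: runLen });
      runLen = 0;
    }
    i += cp > 0xFFFF ? 2 : 1;
  }
  return runs;
}

const clean = findSelectorRuns("I \u2764\uFE0F docs"); // []: one selector is ordinary emoji usage
const suspicious = findSelectorRuns("ok \u25C9" + String.fromCodePoint(0xE0100, 0xE0101, 0xFE05));
// suspicious is [{ index: 4, length: 3 }]
```

Sharing a check like this with the team that owns the scanners makes it easier to tell documented internal use apart from genuine obfuscation.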

7. Execution Context Violations

Explanation: Injecting decoded payloads directly into the DOM or passing them to eval() introduces XSS vectors, fails under any CSP that omits 'unsafe-eval', and breaks module boundaries. Fix: Use Blob URLs with explicit MIME types, and allow blob: in script-src only where it is needed. Revoke object URLs after execution. Prefer import() for ES modules or fetch() for JSON payloads. Never trust decoded content without validation.
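Validation before use can be sketched for the JSON case as follows. The example assumes a hypothetical configuration shape (a flat object of boolean feature flags) and an allowlist of keys; `parseFlagConfig` and `FlagConfig` are illustrative names. It relies on the standard `TextDecoder` with `fatal: true`, which throws on invalid UTF-8.

```typescript
type FlagConfig = Record<string, boolean>;

// Hypothetical validator: returns the config only if it is well-formed and fully allowlisted.
function parseFlagConfig(decoded: Uint8Array, allowedKeys: string[]): FlagConfig | null {
  let text: string;
  try {
    text = new TextDecoder("utf-8", { fatal: true }).decode(decoded);
  } catch {
    return null; // not valid UTF-8
  }

  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return null; // not valid JSON
  }

  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;

  const result: FlagConfig = {};
  for (const key of Object.keys(value)) {
    const v = (value as Record<string, unknown>)[key];
    // Reject unknown keys and non-boolean values instead of silently dropping them.
    if (allowedKeys.indexOf(key) === -1 || typeof v !== "boolean") return null;
    result[key] = v;
  }
  return result;
}
```

A caller would pass the bytes returned by the decoder together with the keys it is prepared to accept, and treat `null` as "discard this payload".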

Production Bundle

Action Checklist

Validate target platform clipboard behavior: Paste encoded text into Slack, Discord, email, and mobile apps to confirm selector preservation.
Implement Unicode-safe iteration: Replace all charCodeAt() calls with Array.from() or for...of to handle supplementary planes correctly.
Add payload compression: Integrate CompressionStream before encoding to reduce string length and improve transport reliability.
Enforce safe execution boundaries: Replace eval() and innerHTML with Blob URLs and explicit MIME type declarations.
Test font rendering across environments: Verify that Variation Selectors remain invisible in Chrome, Safari, Firefox, and mobile webviews.
Implement length validation: Add checks to reject payloads exceeding platform message limits or browser string constraints.
Document internal encoding schema: Share the mapping table and decoder module with your team to prevent security scanner false positives.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Internal configuration delivery	Unicode VS Embedding	Survives chat/email, invisible to users, zero infrastructure	Low (client-side decode only)
Public repository watermarking	Base64 in Comments	Visible, auditable, version-control friendly	None
High-security payload transport	Encrypted Blob + URL	CSP-compliant, revocable, scanner-friendly	Medium (requires hosting)
Mobile-first data sharing	QR Code + Short URL	Bypasses clipboard normalization, native support	Low (third-party SDK)

Configuration Template

// stego-transport.ts
export interface StegoConfig {
  baseChar: string;
  compression: boolean;
  executionMode: 'blob' | 'module';
}

export const DEFAULT_CONFIG: StegoConfig = {
  baseChar: '◉',
  compression: true,
  executionMode: 'blob'
};

export async function transportPayload(
  raw: Uint8Array,
  config: Partial<StegoConfig> = {}
): Promise<string> {
  const merged = { ...DEFAULT_CONFIG, ...config };
  let payload = raw;
  
  if (merged.compression) {
    const cs = new CompressionStream('deflate-raw');
    const writer = cs.writable.getWriter();
    writer.write(payload);
    writer.close();
    const compressed = await new Response(cs.readable).arrayBuffer();
    payload = new Uint8Array(compressed);
  }
  
  return encodePayload(payload, merged.baseChar);
}

export async function receivePayload(
  stegoText: string,
  config: Partial<StegoConfig> = {}
): Promise<Uint8Array> {
  const merged = { ...DEFAULT_CONFIG, ...config };
  const decoded = decodePayload(stegoText);
  if (!decoded) throw new Error('No valid payload detected');
  
  if (merged.compression) {
    const ds = new DecompressionStream('deflate-raw');
    const writer = ds.writable.getWriter();
    writer.write(decoded);
    writer.close();
    const decompressed = await new Response(ds.readable).arrayBuffer();
    return new Uint8Array(decompressed);
  }
  
  return decoded;
}

Quick Start Guide

Prepare your payload: Convert your configuration, script, or data into a Uint8Array. If using JSON, serialize first, then encode to UTF-8 bytes.
Encode to text: Call await transportPayload(payload), optionally passing a config with your desired base character. The promise resolves to a single string containing the invisible selector sequence.
Transport safely: Copy the resulting string into your target platform (chat, email, documentation). Verify it appears as a single character.
Decode on receipt: Pass the received string to await receivePayload(stegoText). The promise resolves to the original Uint8Array, with decompression applied if enabled.
Execute securely: Use the provided executePayload() utility to create a Blob URL and load the content without eval(). The page's CSP must allow blob: in script-src, and the payload should be validated first.
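The first four steps can be sketched end to end for a JSON configuration, without compression so the example stays synchronous. The encoder and decoder are restated compactly so the block runs on its own, and the sample configuration object is made up for illustration.

```typescript
function encodePayload(payload: Uint8Array, baseChar: string = '◉'): string {
  return baseChar + Array.from(payload, b =>
    String.fromCodePoint(b < 16 ? 0xFE00 + b : 0xE0100 + (b - 16))
  ).join('');
}

function decodePayload(stegoText: string): Uint8Array {
  const bytes: number[] = [];
  for (const ch of stegoText) {
    const cp = ch.codePointAt(0)!;
    if (cp >= 0xFE00 && cp <= 0xFE0F) bytes.push(cp - 0xFE00);
    else if (cp >= 0xE0100 && cp <= 0xE01EF) bytes.push(cp - 0xE0100 + 16);
  }
  return new Uint8Array(bytes);
}

// 1. Prepare: serialize to JSON, then to UTF-8 bytes.
const config = { theme: "dark", retries: 3 };
const bytes = new TextEncoder().encode(JSON.stringify(config));

// 2. Encode: one visible character followed by one selector per byte.
const carrier = encodePayload(bytes);

// 3. Transport: the carrier would be pasted into chat, email, or docs here.

// 4. Decode: recover the bytes, then the JSON.
const received = decodePayload(carrier);
const restored = JSON.parse(new TextDecoder().decode(received));
// restored.theme is "dark" and restored.retries is 3
```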
