Breaking Down Reddit's Architecture: How to Build a High-Performance Video Downloader with DASH and HLS Support

By Codcompass Team · 9 min read

Architecting Client-Side Media Transmuxing for Adaptive Streaming Platforms

Current Situation Analysis

Modern social and media platforms have largely abandoned monolithic video files in favor of adaptive bitrate streaming protocols like MPEG-DASH and HLS. For end-users, this enables smooth playback across varying network conditions. For developers building downloaders, archiving tools, or media processing pipelines, it introduces a fragmented reality that is frequently misunderstood.

The core pain point is architectural: platforms like Reddit do not serve a single .mp4 file. Instead, they deliver hundreds of micro-segments split across independent audio and video tracks. Attempting to fetch the video track alone results in silent playback. Attempting to fetch segments sequentially triggers timeouts. Attempting to process everything server-side incurs prohibitive egress costs, storage overhead, and privacy liabilities.

This problem is often overlooked because developers approach streaming platforms with a legacy file-download mindset. They assume a direct URL maps to a complete media container. In reality, the "source" is a manifest file (.mpd or .m3u8) that acts as a routing table for fragmented assets. Furthermore, CDNs enforce strict header validation (User-Agent, Referer) and CORS policies that block browser-native fetches. Without a proxy layer to bridge the gap between CDN restrictions and browser security models, client-side processing becomes impossible.

Data from platform API structures confirms this complexity. Reddit's public JSON endpoints expose a secure_media object containing DASH manifest URLs, but accessing them requires header emulation. The combination of split streams, manifest parsing, CDN restrictions, and segment concurrency creates a multi-layered engineering challenge that traditional server-side FFmpeg pipelines struggle to handle efficiently at scale.

WOW Moment: Key Findings

Shifting the heavy lifting from server infrastructure to the client browser via WebAssembly fundamentally changes the cost, latency, and privacy profile of media processing tools. The following comparison highlights the operational impact of moving from a traditional server-side transcoding model to a client-side transmuxing architecture backed by a lightweight streaming proxy.

| Approach | Server Egress Cost | Processing Latency | Privacy Model | Output Quality |
| --- | --- | --- | --- | --- |
| Server-Side Transcoding | High (full file download + re-encode + upload) | 5–15 seconds | Server retains temporary media buffers | Re-encoded artifacts, bitrate loss |
| Client-Side WASM Transmuxing + Edge Proxy | Near-zero (proxy streams only, no storage) | <2 seconds | Zero-knowledge (memory-only processing) | Bit-exact copy, original quality preserved |

Why this matters: Transmuxing (remuxing) bypasses the CPU-intensive re-encoding phase entirely. By using the -c copy directive in FFmpeg, the tool repackages existing audio and video packets into a standard .mp4 container without decoding or re-encoding them. This preserves the original bitrate, eliminates quality degradation, and cuts the processing step from seconds to milliseconds. Offloading the work to the client eliminates server storage requirements, removes privacy compliance overhead, and scales with user count rather than with infrastructure spend.

Core Solution

Building a reliable media extraction pipeline requires coordinating four distinct subsystems: manifest discovery, CDN header emulation, parallel segment retrieval, and client-side transmuxing. Each layer must be designed to handle streaming constraints, memory limits, and browser security policies.

1. Manifest Discovery via Structured API

Platforms expose metadata through predictable JSON structures. Instead of parsing HTML or reverse-engineering opaque endpoints, leverage the official JSON representation. Appending .json to a post URL returns a structured payload containing the DASH manifest URL.

interface RedditMediaPayload {
  data: {
    children: Array<{
      data: {
        secure_media?: {
          reddit_video?: {
            dash_url: string;
            fallback_url?: string;
            duration?: number;
          };
        };
      };
      }>;
  };
}

export class ManifestResolver {
  async resolveManifestUrl(postUrl: string): Promise<string> {
    const jsonUrl = postUrl.replace(/\/$/, '') + '.json';
    const response = await fetch(jsonUrl, {
      headers: { 'Accept': 'application/json' }
    });
    
    if (!response.ok) throw new Error(`Manifest fetch failed: ${response.status}`);
    
    // Reddit's .json endpoint returns an array: [post listing, comment listing]
    const [postListing]: RedditMediaPayload[] = await response.json();
    const media = postListing?.data?.children[0]?.data?.secure_media?.reddit_video;
    
    if (!media?.dash_url) {
      throw new Error('No DASH manifest found in payload');
    }
    
    return media.dash_url;
  }
}

2. CORS-Bypass Streaming Proxy

Browsers block cross-origin fetches to CDN domains like v.redd.it. A Node.js proxy must intercept segment requests, inject trusted headers, strip restrictive CDN responses, and stream data back to the client without buffering the entire file.

import { createServer, IncomingMessage, ServerResponse } from 'http';
import { request as httpsRequest } from 'https';

const PROXY_PORT = 3001;
const CDN_BASE = 'https://v.redd.it';

createServer((req: IncomingMessage, res: ServerResponse) => {
  const targetPath = req.url?.replace('/proxy/', '');
  if (!targetPath) {
    res.writeHead(400);
    return res.end('Missing target path');
  }

  const proxyReq = httpsRequest(
    `${CDN_BASE}${targetPath}`,
    {
      method: 'GET',
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Referer': 'https://www.reddit.com/',
        'Accept': '*/*'
      }
    },
    (proxyRes) => {
      res.writeHead(proxyRes.statusCode || 200, {
        'Content-Type': proxyRes.headers['content-type'] || 'application/octet-stream',
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET, OPTIONS',
        'Access-Control-Allow-Headers': 'Content-Type'
      });
      
      // Stream directly to avoid memory accumulation
      proxyRes.pipe(res);
    }
  );

  proxyReq.on('error', (err) => {
    console.error('Proxy request failed:', err.message);
    res.writeHead(502);
    res.end('Bad Gateway');
  });

  proxyReq.end();
}).listen(PROXY_PORT, () => {
  console.log(`Streaming proxy active on port ${PROXY_PORT}`);
});

3. Parallel Segment Retrieval

DASH manifests reference dozens to hundreds of segment URLs. Fetching them sequentially creates a bottleneck. A concurrency-limited async pool ensures maximum throughput without overwhelming the network stack or triggering CDN rate limits.

export class ConcurrentSegmentFetcher {
  private concurrency: number;

  constructor(concurrencyLimit: number = 8) {
    this.concurrency = concurrencyLimit;
  }

  async fetchAll(segmentUrls: string[]): Promise<ArrayBuffer[]> {
    const results: ArrayBuffer[] = new Array(segmentUrls.length);
    let currentIndex = 0;

    const worker = async () => {
      while (currentIndex < segmentUrls.length) {
        const index = currentIndex++;
        const url = segmentUrls[index];
        // Retry the same index in place; decrementing the shared counter
        // would race with other workers and re-fetch the wrong segment.
        for (let attempt = 1; ; attempt++) {
          try {
            const response = await fetch(`/proxy/${url.split('/').pop()}`);
            if (!response.ok) throw new Error(`HTTP ${response.status}`);
            results[index] = await response.arrayBuffer();
            break;
          } catch (err) {
            if (attempt >= 3) throw err;
            console.warn(`Segment ${index} failed (attempt ${attempt}), retrying...`);
            await new Promise(r => setTimeout(r, 500 * attempt));
          }
        }
      }
    };

    await Promise.all(Array.from({ length: this.concurrency }, () => worker()));
    return results;
  }
}
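The fetcher returns an ordered array of segment buffers, while the transmuxer in the next section expects one contiguous buffer per track. A small glue helper can join the init segment and media segments in index order; this sketch is not part of the original pipeline, and its name and placement are assumptions:

```typescript
// Concatenate ordered segment buffers (init segment first) into one
// contiguous byte array suitable for writing into FFmpeg.wasm's
// virtual filesystem.
export function concatSegments(buffers: ArrayBuffer[]): Uint8Array {
  const total = buffers.reduce((sum, b) => sum + b.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const buf of buffers) {
    out.set(new Uint8Array(buf), offset);
    offset += buf.byteLength;
  }
  return out;
}
```

Run it once per track, passing the init segment followed by the media segments in manifest order, so the resulting buffer is a valid fragmented MP4 stream.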

4. Client-Side Transmuxing with FFmpeg.wasm

WebAssembly lets FFmpeg run directly in the browser. The critical optimization is using -c copy to remux streams without decoding them, which preserves quality and takes a fraction of the time a full re-encode would.

import { createFFmpeg } from '@ffmpeg/ffmpeg';

export class BrowserTransmuxer {
  private ffmpeg: ReturnType<typeof createFFmpeg>;

  constructor() {
    this.ffmpeg = createFFmpeg({ log: false, mainName: 'main' });
  }

  async initialize(): Promise<void> {
    if (!this.ffmpeg.isLoaded()) {
      await this.ffmpeg.load();
    }
  }

  async transmux(videoBuffer: ArrayBuffer, audioBuffer: ArrayBuffer): Promise<Uint8Array> {
    await this.initialize();
    
    // Write raw bytes directly into FFmpeg's virtual filesystem; wrapping the
    // ArrayBuffers in Uint8Array avoids the async fetchFile round trip.
    this.ffmpeg.FS('writeFile', 'input_video.mp4', new Uint8Array(videoBuffer));
    this.ffmpeg.FS('writeFile', 'input_audio.mp4', new Uint8Array(audioBuffer));

    await this.ffmpeg.run(
      '-i', 'input_video.mp4',
      '-i', 'input_audio.mp4',
      '-c', 'copy',
      '-movflags', '+faststart',
      'output.mp4'
    );

    const data = this.ffmpeg.FS('readFile', 'output.mp4');
    
    // Cleanup virtual filesystem to prevent memory leaks
    this.ffmpeg.FS('unlink', 'input_video.mp4');
    this.ffmpeg.FS('unlink', 'input_audio.mp4');
    this.ffmpeg.FS('unlink', 'output.mp4');

    return data;
  }
}

Architecture Rationale:

  • Proxy over Direct Fetch: Browsers enforce CORS. A streaming proxy handles header emulation and CORS injection while piping data directly to the client, avoiding server-side memory accumulation.
  • Concurrency Pool: Network latency dominates segment downloads. Limiting concurrency to 6–10 threads balances throughput with CDN rate-limit thresholds.
  • WASM Transmuxing: Server-side FFmpeg requires temporary storage, increases egress costs, and introduces privacy risks. Client-side -c copy remuxing is instantaneous, privacy-preserving, and quality-identical to the source.

Pitfall Guide

1. Ignoring Split-Stream Architecture

Explanation: Fetching only the video track manifest results in a playable but silent file. Many developers assume the primary stream contains both tracks. Fix: Always parse the DASH manifest to identify separate Representation IDs for video and audio content types. Fetch and process both tracks independently before remuxing.
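As a sketch of that manifest-parsing step, the helper below scans a Reddit-style MPD for an AdaptationSet by contentType and expands the SegmentTemplate's $Number$ pattern into concrete segment paths. It is a regex scan over known attribute names, not a general XML parser, and the segment count is supplied by the caller rather than derived from a SegmentTimeline — all assumptions for illustration:

```typescript
interface TrackSegments {
  initPath: string;     // initialization segment path
  mediaPaths: string[]; // media segment paths in playback order
}

// Extract the segment paths for one track type from a DASH manifest string.
export function extractSegments(
  mpdXml: string,
  contentType: 'video' | 'audio',
  segmentCount: number
): TrackSegments | null {
  // Isolate the AdaptationSet for the requested track type.
  const setMatch = mpdXml.match(
    new RegExp(`<AdaptationSet[^>]*contentType="${contentType}"[\\s\\S]*?</AdaptationSet>`)
  );
  if (!setMatch) return null;
  const block = setMatch[0];

  // SegmentTemplate carries the init and media URL patterns.
  const init = block.match(/initialization="([^"]+)"/)?.[1];
  const media = block.match(/media="([^"]+)"/)?.[1];
  if (!init || !media) return null;

  // Expand $Number$ for each segment index (DASH numbering starts at 1 by default).
  const mediaPaths = Array.from({ length: segmentCount }, (_, i) =>
    media.replace('$Number$', String(i + 1))
  );
  return { initPath: init, mediaPaths };
}
```

Calling it twice — once with 'video' and once with 'audio' — yields the two independent path lists that must be fetched and remuxed together.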

2. Sequential Segment Downloads

Explanation: DASH manifests can contain 200+ segments. Sequential await fetch() calls create a linear bottleneck, causing timeouts and poor UX. Fix: Implement a concurrency-controlled async pool. Limit parallel requests to 6–10 to avoid triggering CDN throttling while maximizing bandwidth utilization.

3. Re-encoding Instead of Transmuxing

Explanation: Running FFmpeg without -c copy forces a full decode/encode cycle. This degrades quality, spikes CPU usage, and increases processing time by 10–50x. Fix: Always use -c copy for container conversion. Only re-encode when format conversion (e.g., WebM to MP4) or resolution scaling is explicitly required.

4. Proxy Memory Leaks

Explanation: Buffering entire video files in the proxy before sending them to the client causes Node.js heap exhaustion, especially for long-form content. Fix: Use ReadableStream piping (proxyRes.pipe(res)). Never accumulate chunks in memory. Set appropriate highWaterMark values if backpressure becomes an issue.

5. WASM Virtual Filesystem Accumulation

Explanation: FFmpeg.wasm stores files in a browser-side virtual filesystem. Failing to clean up after transmuxing causes memory leaks that crash the tab on subsequent runs. Fix: Explicitly call FS('unlink', filename) for all input and output files after reading the result. Consider resetting the FS instance if processing multiple videos in a single session.

6. CORS Header Omission

Explanation: The proxy returns data but omits Access-Control-Allow-Origin. The browser blocks the response, causing silent fetch failures. Fix: Always inject Access-Control-Allow-Origin: * (or specific origins) and handle OPTIONS preflight requests. Strip any X-Frame-Options or Strict-Transport-Security headers that interfere with client-side consumption.

7. Audio/Video Timestamp Drift

Explanation: If audio and video segments are fetched out of order or mismatched by index, the remuxed file exhibits sync drift or playback stuttering. Fix: Maintain strict index alignment between audio and video segment arrays. Validate segment durations against the manifest's SegmentTemplate before remuxing.
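One way to apply that validation is to compare the total track lengths each SegmentTemplate declares before remuxing. In this sketch the field names mirror the DASH duration/timescale attributes, and the 0.5-second tolerance is an illustrative assumption:

```typescript
interface SegmentTemplateInfo {
  duration: number;     // per-segment duration, in timescale units
  timescale: number;    // timescale units per second
  segmentCount: number; // number of media segments in the track
}

// Total declared track length in seconds.
const trackSeconds = (t: SegmentTemplateInfo): number =>
  (t.duration / t.timescale) * t.segmentCount;

// True when both tracks agree to within `toleranceSec`; remuxing
// misaligned tracks produces audible sync drift.
export function tracksAligned(
  video: SegmentTemplateInfo,
  audio: SegmentTemplateInfo,
  toleranceSec = 0.5
): boolean {
  return Math.abs(trackSeconds(video) - trackSeconds(audio)) <= toleranceSec;
}
```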

Production Bundle

Action Checklist

  • Validate manifest structure: Ensure the JSON endpoint returns a valid DASH URL before proceeding.
  • Configure proxy headers: Emulate a trusted browser User-Agent and Referer to bypass CDN 403 blocks.
  • Implement concurrency limits: Set parallel fetch count to 6–10 based on target CDN rate limits.
  • Initialize WASM early: Load FFmpeg.wasm during app idle time to avoid UI blocking during transmux.
  • Use -c copy exclusively: Verify FFmpeg command flags to prevent accidental re-encoding.
  • Clean up WASM FS: Unlink all virtual files post-transmux to prevent memory leaks.
  • Handle segment failures: Implement retry logic with exponential backoff for dropped network requests.
  • Stream proxy responses: Pipe CDN responses directly to client without server-side buffering.

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| High-traffic public tool | Client-Side WASM + Streaming Proxy | Zero server storage, scales with users, privacy-compliant | Near-zero infrastructure cost |
| Enterprise archiving (compliance) | Server-Side Transcoding with Ephemeral Storage | Centralized logging, audit trails, controlled environment | High egress + storage costs |
| Low-end mobile devices | Server-Side Transcoding | WASM initialization and memory overhead may crash low-RAM browsers | Moderate compute cost |
| Real-time preview generation | Client-Side WASM (thumbnail extraction) | Instant feedback, no network roundtrip for processing | Minimal bandwidth usage |

Configuration Template

// proxy-server.ts
import { createServer } from 'http';
import { request as httpsRequest } from 'https';

const PROXY_PORT = process.env.PROXY_PORT || 3001;
const CDN_ORIGIN = 'https://v.redd.it';

createServer((req, res) => {
  if (req.method === 'OPTIONS') {
    res.writeHead(204, {
      'Access-Control-Allow-Origin': '*',
      'Access-Control-Allow-Methods': 'GET, OPTIONS',
      'Access-Control-Allow-Headers': 'Content-Type'
    });
    return res.end();
  }

  const target = req.url?.replace('/stream/', '');
  if (!target) return res.writeHead(400).end('Invalid path');

  const proxyReq = httpsRequest(`${CDN_ORIGIN}${target}`, {
    method: 'GET',
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; MediaProxy/1.0)',
      'Referer': 'https://www.reddit.com/',
      'Accept': '*/*'
    }
  }, (proxyRes) => {
    res.writeHead(proxyRes.statusCode || 200, {
      'Content-Type': proxyRes.headers['content-type'] || 'application/octet-stream',
      'Access-Control-Allow-Origin': '*',
      'Cache-Control': 'no-cache'
    });
    proxyRes.pipe(res);
  });

  proxyReq.on('error', () => res.writeHead(502).end('Proxy Error'));
  proxyReq.end();
}).listen(PROXY_PORT);

// transmux-service.ts
import { createFFmpeg } from '@ffmpeg/ffmpeg';

export const createTransmuxer = () => {
  const ffmpeg = createFFmpeg({ log: false, mainName: 'main' });
  
  return {
    async init() {
      if (!ffmpeg.isLoaded()) await ffmpeg.load();
    },
    async merge(videoBuf: ArrayBuffer, audioBuf: ArrayBuffer): Promise<Uint8Array> {
      await this.init();
      // Write raw bytes straight into the virtual filesystem.
      ffmpeg.FS('writeFile', 'vid.mp4', new Uint8Array(videoBuf));
      ffmpeg.FS('writeFile', 'aud.mp4', new Uint8Array(audioBuf));
      
      await ffmpeg.run('-i', 'vid.mp4', '-i', 'aud.mp4', '-c', 'copy', '-movflags', '+faststart', 'out.mp4');
      
      const result = ffmpeg.FS('readFile', 'out.mp4');
      ['vid.mp4', 'aud.mp4', 'out.mp4'].forEach(f => ffmpeg.FS('unlink', f));
      return result;
    }
  };
};

Quick Start Guide

  1. Deploy the streaming proxy: Run the Node.js proxy script on a lightweight container or serverless function. Ensure it forwards requests to the target CDN while injecting CORS and browser-emulation headers.
  2. Install FFmpeg.wasm: Add @ffmpeg/ffmpeg and its matching @ffmpeg/core WASM backend to your frontend project (the createFFmpeg API shown here is from the 0.11.x line). Load the WASM core during application initialization.
  3. Fetch and parse the manifest: Query the platform's JSON endpoint, extract the DASH URL, and parse the XML manifest to isolate audio and video segment paths.
  4. Download segments concurrently: Use a concurrency pool to fetch audio and video segments through the proxy. Maintain strict index alignment between the two arrays.
  5. Transmux and deliver: Pass the segment buffers to FFmpeg.wasm with -c copy. Read the resulting Uint8Array, trigger a browser download, and clean up the virtual filesystem.