Phase 1: Decoding and External Import
WebCodecs decodes video streams directly into VideoFrame objects. These frames are typically GPU-backed but read-only and short-lived: they cannot be written to, and a GPUExternalTexture imported from one expires quickly (at the end of the current task in most implementations), so each frame must be consumed and closed promptly.
interface DecodedFrame {
timestamp: number;
frame: VideoFrame;
}
class FrameImporter {
  constructor(private device: GPUDevice) {}

  // GPUExternalTexture objects expire at the end of the current task, and
  // closing the source VideoFrame invalidates them. Import, blit to an
  // internal texture in the same frame, then release the source.
  importFrame(decoded: DecodedFrame): GPUExternalTexture {
    return this.device.importExternalTexture({
      source: decoded.frame,
    });
  }

  // Call after the normalization blit has been encoded and submitted.
  release(decoded: DecodedFrame): void {
    decoded.frame.close();
  }
}
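Because the decoder's output callback can outrun the GPU, a bounded pending-frame queue keeps memory stable and frees decoder slots promptly. The sketch below drops and closes excess frames; the drop-newest policy and the `QueuedFrame` stand-in type are illustrative assumptions, not part of any API.

```typescript
// Minimal sketch of a bounded queue for decoded frames. Frames beyond the
// limit are closed and dropped rather than buffered indefinitely.
interface QueuedFrame {
  timestamp: number;
  frame: { close(): void }; // structural stand-in for VideoFrame
}

class FrameQueue {
  private pending: QueuedFrame[] = [];
  constructor(private maxPending = 3) {}

  // Returns false when the frame was dropped (and closed) due to backpressure.
  push(f: QueuedFrame): boolean {
    if (this.pending.length >= this.maxPending) {
      f.frame.close(); // release the decoder's frame slot immediately
      return false;
    }
    this.pending.push(f);
    return true;
  }

  next(): QueuedFrame | undefined {
    return this.pending.shift();
  }
}
```

Dropping under backpressure (rather than growing the queue) is what keeps preview latency bounded on slow GPUs.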
The external texture serves as a bridge, not a destination. Downstream passes require a standard texture_2d with explicit format and usage flags.
Phase 2: Normalization to Internal Work Textures
External textures lack mipmaps, cannot be bound as storage textures, and have restricted sampling parameters. The pipeline copies the external frame into an internal rgba8unorm texture during the first render pass. This copy is a GPU-to-GPU blit, not a CPU readback.
const WORK_TEXTURE_FORMAT = 'rgba8unorm';
function createWorkTexture(device: GPUDevice, width: number, height: number): GPUTexture {
return device.createTexture({
size: [width, height],
format: WORK_TEXTURE_FORMAT,
usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING,
});
}
The normalization pass uses a simple fragment shader that samples the external texture and writes to the internal work texture. This establishes a stable input for all subsequent operations.
Phase 3: Effect Chaining with Ping-Pong Buffers
GPU render passes cannot safely read from and write to the same texture simultaneously. Chaining multiple effects therefore alternates between two textures, a pattern commonly known as ping-pong buffering.
class PingPongBuffer {
private textures: [GPUTexture, GPUTexture];
private views: [GPUTextureView, GPUTextureView];
private activeIndex: 0 | 1 = 0;
constructor(device: GPUDevice, width: number, height: number) {
this.textures = [
createWorkTexture(device, width, height),
createWorkTexture(device, width, height),
];
this.views = [this.textures[0].createView(), this.textures[1].createView()];
}
get read(): GPUTextureView { return this.views[this.activeIndex]; }
get write(): GPUTextureView { return this.views[1 - this.activeIndex]; }
swap(): void {
  this.activeIndex = this.activeIndex === 0 ? 1 : 0;
}
}
The effect chain iterates through enabled filters, rendering from read to write, then swapping. Transforms (position, scale, rotation, perspective) execute before the effect chain. They modify UV coordinates in the vertex or fragment stage, sampling the normalized work texture and outputting to the ping buffer.
// transform.wgsl
struct VertexOutput {
  @builtin(position) position: vec4f,
  @location(0) uv: vec2f,
}
@group(0) @binding(0) var srcTex: texture_2d<f32>;
@group(0) @binding(1) var srcSampler: sampler;
struct Uniforms {
  transformMatrix: mat4x4<f32>,
  opacity: f32,
}
@group(0) @binding(2) var<uniform> u: Uniforms;
@fragment
fn main(input: VertexOutput) -> @location(0) vec4f {
let localUV = input.uv - vec2f(0.5);
let transformedUV = (u.transformMatrix * vec4f(localUV, 0.0, 1.0)).xy + vec2f(0.5);
let clamped = clamp(transformedUV, vec2f(0.0), vec2f(1.0));
let sampled = textureSample(srcTex, srcSampler, clamped);
return vec4f(sampled.rgb, sampled.a * u.opacity);
}
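The effect-chain iteration itself can be sketched independently of the GPU API. `EffectPass` below is a stand-in for an actual render pass; the loop alternates read and write indices exactly as `PingPongBuffer.swap` does:

```typescript
// Sketch of the effect-chain loop. Each effect renders from the read
// buffer to the write buffer; after every pass the roles swap, so the
// final image lands in whichever buffer received the last write.
type EffectPass = (src: string, dst: string) => void;

function runEffectChain(effects: EffectPass[], buffers: [string, string]): string {
  let read = 0;
  for (const effect of effects) {
    const write = 1 - read;
    effect(buffers[read], buffers[write]); // stand-in for a render pass
    read = write;                          // swap for the next effect
  }
  return buffers[read]; // identifies the view holding the final result
}
```

Note that with an odd number of effects the result ends up in the second buffer; the compositor must ask the chain which view is current rather than assuming one.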
Phase 4: Compute-Based Analysis Passes
Visual effects use render pipelines. Analysis scopes (histograms, waveforms, vectorscopes, optical flow) require parallel reduction and stateful computation. Compute shaders are the correct tool.
Optical flow, for example, estimates pixel displacement between consecutive frames. The pipeline executes grayscale conversion, Gaussian pyramid downsampling, spatial/temporal gradient calculation, and Lucas-Kanade solver passes. Each stage reads from one storage texture and writes to another. The final compute dispatch writes compact statistics to a small storage buffer (uniform buffers are not writable from shaders), not a full frame.
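The pyramid stage halves resolution at each level. As a sketch of sizing the intermediate storage textures (the fixed level count and one-pixel floor are assumptions; real implementations often stop at a minimum dimension):

```typescript
// Dimensions of each Gaussian pyramid level, halving per level and
// clamping at 1 pixel.
function pyramidLevels(width: number, height: number, levels: number): [number, number][] {
  const out: [number, number][] = [];
  for (let i = 0; i < levels; i++) {
    out.push([Math.max(1, width >> i), Math.max(1, height >> i)]);
  }
  return out;
}
```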
class AnalysisDispatcher {
  private device: GPUDevice;
  private resultBuffer: GPUBuffer;   // written by the compute shader
  private stagingBuffer: GPUBuffer;  // mapped for CPU readback
  private computePipeline!: GPUComputePipeline; // created during init
  private bindGroup!: GPUBindGroup;

  constructor(device: GPUDevice) {
    this.device = device;
    // Shaders can only write storage buffers, so the result buffer needs
    // STORAGE; COPY_SRC allows the copy into the mappable staging buffer.
    this.resultBuffer = device.createBuffer({
      size: 64,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    });
    this.stagingBuffer = device.createBuffer({
      size: 64,
      usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
    });
  }

  // Dimensions are passed explicitly: GPUTextureView does not expose the
  // size of the texture it wraps.
  dispatchOpticalFlow(encoder: GPUCommandEncoder, width: number, height: number) {
    const pass = encoder.beginComputePass();
    pass.setPipeline(this.computePipeline);
    pass.setBindGroup(0, this.bindGroup);
    pass.dispatchWorkgroups(Math.ceil(width / 8), Math.ceil(height / 8), 1);
    pass.end();
    encoder.copyBufferToBuffer(this.resultBuffer, 0, this.stagingBuffer, 0, 64);
  }
}
The CPU reads only the 64-byte result buffer. This avoids stalling the pipeline and keeps memory traffic minimal.
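Once the staging buffer is mapped (via `mapAsync` and `getMappedRange`), decoding the statistics is a plain typed-array read. The 64-byte layout below is an assumption for illustration, a mean flow vector plus a valid-pixel count; the real layout is whatever the final compute pass writes:

```typescript
// Hypothetical layout of the 64-byte result buffer:
//   bytes 0-7:  mean flow (dx, dy) as two f32
//   bytes 8-11: count of pixels with a confident flow estimate, as u32
function parseFlowStats(mapped: ArrayBuffer): { meanDx: number; meanDy: number; validCount: number } {
  const f = new Float32Array(mapped, 0, 2);
  const u = new Uint32Array(mapped, 8, 1);
  return { meanDx: f[0], meanDy: f[1], validCount: u[0] };
}
```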
Pitfall Guide
1. Full-Frame CPU Readback in the Hot Path
Explanation: Calling readPixels or getImageData on every frame forces GPU-CPU synchronization. The CPU waits for the GPU to finish rendering, copies megabytes of data, and blocks the event loop.
Fix: Route analysis through compute shaders that output compact statistics. Use GPUBufferUsage.COPY_DST and map the buffer asynchronously. Never read full frames during preview.
2. Ignoring External Texture Lifecycle
Explanation: GPUExternalTexture objects are tied to the underlying VideoFrame. Once imported, the frame may be destroyed by the browser. Attempting to reuse the external texture in subsequent passes causes validation errors or undefined behavior.
Fix: Import the external texture, copy it to an internal rgba8unorm texture in the first pass, and immediately close the VideoFrame. Treat external textures as single-use bridges.
3. Synchronous GPU-CPU Barriers
Explanation: Blocking on buffer mapping (for example, awaiting mapAsync inline in the render loop) or reading back results before the command queue finishes stalls the main thread.
Fix: Use queue.submit() followed by buffer.mapAsync(GPUMapMode.READ). Chain analysis results through requestAnimationFrame or a dedicated worker thread. Never block the render loop waiting for GPU data.
4. Ping-Pong Buffer Misalignment
Explanation: Swapping textures without updating bind groups or mismatching texture dimensions causes sampling artifacts or validation failures.
Fix: Encapsulate ping-pong logic in a dedicated class that manages view creation, dimension validation, and swap state. Ensure all effect pipelines reference the correct read and write views before dispatch.
5. Redundant Uniform Uploads
Explanation: Uploading uniform buffers every frame without checking for changes wastes bandwidth and triggers unnecessary pipeline rebinds.
Fix: Implement a dirty-flag system. Only call queue.writeBuffer when transform parameters, effect intensities, or analysis modes actually change. Batch uniform updates per frame.
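A minimal sketch of such a dirty-flag system, using a per-key change threshold (the class and key names are illustrative, and the threshold mirrors uniformUpdateThreshold in the configuration template below):

```typescript
// Tracks the last-uploaded value per uniform and reports whether a new
// value differs enough to justify a queue.writeBuffer call.
class UniformTracker {
  private last = new Map<string, number>();
  constructor(private threshold = 0.001) {}

  // Returns true when the caller should upload; records the value if so.
  shouldUpload(key: string, value: number): boolean {
    const prev = this.last.get(key);
    if (prev !== undefined && Math.abs(prev - value) < this.threshold) {
      return false; // change below threshold: skip the upload
    }
    this.last.set(key, value);
    return true;
  }
}
```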
6. Overlooking Adapter Feature Limits
Explanation: Not all WebGPU implementations support rgba8unorm storage textures, external texture imports, or compute shader workgroup sizes above 256.
Fix: Query adapter.features during initialization. Fall back to render-pass-based analysis or reduced resolution if compute/storage features are missing. Always validate texture usage flags against device capabilities.
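A sketch of the fallback decision. Treating 'bgra8unorm-storage' as the gating feature is an illustrative assumption; gate on whichever optional features your compute path actually uses. Taking a plain `ReadonlySet<string>` keeps the logic testable outside the browser:

```typescript
// Pick the analysis path from the feature set reported by the adapter.
function chooseAnalysisMode(features: ReadonlySet<string>): 'compute' | 'render' {
  return features.has('bgra8unorm-storage') ? 'compute' : 'render';
}
```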
7. Mismatched Texture Formats in Compute vs Render
Explanation: Render pipelines typically use rgba8unorm. Compute shaders often require r32float or rg32float for gradient calculations. Binding incompatible formats causes pipeline creation failures.
Fix: Explicitly define format conversion passes. Use a dedicated blit shader to convert between render and compute formats. Never assume format compatibility across pipeline types.
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Real-time preview (1080p/60fps) | WebCodecs + WebGPU render pipeline | Zero-copy texture routing maintains frame budget | Low (GPU memory only) |
| Offline export / encoding | GPU processing + WebCodecs encoder | Keeps CPU free for muxing and I/O | Medium (requires encoder setup) |
| Mobile / low-tier devices | Render-pass effects + reduced resolution | Compute shaders may exceed thermal limits | Low (adaptive quality) |
| Heavy analysis (optical flow, scopes) | Compute dispatch + compact readback | Parallel reduction avoids full-frame copies | Low (minimal CPU-GPU traffic) |
| Legacy browser support | WebGL2 fallback + CPU analysis | WebGPU not universally available | High (performance degradation) |
Configuration Template
// pipeline.config.ts
export const PipelineConfig = {
texture: {
format: 'rgba8unorm' as GPUTextureFormat,
usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING,
},
compute: {
workgroupSize: [8, 8, 1] as [number, number, number],
resultBufferSize: 64,
},
sync: {
maxPendingFrames: 3,
uniformUpdateThreshold: 0.001,
},
fallback: {
enableCompute: true,
maxResolution: { width: 1920, height: 1080 },
analysisMode: 'compute' as 'compute' | 'render',
},
};
// 'bgra8unorm-storage' is an optional feature defined by the WebGPU spec.
// Ideally, check the adapter's features before calling requestDevice;
// device.features only reflects what was actually granted.
export function validateDeviceCapabilities(device: GPUDevice): boolean {
  const required: GPUFeatureName[] = ['bgra8unorm-storage'];
  return required.every(feat => device.features.has(feat));
}
Quick Start Guide
- Initialize the GPU Context: Request an adapter, validate required features, and create a GPUDevice. Obtain the canvas context via getContext('webgpu') and configure it with the bgra8unorm format.
- Set Up Frame Import & Normalization: Create a FrameImporter. On each decoded frame, call importExternalTexture, execute a blit pass into an internal rgba8unorm texture, and close the source frame.
- Build the Effect Chain: Instantiate a PingPongBuffer matching the frame dimensions. Apply transforms first, then loop through enabled effects, rendering from the read view to the write view and swapping after each pass.
- Dispatch Analysis Passes: Create compute pipelines for histograms or optical flow. Bind storage textures, dispatch workgroups covering the frame dimensions, and copy results to a staging buffer. Read compact statistics asynchronously without blocking the render loop.