I Open-Sourced a Browser-Based AI Background Remover — Here's the Full Architecture
Client-Side Image Segmentation: Architecting Private AI Workflows in the Browser
Current Situation Analysis
The standard workflow for AI-powered background removal has remained unchanged for years: capture an image, upload it to a remote endpoint, wait for a cloud GPU cluster to run a segmentation model, and download the processed result. This paradigm introduces three compounding problems. First, data privacy is compromised. Sensitive or proprietary imagery traverses third-party infrastructure with no guarantee of deletion. Second, operational costs scale linearly with usage. Every request incurs compute, bandwidth, and storage fees. Third, latency becomes unpredictable. Network round-trips and queue times degrade user experience, especially on mobile networks.
Many development teams overlook client-side inference because they assume browser environments lack the computational density required for modern neural networks. This assumption is outdated. WebAssembly (WASM) and ONNX Runtime Web have matured to the point where segmentation models can run entirely within the main thread or a dedicated worker. The trade-off is well-documented: a one-time download of approximately 40MB for model weights, followed by instant local caching. Inference typically completes in 2–5 seconds on desktop hardware and 8–15 seconds on mobile devices. Memory consumption peaks around 144MB for a 4000×3000 pixel image when maintaining three full-resolution canvases (original, mask, output). While this exceeds the comfort zone of low-end devices, it eliminates server costs entirely and guarantees zero data egress.
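The 144MB figure follows from simple arithmetic: a canvas backing store holds 4 bytes per pixel (RGBA), so one 4000×3000 canvas is 48MB and three of them are 144MB. A minimal sketch of that estimate, where `estimateCanvasBytes` is a hypothetical helper name and the result ignores any extra copies the browser or GPU may keep:

```typescript
// Hypothetical helper: rough lower bound for canvas pixel memory.
// Assumes 4 bytes per pixel (RGBA, 8 bits per channel).
const BYTES_PER_PIXEL = 4;

function estimateCanvasBytes(width: number, height: number, canvasCount = 1): number {
  return width * height * BYTES_PER_PIXEL * canvasCount;
}

const perCanvas = estimateCanvasBytes(4000, 3000);   // one full-resolution canvas
const pipeline = estimateCanvasBytes(4000, 3000, 3); // original + mask + output
console.log(`${perCanvas / 1e6}MB per canvas, ${pipeline / 1e6}MB for three`);
```

Running the same estimate against the largest image a product intends to accept is a quick way to decide whether to downscale inputs on low-memory devices.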
The industry has normalized cloud-dependent AI because it abstracts complexity. However, for privacy-first applications, offline-capable tools, or cost-constrained products, shifting the inference boundary to the client is no longer experimental—it is a production-ready architecture.
WOW Moment: Key Findings
Moving segmentation logic to the browser fundamentally alters the cost, privacy, and scalability profile of image processing tools. The following comparison illustrates the architectural divergence between traditional server-rendered pipelines and modern client-side WASM implementations.
| Approach | Data Privacy | Inference Latency (First Run) | Operational Cost | Scalability Limit | Memory Footprint |
|---|---|---|---|---|---|
| Server-Side API | Low (data leaves device) | 1.5–4s (network + queue) | $0.01–$0.05 per request | Bounded by backend capacity | Negligible client-side |
| Client-Side WASM | High (zero egress) | 2–5s desktop / 8–15s mobile | $0 (after initial 40MB download) | Bounded only by each user's hardware | ~144MB peak (4000×3000 image) |
This finding matters because it decouples product growth from infrastructure provisioning. A client-side architecture scales horizontally across user devices rather than vertically through cloud instances. It also enables offline functionality, reduces compliance overhead (GDPR/CCPA), and removes per-request pricing models that often dictate feature gating. The initial 40MB weight download is a one-time tax that the browser's HTTP cache neutralizes on subsequent visits, effectively turning every user's device into a private inference node.
Core Solution
Building a browser-native segmentation pipeline requires coordinating three distinct subsystems: model acquisition, interactive mask editing, and pixel compositing. The architecture below uses TypeScript, the Canvas 2D API, and ONNX Runtime Web to create a deterministic, privacy-preserving workflow.
Step 1: Model Acquisition & Runtime Initialization
The segmentation backbone relies on @imgly/background-removal, which bundles an ONNX model with the WebAssembly backend of ONNX Runtime Web. Dynamic imports allow the browser to fetch and cache the runtime and weights without blocking the initial render.
import type { Config } from '@imgly/background-removal';

// Narrowed signature used in this article: Blob in, transparent PNG Blob out.
type RemoveFn = (image: Blob, opts?: Config) => Promise<Blob>;

class SegmentationRuntime {
  // Not private: runInference() in Step 2 reads it. Null until initialize() resolves.
  removeFn: RemoveFn | null = null;

  async initialize(): Promise<void> {
    if (this.removeFn) return;
    // Fetches the ESM build from the CDN on first use; the browser caches it afterwards.
    const module = await import('https://cdn.jsdelivr.net/npm/@imgly/background-removal@1.5.5/+esm');
    this.removeFn = module.removeBackground;
  }

  get isReady(): boolean {
    return this.removeFn !== null;
  }
}
Architecture Rationale: Dynamic import() leverages native module caching. The first load downloads the WASM binary and model weights (~40MB). Subsequent instantiations resolve instantly. This avoids bundling heavy assets into the application payload and keeps the initial JS bundle under 50KB.
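One detail the class above leaves open is what happens when initialize() is called twice before the first import resolves. A minimal sketch of one way to handle that, caching the in-flight promise rather than the resolved value. `lazyOnce` is a hypothetical helper, not part of the library, and the loader below is a stand-in for the real dynamic import:

```typescript
// Hypothetical helper: wraps an async loader so it runs at most once.
// Concurrent callers share the same in-flight promise.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = loader().catch((err) => {
        // Clear the cache on failure so a later call can retry.
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in loader; in the article's setup this would be the dynamic
// import() of the background-removal module.
let loadCount = 0;
const loadRuntime = lazyOnce(async () => {
  loadCount += 1;
  return { ready: true };
});

const first = loadRuntime();
const second = loadRuntime();
console.log(first === second, loadCount);
```

Resetting the cached promise on failure matters for a 40MB download: a dropped connection should not leave the runtime permanently stuck on a rejected promise.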
Step 2: Tensor Conversion & Local Inference
Once initialized, the runtime accepts a Blob or File. The library handles image resizing, pixel normalization, and tensor construction internally. The output is a transparent PNG blob containing the segmented foreground.
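async function runInference(runtime: SegmentationRuntime, sourceFile: File): Promise<Blob> {
  // Read the function once so TypeScript can narrow away the null case.
  const removeBackground = runtime.removeFn;
  if (!removeBackground) throw new Error('Runtime not initialized');

  // The library resolves with a Blob holding the transparent PNG.
  return removeBackground(sourceFile, {
    model: 'medium',
    output: { format: 'image/png' },
    // Called repeatedly during download and inference.
    progress: (stage: string, current: number, total: number) => {
      console.debug(`[${stage}] ${Math.round((current / total) * 100)}%`);
    }
  });
}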
Architecture Rationale: The medium model variant offers a reasonable balance between edge fidelity and compute time. Progress callbacks let the UI report download and inference progress while the work runs. Returning a Blob instead of a base64 string avoids the roughly 33% size overhead of base64 encoding when handing the result to the compositing stage.
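The progress callback in Step 2 divides current by total directly, which prints NaN if a stage ever reports a total of zero. Whether the library does so is an assumption here, but a guard is cheap. A small sketch, with `toPercent` and `formatProgress` as hypothetical helpers and the stage labels chosen only for illustration:

```typescript
// Hypothetical helper: converts a (current, total) pair into a whole
// percentage in [0, 100], tolerating a zero or non-finite total.
function toPercent(current: number, total: number): number {
  if (!Number.isFinite(current) || !Number.isFinite(total) || total <= 0) {
    return 0;
  }
  const ratio = Math.min(Math.max(current / total, 0), 1);
  return Math.round(ratio * 100);
}

// Formats one progress event in the same style as the Step 2 log line.
function formatProgress(stage: string, current: number, total: number): string {
  return `[${stage}] ${toPercent(current, total)}%`;
}

console.log(formatProgress('download', 10, 40)); // [download] 25%
console.log(formatProgress('inference', 0, 0));  // [inference] 0%
```

The same number can drive a progress bar element instead of the console.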
Step 3: Alpha Channel Extraction & Mask Construction
AI segmentation outputs a probability map, not a perfect binary mask. To enable manual correction, we extract the alpha channel and construct a dedicated grayscale canvas. White pixels represent retained foreground; black pixels represent removed background.
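function constructEditableMask(segmentedBlob: Blob, width: number, height: number): Promise<HTMLCanvasElement> {
  const maskCanvas = document.createElement('canvas');
  maskCanvas.width = width;
  maskCanvas.height = height;
  const ctx = maskCanvas.getContext('2d', { willReadFrequently: true })!;
  const img = new Image();
  const objectUrl = URL.createObjectURL(segmentedBlob);

  // Image decoding is asynchronous, so the canvas is delivered through a Promise.
  return new Promise((resolve, reject) => {
    img.onload = () => {
      // Scale to the requested size in case the blob's dimensions differ.
      ctx.drawImage(img, 0, 0, width, height);
      const pixelData = ctx.getImageData(0, 0, width, height);
      // Copy each pixel's alpha into R, G, and B, then make the pixel opaque.
      for (let i = 3; i < pixelData.data.length; i += 4) {
        const alpha = pixelData.data[i];
        pixelData.data[i - 3] = alpha;
        pixelData.data[i - 2] = alpha;
        pixelData.data[i - 1] = alpha;
        pixelData.data[i] = 255;
      }
      ctx.putImageData(pixelData, 0, 0);
      URL.revokeObjectURL(objectUrl);
      resolve(maskCanvas);
    };
    img.onerror = () => {
      // Release the URL on failure too, and surface the error to the caller.
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Failed to decode segmented image'));
    };
    // Assign src only after both handlers are attached.
    img.src = objectUrl;
  });
}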
Architecture Rationale: The willReadFrequently: true hint optimizes the canvas context for repeated getImageData calls. Converting alpha to a grayscale RGB triplet standardizes the mask for brush operations. Explicitly revoking object URLs prevents memory leaks during rapid file processing.
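The per-pixel conversion can also be isolated from the canvas so it is testable without a DOM. A sketch under that assumption, operating on a raw RGBA array such as ImageData.data. `alphaToGrayscaleMask` is a hypothetical name:

```typescript
// Rewrites RGBA pixel data in place so that R = G = B = original alpha
// and alpha becomes fully opaque, matching the mask convention above
// (white = keep, black = remove, gray = partial).
function alphaToGrayscaleMask(pixels: Uint8ClampedArray): Uint8ClampedArray {
  if (pixels.length % 4 !== 0) {
    throw new RangeError('Expected RGBA data (length divisible by 4)');
  }
  for (let i = 0; i < pixels.length; i += 4) {
    const alpha = pixels[i + 3];
    pixels[i] = alpha;
    pixels[i + 1] = alpha;
    pixels[i + 2] = alpha;
    pixels[i + 3] = 255;
  }
  return pixels;
}

const sample = new Uint8ClampedArray([
  200, 50, 50, 255, // opaque foreground pixel
  10, 10, 10, 0,    // fully removed background pixel
  90, 90, 90, 128,  // soft edge pixel
]);
console.log(Array.from(alphaToGrayscaleMask(sample)).join(','));
// 255,255,255,255,0,0,0,255,128,128,128,255
```

The soft edge pixel keeps its value of 128 as mid-gray, which is what later lets the compositing step preserve partial transparency.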
Step 4: Interactive Refinement Engine
Users must correct AI errors around hair, transparent objects, and color-similar backgrounds. A dedicated editing canvas handles brush and eraser strokes, mapping display coordinates to full-resolution mask coordinates.
interface BrushConfig {
  size: number;      // stroke width in mask pixels
  softness: number;  // 0 = hard edge; larger values feather more
  mode: 'restore' | 'remove';
}

class MaskEditor {
  private canvas: HTMLCanvasElement;
  private ctx: CanvasRenderingContext2D;
  private lastPos: { x: number; y: number } | null = null;

  constructor(targetCanvas: HTMLCanvasElement) {
    this.canvas = targetCanvas;
    this.ctx = targetCanvas.getContext('2d', { willReadFrequently: true })!;
  }

  applyStroke(clientX: number, clientY: number, config: BrushConfig, displayRect: DOMRect) {
    // Map viewport coordinates to full-resolution mask coordinates.
    const scaleX = this.canvas.width / displayRect.width;
    const scaleY = this.canvas.height / displayRect.height;
    const x = (clientX - displayRect.left) * scaleX;
    const y = (clientY - displayRect.top) * scaleY;

    this.ctx.lineCap = 'round';
    this.ctx.lineWidth = config.size;
    this.ctx.filter = config.softness > 0 ? `blur(${Math.round(config.size * config.softness * 0.3)}px)` : 'none';

    if (config.mode === 'restore') {
      // Additive white: can only brighten the mask.
      this.ctx.globalCompositeOperation = 'lighter';
      this.ctx.strokeStyle = '#ffffff';
    } else {
      // Opaque black: paints the region out.
      this.ctx.globalCompositeOperation = 'source-over';
      this.ctx.strokeStyle = '#000000';
    }

    // The first point of a stroke only records a position; segments start at the second.
    if (this.lastPos) {
      this.ctx.beginPath();
      this.ctx.moveTo(this.lastPos.x, this.lastPos.y);
      this.ctx.lineTo(x, y);
      this.ctx.stroke();
    }
    this.lastPos = { x, y };
  }

  // Call on pointerup/pointercancel so the next stroke does not connect to this one.
  endStroke() {
    this.lastPos = null;
  }

  // Discards all mask content; the caller must redraw the AI mask afterwards.
  reset() {
    this.lastPos = null;
    this.ctx.clearRect(0, 0, this.canvas.width, this.canvas.height);
  }
}
Architecture Rationale: Coordinate scaling keeps strokes aligned with the underlying image regardless of how CSS sizes the canvas. The 'lighter' composite operation adds the stroke's values to the existing mask, so a restore stroke can only brighten pixels and never darkens an existing soft edge. The blur filter on the canvas context creates feathered brush strokes without requiring gradient math.
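To see why the restore brush uses 'lighter' rather than 'source-over', it helps to model both operations on a single mask channel. The sketch below is an idealized per-channel model on an opaque destination, not the browser's actual implementation, which works on premultiplied 8-bit values with its own rounding. Function names are hypothetical:

```typescript
// dst and src are channel values in 0-255; srcAlpha is the stroke's
// coverage in [0, 1], below 1 at a feathered (blurred) stroke edge.

// 'lighter': adds the source contribution and clamps at white.
function blendLighter(dst: number, src: number, srcAlpha: number): number {
  return Math.min(255, Math.round(dst + src * srcAlpha));
}

// 'source-over': interpolates between destination and source.
function blendSourceOver(dst: number, src: number, srcAlpha: number): number {
  return Math.round(src * srcAlpha + dst * (1 - srcAlpha));
}

// A white restore stroke at 40% coverage over a mid-gray mask value of 100:
console.log(blendLighter(100, 255, 0.4));    // 202
console.log(blendSourceOver(100, 255, 0.4)); // 162

// A black remove stroke at 40% coverage over a mask value of 200:
console.log(blendSourceOver(200, 0, 0.4));   // 120
```

With a white source, both modes brighten the mask, but the additive mode reaches full white in fewer overlapping passes, which is why feathered restore strokes build up smoothly.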
Step 5: Final Compositing & Export
The final step merges the original RGB data with the edited mask's alpha channel. This pixel-level operation produces the downloadable transparent PNG.
function composeFinalOutput(originalCanvas: HTMLCanvasElement, maskCanvas: HTMLCanvasElement): HTMLCanvasElement {
  const width = originalCanvas.width;
  const height = originalCanvas.height;
  // The mask must match the original pixel for pixel, or the alpha lookup reads the wrong pixels.
  if (maskCanvas.width !== width || maskCanvas.height !== height) {
    throw new Error('Mask and original dimensions differ');
  }

  const outputCanvas = document.createElement('canvas');
  outputCanvas.width = width;
  outputCanvas.height = height;
  const outCtx = outputCanvas.getContext('2d')!;

  const origData = originalCanvas.getContext('2d')!.getImageData(0, 0, width, height);
  const maskData = maskCanvas.getContext('2d')!.getImageData(0, 0, width, height);
  const outputData = outCtx.createImageData(width, height);

  for (let i = 0; i < origData.data.length; i += 4) {
    // Copy RGB from the original image.
    outputData.data[i] = origData.data[i];
    outputData.data[i + 1] = origData.data[i + 1];
    outputData.data[i + 2] = origData.data[i + 2];
    // Use the mask's red channel (R = G = B in the grayscale mask) as alpha.
    outputData.data[i + 3] = maskData.data[i];
  }

  outCtx.putImageData(outputData, 0, 0);
  return outputCanvas;
}
Architecture Rationale: Direct Uint8ClampedArray manipulation is significantly faster than drawing layers with globalAlpha. Mapping the mask's red channel to the output's alpha channel preserves semi-transparent values, which is critical for maintaining natural hair edges and motion blur.
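Because the merge only touches typed arrays, the same logic can be written without any canvas, which makes it usable inside a worker or a unit test. A sketch under that assumption, with `applyMask` as a hypothetical name:

```typescript
// Combines RGB from `original` with alpha taken from the red channel of
// `mask`. Both inputs are RGBA arrays of identical length.
function applyMask(original: Uint8ClampedArray, mask: Uint8ClampedArray): Uint8ClampedArray {
  if (original.length !== mask.length || original.length % 4 !== 0) {
    throw new RangeError('Inputs must be RGBA arrays of equal length');
  }
  const out = new Uint8ClampedArray(original.length);
  for (let i = 0; i < original.length; i += 4) {
    out[i] = original[i];
    out[i + 1] = original[i + 1];
    out[i + 2] = original[i + 2];
    out[i + 3] = mask[i]; // grayscale mask value becomes output alpha
  }
  return out;
}

const originalPixels = new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]);
const maskPixels = new Uint8ClampedArray([255, 255, 255, 255, 128, 128, 128, 255]);
console.log(Array.from(applyMask(originalPixels, maskPixels)).join(','));
// 10,20,30,255,40,50,60,128
```

The second pixel ends up half transparent rather than being snapped to 0 or 255, which is the behavior the rationale above describes for hair and soft edges.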
Pitfall Guide
1. Coordinate System Drift
Explanation: CSS scales canvases to fit viewports, but getImageData and stroke operations require native pixel coordinates. Failing to map client coordinates to canvas resolution causes strokes to misalign with the image.
Fix: Calculate scaleX = canvas.width / rect.width and scaleY = canvas.height / rect.height on every pointer event. Apply these ratios to clientX and clientY before drawing.
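A worked example makes the ratio concrete. The sketch below assumes a 4000×3000 canvas displayed at 800×600 with its top-left corner at (100, 50) in the viewport. `toCanvasCoords` and `DisplayRect` are hypothetical names, and the rect fields mirror those of getBoundingClientRect():

```typescript
// Subset of DOMRect used for the mapping.
interface DisplayRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Maps a pointer position in viewport coordinates to canvas pixel coordinates.
function toCanvasCoords(
  clientX: number,
  clientY: number,
  rect: DisplayRect,
  canvasWidth: number,
  canvasHeight: number
): { x: number; y: number } {
  const scaleX = canvasWidth / rect.width;
  const scaleY = canvasHeight / rect.height;
  return {
    x: (clientX - rect.left) * scaleX,
    y: (clientY - rect.top) * scaleY,
  };
}

const rect: DisplayRect = { left: 100, top: 50, width: 800, height: 600 };
const point = toCanvasCoords(500, 350, rect, 4000, 3000);
console.log(point); // { x: 2000, y: 1500 }
```

The pointer sits at the center of the displayed canvas, and the mapped point lands at the center of the full-resolution canvas, five times further from the origin on each axis.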
2. Main Thread Blocking During Inference
Explanation: ONNX Runtime Web exposes a promise-based API, but by default its WASM backend still does the computation on the thread that calls it. Run from the main thread, a large image can freeze the UI for several seconds and trigger browser "page unresponsive" warnings.
Fix: Offload inference to a Web Worker. Use postMessage to send the Blob and listen for completion events. If workers are unavailable, awaiting new Promise(r => setTimeout(r, 0)) between pipeline stages lets the browser repaint, but it cannot interrupt a single inference call once it has started.
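The page-side half of worker offloading is mostly message handling. The sketch below shows one possible shape for it: typed messages a worker might post back, and a reducer that folds them into UI state. All names and message shapes are hypothetical, and worker creation and the postMessage calls themselves are omitted:

```typescript
// Hypothetical messages an inference worker might post back to the page.
type WorkerResponse<TResult> =
  | { type: 'progress'; jobId: number; percent: number }
  | { type: 'done'; jobId: number; result: TResult }
  | { type: 'error'; jobId: number; message: string };

interface JobState<TResult> {
  status: 'running' | 'done' | 'failed';
  percent: number;
  result?: TResult;
  error?: string;
}

// Pure reducer: given the current state and one message, returns the next state.
function reduceJob<TResult>(
  state: JobState<TResult>,
  msg: WorkerResponse<TResult>
): JobState<TResult> {
  switch (msg.type) {
    case 'progress':
      return { ...state, percent: msg.percent };
    case 'done':
      return { status: 'done', percent: 100, result: msg.result };
    case 'error':
      return { status: 'failed', percent: state.percent, error: msg.message };
  }
}

// Simulated message sequence; a string stands in for the result Blob.
const incoming: WorkerResponse<string>[] = [
  { type: 'progress', jobId: 1, percent: 40 },
  { type: 'progress', jobId: 1, percent: 90 },
  { type: 'done', jobId: 1, result: 'segmented-image' },
];
let job: JobState<string> = { status: 'running', percent: 0 };
for (const msg of incoming) job = reduceJob(job, msg);
console.log(job.status, job.percent, job.result);
```

In a page, the worker's onmessage handler would pass event.data to the reducer and re-render from the returned state, keeping all rendering on the main thread and all computation off it.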
3. Unbounded Undo History
Explanation: Storing full ImageData snapshots for every stroke quickly exhausts memory. One snapshot of a 4000×3000 image consumes ~48MB (4 bytes per pixel), so ten strokes approach half a gigabyte.
Fix: Implement a circular buffer with a maximum depth (e.g., 20 states). When the limit is reached, shift the oldest state out before pushing the new one. Because the mask is grayscale, storing one byte per pixel instead of the full RGBA data cuts each snapshot to a quarter of that size, which matters because 20 full RGBA snapshots of that image would still hold 960MB.
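A bounded history of this kind takes only a few lines. The sketch below is generic over the snapshot type, so it works with ImageData, a single-channel Uint8Array, or anything else. `UndoHistory` is a hypothetical name:

```typescript
// Fixed-depth undo stack: pushing beyond maxDepth discards the oldest state.
class UndoHistory<T> {
  private states: T[] = [];
  private maxDepth: number;

  constructor(maxDepth: number) {
    if (!Number.isInteger(maxDepth) || maxDepth < 1) {
      throw new RangeError('maxDepth must be a positive integer');
    }
    this.maxDepth = maxDepth;
  }

  // Records a snapshot, evicting the oldest one when the buffer is full.
  push(state: T): void {
    if (this.states.length === this.maxDepth) {
      this.states.shift();
    }
    this.states.push(state);
  }

  // Returns the most recent snapshot, or undefined when history is empty.
  undo(): T | undefined {
    return this.states.pop();
  }

  get depth(): number {
    return this.states.length;
  }
}

const history = new UndoHistory<number>(3);
for (const snapshot of [1, 2, 3, 4, 5]) history.push(snapshot);
console.log(history.depth);  // 3
console.log(history.undo()); // 5
console.log(history.undo()); // 4
```

Array.shift() is linear in the buffer length, which is negligible at a depth of 20; the snapshots themselves dominate the cost.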
4. Hard-Edged Brush Artifacts
Explanation: Standard canvas strokes produce sharp boundaries. When correcting AI masks, hard edges create visible halos or unnatural cutouts around subjects.
Fix: Apply ctx.filter = 'blur(Xpx)' proportional to brush size and softness settings. Alternatively, draw radial gradients with decreasing alpha instead of solid lines.
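For the gradient alternative, the underlying falloff can be expressed as a function of distance from the brush center. The sketch below assumes softness is normalized to the range 0 to 1 and uses a linear fade, which corresponds to a two-stop radial gradient (opaque at 1 - softness, transparent at 1). `brushAlpha` is a hypothetical helper:

```typescript
// Alpha of a brush dab at `distance` from its center.
// softness = 0 gives a hard edge; softness = 1 fades linearly from the center.
function brushAlpha(distance: number, radius: number, softness: number): number {
  if (radius <= 0 || distance >= radius) return 0;
  const s = Math.min(Math.max(softness, 0), 1);
  const hardRadius = radius * (1 - s); // fully opaque inside this radius
  if (distance <= hardRadius) return 1;
  // Linear fade from 1 at hardRadius down to 0 at the brush edge.
  return (radius - distance) / (radius - hardRadius);
}

console.log(brushAlpha(5, 10, 0.5));   // 1   (inside the opaque core)
console.log(brushAlpha(7.5, 10, 0.5)); // 0.5 (halfway through the fade)
console.log(brushAlpha(9, 10, 0));     // 1   (hard brush, still inside)
console.log(brushAlpha(10, 10, 0.5));  // 0   (at the edge)
```

A non-linear curve such as a smoothstep would give a softer shoulder; the linear version is used here only because it is the simplest to verify.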
5. Ignoring Alpha Channel Precision
Explanation: Treating the mask as strictly binary (0 or 255) discards valuable semi-transparent data. AI models output probability maps where gray values represent partial foreground confidence.
Fix: Preserve 8-bit grayscale values during compositing. When applying the mask to the original image, map the mask's R channel directly to the output's A channel without thresholding.
6. Touch Event Scroll Interference
Explanation: Mobile browsers interpret touch movements as page scrolling. This interrupts brush strokes and causes jarring UI behavior.
Fix: Attach touchstart and touchmove listeners with { passive: false }. Call event.preventDefault() inside the handler to disable default scrolling while editing.
7. Memory Leaks from Blob URLs
Explanation: Every object URL keeps its Blob alive until the URL is revoked or the document unloads. Creating one per processed image without revoking it accumulates memory that garbage collection cannot reclaim.
Fix: Always call URL.revokeObjectURL(url) after the image loads or after compositing completes. Track active URLs in a Set and clear them on component unmount or workflow completion.
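The Set-based tracking can be wrapped in a small registry. In the sketch below the create and revoke functions are injected so the class can be exercised without a browser; in a page they would be URL.createObjectURL and URL.revokeObjectURL. `ObjectUrlRegistry` and `UrlApi` are hypothetical names:

```typescript
// The two operations the registry needs from its environment.
interface UrlApi<T> {
  create(source: T): string;
  revoke(url: string): void;
}

// Tracks every URL it hands out so none can be forgotten.
class ObjectUrlRegistry<T> {
  private active = new Set<string>();
  private api: UrlApi<T>;

  constructor(api: UrlApi<T>) {
    this.api = api;
  }

  create(source: T): string {
    const url = this.api.create(source);
    this.active.add(url);
    return url;
  }

  // Revokes one URL; repeated calls for the same URL are no-ops.
  release(url: string): void {
    if (this.active.delete(url)) this.api.revoke(url);
  }

  // Revokes everything still outstanding, e.g. on component unmount.
  releaseAll(): void {
    this.active.forEach((url) => this.api.revoke(url));
    this.active.clear();
  }

  get size(): number {
    return this.active.size;
  }
}

// Fake environment for demonstration: records which URLs were revoked.
const revoked: string[] = [];
let counter = 0;
const registry = new ObjectUrlRegistry<string>({
  create: () => `blob:fake/${++counter}`,
  revoke: (url) => { revoked.push(url); },
});

const a = registry.create('image-a');
registry.create('image-b');
registry.release(a);
registry.releaseAll();
console.log(revoked.join(','), registry.size);
```

In a browser the registry would be constructed with { create: (b) => URL.createObjectURL(b), revoke: (u) => URL.revokeObjectURL(u) } and a Blob type parameter.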
Production Bundle
Action Checklist
- Initialize ONNX Runtime Web via dynamic import and verify cache behavior across sessions
- Offload inference to a Web Worker to prevent main thread jank on high-resolution inputs
- Implement coordinate ratio mapping for all pointer events before canvas operations
- Configure a circular undo buffer with a hard limit of 20 states to cap memory usage
- Apply ctx.filter: blur() dynamically based on brush softness to preserve edge feathering
- Attach touch listeners with passive: false and preventDefault() to stabilize mobile editing
- Revoke all URL.createObjectURL references immediately after image load or compositing
- Add error boundaries around getImageData calls to handle tainted canvas scenarios
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume SaaS product | Client-side WASM + medium model | Eliminates per-request API fees; capacity grows with the number of user devices | $0 compute cost after initial CDN delivery |
| Low-end mobile target | Server-side API or tiny model | Reduces memory pressure and inference time | Increases backend costs or reduces edge fidelity |
| Real-time video processing | WebGL compositing + WebGPU inference | Canvas 2D getImageData is too slow for 30fps | Requires specialized shader development |
| Enterprise compliance | Client-side pipeline | Zero data egress satisfies strict data residency rules | One-time ~40MB model download per client |
| Batch processing workflow | Sequential worker queue + IndexedDB | Prevents memory spikes and allows pause/resume | Adds storage complexity and queue management |
Configuration Template
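// segmentation.config.ts
export interface SegmentationConfig {
  modelVariant: 'tiny' | 'medium' | 'large';
  outputFormat: 'image/png' | 'image/webp';
  maxUndoStates: number;            // hard cap for the undo buffer
  enableTouchOptimization: boolean; // attach non-passive touch listeners
  workerOffload: boolean;           // run inference in a Web Worker
}

export const DEFAULT_CONFIG: SegmentationConfig = {
  modelVariant: 'medium',
  outputFormat: 'image/png',
  maxUndoStates: 20,
  enableTouchOptimization: true,
  workerOffload: true,
};

// Usage (in a separate module; assumes SegmentationRuntime is exported from ./runtime)
import { SegmentationRuntime } from './runtime';
import { DEFAULT_CONFIG } from './segmentation.config';

// The Step 1 class takes no constructor arguments, so the calling code reads
// the config and forwards values where they are needed.
const engine = new SegmentationRuntime();
await engine.initialize();
const { modelVariant, outputFormat } = DEFAULT_CONFIG; // pass to the inference options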
Quick Start Guide
- Initialize the runtime: Create a new TypeScript project and add @imgly/background-removal as a dependency. Import the runtime class and call initialize() on application mount.
- Wire the file input: Attach a change listener to an <input type="file">. Pass the selected File object to runInference(). Display a progress indicator using the progress callback.
- Build the editing surface: Once inference completes, call constructEditableMask() with the returned blob and original dimensions. Attach pointer event listeners to a dedicated canvas element and route them through MaskEditor.applyStroke().
- Export the result: When the user confirms edits, call composeFinalOutput() with the original and mask canvases. Convert the resulting canvas to a blob using canvas.toBlob() and trigger a download via a dynamically created <a> element.
