Client-Side Image Segmentation: Architecting Private AI Workflows in the Browser

Current Situation Analysis

The standard workflow for AI-powered background removal has remained unchanged for years: capture an image, upload it to a remote endpoint, wait for a cloud GPU cluster to run a segmentation model, and download the processed result. This paradigm introduces three compounding problems. First, data privacy is compromised. Sensitive or proprietary imagery traverses third-party infrastructure with no guarantee of deletion. Second, operational costs scale linearly with usage. Every request incurs compute, bandwidth, and storage fees. Third, latency becomes unpredictable. Network round-trips and queue times degrade user experience, especially on mobile networks.

Many development teams overlook client-side inference because they assume browser environments lack the computational density required for modern neural networks. This assumption is outdated. WebAssembly (WASM) and ONNX Runtime Web have matured to the point where segmentation models can run entirely in the browser, either on the main thread or in a dedicated worker. The trade-off is well-documented: a one-time download of approximately 40MB for model weights, which the browser then caches locally. Inference typically completes in 2–5 seconds on desktop hardware and 8–15 seconds on mobile devices. Memory consumption peaks around 144MB for a 4000×3000 pixel image when maintaining three full-resolution RGBA canvases (original, mask, output) at 48MB each. While this exceeds the comfort zone of low-end devices, it eliminates per-request server costs and keeps image data on the device.
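The 144MB figure follows from simple arithmetic: an RGBA canvas stores four bytes per pixel. A minimal sketch of the calculation, using a hypothetical helper name (rgbaBytes) that is not part of any library:

```typescript
// Hypothetical helper: bytes needed for one RGBA canvas backing store.
function rgbaBytes(width: number, height: number): number {
  const BYTES_PER_PIXEL = 4; // R, G, B, A at 8 bits each
  return width * height * BYTES_PER_PIXEL;
}

const perCanvas = rgbaBytes(4000, 3000); // 48,000,000 bytes, about 48MB
const peak = perCanvas * 3;              // original + mask + output, about 144MB

console.log(`${perCanvas / 1e6} MB per canvas, ${peak / 1e6} MB peak`);
```

Browsers may hold additional copies (decoded images, GPU textures), so this number is better read as a lower bound than as a measurement.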

The industry has normalized cloud-dependent AI because it abstracts complexity. However, for privacy-first applications, offline-capable tools, or cost-constrained products, shifting the inference boundary to the client is no longer experimental—it is a production-ready architecture.

WOW Moment: Key Findings

Moving segmentation logic to the browser fundamentally alters the cost, privacy, and scalability profile of image processing tools. The following comparison illustrates the architectural divergence between traditional server-rendered pipelines and modern client-side WASM implementations.

Approach	Data Privacy	Inference Latency (First Run)	Operational Cost	Scalability Limit	Memory Footprint
Server-Side API	Low (data leaves device)	1.5–4s (network + queue)	$0.01–$0.05 per request	Bounded by backend capacity	Negligible client-side
Client-Side WASM	High (zero egress)	2–5s desktop / 8–15s mobile	$0 (after initial 40MB download)	Bounded only by each user's hardware	~144MB peak (4000×3000 image)

This finding matters because it decouples product growth from infrastructure provisioning. A client-side architecture scales horizontally across user devices rather than vertically through cloud instances. It also enables offline functionality, reduces compliance overhead (GDPR/CCPA), and removes per-request pricing models that often dictate feature gating. The initial 40MB weight download is a one-time tax that the browser's HTTP cache neutralizes on subsequent visits, effectively turning every user's device into a private inference node.

Core Solution

Building a browser-native segmentation pipeline requires coordinating three distinct subsystems: model acquisition, interactive mask editing, and pixel compositing. The architecture below uses TypeScript, the Canvas 2D API, and ONNX Runtime Web to create a deterministic, privacy-preserving workflow.

Step 1: Model Acquisition & Runtime Initialization

The segmentation backbone relies on @imgly/background-removal, which bundles an ONNX model with the WebAssembly backend of ONNX Runtime Web. Dynamic imports allow the browser to fetch and cache the runtime and weights without blocking the initial render.

// Local structural types keep this sketch independent of the type names
// that a given version of the library exports.
type RemovalOptions = Record<string, unknown>;
type RemoveFn = (image: Blob, options?: RemovalOptions) => Promise<Blob>;

class SegmentationRuntime {
  private removeFn: RemoveFn | null = null;

  async initialize(): Promise<void> {
    if (this.removeFn) return;

    // Fetched on demand so the runtime stays out of the initial bundle.
    const module = await import('https://cdn.jsdelivr.net/npm/@imgly/background-removal@1.5.5/+esm');
    this.removeFn = module.removeBackground;
  }

  get isReady(): boolean {
    return this.removeFn !== null;
  }

  // Public entry point, so callers never touch the private function reference.
  async remove(image: Blob, options?: RemovalOptions): Promise<Blob> {
    if (!this.removeFn) throw new Error('Runtime not initialized');
    return this.removeFn(image, options);
  }
}

Architecture Rationale: Dynamic import() resolves a module once per page, so repeated calls return the already-loaded runtime. The first run downloads the WASM binary and model weights (~40MB); on later visits the browser's HTTP cache serves them. This avoids bundling heavy assets into the application payload and keeps the initial JS bundle under 50KB.

Step 2: Tensor Conversion & Local Inference

Once initialized, the runtime accepts a Blob or File. The library handles image resizing, pixel normalization, and tensor construction internally. The output is a transparent PNG blob containing the segmented foreground.

async function runInference(runtime: SegmentationRuntime, sourceFile: File): Promise<Blob> {
  if (!runtime.isReady) throw new Error('Runtime not initialized');

  return runtime.remove(sourceFile, {
    model: 'medium', // accepted model names depend on the library version
    output: { format: 'image/png' },
    progress: (stage: string, current: number, total: number) => {
      // Guard against total === 0 so the log never prints NaN.
      const percent = total > 0 ? Math.round((current / total) * 100) : 0;
      console.debug(`[${stage}] ${percent}%`);
    }
  });
}

Architecture Rationale: The medium model variant is a middle ground between edge fidelity and compute time. Progress callbacks let the UI report each stage as it advances. Returning a Blob instead of a base64 string avoids the roughly one-third size overhead of base64 encoding during the hand-off to the compositing stage.

Step 3: Alpha Channel Extraction & Mask Construction

AI segmentation outputs a probability map, not a perfect binary mask. To enable manual correction, we extract the alpha channel and construct a dedicated grayscale canvas. White pixels represent retained foreground; black pixels represent removed background.

function constructEditableMask(segmentedBlob: Blob, width: number, height: number): Promise<HTMLCanvasElement> {
  const maskCanvas = document.createElement('canvas');
  maskCanvas.width = width;
  maskCanvas.height = height;
  const ctx = maskCanvas.getContext('2d', { willReadFrequently: true })!;

  const img = new Image();
  const url = URL.createObjectURL(segmentedBlob);

  return new Promise((resolve, reject) => {
    img.onload = () => {
      // Scale to the requested size in case the decoded image differs from it.
      ctx.drawImage(img, 0, 0, width, height);
      const pixelData = ctx.getImageData(0, 0, width, height);

      // Copy each pixel's alpha into R, G, and B, then make the pixel opaque.
      for (let i = 3; i < pixelData.data.length; i += 4) {
        const alpha = pixelData.data[i];
        pixelData.data[i - 3] = alpha;
        pixelData.data[i - 2] = alpha;
        pixelData.data[i - 1] = alpha;
        pixelData.data[i] = 255;
      }

      ctx.putImageData(pixelData, 0, 0);
      URL.revokeObjectURL(url);
      resolve(maskCanvas);
    };
    img.onerror = () => {
      URL.revokeObjectURL(url);
      reject(new Error('Failed to decode segmented image'));
    };
    // Assign src only after both handlers are attached.
    img.src = url;
  });
}

Architecture Rationale: The willReadFrequently: true hint optimizes the canvas context for repeated getImageData calls. Converting alpha to a grayscale RGB triplet standardizes the mask for brush operations. Explicitly revoking object URLs prevents memory leaks during rapid file processing.

Step 4: Interactive Refinement Engine

Users must correct AI errors around hair, transparent objects, and color-similar backgrounds. A dedicated editing canvas handles brush and eraser strokes, mapping display coordinates to full-resolution mask coordinates.

interface BrushConfig {
  size: number;
  softness: number;
  mode: 'restore' | 'remove';
}

class MaskEditor {
  private canvas: HTMLCanvasElement;
  private ctx: CanvasRenderingContext2D;
  private lastPos: { x: number; y: number } | null = null;

  constructor(targetCanvas: HTMLCanvasElement) {
    this.canvas = targetCanvas;
    this.ctx = targetCanvas.getContext('2d', { willReadFrequently: true })!;
  }

  applyStroke(clientX: number, clientY: number, config: BrushConfig, displayRect: DOMRect) {
    const scaleX = this.canvas.width / displayRect.width;
    const scaleY = this.canvas.height / displayRect.height;
    const x = (clientX - displayRect.left) * scaleX;
    const y = (clientY - displayRect.top) * scaleY;

    this.ctx.lineCap = 'round';
    this.ctx.lineWidth = config.size;
    this.ctx.filter = config.softness > 0 ? `blur(${Math.round(config.size * config.softness * 0.3)}px)` : 'none';

    if (config.mode === 'restore') {
      this.ctx.globalCompositeOperation = 'lighter';
      this.ctx.strokeStyle = '#ffffff';
    } else {
      this.ctx.globalCompositeOperation = 'source-over';
      this.ctx.strokeStyle = '#000000';
    }

    if (this.lastPos) {
      this.ctx.beginPath();
      this.ctx.moveTo(this.lastPos.x, this.lastPos.y);
      this.ctx.lineTo(x, y);
      this.ctx.stroke();
    }
    this.lastPos = { x, y };
  }

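  // Call on pointerup or pointercancel so the next stroke starts fresh
  // instead of connecting to the end of the previous one.
  endStroke() {
    this.lastPos = null;
  }

  // Erases the entire mask, including the AI result; use only to start over.
  reset() {
    this.lastPos = null;
    this.ctx.clearRect(0, 0, this.canvas.width, this.canvas.height);
  }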
}

Architecture Rationale: Coordinate scaling ensures strokes align perfectly with the underlying image regardless of CSS viewport constraints. globalCompositeOperation: 'lighter' additive blending prevents hard edges when painting over semi-transparent regions. The blur filter on the canvas context creates feathered brush strokes without requiring complex gradient math.
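A stroke needs a clear beginning and end, otherwise the first segment of a new stroke would connect to the last point of the previous one. The sketch below shows one way to wire pointer events into a stroke lifecycle. The interfaces are structural stand-ins, and wireStrokes and endStroke are hypothetical names: it assumes the editor exposes some way to end a stroke.

```typescript
// Structural stand-ins so the sketch does not depend on DOM typings.
interface StrokeTarget {
  applyStroke(clientX: number, clientY: number): void;
  endStroke(): void; // hypothetical: forgets the last position between strokes
}
interface PointerLike { clientX: number; clientY: number; }
interface PointerSource {
  addEventListener(type: string, handler: (e: PointerLike) => void): void;
}

function wireStrokes(source: PointerSource, target: StrokeTarget): void {
  let drawing = false;

  source.addEventListener('pointerdown', (e) => {
    drawing = true;
    target.applyStroke(e.clientX, e.clientY);
  });

  source.addEventListener('pointermove', (e) => {
    // Ignore hover movement; only draw while the pointer is down.
    if (drawing) target.applyStroke(e.clientX, e.clientY);
  });

  const stop = () => {
    if (!drawing) return;
    drawing = false;
    target.endStroke();
  };
  source.addEventListener('pointerup', stop);
  source.addEventListener('pointercancel', stop);
}
```

With the MaskEditor above, the target's applyStroke would forward to editor.applyStroke(x, y, config, canvas.getBoundingClientRect()) so that the display rectangle is read fresh on every event.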

Step 5: Final Compositing & Export

The final step merges the original RGB data with the edited mask's alpha channel. This pixel-level operation produces the downloadable transparent PNG.

function composeFinalOutput(originalCanvas: HTMLCanvasElement, maskCanvas: HTMLCanvasElement): HTMLCanvasElement {
  const width = originalCanvas.width;
  const height = originalCanvas.height;
  const outputCanvas = document.createElement('canvas');
  outputCanvas.width = width;
  outputCanvas.height = height;
  const outCtx = outputCanvas.getContext('2d')!;

  const origData = originalCanvas.getContext('2d')!.getImageData(0, 0, width, height);
  const maskData = maskCanvas.getContext('2d')!.getImageData(0, 0, width, height);
  const outputData = outCtx.createImageData(width, height);

  for (let i = 0; i < origData.data.length; i += 4) {
    outputData.data[i] = origData.data[i];
    outputData.data[i + 1] = origData.data[i + 1];
    outputData.data[i + 2] = origData.data[i + 2];
    outputData.data[i + 3] = maskData.data[i];
  }

  outCtx.putImageData(outputData, 0, 0);
  return outputCanvas;
}

Architecture Rationale: Direct Uint8ClampedArray manipulation gives exact per-pixel control over alpha, which globalAlpha cannot provide because it applies a single opacity to an entire draw call. Mapping the mask's red channel to the output's alpha channel preserves semi-transparent values, which is critical for maintaining natural hair edges and motion blur.
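The same per-pixel mapping can be exercised without a canvas by working on raw RGBA arrays. This is a minimal sketch with a hypothetical helper name (applyMaskAlpha); it assumes both arrays describe images of identical dimensions.

```typescript
// Hypothetical helper: copy RGB from the original and take alpha from the
// mask's red channel. Both arrays are RGBA, four bytes per pixel.
function applyMaskAlpha(original: Uint8ClampedArray, mask: Uint8ClampedArray): Uint8ClampedArray {
  if (original.length !== mask.length) {
    throw new Error('Original and mask must have identical dimensions');
  }
  const out = new Uint8ClampedArray(original.length);
  for (let i = 0; i < original.length; i += 4) {
    out[i] = original[i];
    out[i + 1] = original[i + 1];
    out[i + 2] = original[i + 2];
    out[i + 3] = mask[i]; // gray value becomes alpha, with no thresholding
  }
  return out;
}

// Two pixels: one fully kept, one half-transparent edge pixel.
const original = new Uint8ClampedArray([200, 100, 50, 255, 10, 20, 30, 255]);
const mask = new Uint8ClampedArray([255, 255, 255, 255, 128, 128, 128, 255]);
const result = applyMaskAlpha(original, mask);
// result holds [200, 100, 50, 255, 10, 20, 30, 128]
```

The second pixel keeps its color and receives alpha 128, which is the behavior that preserves soft edges.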

Pitfall Guide

1. Coordinate System Drift

Explanation: CSS scales canvases to fit viewports, but getImageData and stroke operations require native pixel coordinates. Failing to map client coordinates to canvas resolution causes strokes to misalign with the image. Fix: Calculate scaleX = canvas.width / rect.width and scaleY = canvas.height / rect.height on every pointer event. Apply these ratios to clientX and clientY before drawing.
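The mapping can be isolated into a pure function, which makes it easy to check in isolation. A minimal sketch, where toCanvasCoords and RectLike are hypothetical names (a DOMRect from getBoundingClientRect() has the fields RectLike asks for):

```typescript
interface RectLike { left: number; top: number; width: number; height: number; }

// Hypothetical helper: convert viewport coordinates to canvas pixel coordinates.
function toCanvasCoords(
  clientX: number,
  clientY: number,
  rect: RectLike,
  canvasWidth: number,
  canvasHeight: number
): { x: number; y: number } {
  const scaleX = canvasWidth / rect.width;
  const scaleY = canvasHeight / rect.height;
  return {
    x: (clientX - rect.left) * scaleX,
    y: (clientY - rect.top) * scaleY,
  };
}

// A 4000×3000 canvas displayed at 800×600, offset 100px/50px from the viewport.
const point = toCanvasCoords(500, 350, { left: 100, top: 50, width: 800, height: 600 }, 4000, 3000);
// point is { x: 2000, y: 1500 }: the center of the display maps to the center of the canvas.
```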

2. Main Thread Blocking During Inference

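Explanation: ONNX Runtime Web exposes a promise-based API, but unless inference is proxied to a worker, the WASM computation runs on the calling thread. On the main thread, a large image can freeze the UI for several seconds and trigger browser "page unresponsive" warnings. Fix: Offload inference to a Web Worker. Use postMessage to transfer the Blob and listen for completion events. If workers are unavailable, yielding with await new Promise(r => setTimeout(r, 0)) cannot interrupt a single inference call; it only keeps the UI responsive between steps you control, such as chunks of a pixel loop.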
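Yielding helps only for loops you control, such as mask or compositing passes; a single inference call cannot be paused this way. The sketch below shows the yielding pattern on a pixel loop. processInChunks is a hypothetical helper, and the chunk size is an arbitrary illustration value.

```typescript
// Hypothetical helper: run `work` over [0, total) in slices, yielding to the
// event loop between slices so input and paint events can be handled.
async function processInChunks(
  total: number,
  chunkSize: number,
  work: (start: number, end: number) => void
): Promise<void> {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  for (let start = 0; start < total; start += chunkSize) {
    const end = Math.min(start + chunkSize, total);
    work(start, end);
    // Give the event loop a turn before the next slice.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}

// Example: invert a small mask in slices of four values.
const maskValues = new Uint8Array([0, 64, 128, 255, 10, 20, 30, 40, 50, 60]);
const done = processInChunks(maskValues.length, 4, (start, end) => {
  for (let i = start; i < end; i++) maskValues[i] = 255 - maskValues[i];
});
```

For real images the chunk would be far larger (for example, a few hundred rows at a time), since each yield adds scheduling overhead.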

3. Unbounded Undo History

Explanation: Storing full ImageData snapshots for every stroke quickly exhausts memory. A 4000×3000 RGBA snapshot consumes ~48MB, so ten strokes approach half a gigabyte. Fix: Implement a circular buffer with a maximum depth (e.g., 20 states). When the limit is reached, shift the oldest state out before pushing the new one. Because the mask is grayscale, store one byte per pixel instead of four: each snapshot drops to ~12MB, so 20 states cost ~240MB rather than ~960MB. Lower the depth further on memory-constrained devices.
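A bounded history can be sketched in a few lines. BoundedHistory and packMask are hypothetical names. This version uses a capped array rather than a true ring buffer, which is adequate at depths around 20.

```typescript
// Hypothetical bounded undo stack: the oldest state is dropped at capacity.
class BoundedHistory<T> {
  private states: T[] = [];
  private maxDepth: number;

  constructor(maxDepth: number) {
    if (maxDepth < 1) throw new RangeError('maxDepth must be at least 1');
    this.maxDepth = maxDepth;
  }

  push(state: T): void {
    if (this.states.length === this.maxDepth) this.states.shift(); // drop oldest
    this.states.push(state);
  }

  undo(): T | undefined {
    return this.states.pop();
  }

  get depth(): number {
    return this.states.length;
  }
}

// Keep one byte per pixel by copying only the red channel of an RGBA mask.
function packMask(rgba: Uint8ClampedArray): Uint8Array {
  const packed = new Uint8Array(rgba.length / 4);
  for (let p = 0; p < packed.length; p++) packed[p] = rgba[p * 4];
  return packed;
}

const history = new BoundedHistory<Uint8Array>(20);
history.push(packMask(new Uint8ClampedArray([10, 10, 10, 255, 200, 200, 200, 255])));
```

Restoring a snapshot is the reverse operation: write each packed value back into R, G, and B and set alpha to 255.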

4. Hard-Edged Brush Artifacts

Explanation: Standard canvas strokes produce sharp boundaries. When correcting AI masks, hard edges create visible halos or unnatural cutouts around subjects. Fix: Apply ctx.filter = 'blur(Xpx)' proportional to brush size and softness settings. Because ctx.filter is not implemented in every browser, feature-detect it and fall back to drawing radial gradients with decreasing alpha instead of solid lines.
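The gradient alternative amounts to an opacity falloff from the brush center to its edge. The function below is one possible falloff, not a standard formula: brushAlpha is a hypothetical name, and the linear fade is an assumption chosen for simplicity.

```typescript
// Hypothetical falloff: full opacity inside a hard core, then a linear fade
// to zero at the brush edge. softness is clamped to the range [0, 1].
function brushAlpha(distance: number, radius: number, softness: number): number {
  if (distance >= radius) return 0;
  const s = Math.min(Math.max(softness, 0), 1);
  const core = radius * (1 - s); // radius of the fully opaque center
  if (distance <= core) return 1;
  return (radius - distance) / (radius - core);
}

// A 10px brush at 50% softness: opaque up to 5px, half strength at 7.5px.
const center = brushAlpha(0, 10, 0.5);
const midFade = brushAlpha(7.5, 10, 0.5);
const edge = brushAlpha(10, 10, 0.5);
```

On a canvas, the same shape can be expressed with createRadialGradient by adding an opaque color stop at 1 - softness and a transparent stop at 1.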

5. Ignoring Alpha Channel Precision

Explanation: Treating the mask as strictly binary (0 or 255) discards valuable semi-transparent data. AI models output probability maps where gray values represent partial foreground confidence. Fix: Preserve 8-bit grayscale values during compositing. When applying the mask to the original image, map the mask's R channel directly to the output's A channel without thresholding.

6. Touch Event Scroll Interference

Explanation: Mobile browsers interpret touch movements as page scrolling. This interrupts brush strokes and causes jarring UI behavior. Fix: Attach touchstart and touchmove listeners with { passive: false }. Call event.preventDefault() inside the handler to disable default scrolling while editing. If the editor is driven by pointer events instead, set the CSS property touch-action: none on the editing canvas.
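The listener registration can be sketched as follows. The interfaces are structural stand-ins for the DOM types, and blockTouchScroll is a hypothetical name.

```typescript
// Structural stand-ins so the sketch does not depend on DOM typings.
interface TouchLike { preventDefault(): void; }
interface ListenerTarget {
  addEventListener(
    type: string,
    handler: (e: TouchLike) => void,
    options?: { passive: boolean }
  ): void;
}

// Hypothetical helper: stop the browser from scrolling while the user paints.
function blockTouchScroll(target: ListenerTarget): void {
  const handler = (e: TouchLike) => e.preventDefault();
  for (const type of ['touchstart', 'touchmove']) {
    // passive: false is required, otherwise preventDefault() is ignored.
    target.addEventListener(type, handler, { passive: false });
  }
}
```

Registering the guard only on the editing canvas, rather than on the document, leaves scrolling intact everywhere else on the page.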

7. Memory Leaks from Blob URLs

Explanation: Creating object URLs for every processed image without revoking them accumulates hidden memory references. Browsers do not garbage collect these URLs automatically. Fix: Always call URL.revokeObjectURL(url) after the image loads or after compositing completes. Track active URLs in a Set and clear them on component unmount or workflow completion.
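Tracking URLs in a Set can be wrapped in a small registry. In this sketch, ObjectUrlRegistry is a hypothetical name, and the create and revoke functions are injected so the class can be exercised outside a browser.

```typescript
// Hypothetical registry: remembers every URL it hands out so none is leaked.
class ObjectUrlRegistry<S> {
  private active = new Set<string>();
  private createUrl: (source: S) => string;
  private revokeUrl: (url: string) => void;

  constructor(createUrl: (source: S) => string, revokeUrl: (url: string) => void) {
    this.createUrl = createUrl;
    this.revokeUrl = revokeUrl;
  }

  create(source: S): string {
    const url = this.createUrl(source);
    this.active.add(url);
    return url;
  }

  release(url: string): void {
    // Revoke only URLs this registry created and has not yet released.
    if (this.active.delete(url)) this.revokeUrl(url);
  }

  releaseAll(): void {
    for (const url of this.active) this.revokeUrl(url);
    this.active.clear();
  }

  get size(): number {
    return this.active.size;
  }
}

// In a browser, the registry would be wired to the real functions:
//   new ObjectUrlRegistry<Blob>(
//     (blob) => URL.createObjectURL(blob),
//     (url) => URL.revokeObjectURL(url)
//   );
```

Calling releaseAll() on component unmount or workflow completion covers any URL that an early return or error path failed to release.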

Production Bundle

Action Checklist

Initialize ONNX Runtime Web via dynamic import and verify cache behavior across sessions
Offload inference to a Web Worker to prevent main thread jank on high-resolution inputs
Implement coordinate ratio mapping for all pointer events before canvas operations
Configure a circular undo buffer with a hard limit of 20 states to cap memory usage
Set ctx.filter to a blur() value scaled by brush softness to preserve edge feathering
Attach touch listeners with passive: false and preventDefault() to stabilize mobile editing
Revoke all URL.createObjectURL references immediately after image load or compositing
Add error boundaries around getImageData calls to handle tainted canvas scenarios

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
High-volume SaaS product	Client-side WASM + `medium` model	Eliminates per-request API fees; compute capacity grows with the user base	$0 compute cost after initial CDN delivery
Low-end mobile target	Server-side API or `tiny` model	Reduces memory pressure and inference time	Increases backend costs or reduces edge fidelity
Real-time video processing	WebGL compositing + WebGPU inference	Canvas 2D `getImageData` is too slow for 30fps	Requires specialized shader development
Enterprise compliance	Client-side pipeline	Zero data egress satisfies strict data residency rules	One-time ~40MB model download per client
Batch processing workflow	Sequential worker queue + IndexedDB	Prevents memory spikes and allows pause/resume	Adds storage complexity and queue management

Configuration Template

// segmentation.config.ts
export interface SegmentationConfig {
  modelVariant: 'tiny' | 'medium' | 'large';
  outputFormat: 'image/png' | 'image/webp';
  maxUndoStates: number;
  enableTouchOptimization: boolean;
  workerOffload: boolean;
}

export const DEFAULT_CONFIG: SegmentationConfig = {
  modelVariant: 'medium',
  outputFormat: 'image/png',
  maxUndoStates: 20,
  enableTouchOptimization: true,
  workerOffload: true,
};

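// Usage
import { SegmentationRuntime } from './runtime';
import { DEFAULT_CONFIG } from './segmentation.config';

// The Step 1 class takes no constructor arguments, so config values are read
// by the stages that use them (modelVariant for inference, maxUndoStates for
// the undo history) rather than passed to the runtime itself.
const engine = new SegmentationRuntime();
await engine.initialize();
console.debug('Runtime ready with config', DEFAULT_CONFIG);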

Quick Start Guide

Initialize the runtime: Create a new TypeScript project and add @imgly/background-removal as a dependency. Import the runtime class and call initialize() on application mount.
Wire the file input: Attach a change listener to an <input type="file">. Pass the selected File object to runInference(). Display a progress indicator using the progress callback.
Build the editing surface: Once inference completes, await constructEditableMask() with the returned blob and original dimensions. Attach pointer event listeners to a dedicated canvas element and route them through MaskEditor.applyStroke().
Export the result: When the user confirms edits, call composeFinalOutput() with the original and mask canvases. Convert the resulting canvas to a blob using canvas.toBlob() and trigger a download via a dynamically created <a> element.
