Edge-First Vision Pipelines: Architecting Reliable AI for Low-Connectivity Field Operations

Current Situation Analysis

Field operations in construction, logistics, agriculture, and emergency response share a common infrastructure reality: cellular coverage is inconsistent, and dead zones are the norm rather than the exception. Despite this, the default architecture for computer vision applications remains cloud-dependent. Engineering teams routinely integrate third-party vision APIs because they reduce initial development time, but this approach introduces three critical failure modes in production environments.

First, latency becomes non-deterministic. A cloud inference request requires an image upload, server-side processing, and response delivery, and each leg varies with signal strength. Even on a strong LTE connection, round-trip times frequently exceed 400 ms. In safety-critical workflows where inspectors need real-time feedback on hazards like exposed rebar, missing PPE, or structural anomalies, this delay breaks the inspection loop.

Second, recurring API costs scale linearly with usage. At $0.001–$0.003 per image, a single inspector capturing 200 photos daily across a mid-size commercial build generates $0.20–$0.60 per day, or roughly $6–$18 per month, in inference costs. Across a fleet of 50 inspectors, that becomes $300–$900 per month, and the bill grows with every added device and every increase in capture frequency. These costs are often underestimated during prototyping but become a hard constraint at scale.
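The per-image arithmetic is easy to check with a small estimator. This is a minimal sketch: `monthlyInferenceCost` is a hypothetical helper, and it assumes photos are captured on all 30 days of the month.

```typescript
// Hypothetical helper: monthly cloud inference spend for a fleet of inspectors.
function monthlyInferenceCost(
  inspectors: number,
  photosPerDay: number,
  pricePerImage: number,
  workingDays = 30, // assumption: captures happen every day of a 30-day month
): number {
  return inspectors * photosPerDay * workingDays * pricePerImage;
}

const perInspectorLow = monthlyInferenceCost(1, 200, 0.001);
const perInspectorHigh = monthlyInferenceCost(1, 200, 0.003);
const fleetLow = monthlyInferenceCost(50, 200, 0.001);
const fleetHigh = monthlyInferenceCost(50, 200, 0.003);

// prints "6.00 18.00 300.00 900.00"
console.log(
  perInspectorLow.toFixed(2),
  perInspectorHigh.toFixed(2),
  fleetLow.toFixed(2),
  fleetHigh.toFixed(2),
);
```

Because every factor enters multiplicatively, doubling capture frequency or fleet size doubles the bill.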

Third, data sovereignty and liability exposure increase significantly. Construction sites contain proprietary layouts, subcontractor workflows, and sometimes restricted areas. Routing imagery through third-party cloud endpoints creates compliance risks and complicates audit trails. When a safety tooling vendor cannot guarantee where footage is stored or how long it's retained, procurement teams frequently block deployment.

The misconception driving this pattern is that cloud AI is inherently more accurate and easier to maintain. In reality, modern on-device inference engines have largely closed the accuracy gap for domain-specific tasks while delivering deterministic performance, zero recurring inference costs, and complete data residency control. The engineering challenge shifts from API integration to model optimization, pipeline orchestration, and offline-first data architecture.

WOW Moment: Key Findings

The architectural trade-offs between cloud-dependent and edge-first vision systems become stark when measured against operational requirements rather than prototyping convenience.

Architecture	Inference Latency	Inference Cost (10k images/month)	Offline Resilience	Data Residency
Cloud Vision API	400–800 ms	$10–$30	None	Third-party
On-Device Pipeline	30–80 ms	$0	Full	Local
Hybrid Fallback	50–600 ms	$5–$15	Partial	Mixed

This comparison reveals why edge-first design is not merely a cost-saving tactic but a reliability requirement for field operations. On-device inference eliminates network jitter, keeps frame processing rates far more consistent, and lets the application behave identically in a basement utility room and on a rooftop. The $0 inference cost removes usage-based pricing anxiety, allowing teams to capture imagery as often as safety protocols actually demand. Keeping data on the device simplifies compliance with strict frameworks without requiring complex data processing agreements.

The finding that matters most: determinism outweighs marginal accuracy gains in safety workflows. A cloud API might achieve 2% higher mAP on benchmark datasets, but if it fails to return results during a connectivity drop, the safety gap becomes unacceptable. Edge pipelines trade theoretical peak accuracy for guaranteed availability, which aligns with operational risk tolerance.

Core Solution

Building a production-ready on-device vision pipeline requires coordinated decisions across model selection, inference orchestration, data persistence, and synchronization. The architecture below demonstrates a two-stage detection-classification workflow optimized for mobile constraints.

Step 1: Model Selection and Conversion

YOLOv8s serves as the detection backbone. The small variant balances mean average precision (mAP) with inference speed, weighing roughly 22 MB when exported to CoreML at FP16 precision. The nano variant runs faster but struggles with small objects at distance, which is unacceptable for identifying PPE compliance or structural hazards. MobileNetV3 handles the classification stage, refining detection regions to distinguish between contextual states (e.g., "hard hat worn correctly" vs. "hard hat held in hand").

Conversion requires precise preprocessing alignment. Cloud APIs often apply automatic normalization, but on-device engines expect explicit tensor transformations.

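# model_export.py
import coremltools as ct
import torch
import torchvision
from ultralytics import YOLO

def export_detection_model(model_path: str, output_path: str, imgsz: int = 640) -> None:
    yolo_model = YOLO(model_path)
    # Export to TorchScript first for controlled conversion; export() returns the saved file path
    torchscript_path = yolo_model.export(format="torchscript", imgsz=imgsz)
    traced = torch.jit.load(torchscript_path)

    # YOLOv8 is trained on RGB pixels scaled to [0, 1]; bake that scaling into the input
    mlmodel = ct.convert(
        traced,
        inputs=[ct.ImageType(name="input_image", shape=(1, 3, imgsz, imgsz),
                             scale=1 / 255.0, color_layout=ct.colorlayout.RGB)],
        compute_precision=ct.precision.FLOAT16,
        minimum_deployment_target=ct.target.iOS15,
    )
    mlmodel.save(output_path)  # iOS15+ targets produce an ML Program, saved as .mlpackage
    print(f"Detection model exported to {output_path}")

def export_classification_model(weights_path: str, num_classes: int, output_path: str) -> None:
    # Assumes a torchvision MobileNetV3-Small fine-tuned on field data and saved as a state_dict
    model = torchvision.models.mobilenet_v3_small(num_classes=num_classes)
    model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    model.eval()
    example = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example)

    # ImageType supports one scale plus a per-channel bias, so ImageNet mean/std
    # normalization is approximated with a single averaged std
    mean, std = (0.485, 0.456, 0.406), 0.226
    mlmodel = ct.convert(
        traced,
        inputs=[ct.ImageType(name="roi_image", shape=(1, 3, 224, 224),
                             scale=1 / (255.0 * std), bias=[-m / std for m in mean],
                             color_layout=ct.colorlayout.RGB)],
        compute_precision=ct.precision.FLOAT16,
        minimum_deployment_target=ct.target.iOS15,
    )
    mlmodel.save(output_path)
    print(f"Classification model exported to {output_path}")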
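When preprocessing is not baked into the CoreML input description (for example, a model that takes a raw multi-array input), the app has to build the tensor itself. The sketch below assumes the camera delivers packed 8-bit RGB; `toChwTensor` is a hypothetical helper, and a BGRA camera format would need its channels reordered first. Running it in front of a model that already scales its input would apply the scaling twice.

```typescript
// Hypothetical helper: convert a packed RGB (HWC, uint8) frame into the planar
// (CHW, float32) layout most detectors expect, scaled to [0, 1] and optionally
// normalized per channel with mean/std.
function toChwTensor(
  rgb: Uint8Array,
  width: number,
  height: number,
  mean: [number, number, number] = [0, 0, 0],
  std: [number, number, number] = [1, 1, 1],
): Float32Array {
  const plane = width * height;
  if (rgb.length !== plane * 3) {
    throw new Error(`Expected ${plane * 3} bytes, got ${rgb.length}`);
  }
  const out = new Float32Array(plane * 3);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const value = rgb[i * 3 + c] / 255;              // scale to [0, 1]
      out[c * plane + i] = (value - mean[c]) / std[c]; // channel-first layout
    }
  }
  return out;
}

// 1x2 frame: a red pixel followed by a blue pixel
const tensor = toChwTensor(new Uint8Array([255, 0, 0, 0, 0, 255]), 2, 1);
console.log(Array.from(tensor)); // values: 1, 0, 0, 0, 0, 1
```

Validating the result against the training pipeline's tensor for the same test image catches channel-order and scaling mistakes before they turn into silent accuracy loss.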

Step 2: Two-Stage Pipeline Orchestration

The inference pipeline chains detection and classification sequentially. Detection runs on the full frame, returning bounding boxes and confidence scores. Regions of interest (ROIs) are cropped, normalized, and passed to the classifier. This separation reduces compute load by avoiding classification on the entire image.

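// visionPipeline.ts
import { CoreMLModel } from './coremlBridge';
import { DetectionResult, ClassificationResult } from './types';

interface PipelineConfig {
  detectionThreshold: number;
  classificationThreshold: number;
  maxConcurrentInferences: number;
}

// A detection enriched with the second-stage classifier output
export type ClassifiedDetection = DetectionResult & {
  classification: string;
  classificationConfidence: number;
};

export class FieldVisionPipeline {
  private detector: CoreMLModel;
  private classifier: CoreMLModel;
  private config: PipelineConfig;

  constructor(detectorModel: CoreMLModel, classifierModel: CoreMLModel, config: PipelineConfig) {
    this.detector = detectorModel;
    this.classifier = classifierModel;
    this.config = config;
  }

  // frameBuffer is packed RGB, 3 bytes per pixel, row-major
  async processFrame(frameBuffer: Uint8Array, width: number, height: number): Promise<ClassifiedDetection[]> {
    // Stage 1: detect on the full frame
    const detections = await this.detector.predict(frameBuffer, width, height);
    const filtered = detections.filter(d => d.confidence >= this.config.detectionThreshold);

    // Stage 2: classify ROIs in chunks so at most maxConcurrentInferences run at once
    const limit = Math.max(1, this.config.maxConcurrentInferences);
    const classified: ClassifiedDetection[] = [];
    for (let i = 0; i < filtered.length; i += limit) {
      const chunk = filtered.slice(i, i + limit);
      const results = await Promise.all(chunk.map(async (det): Promise<ClassifiedDetection> => {
        const { roi, roiWidth, roiHeight } = this.extractROI(frameBuffer, width, height, det.bbox);
        if (roiWidth === 0 || roiHeight === 0) {
          // Box lies entirely outside the frame; confidence 0 drops it in the final filter
          return { ...det, classification: 'unknown', classificationConfidence: 0 };
        }
        const cls = await this.classifier.predict(roi, roiWidth, roiHeight);
        return { ...det, classification: cls.label, classificationConfidence: cls.confidence };
      }));
      classified.push(...results);
    }

    return classified.filter(c => c.classificationConfidence >= this.config.classificationThreshold);
  }

  // Crops bbox = [x, y, width, height] (pixels, top-left origin) from a packed RGB buffer,
  // clamped to the frame so partially visible boxes never read past the buffer
  private extractROI(buffer: Uint8Array, w: number, h: number, bbox: number[]) {
    const x0 = Math.max(0, Math.floor(bbox[0]));
    const y0 = Math.max(0, Math.floor(bbox[1]));
    const x1 = Math.min(w, Math.ceil(bbox[0] + bbox[2]));
    const y1 = Math.min(h, Math.ceil(bbox[1] + bbox[3]));
    const roiWidth = Math.max(0, x1 - x0);
    const roiHeight = Math.max(0, y1 - y0);
    const roi = new Uint8Array(roiWidth * roiHeight * 3);
    for (let row = 0; row < roiHeight; row++) {
      const srcStart = ((y0 + row) * w + x0) * 3;
      // subarray() creates a view, so each row is copied once by set()
      roi.set(buffer.subarray(srcStart, srcStart + roiWidth * 3), row * roiWidth * 3);
    }
    return { roi, roiWidth, roiHeight };
  }
}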
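To enforce `maxConcurrentInferences` while keeping every slot busy, the per-ROI classifier calls can go through a small worker pool. `mapWithConcurrency` is a hypothetical helper sketched in plain TypeScript; it never has more than `limit` mapper calls in flight and returns results in input order.

```typescript
// Hypothetical helper: run an async mapper over items with at most `limit`
// calls in flight, preserving input order in the result.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  mapper: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two workers share an index
      results[index] = await mapper(items[index], index);
    }
  };

  const workerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}
```

Inside `processFrame`, the classification step could become `mapWithConcurrency(filtered, this.config.maxConcurrentInferences, async (det) => ...)`. Whether two concurrent CoreML calls are faster than one depends on the native bridge; the bound mainly keeps a crowded frame from queuing dozens of requests at once.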

Step 3: Offline-First Data Architecture

All inspection artifacts must persist locally before any network interaction. Drizzle ORM over SQLite provides type-safe relational storage on top of SQLite's ACID transactions. The schema captures imagery references, geolocation, detection results, and inspector notes, so a capture and its detections can be committed in a single atomic write.

// db/schema.ts
import { sqliteTable, text, integer, real } from 'drizzle-orm/sqlite-core';

export const inspectionSessions = sqliteTable('inspection_sessions', {
  id: text('id').primaryKey(),
  siteId: text('site_id').notNull(),
  inspectorId: text('inspector_id').notNull(),
  startedAt: integer('started_at', { mode: 'timestamp' }).notNull(),
  completedAt: integer('completed_at', { mode: 'timestamp' }),
  notes: text('notes'), // free-form inspector notes
  syncStatus: text('sync_status').default('pending').notNull()
});

export const hazardDetections = sqliteTable('hazard_detections', {
  id: text('id').primaryKey(),
  sessionId: text('session_id').references(() => inspectionSessions.id).notNull(),
  frameIndex: integer('frame_index').notNull(),
  imageUri: text('image_uri'), // local file path of the captured frame
  latitude: real('latitude'),
  longitude: real('longitude'),
  bbox: text('bbox').notNull(), // JSON-stringified [x, y, width, height]
  classification: text('classification').notNull(),
  confidence: real('confidence').notNull(),
  capturedAt: integer('captured_at', { mode: 'timestamp' }).notNull(),
  syncStatus: text('sync_status').default('pending').notNull()
});
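Mapping pipeline output onto rows is where the JSON-stringified `bbox` and the default `pending` status come into play. The sketch below is a hypothetical helper with locally defined shapes that mirror the pipeline output and the core `hazard_detections` columns; it takes the ID generator as a parameter so it stays pure and testable. Storing the classifier's score in `confidence` is an assumption about which score the column is meant to hold.

```typescript
// Hypothetical shapes mirroring the pipeline output and the hazard_detections columns
interface PipelineDetection {
  bbox: [number, number, number, number];
  classification: string;
  classificationConfidence: number;
}

interface HazardRow {
  id: string;
  sessionId: string;
  frameIndex: number;
  bbox: string;
  classification: string;
  confidence: number;
  capturedAt: Date;
  syncStatus: 'pending' | 'synced';
}

function toHazardRows(
  sessionId: string,
  frameIndex: number,
  detections: readonly PipelineDetection[],
  capturedAt: Date,
  newId: () => string, // e.g. a UUID generator, injected so the helper stays pure
): HazardRow[] {
  return detections.map((d): HazardRow => ({
    id: newId(),
    sessionId,
    frameIndex,
    bbox: JSON.stringify(d.bbox), // the schema stores bbox as JSON text
    classification: d.classification,
    confidence: d.classificationConfidence,
    capturedAt,
    syncStatus: 'pending', // every new record starts unsynced
  }));
}
```

The resulting rows could then be passed to Drizzle's `db.insert(hazardDetections).values(rows)`, ideally inside the same transaction as the session update.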

Step 4: Synchronization Engine

Connectivity is treated as an optional enhancement, not a prerequisite. A background sync queue batches pending records, handles retry logic with exponential backoff, and resolves conflicts using last-write-wins with audit trails. Supabase serves as the replication target, but the sync engine remains provider-agnostic.

// sync/FieldSyncEngine.ts
import { db } from '../db/connection';
import { hazardDetections, inspectionSessions } from '../db/schema';
import { eq, inArray } from 'drizzle-orm';

interface SyncConfig {
  batchSize: number;
  maxRetries: number;
  backoffMultiplier: number;
  retryInterval: number; // base retry delay in ms
  endpoint: string;
}

export class FieldSyncEngine {
  private config: SyncConfig;
  private isRunning: boolean = false;
  private attempt: number = 0;
  private retryTimer: ReturnType<typeof setTimeout> | null = null;

  constructor(config: SyncConfig) {
    this.config = config;
  }

  async executeSync(): Promise<void> {
    if (this.isRunning) return;
    this.isRunning = true;

    try {
      const pendingSessions = await db.select().from(inspectionSessions)
        .where(eq(inspectionSessions.syncStatus, 'pending')).limit(this.config.batchSize);
      const pendingHazards = await db.select().from(hazardDetections)
        .where(eq(hazardDetections.syncStatus, 'pending')).limit(this.config.batchSize);

      if (pendingSessions.length === 0 && pendingHazards.length === 0) {
        this.attempt = 0;
        return;
      }

      const payload = { sessions: pendingSessions, hazards: pendingHazards };
      const response = await fetch(this.config.endpoint, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });

      if (!response.ok) {
        throw new Error(`Sync failed: ${response.status}`);
      }
      await this.markSynced(pendingSessions.map(s => s.id), pendingHazards.map(h => h.id));
      this.attempt = 0; // a successful push resets the backoff
    } catch (error) {
      console.warn('Sync attempt failed', error);
      this.scheduleRetry();
    } finally {
      this.isRunning = false;
    }
  }

  // Marks every record in the pushed batch as synced
  private async markSynced(sessionIds: string[], hazardIds: string[]): Promise<void> {
    if (sessionIds.length > 0) {
      await db.update(inspectionSessions).set({ syncStatus: 'synced' }).where(inArray(inspectionSessions.id, sessionIds));
    }
    if (hazardIds.length > 0) {
      await db.update(hazardDetections).set({ syncStatus: 'synced' }).where(inArray(hazardDetections.id, hazardIds));
    }
  }

  // Exponential backoff: retryInterval * backoffMultiplier^attempt, giving up after maxRetries
  private scheduleRetry(): void {
    if (this.attempt >= this.config.maxRetries) {
      console.warn('Max sync retries reached; waiting for the next sync trigger');
      this.attempt = 0;
      return;
    }
    const delay = this.config.retryInterval * Math.pow(this.config.backoffMultiplier, this.attempt);
    this.attempt += 1;
    if (this.retryTimer !== null) clearTimeout(this.retryTimer);
    this.retryTimer = setTimeout(() => {
      this.retryTimer = null;
      void this.executeSync();
    }, delay);
  }
}
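The delays implied by settings like `retryInterval: 5000` and `backoffMultiplier: 1.5` are worth tabulating before shipping, because they decide how long a device waits after it regains coverage. `backoffDelays` is a hypothetical helper; the optional `maxDelay` cap and the jitter hook are assumptions added for illustration and are not part of the original configuration.

```typescript
// Hypothetical helper: the retry delays implied by SYNC_CONFIG-style settings.
interface BackoffSettings {
  retryInterval: number;     // base delay in ms
  backoffMultiplier: number; // growth factor per attempt
  maxRetries: number;
  maxDelay?: number;         // assumption: optional cap, not in the original config
}

// Pass Math.random as `random` for "full jitter"; the default of 1 disables jitter.
function backoffDelays(s: BackoffSettings, random: () => number = () => 1): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < s.maxRetries; attempt++) {
    const exponential = s.retryInterval * Math.pow(s.backoffMultiplier, attempt);
    const capped = Math.min(exponential, s.maxDelay ?? Infinity);
    delays.push(Math.round(capped * random()));
  }
  return delays;
}

console.log(backoffDelays({ retryInterval: 5000, backoffMultiplier: 1.5, maxRetries: 5 }));
// delays: 5000, 7500, 11250, 16875, 25313
```

Jitter matters when many devices regain coverage at the same moment, for example when a crew walks out of a basement together; randomized delays keep them from retrying in lockstep.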

Architecture Rationale

The two-stage pipeline reduces compute overhead by running the classifier only on cropped regions instead of the full frame. YOLOv8s handles spatial localization efficiently, while MobileNetV3 provides lightweight semantic refinement. Keeping the total model bundle under 50 MB leaves wide headroom below the App Store's cellular download threshold (200 MB since 2019), removing friction during field deployment. Drizzle ORM's typed schema, combined with versioned migrations, keeps application code and the on-device database in step during long offline periods. The sync queue runs asynchronously in the background, so inspection workflows never block on network availability.

Pitfall Guide

1. Preprocessing Mismatch Between Cloud and Edge

Cloud vision APIs often apply implicit normalization, padding, and resizing. On-device engines require explicit tensor transformations. Feeding raw camera buffers without matching the training pipeline causes silent accuracy degradation. Fix: Replicate exact preprocessing steps: mean/std normalization, letterbox padding, and channel ordering. Validate with a known test image across both environments.
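Letterboxing is a common source of silent mismatch: YOLO-style models receive a square input with the frame scaled to fit and centered with padding, so predicted boxes must be mapped back to frame pixels before cropping. The helpers below are a hypothetical sketch; they assume model boxes arrive as `[cx, cy, w, h]` in model-input pixels and return `[x, y, width, height]` in frame pixels, and their rounding may differ by a pixel from an exporter's own letterbox implementation.

```typescript
// Hypothetical letterbox geometry for a square model input (YOLO-style centered padding)
interface Letterbox {
  scale: number; // frame pixels -> model pixels
  padX: number;  // left padding in model pixels
  padY: number;  // top padding in model pixels
}

function letterbox(frameW: number, frameH: number, inputSize = 640): Letterbox {
  const scale = Math.min(inputSize / frameW, inputSize / frameH);
  const padX = (inputSize - Math.round(frameW * scale)) / 2;
  const padY = (inputSize - Math.round(frameH * scale)) / 2;
  return { scale, padX, padY };
}

// Maps a model-space box [cx, cy, w, h] back to frame pixels [x, y, w, h] (top-left origin)
function unletterboxBox(
  box: [number, number, number, number],
  lb: Letterbox,
): [number, number, number, number] {
  const [cx, cy, w, h] = box;
  const fw = w / lb.scale;
  const fh = h / lb.scale;
  const fx = (cx - lb.padX) / lb.scale - fw / 2;
  const fy = (cy - lb.padY) / lb.scale - fh / 2;
  return [fx, fy, fw, fh];
}

const lb = letterbox(1920, 1080); // scale 1/3, padX 0, padY 140
console.log(unletterboxBox([320, 320, 60, 30], lb)); // ≈ [870, 495, 180, 90]
```

Running the same known test image through both the training-side and device-side versions of these steps, and comparing the boxes, is a quick way to confirm the geometry matches.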

2. Hardcoding COCO Default Thresholds

Confidence cutoffs carried over from benchmark defaults don't translate to domain-specific imagery. Construction sites contain clutter, variable lighting, and occluded objects, and on such scenes default thresholds produce excessive false negatives. Fix: Calibrate thresholds using precision-recall curves on field-captured data. Set detection thresholds in the 0.3–0.4 range and rely on the classification stage to filter the additional noise.
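Calibrating against field data can be as simple as sweeping the scored predictions from a labeled validation set. The sketch below is a hypothetical helper that assumes each prediction has already been matched to ground truth (for example, at IoU ≥ 0.5); it returns the operating point with the highest recall whose precision still meets a chosen floor.

```typescript
// Hypothetical calibration helper over matched predictions from field-captured data
interface ScoredPrediction {
  confidence: number;
  isTruePositive: boolean; // matched a labeled hazard
}

interface OperatingPoint {
  threshold: number;
  precision: number;
  recall: number;
}

function calibrateThreshold(
  predictions: readonly ScoredPrediction[],
  totalGroundTruth: number,
  minPrecision: number,
): OperatingPoint | null {
  const sorted = [...predictions].sort((a, b) => b.confidence - a.confidence);
  let tp = 0;
  let best: OperatingPoint | null = null;
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].isTruePositive) tp++;
    // Evaluate only at the last prediction sharing this confidence value
    if (i + 1 < sorted.length && sorted[i + 1].confidence === sorted[i].confidence) continue;
    const precision = tp / (i + 1);
    const recall = totalGroundTruth > 0 ? tp / totalGroundTruth : 0;
    if (precision >= minPrecision && (best === null || recall > best.recall)) {
      best = { threshold: sorted[i].confidence, precision, recall };
    }
  }
  return best; // null when no threshold reaches the precision floor
}
```

Because the classifier filters downstream, the detection stage can tolerate a lower precision floor than the classifier itself.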

3. Bolting On Offline Sync After UI Development

Adding synchronization logic to an already-built interface creates race conditions, lost writes, and inconsistent state. The UI assumes network availability, causing crashes when connectivity drops. Fix: Design the sync queue and conflict resolution strategy before implementing any inspection screens. Treat network calls as optional background tasks.

4. Over-Quantizing to INT8 Across All Layers

INT8 quantization reduces model size and improves speed but degrades accuracy on small object detection. Applying it uniformly to detection heads and classification layers causes missed hazards. Fix: Use INT8 only for non-critical backbone layers. Keep detection heads and classification outputs in FP16. Validate mAP drop before deployment.
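One intuition for the accuracy loss: with symmetric per-tensor INT8 quantization, a single scale is set by the largest magnitude in the tensor, so values far smaller than that scale round to zero. The sketch below is an illustration in plain TypeScript, not how coremltools implements quantization; production toolchains typically use finer-grained (for example per-channel) scales.

```typescript
// Illustration only: symmetric per-tensor INT8 quantization.
function quantizeInt8(values: readonly number[]): { q: Int8Array; scale: number } {
  const maxAbs = Math.max(...values.map(Math.abs));
  const scale = maxAbs > 0 ? maxAbs / 127 : 1; // one scale for the whole tensor
  const q = new Int8Array(values.length);
  values.forEach((v, i) => {
    q[i] = Math.max(-127, Math.min(127, Math.round(v / scale)));
  });
  return { q, scale };
}

function dequantize(q: Int8Array, scale: number): number[] {
  return Array.from(q, (x) => x * scale);
}

const { q, scale } = quantizeInt8([12.7, -6.3, 0.04]);
console.log(Array.from(q), dequantize(q, scale)); // 0.04 rounds to 0 and cannot be recovered
```

Keeping the layers whose small activations carry the signal in FP16 avoids this rounding where it hurts most.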

5. Ignoring Thermal Throttling on Mobile SoCs

Sustained inference pushes mobile processors into thermal throttling, causing frame drops and inconsistent latency. Developers often benchmark on cold devices and miss real-world degradation. Fix: Implement duty cycling, skip frames during high CPU load, and leverage Metal Performance Shaders for hardware acceleration. Apps cannot read device temperature directly, so monitor the OS-reported thermal state (ProcessInfo.thermalState on iOS) and throttle inference frequency accordingly.
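A duty-cycling gate can combine the frame-skip and thermal-delay settings from the configuration template. This is a hypothetical sketch: it assumes a native module reports the iOS thermal state as a string and CPU load as a 0–1 fraction, since JavaScript cannot read either on its own.

```typescript
// Mirrors the cases of iOS ProcessInfo.ThermalState
type ThermalState = 'nominal' | 'fair' | 'serious' | 'critical';

interface GateConfig {
  targetFps: number;            // e.g. 15
  frameSkipThreshold: number;   // skip frames while CPU load exceeds this fraction
  thermalThrottleDelay: number; // extra ms between frames in the 'serious' state
}

class ThermalFrameGate {
  private readonly config: GateConfig;
  private lastRunMs = -Infinity;

  constructor(config: GateConfig) {
    this.config = config;
  }

  // Returns true if the frame captured at nowMs should go to the pipeline.
  // 'fair' is treated like 'nominal'; 'critical' pauses inference entirely.
  shouldProcess(nowMs: number, thermal: ThermalState, cpuLoad: number): boolean {
    if (thermal === 'critical') return false;
    if (cpuLoad > this.config.frameSkipThreshold) return false;
    let intervalMs = 1000 / this.config.targetFps;
    if (thermal === 'serious') intervalMs += this.config.thermalThrottleDelay;
    if (nowMs - this.lastRunMs < intervalMs) return false;
    this.lastRunMs = nowMs;
    return true;
  }
}
```

Calling `shouldProcess` from the camera frame callback and dropping frames that return `false` keeps the pipeline's duty cycle bounded without pausing the camera preview.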

6. Sync Conflict Blindness

Multiple inspectors capturing overlapping hazards or editing the same session record creates data collisions. Without conflict resolution, the last write silently overwrites critical safety notes. Fix: Implement operational transforms or append-only audit logs. Tag records with inspector IDs and timestamps. Resolve conflicts using last-write-wins with manual review flags for safety-critical fields.
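A last-write-wins merge that still surfaces safety-critical disagreements might look like the following. This is a hypothetical sketch with its own record shape; it assumes each edit carries an `updatedAt` timestamp and the editing `inspectorId`, and it relies on device clocks being roughly in sync, which field devices do not guarantee.

```typescript
// Hypothetical record shape used only for this sketch
interface VersionedRecord {
  id: string;
  inspectorId: string;
  updatedAt: number; // ms since epoch, stamped by the device that made the edit
  fields: Record<string, string>;
}

interface MergeResult {
  merged: VersionedRecord;
  discarded: VersionedRecord; // append to the audit log rather than dropping it
  needsReview: string[];      // safety-critical fields where the discarded version differed
}

// Last-write-wins per record, with a deterministic tie-break on inspectorId so
// every device picks the same winner; safety-critical disagreements are flagged.
function mergeLww(
  local: VersionedRecord,
  remote: VersionedRecord,
  safetyCriticalFields: ReadonlySet<string>,
): MergeResult {
  const localWins =
    local.updatedAt > remote.updatedAt ||
    (local.updatedAt === remote.updatedAt && local.inspectorId > remote.inspectorId);
  const winner = localWins ? local : remote;
  const loser = localWins ? remote : local;
  const needsReview = Array.from(safetyCriticalFields).filter(
    (field) => field in loser.fields && loser.fields[field] !== winner.fields[field],
  );
  return { merged: { ...winner, fields: { ...winner.fields } }, discarded: loser, needsReview };
}
```

Non-critical fields follow plain last-write-wins, while any entry in `needsReview` can raise the manual review flag before the merged record is treated as final.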

7. Exceeding Cellular Bundle Limits

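App Store downloads larger than Apple's cellular threshold (raised to 200 MB in 2019) prompt the user before downloading over cellular by default. Shipping unoptimized models, uncompressed assets, or redundant dependencies pushes the app toward that limit, and the prompt increases abandonment on sites without Wi-Fi; a 50 MB model budget leaves headroom for everything else. Fix: Apply aggressive layer pruning, remove unused model branches, compress textures, and use on-demand resource loading. Check per-device download sizes in the App Thinning Size Report that Xcode generates when exporting an archive, or in App Store Connect.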

Production Bundle

Action Checklist

Export YOLOv8s and MobileNetV3 using explicit preprocessing alignment
Implement two-stage pipeline with ROI extraction and classification filtering
Define Drizzle schema with sync status flags on all inspection tables
Build background sync queue with exponential backoff and retry limits
Calibrate confidence thresholds using field-captured precision-recall data
Apply FP16 to detection heads, INT8 to non-critical backbone layers
Verify the model bundle stays within the 50 MB budget and the app download stays below the App Store cellular threshold
Implement thermal-aware frame dropping and Metal Performance Shader acceleration

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
High-connectivity urban sites	Cloud Vision API	Stable LTE/Wi-Fi keeps round-trip latency predictable	$0.001–0.003/image scales linearly
Low-connectivity construction/logistics	On-Device Pipeline	Deterministic latency, zero network dependency	$0 inference, upfront engineering cost
Compliance-heavy healthcare/defense	On-Device + Local Sync	Data never leaves device, satisfies audit requirements	Higher initial dev, lower long-term liability
Cost-sensitive large-scale deployment	On-Device Pipeline	Eliminates recurring API bills across thousands of devices	CapEx shifts to engineering, OpEx drops to near zero

Configuration Template

// config/pipeline.config.ts
export const VISION_CONFIG = {
  detectionThreshold: 0.35,
  classificationThreshold: 0.60,
  maxConcurrentInferences: 2,
  frameSkipThreshold: 0.8, // Skip frame if CPU > 80%
  thermalThrottleDelay: 150 // extra ms between frames when the OS reports an elevated thermal state (e.g. iOS ProcessInfo.thermalState .serious)
};

export const SYNC_CONFIG = {
  batchSize: 50,
  maxRetries: 5,
  backoffMultiplier: 1.5,
  endpoint: 'https://api.yourdomain.com/v1/field-sync',
  retryInterval: 5000
};

export const DB_CONFIG = {
  driver: 'expo-sqlite',
  schemaVersion: 3,
  migrationStrategy: 'safe',
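  encryptionKey: null as string | null // load at runtime from the Keychain/Keystore (e.g. expo-secure-store); process.env in a mobile JS bundle is either undefined or shipped in plain text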
};

Quick Start Guide

Export Models: Run the CoreML conversion script with your domain-trained YOLOv8s and MobileNetV3 checkpoints. Verify that preprocessing matches the training pipeline exactly.
Initialize Local Database: Apply Drizzle migrations to create inspection and hazard tables. Set default sync status to pending for all new records.
Wire Pipeline to Camera: Attach the FieldVisionPipeline to your camera feed. Process frames at 15 FPS, applying thermal throttling and frame skipping as configured.
Deploy Sync Worker: Start the FieldSyncEngine as a background task. Monitor queue depth and retry logs. Verify Supabase replication when connectivity returns.
Validate Bundle Size: Archive the application and review the App Thinning Size Report for per-device download sizes. Keep model assets and dependencies within the 50 MB budget so the app stays well under the App Store cellular download threshold. Adjust pruning or compression if needed.
