
How We Index 15,000+ eSIM Plans Across 120+ Providers

By Codcompass Team · 10 min read

Architecting a Multi-Source Telecom Inventory Index: From Fragmented APIs to Unified Search

Current Situation Analysis

Aggregating telecommunications inventory—specifically eSIM data plans—presents a deceptively complex data engineering challenge. The industry pain point isn't search speed or frontend filtering; it's maintaining a live, accurate index across highly fragmented, rapidly changing sources. Providers operate on independent release cycles, use proprietary data schemas, and frequently adjust pricing, coverage maps, and plan availability without standardized notification mechanisms.

This problem is routinely underestimated because engineering teams prioritize query latency and UI/UX over ingestion resilience. Many organizations treat telecom data like a static product catalog, relying on periodic CSV exports or uniform polling intervals. In practice, eSIM inventory behaves more like financial market data: prices fluctuate, SKUs expire silently, and coverage boundaries shift. A static snapshot degrades within days. Without a dynamic, tiered ingestion pipeline, comparison platforms end up serving data that is 30–90 days stale, which directly hurts conversion rates and user trust.

The scale compounds the difficulty. Indexing 15,000+ plans across 120+ providers requires handling multi-currency pricing, variable data caps, validity windows, feature flags (5G, hotspot, VoIP), and country coverage spanning up to 195 regions. Each provider exposes this data through different channels: structured APIs, public web interfaces, or direct data partnerships. The normalization overhead alone can consume more engineering bandwidth than the search layer itself. Successful architectures treat ingestion as a first-class system, not an afterthought.

WOW Moment: Key Findings

The critical insight emerges when comparing ingestion strategies against operational metrics. Static exports leave data unacceptably stale, while uniform polling drives up infrastructure cost. A tiered, hybrid approach balances freshness, cost, and reliability at once.

| Approach | Data staleness | Infrastructure cost | Query latency |
|---|---|---|---|
| Static Export | 720+ hours | Low | <1 second |
| Uniform Polling | 12 hours | High | ~3 seconds |
| Tiered Hybrid Ingestion | 6–24 hours | Medium | <2 seconds |

This finding matters because it decouples data freshness from infrastructure spend. By routing high-volume providers through tighter refresh cycles and long-tail providers through relaxed intervals, the pipeline maintains sub-2-second query performance while reducing unnecessary API calls and scraper load. The tiered model also isolates failure domains: a broken scraper for a low-traffic provider doesn't cascade into index-wide degradation. Most importantly, it enables real-time price anomaly detection without saturating rate limits, turning a data maintenance problem into a competitive advantage.
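The routing of providers into refresh tiers can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the tier names, the `ProviderStats` fields, and the traffic thresholds are all assumptions, and the intervals are simply picked from the 6–24 hour range quoted for tiered hybrid ingestion.

```typescript
// Hypothetical tier names; the article only distinguishes high-volume
// providers from long-tail ones.
type Tier = "high-volume" | "standard" | "long-tail";

interface ProviderStats {
  providerId: string;
  planCount: number;        // plans currently indexed for this provider
  monthlyPlanViews: number; // assumed traffic signal, not from the article
}

// Illustrative refresh intervals in hours, chosen inside the 6 to 24 hour range.
const REFRESH_HOURS: Record<Tier, number> = {
  "high-volume": 6,
  standard: 12,
  "long-tail": 24,
};

// Thresholds are placeholders; a real system would tune them from traffic data.
function assignTier(stats: ProviderStats): Tier {
  if (stats.monthlyPlanViews >= 50_000 || stats.planCount >= 500) {
    return "high-volume";
  }
  if (stats.monthlyPlanViews >= 5_000) {
    return "standard";
  }
  return "long-tail";
}

// Timestamp (ms since epoch) at which the provider should next be refreshed.
function nextRefreshAt(stats: ProviderStats, lastRefreshMs: number): number {
  return lastRefreshMs + REFRESH_HOURS[assignTier(stats)] * 3_600_000;
}

// True when the provider's refresh interval has elapsed.
function isDue(stats: ProviderStats, lastRefreshMs: number, nowMs: number): boolean {
  return nowMs >= nextRefreshAt(stats, lastRefreshMs);
}
```

Because each provider carries its own due time, a scheduler built this way only touches sources whose interval has elapsed, which is where the reduction in API calls and scraper load would come from.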

Core Solution

Building a resilient aggregation pipeline requires four interconnected components: source classification, schema normalization, tiered scheduling, and anomaly validation. Each component must operate independently but share a unified contract for plan representation.
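Of the four components, anomaly validation is the easiest to show in isolation. The sketch below is one hypothetical form it could take: a relative-change check on price. The `PriceObservation` shape, the `validatePrice` name, and the 50% default threshold are assumptions made for illustration, not values from the article.

```typescript
// Minimal slice of a plan record needed for a price check (assumed shape).
interface PriceObservation {
  planId: string;
  priceCents: number;
}

type Verdict = "accept" | "review";

// Flags an update for manual review when the price is not a positive integer
// or when it moved by more than maxRelativeChange against the previous value.
function validatePrice(
  previous: PriceObservation | undefined,
  next: PriceObservation,
  maxRelativeChange = 0.5, // illustrative threshold
): Verdict {
  if (!Number.isInteger(next.priceCents) || next.priceCents <= 0) {
    return "review";
  }
  if (previous === undefined) {
    return "accept"; // first sighting, nothing to compare against
  }
  const change =
    Math.abs(next.priceCents - previous.priceCents) / previous.priceCents;
  return change > maxRelativeChange ? "review" : "accept";
}
```

Returning a verdict instead of throwing keeps a suspicious update from one provider from blocking the rest of an ingestion batch.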

Step 1: Source Classification & Routing

Not all providers should be treated equally. Classify sources into three categories:

  • API-First: Structured endpoints with documented rate limits and predictable payloads.
  • Scrape-Dependent: Public interfaces lacking APIs or exposing incomplete data.
  • Push-Partner: Direct integrations where providers emit updates via webhooks or message queues.

Route each provider to the appropriate ingestion worker. API workers use authenticated HTTP clients with exponential backoff. Scrape workers run headless browsers or HTTP parsers with UI-change detection. Push workers consume events from a message broker.
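A rough sketch of the classification and routing described above, under stated assumptions: the `SourceKind` values, the worker names, and the backoff defaults (500 ms base, 60 s cap) are hypothetical and only illustrate the shape of the logic.

```typescript
// The three source categories from the list above, as a union type.
type SourceKind = "api" | "scrape" | "push";

interface ProviderSource {
  providerId: string;
  kind: SourceKind;
}

// Hypothetical worker pool names.
type WorkerName = "api-worker" | "scrape-worker" | "push-worker";

// Maps each source category to the worker pool that should ingest it.
function routeProvider(source: ProviderSource): WorkerName {
  switch (source.kind) {
    case "api":
      return "api-worker";
    case "scrape":
      return "scrape-worker";
    case "push":
      return "push-worker";
  }
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
// attempt is zero-based, so the first retry waits baseMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 60_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

An API worker would consult `backoffDelayMs` between failed requests. Adding random jitter to the delay is a common refinement when many workers retry against the same provider.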

Step 2: Schema Normalization Engine

Provider payloads vary wildly. A robust normalizer must handle unit conversions, string parsing, and region mapping without hardcoding provider-specific logic. Use a rule-based transformer that applies sequential parsing stages.

interface RawPlanPayload {
  source: string;
  payload: Record<string, unknown>;
}

interface NormalizedPlan {
  providerId: string;
  planId: string;
  dataBytes: number;
  validityHours: number;
  priceCents: number;
  currency: string;
  coverageIsoCodes: string[];
  features: string[];
  lastVerifiedAt: number;
}

class PlanNormalizer {
  private unitMap: Record<string, number> = {
    GB: 1_073_741_824,
    MB: 1_048_576,
    KB: 1_024,
    TB: 1_099_511_627_776,
  };
  // The parsing stages that consume unitMap are not included in this excerpt.
}
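To make the idea of sequential parsing stages concrete, here is a small sketch of two stages that turn provider strings into the `dataBytes` and `validityHours` fields. The function names and accepted input formats are assumptions; the multipliers are the same binary values used in `unitMap`.

```typescript
// Binary multipliers, matching the KB/MB/GB values in unitMap.
const UNIT_BYTES: Record<string, number> = {
  KB: 1_024,
  MB: 1_048_576,
  GB: 1_073_741_824,
};

// Parses strings such as "5 GB", "500MB", or "1.5 gb" into a byte count.
// Returns null on unrecognized input so the caller can flag the plan
// instead of guessing.
function parseDataSize(raw: string): number | null {
  const match = /^\s*(\d+(?:\.\d+)?)\s*(KB|MB|GB)\s*$/i.exec(raw);
  if (match === null) {
    return null;
  }
  const value = Number(match[1]);
  const unit = match[2].toUpperCase();
  return Math.round(value * UNIT_BYTES[unit]);
}

// Parses strings such as "7 days" or "24 hours" into a number of hours.
function parseValidityHours(raw: string): number | null {
  const match = /^\s*(\d+)\s*(hour|day)s?\s*$/i.exec(raw);
  if (match === null) {
    return null;
  }
  const count = Number(match[1]);
  return match[2].toLowerCase() === "day" ? count * 24 : count;
}
```

Each stage is a pure function from a raw string to a typed value or `null`, so stages can be chained per field and a failure in one field does not hide problems in another.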
