Swiggy Improves Search Autocomplete Using Real Time Machine Learning Ranking

By Codcompass Team·2026-05-18·8 min read

Architecting Real-Time Learning-to-Rank Autocomplete with OpenSearch and Feature Stores

Current Situation Analysis

Search autocomplete is often treated as a simple prefix-matching problem, but in production environments with large catalogs, it is a high-stakes ranking challenge. Users expect suggestions within milliseconds, yet the relevance of those suggestions directly impacts conversion rates and user retention.

The industry pain point lies in the tension between latency constraints and ranking complexity. Traditional heuristic approaches—such as lexicographical sorting, static popularity boosts, or TF-IDF scoring—fail to capture dynamic user intent. They cannot react to real-time trends, such as a sudden spike in demand for specific items due to weather, events, or inventory changes.

This problem is frequently misunderstood as purely an infrastructure challenge. Engineering teams often assume that simply adding a machine learning model will solve relevance issues. However, deploying ML in autocomplete requires a specialized architecture that decouples candidate retrieval from scoring, manages feature freshness, and enforces strict latency budgets. Without real-time signals, models quickly become stale, ranking items based on historical data that no longer reflects current user behavior.

Data from production search systems indicates that heuristic ranking can degrade relevance metrics (NDCG) by 15-20% compared to models utilizing real-time interaction signals. The critical insight is that feature velocity matters as much as model accuracy. A slightly simpler model fed with fresh, real-time features often outperforms a complex model relying on batch-updated data.

WOW Moment: Key Findings

The following comparison illustrates the trade-offs between common autocomplete strategies. The data highlights why real-time Learning-to-Rank (LTR) architectures are becoming the standard for high-performance search, despite their implementation complexity.

Approach	P99 Latency	Relevance (NDCG@10)	Adaptability to Trends	Engineering Overhead
Heuristic/Static	< 15ms	Low	None	Low
Batch ML Ranking	40-80ms	Medium	Daily/Hourly	Medium
Real-Time LTR	< 50ms	High	Instant	Medium-High

Why this matters: Real-Time LTR achieves the "sweet spot" where relevance is maximized without violating the sub-50ms latency threshold typical for autocomplete. The ability to adapt instantly enables systems to prioritize items based on live signals, such as clicks in the last 5 minutes or current inventory levels, which batch systems miss entirely. This approach transforms autocomplete from a static lookup into a dynamic, context-aware recommendation engine.

Core Solution

Building a real-time LTR autocomplete system requires a decoupled architecture that separates candidate generation from ranking. This design allows each component to scale independently and optimizes for latency by parallelizing I/O and computation.

Architecture Overview

Candidate Generation: A fast retrieval layer (e.g., OpenSearch) fetches potential matches based on the user's input prefix. This step focuses on recall and speed, returning a superset of candidates (e.g., top 50-100 items).
Feature Enrichment: A feature store retrieves real-time signals for each candidate. Features include user context (location, device), item context (price, category), and dynamic signals (click-through rate, conversion rate, inventory status).
Ranking: An LTR model scores the enriched candidates. The model can be deployed as a plugin within the search engine or as a sidecar service.
Response Assembly: The top-K ranked candidates are formatted and returned to the client.

Technical Implementation

The following TypeScript example demonstrates the orchestration logic. This implementation e

mphasizes non-blocking I/O, batch feature fetching, and strict latency management.

import { OpenSearchClient } from '@opensearch-project/opensearch';
import { FeatureStoreClient } from './feature-store';
import { LTRModel } from './ltr-model';

interface AutocompleteRequest {
  query: string;
  userId: string;
  context: {
    location: string;
    device: string;
  };
}

interface Candidate {
  id: string;
  title: string;
  score: number;
}

interface FeatureVector {
  candidateId: string;
  features: Record<string, number>;
}

export class AutocompleteOrchestrator {
  private searchClient: OpenSearchClient;
  private featureStore: FeatureStoreClient;
  private rankingModel: LTRModel;
  private latencyBudgetMs: number;

  constructor(
    searchClient: OpenSearchClient,
    featureStore: FeatureStoreClient,
    rankingModel: LTRModel,
    latencyBudgetMs: number = 45
  ) {
    this.searchClient = searchClient;
    this.featureStore = featureStore;
    this.rankingModel = rankingModel;
    this.latencyBudgetMs = latencyBudgetMs;
  }

  async getRankedSuggestions(request: AutocompleteRequest): Promise<Candidate[]> {
    const startTime = Date.now();
    
    // Step 1: Candidate Generation
    // Retrieve a superset of candidates using prefix matching.
    // This leverages OpenSearch's inverted index for O(1) retrieval speed.
    const candidates = await this.generateCandidates(request.query);
    
    if (candidates.length === 0) return [];

    // Step 2: Feature Enrichment
    // Fetch real-time features in a single batch to minimize network round trips.
    const featureVectors = await this.fetchFeatures(
      candidates.map(c => c.id),
      request.context
    );

    // Step 3: Ranking
    // Score candidates using the LTR model.
    // The model is optimized for low-latency inference.
    const scoredCandidates = this.rankingModel.score(
      candidates,
      featureVectors
    );

    // Step 4: Latency Check and Truncation
    const elapsed = Date.now() - startTime;
    if (elapsed > this.latencyBudgetMs) {
      // Fallback: Return top candidates based on static score if ML ranking is too slow.
      console.warn(`Latency budget exceeded: ${elapsed}ms`);
      return scoredCandidates.slice(0, 5).sort((a, b) => b.score - a.score);
    }

    return scoredCandidates.slice(0, 10);
  }

  private async generateCandidates(query: string): Promise<Candidate[]> {
    const response = await this.searchClient.search({
      index: 'products',
      body: {
        query: {
          match_phrase_prefix: {
            title: query
          }
        },
        size: 50, // Superset size for ranking
        _source: ['id', 'title']
      }
    });

    return response.hits.hits.map(hit => ({
      id: hit._id,
      title: hit._source.title,
      score: 0
    }));
  }

  private async fetchFeatures(
    candidateIds: string[],
    context: AutocompleteRequest['context']
  ): Promise<FeatureVector[]> {
    // Batch fetch features to reduce latency.
    // Feature store handles caching and real-time aggregation.
    const features = await this.featureStore.getFeatures({
      entityIds: candidateIds,
      context: context,
      features: [
        'ctr_last_1h',
        'conversion_rate_last_24h',
        'inventory_level',
        'price_tier',
        'category_popularity'
      ]
    });

    return features;
  }
}

Architecture Decisions and Rationale

Decoupled Retrieval and Ranking: By separating these stages, we can optimize retrieval for speed using OpenSearch's native capabilities while applying complex ML logic only to the candidate set. This prevents the ranking model from becoming a bottleneck for the entire search index.
Batch Feature Fetching: Real-time features are fetched in a single batch request rather than per-candidate. This reduces network overhead and ensures consistent latency regardless of the candidate set size.
Feature Store Integration: A dedicated feature store provides low-latency access to real-time signals. It abstracts the complexity of aggregating user behavior data (e.g., windowed clicks) and ensures feature consistency across training and inference.
Latency Budget Enforcement: The orchestrator includes a hard latency check. If the ML pipeline exceeds the budget, the system falls back to a static ranking. This guarantees a responsive user experience even under load or during feature store degradation.
LTR Model Deployment: The model should be deployed using a mechanism that minimizes inference latency. OpenSearch's LTR plugin allows scoring within the search node, eliminating network hops. Alternatively, a sidecar service with GPU acceleration can handle more complex models if latency permits.

Pitfall Guide

Implementing real-time LTR autocomplete introduces specific risks. The following pitfalls are derived from production experience and include mitigation strategies.

Feature Staleness vs. Latency Trade-off
- Explanation: Real-time features require frequent updates. Aggressive caching can lead to stale data, while excessive database queries increase latency.
- Fix: Implement a tiered caching strategy. Use in-memory caches for hot features (e.g., last 5 minutes) and a backing store for historical data. Configure TTLs based on feature volatility.
Cold Start Problem for New Items
- Explanation: New items lack interaction history, causing the LTR model to assign low scores or default values, burying them in suggestions.
- Fix: Implement a fallback heuristic for items with insufficient signals. Boost new items based on metadata quality or category popularity until sufficient interaction data is collected. Use "exploration" strategies to gather initial signals.
Model Drift Due to Feedback Loops
- Explanation: Real-time models can amplify biases. If users click on a specific item, the model boosts it, leading to more clicks, creating a feedback loop that degrades diversity.
- Fix: Introduce diversity constraints in the ranking logic. Use decay functions for interaction signals to prevent runaway popularity. Monitor ranking distributions and implement anti-bias regularization in the model.
OpenSearch LTR Plugin Overhead
- Explanation: The LTR plugin adds computational overhead to search nodes. Complex models or excessive features can degrade query performance and increase memory usage.
- Fix: Profile model inference time rigorously. Prune low-impact features. Consider offloading ranking to a sidecar service if the plugin impacts cluster stability. Use feature pre-computation for static attributes.
Context Loss in Feature Engineering
- Explanation: Failing to include user context (location, time, device) limits the model's ability to personalize suggestions.
- Fix: Enrich the request with comprehensive context features. Ensure the feature store can join candidate features with user context efficiently. Test models with and without context to quantify the lift.
Latency Budget Blowout During Peak Load
- Explanation: Feature store or search cluster latency can spike during traffic surges, causing the autocomplete response to timeout.
- Fix: Implement circuit breakers and timeouts for all downstream calls. Use adaptive ranking: reduce the candidate set size or disable expensive features when latency thresholds are breached.
Inconsistent Feature Definitions
- Explanation: Discrepancies between training data and inference features lead to performance degradation.
- Fix: Use a unified feature definition schema. Validate features during model deployment. Implement automated tests that compare feature distributions between training and production.

Production Bundle

Action Checklist

Define Latency Budgets: Establish strict P99 latency targets for each stage (retrieval, feature fetch, ranking) and implement monitoring.
Deploy Feature Store: Set up a low-latency feature store with support for real-time aggregations and batch/historical access.
Implement Candidate Generation: Configure OpenSearch with appropriate analyzers and prefix matching for high-recall retrieval.
Train and Validate LTR Model: Use logged user interaction data to train the model. Validate against offline metrics (NDCG) and conduct A/B testing.
Integrate Ranking Pipeline: Connect the orchestrator to the search client, feature store, and model serving layer.
Add Fallback Mechanisms: Implement static ranking fallbacks and graceful degradation for feature store outages.
Monitor Feature Freshness: Track the age of real-time features and alert on staleness.
Set Up A/B Testing: Deploy the new ranking system alongside the baseline to measure impact on conversion and engagement.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
High Volume, Strict Latency	OpenSearch LTR Plugin	Native integration reduces network latency; efficient resource usage.	Medium (Cluster resources)
Complex Models / GPU Required	Sidecar Ranking Service	Flexibility to use advanced models; isolates ranking load from search nodes.	High (Infrastructure)
Static Catalog / Low Traffic	Batch ML Ranking	Simpler architecture; lower operational overhead; sufficient for stable catalogs.	Low
Personalization Critical	Real-Time LTR with Feature Store	Enables user-specific ranking based on live context and behavior.	Medium-High

Configuration Template

OpenSearch LTR Plugin Configuration

This template demonstrates how to configure a feature set and model within OpenSearch for LTR scoring.

PUT /_ltr/_featureset/autocomplete_features
{
  "featureset": {
    "features": [
      {
        "name": "title_match",
        "template": {
          "params": ["features", "user_query"]
        },
        "template_params": {
          "features": {
            "user_query": "{{user_query}}"
          }
        }
      },
      {
        "name": "ctr_signal",
        "template": {
          "params": ["features"]
        },
        "template_params": {
          "features": {
            "field": "ctr_last_1h"
          }
        }
      }
    ]
  }
}

PUT /_ltr/_model/autocomplete_ranker
{
  "model": {
    "featureset": "autocomplete_features",
    "model": {
      "type": "lambdamart",
      "features": ["title_match", "ctr_signal"],
      "trees": [
        // Model tree structure serialized from training
      ]
    }
  }
}

Feature Store Configuration (Example)

feature_store:
  backend: redis
  ttl:
    hot_features: 300s      # 5 minutes for real-time signals
    warm_features: 3600s    # 1 hour for semi-static signals
  aggregation:
    window_size: 5m
    slide_interval: 1m
  features:
    - name: ctr_last_1h
      type: float
      source: click_stream
      aggregation: sum / count
    - name: inventory_level
      type: int
      source: inventory_db
      update_mode: push

Quick Start Guide

Index Data: Load your catalog into OpenSearch with appropriate text analyzers and numeric fields for features.
Setup Feature Store: Deploy the feature store backend and configure real-time aggregations for user interaction signals.
Train Model: Collect interaction logs, generate training data with features, and train an LTR model (e.g., LambdaMART).
Deploy Model: Register the model and feature set in OpenSearch or deploy the sidecar service.
Test and Iterate: Run the orchestrator against the new ranking pipeline. Validate latency and relevance. Conduct A/B testing to measure production impact.

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

7-day free trial · Cancel anytime · 30-day money-back