Intent-Driven Product Discovery: Hybrid Retrieval Patterns for Conversational Commerce

Current Situation Analysis

The friction between user intent and database schema remains a primary bottleneck in conversational commerce. Traditional product discovery relies on faceted search interfaces where users must manually map their mental model to technical parameters. This translation layer introduces cognitive load, increases error rates, and degrades conversion.

Users rarely conceptualize requirements using database field names. A traveler does not think, "I need a plan with country_code='JP', validity_days>=10, tethering_allowed=true, and price_max=20." Instead, the intent is expressed naturally: "I'm working remotely from Japan for ten days and need a hotspot plan under €20."

When systems force users to construct complex filter combinations, several failure modes emerge:

Schema Mismatch: Users select incorrect filters due to unfamiliarity with technical constraints (e.g., confusing validity duration with data rollover).
Filter Fatigue: High-dimensional filtering leads to decision paralysis or abandonment.
Suboptimal Results: Manual filtering often misses implicit constraints or trade-offs, returning technically valid but practically poor recommendations.

Data from production deployments indicates that filter-based interfaces convert significantly worse than intent-driven systems. The gap exists because the interface demands expertise the user does not possess, while the system fails to use the richness of natural language input.

WOW Moment: Key Findings

The breakthrough in building robust conversational retrieval lies in decoupling intent extraction from data retrieval. Analysis of extraction strategies reveals that general-purpose large language models (LLMs) are often overkill for structured extraction, while rule-based systems lack the flexibility to handle natural variance.

A fine-tuned small model offers the optimal balance for high-throughput, low-latency extraction, provided the training data covers domain-specific ambiguity. Furthermore, vector embeddings prove ineffective for retrieving products defined by hard numerical constraints.

Extraction Strategy Comparison

Extraction Strategy	Latency (p95)	Cost per 1k Requests	Handling of Ambiguity	Brittleness
Regex / Rule-Based	<5ms	Negligible	None	High
General LLM (e.g., GPT-4)	>600ms	High	Excellent	Low
Fine-Tuned Small Model (3B)	~50ms	Low	High	Low

Why this matters:

Latency Budgets: Real-time conversational UIs require responses under 200ms. General LLMs often exceed this budget, causing perceived lag. A fine-tuned 3B parameter model can operate in ~50ms, preserving UX fluidity.
Cost Efficiency: At scale, per-request costs of general LLMs become prohibitive. Small models reduce inference costs by orders of magnitude while maintaining high accuracy on constrained extraction tasks.
Data Quality: The success of small models depends entirely on labeled training data. Investing in 10,000+ annotated examples covering edge cases and ambiguities is critical for production readiness.
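The latency argument can be checked with simple arithmetic. The sketch below adds up per-stage p95 figures and compares the total against the 200ms budget. Only the ~50ms and >600ms extraction figures and the 200ms budget come from the text; the filter and scoring timings, and the names StageLatency and checkLatencyBudget, are assumptions for illustration. Summing p95 values gives a rough planning figure, not the true end-to-end p95.

```typescript
interface StageLatency {
  stage: string;
  p95Ms: number;
}

// Adds up stage latencies and reports how much of the budget is left.
function checkLatencyBudget(stages: StageLatency[], budgetMs: number) {
  const totalMs = stages.reduce((sum, s) => sum + s.p95Ms, 0);
  return { totalMs, headroomMs: budgetMs - totalMs, withinBudget: totalMs <= budgetMs };
}

const smallModelPipeline: StageLatency[] = [
  { stage: "extraction (fine-tuned 3B)", p95Ms: 50 },
  { stage: "sql filter (assumed figure)", p95Ms: 20 },
  { stage: "scoring (assumed figure)", p95Ms: 5 },
];

const generalLlmPipeline: StageLatency[] = [
  { stage: "extraction (general LLM)", p95Ms: 600 },
  { stage: "sql filter (assumed figure)", p95Ms: 20 },
  { stage: "scoring (assumed figure)", p95Ms: 5 },
];

const small = checkLatencyBudget(smallModelPipeline, 200);
const general = checkLatencyBudget(generalLlmPipeline, 200);
console.log(small, general);
```

With these assumed numbers the small-model pipeline leaves headroom for network overhead, while the general LLM exceeds the budget on extraction alone.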

Core Solution

The architecture for intent-driven discovery follows a hybrid pipeline: Extraction → Ambiguity Resolution → Coarse Filtering → Scoring → Explanation. This separation of concerns ensures deterministic constraints are handled by the database, while semantic understanding and ranking are managed by AI components.

1. Intent Extraction Layer

The first step is converting unstructured input into a strongly typed intent object. This should be handled by a model fine-tuned specifically for the domain. The model must output structured data that maps directly to the database schema.

TypeScript Implementation:

export interface TravelIntent {
  targetRegions: string[];
  stayLength: number;
  dataAllowanceGB?: number;
  priceCeiling?: number;
  requiresTethering: boolean;
  simSlots: number;
  needs5G: boolean;
  voipAllowed: boolean;
}

export interface ExtractionResult {
  intent: TravelIntent;
  confidence: number;
  ambiguities: string[];
}

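// Minimal contract for the inference client. The shape (one generate() call
// returning raw text) is an assumption; adapt it to your serving stack.
export interface InferenceClient {
  generate(input: string): Promise<string>;
}

type ParsedOutput = Pick<ExtractionResult, "intent" | "confidence">;

export class IntentExtractor {
  private model: InferenceClient;

  constructor(model: InferenceClient) {
    this.model = model;
  }

  async resolve(input: string): Promise<ExtractionResult> {
    // Inference call to the fine-tuned 3B model
    const rawOutput = await this.model.generate(input);
    const parsed = this.parseRawOutput(rawOutput);

    return {
      intent: parsed.intent,
      confidence: parsed.confidence,
      ambiguities: this.detectAmbiguities(parsed.intent)
    };
  }

  // Assumes the model emits JSON shaped like { intent, confidence }.
  // Validate the fields before trusting them in production.
  private parseRawOutput(rawOutput: string): ParsedOutput {
    return JSON.parse(rawOutput) as ParsedOutput;
  }

  // Flags combinations worth surfacing to the user as explicit assumptions.
  private detectAmbiguities(intent: TravelIntent): string[] {
    const issues: string[] = [];
    if (intent.targetRegions.length > 1 && intent.priceCeiling !== undefined) {
      issues.push("Multi-region coverage may exceed budget constraints.");
    }
    if (intent.stayLength < 3) {
      issues.push("Short duration plans may have higher daily rates.");
    }
    return issues;
  }
}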
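Model output should be checked at runtime before it is treated as a strongly typed intent. The following is a minimal validation sketch, assuming the model emits a JSON object whose keys match the TravelIntent fields; the function names and the specific rules (positive numbers, at least one region) are illustrative assumptions, not part of the original design.

```typescript
interface TravelIntent {
  targetRegions: string[];
  stayLength: number;
  dataAllowanceGB?: number;
  priceCeiling?: number;
  requiresTethering: boolean;
  simSlots: number;
  needs5G: boolean;
  voipAllowed: boolean;
}

function isPositiveNumber(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v) && v > 0;
}

// undefined stays undefined; any other value must be a positive number, else null.
function optionalPositive(v: unknown): number | undefined | null {
  if (v === undefined) return undefined;
  return isPositiveNumber(v) ? v : null;
}

// Returns a validated intent, or null when the payload cannot be trusted.
function parseTravelIntent(raw: string): TravelIntent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const { targetRegions, stayLength, simSlots, requiresTethering, needs5G, voipAllowed,
    dataAllowanceGB: rawData, priceCeiling: rawPrice } = data as Record<string, unknown>;

  if (!Array.isArray(targetRegions) || targetRegions.length === 0) return null;
  if (!targetRegions.every((r): r is string => typeof r === "string")) return null;
  if (!isPositiveNumber(stayLength)) return null;
  if (!isPositiveNumber(simSlots) || !Number.isInteger(simSlots)) return null;
  if (typeof requiresTethering !== "boolean") return null;
  if (typeof needs5G !== "boolean" || typeof voipAllowed !== "boolean") return null;

  const dataAllowanceGB = optionalPositive(rawData);
  const priceCeiling = optionalPositive(rawPrice);
  if (dataAllowanceGB === null || priceCeiling === null) return null;

  const intent: TravelIntent = {
    targetRegions, stayLength, requiresTethering, simSlots, needs5G, voipAllowed,
  };
  if (dataAllowanceGB !== undefined) intent.dataAllowanceGB = dataAllowanceGB;
  if (priceCeiling !== undefined) intent.priceCeiling = priceCeiling;
  return intent;
}
```

Returning null instead of throwing lets the caller route malformed output into the same clarification path as a low-confidence extraction.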

Architecture Decisions:

Fine-Tuning vs. Prompting: Fine-tuning a 3B model on 10,000+ labeled queries ensures consistent output formatting and reduces hallucination compared to zero-shot prompting.
Confidence Scoring: The extraction layer should return a confidence metric. Low-confidence extractions trigger clarification flows rather than proceeding with potentially incorrect assumptions.
Ambiguity Detection: Explicitly identifying ambiguous fields allows the system to surface assumptions to the user (e.g., "Assuming you need coverage in France based on 'Paris'. Correct?").
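The confidence and ambiguity decisions above can be combined into one gate that decides whether to run retrieval or ask the user first. This is a sketch under assumptions: the default threshold of 0.85 and the limit of 2 ambiguities are taken from the configuration template later in this article, while the rule that exceeding the ambiguity limit also triggers clarification, the fallback prompt text, and the names decideNextStep and NextStep are hypothetical.

```typescript
interface ExtractionOutcome {
  confidence: number; // 0..1, as returned by the extractor
  ambiguities: string[];
}

type NextStep =
  | { kind: "proceed" }
  | { kind: "clarify"; prompts: string[] };

function decideNextStep(
  outcome: ExtractionOutcome,
  confidenceThreshold = 0.85,
  maxAmbiguities = 2,
): NextStep {
  const lowConfidence = outcome.confidence < confidenceThreshold;
  const tooAmbiguous = outcome.ambiguities.length > maxAmbiguities;
  if (!lowConfidence && !tooAmbiguous) return { kind: "proceed" };

  // Prefer the specific issues; fall back to a generic question if none were flagged.
  const prompts = outcome.ambiguities.length > 0
    ? outcome.ambiguities
    : ["Could you tell me a bit more about where and how long you are travelling?"];
  return { kind: "clarify", prompts };
}

const confident = decideNextStep({ confidence: 0.95, ambiguities: [] });
const unsure = decideNextStep({ confidence: 0.6, ambiguities: [] });
console.log(confident.kind, unsure.kind);
```

Modelling the result as a discriminated union forces the caller to handle the clarification branch explicitly.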

2. Hybrid Retrieval Pipeline

Once the intent is extracted, retrieval must respect hard constraints while optimizing for user preferences.

Stage 1: Coarse Filtering

Use SQL to filter the candidate set on deterministic requirements. This stage eliminates plans that violate hard constraints.

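-- Literal values are shown for the Japan example; bind them as parameters in application code.
-- reliability_score is selected because the scoring stage reads it.
SELECT plan_id, provider, price, data_gb, validity_days, tethering, regions, reliability_score
FROM eSIM_plans
WHERE
  regions @> ARRAY['JP']::varchar[]
  AND validity_days >= 10
  AND price <= 20.00
  AND tethering = TRUE
  AND data_gb >= 1.0;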

Rationale: SQL filters are deterministic, fast, and indexable. They guarantee that returned results satisfy all hard constraints. This stage typically reduces the candidate set from thousands to a manageable subset (e.g., 100–500 plans).
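In application code the filter would normally be generated from the extracted intent rather than written by hand. The sketch below builds a parameterized statement using PostgreSQL-style positional placeholders ($1, $2, ...), so user-derived values never get concatenated into the SQL text. The table and column names follow the query above; FilterIntent, ParameterizedQuery, and buildFilterQuery are hypothetical names, and the rule that optional fields simply drop their clause is an assumption.

```typescript
interface FilterIntent {
  targetRegions: string[];
  stayLength: number;
  dataAllowanceGB?: number;
  priceCeiling?: number;
  requiresTethering: boolean;
}

interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

function buildFilterQuery(intent: FilterIntent): ParameterizedQuery {
  const values: unknown[] = [];
  // Registers a value and returns its positional placeholder.
  const param = (v: unknown): string => {
    values.push(v);
    return `$${values.length}`;
  };

  const where: string[] = [
    `regions @> ${param(intent.targetRegions)}::varchar[]`,
    `validity_days >= ${param(intent.stayLength)}`,
  ];
  if (intent.priceCeiling !== undefined) where.push(`price <= ${param(intent.priceCeiling)}`);
  if (intent.requiresTethering) where.push("tethering = TRUE");
  if (intent.dataAllowanceGB !== undefined) where.push(`data_gb >= ${param(intent.dataAllowanceGB)}`);

  const text = [
    "SELECT plan_id, provider, price, data_gb, validity_days, tethering, regions, reliability_score",
    "FROM eSIM_plans",
    `WHERE ${where.join(" AND ")}`,
  ].join("\n");
  return { text, values };
}

const japanQuery = buildFilterQuery({
  targetRegions: ["JP"],
  stayLength: 10,
  priceCeiling: 20,
  requiresTethering: true,
});
console.log(japanQuery.text, japanQuery.values);
```

Only constraints the user actually expressed become WHERE clauses, which keeps the candidate set from being narrowed by defaults the user never asked for.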

Stage 2: Composite Scoring and Ranking

Apply a scoring function to rank the filtered candidates. The score should weigh multiple factors based on the user's intent.

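// Row shape returned by the coarse filter. The scale of reliability_score
// is not specified here; pick one and keep it consistent.
interface Plan {
  plan_id: string;
  provider: string;
  price: number;
  data_gb: number;
  validity_days: number;
  tethering: boolean;
  regions: string[];
  reliability_score: number;
}

function scorePlan(plan: Plan, intent: TravelIntent): number {
  let score = 0;

  // Price efficiency: lower price per GB is better.
  // Guard against free or zero-data plans, which would yield Infinity or NaN.
  // This term is unbounded; cap or normalize it if it dominates the ranking.
  if (intent.dataAllowanceGB && plan.price > 0 && plan.data_gb > 0) {
    const pricePerGB = plan.price / plan.data_gb;
    score += (1 / pricePerGB) * 40;
  }

  // Validity alignment: plans closer to the stay length reduce waste
  const validityDiff = Math.abs(plan.validity_days - intent.stayLength);
  score += Math.max(0, 30 - validityDiff);

  // Feature match: bonus for exact feature matches
  if (plan.tethering === intent.requiresTethering) score += 15;
  if (plan.data_gb >= (intent.dataAllowanceGB || 0)) score += 10;

  // Provider reliability weight
  score += plan.reliability_score * 5;

  return score;
}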

Rationale: Ranking must be context-aware. A plan with unlimited data might be suboptimal for a short trip if the price is high. The scoring function balances cost, relevance, and quality metrics.
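Once each candidate has a score, ranking reduces to a sort and a cut-off. The helper below is a generic sketch that works with any scoring function; rankPlans, TripPlan, and the toy scorer (the validity term from above minus the price) are illustrative assumptions, chosen to show how a large, long-validity plan can rank below a modest one for a ten-day trip.

```typescript
// Sorts candidates by descending score and keeps the top N.
// Non-finite scores (NaN, Infinity) are dropped rather than ranked.
function rankPlans<T>(candidates: T[], score: (c: T) => number, topN: number): T[] {
  return candidates
    .map((candidate) => ({ candidate, value: score(candidate) }))
    .filter(({ value }) => Number.isFinite(value))
    .sort((a, b) => b.value - a.value)
    .slice(0, topN)
    .map(({ candidate }) => candidate);
}

interface TripPlan {
  plan_id: string;
  price: number;
  data_gb: number;
  validity_days: number;
}

const stayLength = 10;
const candidates: TripPlan[] = [
  { plan_id: "B", price: 19, data_gb: 50, validity_days: 30 },
  { plan_id: "A", price: 8, data_gb: 5, validity_days: 10 },
  { plan_id: "C", price: 12, data_gb: 10, validity_days: 15 },
];

// Toy scorer for demonstration only: validity alignment minus price.
const toyScore = (p: TripPlan): number =>
  Math.max(0, 30 - Math.abs(p.validity_days - stayLength)) - p.price;

const top2 = rankPlans(candidates, toyScore, 2).map((p) => p.plan_id);
console.log(top2);
```

Plan B offers the most data but scores lowest here because its 30-day validity and higher price are a poor fit for the trip.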

Stage 3: LLM Explanation

Use an LLM to generate natural language explanations for the top-ranked results. This provides transparency and helps users understand trade-offs.

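// Assumed client shape; replace with the SDK you actually use.
declare const llmClient: { summarize(prompt: string): Promise<string[]> };

async function generateTradeoffs(
  topPlans: Plan[],
  intent: TravelIntent
): Promise<string[]> {
  // Only database rows are passed in; the model explains, it does not retrieve.
  const prompt = `
    User wants: ${JSON.stringify(intent)}
    Top plans: ${JSON.stringify(topPlans)}

    Explain the key trade-offs between these plans in plain language.
    Highlight price vs. data, validity, and unique features.
  `;

  return llmClient.summarize(prompt);
}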

Rationale: Users value understanding why a plan is recommended. The LLM translates technical differences into user-centric insights (e.g., "Plan A is cheaper but has less data; Plan B offers faster activation but costs more").

Pitfall Guide

1. The Embedding Trap

Mistake: Using vector embeddings to search for products with structured numerical attributes.
Explanation: Embeddings capture semantic similarity but struggle with precise numerical constraints. A vector search might return a plan with 100GB when the user asked for 5GB because the embeddings are similar, violating the hard constraint.
Fix: Always use structured filters for numerical and categorical constraints. Reserve embeddings for semantic search on unstructured text fields.
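The difference between vector-only and filter-first retrieval can be seen with toy data. The sketch below uses hand-written two-dimensional vectors in place of real embeddings and a price ceiling as the hard constraint; IndexedPlan, the vectors, and the function names are assumptions made for the demonstration, and both vectors in a comparison are assumed to have the same length.

```typescript
interface IndexedPlan {
  plan_id: string;
  price: number;
  embedding: number[]; // toy stand-in for a description embedding
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Vector-only ranking: nothing stops an over-budget plan from ranking first.
function rankBySimilarity(plans: IndexedPlan[], query: number[]): IndexedPlan[] {
  return plans
    .slice()
    .sort((p, q) => cosineSimilarity(q.embedding, query) - cosineSimilarity(p.embedding, query));
}

// Filter-first: the hard constraint is applied before any similarity ranking.
function hybridSearch(plans: IndexedPlan[], query: number[], priceCeiling: number): IndexedPlan[] {
  return rankBySimilarity(plans.filter((p) => p.price <= priceCeiling), query);
}

const catalogue: IndexedPlan[] = [
  { plan_id: "pricey", price: 45, embedding: [0.9, 0.1] },
  { plan_id: "cheap", price: 18, embedding: [0.6, 0.8] },
];
const queryVector = [1, 0];

const vectorOnly = rankBySimilarity(catalogue, queryVector);
const filtered = hybridSearch(catalogue, queryVector, 20);
console.log(vectorOnly[0]?.plan_id, filtered.map((p) => p.plan_id));
```

The over-budget plan is the closest vector match, so only the filter-first path respects the ceiling.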

2. LLM as Database

Mistake: Allowing the LLM to filter or generate plan details directly.
Explanation: LLMs hallucinate. If asked to filter plans, the model may invent prices, data allowances, or coverage regions that do not exist in the database.
Fix: Restrict the LLM to intent extraction and explanation. All data retrieval must come from the database via deterministic queries.
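One way to enforce this boundary is to assemble the user-facing result from database rows and attach model text only as commentary. The sketch below assumes the explanation step returns notes keyed by plan_id, which is a hypothetical output format; PlanRow, LlmNote, PlanCard, and buildPlanCards are illustrative names.

```typescript
interface PlanRow {
  plan_id: string;
  provider: string;
  price: number;
  data_gb: number;
}

interface LlmNote {
  plan_id: string;
  note: string;
}

interface PlanCard extends PlanRow {
  note: string | null;
}

// Facts always come from the database row. Notes are joined by plan_id,
// so commentary about a plan that was never retrieved is silently dropped.
function buildPlanCards(rows: PlanRow[], notes: LlmNote[]): PlanCard[] {
  const noteById = new Map<string, string>();
  for (const n of notes) noteById.set(n.plan_id, n.note);
  return rows.map((row) => ({ ...row, note: noteById.get(row.plan_id) ?? null }));
}

const cards = buildPlanCards(
  [{ plan_id: "jp-10", provider: "ExampleTel", price: 18, data_gb: 5 }],
  [
    { plan_id: "jp-10", note: "Cheapest option that covers the full stay." },
    { plan_id: "ghost-plan", note: "Unlimited data for 5 euros." }, // hallucinated plan
  ],
);
console.log(cards);
```

Because iteration is driven by the retrieved rows, a hallucinated plan can never appear in the output, and prices shown to the user are never model-generated.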

3. Ambiguity Silencing

Mistake: Proceeding with low-confidence extractions without user confirmation.
Explanation: Guessing user intent leads to irrelevant results and erodes trust. For example, interpreting "Paris" as a person's name instead of a location.
Fix: Implement confidence thresholds. When confidence is low, ask clarifying questions or surface assumptions explicitly.

4. Over-Engineering Extraction

Mistake: Using massive general-purpose models for simple extraction tasks.
Explanation: This increases latency and cost without improving accuracy for constrained domains.
Fix: Fine-tune small models (e.g., 3B parameters) on domain-specific data. This achieves comparable accuracy with significantly lower latency and cost.

5. Static Scoring Models

Mistake: Using a one-size-fits-all ranking algorithm.
Explanation: Different users prioritize different factors. A budget traveler cares about price; a business traveler cares about reliability and activation speed.
Fix: Make the scoring function dynamic based on extracted intent. Adjust weights based on user context (e.g., prioritize speed for short trips).
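Dynamic scoring can be as simple as adjusting a base weight vector from the extracted intent and renormalizing. In the sketch below the base weights match the configuration template later in this article; the two adjustment rules and their 1.5 multipliers are illustrative assumptions that would need tuning against real conversion data, and weightsFor and WeightContext are hypothetical names.

```typescript
interface RankingWeights {
  priceEfficiency: number;
  validityAlignment: number;
  featureMatch: number;
  reliability: number;
}

interface WeightContext {
  stayLength: number;
  priceCeiling?: number;
}

const BASE_WEIGHTS: RankingWeights = {
  priceEfficiency: 0.4,
  validityAlignment: 0.2,
  featureMatch: 0.25,
  reliability: 0.15,
};

function weightsFor(ctx: WeightContext): RankingWeights {
  const w: RankingWeights = { ...BASE_WEIGHTS };
  // Assumption: stating a budget signals price sensitivity.
  if (ctx.priceCeiling !== undefined) w.priceEfficiency *= 1.5;
  // Assumption: on very short trips a failed activation costs a large share of the stay.
  if (ctx.stayLength <= 3) w.reliability *= 1.5;

  // Renormalize so the weights always sum to 1.
  const total = w.priceEfficiency + w.validityAlignment + w.featureMatch + w.reliability;
  return {
    priceEfficiency: w.priceEfficiency / total,
    validityAlignment: w.validityAlignment / total,
    featureMatch: w.featureMatch / total,
    reliability: w.reliability / total,
  };
}

const defaultWeights = weightsFor({ stayLength: 10 });
const budgetWeights = weightsFor({ stayLength: 10, priceCeiling: 20 });
console.log(defaultWeights, budgetWeights);
```

Renormalizing keeps scores comparable across users even though the relative emphasis changes.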

6. Schema Drift

Mistake: The intent schema diverges from the database schema over time.
Explanation: As new features are added to the database, the extraction model may not recognize them, leading to missing data.
Fix: Maintain strict versioning between the intent schema and database schema. Retrain the extraction model whenever the schema changes.
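Part of this drift can be caught mechanically. In the sketch below, typing the field-to-column map as Record<keyof TravelIntent, string> makes the compiler reject any intent field that lacks a mapping, and a runtime check compares the map against the live column list. The column names sim_slots, supports_5g, and voip_allowed are hypothetical, as are checkSchemaAlignment and the ignore list.

```typescript
interface TravelIntent {
  targetRegions: string[];
  stayLength: number;
  dataAllowanceGB?: number;
  priceCeiling?: number;
  requiresTethering: boolean;
  simSlots: number;
  needs5G: boolean;
  voipAllowed: boolean;
}

// Compile-time guard: adding a field to TravelIntent without mapping it here is a type error.
const INTENT_TO_COLUMN: Record<keyof TravelIntent, string> = {
  targetRegions: "regions",
  stayLength: "validity_days",
  dataAllowanceGB: "data_gb",
  priceCeiling: "price",
  requiresTethering: "tethering",
  simSlots: "sim_slots", // hypothetical column name
  needs5G: "supports_5g", // hypothetical column name
  voipAllowed: "voip_allowed", // hypothetical column name
};

// Runtime guard: compares the mapping against the columns the database reports.
function checkSchemaAlignment(dbColumns: string[], ignoredColumns: string[] = []) {
  const mapped = new Set(Object.values(INTENT_TO_COLUMN));
  const present = new Set(dbColumns);
  const ignored = new Set(ignoredColumns);
  return {
    // Intent fields that point at columns the database no longer has.
    missingInDb: Array.from(mapped).filter((c) => !present.has(c)),
    // Database columns the extraction model knows nothing about.
    unmappedInIntent: dbColumns.filter((c) => !mapped.has(c) && !ignored.has(c)),
  };
}

const drift = checkSchemaAlignment(
  ["plan_id", "provider", "price", "data_gb", "validity_days", "tethering", "regions", "activation_minutes"],
  ["plan_id", "provider"],
);
console.log(drift);
```

A non-empty unmappedInIntent list is a cue to extend the intent schema and schedule retraining; a non-empty missingInDb list means queries built from the intent would fail.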

7. Ignoring Latency Budgets

Mistake: Designing a pipeline that exceeds the user's patience threshold.
Explanation: Conversational interfaces require near-instant feedback. Latency >200ms breaks the flow.
Fix: Profile each stage of the pipeline. Optimize model inference, use connection pooling for database queries, and implement caching for frequent intents.
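Caching frequent intents can be sketched with a small in-memory cache keyed by the normalized query text. The version below combines least-recently-used eviction with a time-to-live; the class name IntentCache, the normalization rule, and the sizes used in the example are assumptions, and a multi-instance deployment would more likely use a shared cache.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class IntentCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // Treat queries that differ only in case or spacing as the same key.
  private static normalize(query: string): string {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(query: string, now: number = Date.now()): V | undefined {
    const key = IntentCache.normalize(query);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert to mark as most recently used (Map preserves insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(query: string, value: V, now: number = Date.now()): void {
    const key = IntentCache.normalize(query);
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

const intentCache = new IntentCache<string>(2, 1000);
intentCache.set("Japan 10 days", "intent-A", 0);
intentCache.set("France weekend", "intent-B", 0);
```

The explicit now parameter exists so expiry can be tested deterministically; production callers would rely on the Date.now() default.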

Production Bundle

Action Checklist

Define Intent Schema: Create a strict TypeScript interface mapping user intent to database fields.
Curate Training Data: Label 10,000+ examples covering common queries, edge cases, and ambiguities.
Fine-Tune Extraction Model: Train a small model (e.g., 3B) on the labeled dataset for low-latency inference.
Implement Coarse Filter: Build SQL queries to handle hard constraints deterministically.
Develop Composite Scorer: Create a ranking function that weights price, relevance, and quality based on intent.
Add Ambiguity Handling: Implement confidence scoring and clarification flows for low-confidence extractions.
Build Explanation Layer: Use an LLM to generate trade-off explanations for top-ranked results.
Monitor Performance: Track latency, extraction accuracy, and user engagement metrics.
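For the latency part of monitoring, the p95 figure used throughout this article can be computed from raw samples. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; the function name percentile and the sample values are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Returns NaN for an empty input.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? NaN;
}

// Hypothetical extraction latencies in milliseconds.
const latenciesMs = [100, 20, 30, 40, 50, 60, 70, 80, 90, 10];
const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
console.log({ p50, p95 });
```

Tracking p95 rather than the mean surfaces the slow tail that users actually notice in a conversational interface.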

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Real-Time Chat Interface	Fine-Tuned 3B + SQL Filter	Keeps responses within the 200ms budget	Low
Batch Analysis / Reporting	General LLM (e.g., GPT-4)	Accuracy over speed; no latency constraint	High
Niche Domain with Limited Data	RAG + General LLM	Fewer training samples needed; leverages context	Medium
High-Volume E-Commerce	Fine-Tuned Small Model + Vector Hybrid	Balance of cost, speed, and semantic search	Low

Configuration Template

// retrieval.config.ts
export const RetrievalConfig = {
  extraction: {
    model: "fine-tuned-3b-travel-intent",
    latencyBudgetMs: 50,
    confidenceThreshold: 0.85,
    maxAmbiguities: 2,
  },
  filtering: {
    maxCandidates: 500,
    hardConstraints: ["regions", "validity", "price", "tethering"],
  },
  scoring: {
    weights: {
      priceEfficiency: 0.4,
      validityAlignment: 0.2,
      featureMatch: 0.25,
      reliability: 0.15,
    },
    topN: 5,
  },
  explanation: {
    model: "lightweight-llm-explainer",
    tone: "helpful",
    maxTokens: 300,
  },
};
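Because the scoring weights are expressed as fractions, it is worth checking at startup that they are non-negative and sum to 1. The validator below is a sketch; the name validateWeights and the tolerance value are assumptions, and the sample object repeats the weights from the template.

```typescript
// Returns a list of human-readable problems; an empty list means the weights are usable.
function validateWeights(weights: Record<string, number>, tolerance = 1e-9): string[] {
  const problems: string[] = [];
  let total = 0;
  for (const [name, value] of Object.entries(weights)) {
    if (!Number.isFinite(value) || value < 0) {
      problems.push(`${name} must be a non-negative number`);
    } else {
      total += value;
    }
  }
  // Compare with a tolerance: floating-point sums rarely land on exactly 1.
  if (Math.abs(total - 1) > tolerance) {
    problems.push(`weights sum to ${total}, expected 1`);
  }
  return problems;
}

const templateProblems = validateWeights({
  priceEfficiency: 0.4,
  validityAlignment: 0.2,
  featureMatch: 0.25,
  reliability: 0.15,
});
console.log(templateProblems);
```

Failing fast on a bad configuration is cheaper than debugging a ranking that quietly favors one factor.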

Quick Start Guide

Initialize Schema: Define your TravelIntent interface and ensure it aligns with your database schema.
Deploy Extractor: Set up the fine-tuned model endpoint and integrate the IntentExtractor class into your API.
Build Filter Query: Write the SQL query for coarse filtering based on the intent fields.
Implement Scorer: Add the composite scoring function to rank filtered results.
Test End-to-End: Validate the pipeline with sample queries, checking latency, accuracy, and explanation quality.
