ngClient: EmbeddingClient,
  // MongoClient['db'] is the type of the db() method itself; its return type is the Db instance.
  db: ReturnType<MongoClient['db']>
): Promise<VectorCandidate[]> {
  // Embed the query with the same model settings used for the stored documents.
  const queryEmbedding = await embeddingClient.embedQuery(queryText, {
    model: 'voyage-large-2',
    dimensions: 2048,
    inputType: 'query'
  });
  const collection = db.collection('content_documents');
  const cursor = collection.aggregate([
    {
      $vectorSearch: {
        index: 'vector_semantic_index',
        path: 'content_embedding',
        queryVector: queryEmbedding,
        numCandidates: 200,
        limit: 25
      }
    },
    {
      // Keep only the document id and its vector similarity score.
      $project: {
        _id: 1,
        similarityScore: { $meta: 'vectorSearchScore' }
      }
    }
  ]);
  return cursor.toArray();
}
```
**Architecture Rationale:** We cap the initial vector retrieval at 25 documents. This aligns with typical pagination sizes and ensures the lexical engine only processes a manageable number of boost clauses. The `numCandidates` parameter is set higher (200) to provide the vector index sufficient scope for approximate nearest neighbor (ANN) traversal while keeping the final result set tight.
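
The relationship between `numCandidates` and `limit` can be guarded with a small pre-flight check. This is a minimal sketch: `checkVectorSearchParams` is a hypothetical helper, and the 10,000 ceiling is an assumption based on the Atlas Vector Search documentation at the time of writing, so it is worth verifying against your deployment.

```typescript
// Hypothetical pre-flight check for $vectorSearch parameters.
// Returns a list of human-readable problems; an empty list means the values look sane.
function checkVectorSearchParams(numCandidates: number, limit: number): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(limit) || limit < 1) {
    problems.push('limit must be a positive integer');
  }
  if (numCandidates < limit) {
    problems.push('numCandidates must be at least as large as limit');
  }
  // Assumed ceiling: numCandidates is documented as <= 10000.
  if (numCandidates > 10000) {
    problems.push('numCandidates exceeds the assumed 10000 ceiling');
  }
  return problems;
}

// The values used in the pipeline above.
console.log(checkVectorSearchParams(200, 25));
```

Running the check at configuration load time, rather than per query, keeps it off the hot path.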
### Phase 2: Score Normalization & Boost Clause Construction
Vector similarity scores (typically dot product or cosine similarity) operate on a different scale than lexical TF-IDF/BM25 scores. Direct injection would either drown out keyword matches or fail to elevate semantic candidates. We apply a configurable multiplier to align the scales.
```typescript
function buildSemanticBoostClauses(
  candidates: VectorCandidate[],
  boostMultiplier: number = 8.5
): object[] {
  return candidates.map(candidate => ({
    equals: {
      path: '_id',
      value: candidate.id,
      score: {
        boost: {
          value: candidate.similarityScore * boostMultiplier
        }
      }
    }
  }));
}
```
**Architecture Rationale:** The multiplier is not arbitrary; it must be calibrated against your lexical baseline scores. In practice, vector scores often range between 0.6 and 0.95 for dot product. Multiplying by 8.5 pushes these into roughly the 5-8 range, which typically competes effectively with strong BM25 matches without dominating them. This value should be treated as a tunable hyperparameter, not a hardcoded constant.
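
To make the arithmetic concrete, the sketch below scales a few illustrative similarity scores by the default multiplier. The sample scores and the `scaleScore` helper are assumptions chosen for illustration, not measurements from a real index.

```typescript
// Hypothetical helper: scale a raw vector similarity score into the lexical score range.
function scaleScore(similarity: number, boostMultiplier: number = 8.5): number {
  return similarity * boostMultiplier;
}

// Illustrative similarity scores spanning the 0.6-0.95 range mentioned above.
const sampleScores = [0.6, 0.72, 0.95];
const boosted = sampleScores.map(s => scaleScore(s));

// Print each raw score next to its boosted value, rounded to two decimals.
sampleScores.forEach((s, i) => console.log(`${s} -> ${boosted[i].toFixed(2)}`));
```

Comparing these boosted values with the `searchScore` values your lexical queries actually produce is the quickest way to see whether 8.5 is in the right neighbourhood for your corpus.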
### Phase 3: Lexical Execution with Injected Signals
The final pipeline combines standard text operators with the semantic boost clauses inside a `compound.should` structure. This ensures documents matching either lexical terms or semantic candidates are returned, with boosted scores applied where overlaps occur.
```typescript
async function executeSemanticInjectionSearch(
  queryText: string,
  boostClauses: object[],
  // MongoClient['db'] is the db() method type; its return type is the Db instance.
  db: ReturnType<MongoClient['db']>
) {
  const collection = db.collection('content_documents');
  // Lexical clauses: title and summary matches are weighted above body matches.
  const lexicalOperators = [
    {
      text: {
        query: queryText,
        path: ['title', 'summary'],
        score: { boost: { value: 2.0 } }
      }
    },
    {
      text: {
        query: queryText,
        path: ['body_content']
      }
    }
  ];
  const combinedShouldClauses = [...lexicalOperators, ...boostClauses];
  const searchPipeline = [
    {
      $search: {
        index: 'lexical_fulltext_index',
        // $search accepts a single operator or collector, so the compound
        // query is supplied once, as the operator of the facet collector.
        facet: {
          operator: {
            compound: { should: combinedShouldClauses }
          },
          facets: {
            category: { type: 'string', path: 'tags' },
            // Numeric facets require bucket boundaries; these values are illustrative.
            publication_year: {
              type: 'number',
              path: 'year',
              boundaries: [2000, 2010, 2020, 2030]
            }
          }
        },
        highlight: {
          path: ['title', 'summary', 'body_content']
        }
      }
    },
    {
      $project: {
        title: 1,
        summary: 1,
        tags: 1,
        year: 1,
        searchScore: { $meta: 'searchScore' },
        highlights: { $meta: 'searchHighlights' }
      }
    },
    { $limit: 25 },
    {
      // Return the documents and the facet metadata in one payload.
      $facet: {
        results: [],
        metadata: [
          { $replaceWith: '$$SEARCH_META' },
          { $limit: 1 }
        ]
      }
    },
    {
      $set: {
        'metadata': { $arrayElemAt: ['$metadata', 0] }
      }
    }
  ];
  const [result] = await collection.aggregate(searchPipeline).toArray();
  return result;
}
```
**Architecture Rationale:**
- The `compound.should` structure ensures OR logic: a document ranks if it matches lexical terms, semantic candidates, or both.
- Faceting wraps the same `compound` operator to guarantee facet counts align with the actual result set, not just lexical matches.
- Highlighting is applied to multiple fields to provide context regardless of which clause triggered the match.
- The `$facet` stage consolidates results and search metadata into a single payload, simplifying client-side parsing.
## Pitfall Guide
### 1. Score Scale Mismatch
**Explanation:** Vector similarity scores and lexical BM25 scores operate on fundamentally different mathematical scales. Injecting raw vector scores without normalization causes either semantic candidates to be ignored (if scores are too low) or lexical precision to be overridden (if scores are too high).

**Fix:** Implement a dynamic multiplier calibrated against your dataset's lexical baseline. Run A/B tests with multipliers ranging from 5.0 to 15.0 and measure precision@10 across query categories.
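
As an offline complement to A/B testing, precision@k can be computed against a small set of relevance judgments. The sketch below is one way to do that; `bestMultiplier` and the `runSearch` callback are hypothetical names standing in for whatever executes your pipeline with a given multiplier and returns ranked document ids.

```typescript
// Fraction of the top-k positions occupied by documents judged relevant.
function precisionAtK(rankedIds: string[], relevantIds: Set<string>, k: number = 10): number {
  const hits = rankedIds.slice(0, k).filter(id => relevantIds.has(id)).length;
  return hits / k;
}

interface JudgedQuery {
  text: string;
  relevant: Set<string>;
}

// Hypothetical sweep: returns the multiplier with the highest mean precision@k.
// `runSearch` is an assumed callback, not part of any library.
function bestMultiplier(
  multipliers: number[],
  queries: JudgedQuery[],
  runSearch: (text: string, multiplier: number) => string[],
  k: number = 10
): { multiplier: number; meanPrecision: number } {
  let best = { multiplier: multipliers[0], meanPrecision: -1 };
  for (const m of multipliers) {
    const total = queries.reduce(
      (sum, q) => sum + precisionAtK(runSearch(q.text, m), q.relevant, k),
      0
    );
    const mean = queries.length > 0 ? total / queries.length : 0;
    if (mean > best.meanPrecision) {
      best = { multiplier: m, meanPrecision: mean };
    }
  }
  return best;
}
```

Running the sweep separately per query category, rather than over the whole corpus at once, shows whether a single multiplier is adequate or whether categories pull in different directions.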
### 2. Over-Boosting Semantic Candidates
**Explanation:** Applying a uniform multiplier to all vector results can cause marginally relevant semantic matches to outrank highly relevant lexical matches, especially for short or ambiguous queries.

**Fix:** Apply a logarithmic or threshold-based scaling function. Only inject boosts for candidates exceeding a minimum similarity threshold (e.g., score > 0.72). Discard low-confidence vector matches entirely.
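
One possible threshold-based scaling function is sketched below. It discards candidates at or below the threshold and linearly rescales the rest, so a score just above the threshold earns almost no boost. The `ScoredCandidate` shape and the `thresholdScaledBoosts` name are assumptions for illustration, and the rescaling assumes similarity scores are bounded above by 1.0.

```typescript
// Assumed candidate shape for this sketch.
interface ScoredCandidate {
  id: string;
  similarityScore: number;
}

interface BoostedCandidate {
  id: string;
  boost: number;
}

// Drops candidates at or below minSimilarity, then maps the remaining scores
// from (minSimilarity, 1.0] onto (0, boostMultiplier].
function thresholdScaledBoosts(
  candidates: ScoredCandidate[],
  minSimilarity: number = 0.72,
  boostMultiplier: number = 8.5
): BoostedCandidate[] {
  return candidates
    .filter(c => c.similarityScore > minSimilarity)
    .map(c => {
      // Clamp to 1 in case a score slightly exceeds the assumed upper bound.
      const normalized = Math.min(1, (c.similarityScore - minSimilarity) / (1 - minSimilarity));
      return { id: c.id, boost: normalized * boostMultiplier };
    });
}
```

Compared with a flat multiplier, this spreads the surviving candidates across the whole boost range instead of clustering them near the top, which makes the ordering among semantic candidates matter more.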
### 3. Analyzer Misalignment
**Explanation:** The lexical index analyzer determines tokenization, stemming, and stop-word removal. If the analyzer strips critical query terms that the vector search captured, the lexical phase will fail to match even when semantic signals are present.

**Fix:** Use `lucene.english` or a custom analyzer that preserves domain-specific terminology. Test analyzer behavior against your query corpus using the `$searchMeta` stage to verify token generation.
### 4. Score Details Enabled in Production
**Explanation:** Enabling `scoreDetails: true` in production pipelines forces the engine to compute and return granular scoring breakdowns for every clause. This adds significant CPU overhead and increases payload size.

**Fix:** Disable score details in production. Reserve them for debugging or offline evaluation pipelines. Use application-level logging to track boost effectiveness without impacting query latency.
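
A simple way to keep score details out of production is to add the option only behind an explicit debug flag. The sketch below assumes the `$search` stage body is built as a plain object; `withScoreDetails` is a hypothetical helper name.

```typescript
// Plain-object representation of a $search stage body for this sketch.
type SearchStageBody = Record<string, unknown>;

// Returns a copy of the stage body, adding scoreDetails only when debug is true.
function withScoreDetails(searchStage: SearchStageBody, debug: boolean): SearchStageBody {
  return debug ? { ...searchStage, scoreDetails: true } : { ...searchStage };
}

const baseStage: SearchStageBody = {
  index: 'lexical_fulltext_index',
  text: { query: 'example', path: 'title' }
};

const productionStage = withScoreDetails(baseStage, false); // no scoreDetails key
const debugStage = withScoreDetails(baseStage, true); // scoreDetails: true
```

When the flag is on, the breakdown is read in a later `$project` stage via `{ $meta: 'searchScoreDetails' }`.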
### 5. Empty Vector Candidate Pool
**Explanation:** If the vector index returns zero results (due to embedding model mismatch, index misconfiguration, or highly out-of-distribution queries), the boost clause array becomes empty. The lexical search still runs, but developers sometimes incorrectly assume the pipeline failed.

**Fix:** Implement explicit fallback logic. Log vector pool size separately from lexical results. Consider routing out-of-distribution queries to a dedicated semantic-only pipeline if lexical recall drops below a threshold.
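
The fallback decision can be made explicit with a small routing function. This is a sketch under stated assumptions: the mode names, the `chooseSearchMode` helper, and the use of lexical hit count as a stand-in for lexical recall are all illustrative choices, not part of the pipeline above.

```typescript
type SearchMode = 'hybrid' | 'lexical-only' | 'semantic-only';

// Decides how to serve a query from the two pool sizes.
// minLexicalHits is a hypothetical threshold; tune it for your corpus.
function chooseSearchMode(
  vectorPoolSize: number,
  lexicalHitCount: number,
  minLexicalHits: number = 3,
  log: (message: string) => void = console.warn
): SearchMode {
  if (vectorPoolSize === 0) {
    // An empty pool is not a failure: the lexical search still runs without boosts.
    log('vector pool empty; serving lexical-only results');
    return 'lexical-only';
  }
  if (lexicalHitCount < minLexicalHits) {
    log(`only ${lexicalHitCount} lexical hits; routing to semantic-only pipeline`);
    return 'semantic-only';
  }
  return 'hybrid';
}
```

Passing the logger as a parameter keeps the function easy to test and lets the caller attach the vector pool size to whatever structured logging is already in place.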
### 6. Facet Path Configuration Errors
**Explanation:** Faceting requires explicit field type definitions in the search index. If a field is dynamically mapped as text but faceted as a string, the engine will either fail or return incorrect counts due to tokenization.

**Fix:** Explicitly define facetable fields in the index mapping. Use `type: 'token'` for categorical fields in the index definition, and `type: 'string'` in the query's facet definition for exact-match faceting. Verify facet counts against known document distributions.
### 7. Hardcoded Boost Multipliers
**Explanation:** Treating the boost multiplier as a static configuration value ignores query intent variance. Technical queries benefit from lower semantic weighting, while exploratory queries require higher semantic influence.

**Fix:** Implement query classification routing. Use a lightweight intent classifier to dynamically adjust the multiplier based on query structure (e.g., question marks, technical jargon density, or length).
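
A rule-based heuristic is often enough as a first intent classifier. The sketch below is deliberately simple: the regular expression, the word-count cutoff, and the three multiplier values (5, 8.5, 12) are hypothetical starting points within the 5.0 to 15.0 range discussed earlier, not calibrated numbers.

```typescript
// Picks a boost multiplier from surface features of the query text.
function chooseBoostMultiplier(queryText: string): number {
  const trimmed = queryText.trim();
  const wordCount = trimmed.length === 0 ? 0 : trimmed.split(/\s+/).length;

  // Identifiers, error codes, and version numbers suggest an exact-match intent.
  const looksTechnical = /[_\d]|[a-z][A-Z]/.test(trimmed);
  if (looksTechnical) {
    return 5.0; // lower semantic weight
  }

  // Questions and long natural-language queries suggest exploratory intent.
  const looksExploratory = trimmed.endsWith('?') || wordCount >= 6;
  if (looksExploratory) {
    return 12.0; // higher semantic weight
  }

  return 8.5; // default multiplier
}
```

The technical check runs first so that a question containing an error code is still treated as an exact-match query.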
## Production Bundle
### Action Checklist
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Exploratory/Abstract Queries | Semantic Injection (High Multiplier) | Vector signals capture intent better than keyword matching | Medium (2x pipeline execution) |
| Exact Match/Technical Queries | Semantic Injection (Low Multiplier) | Lexical precision dominates; semantic signals act as tiebreakers | Medium (2x pipeline execution) |
| High-Frequency Production Traffic | RRF with Caching | Post-hoc fusion allows aggressive result caching; lower CPU per query | Low (cache hit rate dependent) |
| Strict Latency Budget (<50ms) | Pure Lexical with Synonym Dictionary | Single pipeline execution; synonym expansion handles conceptual gaps | Low (index build cost only) |
| Dynamic User Behavior Signals | Semantic Injection with Click Signals | Injection pattern natively supports external signal weighting | Medium-High (signal pipeline complexity) |
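
For comparison with the injection pattern, the RRF (reciprocal rank fusion) row in the matrix refers to fusing two independently ranked lists after the fact. A minimal sketch follows, assuming each ranking is an array of document ids in rank order; the constant `k = 60` is the value conventionally used for RRF, and the function name is illustrative.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// where rank starts at 1. Documents are returned in descending fused score.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score
  );
}

// Example: one vector ranking and one lexical ranking over the same documents.
const fused = reciprocalRankFusion([
  ['a', 'b', 'c'],
  ['b', 'c', 'a']
]);
console.log(fused.map(entry => entry.id));
```

Because the two input rankings are computed independently, each can be cached on its own, which is the property the matrix row relies on.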
### Configuration Template
```typescript
// MongoDB Atlas Search Index Definitions
const vectorIndexDefinition = {
  fields: [
    {
      type: 'vector',
      path: 'content_embedding',
      numDimensions: 2048,
      similarity: 'dotProduct'
    }
  ]
};
const lexicalIndexDefinition = {
  mappings: {
    dynamic: true,
    fields: {
      tags: { type: 'token' },
      year: { type: 'number' }
    }
  },
  analyzer: 'lucene.english'
};
// Runtime Configuration
interface SemanticInjectionConfig {
  vectorLimit: number;
  vectorCandidates: number;
  boostMultiplier: number;
  minSimilarityThreshold: number;
  enableHighlighting: boolean;
  facetableFields: string[];
}
const productionConfig: SemanticInjectionConfig = {
  vectorLimit: 25,
  vectorCandidates: 200,
  boostMultiplier: 8.5,
  minSimilarityThreshold: 0.72,
  enableHighlighting: true,
  facetableFields: ['tags', 'year']
};
```
### Quick Start Guide
- Initialize Embedding Pipeline: Configure your embedding service to generate 2048-dimensional vectors using `voyage-large-2` or equivalent. Ensure query and document embeddings use identical model parameters.
- Create Dual Indexes: Deploy a vector index on the embedding field and a lexical index with the `lucene.english` analyzer. Explicitly map facetable fields to prevent tokenization conflicts.
- Implement Two-Shot Handler: Build the candidate generation function, apply threshold filtering, construct boost clauses with a calibrated multiplier, and execute the combined `$search` pipeline.
- Validate with Test Queries: Run abstract and exact-match queries through the pipeline. Verify that semantic candidates appear in results, highlights render correctly, and facet counts align with expected distributions.
- Monitor & Tune: Track vector pool size, average similarity scores, and lexical match rates. Adjust the boost multiplier and threshold values based on precision@10 metrics across your query corpus.
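
The monitoring step can start from a per-query summary of the vector pool. The sketch below is one possible shape for that summary; `summarizeVectorPool` and the `PoolSummary` fields are hypothetical names, and how the summary is shipped to a metrics backend is left open.

```typescript
interface PoolSummary {
  poolSize: number;
  avgSimilarity: number | null;
  minSimilarity: number | null;
  maxSimilarity: number | null;
}

// Summarizes the similarity scores returned by the vector phase for one query.
// An empty pool yields nulls rather than NaN so downstream dashboards stay clean.
function summarizeVectorPool(similarityScores: number[]): PoolSummary {
  if (similarityScores.length === 0) {
    return { poolSize: 0, avgSimilarity: null, minSimilarity: null, maxSimilarity: null };
  }
  const total = similarityScores.reduce((sum, s) => sum + s, 0);
  return {
    poolSize: similarityScores.length,
    avgSimilarity: total / similarityScores.length,
    minSimilarity: Math.min(...similarityScores),
    maxSimilarity: Math.max(...similarityScores)
  };
}

console.log(summarizeVectorPool([0.8, 0.9, 0.7]));
```

Tracking the minimum alongside the average makes it easier to see when the similarity threshold is discarding most of the pool.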