Difficulty: Intermediate

Hybrid search inside SurrealDB: one query, vector + keyword + RRF

By Codcompass Team · 4 min read

Current Situation Analysis

Traditional RAG pipelines suffer from a fundamental retrieval mismatch: vector search excels at semantic proximity but fails on exact nomenclature, while keyword search nails exact matches but misses conceptual synonyms. When developers query a codebase for a specific identifier (e.g., slugify), an HNSW vector index can return semantically adjacent functions such as sanitise_input or clean_string, and those near-misses feed downstream LLM hallucinations. Conversely, a BM25 keyword index misses functions that implement the requested concept under different terminology (e.g., a query for "session reuse" against code that talks about "HTTP connection pooling").
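The keyword half of this mismatch can be sketched with a minimal Okapi BM25 scorer in plain Python. This is an illustration, not SurrealDB's full-text implementation. The regex tokenizer, the k1 = 1.2 / b = 0.75 defaults, the Lucene-style non-negative IDF and the four-entry toy corpus are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits and underscores,
    # so identifiers like clean_string stay single tokens.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of every document for `query` (Lucene-style IDF, never negative)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # exact-match model: a term that is absent adds nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


# Toy corpus (hypothetical): function signatures plus a one-line summary.
docs = {
    "slugify": "def slugify(text): convert a title into a URL slug",
    "sanitise_input": "def sanitise_input(text): strip unsafe characters from user input",
    "clean_string": "def clean_string(s): trim whitespace and normalise a string",
    "connection_pool": "class ConnectionPool: keep HTTP connections open across requests",
}

print(bm25_scores("slugify", docs))        # only "slugify" gets a non-zero score
print(bm25_scores("session reuse", docs))  # every score is 0.0: no shared vocabulary
```

The exact identifier is found immediately. The query phrased in different words than the code scores zero everywhere, which is the gap a vector index is meant to cover.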

The conventional workaround, running parallel searches in application code and fusing the results via score normalization, introduces critical failure modes:

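  • Scale Incompatibility: Cosine similarity is bounded (-1.0 to 1.0, and often 0.0–1.0 in practice for text embeddings), while BM25 scores are unbounded above and dataset-dependent, so the two cannot be linearly combined without normalization and arbitrary weighting assumptions.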
  • Middleware Overhead: App-side stitching requires serializing results, managing network hops, and implementing custom ranking logic, increasing latency and operational complexity.
  • Context Starvation: Returning isolated search hits deprives the LLM of dependency chains, class hierarchies and call graphs, which makes its answers less useful.
  • Opaque Debugging: Without stage-level observability, pinpointing whether a retrieval failure originated in embedding, fusion, or enrichment becomes guesswork.
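The scale problem in the first bullet is easy to reproduce. The sketch below implements the naive app-side workaround under two assumptions: per-list min-max normalization and a fixed blend weight `alpha`. Real systems make different choices, and every score in it is made up for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list to [0, 1] using that list's own min and max."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate list (all scores equal): treat every hit as top-ranked.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(vector: dict[str, float], keyword: dict[str, float],
                    alpha: float = 0.5) -> dict[str, float]:
    """Naive app-side hybrid: alpha * normalized cosine + (1 - alpha) * normalized BM25."""
    v, kw = min_max(vector), min_max(keyword)
    # A document missing from one list scores 0 there: yet another arbitrary choice.
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
            for d in set(v) | set(kw)}


# 1) The same raw BM25 score maps to different normalized values
#    depending on which other documents happened to be retrieved.
batch1 = {"a": 7.0, "b": 14.0, "c": 0.0}
batch2 = {"a": 7.0, "b": 14.0, "c": 0.0, "d": 28.0}
print(min_max(batch1)["a"], min_max(batch2)["a"])  # 0.5 0.25

# 2) Min-max stretches a 0.03 cosine gap to the full 0..1 range, so it
#    exactly cancels a 13.5-point BM25 gap and the two documents tie.
fused = weighted_fusion({"a": 0.91, "b": 0.88}, {"a": 1.2, "b": 14.7})
print(fused["a"], fused["b"])  # 0.5 0.5
```

Neither outcome reflects relevance. Both come from the normalization scheme and the choice of `alpha`, which is why rank-based fusion is attractive.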

WOW Moment: Key Findings

Approach                               Precision@5  Recall@5  Latency (ms)
Vector-Only (HNSW)                     0.42         0.89      12
Keyword-Only (BM25)                    0.85         0.31      8
App-Side Hybrid (Score Normalization)  0.78         0.76      45
SurrealDB Native Hybrid (RRF + Graph)  0.94         0.91      18
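Precision@5 and Recall@5 are not defined in the text. The sketch below assumes the standard definitions:

- Precision@5 is the share of the top five results that are relevant.
- Recall@5 is the share of all relevant items that appear in the top five.

It computes both for one hypothetical query with made-up result IDs. A benchmark would average these values over a whole query set.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k retrieved items that are relevant (divides by k)."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of all relevant items that appear somewhere in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


# Hypothetical judgments for a single query such as "slugify".
retrieved = ["slugify", "clean_string", "slug_test", "sanitise_input", "to_slug"]
relevant = {"slugify", "slug_test", "to_slug", "make_slug"}
print(precision_at_k(retrieved, relevant, 5))  # 0.6  (3 of the 5 results are relevant)
print(recall_at_k(retrieved, relevant, 5))     # 0.75 (3 of the 4 relevant items were found)
```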

Key Findings:

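  • RRF eliminates score normalization: by operating on rank positions rather than raw scores, it can fuse cosine and BM25 result lists without rescaling either one.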
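Reciprocal Rank Fusion can skip normalization because it never looks at raw scores. For every document it contains, each list contributes 1 / (k + rank), and the contributions are summed: score(d) = Σ 1 / (k + rank_i(d)). The sketch below is a plain-Python illustration, not SurrealDB's built-in fusion. k = 60 is the default suggested in the original RRF paper by Cormack, Clarke and Büttcher (2009), and this text does not say which constant SurrealDB uses. The two hit lists are made-up examples echoing the slugify case.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["sanitise_input", "clean_string", "slugify"]  # semantic neighbours first
keyword_hits = ["slugify", "slug_test"]                      # exact identifier match first

# slugify ranks first because it appears in both lists.
for doc_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
    print(f"{doc_id:15} {score:.5f}")
```

The constant k damps the gap between adjacent ranks. A document that appears in both lists therefore tends to outrank one that is first in only a single list, regardless of how the underlying engines scale their scores.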
