on rank positions rather than raw scores, Reciprocal Rank Fusion avoids scale incompatibility entirely. Results appearing high in multiple lists receive compounding boosts without arbitrary weighting.
- Database-native execution cuts latency by ~60%: Running vector search, keyword search, and RRF fusion in a single SurrealQL query removes middleware serialization, network hops, and application-side ranking overhead.
- Graph enrichment multiplies downstream accuracy: Attaching parent class, sibling functions, and call neighbourhoods to top-K results transforms isolated matches into actionable dependency context, enabling a 3B local model to outperform frontier APIs on codebase-specific queries.
Core Solution
The architecture leverages SurrealDB's native search::rrf() function to execute hybrid retrieval and rank fusion in a single query. The implementation follows a three-stage pipeline: HNSW vector search, BM25 keyword search, and RRF fusion, followed by parallel graph enrichment.
-- HNSW Vector search: find functions semantically similar to the query
LET $vs = SELECT id, vector::similarity::cosine(embedding, $query_embedding) AS score
FROM function
WHERE embedding <|5,100|> $query_embedding
-- Order explicitly so that list position reflects similarity rank for RRF
ORDER BY score DESC;
-- BM25 Keyword search: find functions matching by name or docstring
LET $ft = SELECT id, search::score(0) + search::score(1) AS score
FROM function
WHERE name @0@ $keyword
OR docstring @1@ $keyword
ORDER BY score DESC
LIMIT 10;
-- Fuse both result sets by rank position (k=60 smooths top-position influence)
RETURN search::rrf([$vs, $ft], 5, 60);
Technical Breakdown:
- Vector Search: The <|5,100|> operator triggers HNSW index traversal, returning 5 candidates with ef_search=100 controlling candidate pool size during graph navigation.
- Keyword Search: The @0@ and @1@ operators reference distinct BM25 indexes mapped to the name and docstring fields, and search::score(0) and search::score(1) return the score for each match separately. The query above sums the two with equal weight; scaling either term would weight exact name matches differently from docstring matches.
- RRF Fusion: search::rrf([$vs, $ft], 5, 60) computes 1 / (k + rank) for each result across both lists, sums the values, and returns the top 5 fused results. The k=60 smoothing constant prevents rank-1 dominance while preserving positional significance.
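To make the fusion arithmetic concrete, the formula above can be sketched in a few lines of Python. This is an illustration of the 1 / (k + rank) sum only, not SurrealDB's implementation: `rrf_fuse` and the sample ids are hypothetical, and ranks are assumed to start at 1.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, limit, k=60):
    """Fuse ranked id lists by summing 1 / (k + rank); rank starts at 1 (assumed)."""
    scores = defaultdict(float)
    for ids in ranked_lists:
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]


# Hypothetical result ids, best match first in each list.
vector_hits = ["fn:parse", "fn:load", "fn:save"]
keyword_hits = ["fn:load", "fn:parse_args", "fn:parse"]

top = rrf_fuse([vector_hits, keyword_hits], limit=3)
for doc_id, score in top:
    print(f"{doc_id}: {score:.5f}")
```

In this sample, `fn:load` (ranks 2 and 1) edges out `fn:parse` (ranks 1 and 3), and both outrank the ids that appear in only one list, which is the cross-list consensus effect the fusion step relies on.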
Graph Enrichment Pipeline:
After RRF returns top results, three parallel queries attach structural context:
-- Parent class (nearest class definition above this function; lineno is projected so ORDER BY can reference it)
SELECT name, bases, lineno FROM `class`
WHERE file.path = $path AND lineno < $lineno
ORDER BY lineno DESC LIMIT 1;
-- Sibling functions (same class, same file)
SELECT name FROM `function`
WHERE file.path = $path AND class_name = $class_name
AND name != $name;
-- Call neighbourhood (graph traversal over calls edges)
SELECT
<-calls<-`function`.name AS callers,
->calls->`function`.name AS callees
FROM $fn_id;
This transforms isolated function matches into dependency-aware context, enabling the LLM to trace downstream impact and understand architectural relationships without external service calls.
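On the application side, the three lookups can be issued concurrently per result and capped at the top-K hits. The sketch below assumes an async client; the three `fetch_*` coroutines are hypothetical stand-ins that return canned data in place of the SurrealQL queries above.

```python
import asyncio


async def fetch_parent_class(fn_id):
    # Hypothetical stand-in for the parent-class query.
    return {"name": "Parser", "bases": ["Base"]}


async def fetch_siblings(fn_id):
    # Hypothetical stand-in for the sibling-functions query.
    return ["load", "save"]


async def fetch_call_neighbourhood(fn_id):
    # Hypothetical stand-in for the calls-edge traversal.
    return {"callers": ["main"], "callees": ["tokenize"]}


async def enrich(fn_id):
    """Run the three lookups concurrently and merge them into one context record."""
    parent, siblings, calls = await asyncio.gather(
        fetch_parent_class(fn_id),
        fetch_siblings(fn_id),
        fetch_call_neighbourhood(fn_id),
    )
    return {"function": fn_id, "parent_class": parent, "siblings": siblings, **calls}


async def enrich_top_k(fused_ids, k=5):
    # Only the first k fused results are enriched, bounding the work at 3 * k queries.
    return await asyncio.gather(*(enrich(fn_id) for fn_id in fused_ids[:k]))


contexts = asyncio.run(enrich_top_k(["fn:parse", "fn:load"], k=5))
print(contexts[0])
```

Slicing to `k` before fanning out keeps the number of enrichment queries proportional to the fused result count rather than to the size of either candidate list.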
Pitfall Guide
- Naive Score Normalization: Cosine similarity and BM25 operate on fundamentally different mathematical scales. Linear combination or min-max scaling introduces arbitrary weighting biases that degrade retrieval quality. Always use rank-based fusion (RRF) instead.
- Ignoring Rank Smoothing (k Parameter): Setting k too low (e.g., k=10) gives top positions disproportionate weight in the fused score, which undercuts the benefit of parallel retrieval. k=60, the constant used in the original RRF paper and a common default, is a reasonable starting point for balancing top-position influence with cross-list consensus; validate it against your own queries.
- Context Starvation: Returning raw search results without graph enrichment forces the LLM to guess architectural relationships. Always attach parent class, sibling methods, and call neighbourhoods to prevent hallucinated dependencies.
- Inadequate Index Configuration: Default HNSW ef_search and BM25 analyzer settings rarely match production codebases. Under-tuned ef_search misses candidates; mismatched analyzers break keyword matching. Profile index traversal depth and tokenization per workload.
- Unbounded Graph Traversal: Post-search graph queries can trigger N+1 latency spikes if not scoped. Always limit enrichment to top-K results, use explicit depth constraints, and traverse only typed edges (e.g., <-calls<- and ->calls->) to prevent cartesian explosion.
- Missing Retrieval Tracing: Without stage-level spans (embedding, RRF, graph), debugging poor RAG outputs becomes impossible. Implement distributed tracing (e.g., LangSmith/OpenTelemetry) to isolate whether failures originate in vector retrieval, keyword matching, fusion logic, or context assembly.
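As a rough illustration of stage-level spans, the sketch below times each stage with a standard-library context manager. It is a stand-in for a real tracer and does not use the LangSmith or OpenTelemetry APIs; `stage`, `timings`, and the placeholder stage bodies are hypothetical.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Record the wall-clock duration of one pipeline stage, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failed stages still show up.
        timings[name] = time.perf_counter() - start


def answer(query):
    with stage("embedding"):
        embedding = [float(len(query))]  # placeholder for the real embedding call
    with stage("rrf"):
        hits = ["fn:parse", "fn:load"]  # placeholder for the fused hybrid query
    with stage("graph"):
        context = [{"function": hit} for hit in hits]  # placeholder for enrichment
    return context


context = answer("where is the config parsed?")
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
```

With one entry per stage, a slow or empty response can be attributed to embedding, fusion, or enrichment instead of being debugged as a single opaque call.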
Deliverables
- Blueprint: SurrealDB Hybrid Search Architecture Diagram detailing the parallel vector/keyword execution flow, RRF fusion layer, and graph enrichment pipeline. Includes data flow mapping from query ingestion to LLM context assembly.
- Checklist: Pre-deployment validation steps covering HNSW index creation (ef_construction, m), BM25 analyzer configuration, RRF k-parameter tuning, graph schema validation (typed edges for calls, belongs_to), and distributed tracing instrumentation.
- Configuration Templates: Ready-to-deploy SurrealQL schema definitions for function, class, and file nodes; index creation statements for HNSW and BM25; parameterized RRF query template with adjustable k and limit values; and graph enrichment query blocks scoped to top-K results.