arch breadth during index building. Higher values produce higher-quality graphs but slow down ingestion.
- efSearch: Controls the search breadth at query time. This is the primary lever for tuning the recall-latency trade-off.
Implementation: TypeScript Vector Graph
The following implementation demonstrates a production-oriented HNSW structure. It emphasizes memory efficiency via Float32Array, enforces L2 normalization for dot-product optimization, and separates graph topology from vector storage.
/**
* Production-grade HNSW implementation focusing on memory efficiency
* and normalized vector optimization.
*/
interface VectorNode {
id: string;
embedding: Float32Array;
assignedLevel: number;
}
interface SearchResult {
id: string;
score: number;
}
class SemanticGraph {
// Adjacency list: Map<Level, Map<NodeID, Set<NeighborID>>>
private graph: Map<number, Map<string, Set<string>>>;
private store: Map<string, Float32Array>;
private m: number;
private efConstruction: number;
private maxLevel: number;
private entryPoint: string | null;
constructor(m: number = 16, efConstruction: number = 200) {
this.m = m;
this.efConstruction = efConstruction;
this.graph = new Map();
this.store = new Map();
this.maxLevel = 0;
this.entryPoint = null;
}
/**
* Ingests a vector. Assumes input is already L2-normalized.
* Best Practice: Normalize embeddings at the API gateway or ingestion pipeline
* to avoid redundant computation during search.
*/
public add(id: string, embedding: Float32Array): void {
if (this.store.has(id)) {
this.update(id, embedding);
return;
}
const level = this.assignLevel();
const node: VectorNode = { id, embedding, assignedLevel: level };
this.store.set(id, embedding);
this.ensureLayers(level);
// Connect to neighbors in each layer
for (let l = 0; l <= level; l++) {
const candidates = this.searchLayer(node.embedding, l, this.efConstruction);
const selectedNeighbors = this.selectNeighbors(candidates, this.m);
this.connectNode(id, selectedNeighbors, l);
}
if (this.entryPoint === null || level > this.maxLevel) {
this.entryPoint = id;
this.maxLevel = level;
}
}
/**
* Retrieves K nearest neighbors.
* @param queryVector Must be L2-normalized.
* @param k Number of results.
* @param efSearch Search breadth. Increase for higher recall.
*/
public search(queryVector: Float32Array, k: number, efSearch: number = 50): SearchResult[] {
if (!this.entryPoint) return [];
let currentCandidates = [{ id: this.entryPoint, score: this.innerProduct(queryVector, this.store.get(this.entryPoint)!) }];
// Descend from top layer to bottom
for (let l = this.maxLevel; l > 0; l--) {
currentCandidates = this.searchLayer(queryVector, l, 1, currentCandidates);
}
// Final search at layer 0 with expanded beam
const results = this.searchLayer(queryVector, 0, efSearch, currentCandidates);
return results
.sort((a, b) => b.score - a.score)
.slice(0, k);
}
private searchLayer(
query: Float32Array,
level: number,
ef: number,
entryPoints: SearchResult[] = []
): SearchResult[] {
const visited = new Set<string>();
const candidates = new Map<string, number>();
const queue = new Map<string, number>();
// Initialize with entry points
for (const ep of entryPoints) {
candidates.set(ep.id, ep.score);
queue.set(ep.id, ep.score);
visited.add(ep.id);
}
while (queue.size > 0) {
// Extract node with highest score (simulated priority queue)
let bestId = "";
let bestScore = -Infinity;
for (const [id, score] of queue) {
if (score > bestScore) {
bestScore = score;
bestId = id;
}
}
queue.delete(bestId);
if (bestScore < candidates.values().next().value && candidates.size >= ef) {
break;
}
const neighbors = this.graph.get(level)?.get(bestId) ?? [];
for (const neighborId of neighbors) {
if (!visited.has(neighborId)) {
visited.add(neighborId);
const neighborVec = this.store.get(neighborId)!;
const score = this.innerProduct(query, neighborVec);
candidates.set(neighborId, score);
queue.set(neighborId, score);
if (candidates.size > ef) {
// Remove lowest score to maintain ef size
let minId = "";
let minScore = Infinity;
for (const [cid, cscore] of candidates) {
if (cscore < minScore) {
minScore = cscore;
minId = cid;
}
}
candidates.delete(minId);
queue.delete(minId);
}
}
}
}
return Array.from(candidates.entries()).map(([id, score]) => ({ id, score }));
}
private innerProduct(a: Float32Array, b: Float32Array): number {
let sum = 0;
for (let i = 0; i < a.length; i++) {
sum += a[i] * b[i];
}
return sum;
}
private assignLevel(): number {
// Geometric distribution for layer assignment
const random = Math.random();
return Math.floor(-Math.log(random) * (1.0 / Math.log(this.m)));
}
// ... Helper methods for graph manipulation (ensureLayers, connectNode, selectNeighbors)
// omitted for brevity but follow standard HNSW neighbor selection logic.
}
Architecture Decisions:
Float32Array over number[]: Reduces memory overhead by 50% compared to standard JS arrays and enables SIMD optimizations in underlying engines.
- Dot Product vs. Cosine: The implementation uses
innerProduct. This is valid only because vectors are L2-normalized. This eliminates square root and division operations, significantly accelerating query time.
- Separation of Store and Graph:
store holds vectors; graph holds topology. This allows swapping storage backends (e.g., to mmap files) without rewriting graph logic.
efConstruction vs efSearch: The constructor accepts efConstruction to tune index quality. The search method accepts efSearch to tune query behavior, allowing dynamic trade-offs per request.
Pitfall Guide
| Pitfall | Explanation | Fix |
|---|
| The Normalization Trap | Using cosine similarity on unnormalized vectors while assuming dot product equivalence. This yields incorrect rankings. | Enforce L2 normalization at ingestion. Use dot product exclusively for normalized vectors. |
| Parameter Blindness | Using default M and ef values without benchmarking. Defaults often favor memory over recall or vice versa. | Run recall benchmarks against a brute-force baseline. Tune M for memory constraints and efSearch for latency budgets. |
| Memory Bleed | Loading the entire index into heap memory without paging. Causes OOM crashes on large datasets. | Use mmap to bind index files to virtual memory. The OS handles paging based on access patterns. |
| Quantization Shock | Applying Product Quantization (PQ) with a low M value. PQ introduces error; low connectivity amplifies this, causing recall collapse. | Increase M when using PQ to compensate for quantization error. Validate recall drop before deployment. |
| Entry Point Stagnation | Relying on a single static entry point can trap search in local minima, especially in non-uniform distributions. | Implement randomized multi-entry points or maintain multiple entry points for different clusters. |
| Dimensionality Mismatch | Mixing embeddings from different models or dimensions in the same index. | Enforce schema validation. Reject vectors that do not match the expected dimensionality and model fingerprint. |
The efSearch Latency Trap | Setting efSearch too high in an attempt to maximize recall, causing p99 latency to violate SLAs. | Profile latency vs. recall curves. Set efSearch at the "elbow" point where recall gains diminish relative to latency cost. |
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| < 50k Vectors | Brute Force | Simplicity outweighs complexity. No index build time. | Lowest infrastructure cost. |
| 50k - 10M Vectors | HNSW (Raw) | Optimal balance of latency and recall. Standard industry practice. | Moderate RAM usage. |
| > 10M Vectors | HNSW + PQ | Reduces memory by ~90%. Enables billion-scale indices on limited RAM. | Slight recall drop (~2-5%). |
| Strict Latency (< 1ms) | HNSW + SIMD + PQ | SIMD accelerates distance calc; PQ reduces memory bandwidth pressure. | Higher CPU utilization per query. |
| Dynamic Schema | HNSW with Partitioning | Isolate vectors by model/dimension to prevent mismatch errors. | Increased operational complexity. |
Configuration Template
Use this JSON structure to parameterize your HNSW deployment. Adjust values based on benchmark results.
{
"index": {
"type": "hnsw",
"dimensions": 3072,
"metric": "dot_product",
"normalization": "l2"
},
"hnsw_params": {
"m": 24,
"ef_construction": 400,
"ef_search_default": 64,
"ef_search_max": 256
},
"quantization": {
"enabled": true,
"type": "product_quantization",
"sub_vectors": 64,
"centroids_per_subvector": 256
},
"storage": {
"backend": "mmap",
"path": "/data/vectors/index.bin",
"page_size_kb": 4
}
}
Quick Start Guide
- Initialize Graph: Instantiate
SemanticGraph with m=16 and efConstruction=200.
- Ingest Data: Pass L2-normalized
Float32Array vectors to add(). Ensure dimensions match configuration.
- Persist Index: Serialize the graph structure and store vectors using
mmap for durability and memory efficiency.
- Query: Call
search() with a normalized query vector. Start with efSearch=50 and adjust based on latency/recall profiling.
- Scale: If memory usage exceeds thresholds, enable Product Quantization and re-index, or increase
mmap page efficiency.