ontrols the candidate pool size during retrieval.
CREATE INDEX idx_knowledge_hnsw
ON agent_knowledge_base
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Step 3: Two-Stage Query Architecture
The first stage executes a standard HNSW nearest-neighbor search. The second stage uses LATERAL JOIN to pass each candidate into a scoring function that computes a cubic relevance metric. This metric combines normalized cosine similarity, recency decay, and role-based weighting.
-- Stage 1: Fast ANN retrieval
WITH candidate_pool AS (
SELECT
id,
entity_type,
title,
author_role,
created_at,
embedding
FROM agent_knowledge_base
ORDER BY embedding <=> $1::vector
LIMIT 50
)
-- Stage 2: Precise cubic scoring via LATERAL JOIN
SELECT
cp.id,
cp.title,
cp.entity_type,
rs.final_score
FROM candidate_pool cp
CROSS JOIN LATERAL (
SELECT compute_cubic_relevance(
cp.embedding,
$1::vector,
cp.created_at,
cp.author_role,
$2::text
) AS final_score
) rs
ORDER BY rs.final_score DESC
LIMIT 10;
Step 4: Cubic Scoring Function Implementation
The cubic function applies a third-degree polynomial to normalized similarity. This creates a non-linear response curve: scores below 0.6 are suppressed, scores between 0.6-0.8 scale moderately, and scores above 0.8 receive exponential amplification. Metadata multipliers adjust the baseline before polynomial transformation.
CREATE OR REPLACE FUNCTION compute_cubic_relevance(
target_vec vector,
query_vec vector,
record_ts TIMESTAMPTZ,
record_role TEXT,
query_role TEXT
) RETURNS DOUBLE PRECISION AS $$
DECLARE
raw_sim DOUBLE PRECISION;
norm_sim DOUBLE PRECISION;
recency_factor DOUBLE PRECISION;
role_match DOUBLE PRECISION;
base_score DOUBLE PRECISION;
BEGIN
-- Cosine similarity (1 - distance)
raw_sim := 1.0 - (target_vec <=> query_vec);
-- Normalize to 0-1 range (adjust bounds based on your embedding model)
norm_sim := GREATEST(0.0, LEAST(1.0, (raw_sim - 0.4) / 0.6));
-- Recency decay (exponential, 30-day half-life)
recency_factor := EXP(-0.0231 * EXTRACT(EPOCH FROM (NOW() - record_ts)) / 86400.0);
-- Role alignment multiplier
role_match := CASE WHEN record_role = query_role THEN 1.2 ELSE 0.9 END;
-- Base weighted score
base_score := (norm_sim * 0.7) + (recency_factor * 0.2) + (role_match * 0.1);
-- Cubic amplification: a*x^3 + b*x^2 + c*x
RETURN (0.5 * POWER(base_score, 3)) + (0.3 * POWER(base_score, 2)) + (0.2 * base_score);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Step 5: TypeScript Application Layer
The application layer handles embedding generation, query parameterization, and result mapping. Keep database interactions stateless and parameterized to prevent injection and enable connection pooling.
import { Pool } from 'pg';
interface SearchQuery {
text: string;
targetRole: string;
limit: number;
}
interface SearchResult {
id: string;
title: string;
entityType: string;
score: number;
}
export class RetrievalPipeline {
private db: Pool;
private embedModel: any; // Placeholder for your embedding client
constructor(dbPool: Pool, embedClient: any) {
this.db = dbPool;
this.embedModel = embedClient;
}
async execute(query: SearchQuery): Promise<SearchResult[]> {
const queryVector = await this.embedModel.generate(query.text);
const sql = `
WITH candidate_pool AS (
SELECT id, entity_type, title, author_role, created_at, embedding
FROM agent_knowledge_base
ORDER BY embedding <=> $1::vector
LIMIT 50
)
SELECT cp.id, cp.title, cp.entity_type, rs.final_score
FROM candidate_pool cp
CROSS JOIN LATERAL (
SELECT compute_cubic_relevance(cp.embedding, $1::vector, cp.created_at, cp.author_role, $2::text) AS final_score
) rs
ORDER BY rs.final_score DESC
LIMIT $3;
`;
const result = await this.db.query(sql, [
`[${queryVector.join(',')}]`,
query.targetRole,
query.limit
]);
return result.rows.map(row => ({
id: row.id,
title: row.title,
entityType: row.entity_type,
score: parseFloat(row.final_score)
}));
}
}
Architecture Rationale
- Why HNSW for Stage 1? HNSW provides logarithmic search complexity with high recall. It's optimized for cosine distance and scales efficiently to millions of vectors. Setting
ef_search at query time allows dynamic precision tuning without rebuilding indexes.
- Why LATERAL JOIN for Stage 2?
LATERAL JOIN executes the scoring function per row from the candidate pool. This avoids materializing intermediate results, reduces memory pressure, and guarantees that expensive polynomial calculations run only on pre-filtered candidates.
- Why Cubic Scoring? Linear weighting treats all similarity differences equally. A cubic curve creates a decision boundary that separates high-confidence matches from ambiguous ones. This is critical for agent routing, where false positives trigger costly downstream tool calls.
Pitfall Guide
1. Unbounded ef_search Values
Explanation: Setting ef_search too high forces HNSW to explore excessive graph paths, degrading latency to linear scan levels. Values above 200 rarely improve recall but multiply CPU usage.
Fix: Benchmark your dataset. Start with ef_search = 64 for 50-candidate retrieval. Increase incrementally only if NDCG drops below your threshold. Use connection-level query hints to adjust dynamically based on query complexity.
2. Ignoring Index Maintenance
Explanation: HNSW indexes degrade with heavy INSERT/UPDATE workloads. Fragmented graphs increase hop counts during traversal, causing latency drift over time.
Fix: Schedule REINDEX INDEX idx_knowledge_hnsw; during low-traffic windows. Monitor pg_stat_user_indexes for index bloat. For write-heavy pipelines, consider partitioning by time and rebuilding indexes per partition.
3. Cubic Function Numerical Instability
Explanation: Polynomial operations on unnormalized inputs produce overflow or negative scores. Raw cosine distances range from -1 to 1, but cubic functions expect bounded inputs.
Fix: Always normalize similarity to [0, 1] before polynomial transformation. Use GREATEST/LEAST guards. Validate function outputs with unit tests covering edge cases (identical vectors, orthogonal vectors, null metadata).
4. LATERAL JOIN Cartesian Explosion
Explanation: Forgetting to LIMIT the candidate pool before the LATERAL JOIN causes the scoring function to execute against thousands of rows, negating the two-stage optimization.
Fix: Always wrap Stage 1 in a CTE with an explicit LIMIT. Verify execution plans with EXPLAIN ANALYZE to confirm the lateral join operates on the bounded set.
5. Mixing Distance Metrics Without Conversion
Explanation: HNSW supports vector_cosine_ops, vector_l2_ops, and vector_ip_ops. Using cosine distance in the index but L2 in the scoring function creates mathematical misalignment.
Fix: Standardize on one metric across the pipeline. If using cosine, convert distance to similarity via 1.0 - distance. Document the metric choice in schema comments and enforce it in application-layer validators.
6. Over-Reliance on Vector Similarity
Explanation: Embeddings capture semantic meaning but ignore exact matches. Queries containing proper nouns, IDs, or strict filters suffer when vector search is the sole retrieval mechanism.
Fix: Pre-filter candidates using WHERE clauses on indexed metadata columns before the ORDER BY embedding. Combine exact-match boosts with vector similarity in the cubic function's base score calculation.
7. Cold Start Cache Misses
Explanation: HNSW graph traversal loads nodes into memory on first access. Initial queries experience 2-3x latency until the working set is cached.
Fix: Implement a lightweight query warmup routine during deployment. Cache frequent query vectors at the application layer. Use pg_prewarm to load index pages into shared buffers during maintenance windows.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| < 100K vectors, low write volume | Single-stage HNSW + linear scoring | Simpler architecture, sufficient precision for basic retrieval | Lowest compute & maintenance |
| 100K-5M vectors, complex agent routing | Two-stage HNSW + LATERAL JOIN + cubic scoring | Balances speed with non-linear relevance amplification | Moderate DB CPU, zero external services |
| > 5M vectors, strict compliance filtering | Two-stage + materialized metadata pre-filter | Reduces candidate pool before scoring, ensures policy adherence | Higher storage, lower query latency |
| Multi-modal retrieval (text + images) | External reranker + vector DB | Cross-encoder models handle heterogeneous embeddings better | High infrastructure cost, network latency |
Configuration Template
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Core table
CREATE TABLE agent_knowledge_base (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
entity_type TEXT NOT NULL CHECK (entity_type IN ('document', 'profile', 'policy')),
title TEXT NOT NULL,
content_summary TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
author_role TEXT,
embedding vector(1536) NOT NULL
);
-- HNSW index
CREATE INDEX idx_knowledge_hnsw
ON agent_knowledge_base
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Metadata pre-filter index
CREATE INDEX idx_knowledge_metadata
ON agent_knowledge_base (entity_type, author_role, created_at DESC);
-- Cubic scoring function
CREATE OR REPLACE FUNCTION compute_cubic_relevance(
target_vec vector,
query_vec vector,
record_ts TIMESTAMPTZ,
record_role TEXT,
query_role TEXT
) RETURNS DOUBLE PRECISION AS $$
DECLARE
raw_sim DOUBLE PRECISION;
norm_sim DOUBLE PRECISION;
recency_factor DOUBLE PRECISION;
role_match DOUBLE PRECISION;
base_score DOUBLE PRECISION;
BEGIN
raw_sim := 1.0 - (target_vec <=> query_vec);
norm_sim := GREATEST(0.0, LEAST(1.0, (raw_sim - 0.4) / 0.6));
recency_factor := EXP(-0.0231 * EXTRACT(EPOCH FROM (NOW() - record_ts)) / 86400.0);
role_match := CASE WHEN record_role = query_role THEN 1.2 ELSE 0.9 END;
base_score := (norm_sim * 0.7) + (recency_factor * 0.2) + (role_match * 0.1);
RETURN (0.5 * POWER(base_score, 3)) + (0.3 * POWER(base_score, 2)) + (0.2 * base_score);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- Optimized retrieval query
CREATE OR REPLACE FUNCTION retrieve_relevant_entities(
query_embedding vector,
target_role TEXT,
candidate_limit INT DEFAULT 50,
result_limit INT DEFAULT 10
) RETURNS TABLE(
id UUID,
title TEXT,
entity_type TEXT,
final_score DOUBLE PRECISION
) AS $$
BEGIN
RETURN QUERY
WITH candidate_pool AS (
SELECT id, entity_type, title, author_role, created_at, embedding
FROM agent_knowledge_base
WHERE entity_type IN ('document', 'policy')
ORDER BY embedding <=> query_embedding
LIMIT candidate_limit
)
SELECT cp.id, cp.title, cp.entity_type, rs.final_score
FROM candidate_pool cp
CROSS JOIN LATERAL (
SELECT compute_cubic_relevance(cp.embedding, query_embedding, cp.created_at, cp.author_role, target_role) AS final_score
) rs
ORDER BY rs.final_score DESC
LIMIT result_limit;
END;
$$ LANGUAGE plpgsql STABLE;
Quick Start Guide
- Install Dependencies: Add
pgvector to your PostgreSQL instance (CREATE EXTENSION vector;). Ensure your server runs PostgreSQL 14+ for optimal HNSW support.
- Deploy Schema: Execute the configuration template SQL. Verify index creation with
\di and confirm vector_cosine_ops is active.
- Seed Embeddings: Generate 1536-dimensional vectors using your preferred model (OpenAI
text-embedding-3-small, Cohere, or open-source alternatives). Insert records with metadata populated.
- Initialize Application Client: Instantiate the
RetrievalPipeline class with your database pool and embedding client. Configure ef_search via query parameters or environment variables.
- Validate & Monitor: Run
EXPLAIN ANALYZE on sample queries. Confirm the execution plan shows Limit β Sort β Nested Loop β CTE Scan β Index Scan using idx_knowledge_hnsw. Track p95 latency and adjust candidate_limit based on your precision requirements.