uest handling. Define the schema with a dimension matching your embedding model.
import pg from 'pg';
import { fromSql, toSql } from 'pgvector';
const { Pool } = pg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
});
async function createSchema(): Promise<void> {
await pool.query(`
CREATE TABLE IF NOT EXISTS documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
source TEXT,
embedding VECTOR(1536)
)
`);
}
Embedding Generation & Insertion
Convert JavaScript arrays to PostgreSQL's expected string format using toSql.
import OpenAI from 'openai';
import { toSql } from 'pgvector';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function embed(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
async function insertDocument(content: string, source?: string): Promise<void> {
const embedding = await embed(content);
await pool.query(
'INSERT INTO documents (content, source, embedding) VALUES ($1, $2, $3)',
[content, source ?? null, toSql(embedding)]
);
}
Vector Similarity Querying
Cosine distance (<=>) is the standard for text embeddings. The similarity score is calculated as 1 - distance.
interface SearchResult {
id: number;
content: string;
source: string | null;
similarity: number;
}
async function search(query: string, limit = 5): Promise<SearchResult[]> {
const queryEmbedding = await embed(query);
const result = await pool.query<SearchResult>(
`SELECT id, content, source, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT $2`,
[toSql(queryEmbedding), limit]
);
return result.rows;
}
HNSW Indexing & Query Tuning
Build the index after data insertion. Tune recall vs. speed at query time using hnsw.ef_search.
CREATE INDEX CONCURRENTLY documents_embedding_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
async function searchWithHighRecall(query: string, limit = 5) {
const queryEmbedding = await embed(query);
const client = await pool.connect();
try {
await client.query('SET LOCAL hnsw.ef_search = 100');
const result = await client.query(
'SELECT id, content, 1 - (embedding <=> $1) AS similarity FROM documents ORDER BY embedding <=> $1 LIMIT $2',
[toSql(queryEmbedding), limit]
);
return result.rows;
} finally {
client.release();
}
}
Complete RAG Pipeline
Retrieve context, inject into LLM prompt, and return generated output.
async function retrieve(query: string, k = 5): Promise<string[]> {
const qEmbed = await embed(query);
const result = await pool.query(
'SELECT content FROM documents ORDER BY embedding <=> $1 LIMIT $2',
[toSql(qEmbed), k]
);
return result.rows.map((r) => r.content as string);
}
async function answer(question: string): Promise<string> {
const docs = await retrieve(question);
const context = docs.join('\n\n');
const completion = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: 'Answer questions using only the provided context.' },
{ role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
],
});
return completion.choices[0].message.content ?? '';
}
Leverage standard SQL WHERE clauses alongside vector search.
async function searchBySource(query: string, source: string, limit = 5) {
const queryEmbedding = await embed(query);
const result = await pool.query(
`SELECT id, content, source, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE source = $2
ORDER BY embedding <=> $1
LIMIT $3`,
[toSql(queryEmbedding), source, limit]
);
return result.rows;
}
Next.js Connection Pool Pattern
Prevent hot-reload pool duplication using globalThis.
// lib/db.ts
import pg from 'pg';
declare global {
var pgPool: pg.Pool | undefined;
}
export const pool =
globalThis.pgPool ??
new pg.Pool({ connectionString: process.env.DATABASE_URL });
if (process.env.NODE_ENV !== 'production') {
globalThis.pgPool = pool;
}
Pitfall Guide
- Forgetting
toSql on insert: Passing raw JavaScript number[] arrays directly to pool.query triggers invalid input syntax for type vector. Always wrap embeddings with toSql(embedding) to serialize them into PostgreSQL's expected {x,y,z} format.
- Using
SET hnsw.ef_search without LOCAL: Connection pools reuse database sessions. Global SET commands persist across requests, causing unpredictable recall/latency behavior. Always use SET LOCAL inside a transaction block to scope configuration changes.
- Creating the HNSW index before loading data: Building an index on an empty or partially populated table degrades graph connectivity and recall quality. Load your dataset first, then run
CREATE INDEX CONCURRENTLY to ensure optimal neighbor linking.
- Dimension mismatch between model and schema: Defining
VECTOR(1536) but using a model that outputs 3072 or 768 dimensions causes immediate insertion failures. Validate embedding dimensions at runtime and align schema definitions with the exact model version.
- Connection pool exhaustion in serverless/Next.js: Instantiating a new
Pool per request or failing to release clients in finally blocks exhausts PostgreSQL's max_connections. Use singleton pool patterns with globalThis and always pair pool.connect() with explicit client.release().
Deliverables
Blueprint: Production-Ready pgvector Architecture for Node.js
- Includes connection pool lifecycle management, HNSW index configuration matrix (
m, ef_construction, ef_search), RAG prompt injection template, and metadata filtering strategy.
- Configuration templates for
pgvector extension setup, environment variable mapping, and TypeScript type definitions for vector queries.
Checklist: Pre-Deployment & Scaling Verification