tion management for serverless environments.
Step 1: Schema Design with Embedded Vectors
Store embeddings alongside the metadata they describe. This enables atomic filtering and eliminates sync jobs.
// src/db/schema.ts
import { sql } from 'drizzle-orm';
import { pgTable, uuid, text, timestamp, vector } from 'drizzle-orm/pg-core';
export const knowledgeBase = pgTable('ai_knowledge_base', {
id: uuid('id').defaultRandom().primaryKey(),
tenantId: uuid('tenant_id').notNull(),
documentChunk: text('chunk_content').notNull(),
embedding: vector('embedding', { dimensions: 1536 }),
createdAt: timestamp('created_at').defaultNow().notNull(),
updatedAt: timestamp('updated_at').defaultNow().notNull(),
});
Step 2: Row Level Security for Tenant Isolation
RLS policies run inside Postgres, composing naturally with joins and vector operators. Create a policy that restricts visibility to the authenticated tenant.
-- Enable RLS
ALTER TABLE ai_knowledge_base ENABLE ROW LEVEL SECURITY;
-- Tenant isolation policy
CREATE POLICY "tenant_scope_access" ON ai_knowledge_base
FOR ALL
USING (tenant_id = current_setting('app.current_tenant', true)::uuid);
-- Index for vector similarity
CREATE INDEX idx_knowledge_vector ON ai_knowledge_base
USING hnsw (embedding vector_cosine_ops);
Step 3: Serverless-Optimized Retrieval Client
Supabase Edge Functions and Next.js route handlers create connections aggressively. Use the platform's connection pooler (Supavisor) in transaction mode to prevent process exhaustion.
// src/lib/vector-retrieval.ts
import { createClient } from '@supabase/supabase-js';
import type { Database } from './database.types';
const supabaseUrl = process.env.SUPABASE_URL!;
const supabaseKey = process.env.SUPABASE_SERVICE_KEY!;
const adminClient = createClient<Database>(supabaseUrl, supabaseKey, {
db: { schema: 'public' },
global: { headers: { 'x-tenant-id': process.env.TENANT_ID! } }
});
export async function retrieveContext(
queryEmbedding: number[],
tenantId: string,
limit: number = 5
) {
// Set tenant context for RLS evaluation
await adminClient.rpc('set_tenant_context', { tenant_uuid: tenantId });
const { data, error } = await adminClient
.from('ai_knowledge_base')
.select('chunk_content, embedding <=> $1 as similarity')
.order('similarity', { ascending: true })
.limit(limit)
.eq('tenant_id', tenantId);
if (error) throw new Error(`Retrieval failed: ${error.message}`);
return data.map(row => row.chunk_content);
}
Architecture Rationale
- Why
pgvector over external stores? A single query plan handles filtering, sorting, and similarity scoring. Postgres query planner optimizes index usage, reducing network hops and serialization overhead.
- Why RLS over application guards? Database-level enforcement guarantees isolation even when queries originate from Edge Functions, scheduled jobs, or third-party integrations. Application-layer checks are easily bypassed during refactoring.
- Why Supavisor transaction mode? Postgres spawns a backend process per connection. Serverless runtimes create short-lived connections rapidly. Transaction mode reuses backend processes across multiple client requests, preventing connection limit exhaustion. The trade-off is losing session-level features like
LISTEN/NOTIFY and prepared statement caching, which is acceptable for stateless retrieval workloads.
Pitfall Guide
1. Connection Exhaustion on Serverless Runtimes
Explanation: Postgres allocates a dedicated OS process for each connection. Serverless functions spin up thousands of short-lived connections during traffic spikes, quickly hitting the default max_connections limit (usually 100β200 on Pro).
Fix: Enable Supavisor explicitly. Use the transaction-mode connection string for stateless API routes. Monitor pg_stat_activity and set statement_timeout to prevent runaway queries from holding connections.
2. Realtime Throughput Misalignment
Explanation: Supabase Realtime tails Postgres logical replication and broadcasts over WebSockets. It handles low-frequency updates smoothly but degrades under high-write workloads like multiplayer cursors or tick feeds. The replication lag increases as write volume scales.
Fix: Reserve Realtime for presence and low-frequency state sync. For high-frequency data, use Edge Functions with broadcast channels or route through a dedicated pub/sub system like NATS or Redis Streams. Benchmark your exact write pattern before committing.
3. RLS Denial UX Confusion
Explanation: When RLS blocks a query, Postgres returns an empty result set, not an HTTP 403 or explicit error. Applications expecting permission errors may misinterpret empty arrays as "no data found" rather than "access denied."
Fix: Design your frontend to handle empty arrays gracefully. If explicit permission errors are required, add a lightweight application-level guard that verifies tenant ownership before executing the vector query, or use a custom RPC function that raises an exception when policies fail.
4. Vector Index Drift and Maintenance Neglect
Explanation: HNSW and IVFFlat indexes degrade as data mutates. Insert-heavy workloads cause index bloat, increasing query latency and memory consumption. Unlike B-tree indexes, vector indexes don't auto-rebalance efficiently.
Fix: Schedule periodic REINDEX operations during low-traffic windows. Monitor pg_stat_user_indexes for idx_scan vs idx_tup_read ratios. Tune lists (IVFFlat) and m/ef_construction (HNSW) parameters based on your dataset cardinality and recall requirements.
5. Edge Function Timeout Assumptions
Explanation: Supabase Edge Functions run on Deno with a 60-second timeout on Free and 150 seconds on Pro. Cold starts average 400β600ms. Teams frequently attempt to run LLM inference or heavy embedding generation directly in Edge Functions, hitting timeout limits.
Fix: Treat Edge Functions as orchestration layers, not compute engines. Offload inference to external workers (RunPod, Modal, AWS Lambda). Use Edge Functions to trigger jobs, poll status, and format responses. Implement circuit breakers and fallback caches for timeout scenarios.
6. Pricing Cliff Navigation
Explanation: The jump from Pro ($25/mo) to Team ($599/mo) is steep. Team unlocks read replicas, extended point-in-time recovery, and larger compute. Teams often architect for scale prematurely or hit the wall unexpectedly during growth.
Fix: Design your data access layer to abstract the database client. If you anticipate needing read replicas, implement read/write splitting at the application level early. Consider hybrid deployments: keep hot data on Supabase Pro, archive cold embeddings to S3 + self-hosted Postgres, and sync via scheduled jobs.
7. Supabase-Specific Schema Lock-in
Explanation: While your data is portable Postgres, RLS policies, Auth schema, Storage buckets, and Edge Functions are tightly coupled to Supabase's control plane. Migrating off requires rewriting authentication flows and storage access patterns.
Fix: Keep business logic and data models framework-agnostic. Use Drizzle or Prisma for schema migrations instead of relying solely on Supabase dashboard UI. Document your RLS policies as SQL files in version control. If portability is a priority, implement a repository pattern that isolates Supabase client calls behind standard interfaces.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Low-frequency collaborative UI | Supabase Realtime + Broadcast | Built-in WebSocket tailing reduces boilerplate; presence channels handle lightweight state | $0 (included in Pro) |
| High-frequency tick/multiplayer feed | Edge Functions + Redis/NATS | Logical replication introduces lag under high write volume; dedicated pub/sub scales horizontally | +$10β$30/mo (external service) |
| Multi-tenant RAG retrieval | Unified Postgres + RLS + pgvector | Single query plan enforces security and filters semantically; eliminates sync jobs | $0β$25/mo (Pro tier) |
| Heavy LLM inference | External GPU workers + Edge orchestration | Edge Functions timeout at 150s; cold starts add latency; external workers provide predictable scaling | +$50β$200/mo (compute credits) |
| Predictable read scaling | Application-level read/write splitting | Team plan read replicas cost $599/mo; app-level routing delays cost until traffic justifies it | $0 initially, scales with traffic |
Configuration Template
-- 1. Enable extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- 2. Core table with tenant isolation
CREATE TABLE ai_documents (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
tenant_id UUID NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- 3. Vector index (HNSW for cosine similarity)
CREATE INDEX idx_ai_documents_embedding
ON ai_documents USING hnsw (embedding vector_cosine_ops);
-- 4. RLS policies
ALTER TABLE ai_documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY "tenant_isolation" ON ai_documents
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::uuid);
CREATE POLICY "service_bypass" ON ai_documents
FOR ALL
TO service_role
USING (true);
-- 5. Context setter function
CREATE OR REPLACE FUNCTION set_tenant_context(tenant_uuid UUID)
RETURNS void AS $$
BEGIN
PERFORM set_config('app.tenant_id', tenant_uuid::text, false);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Quick Start Guide
- Initialize project: Run
supabase init locally, then supabase start to spin up the Docker stack. Note the API URL and anon key from the dashboard.
- Apply schema: Copy the SQL template into
supabase/migrations/001_init_vector_db.sql. Run supabase db reset to apply migrations and seed the database.
- Configure client: Install
@supabase/supabase-js. Create a typed client using supabase gen types typescript > src/db/database.types.ts. Initialize the client with your project credentials.
- Test retrieval: Generate a dummy embedding (1536 dimensions), insert a row with a
tenant_id, then call the retrieval function. Verify that RLS blocks cross-tenant queries by switching the app.tenant_id context.
- Deploy Edge Function: Run
supabase functions new vector-retrieval. Implement the orchestration logic, test locally with supabase functions serve, then deploy with supabase functions deploy vector-retrieval.
This architecture delivers a production-grade AI backend without fragmenting your data layer. By anchoring retrieval, security, and scaling decisions to Postgres fundamentals, you eliminate synchronization overhead, enforce tenant isolation at the engine level, and maintain a clear migration path as workloads grow.