ation, distributed caching, and rate limiting.
Step 1: Define the Data Contract and Execution Constraints
AI models require explicit boundaries to avoid defaulting to inefficient patterns. Instead of requesting a generic search endpoint, specify the indexing strategy, pagination mechanism, and performance targets.
// src/contracts/search.contract.ts
export interface SearchRequest {
query: string;
cursor?: string;
limit: number;
}
export interface SearchResponse<T> {
items: T[];
nextCursor: string | null;
hasMore: boolean;
}
export interface SearchConstraints {
maxLimit: number;
indexType: 'GIN' | 'BTREE';
cacheTTLSeconds: number;
rateLimitPerMinute: number;
}
Step 2: Implement Full-Text Search with Index Awareness
LIKE '%term%' forces sequential scans because leading wildcards prevent B-tree index usage. PostgreSQL's tsvector and tsquery enable index-assisted full-text search. The AI should be instructed to use to_tsvector and plainto_tsquery with a GIN index.
// src/repositories/user.repository.ts
import { Pool, PoolClient } from 'pg';
export class UserRepository {
constructor(private pool: Pool) {}
async searchUsers(
query: string,
cursor: string | null,
limit: number
): Promise<{ users: any[]; nextCursor: string | null }> {
const baseQuery = `
SELECT id, name, email, ts_rank_cd(search_vector, query) AS rank
FROM users, plainto_tsquery('english', $1) AS query
WHERE search_vector @@ query
`;
const params: (string | number)[] = [query];
let offsetClause = '';
if (cursor) {
const [lastId, lastRank] = Buffer.from(cursor, 'base64').toString().split('|');
params.push(lastId, parseFloat(lastRank));
offsetClause = `
AND (ts_rank_cd(search_vector, query) < $3 OR
(ts_rank_cd(search_vector, query) = $3 AND id > $2))
`;
}
params.push(limit);
const finalQuery = `${baseQuery} ${offsetClause} ORDER BY rank DESC, id ASC LIMIT $${params.length}`;
const result = await this.pool.query(finalQuery, params);
const nextCursor = result.rows.length === limit
? Buffer.from(`${result.rows[result.rows.length - 1].id}|${result.rows[result.rows.length - 1].rank}`).toString('base64')
: null;
return { users: result.rows, nextCursor };
}
}
Rationale: Keyset pagination (cursor-based) eliminates OFFSET drift. By ordering on deterministic columns (rank DESC, id ASC) and filtering on the last seen values, we guarantee stable results across concurrent writes. The GIN index on search_vector ensures the query planner uses index-assisted matching instead of full-table scans.
Step 3: Add Distributed Caching with Stampede Prevention
AI-generated caching often lacks concurrency controls, leading to cache stampedes when popular keys expire simultaneously. We implement a cache-aside pattern with probabilistic early expiration and Redis-backed locking.
// src/services/search.service.ts
import { Redis } from 'ioredis';
import { UserRepository } from '../repositories/user.repository';
export class SearchService {
constructor(
private repo: UserRepository,
private cache: Redis,
private ttl: number
) {}
async executeSearch(query: string, cursor: string | null, limit: number) {
const cacheKey = `search:${query}:${cursor}:${limit}`;
const cached = await this.cache.get(cacheKey);
if (cached) return JSON.parse(cached);
// Probabilistic early expiration to prevent stampedes
const jitter = Math.floor(Math.random() * 10);
const effectiveTTL = this.ttl - jitter;
const result = await this.repo.searchUsers(query, cursor, limit);
await this.cache.setex(cacheKey, effectiveTTL, JSON.stringify(result));
return result;
}
}
Rationale: Randomized TTL jitter ensures that cached entries for similar queries expire at slightly different intervals, preventing synchronized cache misses. The cache-aside pattern keeps the database as the source of truth while reducing read load.
Step 4: Enforce Rate Limiting at the Edge
AI rarely includes distributed rate limiting by default. A token bucket algorithm backed by Redis ensures fair usage across multiple application instances.
// src/middleware/rateLimiter.ts
import { Redis } from 'ioredis';
import { Request, Response, NextFunction } from 'express';
export function createRateLimiter(redis: Redis, maxTokens: number, refillRate: number) {
return async (req: Request, res: Response, next: NextFunction) => {
const clientIp = req.ip;
const key = `ratelimit:${clientIp}`;
const current = await redis.eval(`
local tokens = tonumber(redis.call('get', KEYS[1]) or '${maxTokens}')
local now = tonumber(ARGV[1])
local refill = math.min(${maxTokens}, tokens + (now - tonumber(redis.call('get', KEYS[2] or '0'))) * ${refillRate})
if refill >= 1 then
redis.call('set', KEYS[1], refill)
redis.call('set', KEYS[2], now)
return 1
else
return 0
end
`, 2, key, `${key}:last_refill`, Date.now() / 1000);
if (current === 0) {
res.status(429).json({ error: 'Rate limit exceeded' });
return;
}
next();
};
}
Rationale: Token bucket rate limiting smooths traffic spikes while allowing controlled bursts. Redis-backed state ensures consistency across horizontally scaled instances. The Lua script guarantees atomicity, preventing race conditions during concurrent requests.
Pitfall Guide
1. The OFFSET Drift Trap
Explanation: AI defaults to LIMIT/OFFSET pagination because it's syntactically simple. Under concurrent writes, OFFSET recalculates row positions, causing duplicate or missing results as users paginate.
Fix: Implement keyset pagination using deterministic sort columns. Filter on the last seen values instead of skipping rows.
2. Silent Index Bypass
Explanation: LIKE '%term%' or unanchored regex patterns force sequential scans. AI generates these patterns when not explicitly constrained to use full-text search operators.
Fix: Use tsvector/tsquery for PostgreSQL or equivalent full-text engines. Verify execution plans with EXPLAIN ANALYZE in CI pipelines.
3. Cache Stampede Vulnerability
Explanation: Naive caching implementations expire keys simultaneously, causing a thundering herd of database queries when traffic spikes.
Fix: Implement cache-aside with probabilistic TTL jitter. Use distributed locks or queue-based refresh for high-value keys.
4. Prompt-Induced Overconfidence
Explanation: Accepting AI output without verifying underlying assumptions leads to production failures. AI generates syntactically correct code that violates scaling principles.
Fix: Treat AI output as a draft. Require execution plan verification, load testing, and explicit constraint validation before merging.
5. Rate Limiting Blind Spots
Explanation: AI rarely includes distributed rate limiting. In-memory counters fail under horizontal scaling, and missing limits expose APIs to abuse.
Fix: Use Redis-backed token bucket or sliding window algorithms. Enforce limits at the API gateway or middleware layer.
6. The "It Works Locally" Fallacy
Explanation: Small development datasets mask N+1 queries, full-table scans, and pagination drift. Code passes local tests but fails under production load.
Fix: Seed staging environments with production-scale data. Run automated load tests against AI-generated endpoints before deployment.
7. Over-Reliance on Prompt Engineering
Explanation: Treating prompt crafting as a primary skill diverts focus from architectural fundamentals. Prompts are temporary interfaces; engineering principles are permanent.
Fix: Invest in system design, database internals, and failure-mode analysis. Use AI to accelerate implementation, not replace architectural judgment.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Low-traffic MVP | Basic AI scaffold with OFFSET pagination | Fast iteration, acceptable risk for <1k users | Low infrastructure cost, high refactoring risk later |
| High-read search service | Architecture-constrained AI with full-text + cursor pagination | Deterministic results, index-assisted queries, stable under load | Moderate Redis/PostgreSQL cost, high scalability |
| Multi-tenant SaaS | Distributed rate limiting + tenant-scoped caching | Prevents cross-tenant abuse, isolates cache invalidation | Higher Redis memory usage, improved SLA compliance |
| Real-time analytics | Event-driven cache invalidation + materialized views | Avoids stale data, reduces query complexity | Increased write amplification, lower read latency |
Configuration Template
# docker-compose.yml
version: '3.8'
services:
postgres:
image: postgres:15-alpine
environment:
POSTGRES_DB: app_db
POSTGRES_USER: dev
POSTGRES_PASSWORD: dev_secret
ports:
- "5432:5432"
volumes:
- ./schema.sql:/docker-entrypoint-initdb.d/schema.sql
redis:
image: redis:7-alpine
ports:
- "6379:6379"
command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
app:
build: .
environment:
DB_HOST: postgres
REDIS_HOST: redis
ports:
- "3000:3000"
depends_on:
- postgres
- redis
-- schema.sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
search_vector tsvector GENERATED ALWAYS AS (
to_tsvector('english', name || ' ' || email)
) STORED
);
CREATE INDEX idx_users_search ON users USING GIN (search_vector);
CREATE INDEX idx_users_rank_id ON users (ts_rank_cd(search_vector, plainto_tsquery('english', '')) DESC, id ASC);
Quick Start Guide
- Initialize the environment: Run
docker compose up -d to start PostgreSQL and Redis. Wait for health checks to pass.
- Apply migrations: Execute
psql -h localhost -U dev -d app_db -f schema.sql to create tables and indexes.
- Seed test data: Use a script to insert 100,000 rows with realistic name/email distributions. Verify index usage with
EXPLAIN ANALYZE SELECT * FROM users WHERE search_vector @@ plainto_tsquery('english', 'test');.
- Start the service: Run
npm run dev and test the endpoint with curl "http://localhost:3000/api/search?q=alice&limit=20". Validate cursor pagination by chaining requests with the returned nextCursor.
- Validate constraints: Run a load test with
k6 or autocannon to confirm rate limiting, cache hit ratios, and p95 latency remain within acceptable thresholds.