Application code sets the session variable once per request lifecycle:
// TypeScript/Node.js example
async function withWorkspaceContext<T>(
pool: Pool,
workspaceId: string,
fn: (client: PoolClient) => Promise<T>
): Promise<T> {
const client = await pool.connect();
try {
await client.query(`SET app.active_workspace = $1`, [workspaceId]);
return await fn(client);
} finally {
await client.query(`RESET app.active_workspace`);
client.release();
}
}
Architecture Rationale: RLS eliminates the need for manual filter injection in ORMs or raw queries. The FORCE directive ensures even migration scripts or admin tools cannot accidentally bypass isolation. Index strategy leads with the isolation column: (workspace_id, inserted_at DESC). This aligns with query patterns that always filter by workspace first, then sort or paginate by time.
2. Collision-Free Background Dispatch
The naive queue pattern (SELECT pending, then UPDATE to running) creates a race window where multiple workers claim the same row. PostgreSQL's FOR UPDATE SKIP LOCKED clause solves this atomically within a single statement.
-- Atomic claim with skip-locked concurrency
UPDATE background_tasks
SET
task_status = 'processing',
worker_id = $1,
attempt_count = attempt_count + 1,
claimed_at = now()
WHERE id = (
SELECT id
FROM background_tasks
WHERE task_status = 'queued'
AND scheduled_at <= now()
ORDER BY priority_rank DESC, scheduled_at ASC
FOR UPDATE SKIP LOCKED
LIMIT 1
)
RETURNING *;
Supporting index for optimal dispatch performance:
CREATE INDEX idx_queue_dispatch
ON background_tasks (priority_rank DESC, scheduled_at ASC)
WHERE task_status = 'queued';
Architecture Rationale: This pattern eliminates double-processing entirely. SKIP LOCKED instructs the engine to bypass rows already claimed by other transactions, allowing N workers to pull disjoint sets without coordination. The partial index ensures only relevant rows are scanned during dispatch. No external message broker is required; PostgreSQL's MVCC locking handles concurrency safely.
3. Scalable Event Retention
Audit and event logs grow monotonically. Monolithic tables require expensive DELETE operations that trigger tuple bloat and aggressive autovacuum. Declarative partitioning transforms retention into a metadata operation.
-- Create partitioned audit table
CREATE TABLE system_events (
event_id uuid NOT NULL,
workspace_id uuid NOT NULL,
occurred_at timestamptz NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL,
PRIMARY KEY (event_id, occurred_at) -- Partition key must be in PK
) PARTITION BY RANGE (occurred_at);
-- Monthly partitions (automated via extension or cron)
CREATE TABLE system_events_2024_01 PARTITION OF system_events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE system_events_2024_02 PARTITION OF system_events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Retention becomes instantaneous:
-- Drop old partition instead of deleting millions of rows
DROP TABLE system_events_2023_01;
Architecture Rationale: The composite primary key (event_id, occurred_at) satisfies PostgreSQL's requirement that partition keys be included in unique constraints. Query performance improves dramatically because the planner prunes irrelevant partitions when filtering on occurred_at. For append-heavy workloads, add a BRIN index on occurred_at to maintain minimal index size while supporting range scans.
4. Idempotent External Event Processing
External providers deliver webhooks at-least-once and frequently out of order. Applying events without safeguards causes state corruption and duplicate charges. The solution combines a unique constraint with a temporal guard.
CREATE TABLE billing_events (
event_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
provider_ref_id text NOT NULL,
workspace_id uuid NOT NULL,
processed_at timestamptz NOT NULL DEFAULT now(),
state_version int NOT NULL DEFAULT 1,
event_data jsonb NOT NULL,
UNIQUE (provider_ref_id) -- Idempotency guarantee
);
-- Temporal guard application logic (TypeScript)
async function processBillingEvent(
client: PoolClient,
incoming: ExternalWebhook
): Promise<void> {
const existing = await client.query(
`SELECT state_version, processed_at
FROM billing_events
WHERE provider_ref_id = $1`,
[incoming.id]
);
if (existing.rows.length > 0) {
const current = existing.rows[0];
if (incoming.timestamp <= current.processed_at) {
return; // Stale event, skip
}
}
await client.query(
`INSERT INTO billing_events (provider_ref_id, workspace_id, state_version, event_data)
VALUES ($1, $2, $3, $4)
ON CONFLICT (provider_ref_id) DO UPDATE
SET state_version = EXCLUDED.state_version,
event_data = EXCLUDED.event_data,
processed_at = now()
WHERE billing_events.state_version < EXCLUDED.state_version`,
[incoming.id, incoming.workspaceId, incoming.version, JSON.stringify(incoming.payload)]
);
}
Architecture Rationale: The UNIQUE constraint on provider_ref_id prevents duplicate processing at the database level. The state_version or processed_at comparison ensures newer events overwrite older ones, while stale retries are silently discarded. This pattern handles provider retries, network glitches, and out-of-order delivery without application-side deduplication logic.
Pitfall Guide
1. RLS Bypass via Table Owners
Explanation: PostgreSQL disables RLS for table owners and superusers by default. Migration scripts or admin tools running as the owner can read/write across tenant boundaries.
Fix: Apply ALTER TABLE ... FORCE ROW LEVEL SECURITY to every tenant-scoped table. Use dedicated application roles with limited privileges instead of superuser accounts for routine operations.
2. Queue Polling Storms
Explanation: Workers polling the queue table every 100ms generate unnecessary lock contention and CPU overhead, especially when the queue is empty.
Fix: Implement exponential backoff in the application layer, or use LISTEN/NOTIFY to trigger workers only when new rows are inserted. Keep queue transactions short to minimize lock hold time.
3. Partition Key Omission in Queries
Explanation: Forgetting to include the partition key (occurred_at) in WHERE clauses disables partition pruning, forcing sequential scans across all partitions.
Fix: Always include the partition column in range queries. Verify execution plans with EXPLAIN ANALYZE to confirm Partition Pruning appears in the output.
4. Soft-Delete Unique Conflicts
Explanation: Standard UNIQUE constraints block re-insertion of logically deleted rows, causing duplicate key errors when users recreate previously deleted entities.
Fix: Use partial unique indexes: CREATE UNIQUE INDEX idx_unique_active ON resources (slug) WHERE deleted_at IS NULL;. This allows duplicate slugs for soft-deleted rows while enforcing uniqueness for active records.
5. Webhook Replay Without Versioning
Explanation: Applying events strictly by arrival time overwrites newer state when providers deliver delayed or out-of-order payloads.
Fix: Implement monotonic versioning (state_version) or strict timestamp comparison before mutation. Reject events where incoming.timestamp <= last_processed_at.
6. Missing Composite Indexes for Isolation
Explanation: Indexing only the tenant/workspace column ignores actual query patterns, resulting in index scans that still require filtering large result sets.
Fix: Lead indexes with the isolation column, followed by sort/filter columns: (workspace_id, inserted_at DESC). This enables index-only scans for paginated tenant queries.
7. Connection Pool Exhaustion from Long Workers
Explanation: Background workers holding database connections for extended periods exhaust pool limits, causing request queuing and timeouts.
Fix: Use PgBouncer in transaction pooling mode. Keep worker transactions short by fetching work, releasing the connection, processing externally, then reconnecting to update status.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| < 50 tenants, simple CRUD | App-level WHERE filters | Lower initial complexity, RLS overhead not justified | Low infrastructure, higher dev risk |
| 50-5000 tenants, compliance required | Database-enforced RLS + session vars | Eliminates data leakage, audit-ready, engine-guaranteed | Moderate setup, near-zero breach risk |
| < 1k jobs/day, single worker | Sequential SELECT/UPDATE | Simpler code, acceptable collision rate | Low compute, moderate duplicate risk |
| > 1k jobs/day, multi-worker | SKIP LOCKED atomic dispatch | Zero collisions, reduced roundtrips, MVCC-safe | Slightly higher CPU, zero duplicate cost |
| Audit log < 10M rows | Monolithic table with periodic DELETE | Simpler schema, manageable vacuum cycles | Moderate I/O cost during cleanup |
| Audit log > 10M rows | Monthly declarative partitions | Instant retention, predictable query performance | Higher DDL management, lower long-term cost |
| External webhooks, idempotent providers | Direct state mutation | Faster implementation, assumes provider guarantees | Low dev cost, high corruption risk |
| External webhooks, at-least-once delivery | Idempotency key + temporal guard | Handles retries, out-of-order, network glitches | Slightly higher query cost, zero corruption |
Configuration Template
-- Consolidated production schema foundation (PostgreSQL 16)
BEGIN;
-- 1. Workspace isolation
CREATE TABLE workspaces (
workspace_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
name text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE workspace_assets (
asset_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id uuid NOT NULL REFERENCES workspaces(workspace_id),
name text NOT NULL,
inserted_at timestamptz NOT NULL DEFAULT now(),
deleted_at timestamptz
);
ALTER TABLE workspace_assets ENABLE ROW LEVEL SECURITY;
ALTER TABLE workspace_assets FORCE ROW LEVEL SECURITY;
CREATE POLICY workspace_isolation ON workspace_assets
USING (workspace_id = current_setting('app.active_workspace')::uuid);
CREATE INDEX idx_assets_workspace_time ON workspace_assets (workspace_id, inserted_at DESC);
CREATE UNIQUE INDEX idx_assets_unique_active ON workspace_assets (workspace_id, name) WHERE deleted_at IS NULL;
-- 2. Background task queue
CREATE TABLE background_tasks (
task_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
task_type text NOT NULL,
task_status text NOT NULL DEFAULT 'queued' CHECK (task_status IN ('queued', 'processing', 'completed', 'failed')),
priority_rank int NOT NULL DEFAULT 0,
scheduled_at timestamptz NOT NULL DEFAULT now(),
worker_id text,
attempt_count int NOT NULL DEFAULT 0,
claimed_at timestamptz,
completed_at timestamptz
);
CREATE INDEX idx_queue_dispatch ON background_tasks (priority_rank DESC, scheduled_at ASC) WHERE task_status = 'queued';
-- 3. Partitioned event log
CREATE TABLE system_events (
event_id uuid NOT NULL,
workspace_id uuid NOT NULL,
occurred_at timestamptz NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL,
PRIMARY KEY (event_id, occurred_at)
) PARTITION BY RANGE (occurred_at);
CREATE INDEX idx_events_brin ON system_events USING BRIN (occurred_at);
-- 4. Idempotent billing mirror
CREATE TABLE billing_events (
event_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
provider_ref_id text NOT NULL,
workspace_id uuid NOT NULL,
processed_at timestamptz NOT NULL DEFAULT now(),
state_version int NOT NULL DEFAULT 1,
event_data jsonb NOT NULL,
UNIQUE (provider_ref_id)
);
COMMIT;
Quick Start Guide
- Initialize the schema: Run the configuration template against a clean PostgreSQL 16 instance. Verify partition creation with
\dt+ system_events*.
- Configure connection pooling: Deploy PgBouncer in transaction mode. Set
pool_mode = transaction, max_client_conn = 1000, default_pool_size = 50. Point your application to PgBouncer instead of direct PostgreSQL.
- Set session context: Implement a middleware or connection wrapper that executes
SET app.active_workspace = $1 before queries and RESET app.active_workspace after. Verify RLS enforcement by querying without the variable; expect zero rows.
- Deploy queue workers: Implement the atomic
UPDATE ... FOR UPDATE SKIP LOCKED pattern. Start with 2 workers and monitor pg_stat_activity for lock contention. Add exponential backoff when RETURNING yields empty results.
- Validate with EXPLAIN: Run
EXPLAIN ANALYZE on critical paths. Confirm Partition Pruning appears for event queries, Index Only Scan for workspace asset lookups, and SkipLocked in the queue dispatch plan. Adjust indexes if sequential scans appear.