xport top queries by total_exec_time and mean_exec_time. Correlate these with cost metrics from your cloud provider.
2. Query Optimization and Index Strategy
Inefficient queries cause full table scans, high IOPS consumption, and lock contention.
TypeScript Example: Avoiding N+1 and Select Star
// β ANTI-PATTERN: N+1 Query and SELECT *
async function getProductsWithReviews(productId: number) {
// Fetches all columns, including large text blobs, increasing I/O and memory
const product = await db.query('SELECT * FROM products WHERE id = $1', [productId]);
// N+1 problem: Executes a query per product in a loop
const reviews = [];
for (const item of product) {
const res = await db.query('SELECT * FROM reviews WHERE product_id = $1', [item.id]);
reviews.push(...res.rows);
}
return { product, reviews };
}
// β
OPTIMIZED: Single Query with Join and Column Projection
async function getProductsWithReviewsOptimized(productId: number) {
// Selects only required columns. Reduces network payload and buffer usage.
// Uses a JOIN to fetch related data in a single round-trip.
const query = `
SELECT
p.id, p.name, p.price,
r.id AS review_id, r.rating, r.comment
FROM products p
LEFT JOIN reviews r ON p.id = r.product_id
WHERE p.id = $1;
`;
const result = await db.query(query, [productId]);
// Transform rows to application structure efficiently
return transformRowsToStructure(result.rows);
}
Indexing Rationale:
- Covering Indexes: Create indexes that include all columns needed by a query to avoid heap lookups. This reduces IOPS significantly.
CREATE INDEX idx_product_search ON products(category, price) INCLUDE (name, image_url);
- Partial Indexes: Index only a subset of rows to save storage and reduce write amplification.
-- Index only active users, ignoring archived ones
CREATE INDEX idx_active_users ON users(last_login) WHERE status = 'active';
- BRIN Indexes: For time-series or append-only data, BRIN (Block Range INdexes) are orders of magnitude smaller than B-Tree indexes, saving storage and memory.
3. Schema Design for Cost
Schema decisions directly impact storage costs and query performance.
- Data Types: Use the smallest appropriate data type.
SMALLINT vs INT vs BIGINT. VARCHAR(n) vs TEXT. In PostgreSQL, TEXT and VARCHAR have identical performance, but VARCHAR enforces length constraints which can prevent accidental bloating.
- JSONB vs Relational: JSONB is flexible but storage-heavy. If you query specific fields frequently, extract them to relational columns and index them. Storing large JSON objects that are rarely queried wastes storage and increases WAL (Write-Ahead Log) volume.
- Partitioning: Implement declarative partitioning for large tables. This allows for efficient pruning and enables dropping old partitions instantly (O(1) operation) rather than expensive
DELETE operations that cause bloat.
TypeScript Example: Partitioning Strategy Implementation
// Strategy: Monthly partitioning for audit logs
// Reduces query scan size and enables cheap data expiration
const CREATE_PARTITION_SQL = `
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
`;
// Automated partition management via migration or cron job
async function ensurePartitionExists(targetMonth: Date) {
const year = targetMonth.getFullYear();
const month = String(targetMonth.getMonth() + 1).padStart(2, '0');
const tableName = `audit_logs_${year}_${month}`;
// Check if partition exists; create if missing
// Use advisory locks to prevent race conditions in distributed environments
await db.query(`
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_class WHERE relname = '${tableName}') THEN
EXECUTE ${escapeLiteral(`
CREATE TABLE ${tableName} PARTITION OF audit_logs
FOR VALUES FROM ('${year}-${month}-01') TO ('${getNextMonth(year, month)}-01')
`)};
END IF;
END $$;
`);
}
// Cheap expiration: Drop partition instead of DELETE
async function expireDataOlderThan(cutoffDate: Date) {
const tableName = `audit_logs_${cutoffDate.getFullYear()}_${String(cutoffDate.getMonth() + 1).padStart(2, '0')}`;
await db.query(`DROP TABLE IF EXISTS ${tableName};`);
// Reclaims storage immediately with no bloat
}
4. Storage Lifecycle and Tiering
Not all data needs high-performance storage.
- TTL Policies: Implement Time-To-Live for ephemeral data (sessions, caches, logs).
- Blob Storage: Move binary assets (images, PDFs) to object storage (S3, GCS). Databases are expensive for unstructured data and lack efficient CDN integration.
- Archive Tiers: For compliance data accessed rarely, move to archive storage classes. In AWS Aurora, this can be achieved via cross-region replication to a cluster using slower storage or by exporting to S3 and querying via Athena.
5. Infrastructure Right-Sizing
- Serverless vs Provisioned: Use serverless databases (e.g., Aurora Serverless v2, DynamoDB on-demand) for spiky or unpredictable workloads. Use provisioned instances with Reserved Instances (RIs) for steady, predictable loads.
- Connection Pooling: Configure
PgBouncer or equivalent to multiplex connections. This allows a smaller database instance to handle high concurrency without memory exhaustion.
- Read Replicas: Offload read-heavy queries to replicas. This allows the primary instance to be sized for writes, which are typically less resource-intensive than reads in read-heavy apps.
Pitfall Guide
1. Over-Indexing
Mistake: Creating indexes for every query pattern.
Impact: Indexes consume storage and slow down write operations due to maintenance overhead. Every INSERT, UPDATE, or DELETE must update all relevant indexes.
Best Practice: Review index usage statistics (pg_stat_user_indexes). Drop unused indexes. Limit indexes to those required by critical read paths.
2. Connection Leaks and Unbounded Pools
Mistake: Allowing the application to open unlimited connections or failing to release them.
Impact: Database memory exhaustion. Cloud providers charge for memory; leaks force upgrades. Connection setup overhead increases latency.
Best Practice: Implement strict connection pool limits (e.g., max: 20 per worker). Use PgBouncer in transaction mode. Monitor active connections and idle-in-transaction timeouts.
3. Storing Blobs in Database
Mistake: Storing images, videos, or large documents in BYTEA or BLOB columns.
Impact: Increases database storage costs significantly. Slows down backups and replication. Increases WAL volume.
Best Practice: Store files in object storage. Keep only the URL or metadata in the database.
4. Ignoring Vacuum and Maintenance
Mistake: Relying on default auto-vacuum settings without tuning for write-heavy workloads.
Impact: Table bloat increases storage costs and degrades scan performance. Transaction ID wraparound risks.
Best Practice: Tune autovacuum thresholds for high-churn tables. Monitor n_dead_tup vs n_live_tup. Schedule maintenance windows for critical tables if auto-vacuum falls behind.
5. "Select *" in High-Throughput Loops
Mistake: Fetching all columns when only a few are needed, especially in loops or high-frequency API endpoints.
Impact: Wasted network bandwidth, increased memory usage in app and DB, slower serialization/deserialization.
Best Practice: Always specify columns. Use column projection in ORMs.
6. Wrong Instance Family
Mistake: Using General Purpose instances for memory-intensive workloads (e.g., large sorts, joins, or caching).
Impact: Performance bottlenecks lead to scaling compute unnecessarily. Memory-optimized instances may provide better cost/performance ratio for these workloads.
Best Practice: Analyze wait events. If Memory waits dominate, switch instance family rather than scaling size.
7. Lack of Granular Monitoring
Mistake: Monitoring only aggregate CPU and memory.
Impact: Missing query-level cost drivers. Cannot attribute cost to specific features or tenants.
Best Practice: Implement query tagging. Correlate slow query logs with cost data. Set up alerts for IOPS spikes and connection saturation.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Spiky, Unpredictable Traffic | Serverless Database (e.g., Aurora Serverless v2) | Scales to zero during idle periods; handles bursts without over-provisioning. | High savings during idle/low traffic; pay-per-use model. |
| Steady, High Throughput | Provisioned Instance + Reserved Instances | Predictable load justifies upfront commitment; RIs offer up to 60% discount. | Significant reduction in baseline compute cost. |
| Read-Heavy Workload (10:1 Read/Write) | Primary + Read Replicas + Caching | Offloads reads to replicas; caching reduces DB load further. Primary sized for writes. | Lower primary instance cost; replica cost offset by reduced primary scaling. |
| Time-Series / Audit Logs | Partitioning + BRIN Indexes + Drop Partitions | Efficient pruning; instant data expiration; minimal storage overhead. | Reduced storage growth; lower query latency on recent data. |
| Multi-Tenant SaaS | Schema-per-tenant or Row-Level Security | Isolates data; allows tiered storage based on tenant value. | Optimized storage per tenant; reduced blast radius. |
Configuration Template
Terraform Module for Cost-Optimized RDS with Auto-Scaling and Tags
resource "aws_db_instance" "optimized_db" {
identifier = "app-optimized-db"
engine = "postgres"
engine_version = "15.4"
instance_class = "db.r6g.large" # Memory optimized for join-heavy workloads
allocated_storage = 100
storage_type = "gp3" # gp3 offers better price/performance than gp2
storage_throughput = 250 # Provision throughput independently of storage size
# Auto-scaling storage
max_allocated_storage = 500
storage_encrypted = true
# Connection Management
port = 5432
publicly_accessible = false
# Maintenance and Backup
maintenance_window = "sun:03:00-sun:04:00"
backup_retention_period = 7
delete_automation_protect = true
# Performance Insights for monitoring
performance_insights_enabled = true
performance_insights_retention_period = 7
# Cost Allocation Tags
tags = {
Name = "app-optimized-db"
Environment = "production"
Team = "backend"
CostCenter = "CC-12345"
Optimized = "true"
}
}
# PgBouncer Configuration via Parameter Group
resource "aws_db_parameter_group" "pgbouncer_optimized" {
name = "pgbouncer-optimized"
family = "postgres15"
parameter {
name = "max_connections"
value = "500" # Lowered since PgBouncer handles multiplexing
}
parameter {
name = "shared_buffers"
value = "256MB" # Adjusted for instance size
}
parameter {
name = "autovacuum_vacuum_cost_limit"
value = "200" # Tune for write-heavy workloads
}
}
PgBouncer Configuration (pgbouncer.ini)
[databases]
mydb = host=optimized-db.cluster-xxx.us-east-1.rds.amazonaws.com port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
admin_users = pgbouncer_admin
stats_users = stats
Quick Start Guide
- Enable Query Statistics: Run
CREATE EXTENSION IF NOT EXISTS pg_stat_statements; in your database. Wait 24 hours to collect baseline data.
- Analyze Top Queries: Query
pg_stat_statements to find queries with high mean_time or high calls. Use EXPLAIN ANALYZE on these queries to identify missing indexes or sequential scans.
- Apply Indexes and Tune: Add targeted indexes based on analysis. Update
autovacuum settings if you observe high dead tuple counts.
- Configure Connection Pooling: Deploy
PgBouncer or update your application connection string to use a pooler. Set max_connections on the database to a value that matches your pool size to prevent memory exhaustion.
- Verify and Monitor: Compare cost metrics and latency before and after changes. Set up alerts for connection saturation and storage growth to catch drift early.