I cut my AWS bill by 93% by ditching Fargate for a single Lightsail VM
Current Situation Analysis
Pre-revenue and low-traffic applications frequently fall into the managed service cost trap. The original architecture deployed a production-grade, multi-AZ Fargate stack designed for Series-A SaaS reliability, but applied to a directory site with zero users. This created a severe mismatch between infrastructure overhead and actual workload demand.
The primary failure mode stems from infrastructure plumbing costs: NAT gateways, ALBs, VPC interface endpoints, and ElastiCache nodes incur hard monthly floors regardless of traffic volume. In the original setup, ~$87/mo (25% of the total bill) was spent purely on networking and orchestration layers that performed no application logic. Traditional auto-scaling and serverless paradigms fail to mitigate this because baseline provisioning, cross-AZ redundancy, and managed service minimums (e.g., Aurora Serverless v2 defaulting to 0.5 ACU) create an inescapable cost floor. Optimizing Fargate configurations (reducing tasks, enabling auto-pause, switching to Spot) only shaved costs to ~$140/mo because the underlying architectural primitives remain inherently expensive for single-tenant, low-throughput workloads.
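To make the floor concrete, here is a back-of-the-envelope sketch using public list-price ballparks. These component prices are illustrative assumptions, not the author's actual line items:

```shell
# Illustrative fixed monthly floors at ~zero traffic (list-price ballparks)
nat=33        # NAT gateway: ~$0.045/hr * 730 hrs, before data-processing fees
alb=17        # ALB: ~$0.0225/hr * 730 hrs, before LCU charges
endpoints=22  # three VPC interface endpoints: ~$0.01/hr each * 730 hrs
cache=12      # single ElastiCache cache.t4g.micro node
floor=$((nat + alb + endpoints + cache))
echo "Plumbing floor: \$${floor}/mo before serving a single request"
```

None of these components can scale to zero, which is why a plumbing figure in this ballpark persists no matter how little traffic the application serves.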
WOW Moment: Key Findings
Consolidating the entire stack onto a single Lightsail VM eliminated networking overhead, removed managed service minimums, and preserved the exact same Next.js + Postgres + Redis + BullMQ runtime. The migration demonstrated that vertical consolidation on a fixed-cost VM drastically outperforms horizontally distributed serverless components when traffic is unpredictable or pre-revenue.
| Approach | Monthly Cost | Infrastructure Plumbing Overhead | Setup Complexity | Time to Deploy | Scalability Headroom |
|---|---|---|---|---|---|
| Original Fargate Setup | $345 | ~25% ($87/mo) | High (CDK, multi-AZ, IAM, VPC, WAF) | Hours | Enterprise (auto-scale, multi-AZ) |
| Optimized Fargate (Phase 1) | $140 | ~20% ($28/mo) | Medium (CDK tweaks, Spot, auto-pause tuning) | Hours | Medium (constrained by hard floors) |
| Lightsail Monolith (Phase 2) | $12 | ~0% (consolidated) | Low (Docker Compose, single VM, Caddy) | Afternoon | Low-Medium (vertical scale only) |
Key Findings:
- Hard Cost Floors: NAT, ALB, ElastiCache, and VPC endpoints cannot be scaled to zero. They dictate the minimum viable spend for Fargate.
- Memory-Optimized Builds: Next.js 14 builds spike to 1.5–2GB RAM. A 2GB VM requires explicit swap configuration to prevent OOM kills during compilation.
- Zero-Traffic Efficiency: Lightsail’s fixed pricing model aligns perfectly with pre-revenue projects, delivering identical stack functionality at 93% lower cost.
Core Solution
The migration strategy replaced distributed managed services with a containerized monolith on a $12/mo Lightsail VM. Architecture decisions prioritized cost elimination over high availability, accepting single-point-of-failure trade-offs appropriate for a pre-revenue directory site.
1. Docker Compose Consolidation
All services (Postgres, Redis, Next.js web, BullMQ worker) run in isolated containers with explicit memory limits to prevent resource contention on the 2GB VM.
```yaml
services:
  postgres:
    image: postgres:16-alpine
    volumes: [./data/postgres:/var/lib/postgresql/data]
    # healthcheck added so the service_healthy conditions below can resolve
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U tmadmin -d toolmango"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy: { resources: { limits: { memory: 512M } } }
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --maxmemory 128mb --maxmemory-policy noeviction
    volumes: [./data/redis:/data]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy: { resources: { limits: { memory: 192M } } }
  web:
    image: tm-web:latest
    ports: ["127.0.0.1:3000:3000"]
    env_file: .env
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    deploy: { resources: { limits: { memory: 768M } } }
  worker:
    image: tm-worker:latest
    env_file: .env
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    deploy: { resources: { limits: { memory: 384M } } }
```
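A quick sanity check that these limits actually fit the instance. The 2048M figure assumes the advertised 2GB Lightsail plan; usable RAM is slightly lower after kernel reservations:

```shell
# Sum of container memory limits vs. the VM's nominal RAM
total=$((512 + 192 + 768 + 384))   # postgres + redis + web + worker, in MB
echo "Container ceiling: ${total}M of 2048M nominal"
echo "Headroom for OS + Docker daemon: $((2048 - total))M"
```

The limits are deliberately tight: they leave only a thin margin for the host, which is another reason the swap configuration described later is not optional.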
2. TLS Termination & Reverse Proxy
Caddy replaces the CloudFront/ALB/ACM chain. It automatically provisions and renews Let's Encrypt certificates (via the HTTP-01 or TLS-ALPN-01 challenge) when it starts serving the configured domains.
```
toolmango.com, www.toolmango.com {
    reverse_proxy 127.0.0.1:3000
    encode gzip zstd
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
    }
}
```
3. Data Migration from Private Aurora Subnet
Since Aurora Serverless v2 resided in a `PRIVATE_ISOLATED` subnet, a direct `pg_dump` from outside the VPC failed. A one-off Fargate task in the same VPC performed the dump and staged it to S3.
```shell
aws ecs run-task \
  --cluster tm-prod-compute \
  --task-definition tm-prod-pgdump \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-...],securityGroups=[sg-...],assignPublicIp=DISABLED}'
```
The task definition executes the dump pipeline:
```shell
pg_dump --no-owner --no-acl --clean --if-exists -h "$DB_HOST" -U "$DB_USER" -d toolmango \
  | gzip > /tmp/dump.sql.gz \
  && aws s3 cp /tmp/dump.sql.gz s3://tm-prod-assets/migration/dump.sql.gz
```
On the Lightsail VM, the dump is retrieved via a presigned URL (bypassing IAM role limitations) and piped directly into the local Postgres container:

```shell
# Fetch via a presigned URL (generated with `aws s3 presign` from a machine with S3 access), then restore
curl -fSL -o /tmp/dump.sql.gz "<presigned-url>"
gunzip -c /tmp/dump.sql.gz | docker compose exec -T postgres psql -U tmadmin -d toolmango
```
4. Cross-Architecture Build & Memory Management
Fargate ran ARM64; Lightsail runs x86_64. Building directly on the target VM eliminated registry push/pull latency and any cross-architecture emulation. Next.js compilation required 2GB of swap to avoid OOM termination:
```shell
# Provision swap first: next build peaks at 1.5-2GB and gets OOM-killed on a 2GB VM without it
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile
echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab   # persist across reboots

# Build natively on the x86_64 target (no registry round-trip, no emulation overhead)
docker build --network=host -f Dockerfile.web -t tm-web:latest \
  --build-arg NEXT_PUBLIC_SITE_URL=https://toolmango.com \
  --build-arg NEXT_PUBLIC_PLAUSIBLE_DOMAIN=toolmango.com \
  .
```
Pitfall Guide
- Infrastructure Plumbing Blind Spot: NAT gateways, ALBs, VPC endpoints, and ElastiCache nodes create hard cost floors that do not scale with traffic. Assuming serverless auto-scaling will reduce bills to zero is mathematically incorrect for distributed architectures.
- Next.js Build Memory Spikes: Next.js 14 App Router builds frequently peak at 1.5–2GB RAM. Deploying to 2GB VMs without configuring swap triggers OOM kills during `next build`. Always provision swap equal to or greater than peak build memory.
- Private Subnet Data Extraction Failure: Aurora and other managed databases in `PRIVATE_ISOLATED` subnets reject external `pg_dump` connections. You must spin up a same-VPC jump host or ephemeral task to bridge the network boundary before exporting data.
- Managed Service Auto-Pause Misconfiguration: Aurora Serverless v2 defaults to `minCapacity: 0.5`, incurring continuous charges. Auto-pause only triggers when `minCapacity: 0` and `secondsUntilAutoPause` are explicitly configured. Default settings negate cost-saving intentions.
- Cross-Architecture Registry Friction: Building ARM64 images locally and pushing to an x86_64 VM introduces pull latency and potential emulation overhead. Building directly on the target architecture skips the registry round-trip and guarantees native binary compatibility.
- Redis Eviction Policy Misconfiguration: Cache-oriented eviction policies (e.g., `allkeys-lru`) can silently drop pending BullMQ jobs under memory pressure. Explicitly set `--maxmemory-policy noeviction` and pair it with a `--maxmemory` limit to fail fast rather than corrupt queue state.
- TLS Termination Over-Engineering: Traditional stacks chain CloudFront + ACM + ALB for HTTPS. For single-VM deployments, Caddy's automatic HTTP-01 challenge reduces certificate provisioning to a single configuration stanza, eliminating certificate rotation complexity and CDN egress costs.
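For the auto-pause pitfall, the fix looks roughly like the following scaling configuration. This is a sketch of the RDS `ServerlessV2ScalingConfiguration` shape: the `MaxCapacity` value and the 3600-second window are illustrative, and zero-capacity auto-pause requires an Aurora engine version that supports it.

```json
{
  "ServerlessV2ScalingConfiguration": {
    "MinCapacity": 0,
    "MaxCapacity": 4,
    "SecondsUntilAutoPause": 3600
  }
}
```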
Deliverables
- Blueprint: `Lightsail-Monolith-Architecture.pdf` — Network topology, container resource allocation matrix, backup strategy (local cron + S3 sync), and rollback procedures.
- Checklist: `Pre-Migration-Audit.md` — Validates cost baseline, identifies hard-floor services, confirms data volume size, verifies swap requirements, and documents DNS propagation steps.
- Configuration Templates:
  - `docker-compose.prod.yml` — Pre-tuned resource limits, health checks, and volume mounts for Postgres/Redis/Next.js/BullMQ.
  - `Caddyfile.template` — Production-ready reverse proxy with HSTS, compression, and security headers.
  - `swap-provision.sh` — Idempotent script for safe swapfile creation and fstab persistence.
  - `pgdump-ecs-task.json` — Ready-to-deploy ECS task definition for private-subnet data extraction.
