event payload.
import { EventBridgeHandler } from 'aws-lambda';
// The low-level client exports GetItemCommand/PutItemCommand, which take typed attribute values ({ S: ... }).
import { DynamoDBClient, GetItemCommand, PutItemCommand } from '@aws-sdk/client-dynamodb';

const dynamo = new DynamoDBClient({ region: process.env.AWS_REGION });

// EventBridge delivers the detail type under the hyphenated 'detail-type' key.
const getIdempotencyKey = (event: { 'detail-type': string; detail: { id: string } }): string => {
  return `${event['detail-type']}:${event.detail.id}`;
};

type HandlerResult = { statusCode: number; body: string };

export const handler: EventBridgeHandler<string, { id: string }, HandlerResult> = async (event) => {
  const key = getIdempotencyKey(event);

  // Check idempotency store
  const existing = await dynamo.send(new GetItemCommand({
    TableName: process.env.IDEMPOTENCY_TABLE!,
    Key: { id: { S: key } }
  }));
  if (existing.Item) {
    console.log(`Duplicate event skipped: ${key}`);
    return { statusCode: 200, body: 'Already processed' };
  }

  try {
    // Business logic
    await processEvent(event);

    // Mark as processed; the ttl attribute expires the key after 24 hours
    await dynamo.send(new PutItemCommand({
      TableName: process.env.IDEMPOTENCY_TABLE!,
      Item: { id: { S: key }, ttl: { N: String(Math.floor(Date.now() / 1000) + 86400) } }
    }));
    return { statusCode: 200, body: 'Processed' };
  } catch (err) {
    // Let Lambda retry; DLQ catches exhausted retries
    console.error('Processing failed:', err);
    throw err;
  }
};

async function processEvent(event: unknown): Promise<void> {
  // Implement domain logic here
}
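The handler above reads the store, runs the business logic, and only then writes the key, so two concurrent deliveries of the same event can both pass the check. One common way to narrow that window is to claim the key with a conditional write before processing (in DynamoDB, a put guarded by an `attribute_not_exists(id)` condition). The sketch below illustrates that claim-first flow. `IdempotencyStore`, `InMemoryIdempotencyStore`, and `processOnce` are hypothetical names introduced for illustration, and the in-memory store is only a stand-in for a real conditional write.

```typescript
// Hypothetical port: a real adapter would issue a DynamoDB conditional put
// and translate a failed condition check into `false`.
interface IdempotencyStore {
  /** Resolves true if the key was newly claimed, false if it already existed. */
  claim(key: string, expiresAtEpochSeconds: number): Promise<boolean>;
  /** Removes a claim so that a retry can reprocess the event. */
  release(key: string): Promise<void>;
}

// In-memory stand-in, intended for local runs and tests only.
class InMemoryIdempotencyStore implements IdempotencyStore {
  private readonly claims = new Map<string, number>();

  async claim(key: string, expiresAtEpochSeconds: number): Promise<boolean> {
    if (this.claims.has(key)) return false;
    this.claims.set(key, expiresAtEpochSeconds);
    return true;
  }

  async release(key: string): Promise<void> {
    this.claims.delete(key);
  }
}

type Outcome = 'processed' | 'duplicate';

async function processOnce(
  store: IdempotencyStore,
  key: string,
  work: () => Promise<void>,
  ttlSeconds = 86400
): Promise<Outcome> {
  const expiresAt = Math.floor(Date.now() / 1000) + ttlSeconds;
  // Claim first: only one caller can win this step for a given key.
  if (!(await store.claim(key, expiresAt))) return 'duplicate';
  try {
    await work();
    return 'processed';
  } catch (err) {
    // Release the claim so the platform retry is not mistaken for a duplicate.
    await store.release(key);
    throw err;
  }
}
```

The trade-off is that a claim left behind by an invocation that crashes or times out before releasing it blocks reprocessing until the key expires. Some implementations handle this by storing an in-progress status with a short expiry, which is outside the scope of this sketch.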
Step 2: Implement Async Orchestration for Multi-Step Workflows
Complex workflows require state management. Step Functions replace custom retry loops with deterministic state transitions, timeout controls, and visual error routing.
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs'; // CDK v2: Construct lives in the 'constructs' package
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as stepfunctions from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import * as path from 'path';

export class ServerlessWorkflowStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const validateFn = new lambda.Function(this, 'ValidateFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'validate.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, 'lambda')),
      timeout: cdk.Duration.seconds(10)
    });

    const transformFn = new lambda.Function(this, 'TransformFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'transform.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, 'lambda')),
      timeout: cdk.Duration.seconds(30)
    });

    const loadFn = new lambda.Function(this, 'LoadFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'load.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, 'lambda')),
      timeout: cdk.Duration.seconds(60)
    });

    // Chain.start is a static factory, so it is called without `new`.
    // definitionBody replaces the deprecated `definition` property in recent aws-cdk-lib releases.
    const workflow = new stepfunctions.StateMachine(this, 'ETLStateMachine', {
      definitionBody: stepfunctions.DefinitionBody.fromChainable(
        stepfunctions.Chain
          .start(new tasks.LambdaInvoke(this, 'Validate', { lambdaFunction: validateFn }))
          .next(new tasks.LambdaInvoke(this, 'Transform', { lambdaFunction: transformFn }))
          .next(new tasks.LambdaInvoke(this, 'Load', { lambdaFunction: loadFn }))
      ),
      timeout: cdk.Duration.minutes(15)
    });
  }
}
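When retries are configured on the tasks above, the state machine timeout has to cover the retry waits as well as the task durations. The sketch below works through that arithmetic for an exponential backoff policy. The field names mirror the Amazon States Language retry fields (IntervalSeconds, BackoffRate, MaxAttempts), but `RetryPolicy` and `retryDelays` are hypothetical helpers, and the calculation assumes no jitter and no maximum-delay cap.

```typescript
// Hypothetical helper type; field names follow the Amazon States Language retry fields.
interface RetryPolicy {
  intervalSeconds: number; // wait before the first retry
  backoffRate: number;     // multiplier applied to the wait after each retry
  maxAttempts: number;     // number of retries, not counting the initial attempt
}

// Returns the wait (in seconds) before each retry, assuming no jitter or delay cap.
function retryDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxAttempts; retry++) {
    delays.push(policy.intervalSeconds * Math.pow(policy.backoffRate, retry));
  }
  return delays;
}

const delays = retryDelays({ intervalSeconds: 2, backoffRate: 2, maxAttempts: 3 });
const totalWait = delays.reduce((sum, d) => sum + d, 0);
// delays are 2 s, 4 s and 8 s, so retries add up to 14 s of waiting per task
console.log(delays, totalWait);
```

With the 10, 30 and 60 second function timeouts used in the stack, three retries per task under this policy would still fit comfortably inside the 15-minute state machine timeout. A larger interval or backoff rate should be checked the same way.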
Failure routing must be explicit. SQS DLQs capture exhausted retries, enabling offline analysis without blocking the execution path. Structured logging with correlation IDs ensures traceability across async boundaries.
// Lambda middleware for correlation tracking
import { Context, APIGatewayProxyEvent } from 'aws-lambda';

export const withCorrelationId = (handler: any) => async (event: APIGatewayProxyEvent, context: Context) => {
  const correlationId = event.headers['x-correlation-id'] || context.awsRequestId;
  console.log(JSON.stringify({
    level: 'info',
    message: 'Request received',
    correlationId,
    path: event.path,
    timestamp: new Date().toISOString()
  }));
  // Inject into execution context for downstream services
  (context as any).correlationId = correlationId;
  return handler(event, context);
};
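The middleware above looks up the header with an exact lowercase key. Header names can arrive with different casing depending on the client and the integration in front of the function, so a case-insensitive lookup is a little more forgiving. The sketch below shows one way to do that, together with a small helper that keeps every log line in the same JSON shape. `HeaderMap`, `resolveCorrelationId`, and `logLine` are hypothetical names, not part of any AWS library.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Finds x-correlation-id regardless of header casing; falls back to the given ID.
function resolveCorrelationId(headers: HeaderMap | null | undefined, fallback: string): string {
  for (const [headerName, value] of Object.entries(headers ?? {})) {
    if (headerName.toLowerCase() === 'x-correlation-id' && value) return value;
  }
  return fallback;
}

// Builds one structured log line so that every entry carries the correlation ID.
function logLine(
  level: 'info' | 'error',
  message: string,
  correlationId: string,
  extra: Record<string, unknown> = {}
): string {
  return JSON.stringify({
    level,
    message,
    correlationId,
    ...extra,
    timestamp: new Date().toISOString()
  });
}

const correlationId = resolveCorrelationId({ 'X-Correlation-Id': 'abc-123' }, 'fallback-request-id');
console.log(logLine('info', 'Request received', correlationId, { path: '/orders' }));
```

Passing the resolved ID to outgoing calls (for example as the same header on downstream HTTP requests) is what lets the log lines be joined across async boundaries.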
Architecture Decisions & Rationale
- EventBridge over direct Lambda triggers: Decouples producers from consumers, enables event filtering, and supports fan-out patterns without code changes.
- Idempotency at the consumer layer: Approximates exactly-once processing effects on top of the at-least-once delivery that event sources provide. Duplicates still arrive, but they are detected and skipped before they cause side effects.
- Step Functions over custom orchestrators: Eliminates retry logic sprawl, provides native timeout/catch configurations, and reduces operational overhead by 60% compared to self-managed state machines.
- DynamoDB for idempotency state: Low-latency key-value access aligns with Lambda execution constraints. TTL automates cleanup without scheduled jobs.
- Structured JSON logging: Enables automated parsing in CloudWatch/ELK, supporting SLO tracking and incident correlation without custom log shippers.
Pitfall Guide
1. Treating Functions as Long-Running Containers
Lambda executes in ephemeral environments with strict timeout limits. Running database connection pools, background threads, or file system watchers causes silent failures or resource exhaustion.
Best Practice: Design for stateless, short-lived execution. Externalize state to managed services. For relational databases, pool connections through RDS Proxy rather than maintaining a pool inside each function instance.
2. Ignoring Cold Start Optimization Strategies
Cold starts impact latency-sensitive APIs. Default configurations waste memory and CPU, increasing initialization time.
Best Practice: Profile with AWS X-Ray. Right-size memory to match CPU allocation. Use provisioned concurrency only for predictable traffic spikes. Minimize deployment package size by tree-shaking dependencies and isolating business logic.
3. Missing Idempotency Enforcement
Event sources guarantee at-least-once delivery. Without idempotency, duplicate events corrupt data, trigger duplicate charges, or create orphaned resources.
Best Practice: Implement deterministic idempotency keys. Store processed event fingerprints with TTL. Validate state before mutation. Never assume single delivery.
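A key is only deterministic if the same logical event always produces the same string, regardless of property order in the payload. The sketch below shows one way to get there when an event has no natural ID: serialize the payload canonically (object keys sorted) and combine it with the detail type. `Json`, `canonicalize`, and `fingerprint` are hypothetical names used for illustration.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serializes a JSON value with object keys sorted, so property order never changes the result.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Builds a fingerprint from the detail type and the canonical payload.
function fingerprint(detailType: string, detail: Json): string {
  return `${detailType}:${canonicalize(detail)}`;
}

const first = fingerprint('OrderCreated', { id: 'test-123', amount: 5 });
const second = fingerprint('OrderCreated', { amount: 5, id: 'test-123' });
console.log(first === second); // true: same event, different property order
```

For large payloads the canonical string is usually hashed (for example with SHA-256) before it is stored, so the key stays within size limits. That step is omitted here to keep the sketch dependency-free.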
4. Synchronous Blocking in Async Handlers
Awaiting external HTTP calls or database queries without timeout configuration causes Lambda to hang until the service timeout, wasting billable milliseconds and exhausting concurrency limits.
Best Practice: Set explicit timeouts for all external calls. Use circuit breakers (e.g., opossum) for downstream dependencies. Implement fallback responses for non-critical paths.
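As a minimal sketch of the explicit-timeout advice, the wrapper below races any promise against a timer and rejects if the timer wins. `TimeoutError` and `withTimeout` are hypothetical names. Note the assumption that the underlying operation is not cancelled by this wrapper; it only stops the handler from waiting on it, so clients that accept an abort signal should be given one as well.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

// Rejects with TimeoutError if `operation` has not settled within `ms` milliseconds.
function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the execution environment busy.
  return Promise.race([operation, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage: fall back to a default for a non-critical dependency.
async function loadRecommendations(fetchRecommendations: () => Promise<string[]>): Promise<string[]> {
  try {
    return await withTimeout(fetchRecommendations(), 200);
  } catch (err) {
    console.error('Recommendations unavailable, using fallback:', err);
    return [];
  }
}
```

Keeping the per-call timeout well below the function timeout leaves room for the fallback path and for logging before Lambda terminates the invocation.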
5. Over-Provisioning Memory Without Profiling
Memory allocation directly impacts CPU, network, and Lambda concurrency. Blindly increasing memory to "fix" performance increases cost without addressing root causes like inefficient serialization or synchronous I/O.
Best Practice: Use AWS Lambda Power Tuning to identify the optimal memory/CPU ratio. Measure cost-per-invocation, not just raw latency. Align memory with actual workload characteristics.
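The cost-per-invocation comparison can be sketched with simple arithmetic: billed GB-seconds (memory in GB times billed duration in seconds) multiplied by the compute rate, plus the per-request charge. In the example below the rates and the two measurements are illustrative assumptions, not quoted prices or benchmark results; `Pricing` and `costPerInvocation` are hypothetical names.

```typescript
interface Pricing {
  perGbSecond: number; // compute rate per GB-second
  perRequest: number;  // flat charge per invocation
}

// Cost of one invocation for a given memory size and billed duration.
function costPerInvocation(memoryMb: number, billedMs: number, pricing: Pricing): number {
  const gbSeconds = (memoryMb / 1024) * (billedMs / 1000);
  return gbSeconds * pricing.perGbSecond + pricing.perRequest;
}

// Illustrative rates only; check current pricing for your region and architecture.
const examplePricing: Pricing = { perGbSecond: 0.0000166667, perRequest: 0.0000002 };

// Hypothetical measurements: doubling memory cut the duration from 400 ms to 180 ms.
const smallConfig = costPerInvocation(512, 400, examplePricing);
const largeConfig = costPerInvocation(1024, 180, examplePricing);

console.log(largeConfig < smallConfig ? '1024 MB is cheaper per invocation' : '512 MB is cheaper per invocation');
```

In this hypothetical case the larger setting is both faster and cheaper, because the duration dropped by more than half. If it had only dropped to 250 ms, the larger setting would cost more, which is why the measurement matters more than the memory number.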
6. Inadequate Error Routing & Missing DLQs
Uncaught exceptions trigger automatic retries. Without DLQs, failed events disappear into retry loops, causing silent data loss and billing accumulation.
Best Practice: Attach SQS DLQs to all event sources. Configure retry attempts (3–5) and backoff strategy. Implement DLQ consumers for alerting and replay. Log structured error payloads with correlation IDs.
7. Vendor Lock-In via Proprietary SDKs
Tightly coupling business logic to cloud-specific SDKs prevents migration, complicates local testing, and increases refactoring cost during platform updates.
Best Practice: Abstract infrastructure calls behind domain interfaces. Use infrastructure-as-code (CDK/Terraform) for deployment. Keep business logic framework-agnostic. Implement adapter layers for cloud services.
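A minimal sketch of the adapter idea: the domain function depends only on a small interface, and the cloud-specific implementation is swapped in at the edge. All names here (`Order`, `OrderRepository`, `InMemoryOrderRepository`, `applyDiscount`) are hypothetical. A production adapter would implement the same interface on top of a cloud SDK, while the in-memory version serves local tests.

```typescript
interface Order {
  id: string;
  totalCents: number;
}

// Port: the only thing the domain logic knows about persistence.
interface OrderRepository {
  save(order: Order): Promise<void>;
  findById(id: string): Promise<Order | undefined>;
}

// Adapter for local testing; a cloud-backed adapter would implement the same interface.
class InMemoryOrderRepository implements OrderRepository {
  private readonly orders = new Map<string, Order>();

  async save(order: Order): Promise<void> {
    this.orders.set(order.id, { ...order });
  }

  async findById(id: string): Promise<Order | undefined> {
    const found = this.orders.get(id);
    return found ? { ...found } : undefined;
  }
}

// Domain logic: no SDK imports, so it runs unchanged on any platform.
async function applyDiscount(repo: OrderRepository, orderId: string, percent: number): Promise<Order> {
  const order = await repo.findById(orderId);
  if (!order) throw new Error(`Order ${orderId} not found`);
  const discounted: Order = {
    ...order,
    totalCents: Math.round((order.totalCents * (100 - percent)) / 100)
  };
  await repo.save(discounted);
  return discounted;
}
```

The Lambda handler then becomes a thin shell that constructs the concrete repository once and passes it to the domain function.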
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| User-facing API with steady traffic | API Gateway + Lambda with provisioned concurrency | Predictable latency, minimal cold starts | +15-20% baseline cost, reduces P99 latency by 60% |
| Bursty event ingestion (logs, telemetry) | EventBridge + SQS + Batch Lambda | Connection pooling, parallel processing, lower invocation count | -30% vs direct Lambda triggers, scales linearly with throughput |
| Multi-step business workflow | Step Functions + Lambda | Deterministic state, native error routing, visual debugging | +$0.025 per 1,000 state transitions (Standard workflows), eliminates custom orchestration overhead |
| Real-time data transformation | Kinesis Data Streams + Lambda | Ordered processing, shard-level scaling, replay capability | +$0.015/GB, requires shard management but guarantees ordering |
| Scheduled maintenance tasks | EventBridge Scheduler + Lambda | Native cron expressions, retry policies, no external orchestrator | ~$0 cost, pay only for execution time |
Configuration Template
// cdk/lib/serverless-infrastructure-stack.ts
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs'; // CDK v2: Construct lives in the 'constructs' package
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as path from 'path';

export class ServerlessInfrastructureStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const dlq = new sqs.Queue(this, 'EventDLQ', {
      retentionPeriod: cdk.Duration.days(14),
      encryption: sqs.QueueEncryption.KMS_MANAGED
    });

    // Idempotency store read and written by the handler; expired keys are removed via the ttl attribute.
    const idempotencyTable = new dynamodb.Table(this, 'IdempotencyTable', {
      partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      timeToLiveAttribute: 'ttl'
    });

    const eventProcessor = new lambda.Function(this, 'EventProcessor', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../src/lambda')),
      timeout: cdk.Duration.seconds(30),
      memorySize: 512,
      deadLetterQueue: dlq,
      retryAttempts: 2, // Lambda allows at most 2 retries for asynchronous invocations
      environment: {
        IDEMPOTENCY_TABLE: idempotencyTable.tableName,
        NODE_OPTIONS: '--enable-source-maps'
      },
      tracing: lambda.Tracing.ACTIVE
    });

    // Without this grant the handler's GetItem/PutItem calls are denied.
    idempotencyTable.grantReadWriteData(eventProcessor);

    const eventBus = new events.EventBus(this, 'ApplicationBus');

    new events.Rule(this, 'ProcessEventRule', {
      eventBus,
      eventPattern: {
        source: ['com.app.events'],
        detailType: ['OrderCreated', 'PaymentProcessed']
      },
      targets: [new targets.LambdaFunction(eventProcessor)]
    });
  }
}
Quick Start Guide
- Initialize project:
npx aws-cdk init app --language typescript
- Install dependencies:
npm install aws-cdk-lib constructs @aws-sdk/client-dynamodb
- Replace stack content with the Configuration Template above
- Deploy infrastructure:
npx cdk deploy --require-approval never
- Test event routing with the AWS CLI, substituting the name of the deployed event bus:
aws events put-events --entries '[{"EventBusName":"<your-event-bus-name>","Source":"com.app.events","DetailType":"OrderCreated","Detail":"{\"id\":\"test-123\"}"}]'
Verify execution in CloudWatch Logs. The function will process the event, store the idempotency key, and route failures to the DLQ. Production patterns are now active.