ent. Each layer must be isolated, testable, and externally configurable.
Step 1: Model the Subscription Lifecycle as a State Machine
Subscriptions are not booleans. They are finite state machines with explicit transitions. Define states, allowed transitions, and side effects.
type SubscriptionState =
  | 'draft'
  | 'active'
  | 'past_due'
  | 'canceled'
  | 'expired';

interface StateTransition {
  from: SubscriptionState;
  to: SubscriptionState;
  event: string;
  handler: (sub: Subscription) => Promise<void>;
}

const ALLOWED_TRANSITIONS: StateTransition[] = [
  { from: 'draft', to: 'active', event: 'payment_succeeded', handler: activateSubscription },
  { from: 'active', to: 'past_due', event: 'payment_failed', handler: markPastDue },
  { from: 'past_due', to: 'active', event: 'payment_recovered', handler: recoverSubscription },
  { from: 'past_due', to: 'canceled', event: 'max_dunning_reached', handler: cancelSubscription },
  { from: 'active', to: 'canceled', event: 'user_canceled', handler: cancelSubscription },
  { from: 'canceled', to: 'expired', event: 'grace_period_ended', handler: expireSubscription },
];

export async function transitionSubscription(
  sub: Subscription,
  event: string
): Promise<Subscription> {
  const transition = ALLOWED_TRANSITIONS.find(
    t => t.from === sub.state && t.event === event
  );
  if (!transition) {
    throw new Error(`Invalid transition: ${sub.state} -> ${event}`);
  }
  await transition.handler(sub);
  return { ...sub, state: transition.to, lastTransitionAt: new Date() };
}
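Because the transition table is plain data, the allowed paths can be checked without running any handlers. The following is a minimal, side-effect-free sketch of that idea; `TRANSITIONS`, `nextState`, and `replay` are hypothetical names introduced here for illustration, and the handlers from the table above are deliberately omitted.

```typescript
type SubscriptionState = 'draft' | 'active' | 'past_due' | 'canceled' | 'expired';

interface TransitionRule {
  from: SubscriptionState;
  event: string;
  to: SubscriptionState;
}

// Same (from, event) -> to pairs as the article's table, without handlers.
const TRANSITIONS: TransitionRule[] = [
  { from: 'draft', event: 'payment_succeeded', to: 'active' },
  { from: 'active', event: 'payment_failed', to: 'past_due' },
  { from: 'past_due', event: 'payment_recovered', to: 'active' },
  { from: 'past_due', event: 'max_dunning_reached', to: 'canceled' },
  { from: 'active', event: 'user_canceled', to: 'canceled' },
  { from: 'canceled', event: 'grace_period_ended', to: 'expired' },
];

// Returns the target state, or null when `event` is not allowed from `state`.
function nextState(state: SubscriptionState, event: string): SubscriptionState | null {
  for (const rule of TRANSITIONS) {
    if (rule.from === state && rule.event === event) return rule.to;
  }
  return null;
}

// Applies a sequence of events in order and throws on the first invalid one.
function replay(start: SubscriptionState, events: string[]): SubscriptionState {
  let state = start;
  for (const event of events) {
    const to = nextState(state, event);
    if (to === null) {
      throw new Error(`Invalid transition: ${state} -> ${event}`);
    }
    state = to;
  }
  return state;
}
```

A pure lookup like this is convenient in unit tests: every row of the table can be asserted directly, and states with no outgoing rows (here `expired`) are visibly terminal.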
Step 2: Decouple Metering from Billing
Usage metering must be aggregated at ingestion, not calculated on demand. Bucket events by subscription, meter, and time window. Store deltas to enable deterministic reconciliation.
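interface MeterEvent {
  subscriptionId: string;
  meterKey: string; // e.g., 'api_requests', 'storage_gb'
  quantity: number;
  timestamp: string; // ISO 8601
}

interface MeterBucket {
  subscriptionId: string;
  meterKey: string;
  windowStart: string; // e.g., '2024-01-01T00:00:00Z'
  windowEnd: string;
  totalQuantity: number;
  version: number; // for idempotent updates
}

interface TimeWindow {
  start: string; // ISO 8601, UTC
  end: string;
}

// Assumed dependency shapes: the article does not specify the Redis client
// or the event store, so these only mirror the calls made below.
interface RedisLike {
  incrby(key: string, amount: number): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
  get(key: string): Promise<string | null>;
}

interface EventStoreLike {
  append(entry: { type: string; payload: unknown; idempotencyKey: string }): Promise<void>;
}

export class MeteringAggregator {
  constructor(
    private readonly redis: RedisLike,
    private readonly eventStore: EventStoreLike
  ) {}

  async ingest(event: MeterEvent): Promise<void> {
    const window = this.getWindow(event.timestamp);
    const bucketKey = `${event.subscriptionId}:${event.meterKey}:${window.start}`;
    // INCRBY accepts integers only; report fractional meters in a scaled integer unit.
    // The increment itself is not idempotent, so a retried event counts twice
    // until reconciliation against the event store corrects it.
    await this.redis.incrby(bucketKey, event.quantity);
    await this.redis.expire(bucketKey, 60 * 60 * 24 * 32); // retain for billing cycle
    // Persist delta to event store for reconciliation
    await this.eventStore.append({
      type: 'meter.ingested',
      payload: { ...event, windowKey: bucketKey },
      idempotencyKey: `${event.subscriptionId}:${event.timestamp}:${event.meterKey}`
    });
  }

  async getUsage(subId: string, meterKey: string, window: TimeWindow): Promise<number> {
    const key = `${subId}:${meterKey}:${window.start}`;
    const raw = await this.redis.get(key);
    return raw ? Number(raw) : 0;
  }

  private getWindow(timestamp: string): TimeWindow {
    const date = new Date(timestamp);
    // Use UTC accessors so window boundaries do not depend on the server's local timezone.
    const year = date.getUTCFullYear();
    const month = date.getUTCMonth();
    const start = new Date(Date.UTC(year, month, 1));
    const end = new Date(Date.UTC(year, month + 1, 0, 23, 59, 59));
    return { start: start.toISOString(), end: end.toISOString() };
  }
}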
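The point of storing deltas is that bucket totals can be recomputed and compared with the live counters. Here is a minimal in-memory sketch of such a reconciliation pass, assuming monthly UTC windows and the same `subscriptionId:meterKey:windowStart` key layout; `MeterDelta`, `rebuildBuckets`, and `findDrift` are hypothetical names, and plain objects stand in for Redis and the event store.

```typescript
interface MeterDelta {
  subscriptionId: string;
  meterKey: string;
  quantity: number;
  timestamp: string; // ISO 8601
  idempotencyKey: string;
}

// Start of the UTC calendar month containing `timestamp`, as an ISO string.
function monthStartUtc(timestamp: string): string {
  const d = new Date(timestamp);
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1)).toISOString();
}

// Rebuilds bucket totals from the event log, counting each idempotency key once.
function rebuildBuckets(deltas: MeterDelta[]): Record<string, number> {
  const seen: Record<string, boolean> = {};
  const totals: Record<string, number> = {};
  for (const d of deltas) {
    if (seen[d.idempotencyKey]) continue; // duplicate delivery, ignore
    seen[d.idempotencyKey] = true;
    const key = `${d.subscriptionId}:${d.meterKey}:${monthStartUtc(d.timestamp)}`;
    totals[key] = (totals[key] ?? 0) + d.quantity;
  }
  return totals;
}

// Returns (live counter - replayed total) for every bucket where the two differ.
function findDrift(
  counters: Record<string, number>,
  deltas: MeterDelta[]
): Record<string, number> {
  const expected = rebuildBuckets(deltas);
  const drift: Record<string, number> = {};
  const keys = Object.keys(counters).concat(Object.keys(expected));
  for (const key of keys) {
    const diff = (counters[key] ?? 0) - (expected[key] ?? 0);
    if (diff !== 0) drift[key] = diff;
  }
  return drift;
}
```

A positive drift means the counter over-counted (for example, a retried event was incremented twice); a negative drift means increments were lost. In both cases the replayed total is the value to bill from.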
Step 3: Build an Entitlement Resolution Engine
Entitlements must be decoupled from billing state. A subscription can be past_due but still grant access during a grace period. Entitlements should be resolved via policy evaluation, not conditional database queries.
interface EntitlementPolicy {
  feature: string;
  condition: (sub: Subscription, usage: Record<string, number>) => boolean;
}

const POLICIES: EntitlementPolicy[] = [
  {
    feature: 'api_unlimited',
    condition: (sub) => sub.state === 'active' && sub.plan.tier === 'enterprise'
  },
  {
    feature: 'api_rate_limited',
    condition: (sub, usage) =>
      (sub.state === 'active' || sub.state === 'past_due') &&
      // Treat a missing meter as zero usage; `undefined < limit` would be false.
      (usage['api_requests'] ?? 0) < sub.plan.monthlyLimit
  },
  {
    feature: 'storage_basic',
    condition: (sub) => sub.state !== 'expired'
  }
];

export class EntitlementResolver {
  async resolve(sub: Subscription, usage: Record<string, number>): Promise<string[]> {
    return POLICIES
      .filter(p => p.condition(sub, usage))
      .map(p => p.feature);
  }
}
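To make the grace-period behavior concrete, the sketch below evaluates the same three policies against a few sample subscriptions. The `Subscription` shape is an assumption (the article does not define it), and `resolveEntitlements` is a hypothetical synchronous stand-in for the resolver.

```typescript
type SubscriptionState = 'draft' | 'active' | 'past_due' | 'canceled' | 'expired';

// Assumed minimal shape; only the fields the policies read are included.
interface Subscription {
  state: SubscriptionState;
  plan: { tier: string; monthlyLimit: number };
}

type Usage = Record<string, number>;

interface EntitlementPolicy {
  feature: string;
  condition: (sub: Subscription, usage: Usage) => boolean;
}

const POLICIES: EntitlementPolicy[] = [
  {
    feature: 'api_unlimited',
    condition: (sub) => sub.state === 'active' && sub.plan.tier === 'enterprise',
  },
  {
    feature: 'api_rate_limited',
    condition: (sub, usage) =>
      (sub.state === 'active' || sub.state === 'past_due') &&
      (usage['api_requests'] ?? 0) < sub.plan.monthlyLimit,
  },
  {
    feature: 'storage_basic',
    condition: (sub) => sub.state !== 'expired',
  },
];

// Returns the features whose policy condition holds, in policy order.
function resolveEntitlements(sub: Subscription, usage: Usage): string[] {
  return POLICIES.filter((p) => p.condition(sub, usage)).map((p) => p.feature);
}

const pastDueStarter: Subscription = {
  state: 'past_due',
  plan: { tier: 'starter', monthlyLimit: 10000 },
};

// Under the limit: access continues during the grace period.
const underLimit = resolveEntitlements(pastDueStarter, { api_requests: 9999 });
// At the limit: rate-limited API access is withdrawn, storage remains.
const atLimit = resolveEntitlements(pastDueStarter, { api_requests: 10000 });
```

Here `underLimit` is `['api_rate_limited', 'storage_basic']` and `atLimit` is `['storage_basic']`, so the billing state alone never decides access; the policy list does.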
Step 4: Implement Idempotent Webhook Handlers
Payment providers deliver events asynchronously. Handlers must verify signatures, enforce idempotency, and route to the state machine.
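import { createHmac, timingSafeEqual } from 'crypto';

// Assumed collaborators (not defined in the article): an idempotency/DLQ store
// and the service that routes events into the state machine.
interface WebhookDeps {
  dlq: {
    isProcessed(key: string): Promise<boolean>;
    markProcessed(key: string): Promise<void>;
    enqueue(item: { event: unknown; idempotencyKey: string; retryCount: number; nextRetry: number }): Promise<void>;
  };
  subscriptionService: { processEvent(event: unknown): Promise<void> };
}

export async function handlePaymentWebhook(
  payload: string,
  signature: string,
  secret: string,
  deps: WebhookDeps
): Promise<void> {
  const expected = createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const received = Buffer.from(signature);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws when lengths differ, so check the length first.
  if (received.length !== computed.length || !timingSafeEqual(received, computed)) {
    throw new Error('Invalid webhook signature');
  }
  const event = JSON.parse(payload);
  const idempotencyKey = `${event.type}:${event.id}`;
  const processed = await deps.dlq.isProcessed(idempotencyKey);
  if (processed) return;
  try {
    await deps.subscriptionService.processEvent(event);
    await deps.dlq.markProcessed(idempotencyKey);
  } catch (err) {
    await deps.dlq.enqueue({ event, idempotencyKey, retryCount: 0, nextRetry: Date.now() });
    throw err;
  }
}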
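The idempotency step can be isolated and tested on its own. The following is a simplified, synchronous sketch assuming an in-memory store; `InMemoryIdempotencyStore` and `processOnce` are hypothetical names, and a production store would need a durable, atomic insert (for example a unique constraint) rather than the check-then-mark sequence shown here.

```typescript
interface ProviderEvent {
  id: string;
  type: string;
}

// In-memory stand-in for a durable idempotency store (illustrative only).
class InMemoryIdempotencyStore {
  private processed: Record<string, boolean> = {};

  isProcessed(key: string): boolean {
    return this.processed[key] === true;
  }

  markProcessed(key: string): void {
    this.processed[key] = true;
  }
}

// Runs `apply` at most once per (type, id).
// Returns true when the event was applied and false when it was skipped as a duplicate.
function processOnce(
  store: InMemoryIdempotencyStore,
  event: ProviderEvent,
  apply: (event: ProviderEvent) => void
): boolean {
  const key = `${event.type}:${event.id}`;
  if (store.isProcessed(key)) return false;
  // If `apply` throws, the key stays unmarked so a later retry can still succeed.
  apply(event);
  store.markProcessed(key);
  return true;
}
```

Marking the key only after `apply` returns matches the handler above: a failed attempt remains retryable, while a redelivered event that already succeeded becomes a no-op.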
Step 5: Handle Proration & Cycle Alignment Mathematically
Proration must be deterministic. Avoid floating-point accumulation. Use cent-based integers and explicit day-count algorithms.
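export function calculateProration(
  planAmountCents: number,
  cycleDays: number,
  daysUsed: number
): number {
  // Multiply before dividing so truncation happens once, not once per day.
  // Flooring a daily rate first would leave 17 cents unaccounted for on a
  // 2900-cent plan over 31 days (floor(2900 / 31) * 31 = 2883).
  const usedCents = Math.floor((planAmountCents * daysUsed) / cycleDays);
  const remainingCents = planAmountCents - usedCents;
  return Math.max(0, remainingCents);
}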
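As an illustration of an explicit day-count algorithm, the sketch below derives `cycleDays` and `daysUsed` from UTC calendar dates instead of taking them as inputs. `utcDaysBetween` and `remainingCreditCents` are hypothetical helpers, and the convention (cycle start inclusive, cycle end exclusive, the day of the change not yet counted as used) is an assumption rather than something the article specifies.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole UTC calendar days from `startIso` (inclusive) to `endIso` (exclusive).
// Time of day is discarded so that only the calendar date matters.
function utcDaysBetween(startIso: string, endIso: string): number {
  const s = new Date(startIso);
  const e = new Date(endIso);
  const startDay = Date.UTC(s.getUTCFullYear(), s.getUTCMonth(), s.getUTCDate());
  const endDay = Date.UTC(e.getUTCFullYear(), e.getUTCMonth(), e.getUTCDate());
  return Math.round((endDay - startDay) / MS_PER_DAY);
}

// Unused portion of the plan price, in cents, when a change happens at `changeIso`.
function remainingCreditCents(
  planAmountCents: number,
  cycleStartIso: string,
  cycleEndIso: string,
  changeIso: string
): number {
  const cycleDays = utcDaysBetween(cycleStartIso, cycleEndIso);
  if (cycleDays <= 0) {
    throw new Error('cycle must span at least one day');
  }
  // Clamp so a change outside the cycle never yields a negative or excess credit.
  const rawUsed = utcDaysBetween(cycleStartIso, changeIso);
  const daysUsed = Math.min(Math.max(rawUsed, 0), cycleDays);
  // Single integer division: truncation happens once.
  const usedCents = Math.floor((planAmountCents * daysUsed) / cycleDays);
  return planAmountCents - usedCents;
}

// January 2024 has 31 days; a change on January 11 has used 10 full days.
const credit = remainingCreditCents(
  2900,
  '2024-01-01T00:00:00Z',
  '2024-02-01T00:00:00Z',
  '2024-01-11T09:30:00Z'
);
```

Worked through by hand: `floor(2900 * 10 / 31) = 935` cents used, so `credit` is `1965` cents, and a change on the last boundary of the cycle returns exactly `0`.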
Architecture Decisions & Rationale
- Event Sourcing for Billing Events: Financial state changes must be auditable. Append-only logs enable replay, reconciliation, and compliance reporting.
- CQRS for Reads vs Writes: Billing writes go through the state machine and event store. Reads (dashboards, entitlement checks) use a materialized view updated via event projection.
- Externalized Policy Config: Pricing tiers, metering rules, and entitlement conditions live in version-controlled YAML/JSON. Product teams modify policies without deploying code.
- Dead Letter Queue for Webhooks: Webhook deliveries fail. A DLQ with exponential backoff prevents data loss, while signature verification and idempotency keys keep forged or duplicate events from triggering charges.
- Cent-Based Currency Math: Floating-point decimals cause rounding drift. All monetary values are stored as integers representing smallest currency units.
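For the DLQ decision, the retry delay can be computed with a capped exponential schedule. This is a minimal sketch under assumed defaults (1 second base, 1 hour cap); `backoffDelayMs` and `nextRetryAt` are hypothetical helpers, and the article does not prescribe specific values.

```typescript
// Delay before retry number `retryCount` (0-based): base * 2^retryCount, capped at `maxMs`.
function backoffDelayMs(
  retryCount: number,
  baseMs: number = 1000,
  maxMs: number = 60 * 60 * 1000
): number {
  if (retryCount < 0) {
    throw new Error('retryCount must be >= 0');
  }
  // Bound the exponent so the intermediate product stays a safe integer.
  const exponent = Math.min(retryCount, 30);
  return Math.min(baseMs * Math.pow(2, exponent), maxMs);
}

// Timestamp (ms since epoch) at which a queued item becomes eligible for retry.
function nextRetryAt(nowMs: number, retryCount: number): number {
  return nowMs + backoffDelayMs(retryCount);
}
```

With these defaults the delays run 1s, 2s, 4s, 8s and so on until they reach the one-hour cap. Many systems also add random jitter to each delay so that a batch of failed webhooks does not retry in lockstep.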
Pitfall Guide
- Assuming Webhooks Are Reliable: Payment providers retry failed deliveries, but network partitions, timeout limits, and signature rotation cause gaps. Always verify signatures, enforce idempotency keys, and implement a DLQ with retry scheduling. Synchronous polling is a fallback, not a primary strategy.
- Hardcoding Pricing Tiers in Code: Embedding rates in conditionals forces code deployments for every price change. It also breaks multi-currency and regional pricing. Use declarative plan schemas loaded at runtime. Validate schemas against a strict type system before deployment.
- Ignoring Timezone and Calendar Boundaries: Billing cycles anchor to UTC, not local time. Calculating cycle days using Date.now() without explicit ISO 8601 boundaries causes off-by-one errors in proration and metering windows. Always compute cycles using calendar-aware libraries (e.g., date-fns, luxon) with explicit timezone awareness.
- Metering Drift from On-Demand Aggregation: Calculating usage at billing time requires joining millions of events, causing timeout failures and inconsistent totals. Aggregate at ingestion into time-bound buckets. Run nightly reconciliation against the event store to correct drift. Store deltas, not snapshots.
- Deferring Tax and VAT Compliance: Tax rules change quarterly. Hardcoding rates or calculating manually triggers audit failures. Integrate a tax engine (Avalara, TaxJar, or Stripe Tax) early. Cache jurisdiction rules, apply them at checkout, and store tax breakdowns per transaction for reporting.
- Coupling Authentication to Billing State: Checking subscription.status === 'active' inside auth middleware creates tight coupling. When billing state changes, auth must invalidate sessions. Instead, resolve entitlements via a dedicated service that emits access tokens. Auth validates tokens; billing manages state.
- Poor Dunning Logic: Charging immediately on failure, using aggressive retry intervals, or skipping grace periods kills recovery rates. Implement smart dunning: exponential backoff, payment method update prompts, 3–7 day grace periods, and automated email/SMS nudges. Track recovery funnels to optimize timing.
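The calendar-boundary pitfall shows up most clearly with anniversary billing, where a cycle anchored on the 31st has to land somewhere in shorter months. Below is a minimal sketch using only the built-in `Date` UTC accessors; `addMonthsUtc` is a hypothetical helper, and clamping to the last day of the month is one common convention, not something the article mandates.

```typescript
// Returns `anchor` shifted by `months` calendar months in UTC.
// When the target month is shorter, the day is clamped to its last day.
function addMonthsUtc(anchor: Date, months: number): Date {
  const year = anchor.getUTCFullYear();
  const month = anchor.getUTCMonth() + months;
  // Day 0 of the following month is the last day of the target month.
  // Date.UTC normalizes month values outside 0..11 into the correct year.
  const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(anchor.getUTCDate(), lastDay);
  return new Date(
    Date.UTC(
      year,
      month,
      day,
      anchor.getUTCHours(),
      anchor.getUTCMinutes(),
      anchor.getUTCSeconds()
    )
  );
}

// Cycle boundaries are derived from the original anchor each time, so a
// clamped February date does not permanently shift later cycles to the 29th.
const anchor = new Date('2024-01-31T00:00:00Z');
const secondCycle = addMonthsUtc(anchor, 1); // 2024-02-29T00:00:00.000Z
const thirdCycle = addMonthsUtc(anchor, 2); // 2024-03-31T00:00:00.000Z
```

Computing cycle `n` as `addMonthsUtc(anchor, n)` rather than chaining from the previous cycle keeps the anchor day stable across short months.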
Best Practices from Production:
- Feature flag billing experiments. Roll out pricing changes to 5% of users, monitor charge success rates, and compare LTV before full deployment.
- Implement circuit breakers on payment provider calls. Network failures should not block subscription state transitions.
- Maintain a financial audit trail. Every charge, refund, proration, and state change must emit an immutable event with correlation IDs.
- Monitor billing health separately from app metrics. Track charge success rate, dunning recovery rate, metering reconciliation lag, and tax calculation failure rate.
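The circuit-breaker practice above can be sketched as a small state holder wrapped around provider calls. This is an illustrative, simplified version under assumed parameters (a failure threshold and a cooldown); `CircuitBreaker` and its method names are hypothetical, and the clock is injectable so the behavior can be tested deterministically.

```typescript
type BreakerState = 'closed' | 'open' | 'half_open';

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private failureThreshold: number;
  private cooldownMs: number;
  private now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now?: () => number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now ? now : () => Date.now();
  }

  // 'open' rejects calls; 'half_open' lets a trial call through after the cooldown.
  state(): BreakerState {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.cooldownMs ? 'half_open' : 'open';
  }

  // True when a call to the payment provider may be attempted.
  allowRequest(): boolean {
    return this.state() !== 'open';
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Opens the breaker at the threshold, or immediately if a half-open trial fails.
  recordFailure(): void {
    this.failures += 1;
    if (this.state() === 'half_open' || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

When `allowRequest()` returns false, the caller can queue the charge for later instead of blocking the state transition on a provider that is currently failing.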
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Early-stage SaaS (<5k subs) | Managed processor (Stripe/Paddle) with hosted checkout | Reduces compliance burden, accelerates time-to-market, handles tax/dunning out-of-box | Low upfront, 2.9% + $0.30 per transaction |
| Usage-heavy API platform | Event-driven metering + hybrid billing engine | Supports granular aggregation, real-time entitlements, and overage pricing without provider limits | Medium engineering cost, scales linearly with usage volume |
| Enterprise B2B with contracts | Custom billing service + ERP integration | Handles net-30 terms, invoice-based billing, custom discount structures, and audit compliance | High engineering cost, reduces payment processing fees |
| Multi-region digital asset marketplace | Paddle/Chargebee with localized tax engine | Manages VAT/GST, currency conversion, merchant of record requirements, and regional compliance | Moderate cost, eliminates legal risk in EU/APAC |
Configuration Template
# subscription-config.yaml
version: "2.0"
plans:
  - id: "starter"
    currency: "USD"
    amount_cents: 2900
    billing_cycle: "monthly"
    entitlements:
      - "api_rate_limited"
      - "storage_basic"
    metering:
      api_requests:
        monthly_limit: 10000
        overage_rate_cents: 50
        unit: "request"
  - id: "pro"
    currency: "USD"
    amount_cents: 9900
    billing_cycle: "monthly"
    entitlements:
      - "api_unlimited"
      - "storage_premium"
      - "priority_support"
    metering:
      api_requests:
        monthly_limit: 0 # unlimited
        overage_rate_cents: 0
        unit: "request"
dunning:
  grace_period_days: 5
  retry_schedule: [1, 3, 7, 14]
  max_attempts: 4
  notify_channels: ["email", "dashboard"]
tax:
  provider: "stripe_tax"
  fallback_rate_cents: 0
  jurisdiction_cache_ttl_hours: 24
entitlement:
  resolution_strategy: "policy_eval"
  cache_ttl_seconds: 300
  fallback_state: "read_only"
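A configuration like the template above is typically checked before it is loaded. The sketch below validates an already-parsed object, so no YAML library is involved; the `PlanConfig` and `DunningConfig` shapes cover only a subset of the template, and the specific rules (three-letter currency codes, integer cents, a retry schedule that is strictly increasing and matches `max_attempts`) are assumptions chosen for illustration, not the behavior of any particular validator.

```typescript
// Assumed shapes mirroring part of the parsed template.
interface PlanConfig {
  id: string;
  currency: string;
  amount_cents: number;
  billing_cycle: string;
  entitlements: string[];
}

interface DunningConfig {
  grace_period_days: number;
  retry_schedule: number[];
  max_attempts: number;
}

function isNonNegativeInt(n: number): boolean {
  return typeof n === 'number' && isFinite(n) && Math.floor(n) === n && n >= 0;
}

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(plans: PlanConfig[], dunning: DunningConfig): string[] {
  const errors: string[] = [];
  const seenIds: Record<string, boolean> = {};

  for (const plan of plans) {
    if (seenIds[plan.id]) errors.push(`duplicate plan id: ${plan.id}`);
    seenIds[plan.id] = true;
    if (!/^[A-Z]{3}$/.test(plan.currency)) {
      errors.push(`${plan.id}: currency must be a 3-letter uppercase code`);
    }
    if (!isNonNegativeInt(plan.amount_cents)) {
      errors.push(`${plan.id}: amount_cents must be a non-negative integer`);
    }
    if (plan.entitlements.length === 0) {
      errors.push(`${plan.id}: at least one entitlement is required`);
    }
  }

  const schedule = dunning.retry_schedule;
  if (schedule.length !== dunning.max_attempts) {
    errors.push('dunning: retry_schedule length must equal max_attempts');
  }
  for (let i = 1; i < schedule.length; i++) {
    if (schedule[i] <= schedule[i - 1]) {
      errors.push('dunning: retry_schedule must be strictly increasing');
      break;
    }
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, lets a product team fix a whole config file in a single pass.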
Quick Start Guide
- Initialize the billing domain: Install the subscription SDK, generate the state machine skeleton, and scaffold the metering aggregator. Run npx @codcompass/billing init --domain=subscriptions to create event store tables, Redis bucket schemas, and webhook routing.
- Load plan configuration: Place subscription-config.yaml in your config directory. Run billing validate-config to verify schema compliance, currency formats, and entitlement mappings.
- Deploy webhook endpoint: Expose /webhooks/billing with signature verification middleware. Configure your payment provider to route invoice.payment_succeeded, invoice.payment_failed, and customer.subscription.updated events to this endpoint.
- Start metering ingestion: Add the MeteringAggregator.ingest() call to your API gateway or service middleware. Tag events with subscriptionId and meterKey. Verify bucket accumulation in Redis.
- Test lifecycle transitions: Use the provider sandbox to trigger trial end, payment failure, and recovery. Confirm state transitions, dunning emails, and entitlement revocation match your policy schema. Deploy to staging, run charge simulation suite, then promote to production.