decouples policy definition from enforcement, leverages existing observability infrastructure, and ensures immutable audit trails.
Step 1: Model Compliance Requirements as Policy-as-Code
Translate regulatory and internal requirements into declarative rules using a policy engine like Open Policy Agent (OPA). Policies should evaluate telemetry attributes, configuration state, and access patterns.
```rego
# policy/compliance/gdpr_data_access.rego
package compliance.gdpr

import rego.v1

default data_access_violation := false

# Violation: an external tenant queries PII without a recorded consent.
data_access_violation if {
	input.event.type == "data.query"
	input.context.data_classification == "pii"
	not input.context.consent_record.exists
	input.context.tenant_id != "internal-ops"
}
```
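Services, or a collector-side adapter, can ask OPA for a decision on this rule through OPA's Data API. The request is a `POST` to `/v1/data/<package path>/<rule>` with the document wrapped in an `input` key, and OPA answers with `{"result": ...}`.

The sketch below assumes:
- OPA listens on its default `localhost:8181`.
- Node 18+ provides the global `fetch`.

`evaluateRule` and `FetchLike` are hypothetical names. The injectable `fetchImpl` parameter exists only to make the function testable.

```typescript
// Minimal response surface this sketch relies on; the global fetch (Node 18+) satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/**
 * Query OPA's Data API for one rule and return its boolean decision.
 * rulePath uses slashes, e.g. "compliance/gdpr/data_access_violation"
 * for rule data_access_violation in package compliance.gdpr.
 */
export async function evaluateRule(
  rulePath: string,
  input: unknown,
  baseUrl = 'http://localhost:8181',
  fetchImpl: FetchLike = fetch,
): Promise<boolean> {
  const res = await fetchImpl(`${baseUrl}/v1/data/${rulePath}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input }), // OPA expects the document under "input"
  });
  if (!res.ok) throw new Error(`OPA returned HTTP ${res.status}`);
  const body = (await res.json()) as { result?: unknown };
  // OPA omits "result" when the rule is undefined (e.g. the policy is not loaded).
  // Surface that as an error instead of reading it as "no violation".
  if (typeof body.result !== 'boolean') {
    throw new Error(`no boolean decision for ${rulePath}`);
  }
  return body.result;
}
```

Consider an input with `event.type` of `"data.query"`, `data_classification` of `"pii"`, `consent_record.exists` of `false`, and `tenant_id` of `"acme-corp"`. Passed to `evaluateRule('compliance/gdpr/data_access_violation', ...)`, it should resolve to `true` against the rule above: every condition in the rule body holds.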
Step 2: Instrument Applications with Compliance Telemetry
Attach compliance context to spans, logs, and metrics. Use OpenTelemetry to propagate policy-relevant attributes without coupling business logic to compliance rules.
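```typescript
import { trace, SpanStatusCode, type Span } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

// Application-provided dependencies, declared here as placeholders.
declare function evaluatePolicyAgainstSpan(
  span: Span,
): Promise<{ violation: boolean; ruleId: string }>;
declare const database: {
  execute(query: string, userId: string): Promise<unknown>;
};

// Compliance context supplied by the caller instead of hard-coded values
export interface ComplianceContext {
  dataClassification: string; // e.g. 'pii'
  consentRecord: boolean;
  tenantId: string;           // e.g. 'acme-corp'
  scope: string;              // e.g. 'gdpr-article-6'
}

export function instrumentComplianceContext(span: Span | undefined, ctx: ComplianceContext) {
  if (!span) return;
  // Attach data classification, consent status, and tenant
  span.setAttribute('compliance.data_classification', ctx.dataClassification);
  span.setAttribute('compliance.consent_record', ctx.consentRecord);
  span.setAttribute('compliance.tenant_id', ctx.tenantId);
  // Mark compliance-critical operations
  span.setAttribute('compliance.scope', ctx.scope);
}

export async function executeCompliantQuery(query: string, userId: string, ctx: ComplianceContext) {
  const tracer = trace.getTracer('compliance-service');
  return tracer.startActiveSpan('data.query', async (span) => {
    try {
      instrumentComplianceContext(span, ctx);
      span.setAttribute(SemanticAttributes.DB_STATEMENT, query);
      // Policy evaluation hook (application-provided)
      const policyResult = await evaluatePolicyAgainstSpan(span);
      if (policyResult.violation) {
        span.setAttribute('compliance.violation_id', policyResult.ruleId);
        // Enforcement: do not run a query that violates policy. In log-only
        // (canary) mode, record the violation and continue instead.
        throw new Error(`policy_violation: ${policyResult.ruleId}`);
      }
      return await database.execute(query, userId);
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      // End the span exactly once on every path
      span.end();
    }
  });
}
```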
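The `evaluatePolicyAgainstSpan` hook is not implemented in this guide. The OpenTelemetry API `Span` has no attribute getters, so such a hook would more likely read the attribute map of an SDK `ReadableSpan`, for example inside a span processor or exporter.

Below is a minimal sketch of translating the `compliance.*` attributes used above into the `input` shape that the Step 1 Rego rule reads. `toPolicyInput` and `PolicyInput` are hypothetical names, and the defaults for missing attributes are one possible choice, not a standard.

```typescript
// Attribute values as found on an SDK ReadableSpan (scalar subset of AttributeValue).
type Attributes = Record<string, string | number | boolean | undefined>;

// Shape of the `input` document read by the Step 1 Rego rule.
export interface PolicyInput {
  event: { type: string };
  context: {
    data_classification: string;
    consent_record: { exists: boolean };
    tenant_id: string;
    scope?: string;
  };
}

// Hypothetical mapping from compliance.* span attributes to the policy input.
export function toPolicyInput(spanName: string, attrs: Attributes): PolicyInput {
  const consent = attrs['compliance.consent_record'];
  const scope = attrs['compliance.scope'];
  return {
    event: { type: spanName }, // e.g. "data.query"
    context: {
      // A missing classification maps to "unclassified", which the rule ignores.
      data_classification: String(attrs['compliance.data_classification'] ?? 'unclassified'),
      // Accept either boolean true or the string "true" as recorded consent.
      consent_record: { exists: consent === true || consent === 'true' },
      // A missing tenant maps to "", so it is never mistaken for "internal-ops".
      tenant_id: String(attrs['compliance.tenant_id'] ?? ''),
      scope: typeof scope === 'string' ? scope : undefined,
    },
  };
}
```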
Step 3: Route Telemetry to a Compliance-Aware Pipeline
Configure the OpenTelemetry Collector to enrich, filter, and forward compliance telemetry to a policy evaluation endpoint and immutable storage.
```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  resource:
    attributes:
      - key: compliance.environment
        value: "production"
        action: upsert

exporters:
  # Assumes an OTLP-capable policy evaluation service at this address
  otlp/policy:
    endpoint: policy-engine.internal:8181
    tls:
      insecure: true  # plaintext; enable TLS outside local testing
  file/compliance_audit:
    # A single file path; rotated backups receive a timestamp suffix
    path: /var/log/compliance/audit.json
    rotation:
      max_megabytes: 256
      max_days: 365

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp/policy, file/compliance_audit]
```
Step 4: Implement Continuous Validation & Evidence Generation
The policy engine evaluates incoming telemetry against active rules. Violations trigger alerts and automatically package evidence: span context, policy version, evaluation timestamp, and remediation instructions. Store evidence in append-only storage with cryptographic checksums for audit integrity.
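A hash chain is one way to make an append-only evidence log tamper-evident. Each entry stores the SHA-256 hash of its predecessor, so altering or deleting any earlier entry breaks every hash after it.

The sketch keeps the log in memory and uses Node's built-in `node:crypto`. `EvidenceLog`, `verifyChain`, and the `Evidence` fields (taken from the list above) are hypothetical names, not part of any library.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical evidence record; fields follow the evidence list in Step 4.
export interface Evidence {
  traceId: string;
  spanId: string;
  policyVersion: string;
  ruleId: string;
  evaluatedAt: string; // ISO-8601 timestamp
  remediation: string;
}

export interface EvidenceEntry {
  evidence: Evidence;
  prevHash: string; // hash of the previous entry ("" for the first entry)
  hash: string;     // SHA-256 over prevHash + serialized evidence
}

const sha256 = (s: string): string => createHash('sha256').update(s).digest('hex');

// In-memory append-only log: each entry commits to its predecessor.
export class EvidenceLog {
  private readonly entries: EvidenceEntry[] = [];

  append(evidence: Evidence): EvidenceEntry {
    const prevHash = this.entries.length ? this.entries[this.entries.length - 1].hash : '';
    const copy = { ...evidence }; // later caller mutations cannot alter the stored record
    const entry: EvidenceEntry = { evidence: copy, prevHash, hash: sha256(prevHash + JSON.stringify(copy)) };
    this.entries.push(entry);
    return entry;
  }

  all(): readonly EvidenceEntry[] {
    return this.entries;
  }
}

// Recompute every link; any edited, reordered, or deleted earlier entry fails.
export function verifyChain(entries: readonly EvidenceEntry[]): boolean {
  let prev = '';
  for (const e of entries) {
    if (e.prevHash !== prev || e.hash !== sha256(prev + JSON.stringify(e.evidence))) return false;
    prev = e.hash;
  }
  return true;
}
```

A chain alone does not detect truncation of its newest entries. In practice, the latest hash would also be recorded outside the log (for example, alongside the audit bundle), and the entries themselves written to append-only storage.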
Architecture Decisions & Rationale
- OpenTelemetry as the telemetry backbone: Standardizes instrumentation across languages, eliminates vendor lock-in, and provides a unified data model for metrics, logs, and traces.
- OPA/Rego for policy evaluation: Declarative, version-controllable, and supports complex rule composition. Decouples policy authors from developers.
- Immutable audit export: Append-only JSON/Parquet files with checksums satisfy regulatory retention requirements and prevent evidence tampering.
- Policy evaluation at ingestion: Evaluating policies during collection avoids storing non-compliant data in analytics warehouses, reducing storage costs and privacy risk.
- Context propagation via span attributes: Keeps compliance metadata attached to the exact request lifecycle, enabling root-cause analysis and blast-radius determination.
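The sketch below illustrates policy evaluation at ingestion with a hypothetical routing rule applied before spans reach an analytics warehouse:
- Spans carrying a `compliance.violation_id` are withheld from the warehouse.
- PII-classified spans are admitted without their `db.statement`, the attribute key behind `SemanticAttributes.DB_STATEMENT`.

`routeAtIngestion` and its types are illustrative names. In the architecture described here, the same logic would more likely live in the collector tier than in application code.

```typescript
type AttrValue = string | number | boolean;

// Hypothetical span record as seen by the ingestion tier.
export interface IngestedSpan {
  name: string;
  attributes: Record<string, AttrValue>;
}

export interface IngestionResult {
  warehouse: IngestedSpan[];  // analytics storage
  quarantine: IngestedSpan[]; // withheld from analytics; handled under audit/retention rules
}

export function routeAtIngestion(spans: IngestedSpan[]): IngestionResult {
  const result: IngestionResult = { warehouse: [], quarantine: [] };
  for (const span of spans) {
    // A recorded policy violation keeps the span out of the warehouse entirely.
    if ('compliance.violation_id' in span.attributes) {
      result.quarantine.push(span);
      continue;
    }
    // PII spans are admitted without the raw query text (copy, never mutate input).
    if (span.attributes['compliance.data_classification'] === 'pii') {
      const attributes = { ...span.attributes };
      delete attributes['db.statement'];
      result.warehouse.push({ ...span, attributes });
    } else {
      result.warehouse.push(span);
    }
  }
  return result;
}
```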
Pitfall Guide
- Treating compliance as a separate monitoring stack: Building isolated compliance tools creates data fragmentation. Telemetry must flow through a unified pipeline to correlate performance anomalies with policy violations.
- Over-instrumenting without signal-to-noise filtering: Capturing every attribute overwhelms storage and evaluation engines. Define a minimum viable compliance telemetry set: data classification, consent state, access scope, and tenant context.
- Ignoring data lineage and retention conflicts: Compliance requirements often mandate data deletion (e.g., GDPR right to erasure), while observability demands retention. Implement tiered storage: hot telemetry for evaluation, cold immutable archives for audit, and automated purging for raw PII.
- Deploying static policies in ephemeral environments: Containerized and serverless workloads change state rapidly. Policies must be versioned, dynamically loaded, and evaluated against live configuration snapshots, not static IaC files.
- Assuming dashboard health equals audit readiness: Green dashboards often mask missing evidence chains. Audit readiness requires cryptographic proof of policy evaluation, not just pass/fail metrics. Always generate tamper-evident evidence bundles.
- Failing to correlate traces with policy decisions: Isolated policy alerts lack context. Attach policy evaluation results to distributed traces so engineers can reconstruct exactly which service, user, and data path triggered a violation.
- Neglecting role-based access to compliance data: Compliance telemetry contains sensitive operational details. Implement strict RBAC: engineers see violation context, auditors see evidence bundles, and security teams see policy drift trends.
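An allow-list over `compliance.*` attributes is one way to enforce the minimum viable compliance telemetry set described above. This sketch makes two assumptions:
- The attribute names from Step 2 map onto the four categories, with `compliance.scope` standing in for access scope.
- `compliance.violation_id` is kept as well, so violation alerts still fire.

`pruneComplianceAttributes` and `COMPLIANCE_ALLOW_LIST` are hypothetical names. Attributes outside the `compliance.` namespace pass through untouched.

```typescript
type AttrValue = string | number | boolean;

// Minimum viable compliance set, plus the violation marker that alerting keys on.
export const COMPLIANCE_ALLOW_LIST: readonly string[] = [
  'compliance.data_classification', // data classification
  'compliance.consent_record',       // consent state
  'compliance.scope',                // access scope
  'compliance.tenant_id',            // tenant context
  'compliance.violation_id',         // policy result used for alerting
];

// Drops compliance.* attributes outside the allow-list; other keys pass through.
export function pruneComplianceAttributes(
  attrs: Record<string, AttrValue>,
  allowList: readonly string[] = COMPLIANCE_ALLOW_LIST,
): Record<string, AttrValue> {
  const kept: Record<string, AttrValue> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (!key.startsWith('compliance.') || allowList.includes(key)) kept[key] = value;
  }
  return kept;
}
```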
Best Practices from Production:
- Treat policies as code: review, test, and deploy via CI/CD.
- Run policy canaries: evaluate new rules in log-only mode before enforcement.
- Automate evidence packaging: generate PDF/JSON audit bundles on schedule or on-demand.
- Cross-train teams: compliance officers must understand telemetry models; engineers must understand regulatory boundaries.
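A policy canary can be modeled as a per-rule enforcement mode:
- Violations from rules still in `log-only` mode are recorded but do not block.
- Rules promoted to `enforce` mode deny the request.

In this hypothetical sketch, rules missing from the mode table default to `log-only`, so a newly deployed rule starts as a canary until it is explicitly promoted. `applyDecision` and the mode names are illustrative.

```typescript
export type EnforcementMode = 'log-only' | 'enforce';

export interface PolicyDecision {
  ruleId: string;
  violation: boolean;
}

export type Outcome = 'allow' | 'allow-and-log' | 'deny';

// Canary rules only log; promoted rules deny. Unknown rules default to canary.
export function applyDecision(
  decision: PolicyDecision,
  modes: ReadonlyMap<string, EnforcementMode>,
  log: (msg: string) => void = console.warn,
): Outcome {
  if (!decision.violation) return 'allow';
  const mode = modes.get(decision.ruleId) ?? 'log-only';
  if (mode === 'log-only') {
    log(`[canary] would deny: ${decision.ruleId}`);
    return 'allow-and-log';
  }
  return 'deny';
}
```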
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Early-stage startup (pre-Series A) | Lightweight OTel + OPA with file-based audit export | Minimal overhead, rapid deployment, covers basic SOC 2 evidence needs | Low ($500–$1,200/mo infrastructure) |
| Regulated enterprise (HIPAA/PCI-DSS) | Full OTel pipeline + centralized policy engine + immutable compliance lake + automated evidence packaging | Provides the strict audit trails these frameworks require, supports complex policy composition, enables cross-service correlation | Medium ($3,000–$8,000/mo) |
| Multi-tenant SaaS with global users | Policy-as-code with region-aware rule evaluation + data residency enforcement + automated consent tracking | Handles jurisdictional differences, scales with tenant growth, helps prevent cross-border data transfer violations | High ($8,000–$15,000/mo) |
Configuration Template
```yaml
# otel-collector-compliance.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000       # must not exceed send_batch_max_size
    send_batch_max_size: 1000
  resource:
    attributes:
      - key: compliance.version
        value: "1.4.2"
        action: upsert
      - key: compliance.policy_set
        value: "gdpr-hipaa-baseline"
        action: upsert

exporters:
  # OTLP gRPC endpoints are host:port with no URL path. OPA's REST API does not
  # ingest OTLP, so this assumes an OTLP-capable adapter in front of OPA here.
  otlp/policy_engine:
    endpoint: opa.internal:8181
    tls:
      insecure: false
    headers:
      Authorization: "Bearer ${env:OPA_SERVICE_TOKEN}"
  file/compliance_evidence:
    # Rotated backups get a timestamp suffix, so no date placeholder is needed
    path: /data/compliance/evidence.json
    rotation:
      max_megabytes: 512
      max_days: 1095
    format: json
    flush_interval: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp/policy_engine, file/compliance_evidence]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [file/compliance_evidence]
```
Quick Start Guide
- Initialize policy repository: Create a Git repo with a `policy/` directory. Add baseline OPA rules for your target framework (e.g., GDPR, SOC 2). Run `opa test ./policy` to check syntax and run the policy unit tests.
- Deploy OPA with bundle loading: Start OPA with `opa run --server --bundle /policies --log-level info`. To load policies remotely instead, configure OPA's bundle service to download bundles built from your Git repo and published to an HTTP server or OCI registry.
- Instrument a service: Add the OpenTelemetry SDK to your TypeScript service. Attach `compliance.*` attributes to database queries, auth endpoints, and data exports. Export spans to the OTel Collector.
- Verify pipeline: Trigger a compliance-sensitive operation. Check OPA logs for evaluation results. Confirm evidence files are written to `/data/compliance/`. Validate that violations appear in your observability dashboard with attached span context.
- Enable automated alerts: Configure your alerting system (e.g., PagerDuty, Slack, webhook) to trigger on `compliance.violation_id` attributes. Set escalation rules based on severity and data classification.
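The escalation rules in the last step might look like the sketch below. It alerts only on spans that carry `compliance.violation_id`, then escalates on severity and data classification. The following are all hypothetical placeholders to adapt to your own runbooks:
- the severity levels;
- the `ticket`/`chat`/`page` routes (for example, a ticketing queue, a Slack channel, and a PagerDuty page);
- `escalationRoute` itself and its thresholds.

```typescript
export type Severity = 'low' | 'medium' | 'high' | 'critical';
export type Route = 'ticket' | 'chat' | 'page';

// Hypothetical escalation policy keyed on the compliance.violation_id attribute.
export function escalationRoute(
  attrs: Record<string, string | number | boolean>,
  severity: Severity,
): Route | null {
  if (!('compliance.violation_id' in attrs)) return null; // not a violation: no alert
  const pii = attrs['compliance.data_classification'] === 'pii';
  // Page for critical violations, or high-severity ones touching PII
  if (severity === 'critical' || (severity === 'high' && pii)) return 'page';
  // Notify a channel for remaining high-severity or PII-related violations
  if (severity === 'high' || pii) return 'chat';
  return 'ticket';
}
```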