nance rules in a declarative language, version controls them alongside infrastructure code, and evaluates them at multiple stages of the lifecycle.
Step-by-Step Implementation
- Select a Policy Engine: Adopt a standard PaC engine like Open Policy Agent (OPA). OPA provides a high-level declarative language (Rego) and runs as a standalone service or embedded library, supporting enforcement across Terraform, Kubernetes, and CI/CD pipelines.
- Define Atomic Policies: Decompose regulatory requirements into atomic, testable rules. Avoid monolithic policies. For example, separate "encryption at rest" from "public access blocking." This enables granular error reporting and easier maintenance.
- Integrate Shift-Left Gates: Embed policy evaluation in the CI pipeline. When a pull request changes infrastructure code, generate a plan, render it as JSON (e.g., `terraform plan -out=tfplan` followed by `terraform show -json tfplan`), and evaluate that JSON against the policies. Block the merge if violations exist.
- Deploy Continuous Monitoring: Use OPA Gatekeeper (for Kubernetes) or drift detection agents (for cloud APIs) to monitor runtime state. This catches violations introduced via console access or third-party integrations.
- Automate Remediation: For critical violations, implement self-healing workflows. If a resource violates a policy, trigger a remediation Lambda or runbook that reverts the configuration or tags the resource for isolation.
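As a rough illustration of the shift-left gate, the sketch below turns the JSON produced by `terraform show -json` into one policy input per resource. The `resource_changes`, `change.actions`, and `change.after` fields come from Terraform's plan representation; the `PolicyInput` shape (including the extra `address` field) and the `planToPolicyInputs` name are assumptions made for this example, not part of any tool.

```typescript
// Minimal subset of the plan representation emitted by `terraform show -json`.
interface ResourceChange {
  address: string;
  type: string;
  change: {
    actions: string[];
    after: Record<string, unknown> | null;
  };
}

interface TerraformPlan {
  resource_changes?: ResourceChange[];
}

// Hypothetical per-resource input handed to the policy engine.
interface PolicyInput {
  address: string;
  resource_type: string;
  properties: Record<string, unknown>;
}

function planToPolicyInputs(plan: TerraformPlan): PolicyInput[] {
  return (plan.resource_changes ?? [])
    // Only resources that will exist after apply have a planned state to check.
    // Deletions have `after: null`; no-op entries are skipped as unchanged.
    .filter(
      (rc) =>
        rc.change.after !== null &&
        rc.change.actions.some((a) => a === 'create' || a === 'update'),
    )
    .map((rc) => ({
      address: rc.address,
      resource_type: rc.type,
      properties: rc.change.after as Record<string, unknown>,
    }));
}
```

Each returned object can then be evaluated on its own, which keeps violation messages tied to a single resource address.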
Code Examples
TypeScript Policy Integration Wrapper
While Rego is the standard for policy definition, TypeScript is often used to orchestrate checks or validate configurations in custom deployment scripts. The following example demonstrates a TypeScript function that invokes an OPA evaluation against a resource configuration, suitable for integration into a CDKTF or Pulumi workflow.
import { Opa } from 'opa'; // Hypothetical wrapper for OPA evaluation

interface ResourceConfig {
  type: string;
  properties: Record<string, unknown>;
}

interface PolicyResult {
  allowed: boolean;
  messages: string[];
}

/**
 * Evaluates a resource configuration against a loaded OPA policy.
 * Returns structured results for CI/CD gate logic.
 */
export async function validateResourceCompliance(
  resource: ResourceConfig,
  policyPath: string
): Promise<PolicyResult> {
  const opa = new Opa();

  try {
    // Load policy bundle
    await opa.loadPolicy(policyPath);

    // Input structure expected by the Rego policy
    const input = {
      resource_type: resource.type,
      properties: resource.properties,
    };

    // Query the `deny` set of package policy.s3; each entry is a violation message
    const result = await opa.evaluate({
      input,
      path: 'data.policy.s3.deny',
    });

    // A `deny contains ...` rule yields an array even when nothing fires.
    // Anything else means the path or package is wrong, so treat it as an error.
    const violations = result.result;
    if (!Array.isArray(violations)) {
      throw new Error('Policy query returned no deny set.');
    }
    return { allowed: violations.length === 0, messages: violations.map(String) };
  } catch (error) {
    // Fail-closed: if policy engine fails, block deployment
    console.error(`Policy evaluation failed: ${error}`);
    return {
      allowed: false,
      messages: ['Policy engine error. Deployment blocked for safety.'],
    };
  }
}

// Usage in a deployment hook
async function deploymentGate(resource: ResourceConfig) {
  const validation = await validateResourceCompliance(
    resource,
    './policies/s3_encryption.rego'
  );

  if (!validation.allowed) {
    throw new Error(
      `Compliance check failed:\n${validation.messages.join('\n')}`
    );
  }
  console.log('Compliance check passed.');
}
Rego Policy Example
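This Rego policy enforces encryption and public-ACL blocking on S3 buckets. It reads a normalized input with `resource_type` and `properties` fields, so it is independent of the IaC tool: Terraform plans, CloudFormation templates, or runtime JSON can all be checked once they are mapped into that shape.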
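package policy.s3

import rego.v1

bucket_id := object.get(input.properties, "id", "<unknown>")

deny contains msg if {
    # Encryption at rest: the bucket must declare server-side encryption.
    input.resource_type == "aws_s3_bucket"
    not input.properties.server_side_encryption_configuration
    # bucket_id falls back to a placeholder, so a missing id cannot silently disable the rule.
    msg := sprintf("S3 bucket %s must have server-side encryption enabled.", [bucket_id])
}

deny contains msg if {
    # Public access: fires when block_public_acls is false or not set at all.
    input.resource_type == "aws_s3_bucket"
    not input.properties.public_access_block_configuration.block_public_acls
    msg := sprintf("S3 bucket %s must block public ACLs.", [bucket_id])
}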
Architecture Decisions and Rationale
- OPA over Proprietary Tools: OPA avoids vendor lock-in and supports multi-cloud environments. It integrates with Terraform via Conftest and with Kubernetes via Gatekeeper.
- Fail-Closed Evaluation: Policy evaluation errors must block deployment. If the policy engine cannot determine compliance, the system must assume non-compliance to prevent risk.
- Policy Versioning: Policies must be versioned in the same repository as infrastructure code. This ensures that infrastructure changes are evaluated against the policy state intended for that release, preventing retroactive failures.
- Separation of Concerns: Policies should define what is allowed, not how to implement it. This allows engineering teams flexibility in resource configuration while maintaining strict governance boundaries.
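The fail-closed decision also has to cover an engine that hangs rather than errors. A minimal sketch, assuming the policy call is exposed as a promise-returning function: `failClosed`, `Decision`, and the timeout parameter are hypothetical names for this example, and any thrown error or timeout is mapped to a blocking decision.

```typescript
// Hypothetical decision shape; mirrors the allowed/messages result used by a CI gate.
interface Decision {
  allowed: boolean;
  messages: string[];
}

async function failClosed(
  evaluate: () => Promise<Decision>,
  timeoutMs: number,
): Promise<Decision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Rejects once the deadline passes; never resolves.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });

  try {
    // Whichever settles first wins: the policy decision or the deadline.
    return await Promise.race([evaluate(), deadline]);
  } catch (err) {
    // Any failure (thrown error, rejected promise, timeout) blocks the deployment.
    const reason = err instanceof Error ? err.message : String(err);
    return {
      allowed: false,
      messages: [`Policy evaluation failed (${reason}). Deployment blocked.`],
    };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Wrapping the call this way keeps the blocking behaviour in one place, so individual gates do not each need their own error handling.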
Pitfall Guide
- Monolithic Policy Bundles:
  - Mistake: Creating a single large policy file for all compliance rules.
  - Impact: Makes debugging difficult, slows evaluation, and complicates versioning.
  - Best Practice: Structure policies by domain (e.g., `networking/`, `storage/`, `iam/`). Use OPA bundles for efficient distribution.
- Ignoring Exception Management:
  - Mistake: Hard-blocking all violations without a mechanism for approved exceptions.
  - Impact: Engineers bypass controls or halt production fixes. Audit trails become incomplete.
  - Best Practice: Implement an exception workflow where waivers are requested, approved, and automatically expire. Store exceptions in a versioned config that policies reference.
- Policy-Infrastructure Version Mismatch:
  - Mistake: Updating policies independently of infrastructure code.
  - Impact: New policies break existing deployments, or old policies fail to catch new violations.
  - Best Practice: Use a "policy pinning" strategy where infrastructure manifests reference specific policy versions. Update policies in lockstep with infrastructure changes.
- False Positive Fatigue:
  - Mistake: Writing overly restrictive policies that flag legitimate edge cases.
  - Impact: Teams disable checks or ignore warnings, nullifying the automation.
  - Best Practice: Implement a "warn-only" mode during policy rollout. Collect metrics on violations, refine rules, and transition to "deny" only after validation.
- Lack of Drift Detection:
  - Mistake: Relying solely on CI/CD gates without runtime monitoring.
  - Impact: Manual changes or third-party integrations introduce violations that gates miss.
  - Best Practice: Deploy continuous compliance scanners that reconcile actual state against policy. Trigger alerts or remediation for drift.
- Hardcoding Secrets in Policy Logic:
  - Mistake: Embedding sensitive data or allowed values directly in policy files.
  - Impact: Security risks and inflexibility when allowed values change.
  - Best Practice: Use OPA data inputs to inject allowed lists or secrets at runtime. Keep policies generic and data-driven.
- Over-Automation of Remediation:
  - Mistake: Automatically fixing all violations without human review.
  - Impact: Automated remediation can cause cascading failures or data loss if the remediation logic is flawed.
  - Best Practice: Automate remediation only for low-risk, idempotent violations (e.g., tagging, minor config tweaks). High-risk violations require a human decision, supported by prepared playbooks.
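Two of the practices above, expiring waivers and warn-only rollout, can be combined in the gate logic itself. The following is a minimal sketch under stated assumptions: the `Violation`, `Waiver`, and `GateOutcome` shapes, the per-policy `modes` map, and the `applyGate` name are all hypothetical, and the waiver list is assumed to be loaded from the versioned exception config.

```typescript
type Mode = 'warn' | 'deny';

interface Violation {
  policyId: string; // e.g. "storage/s3_encryption" (hypothetical naming scheme)
  resource: string;
  message: string;
}

interface Waiver {
  policyId: string;
  resource: string;
  approvedBy: string;
  expiresAt: string; // ISO 8601 timestamp
}

interface GateOutcome {
  blocked: boolean;
  enforced: Violation[];
  warnings: Violation[];
  waived: Violation[];
}

function applyGate(
  violations: Violation[],
  waivers: Waiver[],
  modes: Partial<Record<string, Mode>>,
  now: Date,
): GateOutcome {
  const outcome: GateOutcome = { blocked: false, enforced: [], warnings: [], waived: [] };

  for (const v of violations) {
    // A waiver applies only to the same policy and resource, and only before it expires.
    // An unparsable expiresAt yields NaN, the comparison is false, and the waiver is ignored.
    const hasWaiver = waivers.some(
      (w) =>
        w.policyId === v.policyId &&
        w.resource === v.resource &&
        new Date(w.expiresAt).getTime() > now.getTime(),
    );

    if (hasWaiver) {
      outcome.waived.push(v);
    } else if ((modes[v.policyId] ?? 'deny') === 'warn') {
      // Policies still in rollout report without blocking.
      outcome.warnings.push(v);
    } else {
      // Policies without an explicit mode default to deny.
      outcome.enforced.push(v);
    }
  }

  outcome.blocked = outcome.enforced.length > 0;
  return outcome;
}
```

Keeping waived and warning-level violations in the outcome, rather than dropping them, preserves the audit trail and the rollout metrics that the list calls for.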
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Kubernetes-Native Workloads | OPA Gatekeeper or Kyverno | Native admission control; real-time enforcement at API server level. | Low infra cost; requires training on Rego (Gatekeeper) or YAML-based policies (Kyverno). |
| Multi-Cloud Terraform | OPA + Conftest in CI | Language-agnostic; validates plans before apply; works across AWS/Azure/GCP. | Medium CI compute cost; high reuse of policies. |
| Legacy Console Management | CSP-Native Tools (AWS Config/Azure Policy) | Easiest deployment; covers drift from manual changes; no code required. | Recurring usage-based charges that grow with resource count (e.g., AWS Config); limited flexibility. |
| Strict Audit Requirements | HashiCorp Sentinel + Exception DB | Enterprise-grade audit trails; strict gating; integrates with Terraform Cloud. | High operational overhead; licensing fees. |
Configuration Template
GitHub Actions Workflow with OPA Enforcement
This template demonstrates how to integrate policy evaluation into a Terraform workflow.
name: Infrastructure Compliance Check

on:
  pull_request:
    paths:
      - 'infrastructure/**'

jobs:
  compliance-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          # The wrapper script can add extra lines to stdout; disabling it
          # keeps the redirected plan.json valid JSON.
          terraform_wrapper: false

      - name: Initialize Terraform
        run: terraform init
        working-directory: ./infrastructure

      - name: Generate Terraform Plan
        run: terraform plan -out=tfplan -no-color
        working-directory: ./infrastructure

      - name: Convert Plan to JSON
        run: terraform show -json tfplan > plan.json
        working-directory: ./infrastructure

      - name: Setup Conftest
        uses: instrumenta/conftest-action@v1
        with:
          conftest_version: '0.40.0'

      - name: Run Policy Checks
        # Conftest exits non-zero when a deny rule fires, which fails the job.
        # --all-namespaces evaluates packages other than the default `main`.
        run: |
          conftest test plan.json \
            --policy ./policies \
            --all-namespaces
        working-directory: ./infrastructure
OPA Policy Bundle Structure
policies/
├── policy-bundle.tar.gz   # Compiled bundle
└── src/
    ├── s3_encryption.rego
    ├── iam_no_wildcards.rego
    └── network_no_public_sg.rego
Quick Start Guide
- Install Tools: Install Terraform, OPA, and Conftest locally, for example with Homebrew: `brew install terraform opa conftest`
- Write a Policy: Create `policies/no_public_sg.rego` to deny security groups with ingress from 0.0.0.0/0.
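- Generate Plan: In your infrastructure directory, run `terraform plan -out=tfplan`, then `terraform show -json tfplan > plan.json`. (`terraform plan -json` emits a stream of log events rather than the plan document, so it is not suitable as policy input.)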
- Evaluate: Run `conftest test plan.json --policy ./policies`. Verify that violations are reported and that the command exits with a non-zero status.
- Integrate CI: Add the Conftest step to your CI pipeline configuration and configure the branch protection rule to require the check to pass.
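Before writing the Rego for the security group rule, it can help to pin down the condition it has to express. The sketch below states that condition in TypeScript, not Rego, purely as a prototype; it assumes the planned state of an `aws_security_group` exposes an `ingress` list whose entries carry `cidr_blocks`, and `openIngressViolations` is a hypothetical helper name.

```typescript
// Subset of an aws_security_group planned state: only the fields this check reads.
interface SecurityGroupState {
  ingress?: Array<{
    cidr_blocks?: string[];
    from_port?: number;
    to_port?: number;
  }>;
}

function openIngressViolations(address: string, sg: SecurityGroupState): string[] {
  return (sg.ingress ?? [])
    // A rule is a violation when any of its CIDR blocks is the open IPv4 range.
    .filter((rule) => (rule.cidr_blocks ?? []).indexOf('0.0.0.0/0') !== -1)
    .map(
      (rule) =>
        `${address}: ingress ports ${rule.from_port ?? '?'}-${rule.to_port ?? '?'} are open to 0.0.0.0/0`,
    );
}
```

A Rego version would iterate the same `ingress` entries inside a `deny contains msg if` rule. Note that this sketch ignores IPv6 ranges and standalone security group rule resources, which a production policy would likely need to cover.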