compare values" creates artificial coordination overhead. The AI handles internal logic better when the block represents a complete functional unit. This is the functional integrity rule: a block must be atomic in responsibility but complete in execution.
Each block lives in its own file. The file begins with a standardized header containing two sections: a human-readable Description and an AI-ready Prompt. This eliminates duplication, keeps contracts versioned alongside code, and forces explicit boundary definition.
The Description states what the block does in plain English. The Prompt specifies the exact contract the AI must implement: function signature, input/output types, error conditions, constraints, and a usage example. The prompt acts as a compiler contract; the AI's job is to satisfy it, not interpret vague intent.
Example: threshold_evaluator.py header
# =============================================================================
# DESCRIPTION
# =============================================================================
# Evaluates normalized metric values against predefined thresholds.
# Returns a structured result indicating pass, warn, or critical states.
# =============================================================================
# PROMPT
# =============================================================================
# Generate a Python function for a Threshold Evaluator block.
#
# Function signature:
# def evaluate_threshold(metric_name: str, value: float, limits: dict) -> dict:
#
# Inputs:
# - metric_name: string identifier for the metric
# - value: float representing the current measurement
# - limits: dict with keys 'warning' and 'critical', both floats
#
# Outputs:
# - dict with keys 'status' (str: 'ok', 'warning', 'critical'),
# 'metric_name' (str), 'value' (float), 'thresholds' (dict)
#
# Error handling:
# - Raise ValueError if limits lacks 'warning' or 'critical' keys
# - Raise TypeError if value is not a float or int
# - Raise ValueError if warning >= critical
#
# Requirements:
# 1. Status must be 'ok' if value < warning
# 2. Status must be 'warning' if warning <= value < critical
# 3. Status must be 'critical' if value >= critical
# 4. Return dict must include all input values for auditability
# 5. Do not mutate the limits dict
#
# Implementation notes:
# - Use type hints and a comprehensive docstring
# - Keep the function pure; no I/O or external calls
# - Validate inputs before evaluation
#
# Example:
# result = evaluate_threshold("cpu_usage", 85.5, {"warning": 70.0, "critical": 90.0})
# # Returns: {"status": "warning", "metric_name": "cpu_usage", "value": 85.5, "thresholds": {"warning": 70.0, "critical": 90.0}}
# =============================================================================
This structure forces precision. The AI receives explicit boundaries, error contracts, and validation rules. Humans retain control over architectural decisions while delegating implementation to the model.
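Because the header layout is fixed, the Prompt section can be pulled out of a block file mechanically instead of by copy and paste. The following is a minimal sketch, assuming the header uses the `# ====` rule lines shown above and ends at the first non-comment line; `parse_header` and `SAMPLE` are hypothetical names introduced here for illustration.

```python
from typing import Optional


def parse_header(source: str) -> dict:
    """Split a block file's leading comment header into named sections."""
    sections: dict = {}
    current: Optional[str] = None
    expect_title = False
    for raw in source.splitlines():
        if not raw.startswith("#"):
            break  # the header ends at the first non-comment line
        text = raw[1:]
        if text.startswith(" "):
            text = text[1:]  # drop the single space after the comment marker
        stripped = text.strip()
        if stripped and set(stripped) == {"="}:
            # Rule lines come in pairs around a section title.
            expect_title = not expect_title
        elif expect_title:
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(text.rstrip())
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# Shortened header, for illustration only.
SAMPLE = """# ==========
# DESCRIPTION
# ==========
# Evaluates normalized metric values against predefined thresholds.
# ==========
# PROMPT
# ==========
# Generate a Python function for a Threshold Evaluator block.
#
# Inputs:
# - value: float representing the current measurement
# ==========
def evaluate_threshold():
    pass
"""

sections = parse_header(SAMPLE)
print(sections["PROMPT"])
```

The returned `PROMPT` text is what would be handed to the assistant; the `DESCRIPTION` text stays available for documentation tooling.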
Phase 3: Isolated Generation & Validation
Feed the Prompt section to your AI assistant. Review the output against the contract before proceeding. Verify signature alignment, type correctness, error paths, and constraint satisfaction. If the AI deviates, refine the prompt rather than patching the code. Patching breaks contract traceability and creates maintenance debt.
Once validated, paste the generated code below the header. Test the block in isolation. Unit tests should cover happy paths, boundary conditions, and error branches. Do not advance to integration until the block passes its own test suite.
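Part of the review can be automated. The sketch below checks signature alignment with the standard `inspect` and `typing` modules; `signature_mismatches`, `CONTRACT_PARAMS`, and the two candidate functions are hypothetical names used only for illustration, and the check assumes the contract uses plain types such as `str`, `float`, and `dict`.

```python
import inspect
from typing import Callable, get_type_hints

# Expected parameters, in order, taken from the prompt's function signature.
CONTRACT_PARAMS = {"metric_name": str, "value": float, "limits": dict}
CONTRACT_RETURN = dict


def signature_mismatches(fn: Callable, params: dict, returns: type) -> list:
    """Return human-readable deviations between fn and the contract."""
    problems = []
    actual = list(inspect.signature(fn).parameters)
    if actual != list(params):
        problems.append(f"parameters {actual} do not match {list(params)}")
    hints = get_type_hints(fn)
    for name, expected in params.items():
        if name in actual and hints.get(name) is not expected:
            problems.append(f"{name} is annotated {hints.get(name)!r}, expected {expected!r}")
    if hints.get("return") is not returns:
        problems.append(f"return is annotated {hints.get('return')!r}, expected {returns!r}")
    return problems


# Two stand-in outputs: one aligned with the contract, one with a renamed parameter.
def candidate_ok(metric_name: str, value: float, limits: dict) -> dict:
    return {}


def candidate_renamed(name: str, value: float, limits: dict) -> dict:
    return {}


print(signature_mismatches(candidate_ok, CONTRACT_PARAMS, CONTRACT_RETURN))
print(signature_mismatches(candidate_renamed, CONTRACT_PARAMS, CONTRACT_RETURN))
```

A non-empty result is a signal to refine the prompt and regenerate, not to rename the parameter by hand.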
Example generated implementation:
def evaluate_threshold(metric_name: str, value: float, limits: dict) -> dict:
    """
    Evaluates a metric value against warning and critical thresholds.

    Args:
        metric_name: Identifier for the metric being evaluated.
        value: Current measurement value.
        limits: Dictionary containing 'warning' and 'critical' float thresholds.

    Returns:
        Dictionary with evaluation status and input metadata.

    Raises:
        TypeError: If value is not numeric.
        ValueError: If limits are missing keys or warning >= critical.
    """
    # Validate inputs before evaluation.
    if not isinstance(value, (int, float)):
        raise TypeError("value must be a numeric type")
    if 'warning' not in limits or 'critical' not in limits:
        raise ValueError("limits must contain 'warning' and 'critical' keys")
    if limits['warning'] >= limits['critical']:
        raise ValueError("warning threshold must be less than critical threshold")

    # Each threshold value belongs to the higher band.
    if value < limits['warning']:
        status = "ok"
    elif value < limits['critical']:
        status = "warning"
    else:
        status = "critical"

    # Return a copy of limits so the caller's dict is never shared or mutated.
    return {
        "status": status,
        "metric_name": metric_name,
        "value": float(value),
        "thresholds": dict(limits)
    }
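The implementation satisfies the contract: no external dependencies, no side effects, pure evaluation logic. One caveat is that `bool` is a subclass of `int` in Python, so `True` passes the numeric check; if that matters, say so in the prompt. This isolation keeps behavior predictable when the block is later wired into the larger system.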
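The isolation tests for this block can be sketched directly from the numbered requirements in the prompt. The example below uses plain `assert` statements so it runs without a test framework; the same assertions could be moved into pytest test functions. It carries a condensed copy of the implementation so the snippet runs on its own, and `raises` and `run_contract_checks` are hypothetical helper names.

```python
def evaluate_threshold(metric_name: str, value: float, limits: dict) -> dict:
    # Condensed copy of the block above, included only to keep this sketch runnable.
    if not isinstance(value, (int, float)):
        raise TypeError("value must be a numeric type")
    if "warning" not in limits or "critical" not in limits:
        raise ValueError("limits must contain 'warning' and 'critical' keys")
    if limits["warning"] >= limits["critical"]:
        raise ValueError("warning threshold must be less than critical threshold")
    if value < limits["warning"]:
        status = "ok"
    elif value < limits["critical"]:
        status = "warning"
    else:
        status = "critical"
    return {"status": status, "metric_name": metric_name,
            "value": float(value), "thresholds": dict(limits)}


LIMITS = {"warning": 70.0, "critical": 90.0}


def raises(expected: type, *args) -> bool:
    """Return True if the call raises the expected exception type."""
    try:
        evaluate_threshold(*args)
    except expected:
        return True
    return False


def run_contract_checks() -> bool:
    # Happy path: the usage example from the prompt header.
    assert evaluate_threshold("cpu_usage", 85.5, LIMITS) == {
        "status": "warning", "metric_name": "cpu_usage", "value": 85.5,
        "thresholds": {"warning": 70.0, "critical": 90.0},
    }
    # Boundary conditions (requirements 1 to 3).
    assert evaluate_threshold("m", 69.9, LIMITS)["status"] == "ok"
    assert evaluate_threshold("m", 70.0, LIMITS)["status"] == "warning"
    assert evaluate_threshold("m", 90.0, LIMITS)["status"] == "critical"
    # Error branches.
    assert raises(TypeError, "m", "85", LIMITS)
    assert raises(ValueError, "m", 1.0, {"warning": 70.0})
    assert raises(ValueError, "m", 1.0, {"warning": 90.0, "critical": 90.0})
    # Requirement 5: the caller's dict is untouched and not shared.
    limits = dict(LIMITS)
    result = evaluate_threshold("m", 1, limits)
    assert limits == LIMITS and result["thresholds"] is not limits
    return True


print(run_contract_checks())
```

Each assertion maps to one line of the prompt, so a failing check points at the specific contract clause the generated code violated.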
Phase 4: Orchestration Integration
Integration is a human responsibility. AI can generate boilerplate wiring, but humans must define data flow, error propagation strategies, and cross-cutting concerns. Create an orchestrator.py file that imports each block, sequences execution, and handles boundary failures.
The orchestrator should:
- Define explicit data transformation between blocks
- Implement retry, timeout, and fallback logic
- Centralize logging and telemetry
- Validate end-to-end contracts
Example orchestrator structure:
from data_aggregator import fetch_and_normalize
from threshold_evaluator import evaluate_threshold
from report_renderer import format_report
from dispatch_router import route_payload


def run_metrics_pipeline(config: dict) -> dict:
    # fetch_and_normalize is assumed to return already-normalized metrics,
    # so no separate normalization call is needed in the orchestrator.
    normalized = fetch_and_normalize(config["source_url"])

    # Evaluate every metric against the configured thresholds.
    evaluations = []
    for metric in normalized:
        result = evaluate_threshold(
            metric_name=metric["name"],
            value=metric["current_value"],
            limits=config["thresholds"]
        )
        evaluations.append(result)

    # Render the report, then hand it to the dispatch block.
    report = format_report(evaluations, config["output_format"])
    delivery_status = route_payload(report, config["destinations"])
    return {"pipeline_status": "complete", "deliveries": delivery_status}
The orchestrator contains no business logic. Its job is to move data between blocks, handle failures, and ensure contracts are respected; the example above shows only the sequencing and leaves out retry and fallback handling for brevity. AI can generate this file, but humans must validate the execution order and error handling strategy.
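The retry and fallback responsibilities listed for the orchestrator can be sketched as a small wrapper around any block call. This is an illustrative sketch, not part of the original pipeline: `call_with_retry`, `flaky_fetch`, and `always_down` are hypothetical names, the retryable exception types are an assumption, and timeout enforcement is left out.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    step: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 0.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    fallback: Optional[Callable[[], T]] = None,
) -> T:
    """Run step, retrying on transient errors, then fall back or re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s * attempt)  # linear backoff between tries
    if fallback is not None:
        return fallback()
    assert last_error is not None
    raise last_error


# Stand-in block call that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch() -> list:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return [{"name": "cpu_usage", "current_value": 85.5}]


def always_down() -> list:
    raise TimeoutError("source unreachable")


metrics = call_with_retry(flaky_fetch, attempts=3)
cached = call_with_retry(always_down, attempts=2, fallback=lambda: [])
print(metrics, cached)
```

Errors outside `retry_on`, such as the `ValueError` a block raises for a bad contract input, propagate immediately. That keeps contract violations visible instead of hiding them behind retries.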
Pitfall Guide
1. Over-Fragmentation
Explanation: Breaking atomic operations into multiple blocks creates artificial coordination overhead. AI handles internal branching better than cross-block handoffs.
Fix: Apply the one-sentence rule. If a block requires multiple sentences to describe its responsibility, merge it.
2. Implicit Contracts
Explanation: Omitting error types, return shapes, or validation rules in the prompt forces the AI to guess. Guesswork causes integration failures.
Fix: Always specify input/output types, error conditions, and constraints in the Prompt header. Treat it as a type signature.
3. Skipping Isolation Validation
Explanation: Moving to integration before testing blocks individually masks defects. Once several untested blocks are wired together, a failure is much harder to trace to its source.
Fix: Write unit tests for each block before wiring. Use pytest with explicit contract assertions.
4. Glue Code Bloat
Explanation: Embedding business logic, transformations, or conditional branching in the orchestrator violates separation of concerns.
Fix: Keep the orchestrator thin. Move logic into dedicated blocks. The orchestrator should only sequence and route.
5. Prompt Drift
Explanation: Modifying generated code without updating the prompt breaks contract traceability. Future regenerations will overwrite fixes.
Fix: Version prompts alongside code. If implementation changes, update the prompt first, then regenerate.
6. Ignoring Cross-Cutting Concerns
Explanation: Blocks often lack logging, timeout handling, or retry logic because prompts focus on core logic.
Fix: Add implementation notes to prompts specifying observability requirements. Centralize cross-cutting logic in the orchestrator or middleware layer.
7. Treating AI Output as Production-Ready
Explanation: AI-generated code can pass syntax checks and still lack performance optimizations, security hardening, or edge-case handling.
Fix: Run static analysis, type checkers, and security scanners on all generated blocks. Treat AI output as a first draft, not a final release.
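Prompt drift (pitfall 5) is one of the few items in this list that can be detected mechanically. One possible approach, sketched below, is to record a fingerprint of the header and of the code body when a block is generated, then compare later. The function names, the status strings, and the record format are all assumptions made for illustration.

```python
import hashlib


def split_block(source: str) -> tuple:
    """Split a block file into (header, body); the header is the leading comment run."""
    lines = source.splitlines()
    cut = 0
    while cut < len(lines) and lines[cut].startswith("#"):
        cut += 1
    return "\n".join(lines[:cut]), "\n".join(lines[cut:])


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_for(source: str) -> dict:
    """Fingerprints to store (for example in version control) at generation time."""
    header, body = split_block(source)
    return {"header": fingerprint(header), "body": fingerprint(body)}


def drift_status(source: str, recorded: dict) -> str:
    current = record_for(source)
    header_changed = current["header"] != recorded["header"]
    body_changed = current["body"] != recorded["body"]
    if body_changed and not header_changed:
        return "drift"    # code was patched without touching the prompt
    if header_changed and not body_changed:
        return "stale"    # prompt was updated but the code was not regenerated
    if header_changed and body_changed:
        return "changed"  # both moved; refresh the record after review
    return "in_sync"


ORIGINAL = "# PROMPT: add one to x\ndef f(x):\n    return x + 1\n"
RECORD = record_for(ORIGINAL)
PATCHED = ORIGINAL.replace("x + 1", "x + 2")  # hand edit to the body only

print(drift_status(ORIGINAL, RECORD))
print(drift_status(PATCHED, RECORD))
```

A check like this cannot tell whether a change was justified; it only flags that the prompt and the code no longer moved together.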
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Rapid prototype / PoC | Monolithic AI prompting | Speed outweighs maintainability | Low initial, high refactoring later |
| Production microservice | Contract-first blocks | Predictable integration, easier debugging | Higher upfront, lower long-term maintenance |
| Performance-critical path | Hand-written implementation | AI lacks optimization context for tight loops | Higher dev cost, lower runtime cost |
| Frequently changing requirements | Block architecture | Isolated updates prevent cascading rewrites | Moderate setup, high adaptability |
| Team with limited AI experience | Structured prompts + review gates | Reduces hallucination risk and enforces standards | Training overhead, higher code quality |
Configuration Template
Standardized block header template for consistent contract definition:
# =============================================================================
# DESCRIPTION
# =============================================================================
# [One-sentence summary of block responsibility]
# [Optional: secondary constraint or behavioral note]
# =============================================================================
# PROMPT
# =============================================================================
# Generate a Python function for a [Block Name] block.
#
# Function signature:
# def [function_name]([params]) -> [return_type]:
#
# Inputs:
# - [param]: [type] - [description]
#
# Outputs:
# - [return_type] - [structure description]
#
# Error handling:
# - Raise [ErrorType] if [condition]
#
# Requirements:
# 1. [Constraint 1]
# 2. [Constraint 2]
# 3. [Constraint 3]
#
# Implementation notes:
# - [Note on purity, dependencies, or performance]
#
# Example:
# [usage_example]
# =============================================================================
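A template is only useful if every block file follows it. As a rough sketch, a check like the one below could flag headers that omit a required section; `REQUIRED_LABELS`, `missing_labels`, and `INCOMPLETE` are hypothetical names, and the label list is taken from the template above with `Implementation notes:` treated as optional.

```python
# Section labels every block header is expected to contain.
REQUIRED_LABELS = (
    "DESCRIPTION",
    "PROMPT",
    "Function signature:",
    "Inputs:",
    "Outputs:",
    "Error handling:",
    "Requirements:",
    "Example:",
)


def missing_labels(source: str) -> list:
    """Return the required labels that do not appear in the leading comment header."""
    present = set()
    for raw in source.splitlines():
        if not raw.startswith("#"):
            break  # stop at the first line of real code
        present.add(raw[1:].strip())
    return [label for label in REQUIRED_LABELS if label not in present]


# Header that forgot its error contract, for illustration only.
INCOMPLETE = """# ==========
# DESCRIPTION
# ==========
# Formats evaluation results as a report.
# ==========
# PROMPT
# ==========
# Function signature:
# def format_report(evaluations: list, output_format: str) -> str:
#
# Inputs:
# - evaluations: list - results to render
#
# Outputs:
# - str - rendered report
#
# Requirements:
# 1. Preserve input order
#
# Example:
# format_report([], "text")
# ==========
"""

print(missing_labels(INCOMPLETE))
```

Here the check reports the missing `Error handling:` section, which is exactly the kind of implicit contract described in the pitfall guide.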
CI/CD integration snippet for automated contract validation:
# .github/workflows/block-validation.yml
name: Block Contract Validation
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install pytest mypy ruff
      - name: Run type checking
        run: mypy blocks/
      - name: Run isolation tests
        run: pytest tests/blocks/ -v
      - name: Lint generated code
        run: ruff check blocks/
Quick Start Guide
- Identify boundaries: List system responsibilities and group them into single-sentence descriptions. Create one .py file per responsibility.
- Draft contracts: Populate each file with the Description + Prompt header. Specify signatures, types, errors, and constraints explicitly.
- Generate & validate: Feed prompts to your AI assistant. Review output against contracts. Write unit tests for each block. Run tests in isolation.
- Wire the system: Create an orchestrator file. Import blocks, sequence execution, and implement error routing. Keep business logic out of the orchestrator.
- Enforce quality gates: Add type checking, linting, and contract tests to your CI pipeline. Version prompts alongside code to prevent drift.
This workflow transforms AI from an unpredictable code generator into a reliable implementation engine. Humans own architecture and contracts; AI owns bodies and details. The result is systems that scale, debug, and evolve without collapsing under their own complexity.