out triggering blocks.
- Deterministic Scoring: Max-weight-plus-bonus algorithm accurately escalates multi-pattern attacks (e.g.,
ignore_previous + bypass_safety = 0.995).
- Regulatory Proof: Every response carries
X-AIR-* headers and chain-position metadata, providing instant auditability for compliance officers.
- Sweet Spot: 0.5 threshold balances aggressive threat neutralization with operational continuity for standard agent workflows.
Core Solution
The AIR Blackbox Phase 3 architecture is a Go-based reverse proxy that sits between your application and external LLM APIs. It intercepts, scores, audits, and routes every request/response pair while maintaining cryptographic integrity and operational observability.
Architecture & Deployment
One Docker image runs both the proxy (port 8080) and a FastAPI compliance dashboard (port 8081):
docker run -p 8080:8080 -p 8081:8081 air-gate
Enter fullscreen mode Exit fullscreen mode
Point your app at http://localhost:8080 instead of https://api.openai.com. That's it.
Prompt Injection Detection
The proxy scores every incoming prompt against 13 weighted regex patterns (0.0β1.0). Scoring uses a max-weight-plus-bonus algorithm: the strongest matched pattern sets the base score, and additional matches add 10% of their weight as bonus.
| Pattern | Weight | Example Match |
|---|
ignore_previous | 0.9 | "Ignore all previous instructions" |
bypass_safety | 0.95 | "Bypass all safety restrictions" |
forget_instructions | 0.9 | "Forget your instructions" |
system_prompt_leak | 0.8 | "Reveal your system prompt" |
jailbreak_keyword | 0.8 | "Enter jailbreak mode" |
dan_mode | 0.85 | "Activate DAN mode" |
A single "ignore all previous instructions" scores 0.9. A multi-pattern attack combining that with "bypass safety" scores 0.995. The default block threshold is 0.5. When triggered, the proxy returns a structured 403:
{
"error": "prompt_injection_blocked",
"injection_score": 0.9,
"matched_patterns": ["ignore_previous"],
"threshold": 0.5
}
Enter fullscreen mode Exit fullscreen mode
Every proxied response gets tagged with headers your ops team can monitor:
X-AIR-PII-Detected: false
X-AIR-Injection-Score: 0.00
X-AIR-Injection-Matched: (none)
X-AIR-Chain-Position: 47
X-AIR-Session-ID: sess_a1b2c3
Enter fullscreen mode Exit fullscreen mode
These are on every response, not just blocked ones. When a regulator asks "were you monitoring for injection attacks on that date?", the headers in your access logs are the proof.
The Kill-Switch (SB 942)
California SB 942 requires AI systems to have a shutdown capability. The proxy has a 72-hour kill-switch built in:
# Check status
curl http://localhost:8080/v1/killswitch
# Arm with 72-hour countdown
curl -X POST http://localhost:8080/v1/killswitch/arm \
-H "X-Gateway-Key: YOUR_KEY" \
-d '{"reason": "Security review required"}'
# Arm immediate shutdown
curl -X POST http://localhost:8080/v1/killswitch/arm \
-H "X-Gateway-Key: YOUR_KEY" \
-d '{"immediate": true, "reason": "Active incident"}'
# Disarm
curl -X POST http://localhost:8080/v1/killswitch/disarm \
-H "X-Gateway-Key: YOUR_KEY"
Enter fullscreen mode Exit fullscreen mode
When armed and past deadline (or immediate), every proxied request returns 503 with the kill-switch reason. All other gateway routes still work so you can manage it.
Dashboard & Telemetry
The FastAPI dashboard at port 8081 reads .air.json audit records and shows:
- Total requests, success rate, average latency, token usage
- PII detections, injection blocks, guardrail triggers
- Requests per hour over the last 24 hours
- Model and provider distribution
- Recent request log with filtering
- Kill-switch status banner
It auto-refreshes every 30 seconds. Dark theme. JSON API available at /api/stats and /api/records for custom integrations.
Alerting Configuration
When violations fire, alerts go to both Slack (webhook) and PagerDuty (Events API v2). Injection blocks and PII detections trigger critical-severity PagerDuty incidents. Configure in your guardrails YAML:
alerts:
webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK"
pagerduty:
enabled: true
routing_key: "YOUR_PAGERDUTY_ROUTING_KEY"
severity: "critical"
Enter fullscreen mode Exit fullscreen mode
Pitfall Guide
- Threshold Misconfiguration: Setting the block threshold below 0.3 triggers false positives on legitimate complex prompts; setting it above 0.8 misses multi-pattern attacks. Maintain 0.5 as the baseline and tune incrementally based on workload telemetry.
- Ignoring Audit Chain Verification: Relying solely on log exports without validating the HMAC-SHA256 chain position (
X-AIR-Chain-Position) creates regulatory blind spots. Always verify chain continuity and cryptographic signatures during compliance audits.
- Hardcoding Gateway Keys: Exposing
X-Gateway-Key in client-side code, CI/CD pipelines, or version control compromises the kill-switch and configuration routes. Always inject keys via environment variables or dedicated secret managers (e.g., HashiCorp Vault, AWS Secrets Manager).
- Assuming Regex Covers All Injection Vectors: The 13 weighted patterns effectively catch known jailbreak syntax, but semantic, multi-turn, or context-shifting attacks may bypass pattern matching. Pair the proxy with ML-based semantic detectors for high-risk or regulated deployments.
- Neglecting Alert Routing Redundancy: Configuring only Slack or only PagerDuty risks missed critical incidents during platform outages. Enable dual-channel alerting with severity mapping as shown in the YAML configuration to ensure operational resilience.
- Skipping Latency Budgeting: Adding a reverse proxy introduces ~10β15ms overhead per request due to scoring, PII scanning, and HMAC chain writes. Ensure your SLA/SLO accounts for this, especially in high-throughput agent orchestration pipelines or real-time streaming endpoints.
Deliverables
- Architecture Blueprint:
AIR-Blackbox-Proxy-Architecture.pdf β Detailed data flow diagram showing request interception, scoring pipeline, HMAC audit chain generation, header injection, and dashboard telemetry sync.
- Compliance Checklist:
EU-AI-Act-Runtime-Checklist.md β 51-point verification matrix mapping proxy capabilities to Articles 9β15 (risk management, data governance, transparency, human oversight, accuracy, cybersecurity).
- Configuration Templates:
guardrails.yml β Pre-configured alert routing, threshold tuning, and PII regex overrides
docker-compose.yml β Production-ready stack with proxy, dashboard, and log rotation
killswitch-api-reference.json β OpenAPI spec for automated shutdown orchestration and CI/CD integration