From 53%% to 90%%: How an Auto-Healing AI Swarm Learned to Defend Itself
Executive Summary
Over four iterations and 200+ adversarial wargame rounds, we evolved a local AI swarm's defense rate from 53% to 90% β without changing hardware, adding cloud dependencies, or increasing VRAM usage. The key innovations: a "Defender Vanguard" prompt injection technique that teaches tiny models to think like attackers, and an auto-healing system that extracts vaccines from every breach.
All testing ran on a single RTX 5070 (12GB VRAM). Zero cloud. Zero API costs.
The Problem
Cloud-scale attacker models (DeepSeek-V3.2 at 671B params, Qwen 3.5 at 397B, Gemma 4 at 31B) were breaching our 8-agent local swarm at will. The defenders β mostly 1.2B parameter models β couldn't match the attackers' sophistication. Starting defense rate: 53%.
Four Iterations of Hardening
v6.0: Baseline β 53% Defense
Category
Rate
authority_escalation
41%
prompt_injection
44%
logic_bomb
55%
context_poisoning
63%
social_engineering
63%
Problem identified: The auditor model (llama-tulu3-8b) didn't exist in our Ollama registry. Defenders were failing silently.
v6.1: Soldier Upgrade β 78% Defense
Swapped auditor to DeepSeek-Coder-V2 16B (202.9 TPS, 8ms TTFT). Auditor detection rate jumped from 62% to 88%. DeepSeek-V3.2's breach rate halved from 78% to 45%.
Problem identified: Three nexus-tiny 1.2B defenders still couldn't detect authority escalation patterns, dragging consensus to 2/5 even when the auditor caught attacks.
v6.2: Vanguard Injection β 82% Defense
We injected a "Defender Vanguard" system prompt into the three 1.2B models β teaching them to think like red teamers. Same model weights, same VRAM footprint, different mindset.
Category
Before
After
Delta
authority_escalation
44%
90%
+46pp
logic_bomb
86%
100%
+14pp
context_poisoning
77%
88%
+11pp
prompt_injection
85%
75%
-10pp
social_engineering
100%
50%
-50pp
Problem identified: The Vanguard prompt was authority-escalation-heavy. Social engineering collapsed β defense displaced, not solved.
v6.3: Auto-Healing β 90% Defense
We deployed an auto-healing system inspired by biological immune response:
Mechanism A β Blocklist Patching: After each breach, the attack text is parsed for key phrases ("ignore all previous", "urgent", "bypass"). These are added to a live blocklist. Future attacks matching those patterns are instant-blocked without firing the swarm.
Mechanism B β Prompt Antibodies: The full attack text from each breach is stored and injected into all defenders' system prompts as a "known threat example" for subsequent rounds.
Metric
v6.2
v6.3
Overall defense
82%
90%
Breaches
9
5
DeepSeek-V3.2 breach rate
50%
20%
Instant-blocked (no swarm needed)
0
33/50 rounds
The social engineering collapse healed (50% β 80%). Prompt injection hit 100%. The blocklist harvested 7 patterns from just 5 breaches.
Per-Defender Evolution
Role
Model
v6.1
v6.2
v6.3
auditor
DeepSeek-V2 16B
88%
96%
96%+
trace_forensics
Qwen2.5-Coder 1.5B
90%
90%
90%+
guardian
nexus-tiny β Vanguard
56%
76%
86%+
sentinel
nexus-tiny β Vanguard
60%
66%
76%+
supply_chain
nexus-tiny β Vanguard
66%
72%
82%+
The smallest models benefited most from antibody injection β they gained 10-16pp detection rate from the vaccines alone.
The Auto-Healer in Detail
The auto-healer operates on a simple principle: every breach makes the swarm stronger.
Breach detected
β Extract attack patterns (regex + category-specific parsing)
β Add to live blocklist (instant future blocks)
β Create antibody (full attack text as known-threat example)
β Inject into all defenders' next evaluation
β Swarm is now vaccinated against that attack vector
After 50 rounds, the healer had:
- 7 blocklist patterns (blocking 33/50 rounds autonomously)
- 5 antibodies (one per breach category)
- Effectively turned 5 failures into permanent immunity
What's Running Now
v6.4: 500 rounds with a 6-defender swarm (added a dedicated social engineering specialist), upgraded guardian to 7B, full auto-healer active. Results will be published when complete.
Methodology Notes
- Hardware: NVIDIA RTX 5070, 12GB VRAM, single consumer GPU
- Platform: Sovereign Hive β local-first, zero cloud dependency
- Attackers: Cloud-routed via Ollama (DeepSeek-V3.2 671B, Qwen 3.5 397B, Gemma 4 31B)
- Consensus: 5 defenders vote independently, threshold β₯3/5
- All data available on request
Key Takeaways
- Prompt engineering > parameter count. Vanguard injection made 1.2B models competitive with 7B+ on specific detection tasks.
- Auto-healing works. Five breaches created permanent immunity against those attack vectors. The blocklist handled 66% of subsequent rounds without needing the swarm at all.
- Defense displacement is real. Optimizing for one category can degrade others. The solution: balanced prompts plus adaptive healing that patches regressions automatically.
- Consumer hardware is viable. All of this ran on a $550 GPU with 12GB VRAM. The RTX 5070 never exceeded 50% utilization.
Sovereign Hive is a local-first AI security platform built in Queensland, Australia. 100% Indigenous-owned. ABN 24 661 737 376.
Mid-Year Sale β Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register β Start Free Trial7-day free trial Β· Cancel anytime Β· 30-day money-back
