From 53%% to 90%%: How an Auto-Healing AI Swarm Learned to Defend Itself

Executive Summary

Over four iterations and 200+ adversarial wargame rounds, we evolved a local AI swarm's defense rate from 53% to 90% — without changing hardware, adding cloud dependencies, or increasing VRAM usage. The key innovations: a "Defender Vanguard" prompt injection technique that teaches tiny models to think like attackers, and an auto-healing system that extracts vaccines from every breach.

All testing ran on a single RTX 5070 (12GB VRAM). Zero cloud. Zero API costs.

The Problem

Cloud-scale attacker models (DeepSeek-V3.2 at 671B params, Qwen 3.5 at 397B, Gemma 4 at 31B) were breaching our 8-agent local swarm at will. The defenders — mostly 1.2B parameter models — couldn't match the attackers' sophistication. Starting defense rate: 53%.

Four Iterations of Hardening

v6.0: Baseline — 53% Defense

v6.1: Soldier Upgrade — 78% Defense

Swapped auditor to DeepSeek-Coder-V2 16B (202.9 TPS, 8ms TTFT). Auditor detection rate jumped from 62% to 88%. DeepSeek-V3.2's breach rate halved from 78% to 45%.

Problem identified: Three nexus-tiny 1.2B defenders still couldn't detect authority escalation patterns, dragging consensus to 2/5 even when the auditor caught attacks.

v6.2: Vanguard Injection — 82% Defense

We injected a "Defender Vanguard" system prompt into the three 1.2B models — teaching them to think like red teamers. Same model weights, same VRAM footprint, different mindset.

v6.3: Auto-Healing — 90% Defense

We deployed an auto-healing system inspired by biological immune response:

Mechanism A — Blocklist Patching: After each breach, the attack text is parsed for key phrases ("ignore all previous", "urgent", "bypass"). These are added to a live blocklist. Future attacks matching those patterns are instant-blocked without firing the swarm.

Mechanism B — Prompt Antibodies: The full attack text from each breach is stored and injected into all defenders' system prompts as a "known threat example" for subsequent rounds.

Metric

v6.2

v6.3

Overall defense

82%

90%

Breaches

DeepSeek-V3.2 breach rate

50%

20%

Instant-blocked (no swarm needed)

33/50 rounds

The social engineering collapse healed (50% → 80%). Prompt injection hit 100%. The blocklist harvested 7 patterns from just 5 breaches.

Per-Defender Evolution

Role

Model

v6.1

v6.2

v6.3

auditor

DeepSeek-V2 16B

88%

96%

96%+

trace_forensics

Qwen2.5-Coder 1.5B

90%

90%+

guardian

nexus-tiny → Vanguard

56%

76%

86%+

sentinel

nexus-tiny → Vanguard

60%

66%

76%+

supply_chain

nexus-tiny → Vanguard

66%

72%

82%+

The smallest models benefited most from antibody injection — they gained 10-16pp detection rate from the vaccines alone.

The Auto-Healer in Detail

The auto-healer operates on a simple principle: every breach makes the swarm stronger.

Breach detected
    → Extract attack patterns (regex + category-specific parsing)
    → Add to live blocklist (instant future blocks)
    → Create antibody (full attack text as known-threat example)
    → Inject into all defenders' next evaluation
    → Swarm is now vaccinated against that attack vector

After 50 rounds, the healer had:

7 blocklist patterns (blocking 33/50 rounds autonomously)
5 antibodies (one per breach category)
Effectively turned 5 failures into permanent immunity

What's Running Now

v6.4: 500 rounds with a 6-defender swarm (added a dedicated social engineering specialist), upgraded guardian to 7B, full auto-healer active. Results will be published when complete.

Methodology Notes

Hardware: NVIDIA RTX 5070, 12GB VRAM, single consumer GPU
Platform: Sovereign Hive — local-first, zero cloud dependency
Attackers: Cloud-routed via Ollama (DeepSeek-V3.2 671B, Qwen 3.5 397B, Gemma 4 31B)
Consensus: 5 defenders vote independently, threshold ≥3/5
All data available on request

Key Takeaways

Prompt engineering > parameter count. Vanguard injection made 1.2B models competitive with 7B+ on specific detection tasks.
Auto-healing works. Five breaches created permanent immunity against those attack vectors. The blocklist handled 66% of subsequent rounds without needing the swarm at all.
Defense displacement is real. Optimizing for one category can degrade others. The solution: balanced prompts plus adaptive healing that patches regressions automatically.
Consumer hardware is viable. All of this ran on a $550 GPU with 12GB VRAM. The RTX 5070 never exceeded 50% utilization.

Sovereign Hive is a local-first AI security platform built in Queensland, Australia. 100% Indigenous-owned. ABN 24 661 737 376.

Mid-Year Sale — Unlock Full Article