Back to KB
Difficulty
Advanced
Read Time
53 min

I read a multi-agent reasoning paper, built the Claude-native version, and measured everything

By Codcompass Team··53 min read

Relay architecture: Planner-Critic-Solver loop, agents share extended thinking blocksRecursiveMAS (arXiv 2604.25917) showed that agents sharing internal reasoning state outperform agents that share only final outputs. The average accuracy gain across benchmarks was 8.3 points. The mechanism: each agent passes not just its answer but the latent embeddings from its own reasoning process, and the next agent conditions on both. The paper is a good result.

The catch is access. RecursiveMAS requires open-weight models with hidden states exposed at inference time. That rules out Claude, GPT-4o, and Gemini. I built a Claude-native version using the Anthropic extended thinking API. The core idea transfers: instead of passing latent vectors, pass the full thinking text. The paper calls it internal state sharing; the Claude version calls it thinking-block relay.

The architecture problem

Claude's extended thinking blocks carry an encrypted signature tied to the originating conversation. You cannot pass a signed thinking block into a different agent's messages array. The API rejects it. The workaround: extract the text from the thinking block and inject it as a regular user message.

# Extract thinking text from Agent 1
thinking_text = next(
    (b.thinking for b in response.content if b.type == "thinking"), ""
)

# Inject into Agent 2 as regular context, not as a thinking block
context = f"Prior agent reasoning:\n{thinking_text}"

Enter fullscreen mode Exit fullscreen mode

The signature does not transfer. The reasoning does.

relay-structured: what I built first

The first architecture was a Planner > Critic > Solver loop where each agent emits a compact mental model JSON instead of raw thinking text. Raw thinking at a 1024-token budget is often compressed and fragmented. The hypothesis was that 150 tokens of structured signal carries more information per token than 1024 tokens of compressed prose.

The schema each agent emits:

{
  "interpretation": "how the agent read the problem",
  "key_steps": ["step 1", "step 2"],
  "rejected_approaches": ["approach tried and discarded"],
  "confidence": 0.85,
  "potenti

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

Sign in to read the full article and unlock all 635+ tutorials.

Sign In / Register — Start Free Trial

7-day free trial · Cancel anytime · 30-day money-back