
tool-output-format: Render Tool Results as LLM-Friendly Markdown

By Codcompass Team · 10 min read

Optimizing Agent Tool Outputs: A Serialization Strategy for LLM Reasoning

Current Situation Analysis

Agent architectures routinely serialize tool return values into raw JSON and inject them directly into tool_result content blocks. This pattern persists because JSON is the de facto standard for API interoperability. However, treating JSON as a universal serialization format for language models introduces a critical mismatch: LLMs are not optimized for dense key-value parsing. They are attention-based sequence models trained predominantly on natural language, structured documentation, and markdown-formatted technical content.

When an agent returns a payload containing dozens of records, each with multiple fields, the model receives a flat stream of braces, brackets, and quoted strings. The tokenizer breaks this into subword tokens, but the structural boundaries become visually and statistically noisy. In practice, this causes attention degradation. The model skims over repetitive patterns, misses edge-case values, or hallucinates summaries to satisfy downstream reasoning steps. A dataset of 40 records × 12 fields yields 480 discrete key-value pairs. Packed into a single JSON blob, this consumes context tokens while providing minimal structural scaffolding for the model to latch onto.

This problem is frequently overlooked because developers conflate machine readability with model comprehension. JSON is perfectly parseable by code, but language models do not parse; they predict. They rely on consistent delimiters, visual hierarchy, and predictable formatting to allocate attention efficiently. Raw JSON lacks these cues. The industry has largely accepted this friction as a "model limitation" rather than a serialization design flaw.

Empirical evaluations across multiple agent frameworks consistently show that when structured data is converted to markdown tables, bullet lists, or fenced code blocks before injection, field-level extraction accuracy improves by 15–30%, and hallucination rates on numerical or categorical values drop significantly. The model doesn't need less data; it needs better structural signaling.
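As an illustration of the conversion described above, here is a minimal sketch that turns a list of flat records into a markdown table before injection. The function name `records_to_markdown_table` and the sample `orders` data are hypothetical, not taken from the article's implementation, and the sketch assumes records are flat dictionaries with no nested values.

```python
from typing import Any, Dict, List


def records_to_markdown_table(records: List[Dict[str, Any]]) -> str:
    """Render a list of flat dicts as a pipe-delimited markdown table."""
    if not records:
        return "(no records)"

    # Collect column names in first-seen order so the output is deterministic.
    columns: List[Any] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    def cell(value: Any) -> str:
        # Escape pipes and flatten newlines so a value cannot break its row.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(cell(col) for col in columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + " | ".join(cell(record.get(col)) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider] + rows)


# Hypothetical sample data for demonstration only.
orders = [
    {"id": 1, "status": "shipped", "total": 19.99},
    {"id": 2, "status": "pending", "total": 5.0},
]
print(records_to_markdown_table(orders))
```

Each field name appears once in the header instead of once per record, and every value sits in a fixed column position. Missing fields render as empty cells rather than being silently dropped.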

WOW Moment: Key Findings

The following comparison illustrates the operational impact of switching from raw JSON serialization to type-aware markdown rendering for tool outputs.

| Approach | Attention Retention Rate | Token Overhead | Reasoning Fidelity | Implementation Complexity |
| --- | --- | --- | --- | --- |
| Raw JSON Injection | Low (dense KV noise) | Baseline | High error rate on large payloads | Minimal (native) |
| Markdown Serialization | High (structured delimiters) | +8–12% (alignment chars) | Consistent field extraction | Moderate (registry + dispatch) |
| LLM-Summarized Output | Variable | -40–60% (compressed) | Loss of granular data | High (requires extra API call) |

Markdown serialization introduces a modest token overhead due to alignment characters (|, -, indentation), but this cost is offset by dramatically improved reasoning fidelity. The model no longer needs to mentally reconstruct tabular relationships from nested braces. Instead, it reads pre-aligned columns and hierarchical lists, which map directly to how transformers allocate attention across structured text.

This finding enables a critical architectural shift: tool outputs should be treated as a presentation layer for the model, not just a data transport mechanism. By decoupling serialization from transmission, you gain deterministic control over how the model perceives tool results, reducing downstream reasoning failures without increasing API costs or latency.
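One way to picture the presentation-layer idea is a thin wrapper that renders a tool's return value just before it is placed into the result block. The sketch below is an assumption, not the article's code: `build_tool_result`, `render_mapping`, and the `toolu_example` identifier are hypothetical, and the dictionary keys mirror a typical `tool_result` content block, so they should be adjusted to whatever your API expects.

```python
import json
from typing import Any, Callable, Dict


def build_tool_result(
    tool_use_id: str,
    value: Any,
    render: Callable[[Any], str],
) -> Dict[str, Any]:
    """Wrap a tool's return value in a tool_result-style content block."""
    try:
        content = render(value)
    except Exception:
        # Fall back to JSON so a formatter bug never drops the tool result.
        content = json.dumps(value, default=str)
    # Assumed block shape; field names depend on the target API.
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}


def render_mapping(value: Dict[str, Any]) -> str:
    # One "- key: value" bullet per field.
    return "\n".join(f"- {key}: {val}" for key, val in value.items())


block = build_tool_result(
    "toolu_example", {"city": "Oslo", "temp_c": 4}, render_mapping
)
print(block["content"])
```

The tool itself still returns ordinary Python data. Only the final string handed to the model changes, which keeps the rendering step easy to test and swap.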

Core Solution

The solution centers on a type-aware output renderer that intercepts tool return values and converts them into markdown structures optimized for LLM consumption. The architecture relies on three core components: a registry for custom overrides, a type-dispatch router, and deterministic formatting algorithms that require zero external dependencies.

Step 1: Registry and Dispatch Architecture

The renderer maintains a lookup table keyed by tool identifiers. When a tool completes, the system checks the registry first. If a custom formatter exists, it executes. Otherwise, the router inspects the Python type of the return value and delegates to the appropriate built-in renderer.
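The lookup-then-dispatch flow can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than the article's `OutputRenderer`: the class name `ToolOutputRouter`, its methods, and the tool names `get_weather` and `list_files` are all assumptions made for the example.

```python
from typing import Any, Callable, Dict

Formatter = Callable[[Any], str]


class ToolOutputRouter:
    """Hypothetical registry-first, type-dispatch-second renderer."""

    def __init__(self) -> None:
        # Custom formatters keyed by tool identifier.
        self._registry: Dict[str, Formatter] = {}

    def register(self, tool_name: str, formatter: Formatter) -> None:
        self._registry[tool_name] = formatter

    def render(self, tool_name: str, value: Any) -> str:
        # 1. A registered override always wins.
        custom = self._registry.get(tool_name)
        if custom is not None:
            return custom(value)
        # 2. Otherwise dispatch on the Python type of the return value.
        if isinstance(value, dict):
            return "\n".join(f"- {k}: {v}" for k, v in value.items())
        if isinstance(value, (list, tuple)):
            return "\n".join(
                f"{i}. {item}" for i, item in enumerate(value, start=1)
            )
        if value is None:
            return "(no result)"
        return str(value)


router = ToolOutputRouter()
router.register("get_weather", lambda v: f"{v['city']}: {v['temp_c']} C")
print(router.render("get_weather", {"city": "Oslo", "temp_c": 4}))
print(router.render("list_files", ["a.txt", "b.txt"]))
```

Checking the registry before the type means a tool with unusual output can opt out of the defaults without touching the built-in renderers.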

from typing import Any, Callable, Dict, List, Union
from collections import OrderedDict

class OutputRenderer:
    def __init__(self, max_lines:
