Back to KB
Difficulty
Intermediate
Read Time
5 min

How to Test LLM Agents Without Calling the Real API

By Codcompass Team··5 min read

Testing an LLM agent in CI is annoying. The API costs money on every run. Rate limits bite you at the worst time. And the outputs are nondeterministic, so a naive assertion will flake 20% of the time regardless of whether your code is correct.

Most teams solve this by not testing. They run the agent manually, confirm it looks right, and push. That works until it doesn't. A changed prompt, a new tool, a slightly different system message, and you get a regression you only find in production.

There is a better approach. You do not need the real API in CI. You need three patterns: a FakeProvider for unit tests, agentsnap for regression snapshots, and module-level mocks for integration tests. Each has a job. None of them replaces the others.

The FakeProvider Pattern

A FakeProvider is a drop-in replacement for your LLM client that returns canned responses. No HTTP. No token spend. Deterministic output every time.

# fake_provider.py
from typing import Iterator

class FakeMessage:
    def __init__(self, content: str):
        self.content = content
        self.tool_calls = []

class FakeChoice:
    def __init__(self, content: str):
        self.message = FakeMessage(content)

class FakeCompletion:
    def __init__(self, content: str):
        self.choices = [FakeChoice(content)]

class FakeProvider:
    """Drop-in stub for an OpenAI-compatible client."""

    def __init__(self, responses: list[str]):
        self._responses = list(responses)
        self._call_count = 0
        self.calls: list[dict] = []

    @property
    def chat(self):
        return self

    @property
    def completions(self):
        return self

    def create(self, **kwargs) -> FakeCompletion:
        self.calls.append(kwargs)
        if not self._responses:
            raise ValueError("FakeProvider ran out of responses")
        response = self._responses[self._call_count % len(self._responses)]
        self._call_count += 1
        return FakeCompletion(response)


# tests/test_summarizer.py
from fake_provider import FakeProvider
from myagent.summarizer import summarize_document

def test_summarizer_trims_long_output():
    provider = FakeProvider([
        "Here is the summary: " + "word " * 300,  # 300-word stub response
    ])
    result = summarize_document("some long text", client=provider, max_wo

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

Sign in to read the full article and unlock all 635+ tutorials.

Sign In / Register — Start Free Trial

7-day free trial · Cancel anytime · 30-day money-back