Difficulty

Intermediate

Read Time

11 min

Demystifying AI Agents: Building an Agentic Pipeline From Scratch in Pure Python

By Codcompass Team·2026-05-21·11 min read

Building Autonomous AI Workflows: A Low-Level Architecture Guide

Current Situation Analysis

The rapid proliferation of high-level AI orchestration frameworks has created a significant knowledge gap in the engineering community. Tools like LangChain, CrewAI, and Microsoft AutoGen allow developers to instantiate complex "AI agents" with minimal boilerplate. While this accelerates prototyping, it obscures the fundamental runtime mechanics that drive these systems.

Many developers can assemble an agent pipeline using abstractions without understanding the underlying execution loop. This leads to fragile implementations where debugging becomes difficult because the "magic" inside the framework hides state management, context window limits, and tool invocation logic.

The reality is that modern agentic systems are not mystical black boxes. They are deterministic state machines built on a small set of primitives:

Prompt Orchestration: Managing the flow of instructions and context.
Stateful Memory: Persisting conversation history and intermediate results.
Tool Execution: Bridging the model's text output with executable code.
Control Loops: The iterative cycle of reasoning and acting.
Structured Outputs: Parsing model responses into actionable data formats.

Understanding these primitives is essential for building production-grade systems that are reliable, observable, and cost-effective.

WOW Moment: Key Findings

The distinction between a standard LLM interaction and an agentic workflow is often misunderstood. A standard interaction is a single-shot transaction, whereas an agent operates within a continuous feedback loop.

The following comparison highlights the architectural differences between a direct API call and a fully realized agentic pipeline.

Feature	Standard LLM Interaction	Agentic Pipeline
Execution Model	Single-shot request/response	Iterative loop (Think → Act → Observe)
State Management	Stateless (requires full history resend)	Stateful (manages context window and memory)
Tool Usage	None (relies on training data)	Dynamic (calls external functions/APIs based on need)
Output Format	Unstructured text	Structured JSON or natural language
Control Flow	Linear	Non-linear (determined by model decisions)
Error Handling	API-level errors only	Application-level retries and fallbacks

Why this matters: Recognizing that an agent is simply a control loop wrapping an LLM allows engineers to build custom solutions without heavy dependencies. It enables precise control over context window usage, custom tool routing, and deterministic error recovery strategies that generic frameworks often obscure.

Core Solution

We will construct a production-inspired agentic pipeline using pure Python. This implementation avoids heavy SDKs and orchestration libraries, focusing instead on the core mechanics: HTTP communication, memory management, and the execution loop.

Architecture Overview

The system is divided into four distinct modules to mirror production separation of concerns:

Configuration: Externalizes runtime parameters.
Infrastructure Layer: Handles raw HTTP communication with the LLM provider.
Memory Manager: Maintains state and manages the sliding context window.
Agent Engine: Orchestrates the loop, tool registration, and execution.

Step 1: Configuration Management

Hardcoding credentials and model parameters is an anti-pattern. We use a JSON configuration file to externalize these settings.

config.json

{
  "llm": {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "sk-your-api-key",
    "temperature": 0.2,
    "max_tokens": 1024
  }
}

Production Note: In a real environment, the api_key should be injected via environment variables or a secrets manager (e.g., AWS Secrets Manager, HashiCorp Vault) rather than stored in a static file.

Step 2: The Infrastructure Layer

Every LLM interaction is fundamentally an HTTP request. High-level SDKs abstract this away, but understanding the raw payload structure is crucial for debugging and optimization.

llm_client.py This module handles serialization, request transmission, and response parsing.

import json
import urllib.request
import urllib.error
from typing import Dict, List

class LLMClient:
    """
    Low-level client for interacting with OpenAI-compatible chat completion endpoints.
    Handles payload serialization and HTTP transport.
    """
    def __init__(self, config: Dict):
        self.config = config["llm"]
        self.api_key = self.config["api_key"]

    def chat_completion(
        self,
        messages: List[Dict],
        temperature: float = None
    ) -> str:
        """
        Sends a chat completion request to the LLM provider.
        
        Args:
            messages: List of message dictionaries (role, content).
            temperature: Sampling temperature override.

    Returns:
        The content string from the model's response.
    """
    payload = {
        "model": self.config["model"],
        "messages": messages,
        "temperature": temperature or self.config.get("temperature", 0.2),
        "max_tokens": self.config.get("max_tokens", 1024)
    }

    data = json.dumps(payload).encode("utf-8")

    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=data,
        method="POST"
    )

    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {self.api_key}")

    try:
        with urllib.request.urlopen(req) as response:
            result = json.loads(response.read().decode())
            return result["choices"][0]["message"]["content"].strip()
    except urllib.error.HTTPError as e:
        error_body = e.read().decode()
        raise Exception(f"LLM API error: {e.code} - {error_body}")


**Rationale:**
*   **No SDK Dependency:** Using `urllib` removes external dependencies, reducing the attack surface and installation footprint.
*   **Explicit Payload Construction:** We manually construct the JSON payload, giving full control over the request structure.
*   **Error Propagation:** HTTP errors are caught and re-raised with context, allowing the agent engine to handle retries or fallbacks.

### Step 3: Stateful Memory Management

LLMs are stateless; they do not retain information between requests. To maintain context, we must resend the conversation history with every turn. However, context windows have limits. We implement a sliding window strategy to manage memory efficiently.

**`memory.py`**

```python
from typing import List, Dict

class AgentMemory:
    """
    Manages the conversation history and implements a sliding window
    to keep the context within token limits.
    """
    def __init__(self, max_messages: int = 20):
        self.messages: List[Dict] = []
        self.max_messages = max_messages

    def add(self, role: str, content: str):
        """
        Appends a new message to the history.
        If the limit is exceeded, oldest messages are pruned,
        preserving the system prompt.
        """
        self.messages.append({
            "role": role,
            "content": content
        })

        if len(self.messages) > self.max_messages:
            # Preserve the system prompt at index 0
            system_prompt = self.messages[0]
            
            # Keep the most recent messages
            active_history = self.messages[1:]
            self.messages = (
                [system_prompt] + 
                active_history[-(self.max_messages - 1):]
            )

    def get_messages(self) -> List[Dict]:
        """Returns a copy of the current message history."""
        return self.messages.copy()

    def clear(self):
        """Resets the memory state."""
        self.messages.clear()

Rationale:

Sliding Window: This prevents the context window from overflowing as the agent performs multiple tool calls.
System Prompt Preservation: The system instructions are always retained, ensuring the agent maintains its persona and constraints.
Immutability: get_messages returns a copy to prevent accidental modification of the internal state by external components.

Step 4: The Agent Engine

The agent engine is the orchestrator. It manages the control loop, registers tools, parses structured outputs, and executes functions.

agent.py

from llm_client import LLMClient
from memory import AgentMemory
from typing import Dict, Callable
import json

class Agent:
    """
    Orchestrates the agentic workflow: Think -> Act -> Observe.
    """
    def __init__(self, system_prompt: str, config_path: str = "config.json"):
        with open(config_path) as f:
            self.config = json.load(f)
        
        self.llm = LLMClient(self.config)
        self.memory = AgentMemory()
        self.system_prompt = system_prompt
        self.tools: Dict[str, dict] = {}

        # Initialize memory with system instructions
        self.memory.add("system", system_prompt)

    def register_tool(self, name: str, func: Callable, description: str):
        """
        Registers a callable function as a tool available to the agent.
        
        Args:
            name: The identifier for the tool.
            func: The Python callable to execute.
            description: Natural language description for the model.
        """
        self.tools[name] = {
            "func": func,
            "description": description
        }

    def _get_tool_descriptions(self) -> str:
        """Formats registered tools into a prompt-friendly string."""
        if not self.tools:
            return "No tools available."
        return "\n".join([
            f"- {name}: {info['description']}"
            for name, info in self.tools.items()
        ])

    def think(self, user_input: str) -> str:
        """
        Sends the current context to the LLM and retrieves a response.
        Enhances the prompt with tool descriptions if available.
        """
        self.memory.add("user", user_input)
        messages = self.memory.get_messages()
        tool_info = self._get_tool_descriptions()

        if self.tools:
            # Inject tool instructions into the user prompt
            enhanced_content = (
                f"{user_input}\n\n"
                f"AVAILABLE TOOLS:\n"
                f"{tool_info}\n\n"
                f"If you need a tool, respond ONLY with JSON:\n"
                f'{{"tool":"tool_name","args":{{}}}}\n\n'
                f"If the task is complete, respond naturally and include 'FINAL ANSWER'."
            )
            messages[-1]["content"] = enhanced_content

        response = self.llm.chat_completion(messages)
        self.memory.add("assistant", response)
        return response

    def act(self, response: str):
        """
        Parses the LLM response for tool calls and executes them.
        Returns the result of the tool execution or None.
        """
        if "{" in response and "}" in response:
            try:
                # Extract JSON block from response
                start = response.find("{")
                end = response.rfind("}") + 1
                tool_json = json.loads(response[start:end])

                tool_name = tool_json.get("tool")
                args = tool_json.get("args", {})

                if tool_name in self.tools:
                    # Execute the registered function
                    result = self.tools[tool_name]["func"](**args)
                    
                    # Feed observation back into memory
                    self.memory.add(
                        "system", 
                        f"Observation from '{tool_name}': {result}"
                    )
                    return result
            except Exception as e:
                error_msg = f"Tool execution failed: {str(e)}"
                self.memory.add("system", error_msg)
                return error_msg
        return None

Rationale:

Tool Registration: Tools are registered as Python callables with descriptions. This decouples the agent logic from specific tool implementations.
Prompt Injection: Tool descriptions are injected into the user prompt dynamically. This informs the model of its capabilities without modifying the system prompt.
Structured Output Parsing: The agent looks for JSON blocks in the response. This is a simple but effective way to handle tool calls without relying on function calling APIs.
Observation Loop: The result of a tool execution is added to memory as a "system" message, allowing the model to see the output and decide the next step.

Step 5: The Execution Loop

The main entry point ties everything together. It implements the iterative loop that drives the agent.

main.py

from agent import Agent

def get_current_weather(location: str) -> str:
    """Simulates a weather API call."""
    # In production, this would call a real weather service
    return f"The weather in {location} is sunny and 72°F."

def main():
    # Initialize the agent with system instructions
    system_prompt = (
        "You are a helpful assistant. "
        "Use tools when necessary to answer questions. "
        "Always provide a final answer when the task is complete."
    )
    
    agent = Agent(system_prompt=system_prompt)
    
    # Register tools
    agent.register_tool(
        name="get_weather",
        func=get_current_weather,
        description="Retrieves the current weather for a given location."
    )

    # User input
    user_query = "What is the weather like in San Francisco?"
    
    # Execution Loop
    max_iterations = 5
    iteration = 0
    
    print(f"User: {user_query}")
    
    while iteration < max_iterations:
        # 1. THINK: Get response from LLM
        response = agent.think(user_query)
        print(f"Agent: {response}")
        
        # 2. ACT: Execute tool if requested
        result = agent.act(response)
        
        # 3. OBSERVE: If tool was executed, loop continues
        # If no tool was executed, check for final answer
        if result is None:
            if "FINAL ANSWER" in response:
                print("Task complete.")
                break
            else:
                # If no tool and no final answer, assume task done
                print("Task complete.")
                break
        
        # Update user_query to empty string for subsequent turns
        # The agent relies on memory for context
        user_query = ""
        iteration += 1

if __name__ == "__main__":
    main()

Rationale:

Iterative Control: The loop continues until a final answer is detected or a maximum iteration count is reached.
Context Persistence: After the first turn, user_query is set to an empty string. The agent relies on the memory history to maintain context, avoiding redundant input.
Termination Condition: The loop checks for "FINAL ANSWER" in the response to determine when to stop.

Pitfall Guide

Building agentic systems from scratch exposes several common pitfalls. Understanding these is critical for production reliability.

Context Window Overflow
- Explanation: As the agent performs multiple tool calls, the conversation history grows. If not managed, it will exceed the model's context window, causing errors or truncation.
- Fix: Implement a sliding window strategy (as shown in AgentMemory) to prune older messages while preserving the system prompt.
Unstructured Tool Outputs
- Explanation: If tools return unstructured or verbose output, it can confuse the model or consume excessive tokens.
- Fix: Ensure tools return concise, structured data (e.g., JSON or short strings). Sanitize outputs before adding them to memory.
Infinite Loops
- Explanation: The agent may get stuck in a loop, repeatedly calling the same tool or failing to reach a conclusion.
- Fix: Implement a maximum iteration limit and monitor for repetitive patterns. Add a timeout mechanism if necessary.
Tool Execution Errors
- Explanation: Tools may fail due to network issues, invalid arguments, or internal errors. If not handled, the agent may crash or produce incorrect responses.
- Fix: Wrap tool execution in try-except blocks. Log errors and feed them back to the model as observations so it can retry or adjust its strategy.
Prompt Injection
- Explanation: Malicious user input could manipulate the agent's behavior by injecting commands into the prompt.
- Fix: Sanitize user inputs and validate tool arguments. Use system prompts to enforce constraints and restrict the agent's capabilities.
Cost Management
- Explanation: Agentic workflows can consume significant tokens, especially with multiple tool calls and iterations.
- Fix: Monitor token usage and optimize prompts. Use cheaper models for simpler tasks and reserve expensive models for complex reasoning.
State Inconsistency
- Explanation: If the memory state is not properly synchronized with the agent's actions, the model may make decisions based on outdated information.
- Fix: Ensure that all state changes are immediately reflected in the memory. Use atomic operations for updates where possible.

Production Bundle

Action Checklist

Define System Prompt: Clearly articulate the agent's role, constraints, and available tools.
Implement Memory Management: Use a sliding window to manage context and prevent overflow.
Register Tools: Define tools with clear descriptions and robust error handling.
Parse Structured Outputs: Ensure the agent can reliably extract tool calls from model responses.
Implement Control Loop: Create an iterative loop with termination conditions and iteration limits.
Handle Errors: Add try-except blocks around tool execution and API calls.
Monitor Token Usage: Track token consumption to manage costs and optimize performance.
Test Thoroughly: Validate the agent's behavior with various inputs and edge cases.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Simple Q&A	Direct LLM Call	No tool usage needed; lower latency and cost.	Low
Data Retrieval	Agentic Pipeline with Tools	Requires dynamic data fetching; tools enable real-time access.	Medium
Complex Reasoning	Agentic Pipeline with Memory	Iterative loop allows for multi-step reasoning and state management.	High
High Volume	Optimized Prompting + Caching	Reduces redundant API calls; improves throughput.	Low
Critical Tasks	Human-in-the-Loop	Ensures accuracy and safety for high-stakes decisions.	Medium

Configuration Template

config.json

{
  "llm": {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "sk-your-api-key",
    "temperature": 0.2,
    "max_tokens": 1024
  },
  "agent": {
    "max_iterations": 5,
    "max_messages": 20
  }
}

Quick Start Guide

Install Dependencies: Ensure Python 3.8+ is installed. No external packages are required for this implementation.
Create Configuration: Set up config.json with your LLM provider details and API key.
Define Tools: Implement the functions you want the agent to use (e.g., get_current_weather).
Initialize Agent: Create an instance of the Agent class with a system prompt.
Run Execution Loop: Call the main function to start the agent and interact with it via the console.

This guide provides a foundational understanding of agentic systems and a practical implementation for building custom AI workflows. By stripping away abstractions, you gain full control over the runtime behavior, enabling more robust and efficient solutions.

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

7-day free trial · Cancel anytime · 30-day money-back