Returns:
The content string from the model's response.
"""
payload = {
"model": self.config["model"],
"messages": messages,
"temperature": temperature or self.config.get("temperature", 0.2),
"max_tokens": self.config.get("max_tokens", 1024)
}
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
"https://api.openai.com/v1/chat/completions",
data=data,
method="POST"
)
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", f"Bearer {self.api_key}")
try:
with urllib.request.urlopen(req) as response:
result = json.loads(response.read().decode())
return result["choices"][0]["message"]["content"].strip()
except urllib.error.HTTPError as e:
error_body = e.read().decode()
raise Exception(f"LLM API error: {e.code} - {error_body}")
**Rationale:**
* **No SDK Dependency:** Using `urllib` removes external dependencies, reducing the attack surface and installation footprint.
* **Explicit Payload Construction:** We manually construct the JSON payload, giving full control over the request structure.
* **Error Propagation:** HTTP errors are caught and re-raised with context, allowing the agent engine to handle retries or fallbacks.
### Step 3: Stateful Memory Management
LLMs are stateless; they do not retain information between requests. To maintain context, we must resend the conversation history with every turn. However, context windows have limits. We implement a sliding window strategy to manage memory efficiently.
**`memory.py`**
```python
from typing import List, Dict
class AgentMemory:
"""
Manages the conversation history and implements a sliding window
to keep the context within token limits.
"""
def __init__(self, max_messages: int = 20):
self.messages: List[Dict] = []
self.max_messages = max_messages
def add(self, role: str, content: str):
"""
Appends a new message to the history.
If the limit is exceeded, oldest messages are pruned,
preserving the system prompt.
"""
self.messages.append({
"role": role,
"content": content
})
if len(self.messages) > self.max_messages:
# Preserve the system prompt at index 0
system_prompt = self.messages[0]
# Keep the most recent messages
active_history = self.messages[1:]
self.messages = (
[system_prompt] +
active_history[-(self.max_messages - 1):]
)
def get_messages(self) -> List[Dict]:
"""Returns a copy of the current message history."""
return self.messages.copy()
def clear(self):
"""Resets the memory state."""
self.messages.clear()
Rationale:
- Sliding Window: This prevents the context window from overflowing as the agent performs multiple tool calls.
- System Prompt Preservation: The system instructions are always retained, ensuring the agent maintains its persona and constraints.
- Immutability:
get_messages returns a copy to prevent accidental modification of the internal state by external components.
Step 4: The Agent Engine
The agent engine is the orchestrator. It manages the control loop, registers tools, parses structured outputs, and executes functions.
agent.py
from llm_client import LLMClient
from memory import AgentMemory
from typing import Dict, Callable
import json
class Agent:
"""
Orchestrates the agentic workflow: Think -> Act -> Observe.
"""
def __init__(self, system_prompt: str, config_path: str = "config.json"):
with open(config_path) as f:
self.config = json.load(f)
self.llm = LLMClient(self.config)
self.memory = AgentMemory()
self.system_prompt = system_prompt
self.tools: Dict[str, dict] = {}
# Initialize memory with system instructions
self.memory.add("system", system_prompt)
def register_tool(self, name: str, func: Callable, description: str):
"""
Registers a callable function as a tool available to the agent.
Args:
name: The identifier for the tool.
func: The Python callable to execute.
description: Natural language description for the model.
"""
self.tools[name] = {
"func": func,
"description": description
}
def _get_tool_descriptions(self) -> str:
"""Formats registered tools into a prompt-friendly string."""
if not self.tools:
return "No tools available."
return "\n".join([
f"- {name}: {info['description']}"
for name, info in self.tools.items()
])
def think(self, user_input: str) -> str:
"""
Sends the current context to the LLM and retrieves a response.
Enhances the prompt with tool descriptions if available.
"""
self.memory.add("user", user_input)
messages = self.memory.get_messages()
tool_info = self._get_tool_descriptions()
if self.tools:
# Inject tool instructions into the user prompt
enhanced_content = (
f"{user_input}\n\n"
f"AVAILABLE TOOLS:\n"
f"{tool_info}\n\n"
f"If you need a tool, respond ONLY with JSON:\n"
f'{{"tool":"tool_name","args":{{}}}}\n\n'
f"If the task is complete, respond naturally and include 'FINAL ANSWER'."
)
messages[-1]["content"] = enhanced_content
response = self.llm.chat_completion(messages)
self.memory.add("assistant", response)
return response
def act(self, response: str):
"""
Parses the LLM response for tool calls and executes them.
Returns the result of the tool execution or None.
"""
if "{" in response and "}" in response:
try:
# Extract JSON block from response
start = response.find("{")
end = response.rfind("}") + 1
tool_json = json.loads(response[start:end])
tool_name = tool_json.get("tool")
args = tool_json.get("args", {})
if tool_name in self.tools:
# Execute the registered function
result = self.tools[tool_name]["func"](**args)
# Feed observation back into memory
self.memory.add(
"system",
f"Observation from '{tool_name}': {result}"
)
return result
except Exception as e:
error_msg = f"Tool execution failed: {str(e)}"
self.memory.add("system", error_msg)
return error_msg
return None
Rationale:
- Tool Registration: Tools are registered as Python callables with descriptions. This decouples the agent logic from specific tool implementations.
- Prompt Injection: Tool descriptions are injected into the user prompt dynamically. This informs the model of its capabilities without modifying the system prompt.
- Structured Output Parsing: The agent looks for JSON blocks in the response. This is a simple but effective way to handle tool calls without relying on function calling APIs.
- Observation Loop: The result of a tool execution is added to memory as a "system" message, allowing the model to see the output and decide the next step.
Step 5: The Execution Loop
The main entry point ties everything together. It implements the iterative loop that drives the agent.
main.py
from agent import Agent
def get_current_weather(location: str) -> str:
"""Simulates a weather API call."""
# In production, this would call a real weather service
return f"The weather in {location} is sunny and 72°F."
def main():
# Initialize the agent with system instructions
system_prompt = (
"You are a helpful assistant. "
"Use tools when necessary to answer questions. "
"Always provide a final answer when the task is complete."
)
agent = Agent(system_prompt=system_prompt)
# Register tools
agent.register_tool(
name="get_weather",
func=get_current_weather,
description="Retrieves the current weather for a given location."
)
# User input
user_query = "What is the weather like in San Francisco?"
# Execution Loop
max_iterations = 5
iteration = 0
print(f"User: {user_query}")
while iteration < max_iterations:
# 1. THINK: Get response from LLM
response = agent.think(user_query)
print(f"Agent: {response}")
# 2. ACT: Execute tool if requested
result = agent.act(response)
# 3. OBSERVE: If tool was executed, loop continues
# If no tool was executed, check for final answer
if result is None:
if "FINAL ANSWER" in response:
print("Task complete.")
break
else:
# If no tool and no final answer, assume task done
print("Task complete.")
break
# Update user_query to empty string for subsequent turns
# The agent relies on memory for context
user_query = ""
iteration += 1
if __name__ == "__main__":
main()
Rationale:
- Iterative Control: The loop continues until a final answer is detected or a maximum iteration count is reached.
- Context Persistence: After the first turn,
user_query is set to an empty string. The agent relies on the memory history to maintain context, avoiding redundant input.
- Termination Condition: The loop checks for "FINAL ANSWER" in the response to determine when to stop.
Pitfall Guide
Building agentic systems from scratch exposes several common pitfalls. Understanding these is critical for production reliability.
-
Context Window Overflow
- Explanation: As the agent performs multiple tool calls, the conversation history grows. If not managed, it will exceed the model's context window, causing errors or truncation.
- Fix: Implement a sliding window strategy (as shown in
AgentMemory) to prune older messages while preserving the system prompt.
-
Unstructured Tool Outputs
- Explanation: If tools return unstructured or verbose output, it can confuse the model or consume excessive tokens.
- Fix: Ensure tools return concise, structured data (e.g., JSON or short strings). Sanitize outputs before adding them to memory.
-
Infinite Loops
- Explanation: The agent may get stuck in a loop, repeatedly calling the same tool or failing to reach a conclusion.
- Fix: Implement a maximum iteration limit and monitor for repetitive patterns. Add a timeout mechanism if necessary.
-
Tool Execution Errors
- Explanation: Tools may fail due to network issues, invalid arguments, or internal errors. If not handled, the agent may crash or produce incorrect responses.
- Fix: Wrap tool execution in try-except blocks. Log errors and feed them back to the model as observations so it can retry or adjust its strategy.
-
Prompt Injection
- Explanation: Malicious user input could manipulate the agent's behavior by injecting commands into the prompt.
- Fix: Sanitize user inputs and validate tool arguments. Use system prompts to enforce constraints and restrict the agent's capabilities.
-
Cost Management
- Explanation: Agentic workflows can consume significant tokens, especially with multiple tool calls and iterations.
- Fix: Monitor token usage and optimize prompts. Use cheaper models for simpler tasks and reserve expensive models for complex reasoning.
-
State Inconsistency
- Explanation: If the memory state is not properly synchronized with the agent's actions, the model may make decisions based on outdated information.
- Fix: Ensure that all state changes are immediately reflected in the memory. Use atomic operations for updates where possible.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Simple Q&A | Direct LLM Call | No tool usage needed; lower latency and cost. | Low |
| Data Retrieval | Agentic Pipeline with Tools | Requires dynamic data fetching; tools enable real-time access. | Medium |
| Complex Reasoning | Agentic Pipeline with Memory | Iterative loop allows for multi-step reasoning and state management. | High |
| High Volume | Optimized Prompting + Caching | Reduces redundant API calls; improves throughput. | Low |
| Critical Tasks | Human-in-the-Loop | Ensures accuracy and safety for high-stakes decisions. | Medium |
Configuration Template
config.json
{
"llm": {
"provider": "openai",
"model": "gpt-4o",
"api_key": "sk-your-api-key",
"temperature": 0.2,
"max_tokens": 1024
},
"agent": {
"max_iterations": 5,
"max_messages": 20
}
}
Quick Start Guide
- Install Dependencies: Ensure Python 3.8+ is installed. No external packages are required for this implementation.
- Create Configuration: Set up
config.json with your LLM provider details and API key.
- Define Tools: Implement the functions you want the agent to use (e.g.,
get_current_weather).
- Initialize Agent: Create an instance of the
Agent class with a system prompt.
- Run Execution Loop: Call the main function to start the agent and interact with it via the console.
This guide provides a foundational understanding of agentic systems and a practical implementation for building custom AI workflows. By stripping away abstractions, you gain full control over the runtime behavior, enabling more robust and efficient solutions.