mport BaseModel, Field
from typing import Optional


class MarketQuerySchema(BaseModel):
    ticker: str = Field(
        description="Stock ticker symbol. Must be 1-5 uppercase alphabetic characters (e.g., 'AAPL', 'TSLA')."
    )
    metric: str = Field(
        description="Data point to retrieve. Allowed values: 'price', 'volume', 'pe_ratio'."
    )


class MarketDataFetcher:
    """Retrieves real-time market metrics for publicly traded equities.

    Usage Guidelines:
    - Always validate ticker format before calling.
    - Returns structured JSON with price, currency, and timestamp.
    - Returns explicit error codes for invalid inputs or rate limits.

    Examples:
        Success: {"ticker": "NVDA", "metric": "price"} -> {"status": "ok", "data": {...}}
        Failure: {"ticker": "INVALID!", "metric": "price"} -> {"status": "error", "code": "INVALID_TICKER", ...}
    """
```

**Rationale:** An LLM reads a tool's docstring as its specification, so the docstring is effectively part of the prompt. Showing explicit examples for both the success and the failure path lets the model anticipate error branches instead of guessing at them, which reduces speculative retries.

### Step 2: Strict Input Validation via Pydantic
Never trust the LLM's output format. Language models are probabilistic; they will occasionally generate malformed strings, out-of-range numbers, or unexpected types. Pydantic's `BaseModel` acts as a deterministic gatekeeper, enforcing type coercion and constraint validation before business logic executes.
```python
def execute(self, query: MarketQuerySchema) -> dict:
    # Validation occurs automatically during instantiation:
    # Pydantic handles type coercion (e.g., "100" -> 100) and constraint checking
    normalized_ticker = query.ticker.strip().upper()

    # Business logic only runs after validation passes
    if normalized_ticker not in self._supported_tickers:
        return self._build_error("UNSUPPORTED_TICKER", f"Ticker {normalized_ticker} not in registry.")
    if query.metric not in self._allowed_metrics:
        return self._build_error("INVALID_METRIC", f"Metric must be one of: {', '.join(self._allowed_metrics)}")

    return self._fetch_market_data(normalized_ticker, query.metric)
```

**Rationale:** Separating validation from execution keeps business logic free of malformed-input handling. Pydantic runs field validators synchronously during schema instantiation, so downstream code only receives inputs that have passed the declared type and constraint checks. This holds only for constraints declared on the schema itself; a rule that appears only in a field `description` is not enforced.
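
The schema from Step 1 states its rules only in `description` strings, which Pydantic does not enforce. The following is a minimal sketch, assuming Pydantic v2, of how the same rules could be declared as enforceable constraints. `StrictMarketQuery` and `parse_tool_call` are hypothetical names introduced for illustration, as is the `INVALID_INPUT` code.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError, field_validator


class StrictMarketQuery(BaseModel):
    """Hypothetical stricter variant of MarketQuerySchema (assumes Pydantic v2)."""

    ticker: str = Field(pattern=r"^[A-Z]{1,5}$", description="1-5 uppercase letters, e.g. 'AAPL'.")
    metric: Literal["price", "volume", "pe_ratio"]

    @field_validator("ticker", mode="before")
    @classmethod
    def normalize_ticker(cls, value):
        # Runs before the pattern check, so " aapl " is accepted as "AAPL".
        return value.strip().upper() if isinstance(value, str) else value


def parse_tool_call(raw_args: dict) -> dict:
    """Validate raw LLM arguments; return the parsed query or an error envelope."""
    try:
        query = StrictMarketQuery(**raw_args)
    except ValidationError as exc:
        # exc.errors() lists each failing field with its location and message.
        fields = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
        return {"status": "error", "code": "INVALID_INPUT", "fields": fields}
    return {"status": "ok", "query": query.model_dump()}
```

With this shape, a malformed call never reaches `execute`: the agent receives the list of offending fields and can correct them in its next attempt.
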

### Step 3: Security Sandboxing and Defense-in-Depth
Tools that interact with file systems, databases, or external APIs must enforce strict boundaries. Prompt injection can manipulate an agent into passing malicious payloads. Defense requires multiple validation layers that operate independently.
```python
import re
from pathlib import Path
from typing import Optional


def _validate_file_access(self, resource_path: str) -> Optional[str]:
    # Layer 1: Fast string rejection for traversal patterns
    if ".." in resource_path or any(c in resource_path for c in ["\\", "/", "~"]):
        return "ACCESS_DENIED: Path traversal characters detected."

    # Layer 2: Whitelist format enforcement (fullmatch also rejects a trailing newline)
    if not re.fullmatch(r"[a-zA-Z0-9_\-.]+", resource_path):
        return "ACCESS_DENIED: Filename contains invalid characters."

    # Layer 3: Physical path resolution to prevent symlink bypasses
    sandbox_root = Path("/var/agent/sandbox").resolve()
    target_path = (sandbox_root / resource_path).resolve()
    # Compare path components, not string prefixes: "/var/agent/sandbox_evil"
    # starts with "/var/agent/sandbox" but lies outside the sandbox.
    if not target_path.is_relative_to(sandbox_root):  # Python 3.9+
        return "ACCESS_DENIED: Resolved path escapes sandbox boundary."

    return None
```

**Rationale:** Single-layer validation is easily bypassed. Layer 1 catches obvious attacks quickly. Layer 2 enforces a strict character set. Layer 3 resolves symlinks and relative paths to verify the actual filesystem location, closing the most common sandbox-escape vectors.

### Step 4: Error Normalization for Agent Consumption
Raw exceptions are useless to an LLM. Tools must return structured error envelopes that the agent can parse and act upon. This includes machine-readable codes, human-readable messages, and optional recovery hints.
```python
from datetime import datetime, timezone
from typing import Optional


def _build_error(self, code: str, message: str, hint: Optional[str] = None) -> dict:
    return {
        "status": "error",
        "code": code,
        "message": message,
        "recovery_hint": hint or "Verify input parameters and retry.",
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

**Rationale:** Structured errors enable the agent to distinguish between transient failures (retry), invalid inputs (correct parameters), and hard limits (abort). This reduces token waste and improves final response accuracy.
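
One way to make the retry / correct / abort distinction explicit is a small lookup from error code to recovery action. The sketch below is illustrative only: `Recovery`, `ERROR_POLICY`, and `classify_error` are hypothetical names, and the codes `UPSTREAM_TIMEOUT`, `RATE_LIMITED`, and `QUOTA_EXHAUSTED` are assumptions added for the example.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"      # transient failure: the same call may succeed later
    CORRECT = "correct"  # invalid input: change parameters before retrying
    ABORT = "abort"      # hard limit: stop and report to the user


# Hypothetical policy table. The *_TICKER and INVALID_METRIC codes come from the
# examples in this guide; the remaining codes are illustrative assumptions.
ERROR_POLICY = {
    "UPSTREAM_TIMEOUT": Recovery.RETRY,
    "RATE_LIMITED": Recovery.RETRY,
    "INVALID_TICKER": Recovery.CORRECT,
    "UNSUPPORTED_TICKER": Recovery.CORRECT,
    "INVALID_METRIC": Recovery.CORRECT,
    "QUOTA_EXHAUSTED": Recovery.ABORT,
}


def classify_error(envelope: dict) -> Recovery:
    """Map an error envelope to a recovery action; unknown codes abort."""
    if envelope.get("status") != "error":
        raise ValueError("not an error envelope")
    return ERROR_POLICY.get(envelope.get("code"), Recovery.ABORT)
```

Defaulting unknown codes to `ABORT` is a conservative choice: an unrecognized failure is not retried blindly.
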

## Pitfall Guide

### 1. Assuming Framework Fault Tolerance Equals Good UX

**Explanation:** Orchestration frameworks catch unhandled exceptions and feed them back to the LLM. Developers mistake this for resilience, but raw stack traces force the model to guess root causes, increasing retry loops and token costs.

**Fix:** Implement explicit error handling that returns structured payloads. Never let uncaught exceptions bubble up to the framework layer.

### 2. Relying on Prompt Constraints for Validation

**Explanation:** Prompt engineering can suggest format requirements, but LLMs are probabilistic. They will occasionally ignore constraints, especially under complex reasoning loads.

**Fix:** Treat prompts as suggestions and code as law. Enforce all constraints programmatically using schema validation before execution.

### 3. Returning Raw Exceptions to the Agent

**Explanation:** Python `ValueError`, `KeyError`, or `ConnectionError` objects contain technical details irrelevant to the LLM's decision-making process. They pollute context windows and degrade response quality.

**Fix:** Wrap all exceptions in a standardized error envelope with a machine-readable code and actionable message.
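
A decorator at the tool boundary is one way to apply this fix uniformly. The following is a minimal sketch under stated assumptions: `normalize_errors`, `error_envelope`, `EXCEPTION_MAP`, and the specific error codes are hypothetical, and `lookup_price` is a stand-in tool with hard-coded data.

```python
import functools
from datetime import datetime, timezone


def error_envelope(code: str, message: str, hint: str) -> dict:
    return {"status": "error", "code": code, "message": message,
            "recovery_hint": hint, "timestamp": datetime.now(timezone.utc).isoformat()}


# Hypothetical exception-to-code table; the first matching entry wins.
EXCEPTION_MAP = [
    (TimeoutError, "UPSTREAM_TIMEOUT", "Retry after a short delay."),
    (ConnectionError, "UPSTREAM_UNAVAILABLE", "Retry after a short delay."),
    ((ValueError, KeyError), "INVALID_INPUT", "Correct the parameters and retry."),
]


def normalize_errors(tool_fn):
    """Decorator: convert any exception raised by tool_fn into an envelope."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        try:
            return {"status": "ok", "data": tool_fn(*args, **kwargs)}
        except Exception as exc:  # deliberate catch-all at the tool boundary
            for exc_types, code, hint in EXCEPTION_MAP:
                if isinstance(exc, exc_types):
                    return error_envelope(code, str(exc), hint)
            # Unknown failure: report only the exception type, not the traceback.
            return error_envelope("INTERNAL_ERROR", type(exc).__name__,
                                  "Do not retry; report the failure.")
    return wrapper


@normalize_errors
def lookup_price(ticker: str) -> float:
    prices = {"NVDA": 100.0}  # stand-in data for the example
    return prices[ticker]
```

Calling `lookup_price("XYZ")` raises a `KeyError` internally, but the caller sees only an `INVALID_INPUT` envelope.
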

### 4. Ignoring Idempotency on Retried Calls

**Explanation:** Agents may retry tool calls due to timeouts or ambiguous responses. Without idempotency checks, duplicate calls can trigger duplicate charges, double database writes, or inconsistent state.

**Fix:** Implement idempotency keys or deterministic hashing of input parameters. Cache results for identical requests within a sliding window.
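
Deterministic hashing of the inputs can be sketched as follows. This is an assumption-laden illustration, not a production design: `IdempotencyCache` is a hypothetical in-memory, single-process helper, it uses a simple time-to-live measured from the first execution rather than a true sliding window, and it never evicts expired entries.

```python
import hashlib
import json
import time


class IdempotencyCache:
    """Hypothetical in-memory cache keyed by a hash of tool name and arguments."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, dict]] = {}

    @staticmethod
    def key_for(tool_name: str, params: dict) -> str:
        # sort_keys makes the hash independent of argument order.
        canonical = json.dumps({"tool": tool_name, "params": params}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run_once(self, tool_name: str, params: dict, action) -> dict:
        """Run action(params) unless an unexpired result for the same call exists."""
        key = self.key_for(tool_name, params)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # duplicate call: return the earlier result
        result = action(params)
        self._entries[key] = (now, result)
        return result
```

A retried call with the same arguments inside the window returns the stored result without executing the side effect a second time.
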

### 5. Overlooking Symlink/Path Resolution Bypasses

**Explanation:** String-level path validation can be defeated by symbolic links or mount points that resolve outside intended directories.

**Fix:** Always resolve paths to their absolute physical location using `Path.resolve()` and verify the result stays within the allowed boundary.
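
The bypass can be reproduced in a throwaway directory. The sketch below assumes a POSIX-like system where creating symlinks is permitted and Python 3.9+ for `is_relative_to`; `is_inside` and `demo` are hypothetical helpers. It also shows why a plain string-prefix comparison on the resolved path is not sufficient.

```python
import os
import tempfile
from pathlib import Path


def is_inside(root: Path, candidate: str) -> bool:
    """Return True if root/candidate resolves to a location under root."""
    resolved_root = root.resolve()
    resolved_target = (resolved_root / candidate).resolve()
    # Component-wise containment check (Python 3.9+), not a string prefix test.
    return resolved_target.is_relative_to(resolved_root)


def demo() -> dict:
    """Build a temporary sandbox containing a symlink that points outside it."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        sandbox = base / "sandbox"
        outside = base / "sandbox_backup"  # shares the string prefix ".../sandbox"
        sandbox.mkdir()
        outside.mkdir()
        (sandbox / "report.txt").write_text("ok")
        (outside / "secret.txt").write_text("secret")
        # The link name itself would pass any filename allowlist.
        os.symlink(outside / "secret.txt", sandbox / "notes.txt")
        resolved = (sandbox / "notes.txt").resolve()
        return {
            "regular_file": is_inside(sandbox, "report.txt"),
            "symlink_escape": is_inside(sandbox, "notes.txt"),
            "prefix_check_fooled": str(resolved).startswith(str(sandbox.resolve())),
        }
```

Here `notes.txt` is a harmless-looking name, yet it resolves into a sibling directory; the component-wise check rejects it while the string-prefix test accepts it.
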

### 6. Hardcoding Rate Limits Without Telemetry

**Explanation:** Static rate limits (e.g., 10 calls/minute) fail under variable load patterns. Without observability, you cannot distinguish between legitimate spikes and abuse.

**Fix:** Implement token bucket or sliding window algorithms with metrics export. Log throttle events and adjust limits dynamically based on upstream API health.
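
A token bucket with basic counters might look like the sketch below. `TokenBucket` and its attribute names are hypothetical; the `allowed` and `throttled` counters stand in for a real metrics exporter, and `set_rate` is an assumed hook for adjusting the limit when upstream health changes.

```python
import time


class TokenBucket:
    """Minimal token bucket; the counters stand in for a real metrics exporter."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last_refill = clock()
        self.allowed = 0
        self.throttled = 0

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last_refill) * self.rate)
        self._last_refill = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            self.allowed += 1
            return True
        self.throttled += 1  # a throttle event worth logging
        return False

    def set_rate(self, rate_per_sec: float) -> None:
        """Adjust the refill rate at runtime, e.g. when the upstream API degrades."""
        self.rate = rate_per_sec
```

Unlike a fixed per-minute counter, the bucket absorbs short bursts up to `capacity` while holding the long-run rate at `rate_per_sec`.
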

### 7. Missing Explicit Failure Examples in Documentation

**Explanation:** LLMs learn by pattern matching. If docstrings only show success cases, the model assumes failure is impossible or handles it generically.

**Fix:** Include explicit failure examples in tool documentation. Show the exact error structure the agent should expect and how to interpret it.

## Production Bundle

### Action Checklist

### Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single-parameter utility tool | Inline validation with type hints | Low overhead, sufficient for simple constraints | Minimal |
| Multi-parameter business tool | Pydantic schema with field validators | Enforces complex constraints, auto-coerces types, separates concerns | Moderate (schema compilation) |
| File system / database access | Defense-in-depth sandboxing + parameterized queries | Prevents traversal, injection, and privilege escalation | Low (runtime checks) |
| High-frequency API calls | Token bucket rate limiter + circuit breaker | Prevents upstream exhaustion, enables graceful degradation | Low (memory for state) |
| State-mutating operations | Idempotency keys + deterministic input hashing | Prevents duplicate execution on agent retries | Low (cache/storage) |
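
For the database row of the matrix, a parameterized query can be sketched with the standard-library `sqlite3` module. The `quotes` table, `build_demo_db`, and `find_quotes` are hypothetical and exist only for this example.

```python
import sqlite3


def find_quotes(conn: sqlite3.Connection, ticker: str) -> list[tuple]:
    # The "?" placeholder passes the value separately from the SQL text, so the
    # input is always treated as data, never as part of the statement.
    cursor = conn.execute("SELECT ticker, price FROM quotes WHERE ticker = ?", (ticker,))
    return cursor.fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory table used only for this example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (ticker TEXT, price REAL)")
    conn.executemany("INSERT INTO quotes VALUES (?, ?)", [("AAPL", 190.5), ("TSLA", 250.0)])
    conn.commit()
    return conn
```

An injection-style argument such as `AAPL' OR '1'='1` is compared literally against the `ticker` column and matches no rows.
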

### Configuration Template

```python
from pydantic import BaseModel, Field
from typing import Optional
import re
from pathlib import Path
import time


class ExecutionConfig(BaseModel):
    max_retries: int = Field(default=3, ge=1, le=10)
    timeout_seconds: float = Field(default=15.0, gt=0)
    sandbox_root: Path = Field(default=Path("/var/agent/sandbox"))


class SecureToolInterface:
    def __init__(self, config: ExecutionConfig):
        self.config = config
        self._call_timestamps: list[float] = []
        self._rate_limit = 20  # calls per window
        self._rate_window = 60  # seconds

    def _check_rate_limit(self) -> bool:
        # Sliding window: keep only timestamps still inside the window
        now = time.time()
        self._call_timestamps = [t for t in self._call_timestamps if now - t < self._rate_window]
        if len(self._call_timestamps) >= self._rate_limit:
            return False
        self._call_timestamps.append(now)
        return True

    def _enforce_security(self, resource: str) -> Optional[str]:
        if ".." in resource or any(c in resource for c in ["\\", "/", "~"]):
            return "SECURITY_VIOLATION: Traversal pattern detected."
        # fullmatch also rejects a trailing newline, which "$" would allow
        if not re.fullmatch(r"[\w\-.]+", resource):
            return "SECURITY_VIOLATION: Invalid character set."
        sandbox_root = self.config.sandbox_root.resolve()
        target = (sandbox_root / resource).resolve()
        # Component-wise containment check (Python 3.9+), not a string prefix test
        if not target.is_relative_to(sandbox_root):
            return "SECURITY_VIOLATION: Path escapes sandbox."
        return None

    def _normalize_response(self, success: bool, data: Optional[dict] = None, error_code: Optional[str] = None, message: Optional[str] = None) -> dict:
        return {
            "status": "success" if success else "error",
            "data": data,
            "error": {"code": error_code, "message": message} if not success else None,
            "metadata": {"timestamp": time.time(), "retries_allowed": self.config.max_retries},
        }
```

### Quick Start Guide

- Define your schema: Create a Pydantic `BaseModel` with explicit field descriptions, type constraints, and validation rules. This becomes your tool's execution contract.
- Implement security layers: Add string rejection, format whitelisting, and physical path resolution checks. Never trust input format or origin.
- Wrap execution in error normalization: Catch all exceptions, map them to structured error codes, and return consistent payloads that the LLM can parse deterministically.
- Test adversarial inputs: Run your tool against path traversal strings, injection payloads, out-of-range values, and malformed types. Verify that validation catches them before business logic executes.
- Instrument and monitor: Add latency tracking, error rate logging, and throttle metrics. Deploy to staging and observe agent retry patterns before production rollout.
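
The adversarial-input step can be sketched as a small table-driven check. Everything here is an assumption for illustration: `validate_filename` is a stand-in for your own validator, the error codes are hypothetical, and the 64-character limit is arbitrary.

```python
import re
from typing import Optional


def validate_filename(name: object) -> Optional[str]:
    """Stand-in validator: returns an error code, or None if the name is accepted."""
    if not isinstance(name, str):
        return "INVALID_TYPE"
    if ".." in name or any(c in name for c in ("\\", "/", "~")):
        return "TRAVERSAL_PATTERN"
    # Rejects disallowed characters, empty names, and names over 64 characters.
    if not re.fullmatch(r"[A-Za-z0-9_\-.]{1,64}", name):
        return "INVALID_CHARACTERS"
    return None


# Each case pairs an adversarial input with the rejection it should trigger.
ADVERSARIAL_CASES = [
    ("../../etc/passwd", "TRAVERSAL_PATTERN"),
    ("~/.ssh/id_rsa", "TRAVERSAL_PATTERN"),
    ("..\\windows\\system32", "TRAVERSAL_PATTERN"),
    ("report.txt\n", "INVALID_CHARACTERS"),
    ("name; DROP TABLE users", "INVALID_CHARACTERS"),
    ("", "INVALID_CHARACTERS"),
    ("a" * 65, "INVALID_CHARACTERS"),
    (None, "INVALID_TYPE"),
    (12345, "INVALID_TYPE"),
]


def run_adversarial_suite() -> list[tuple]:
    """Return the cases whose outcome differs from the expected rejection."""
    return [(value, expected, validate_filename(value))
            for value, expected in ADVERSARIAL_CASES
            if validate_filename(value) != expected]
```

An empty list from `run_adversarial_suite()` means every adversarial input was rejected with the expected code; any returned tuple identifies a case that slipped through.
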