ripts\activate
Install:

```bash
pip install anthropic python-dotenv
```

Create your `.env`:

```
ANTHROPIC_API_KEY=your-key-here
```
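Before making any API calls, it can help to fail fast when the key is missing or still set to the placeholder. The following is a minimal sketch, assuming the variable name `ANTHROPIC_API_KEY` from the `.env` above; `require_api_key` is a hypothetical helper for illustration, not part of the `anthropic` SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper (an assumption, not part of the original script).
    Pass a mapping to exercise it without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    # Treat the tutorial placeholder as "not configured".
    if not key or key == "your-key-here":
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Add it to .env and call load_dotenv() first."
        )
    return key
```

In the script, this would be called once after `load_dotenv()` and before constructing the client.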
### 2. Core API Function
The system prompt enforces output constraints to prevent conversational filler.
```python
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic()


def generate_docstring(function_code: str) -> str:
    """Generate a docstring for a given Python function."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=(
            "You are a Python documentation assistant. "
            "When given a Python function, return only a Google-style docstring for it. "
            "Include: a one-line summary, Args, Returns, Raises (if applicable), and Example. "
            "Return only the docstring text inside triple quotes. No explanation, no extra text."
        ),
        messages=[
            {"role": "user", "content": function_code}
        ]
    )
    return response.content[0].text
```
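Even with a strict system prompt, the model may occasionally wrap its answer in a markdown fence. As a defensive measure, the response can be cleaned and checked before it is inserted into source code. This is a minimal sketch under that assumption; `clean_docstring` is a hypothetical helper, not part of the original script or the SDK.

```python
import re

# Built programmatically so this example contains no literal fence marker.
FENCE = "`" * 3


def clean_docstring(raw: str) -> str:
    """Strip a wrapping markdown fence and verify the triple quotes.

    Hypothetical post-processing step (an assumption for illustration).
    Raises ValueError when the text is not a triple-quoted docstring.
    """
    text = raw.strip()
    # Unwrap a single surrounding fence such as FENCE + "python".
    fenced = re.match(rf"^{FENCE}[\w-]*\n(.*?)\n{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    # Accept either quote style, but require it at both ends.
    for quote in ('"""', "'''"):
        if len(text) >= 6 and text.startswith(quote) and text.endswith(quote):
            return text
    raise ValueError("Model response is not a triple-quoted docstring")
```

One way to use it would be `clean_docstring(generate_docstring(code))`, treating a `ValueError` as a signal to retry or skip that function.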
### 3. Validation & Testing
```python
sample_function = """
def calculate_mean(values):
    total = sum(values)
    return total / len(values)
"""

docstring = generate_docstring(sample_function)
print(docstring)
```

Output:

```
"""
Calculate the arithmetic mean of a list of values.

Args:
    values: A list of numeric values.

Returns:
    The arithmetic mean as a float.

Raises:
    ZeroDivisionError: If the input list is empty.
    TypeError: If the list contains non-numeric values.

Example:
    mean = calculate_mean([1, 2, 3, 4, 5])
    # Returns: 3.0
"""
```
### 4. Docstring Insertion Logic
```python
def insert_docstring(function_code: str, docstring: str) -> str:
    """Insert a generated docstring into a function definition."""
    lines = function_code.split("\n")
    # Find the line with the function definition
    for i, line in enumerate(lines):
        if line.strip().startswith("def "):
            # Insert the docstring after the def line
            indent = "    "  # Standard 4-space indent
            docstring_lines = docstring.strip().split("\n")
            # Use a distinct name so the outer loop variable is not shadowed
            indented = [indent + doc_line for doc_line in docstring_lines]
            lines = lines[:i + 1] + indented + lines[i + 1:]
            break
    return "\n".join(lines)
```
Test it:

```python
sample_function = """
def calculate_mean(values):
    total = sum(values)
    return total / len(values)
"""

docstring = generate_docstring(sample_function)
documented = insert_docstring(sample_function, docstring)
print(documented)
```

Output:

```python
def calculate_mean(values):
    """
    Calculate the arithmetic mean of a list of values.

    Args:
        values: A list of numeric values.

    Returns:
        The arithmetic mean as a float.

    Raises:
        ZeroDivisionError: If the input list is empty.
        TypeError: If the list contains non-numeric values.

    Example:
        mean = calculate_mean([1, 2, 3, 4, 5])
        # Returns: 3.0
    """
    total = sum(values)
    return total / len(values)
```
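The insertion logic above assumes a fixed 4-space indent. A variant that copies the indentation of the first body line would also cope with 2-space files, tabs, and class methods. The following is a sketch only: `insert_docstring_dynamic` is a hypothetical name, and like the original it assumes the `def` signature fits on one line.

```python
import re


def insert_docstring_dynamic(function_code: str, docstring: str) -> str:
    """Insert a docstring using the indentation of the function body.

    Hypothetical variant of insert_docstring (an assumption for
    illustration). Assumes a single-line `def` signature.
    """
    lines = function_code.split("\n")
    for i, line in enumerate(lines):
        if line.lstrip().startswith(("def ", "async def ")):
            # Copy the leading whitespace of the first non-blank body line.
            body_indent = None
            for following in lines[i + 1:]:
                if following.strip():
                    body_indent = re.match(r"[ \t]*", following).group(0)
                    break
            if body_indent is None:
                # No body found: fall back to the def indent plus 4 spaces.
                body_indent = re.match(r"[ \t]*", line).group(0) + "    "
            # Indent non-blank docstring lines; leave blank lines empty.
            indented = [
                body_indent + doc_line if doc_line.strip() else ""
                for doc_line in docstring.strip().split("\n")
            ]
            return "\n".join(lines[: i + 1] + indented + lines[i + 1:])
    # No function definition found: return the input unchanged.
    return function_code
```

Copying the body's own whitespace avoids guessing the file's style, at the cost of still not handling multi-line signatures.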
### 5. Batch File Processing
```python
import re


def extract_functions(file_content: str) -> list[str]:
    """Extract all function definitions from a Python file."""
    # One capturing group, so re.findall returns a list of strings
    pattern = r"(def \w+\(.*?\):(?:\n(?: .+|\s*))*)"
    return re.findall(pattern, file_content, re.MULTILINE)


def document_file(input_path: str, output_path: str) -> None:
    """Read a Python file, document all functions, and save the result."""
    with open(input_path, "r") as f:
        content = f.read()

    functions = extract_functions(content)
    print(f"Found {len(functions)} functions. Generating docstrings...\n")

    documented_content = content
    for i, function in enumerate(functions):
        print(f"Processing function {i+1}/{len(functions)}...")

        # Skip functions that already have docstrings
        if '"""' in function or "'''" in function:
            print("  Already documented, skipping.")
            continue

        docstring = generate_docstring(function)
        documented_function = insert_docstring(function, docstring)
        documented_content = documented_content.replace(function, documented_function)

    with open(output_path, "w") as f:
        f.write(documented_content)
    print(f"\nDone. Documented file saved to: {output_path}")
```

Usage:

```python
document_file("statistics_assignment.py", "statistics_assignment_documented.py")
```
### 6. Full Script Structure
```python
import re

from dotenv import load_dotenv
from anthropic import Anthropic, APIError, RateLimitError, APIConnectionError

load_dotenv()
client = Anthropic()


def generate_docstring(function_code: str) -> str:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=(
```
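The script imports `RateLimitError` and `APIConnectionError`, the kind of transient failures that are usually retried. One common approach is exponential backoff. The following is a sketch of that technique, not the original author's error handling: `call_with_backoff` and its parameters are hypothetical, and `retryable` defaults to `Exception` so the example runs without the SDK installed.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retryable` errors with doubling delays.

    Hypothetical helper (an assumption for illustration). The `sleep`
    parameter exists so tests can avoid real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

In this script, the call might look like `call_with_backoff(lambda: generate_docstring(code), retryable=(RateLimitError, APIConnectionError))`, provided `generate_docstring` lets those exceptions propagate.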
## Pitfall Guide
1. **Unconstrained LLM Output**: Omitting strict output directives (`"Return only the docstring text inside triple quotes. No explanation, no extra text."`) causes the model to wrap responses in markdown, conversational filler, or code fences, breaking automated string replacement.
2. **Fragile Regex Extraction**: The pattern `r"(def \w+\(.*?\):(?:\n(?: .+|\s*))*)"` fails on decorators, multi-line signatures, or nested functions. Production systems should use Python's `ast` module for accurate boundary detection.
3. **Hardcoded Indentation Drift**: Assuming `indent = "    "` (four spaces) breaks when processing files that use 2-space indentation or tabs, and when documenting class methods or other nested functions whose bodies sit deeper. Detect the indentation dynamically from the first indented line after `def`.
4. **API Rate Limits & Cost Blowouts**: Processing large modules without exponential backoff, token counting, or concurrency limits can trigger `RateLimitError` and unexpected billing. Implement retry logic with increasing `time.sleep` delays, keep `max_tokens` as a per-call cap, and record the actual consumption reported in `response.usage` (`input_tokens` and `output_tokens`).
5. **Redundant Processing**: Omitting the `if '"""' in function or "'''" in function` check forces the API to regenerate documentation for functions that are already documented, wasting tokens and potentially overwriting manually curated notes.
6. **Hallucinated Exception Handling**: LLMs may invent `Raises` entries for exceptions the function can never raise. Always check each listed exception against the function's `raise` statements and the operations that can fail implicitly (such as division or indexing) before committing to production codebases.
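The check described in pitfall 6 can be partly automated by comparing the `Raises` section of a generated docstring against the `raise` statements in the code. The following is a heuristic sketch using the standard `ast` module; `undeclared_raises` is a hypothetical helper, and it assumes Google-style section headers. It only sees explicit `raise` statements, so implicit errors such as a `ZeroDivisionError` from division are flagged for manual review rather than proven wrong.

```python
import ast
import re
import textwrap


def undeclared_raises(function_code: str, docstring: str) -> set[str]:
    """Return documented exceptions that the code never raises explicitly.

    Hypothetical helper (an assumption for illustration); assumes a
    Google-style "Raises:" section with one "Name: description" per line.
    """
    explicit = set()
    for node in ast.walk(ast.parse(textwrap.dedent(function_code))):
        if isinstance(node, ast.Raise) and node.exc is not None:
            # `raise ValueError("x")` is a Call; `raise err` is a bare Name.
            target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(target, ast.Name):
                explicit.add(target.id)
            elif isinstance(target, ast.Attribute):
                explicit.add(target.attr)

    documented = set()
    in_raises = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Raises:":
            in_raises = True
        elif re.fullmatch(r"[A-Z][A-Za-z ]*:", stripped):
            in_raises = False  # another section header, e.g. "Example:"
        elif in_raises:
            match = re.match(r"([A-Za-z_][\w.]*):", stripped)
            if match:
                documented.add(match.group(1).split(".")[-1])
    return documented - explicit
```

A non-empty result is a prompt for human review, not an automatic rejection of the docstring.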
## Deliverables
- **Automation Blueprint**: Step-by-step architecture diagram covering environment initialization, prompt constraint design, AST-safe insertion logic, and batch processing pipeline. Includes token budgeting guidelines and rate-limit mitigation strategies.
- **Implementation Checklist**:
  - [ ] Virtual environment isolated with `anthropic` and `python-dotenv`
  - [ ] `.env` file secured with `ANTHROPIC_API_KEY`
  - [ ] System prompt hardened with strict output constraints
  - [ ] Docstring insertion logic validated against 4-space and 2-space indentation
  - [ ] Batch processor configured to skip pre-documented functions
  - [ ] Error handling wrapped around `client.messages.create()` for `RateLimitError` and `APIConnectionError`
  - [ ] Output file written to a separate path to preserve source integrity
- **Configuration Templates**: Ready-to-use `.env` structure, `requirements.txt` snapshot, and modular script layout for CI/CD integration or academic submission pipelines.