, it maintains server-side conversation state, automatically preserves thought signatures across turns, and optimizes token routing for multi-modal, multi-step workflows.
from google import genai
from typing import List, Dict, Any
class AgenticRuntime:
def __init__(self, model_id: str = "gemini-3.5-flash"):
self.client = genai.Client()
self.model_id = model_id
self.current_interaction = None
def start_session(self, initial_prompt: str) -> Dict[str, Any]:
interaction = self.client.interactions.create(
model=self.model_id,
input=initial_prompt,
generation_config={"thinking_level": "medium"}
)
self.current_interaction = interaction
return {"output": interaction.output_text, "id": interaction.id}
Architecture Rationale: We wrap the client in a session manager to track interaction.id. This enables seamless continuation without manually reconstructing conversation history. The medium thinking level is set at initialization because it provides the optimal balance of reasoning depth and latency for most agentic tasks.
Step 2: Implement Strict Function Response Routing
Function calling in Gemini 3.x enforces strict contract matching. Every response must align with the preceding call's id, name, and payload structure. Mismatches trigger silent failures or empty responses with finish_reason: STOP.
def execute_tool_response(self, tool_call: Any, result_data: Any) -> Dict[str, Any]:
formatted_result = {
"type": "function_result",
"name": tool_call.name,
"call_id": tool_call.id,
"result": [{"type": "text", "text": str(result_data)}]
}
next_turn = self.client.interactions.create(
model=self.model_id,
previous_interaction_id=self.current_interaction.id,
input=[formatted_result]
)
self.current_interaction = next_turn
return {"output": next_turn.output_text, "id": next_turn.id}
Architecture Rationale: The call_id and name are extracted directly from the tool call object to guarantee exact matching. Results are wrapped in a list of content parts to comply with the multimodal payload structure. This prevents state desynchronization and ensures the model correctly associates outputs with pending tool requests.
Step 3: Handle Multi-Turn Thought Preservation
Thought preservation is automatic in the Interactions API. The runtime carries forward intermediate reasoning steps across turns without requiring manual history reconstruction. When using the legacy GenerateContent API, you must explicitly pass the full conversation chain including thought signatures.
def continue_reasoning(self, user_follow_up: str) -> Dict[str, Any]:
continuation = self.client.interactions.create(
model=self.model_id,
previous_interaction_id=self.current_interaction.id,
input=user_follow_up
)
self.current_interaction = continuation
return {"output": continuation.output_text, "id": continuation.id}
Architecture Rationale: By passing only the follow-up input and the previous interaction ID, the runtime automatically merges the new prompt with the preserved reasoning chain. This eliminates the need for manual context window management and prevents reasoning fragmentation during iterative debugging or code refactoring workflows.
Pitfall Guide
1. Sampling Parameter Interference
Explanation: Developers frequently pass temperature, top_p, or top_k to control output variance. In Gemini 3.x, these parameters disrupt the model's internal token allocation and reasoning routing, leading to inconsistent outputs and degraded agentic performance.
Fix: Remove all sampling parameters from generation_config. Rely on explicit system instructions and thinking_level to control behavior. The model's defaults are mathematically optimized for reasoning tasks.
2. Numeric Thinking Budgets
Explanation: The legacy thinking_budget parameter expects a raw token count. This forces developers to guess optimal reasoning depth, often resulting in either truncated thoughts or excessive token consumption.
Fix: Replace numeric budgets with the thinking_level enum (minimal, low, medium, high). The runtime dynamically scales reasoning steps based on task complexity, eliminating manual budget tuning.
3. Function Response Desynchronization
Explanation: Returning a function result with a mismatched id, name, or incorrect payload count causes the model to drop the response or return empty outputs. This is a silent failure that breaks agentic loops.
Fix: Always map call_id and name directly from the incoming FunctionCall object. Ensure exactly one FunctionResponse is returned per FunctionCall. Validate payload structure before submission.
4. Multimodal Content Leakage
Explanation: Placing images, audio, or documents outside the function response array confuses the model's attention mechanism. The runtime may treat the media as a new user prompt rather than tool output, causing thought leakage and degraded reasoning.
Fix: Embed all multimodal content directly inside the result array of the function_result payload. The model processes media in the correct context when it's structurally bound to the tool response.
5. Fragmented Inline Instructions
Explanation: Appending follow-up instructions as separate content parts after a function response breaks the reasoning chain. The model may interpret the instruction as a new turn rather than a continuation of the tool output.
Fix: Concatenate inline instructions to the end of the function response text, separated by exactly two newlines (\n\n). This keeps the instruction bound to the tool output while maintaining clear semantic boundaries.
6. Over-Provisioning Reasoning Depth
Explanation: Defaulting to high thinking effort on simple queries or routine tool calls wastes tokens and increases latency. The model will over-analyze straightforward tasks, slowing down agentic loops.
Fix: Start with medium. Scale to high only for complex mathematical proofs, multi-file code refactoring, or ambiguous reasoning tasks. Use low or minimal for high-throughput chat or simple factual retrieval.
7. Ignoring Thought Signature Propagation
Explanation: When using the GenerateContent API, dropping conversation history or stripping thought signatures breaks multi-turn reasoning. The model loses intermediate steps, causing iterative debugging to fail.
Fix: Always pass the complete, unmodified conversation history including thought signatures. The Python SDK handles this automatically when using the Interactions API, but manual history management requires strict signature preservation.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-throughput chat / FAQ | thinking_level: minimal | Optimizes for speed; skips deep reasoning on factual queries | Lowest token cost |
| Routine code generation / simple agents | thinking_level: low | Balances latency with sufficient reasoning for straightforward tasks | Moderate cost reduction |
| General agentic workflows / multi-step coding | thinking_level: medium (default) | Provides optimal reasoning depth without unnecessary token overhead | Baseline cost |
| Complex math / multi-file refactoring / ambiguous debugging | thinking_level: high | Enables extended thought chains and deeper tool exploration | Highest token cost |
| Tool call overuse / runaway loops | System instruction + medium/low | Restricts action budget while maintaining reasoning quality | Reduces wasted tool tokens by 30-50% |
Configuration Template
from google import genai
import json
class ProductionAgent:
def __init__(self):
self.client = genai.Client()
self.model = "gemini-3.5-flash"
self.session = None
def initialize(self, system_prompt: str, initial_input: str):
self.session = self.client.interactions.create(
model=self.model,
input=initial_input,
system_instruction=system_prompt,
generation_config={
"thinking_level": "medium"
}
)
return self.session.output_text
def process_tool_output(self, tool_call_obj, result_payload):
response_part = {
"type": "function_result",
"name": tool_call_obj.name,
"call_id": tool_call_obj.id,
"result": [
{"type": "text", "text": f"{json.dumps(result_payload)}\n\nProceed with next step."}
]
}
self.session = self.client.interactions.create(
model=self.model,
previous_interaction_id=self.session.id,
input=[response_part]
)
return self.session.output_text
def continue_conversation(self, user_input: str):
self.session = self.client.interactions.create(
model=self.model,
previous_interaction_id=self.session.id,
input=user_input
)
return self.session.output_text
Quick Start Guide
- Install the SDK: Run
pip install -U google-genai to ensure you have the latest runtime supporting Interactions API and thinking_level.
- Initialize the Client: Create a
genai.Client() instance and set your target model to gemini-3.5-flash.
- Start an Interaction: Call
client.interactions.create() with your initial prompt and thinking_level: medium. Capture the returned id for state continuation.
- Handle Tool Responses: When the model requests a tool call, extract
id and name, execute the function, and return the result using the exact contract structure. Pass the previous_interaction_id to maintain state.
- Verify Continuity: Run a multi-turn debugging or coding task. Confirm that intermediate reasoning steps persist automatically without manual history reconstruction.