mapping. The following implementation demonstrates a production-ready pattern using the @google/genai SDK.
Step 1: SDK Initialization and Client Wrapper
The official package requires Node.js 18 or later. Wrap the SDK in a dedicated client class to centralize configuration, timeout handling, and error boundaries.
import { GoogleGenAI, Type, GenerateContentResponse } from "@google/genai";

interface GeminiClientConfig {
  apiKey: string;
  modelId: string;
  timeoutMs?: number;
}

export class GeminiOrchestrator {
  private readonly client: GoogleGenAI;
  private readonly model: string;
  private readonly timeout: number;

  constructor(config: GeminiClientConfig) {
    this.client = new GoogleGenAI({ apiKey: config.apiKey });
    this.model = config.modelId;
    this.timeout = config.timeoutMs ?? 30000;
  }

  // Rejects if the wrapped promise has not settled within this.timeout ms.
  private async withTimeout<T>(promise: Promise<T>): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("Gemini request timed out")), this.timeout);
    });
    try {
      return await Promise.race([promise, deadline]);
    } finally {
      // Clear the timer so a finished request does not leave it pending.
      clearTimeout(timer);
    }
  }
}
Architecture Rationale: An explicit timeout keeps callers from waiting indefinitely on a stalled request during high-load periods. Note that Promise.race only stops waiting; it does not cancel the underlying HTTP request. Wrapping the SDK isolates vendor-specific logic, making it easier to swap models or implement fallback routing later.
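The fallback routing mentioned above could look roughly like the following sketch. It is not part of the SDK: generateWithFallback and the GenerateFn type are hypothetical names, and any function that maps a model ID and prompt to text can be plugged in.

```typescript
// Hypothetical signature: any function that turns (modelId, prompt) into text.
type GenerateFn = (modelId: string, prompt: string) => Promise<string>;

// Try each model in order and return the first successful result.
async function generateWithFallback(
  generate: GenerateFn,
  modelIds: string[],
  prompt: string
): Promise<{ modelId: string; text: string }> {
  let lastError: unknown = new Error("No model IDs provided");
  for (const modelId of modelIds) {
    try {
      const text = await generate(modelId, prompt);
      return { modelId, text };
    } catch (err) {
      lastError = err; // remember the failure and move on to the next model
    }
  }
  throw lastError;
}
```

Because the wrapper class owns the model ID, a helper like this can sit behind it without callers needing to know which model ultimately answered.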
Step 2: Streaming Implementation
Blocking calls are unsuitable for interactive interfaces. The SDK exposes an async generator for streaming, which must be consumed without blocking the event loop.
async streamGeneration(prompt: string): Promise<AsyncIterable<string>> {
  // The timeout only covers opening the stream, not the time spent reading chunks.
  const stream = await this.withTimeout(
    this.client.models.generateContentStream({
      model: this.model,
      contents: prompt,
    })
  );
  return {
    async *[Symbol.asyncIterator]() {
      for await (const chunk of stream) {
        const text = chunk.text;
        if (text) yield text; // skip chunks that carry no text
      }
    },
  };
}
Architecture Rationale: Returning an AsyncIterable decouples the generation layer from the transport layer. This allows the same stream to be piped into WebSocket connections, Server-Sent Events, or CLI outputs without modifying the core logic. The chunk.text guard prevents undefined values from leaking into downstream consumers.
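As an illustration of that decoupling, the sketch below forwards any AsyncIterable<string> to a sink as Server-Sent Events frames. The names fakeStream, toSseFrame, and pipeToSink are hypothetical; fakeStream stands in for the result of streamGeneration so the example runs without network access.

```typescript
// Stand-in for streamGeneration(): any AsyncIterable<string> works here.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello";
  yield ", world";
}

// Format one text chunk as a Server-Sent Events frame.
// Multi-line chunks need one "data:" line per line of text.
function toSseFrame(chunk: string): string {
  const lines = chunk.split("\n").map((line) => `data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Forward chunks to any sink (an HTTP response writer, a socket, stdout).
async function pipeToSink(
  source: AsyncIterable<string>,
  write: (frame: string) => void
): Promise<void> {
  for await (const chunk of source) {
    write(toSseFrame(chunk));
  }
}
```

Swapping the transport then only means passing a different write callback; the generation code stays untouched.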
Step 3: Function Calling with ID Echo
Gemini 3.x introduced a strict requirement: every function call response must include the exact id generated by the model. Omitting this field breaks the conversation state.
async executeToolCall(
  prompt: string,
  toolDefinitions: Array<{ name: string; description: string; schema: any }>
): Promise<string> {
  // Map each registry entry to a function declaration the SDK understands.
  const formattedTools = toolDefinitions.map((t) => ({
    functionDeclarations: [
      {
        name: t.name,
        description: t.description,
        parameters: {
          type: Type.OBJECT,
          properties: t.schema.properties,
          required: t.schema.required ?? [],
        },
      },
    ],
  }));

  const initialResponse = await this.withTimeout(
    this.client.models.generateContent({
      model: this.model,
      contents: prompt,
      config: { tools: formattedTools },
    })
  );

  // No tool requested: the model answered directly.
  const calls = initialResponse.functionCalls;
  if (!calls || calls.length === 0) {
    return initialResponse.text ?? "";
  }

  // Replay the user prompt and the model turn that requested the tools.
  const conversationHistory: any[] = [
    { role: "user", parts: [{ text: prompt }] },
    initialResponse.candidates?.[0]?.content,
  ];

  // Collect every result into a single user turn so that parallel calls
  // are answered together rather than as separate consecutive turns.
  const responseParts: any[] = [];
  for (const call of calls) {
    const toolName = call.name ?? ""; // name is optional in the SDK types
    const executionResult = await this.invokeExternalTool(toolName, call.args);
    responseParts.push({
      functionResponse: {
        id: call.id, // echo the model-generated id unchanged
        name: toolName,
        response: { result: executionResult },
      },
    });
  }
  conversationHistory.push({ role: "user", parts: responseParts });

  const finalResponse = await this.withTimeout(
    this.client.models.generateContent({
      model: this.model,
      contents: conversationHistory,
      config: { tools: formattedTools },
    })
  );
  return finalResponse.text ?? "";
}
private async invokeExternalTool(toolName: string, args: any): Promise<any> {
  switch (toolName) {
    case "fetch_repository_metrics":
      return { stars: 12400, openIssues: 42, lastCommit: "2026-05-18" };
    case "validate_schema":
      return { valid: true, errors: [] };
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}
Architecture Rationale: The tool execution loop processes multiple parallel calls in a single turn, matching Gemini's capability to request several functions simultaneously. The id field is explicitly preserved and echoed back, satisfying the 3.x API contract. Using Type.OBJECT from the SDK enum ensures schema validation aligns with the model's expectations, preventing silent parsing failures.
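The id echo can be isolated into a small helper, sketched below with local stand-in types rather than the SDK's own. ToolCall, FunctionResponsePart, and buildToolTurn are hypothetical names, and unlike a sequential loop this variant runs the tools concurrently with Promise.all, which keeps results in the same order as the calls.

```typescript
// Local stand-ins for the SDK's function-call shapes (fields assumed optional).
interface ToolCall {
  id?: string | undefined;
  name?: string | undefined;
  args?: Record<string, unknown> | undefined;
}

interface FunctionResponsePart {
  functionResponse: {
    id?: string | undefined;
    name: string;
    response: { result: unknown };
  };
}

// Build one user turn answering every call, echoing each id unchanged.
async function buildToolTurn(
  calls: ToolCall[],
  run: (name: string, args: Record<string, unknown>) => Promise<unknown>
): Promise<{ role: "user"; parts: FunctionResponsePart[] }> {
  const parts = await Promise.all(
    calls.map(async (call): Promise<FunctionResponsePart> => {
      const name = call.name ?? "";
      const result = await run(name, call.args ?? {});
      return { functionResponse: { id: call.id, name, response: { result } } };
    })
  );
  return { role: "user", parts };
}
```

Keeping this logic in one place makes it harder to forget the id when new tools are added.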
Pitfall Guide
1. Omitting the Function Call ID
Explanation: The Gemini 3.x API attaches a unique identifier to every function call request. If your response does not echo this exact id, the model cannot correlate the result with the original request, causing state desynchronization.
Fix: Always map call.id directly into the functionResponse.id field before appending to conversation history.
2. Underestimating Output Token Costs
Explanation: Output generation typically consumes 2 to 4 times the token volume of the input prompt. Teams that calculate costs using only input pricing consistently underestimate monthly spend.
Fix: Model your budget using a 1:3 input-to-output ratio as a baseline. Track actual output volume in production and adjust multipliers per workflow.
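The 1:3 baseline translates into a simple estimate, sketched below. The function name, the Pricing shape, and the prices in the example are all placeholders for illustration, not published rates.

```typescript
// All prices are placeholders; substitute the current rates for your model.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

function estimateMonthlyCostUsd(
  monthlyInputTokens: number,
  pricing: Pricing,
  outputRatio = 3 // baseline: 3 output tokens per input token
): number {
  const monthlyOutputTokens = monthlyInputTokens * outputRatio;
  const inputCost = (monthlyInputTokens / 1_000_000) * pricing.inputPerMillionUsd;
  const outputCost = (monthlyOutputTokens / 1_000_000) * pricing.outputPerMillionUsd;
  return inputCost + outputCost;
}

// Example with made-up prices: 10M input tokens, $1 in and $4 out per million.
// Input: 10 * 1 = 10. Output: 30 * 4 = 120. Total: 130.
const estimate = estimateMonthlyCostUsd(10_000_000, {
  inputPerMillionUsd: 1,
  outputPerMillionUsd: 4,
});
```

In this made-up case an input-only calculation would report $10 against an estimated $130, which is the gap the pitfall describes. Replace outputRatio with the ratio measured in production once real traffic is available.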
3. Blocking the Event Loop During Stream Consumption
Explanation: Using synchronous loops or awaiting entire responses before processing chunks defeats the purpose of streaming and increases perceived latency.
Fix: Use for await...of with async generators. Pipe chunks directly to transport layers (SSE, WebSockets) without intermediate buffering unless explicitly required.
4. Passing String Literals for Schema Types
Explanation: The SDK expects the Type enum for parameter definitions. Passing raw strings like "object" or "string" may work in development but causes validation mismatches in production.
Fix: Import Type from @google/genai and use Type.OBJECT, Type.STRING, etc., consistently across all tool declarations.
5. Ignoring the Search Grounding Quota
Explanation: The free tier enforces a 5,000 prompt monthly limit for Google Search grounding across all Gemini 3 models. Agentic workflows that repeatedly trigger search queries exhaust this cap quickly.
Fix: Monitor grounding usage via the GCP console. Implement caching for repeated queries or switch to a paid tier with explicit per-query billing ($14 per 1,000 queries) before hitting the cap.
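One way to cache repeated queries is a small in-memory TTL cache, sketched below. QueryCache is a hypothetical helper, the fetcher stands in for whatever function issues the grounded request, and the normalization (trim plus lowercase) is an assumption that may be too aggressive for case-sensitive queries.

```typescript
// Minimal TTL cache keyed by normalized query text.
class QueryCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async getOrFetch(query: string, fetcher: (q: string) => Promise<T>): Promise<T> {
    const key = query.trim().toLowerCase();
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value; // served from cache: no grounded request issued
    }
    const value = await fetcher(query);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

A process-local Map is enough for a single instance; a multi-instance deployment would need a shared store instead.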
6. Assuming Single-Call Pricing Equals Task Cost
Explanation: Multi-turn agentic tasks involve multiple model invocations, tool executions, and context window management. Per-call pricing does not reflect the cumulative cost of a completed workflow.
Fix: Instrument your application to track total tokens consumed per task completion. Compare 3.5 Flash and 2.5 Flash using task-level metrics, not individual API call pricing.
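A minimal sketch of task-level instrumentation follows. TaskUsageTracker is a hypothetical class, and the UsageMetadata interface is a local stand-in assuming the response reports promptTokenCount and candidatesTokenCount; check the field names against the SDK version in use.

```typescript
// Subset of the usage fields the response is assumed to expose.
interface UsageMetadata {
  promptTokenCount?: number | undefined;
  candidatesTokenCount?: number | undefined;
}

interface TaskTotals {
  inputTokens: number;
  outputTokens: number;
  calls: number;
}

class TaskUsageTracker {
  private readonly totals = new Map<string, TaskTotals>();

  // Call once per model invocation, passing the usage block from the response.
  record(taskId: string, usage: UsageMetadata | undefined): void {
    const current = this.totals.get(taskId) ?? { inputTokens: 0, outputTokens: 0, calls: 0 };
    current.inputTokens += usage?.promptTokenCount ?? 0;
    current.outputTokens += usage?.candidatesTokenCount ?? 0;
    current.calls += 1;
    this.totals.set(taskId, current);
  }

  // Returns a copy so callers cannot mutate the tracker's state.
  summary(taskId: string): TaskTotals {
    const found = this.totals.get(taskId);
    return found ? { ...found } : { inputTokens: 0, outputTokens: 0, calls: 0 };
  }
}
```

Comparing two models then means running the same tasks through each and comparing summary() per task, including the number of calls each model needed.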
7. Missing Monthly Spend Caps
Explanation: The free tier provides limited access, but paid tiers remove daily caps. Without explicit budget controls, unexpected traffic spikes can generate unbounded charges.
Fix: Configure monthly spend limits in the Google Cloud Console. Implement application-level circuit breakers that degrade gracefully when approaching budget thresholds.
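An application-level circuit breaker can be as small as the sketch below. BudgetBreaker, its three states, and the 80% degrade threshold are illustrative assumptions; how the application degrades (a cheaper model, shorter outputs, rejecting non-critical requests) is left to the caller.

```typescript
type BudgetState = "ok" | "degraded" | "blocked";

class BudgetBreaker {
  private spentUsd = 0;
  private readonly monthlyLimitUsd: number;
  private readonly degradeAtUsd: number;

  // degradeAtFraction: share of the limit at which to start degrading.
  constructor(monthlyLimitUsd: number, degradeAtFraction = 0.8) {
    this.monthlyLimitUsd = monthlyLimitUsd;
    this.degradeAtUsd = monthlyLimitUsd * degradeAtFraction;
  }

  // Add the estimated cost of a completed request.
  recordSpend(usd: number): void {
    this.spentUsd += usd;
  }

  // Check before each request to decide how to proceed.
  state(): BudgetState {
    if (this.spentUsd >= this.monthlyLimitUsd) return "blocked";
    if (this.spentUsd >= this.degradeAtUsd) return "degraded";
    return "ok";
  }
}
```

This complements, rather than replaces, the spend limit in the cloud console: the console limit is the hard stop, while the breaker gives the application a chance to react first.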
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Real-time chat or coding assistant | Gemini 3.5 Flash | Low latency improves UX; streaming throughput justifies premium | Higher per-token cost, lower infrastructure wait time |
| Batch document processing | Gemini 2.5 Flash | No user waiting; cost per token dominates economics | Significantly lower monthly spend at scale |
| Multi-step agentic loop | Gemini 3.5 Flash | Faster output reduces loop iterations and timeout risks | Premium offset by fewer total calls per task |
| Prototyping / internal tools | Free tier (either model) | Sufficient for development; rate limits prevent accidental overage | Zero cost until quota exhaustion |
Configuration Template
// config/gemini.ts
import { Type } from "@google/genai";
import { GeminiOrchestrator } from "../core/GeminiOrchestrator";

export const geminiClient = new GeminiOrchestrator({
  apiKey: process.env.GEMINI_API_KEY ?? "",
  modelId: "gemini-3.5-flash",
  timeoutMs: 25000,
});

// Property types use the SDK's Type enum (see pitfall 4) because
// executeToolCall forwards schema.properties to the API unchanged.
export const toolRegistry = [
  {
    name: "fetch_repository_metrics",
    description: "Retrieves star count, open issues, and last commit date for a given repository.",
    schema: {
      properties: {
        owner: { type: Type.STRING, description: "GitHub username or organization" },
        repo: { type: Type.STRING, description: "Repository name" },
      },
      required: ["owner", "repo"],
    },
  },
  {
    name: "validate_schema",
    description: "Checks if a JSON payload conforms to the expected structure.",
    schema: {
      properties: {
        payload: { type: Type.OBJECT, description: "Raw JSON object to validate" },
      },
      required: ["payload"],
    },
  },
];
Quick Start Guide
- Install dependencies: Run npm install @google/genai and ensure your runtime is Node.js 18 or newer.
- Set credentials: Export GEMINI_API_KEY in your environment or inject it via your deployment platform's secret manager.
- Initialize the client: Import the wrapper class, pass your API key and target model ID, and configure a reasonable timeout.
- Test streaming: Call the streaming method with a sample prompt and pipe the async iterator to your transport layer or console.
- Validate tool calling: Register a simple function declaration, trigger a prompt that requires it, and verify the id echo in the response chain.