Deploying a Multi-Agent System with Terraform and Cloud Run

By Codcompass Team·2026-05-09·6 min read

Current Situation Analysis

Transitioning a multi-agent system from a local prototype to a production-grade service introduces critical architectural and operational challenges that traditional deployment patterns fail to address. Local environments lack persistent state management, making it impossible to maintain user preferences or cross-session memory. Manual cloud provisioning leads to configuration drift, inconsistent IAM policies, and severe security vulnerabilities when API credentials are hardcoded or passed via plain environment variables. Furthermore, traditional microservice deployments treat agents as stateless HTTP endpoints, ignoring the complex reasoning paths, tool invocations, and memory retrieval cycles inherent to LLM-based architectures. Without structured telemetry, debugging cognitive failures versus system timeouts becomes nearly impossible, and the absence of automated infrastructure-as-code results in non-reproducible environments that cannot scale securely.

WOW Moment: Key Findings

By adopting the Agent Starter Pack patterns combined with Terraform provisioning and Cloud Run deployment, teams achieve a standardized, secure, and observable production backbone. The integration of Vertex AI Memory Bank with ADK telemetry transforms opaque agent behavior into actionable, visualized reasoning paths.

Approach	Deployment Time	Secret Management	Observability Coverage	State Persistence	Security Posture
Local/Manual Script	45-60 mins	Hardcoded/Env Vars	None/Basic Logs	In-Memory Only	Low (Broad IAM)
Cloud Run + Terraform + ADK	<10 mins	Secret Manager Injection	Full Agent Traces	Vertex AI Memory Bank	High (Least-Privilege IAM)

Key Findings:

Terraform reduces infrastructure provisioning time by ~80% while enforcing reproducible, version-controlled state.
ADK's otel_to_cloud=True flag automatically exports structured "Agent Traces" to Cloud Trace, enabling visual waterfall analysis of LLM invocations and MCP tool calls.
Runtime secret injection via Secret Manager eliminates credential leakage risks and supports dynamic rotation without container rebuilds.
Vertex AI Memory Bank provides persistent, cross-session state management, critical for personalized multi-agent interactions.

Core Solution

The production deployment relies on three interconnected layers: a FastAPI application server for request routing and memory binding, OpenTelemetry-based telemetry for reasoning visibility, and Terraform-driven infrastructure provisioning for secure, scalable cloud resources.

The Application Server

The fast_api_app.py file transforms core agent logic into a production-ready FastAPI server. It establishes the critical connection to the Vertex AI memory bank via MEMORY_URI, enabling the ADK framework to persist and retrieve user preferences across production sessions. The server also initializes production-grade telemetry and securely unpacks runtime secrets without polluting the environment namespace.

cd ..

Paste the following code in dev_signal_agent/fast_api_app.py:

import os
from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app
from google.cloud import logging as cloud_logging
from vertexai import agent_engines
from dev_signal_agent.app_utils.env import init_environment

# --- Initialization & Secure Secret Retrieval ---
# We now unpack the SECRETS dictionary returned by our updated env.py
PROJECT_ID, MODEL_LOC, SERVICE_LOC, SECRETS = init_environment()
logger = cloud_logging.Client().logger(__name__)

# Access sensitive credentials from the SECRETS dictionary
# These keys stay in memory and are NOT injected into os.environ
REDDIT_CLIENT_ID = SECRETS.get("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = SECRETS.get("REDDIT_CLIENT_SECRET")
REDDIT_USER_AGENT = SECRETS.get("REDDIT_USER_AGENT")
DK_API_KEY = SECRETS.get("DK_API_KEY")

# --- Configuration & Sessions ---
AGENT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Non-sensitive configuration uses environment variables
BUCKET = os.environ.get("AI_ASSETS_BUCKET")
USE_IN_MEMORY = os.environ.get("USE_IN_MEMORY_SESSION", "").lower() in

("true", "1")

--- MEMORY BANK CONNECTION ---

def _get_memory_bank_uri(): if USE_IN_MEMORY: return None, None # We use 'dev_signal_agent' as the display name for the Vertex AI memory bank name = os.environ.get("AGENT_ENGINE_MEMORY_BANK_NAME", "dev_signal_agent") existing = list(agent_engines.list(filter=f"display_name={name}")) ae = existing[0] if existing else agent_engines.create(display_name=name) uri = f"agentengine://{ae.resource_name}" print(f"DEBUG: Connecting to Memory Bank: {uri} (display_name={name})") return uri, uri

SESSION_URI, MEMORY_URI = _get_memory_bank_uri()

--- Initialize FastAPI with ADK ---

app: FastAPI = get_fast_api_app( agents_dir=AGENT_DIR, web=True, artifact_service_uri=f"gs://{BUCKET}" if BUCKET else None, allow_origins=os.getenv("ALLOW_ORIGINS", "").split(",") if os.getenv("ALLOW_ORIGINS") else None, session_service_uri=SESSION_URI, memory_service_uri=MEMORY_URI, # <--- Connects the Memory Bank otel_to_cloud=True, # <--- Enables production telemetry )

if name == "main": import uvicorn # Standard Cloud Run port is 8080 uvicorn.run(app, host="0.0.0.0", port=8080)


### Implementing Telemetry
Production visibility requires structured tracing of agent reasoning. Setting `otel_to_cloud=True` in the ADK initialization automatically instruments the application, exporting "Agent Traces" to Google Cloud Console. These traces render a visual waterfall of cognitive operations, LLM invocations, and MCP tool calls, enabling precise differentiation between reasoning failures and infrastructure bottlenecks.

**Monitoring vs. Targeted Evaluation:**
Cloud Run applies trace sampling to balance performance and cost. System traces monitor aggregate behavior (latency, timeouts), while reasoning traces require targeted evaluation calls to capture full request details for quality assessment.

**Viewing the Trace:**
Navigate to Trace Explorer in Google Cloud Console, filter by service name (e.g., `dev-signal`), and open specific Trace IDs to view Gantt-style breakdowns. This reveals cognitive decision paths versus physical system constraints.

### Infrastructure as Code: Provisioning Secure Cloud Resources
Terraform automates the creation of a security-first platform, enforcing least-privilege IAM, automated secret injection, and reproducible resource provisioning. The infrastructure is modularized into logical blocks:
- **Resources & Variables**: Project, region, and secret mappings
- **Core Infrastructure**: API enablement and private Artifact Registry
- **IAM**: Specialized service accounts with scoped permissions
- **Secret Management**: Secure ingestion into Google Secret Manager
- **Cloud Run Configuration**: Container environment, resource limits, and runtime secret binding

To begin provisioning, return to the root folder and create the deployment structure:

cd .. mkdir deployment cd deployment mkdir terraform cd terraform


### Terraform Resources and Variables
The `variables.tf` file defines configurable deployment parameters, enabling environment customization without logic modification. It includes project/region settings, service naming, and a secrets map for secure runtime credential injection.

variable "project_id" { description = "The Google Cloud Project ID" type = string } variable "region" { description = "The Google Cloud region to deploy to" type = string default = "us-central1" } variable "service_name" { description = "The name of the Cloud Run service" type = string default = "dev-signal" } variable "secrets" { description = "A map of secret names and