The following implementation demonstrates how to structure this in TypeScript.
Step 1: Define the Workspace Blueprint
The manifest acts as a starting-point contract. It materializes files, repositories, environment variables, and storage mounts before the sandbox initializes. Keep this declarative and scoped to actual agent requirements.
```typescript
import { WorkspaceBlueprint, createFile, mountStorage } from '@openai/agents-sdk';

const projectBlueprint = new WorkspaceBlueprint({
  initialState: {
    "specs/requirements.md": createFile({
      content: "# Payment Gateway Integration\n- Target: Stripe v3 API\n- Compliance: PCI-DSS Level 1\n- Deadline: 2026-03-20\n"
    }),
    "specs/known_issues.md": createFile({
      content: "# Current Blockers\n- Webhook signature validation failing on edge cases\n- Rate limit handling not implemented\n"
    }),
    "config/runtime.env": createFile({
      content: "LOG_LEVEL=info\nMAX_RETRIES=3\n"
    })
  },
  storageMounts: [
    mountStorage({
      provider: "s3",
      bucket: "agent-artifacts-prod",
      prefix: "payment-integration/",
      scope: "read-write"
    })
  ],
  environment: {
    NODE_ENV: "production",
    AGENT_SESSION_ID: "auto-generated"
  }
});
```
Step 2: Initialize the Control Plane Harness
The harness owns the agent loop. It routes tool calls, persists conversation state, handles streaming responses, and manages recovery. Configure it with the blueprint, model settings, and tool definitions.
```typescript
import { HarnessController, SandboxClient } from '@openai/agents-sdk';

const harness = new HarnessController({
  blueprint: projectBlueprint,
  modelConfig: {
    provider: "openai",
    model: "gpt-4o",
    temperature: 0.2,
    maxTokens: 8192
  },
  toolRegistry: {
    shell: { timeout: 30000, interactive: true },
    filesystem: { readOnlyPaths: ["specs/"], writablePaths: ["output/"] },
    mcp: { servers: ["linear-mcp", "slack-mcp"] }
  },
  recovery: {
    enableSnapshots: true,
    snapshotInterval: 5, // turns
    resumeOnFailure: true
  }
});
```
Step 3: Attach Execution Client & Run
The sandbox client is part of run configuration, not agent definition. Swap environments without modifying the harness or blueprint.
```typescript
import { DockerSandbox, HostedSandbox } from '@openai/agents-sdk/clients';

// Production deployment
const runtimeClient = new HostedSandbox({
  provider: "vercel",
  region: "us-east-1",
  credentials: {
    // Injected by the provider runtime, never hard-coded
    storageKey: process.env.S3_MOUNT_KEY,
    networkPolicy: "egress-only"
  }
});

// Execute the agent loop
async function runAgentWorkflow() {
  const session = await harness.initialize(runtimeClient);

  session.on("toolDispatch", (call) => {
    console.log(`[Harness] Routing ${call.tool} to sandbox`);
  });
  session.on("stateSnapshot", (snapshot) => {
    console.log(`[Harness] Persisted turn state: ${snapshot.turnId}`);
  });

  const result = await session.execute({
    prompt: "Analyze the payment gateway specs, implement webhook validation, and commit changes to the output directory.",
    maxTurns: 12
  });
  console.log("Execution complete:", result.status);
}

runAgentWorkflow().catch(console.error);
```
Architecture Rationale
- Separation of Control and Compute: The harness retains auth, billing, audit trails, and human approval gates. The sandbox only executes model-directed commands. This prevents credential leakage and enables independent scaling.
- Declarative Workspace Contract: The blueprint initializes a deterministic environment. Every sandbox session starts from the same baseline, eliminating drift and simplifying debugging.
- Provider-Agnostic Runtime: Swapping DockerSandbox for HostedSandbox requires zero changes to the harness or blueprint. This enables environment parity from local development to production.
- Automatic State Persistence: The harness serializes conversation turns, tool results, and workspace metadata. Recovery logic is built-in, removing the need for external state stores or custom checkpointing.
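The provider-agnostic point above can be sketched with a minimal interface. The names here are illustrative, not the SDK's actual types; the idea is only that the harness depends on a client contract, never on a concrete runtime:

```typescript
// Hypothetical minimal client contract; the real SDK's surface differs.
interface ExecutionClient {
  name: string;
  exec(command: string): string;
}

// Two interchangeable clients. The "harness" below sees only ExecutionClient.
const localClient: ExecutionClient = {
  name: "unix-local",
  exec: (cmd) => `local:${cmd}`,
};

const hostedClient: ExecutionClient = {
  name: "hosted",
  exec: (cmd) => `hosted:${cmd}`,
};

// Because runWith depends only on the interface, swapping clients
// requires no change to the calling logic.
function runWith(client: ExecutionClient, command: string): string {
  return client.exec(command);
}
```

Swapping `localClient` for `hostedClient` in `runWith` is the whole migration, which is the property the rationale claims for DockerSandbox versus HostedSandbox.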
Pitfall Guide
1. Prompt Leakage of Secrets
Explanation: Developers occasionally embed API keys, tokens, or database credentials directly in the agent prompt or manifest files. The sandbox executes arbitrary code, making these values accessible to model-generated scripts.
Fix: Treat all credentials as runtime configuration. Use provider-native secret managers (Vercel Secrets, Cloudflare KV, AWS Secrets Manager). Inject values via environment variables marked as ephemeral, and never reference them in prompts or workspace files.
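One way to enforce this at the boundary is a small fail-fast helper, sketched here (the helper name is our own, not part of the SDK):

```typescript
// Resolve a secret from the runtime environment, failing fast if absent,
// so credentials are never written into prompts or workspace files.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: resolve secrets only at client-construction time, e.g.
// const storageKey = requireEnv("S3_MOUNT_KEY");
```

A missing secret then surfaces as a startup error in the control plane rather than as an empty credential silently passed into the sandbox.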
2. Conflating Control and Compute Planes
Explanation: Running the harness inside the sandbox for convenience creates a single point of failure. If the sandbox crashes or is compromised, the control plane, audit logs, and recovery state are lost.
Fix: Maintain strict boundary separation. The harness should run in trusted infrastructure (serverless functions, orchestration services). Sandboxes should only handle file I/O, shell execution, and mounted storage access.
3. Unscoped Storage Mounts
Explanation: Mounting entire cloud buckets or repositories gives the agent unrestricted read/write access. This violates least-privilege principles and increases blast radius for accidental deletions or data exfiltration.
Fix: Scope mounts to specific prefixes or directories. Use scope: "read-only" for input data and scope: "read-write" only for designated output directories. Review mounted paths during manifest validation.
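A manifest-validation pass for mounts can be sketched as follows (the types and checks are illustrative assumptions, not SDK behavior):

```typescript
type MountScope = "read-only" | "read-write";

interface StorageMount {
  bucket: string;
  prefix: string;
  scope: MountScope;
}

// Reject mounts that expose an entire bucket: every mount needs a prefix,
// and read-write mounts should target a directory-style prefix.
function validateMounts(mounts: StorageMount[]): string[] {
  const errors: string[] = [];
  for (const m of mounts) {
    if (m.prefix.trim() === "") {
      errors.push(`Mount on ${m.bucket} has no prefix: entire bucket exposed`);
    }
    if (m.scope === "read-write" && !m.prefix.includes("/")) {
      errors.push(`Read-write mount on ${m.bucket} should target a directory-style prefix`);
    }
  }
  return errors;
}
```

Running such a check before blueprint initialization keeps least-privilege review mechanical rather than relying on code review alone.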
4. Ignoring Sandbox Lifecycle Management
Explanation: Sandboxes consume compute resources. Failing to terminate idle sessions or clean up temporary workspaces leads to cost accumulation and resource exhaustion.
Fix: Implement explicit lifecycle hooks. Use session.terminate() after execution completes. Configure provider-level TTL policies (e.g., 15-minute idle timeout). Log sandbox creation/destruction events for audit compliance.
5. Blocking on Long-Running Operations
Explanation: Waiting for long-running shell commands or file operations to finish before yielding control back to the harness wastes tokens and increases latency.
Fix: Leverage the harness's streaming capabilities. Mark long-running tools as interactive: true or async: true. Allow the model to proceed with other tasks while the sandbox executes in the background. Use polling or webhook callbacks for completion signals.
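The polling half of that fix reduces to a loop that checks for a result and otherwise yields, sketched here (names and shape are ours, not the SDK's):

```typescript
// Poll a long-running job for completion instead of blocking the agent loop.
// `check` returns the result once ready, or undefined while still running.
function pollUntilComplete<T>(check: () => T | undefined, maxAttempts: number): T {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = check();
    if (result !== undefined) return result;
    // In a real harness this iteration would await a backoff delay
    // or be replaced entirely by a webhook callback.
  }
  throw new Error(`Job did not complete within ${maxAttempts} polls`);
}
```

The webhook variant inverts control entirely: the sandbox signals completion, and no harness turn is spent waiting.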
6. Overcomplicating the Manifest
Explanation: Packing the blueprint with unnecessary files, environment variables, or mount points increases initialization time and introduces configuration drift.
Fix: Follow the principle of minimal workspace. Only include files the agent actively reads or writes. Use relative paths. Keep task specifications in dedicated workspace files (AGENTS.md, TASK.md) rather than embedding them in the harness configuration.
7. Neglecting Snapshot/Resume Logic
Explanation: Assuming the harness automatically handles all failure modes without explicit snapshot strategy leads to lost progress during network interruptions or provider outages.
Fix: Enable periodic snapshots (snapshotInterval: 5). Store snapshot metadata in durable storage. Implement resume handlers that validate workspace integrity before continuing execution. Test failure scenarios explicitly in staging.
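The integrity check before resuming can be as simple as comparing recorded file hashes against the current workspace, sketched here with assumed types:

```typescript
// Hypothetical snapshot shape: a turn marker plus per-file content hashes.
interface Snapshot {
  turnId: number;
  fileHashes: Record<string, string>;
}

// Resume only if every file recorded in the snapshot still matches;
// extra files in the workspace are tolerated, drifted ones are not.
function canResume(snapshot: Snapshot, currentHashes: Record<string, string>): boolean {
  return Object.entries(snapshot.fileHashes).every(
    ([path, hash]) => currentHashes[path] === hash,
  );
}
```

When the check fails, the safe path is to rebuild the workspace from the blueprint and replay from the snapshot rather than continuing on a drifted baseline.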
Production Bundle
Action Checklist
- Inject all credentials through provider secret managers; never reference them in prompts or workspace files.
- Run the harness in trusted infrastructure, separate from sandbox compute.
- Scope every storage mount to a specific prefix; default to read-only.
- Terminate sessions explicitly and configure provider-level idle TTLs.
- Mark long-running tools as interactive or async instead of blocking the loop.
- Keep the blueprint minimal: include only files the agent actively reads or writes.
- Enable periodic snapshots and test resume paths in staging.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local development & rapid iteration | UnixLocal client | Zero infrastructure overhead, instant workspace teardown | Negligible |
| CI/CD pipeline validation | Docker client | Deterministic environment, reproducible builds, isolated network | Low (container runtime) |
| Production traffic with auto-scaling | Hosted provider (Vercel/Cloudflare) | Managed isolation, provider-native secrets, horizontal scaling | Medium-High (compute + egress) |
| Multi-agent collaboration | Shared mounted storage + scoped prefixes | Enables artifact sharing without direct sandbox-to-sandbox communication | Medium (storage I/O) |
| Compliance-heavy workloads | Hosted provider + audit logging + snapshot retention | Meets regulatory requirements for traceability and data isolation | High (logging + storage) |
Configuration Template
```typescript
// production-agent.config.ts
import {
  WorkspaceBlueprint,
  HarnessController,
  createFile,
  createDirectory,
  mountStorage
} from '@openai/agents-sdk';
import { HostedSandbox } from '@openai/agents-sdk/clients';

export const agentBlueprint = new WorkspaceBlueprint({
  initialState: {
    "workspace/task.md": createFile({ content: "# Task: Refactor auth module\n- Replace JWT with OAuth2\n- Update test suite\n" }),
    "workspace/output/": createDirectory({ permissions: "0755" })
  },
  storageMounts: [
    mountStorage({ provider: "r2", bucket: "prod-artifacts", prefix: "auth-refactor/", scope: "read-write" })
  ],
  environment: {
    NODE_ENV: "production",
    LOG_FORMAT: "json"
  }
});

export const productionHarness = new HarnessController({
  blueprint: agentBlueprint,
  modelConfig: { provider: "openai", model: "gpt-4o", temperature: 0.1 },
  toolRegistry: {
    shell: { timeout: 45000, interactive: true },
    filesystem: { readOnlyPaths: ["workspace/task.md"], writablePaths: ["workspace/output/"] },
    mcp: { servers: ["github-mcp", "datadog-mcp"] }
  },
  recovery: { enableSnapshots: true, snapshotInterval: 4, resumeOnFailure: true }
});

export const productionClient = new HostedSandbox({
  provider: "cloudflare",
  region: "auto",
  credentials: {
    storageKey: process.env.CF_R2_KEY,
    networkPolicy: "egress-only",
    cpuLimit: "2vCPU",
    memoryLimit: "4GB"
  }
});
```
Quick Start Guide
- Install the SDK: Run `npm install @openai/agents-sdk` in your project directory.
- Define Your Blueprint: Create a `WorkspaceBlueprint` with only the files, mounts, and environment variables the agent requires.
- Initialize the Harness: Configure `HarnessController` with your model settings, tool registry, and recovery preferences.
- Select a Runtime Client: Instantiate `UnixLocal` for development or `HostedSandbox` for production. Pass provider credentials via environment variables.
- Execute & Monitor: Call `harness.initialize(client)` and run `session.execute()`. Attach event listeners for `toolDispatch`, `stateSnapshot`, and termination to observe execution flow.