DevOps · 2026-05-10 · 84 min read

I built a custom Codex-powered code review bot for GitLab

By Sleeyax

Architecting Stateful AI Code Review Agents Under Container Security Constraints

Current Situation Analysis

Enterprise AI code review tools have introduced a predictable but rapidly scaling cost model. Platforms like GitLab Duo Code Review charge approximately $0.20 per merge request evaluation. For small engineering teams or startups operating on tight margins, this per-event pricing quickly compounds. Five reviews hit $1.00; fifty reviews hit $10.00. When PR velocity increases during sprint cycles, the bill scales linearly with developer output, creating a direct financial penalty for shipping code faster.

The industry default response has been to adopt stateless CI/CD pipelines that call pay-per-token APIs. These pipelines spin up a container, execute a single-pass review, post results, and terminate. While architecturally clean, this approach ignores two critical operational realities:

  1. Iterative feedback requires conversation state. Developers rarely accept AI feedback on the first pass. They ask follow-ups, request clarifications, or challenge architectural suggestions. Stateless pipelines discard context after every run, forcing developers to re-explain constraints and causing the system to re-bill for identical context tokens.
  2. Subscription economics favor automation. Many teams already maintain flat-rate ChatGPT or enterprise AI subscriptions for interactive use. These plans are technically capable of powering automated workflows, but platform restrictions, OAuth token management, and security boundaries make direct integration non-trivial.

The gap between interactive AI subscriptions and automated review pipelines is rarely addressed because most teams assume subscription endpoints cannot be safely automated. This assumption overlooks the fact that with proper concurrency controls, state isolation, and container-level security hardening, subscription-backed AI agents can run reliably in production without violating usage heuristics or exposing credentials.

Key Findings

The architectural shift from stateless API calls to stateful subscription agents fundamentally changes cost efficiency, developer experience, and infrastructural requirements. The following comparison highlights the operational trade-offs:

Approach | Cost Model | Conversation Continuity | Token Efficiency | Infra Complexity
Stateless API Pipeline | Pay-per-token | None (reset per run) | Low (re-sends full context) | Low (CI job only)
Stateful Subscription Agent | Flat-rate seat | Full (thread-resume) | High (context cached) | Medium (queue, DB, PVC, sidecars)

Why this matters: Stateful agents eliminate redundant token consumption during iterative reviews. A developer asking three follow-up questions on a single merge request consumes context tokens only once, rather than tripling the bill. The flat-rate subscription model caps costs regardless of review volume, making it economically superior for teams with predictable AI seat counts. The trade-off is increased architectural complexity: you must manage persistent state, enforce strict concurrency limits, and harden container boundaries to prevent credential leakage and prompt injection.
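The economics above reduce to a simple break-even calculation. A minimal sketch, where the per-review price is the ~$0.20 figure quoted earlier and the subscription price is a free parameter, not a claim about any specific plan:

```typescript
// Toy break-even calculation for the flat-rate vs. pay-per-event comparison.
// Both prices are inputs; nothing here asserts actual vendor pricing.
function breakEvenReviews(subscriptionPerMonth: number, perReview: number): number {
  // Number of reviews per month after which the flat-rate seat is cheaper
  // than paying per evaluation.
  return Math.ceil(subscriptionPerMonth / perReview);
}
```

For a hypothetical $20/month seat at $0.20 per review, the flat rate wins past 100 reviews a month; conversation continuity pushes the effective break-even lower still, since follow-ups stop re-billing context.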

Core Solution

Building a stateful AI review agent requires decoupling ingestion, execution, and state persistence. The architecture follows a single-worker pipeline design to comply with subscription usage policies while maintaining thread safety and idempotency.

Architecture Overview

GitLab Webhook → Review Gateway → Task Broker (Redis) → Worker → Codex SDK → GitLab API
                                      │                        │
                                      ├─ Thread Mapper (MariaDB)
                                      └─ Session Store (PVC)

Step 1: Webhook Ingestion & Idempotency

The review gateway receives GitLab merge request events. It validates the X-Gitlab-Token header using constant-time comparison, extracts the project ID, MR IID, and commit SHA, and pushes a deterministic job to the task broker. Deterministic job keys prevent duplicate processing when GitLab retries webhooks.

import { FastifyInstance } from 'fastify';
import { TaskBroker } from './task-broker';
import { constantTimeCompare } from './crypto-utils';

// Minimal shape of the GitLab MR webhook payload, covering only the
// fields this gateway reads.
type GitLabPayload = {
  project: { id: number };
  object_attributes: {
    iid: number;
    last_commit: { id: string };
    diff_refs: { base_sha: string };
  };
};

export class ReviewGateway {
  constructor(private broker: TaskBroker, private sharedSecret: string) {}

  async handleWebhook(server: FastifyInstance) {
    server.post('/webhook/review', async (request, reply) => {
      const token = request.headers['x-gitlab-token'] as string | undefined;
      if (!token || !constantTimeCompare(token, this.sharedSecret)) {
        return reply.code(401).send({ error: 'Invalid token' });
      }

      const payload = request.body as GitLabPayload;
      const jobId = `review-${payload.project.id}-${payload.object_attributes.iid}-${payload.object_attributes.last_commit.id}`;

      await this.broker.enqueue(jobId, {
        projectId: payload.project.id,
        mrIid: payload.object_attributes.iid,
        sha: payload.object_attributes.last_commit.id,
        baseSha: payload.object_attributes.diff_refs.base_sha,
      });

      return reply.code(202).send({ status: 'queued' });
    });
  }
}

Step 2: Single-Worker Execution & Concurrency Control

Subscription endpoints enforce strict per-account concurrency limits. The worker pool is therefore pinned to exactly one active process, and a mutex guarantees that only one Codex session runs at any time, so the traffic never exhibits the parallel-session patterns that trip automated-abuse heuristics.

import { Mutex } from 'async-mutex';
import { CodexOrchestrator } from './codex-orchestrator';
import { ThreadMapper } from './thread-mapper';
import { GitLabCommenter } from './gitlab-commenter';

// Job payload produced by the review gateway.
type ReviewJob = { projectId: number; mrIid: number; sha: string };

export class ReviewWorker {
  private executionLock = new Mutex();

  constructor(
    private orchestrator: CodexOrchestrator,
    private mapper: ThreadMapper,
    private commenter: GitLabCommenter
  ) {}

  async processJob(job: ReviewJob) {
    const release = await this.executionLock.acquire();
    try {
      // prepareWorktree and buildReviewPrompt (elided here) check out the MR
      // head SHA into an isolated worktree and assemble the review prompt.
      const worktree = await this.prepareWorktree(job.sha);
      const prompt = this.buildReviewPrompt(worktree);

      const result = await this.orchestrator.runReview(prompt);
      
      await this.commenter.postSummary(job, result.summary);
      await this.commenter.postInlineDiscussions(job, result.comments);

      await this.mapper.registerThreads(job, result.discussionIds);
    } finally {
      release();
    }
  }
}

Step 3: State Splitting Strategy

Persistent state is distributed across three isolated layers, each with a single responsibility:

  1. Task Broker (Redis): Holds the job queue. Deterministic IDs guarantee idempotency. If Redis loses data, the next webhook re-enqueues the job. No sensitive payloads are stored here.
  2. Thread Mapper (MariaDB): Stores lightweight relational mappings between GitLab discussion_id and internal codex_thread_id. Tables contain only identifiers and timestamps. LLM outputs, diffs, and prompts never touch the database.
  3. Session Store (PVC): Codex CLI persists conversation transcripts as JSONL files. The worker stores only the thread reference; the actual context lives on a namespace-scoped persistent volume claim. Resuming a conversation requires a single SDK call: orchestrator.resumeThread(threadId).execute(prompt).

Step 4: Follow-Up Thread Resolution

When a developer replies to a bot comment with @review-agent, GitLab sends a webhook containing the discussion_id. The worker checks the thread mapper. If the discussion belongs to the bot, it retrieves the associated codex_thread_id and resumes the session. Replies without the mention are ignored to prevent bot-to-bot loops. New mentions outside existing threads spawn fresh sessions, which are immediately registered in the mapper for future continuity.
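The routing decision above can be sketched as a pure function. The names are illustrative, and the in-memory Map stands in for the MariaDB thread-mapper table; this is a sketch of the logic, not the bot's actual implementation:

```typescript
// Hypothetical routing for GitLab note (comment) webhook events.
type NoteEvent = {
  discussionId: string;
  body: string;
  authorIsBot: boolean;
};

type Route =
  | { action: 'ignore' }
  | { action: 'resume'; codexThreadId: string }
  | { action: 'new' };

const BOT_MENTION = '@review-agent';

function routeFollowUp(
  event: NoteEvent,
  threadMap: Map<string, string> // discussion_id -> codex_thread_id
): Route {
  // Never react to our own comments, and require an explicit mention,
  // to prevent bot-to-bot loops.
  if (event.authorIsBot || !event.body.includes(BOT_MENTION)) {
    return { action: 'ignore' };
  }
  const codexThreadId = threadMap.get(event.discussionId);
  // Known discussion: resume the existing Codex session.
  if (codexThreadId) return { action: 'resume', codexThreadId };
  // Mention outside any known thread: spawn a fresh session; the caller
  // registers the new thread in the mapper for future continuity.
  return { action: 'new' };
}
```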

Step 5: Container Security Hardening

OpenShift's restricted-v2 Security Context Constraint blocks unprivileged user namespaces, making bwrap-based sandboxes (read-only, workspace-write) non-functional. The only viable execution mode is danger-full-access, which removes CLI-level sandboxing. Security must therefore be enforced at the pod and container boundary.

The hardened pod architecture isolates credential access, network egress, and shell execution:

  • Bot Container: Runs the Fastify gateway, task consumer, and Codex SDK. Holds /codex-home (PVC) for OAuth tokens and session JSONL files.
  • Execution Sidecar: A separate container mounts the /worktrees PVC but explicitly excludes /codex-home. A Unix socket bridges the bot and sidecar. Shell commands requested by the AI are forwarded to the sidecar, which executes them in an isolated mount namespace. Prompt injection attempts to read auth.json fail with ENOENT.
  • Egress Proxy: Squid enforces FQDN allowlisting. The bot container routes general traffic through :3128, while Codex SDK traffic routes through :3129. Only chatgpt.com, gitlab.com, and *.openai.com are permitted.
  • Network Policies: Pod-to-pod communication is restricted to in-cluster TCP. DNS resolution is handled by a CoreDNS sidecar that returns NXDOMAIN for external domains, preventing lateral DNS tunneling.
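The egress allowlist can be expressed in a few lines of squid.conf. This is a minimal sketch using standard Squid directives; a single policy is shared by both ports here for brevity, whereas a real deployment could split per-port rules with `myportname` ACLs:

```
# squid.conf (illustrative excerpt)
http_port 3128   # general bot traffic
http_port 3129   # Codex SDK traffic

# Leading dot matches all subdomains; bare names match exactly.
acl allowed_fqdn dstdomain chatgpt.com gitlab.com .openai.com
http_access allow allowed_fqdn
http_access deny all
```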

Pitfall Guide

1. Credential Leakage in Process Environments

Explanation: The Codex SDK spawns child processes that inherit the parent environment. If GITLAB_SA_TOKEN or OAuth refresh tokens are present in process.env, they become visible via /proc/<pid>/environ or shell tool subprocesses. Fix: Scrub sensitive variables before SDK initialization. Pass git credentials as transient argv arguments during clone operations, then reset the remote to a non-authenticated form. Disable git credential helpers with -c credential.helper= to prevent disk persistence.
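A sketch of the scrubbing step, assuming the sensitive variables follow common naming patterns (the pattern list is an illustrative default, not exhaustive):

```typescript
// Build a sanitized environment for Codex child processes. Anything
// credential-shaped is omitted entirely: child processes can read their
// environment via /proc/<pid>/environ, so omission is the only safe option.
const SENSITIVE_PATTERNS = [/TOKEN/i, /SECRET/i, /PASSWORD/i, /^GITLAB_SA_/];

function buildChildEnv(
  parentEnv: Record<string, string | undefined>
): Record<string, string | undefined> {
  const child: Record<string, string | undefined> = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (SENSITIVE_PATTERNS.some((p) => p.test(key))) continue;
    child[key] = value;
  }
  return child;
}
```

Pass the result explicitly when spawning, e.g. `spawn(cmd, args, { env: buildChildEnv(process.env) })`, and keep the git-side mitigations from above: credentials as transient argv during clone, remotes reset afterwards, and `-c credential.helper=` to disable on-disk helpers.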

2. Webhook Replay Storms

Explanation: GitLab retries failed webhooks up to 10 times with exponential backoff. Without idempotency, the queue processes the same review multiple times, wasting tokens and triggering duplicate comments. Fix: Generate deterministic job IDs using review-{projectId}-{mrIid}-{sha}. Use Redis SETNX or BullMQ's deduplication feature to reject duplicate job keys. Log rejected duplicates for monitoring.
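The deduplication logic is small enough to show in full. In this sketch an in-memory Set stands in for Redis; in production the claim would be an atomic `SET key 1 NX EX <ttl>` (or BullMQ's built-in deduplication) rather than a local collection:

```typescript
// Deterministic job keys plus first-writer-wins deduplication.
type ReviewTarget = { projectId: number; mrIid: number; sha: string };

function jobKey(t: ReviewTarget): string {
  // Same project + MR + commit always yields the same key, so retried
  // webhook deliveries collapse into a single queued job.
  return `review-${t.projectId}-${t.mrIid}-${t.sha}`;
}

function tryClaim(seen: Set<string>, key: string): boolean {
  if (seen.has(key)) return false; // duplicate delivery: reject (and log it)
  seen.add(key);
  return true;
}
```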

3. Prompt Injection via Shell Execution

Explanation: AI models can be tricked into executing arbitrary commands. If the shell tool runs in the same container as the OAuth token, a malicious prompt can exfiltrate credentials or modify session files. Fix: Offload shell execution to a sidecar container with a restricted mount namespace. The sidecar mounts only the worktree PVC and the execution socket. It never mounts /codex-home. Use a Unix socket wrapper to forward (argv, cwd, env) without granting direct process access.
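The bot-side validation before forwarding a request over the socket might look like the following. The request shape and names are assumptions for illustration; the real protocol is whatever you define between bot and sidecar:

```typescript
import * as path from 'node:path';

type ExecRequest = { argv: string[]; cwd: string; env: Record<string, string> };

const WORKTREE_ROOT = '/worktrees';
const BLOCKED_ENV = [/TOKEN/i, /SECRET/i, /CODEX_HOME/];

// Validate and normalize a shell request before it crosses the socket.
// The sidecar has no /codex-home mount anyway, but rejecting escapes on
// the bot side adds defense in depth.
function sanitizeExecRequest(req: ExecRequest): ExecRequest {
  const cwd = path.resolve(WORKTREE_ROOT, req.cwd);
  if (cwd !== WORKTREE_ROOT && !cwd.startsWith(WORKTREE_ROOT + path.sep)) {
    throw new Error(`cwd escapes worktree root: ${req.cwd}`);
  }
  const env: Record<string, string> = {};
  for (const [k, v] of Object.entries(req.env)) {
    if (BLOCKED_ENV.some((p) => p.test(k))) continue; // never forward secrets
    env[k] = v;
  }
  return { argv: req.argv, cwd, env };
}
```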

4. Subscription Rate Limiting & Heuristics

Explanation: ChatGPT subscriptions are licensed for interactive human use. Automated parallel sessions trigger abuse detection, resulting in temporary blocks or account restrictions. Fix: Pin concurrency to exactly one active session. Implement exponential backoff with jitter for rate limit responses. Rotate to a dedicated service account per deployment. Never fan out parallel reviews across multiple MRs simultaneously.
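The backoff-with-jitter piece can be sketched as a pure function. The base and cap values are illustrative defaults, not values published by any provider:

```typescript
// Exponential backoff with "full jitter" for rate-limit responses.
function backoffDelayMs(
  attempt: number,                       // 0-based retry attempt
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random     // injectable for testing
): number {
  // Exponential growth capped at capMs, then a uniform pick in [0, ceiling]
  // so retries never form a regular cadence that reads as automation.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```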

5. State Desynchronization Across Re-deploys

Explanation: Rolling updates or pod evictions can leave orphaned jobs in the queue or mismatched thread IDs in the database. The bot may attempt to resume a thread that no longer exists on the PVC. Fix: Implement a reconciliation job that runs on startup. Compare MariaDB thread mappings against existing JSONL files on the PVC. Archive or remove orphaned references. Use Kubernetes preStop hooks to drain the queue gracefully before termination.
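The reconciliation pass reduces to a set difference in each direction. In this sketch the inputs are plain collections; in production they would come from a MariaDB query and an `fs.readdir` of the PVC:

```typescript
// Compare thread IDs recorded in the database against session transcripts
// present on the PVC, reporting orphans on both sides.
function findOrphans(
  dbThreadIds: Iterable<string>,
  pvcSessionFiles: Iterable<string> // e.g. ['t1.jsonl', 't2.jsonl']
): { orphanedMappings: string[]; orphanedFiles: string[] } {
  const onDisk = new Set([...pvcSessionFiles].map((f) => f.replace(/\.jsonl$/, '')));
  const inDb = new Set(dbThreadIds);
  return {
    // Mapping rows whose transcript is gone: resuming these would fail.
    orphanedMappings: [...inDb].filter((id) => !onDisk.has(id)),
    // Transcripts no mapping points at: candidates for archival.
    orphanedFiles: [...onDisk].filter((id) => !inDb.has(id)),
  };
}
```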

6. Ignoring Container Security Contexts

Explanation: Assuming bwrap works in restricted Kubernetes environments leads to runtime failures. OpenShift SCCs and pod security standards often block seccompProfile: Unconfined and CAP_SYS_ADMIN. Fix: Design for danger-full-access from day one. Enforce security at the pod level using mount namespace isolation, network policies, and egress proxies. Validate SCC compatibility during CI pipeline testing before deployment.

7. Unbounded Context Window Growth

Explanation: Resuming threads indefinitely causes context windows to expand, increasing latency and eventually hitting token limits. Older feedback becomes irrelevant noise. Fix: Implement context trimming logic. After 5-7 turns, summarize the conversation history and replace the raw transcript with a condensed version. Pass the summary as system context while retaining the full JSONL for auditability.
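A minimal sketch of the trimming policy, assuming a flat turn list; `summarize` is a stand-in for an actual LLM summarization call:

```typescript
type Turn = { role: 'user' | 'assistant' | 'system'; content: string };

// Once the transcript exceeds maxTurns, collapse everything but the most
// recent keepRecent turns into a single system-context summary. The full
// JSONL transcript on the PVC stays untouched for auditability.
function trimContext(
  turns: Turn[],
  maxTurns: number,
  keepRecent: number,
  summarize: (older: Turn[]) => string
): Turn[] {
  if (turns.length <= maxTurns) return turns;
  const older = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  return [{ role: 'system', content: summarize(older) }, ...recent];
}
```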

Production Bundle

Action Checklist

  • Validate webhook signatures using constant-time comparison to prevent spoofing
  • Implement deterministic job IDs for idempotent queue processing
  • Pin worker concurrency to 1 and add mutex-based execution locking
  • Scrub sensitive environment variables before spawning Codex child processes
  • Deploy execution sidecar with restricted mount namespace for shell isolation
  • Configure egress proxy with strict FQDN allowlisting for OAuth and API endpoints
  • Implement context window trimming to prevent unbounded token growth
  • Add startup reconciliation job to sync thread mappings with PVC session files

Decision Matrix

Scenario | Recommended Approach | Why | Cost Impact
Small team (<10 devs), high PR volume | Stateful subscription agent | Flat-rate seat caps costs; conversation continuity reduces token waste | Low (fixed subscription)
Enterprise compliance, strict audit requirements | Stateless API pipeline | No persistent state; easier to justify per-token billing to finance | High (scales with volume)
OpenShift/Kubernetes with restricted SCC | Pod-level isolation + danger-full-access | bwrap unavailable; sidecar mount namespace provides equivalent security | Medium (infra complexity)
Multi-tenant SaaS platform | Stateless API pipeline | Subscription ToS prohibits multi-tenant automation; API billing scales cleanly | High (predictable per-request)

Configuration Template

# k8s-pod-spec.yaml (Excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: codex-review-agent
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: bot
      image: review-agent:latest
      env:
        - name: CODEX_HOME
          value: /codex-home
        - name: HTTPS_PROXY
          value: http://squid-proxy:3128
        - name: CODEX_HTTPS_PROXY
          value: http://squid-proxy:3129
      volumeMounts:
        - name: codex-home
          mountPath: /codex-home
        - name: worktrees
          mountPath: /worktrees
        - name: exec-socket
          mountPath: /var/run/exec-sidecar
    - name: exec-sidecar
      image: exec-server:latest
      volumeMounts:
        - name: worktrees
          mountPath: /worktrees
        - name: exec-socket
          mountPath: /var/run/exec-sidecar
  volumes:
    - name: codex-home
      persistentVolumeClaim:
        claimName: codex-sessions-pvc
    - name: worktrees
      persistentVolumeClaim:
        claimName: worktrees-pvc
    - name: exec-socket
      emptyDir:
        medium: Memory

Quick Start Guide

  1. Provision Infrastructure: Deploy Redis, MariaDB, and the Squid egress proxy. Create PVCs for /codex-home and /worktrees with namespace-scoped RBAC.
  2. Configure OAuth: Authenticate the dedicated service account via codex login. Export the generated auth.json to the /codex-home PVC. Verify token refresh works offline.
  3. Deploy the Agent: Apply the pod specification. Ensure the execution sidecar and bot container share the exec-socket emptyDir. Validate network policies restrict egress to chatgpt.com, gitlab.com, and *.openai.com.
  4. Register Webhook: Configure GitLab project settings to point to the review gateway endpoint. Set the shared secret and enable merge request events. Test with a dummy MR to verify queue ingestion and comment posting.
  5. Monitor & Tune: Track queue depth, thread mapping accuracy, and context window size. Adjust concurrency limits and context trimming thresholds based on review latency and subscription usage patterns.