us) | Variable/High ($150+) |
| Gemini CLI + Conductor Orchestration | High (spec-driven, cached) | >80% (TDD enforced) | Automated/Consistent | Low (checkpointed) | ~$30 (optimized caching) |
Key Findings:
- Spec-driven context eliminated redundant prompt engineering and maintained architectural alignment across tracks.
- AI-driven TDD enforced functional verification, pushing test coverage above 80% for all new modules.
- Checkpoint protocol reduced manual review overhead by 70% while maintaining full auditability via git notes.
- Cache optimization leveraged ~66M cached input tokens against ~19M raw inputs, drastically reducing Vertex AI spend.
Core Solution
The implementation relies on a terminal-first, spec-driven orchestration workflow. All specification, planning, and implementation logic resides in the conductor directory of the repository.
Spec-driven development with Conductor
The Conductor extension enforces a spec-driven development model. Instead of generating code immediately, the agent works from a "source of truth" defined in Markdown files:
- Product Definition (product.md): Scope and objectives
- Tech Stack (tech-stack.md): Toolchain and dependencies
- Tracks Registry (tracks.md): Major milestones
- Implementation Plans (plan.md per track): Step-by-step execution tasks
- Workflow (workflow.md): Operational protocols and constraints
This structure ensures the AI agent always operates with high-level architectural context, aligning team members and agents on project direction.
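As a minimal sketch, a pre-flight check could confirm that the spec files listed above exist before an agent session starts. The file names come from the list; placing all four directly under conductor/ is an assumption, and `missing_spec_files` is a hypothetical helper, not part of Conductor.

```python
from pathlib import Path

# File names are taken from the list above; their placement directly under
# conductor/ is an assumption of this sketch.
REQUIRED_FILES = ("product.md", "tech-stack.md", "tracks.md", "workflow.md")


def missing_spec_files(repo_root: str) -> list[str]:
    """Return the required spec files that are absent from <repo_root>/conductor."""
    conductor_dir = Path(repo_root) / "conductor"
    return [name for name in REQUIRED_FILES if not (conductor_dir / name).is_file()]
```

Calling `missing_spec_files(".")` from the repository root returns an empty list when every file is present, which makes it easy to wire into a CI step or a shell alias.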
Conductor initialization
Project initialization is triggered via:
/conductor:setup
Gemini CLI prompts for product name, target users, tech stack, major features, and workflow preferences. It then scaffolds the conductor directory with product.md and tech-stack.md.
The lifecycle of a track
Each major feature is implemented as a "Track" following a strict lifecycle:
- Track Initialization (/conductor:newTrack): Agent creates spec.md, maps the codebase, validates assumptions, and generates plan.md.
- Track Execution (/conductor:implement): Agent iterates through tasks using a Plan -> Act -> Validate cycle.
- Track Completion: Agent verifies changes, requests user feedback, and prepares for archival.
- Track Archival: Completed tracks are moved to conductor/archive.
Example initialization:
/conductor:newTrack
The agent researches the codebase, asks clarifying questions, and produces spec.md and plan.md. Implementation only proceeds after human review and approval.
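The lifecycle, including the human approval gate, can be pictured as a small state machine. The sketch below is illustrative only: the state names and the `advance` helper are hypothetical and do not reflect how Conductor tracks state internally.

```python
from enum import Enum


class TrackState(Enum):
    INITIALIZED = "initialized"    # spec.md and plan.md drafted by the agent
    APPROVED = "approved"          # a human has reviewed the spec and plan
    IMPLEMENTING = "implementing"  # /conductor:implement is in progress
    COMPLETED = "completed"        # changes verified, feedback collected
    ARCHIVED = "archived"          # track moved to conductor/archive


# Each state may only move to the next one; skipping approval is not allowed.
ALLOWED = {
    TrackState.INITIALIZED: {TrackState.APPROVED},
    TrackState.APPROVED: {TrackState.IMPLEMENTING},
    TrackState.IMPLEMENTING: {TrackState.COMPLETED},
    TrackState.COMPLETED: {TrackState.ARCHIVED},
    TrackState.ARCHIVED: set(),
}


def advance(current: TrackState, target: TrackState) -> TrackState:
    """Return the target state, or raise if the transition is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move track from {current.value} to {target.value}")
    return target
```

Modelling approval as its own state makes the rule "implementation only proceeds after human review" explicit: `advance(TrackState.INITIALIZED, TrackState.IMPLEMENTING)` raises a `ValueError`.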
The product.md configuration instructs Gemini CLI to generate Terraform code for all provisioned resources. This ensures consistent, source-controlled infrastructure management. All IaC files and scripts for the initial track are located in the 01-generation/infra directory.
A critical workflow constraint is enforced via workflow.md:
- Always run terraform plan before terraform apply.
This makes every state change reviewable before it is applied and helps prevent drift across tracks.
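One way to make the plan-before-apply rule mechanical is a thin wrapper that saves the plan to a file and applies only that file. This is a sketch under assumptions: `plan_then_apply` and the `tfplan` file name are hypothetical, and the human review that would normally sit between the two steps is not modelled. The `-out` and `-input=false` flags are standard Terraform CLI options.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def plan_then_apply(plan_file: str = "tfplan", run: Runner = _run) -> bool:
    """Run `terraform plan`, then apply the saved plan only if planning succeeded."""
    if run(["terraform", "plan", "-input=false", f"-out={plan_file}"]) != 0:
        return False  # planning failed, so nothing is applied
    # Applying the saved plan file ensures exactly the reviewed changes are made.
    return run(["terraform", "apply", "-input=false", plan_file]) == 0
```

The `run` parameter exists so the ordering can be exercised with a fake runner, without Terraform installed.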
TDD with an AI agent
AI-driven Test-Driven Development follows a strict protocol:
- Write Failing Tests: Agent defines expected behavior in a new test file.
- Red Phase: Runs tests and confirms failure.
- Green Phase: Writes minimum code to pass tests.
- Refactor: Improves clarity, removes duplication, and optimizes performance without altering external behavior.
- Verify Coverage: Ensures >80% coverage for new code.
- Commit Code Changes: Agent commits task-related changes.
This protocol prioritizes functional verification over mere syntactic correctness.
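The red and green phases can be expressed as two small gates around a test runner. The sketch below assumes a Python project tested with pytest and the pytest-cov plugin, which the original does not specify; `red_phase_ok` and `green_phase_ok` are hypothetical helper names.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def red_phase_ok(test_path: str, run: Runner = _run) -> bool:
    """Red phase: the new tests must run and fail before any implementation exists."""
    # pytest exits with 1 when tests were collected and at least one failed.
    # Other non-zero codes (collection errors, no tests found) are deliberately
    # not counted as a valid red phase in this sketch.
    return run(["pytest", test_path]) == 1


def green_phase_ok(test_path: str, package: str, min_coverage: int = 80,
                   run: Runner = _run) -> bool:
    """Green phase: tests pass and coverage of `package` meets the threshold."""
    # --cov and --cov-fail-under come from pytest-cov; the threshold is
    # inclusive, so raise min_coverage if coverage must be strictly above 80%.
    cmd = ["pytest", test_path, f"--cov={package}", f"--cov-fail-under={min_coverage}"]
    return run(cmd) == 0
```

Requiring exit code 1 in the red phase is a design choice: it rejects a "failure" that is really a typo in the test file, at the cost of also rejecting tests that fail at import time.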
Checkpoints and quality gates
At phase completion, the agent executes a Checkpoint protocol:
- Automated Verification: Full test suite execution.
- Manual Verification: Step-by-step user validation instructions.
- Auditable Records: Verification report attached via git notes; plan.md updated with the new commit hash.
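Attaching a report with git notes comes down to two standard git commands. The helper below only builds the command lists; `checkpoint_commands` is a hypothetical name, and the report format the agent actually writes is not shown in the original.

```python
from typing import List


def checkpoint_commands(commit: str, report: str) -> List[List[str]]:
    """Build the git commands that attach a verification report to a commit."""
    return [
        # Attach the report as a note on the given commit.
        ["git", "notes", "add", "-m", report, commit],
        # Read the note back to confirm it was recorded.
        ["git", "notes", "show", commit],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` inside a repository. `git notes add` refuses to overwrite an existing note unless `-f` is given, so `git notes append` may be the better fit when a commit is checkpointed more than once.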
Effective Human-in-the-Loop
Effective collaboration between the developer and the agent relies on:
- Gemini CLI in a sandbox with YOLO mode enabled.
- Custom sandbox notifier script running in a parallel terminal for real-time alerts.
- Checkpointing mechanism enabling safe reverts or restarts from known states.
- Antigravity for polishing generated code and documentation.
Token usage
Model mix: Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Flash Lite.
- Input tokens: ~19M
- Cached input tokens: ~66M
- Output tokens: 400k
Total Vertex AI cost: **$30**. High cache utilization significantly optimizes spend.
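To put the cache figures in perspective, the share of input tokens served from cache can be estimated from the numbers above. This assumes the ~19M input figure excludes the ~66M cached tokens, which the original does not state explicitly; `cached_share` is a hypothetical helper.

```python
# Approximate figures from the list above.
UNCACHED_INPUT_TOKENS = 19_000_000
CACHED_INPUT_TOKENS = 66_000_000


def cached_share(uncached: int, cached: int) -> float:
    """Fraction of all input tokens that were served from the context cache."""
    return cached / (uncached + cached)


share = cached_share(UNCACHED_INPUT_TOKENS, CACHED_INPUT_TOKENS)
print(f"{share:.1%} of input tokens came from cache")  # about 77.6%
```

Under that assumption, roughly three of every four input tokens were cache hits, which is consistent with the low total cost reported.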
Pitfall Guide
- Skipping Spec Validation: Proceeding to implementation without reviewing spec.md and plan.md leads to architectural misalignment and scope creep. Always validate assumptions before /conductor:implement.
- Bypassing Checkpoint Gates: Ignoring the automated/manual verification step breaks the audit trail, increases regression risk, and defeats the purpose of structured orchestration.
- Misconfiguring Sandbox/YOLO Mode: Running AI agents without proper sandboxing or notification scripts can expose credentials and lead to uncontrolled execution. Always pair YOLO mode with isolated environments and alerting.
- Neglecting Cache Optimization: Failing to structure prompts and specs to leverage context caching results in redundant token processing and inflated Vertex AI costs. Maintain consistent spec files to maximize cache hits.
- Inconsistent IaC Enforcement: Allowing manual cloud console changes instead of enforcing terraform plan -> apply breaks reproducibility. Treat all infrastructure modifications as code commits.
- Over-Reliance on AI for Refactoring: Using AI for minor tweaks without tools like Antigravity or manual review can introduce subtle logic errors. Reserve AI for structural generation and use targeted tools for polish.
Deliverables
- Blueprint: conductor/ directory structure containing product.md, tech-stack.md, tracks.md, workflow.md, and per-track spec.md/plan.md files. Includes the full Track lifecycle workflow (Initialization -> Execution -> Completion -> Archival) and TDD/Checkpoint protocols.
- Checklist:
- Configuration Templates: Pre-configured workflow.md constraints, sandbox notifier script integration, and checkpoint commit hooks ready for replication in new repositories.