ommands. The state reflects what the system knows about the page's validity, not just its content.
State Definitions:
draft: Newly ingested or restored. Awaiting structural validation.
active: Passed lint checks. Safe for downstream consumption.
stale: Source file SHA-256 hash changed since last ingest. Content may be outdated.
contradicted: New source material conflicts with existing claims. Requires manual resolution.
archived: Retired from active circulation. Preserved for historical reference.
Automatic Transitions:
null → draft: Triggered by IngestAgent on initial compilation.
draft → active: Triggered by LintAgent after passing structural and consistency checks.
active → stale: Triggered by LintAgent when source hash mismatch is detected.
stale → draft: Triggered by IngestAgent during forced re-ingest.
any → contradicted: Triggered by IngestAgent when conflict detection identifies incompatible claims. Bypasses standard transition API.
Manual Transitions:
synthadoc lifecycle activate <slug>: Promotes draft → active without waiting for scheduled lint.
synthadoc lifecycle archive <slug>: Moves any state to archived.
synthadoc lifecycle restore <slug>: Moves archived → draft, re-queuing for lint.
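The automatic and manual transitions above amount to a small transition table. The sketch below is one way to model it, assuming hypothetical trigger names (ingest, lint_pass, hash_mismatch, reingest, activate, restore, conflict, archive); it is an illustration of the rules as described, not synthadoc's internal implementation.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    STALE = "stale"
    CONTRADICTED = "contradicted"
    ARCHIVED = "archived"


# (trigger, current_state) -> next_state for the explicitly listed transitions.
# None stands for "no page yet". Trigger names are hypothetical labels.
TRANSITIONS = {
    ("ingest", None): State.DRAFT,                 # IngestAgent, initial compilation
    ("lint_pass", State.DRAFT): State.ACTIVE,      # LintAgent, checks passed
    ("hash_mismatch", State.ACTIVE): State.STALE,  # LintAgent, source changed
    ("reingest", State.STALE): State.DRAFT,        # IngestAgent, forced re-ingest
    ("activate", State.DRAFT): State.ACTIVE,       # lifecycle activate <slug>
    ("restore", State.ARCHIVED): State.DRAFT,      # lifecycle restore <slug>
}


def transition(current, trigger):
    """Return the next state, raising ValueError for a move the rules do not list."""
    if current is not None and trigger == "conflict":
        # any -> contradicted: applied unconditionally, mirroring the API bypass
        return State.CONTRADICTED
    if current is not None and trigger == "archive":
        # lifecycle archive <slug>: any state -> archived
        return State.ARCHIVED
    try:
        return TRANSITIONS[(trigger, current)]
    except KeyError:
        raise ValueError(f"{trigger!r} is not allowed from {current}") from None
```

Keeping the "any state" moves outside the table makes it explicit that they skip the per-state checks, which is the behavior the text describes for contradicted.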
Every transition writes an immutable record to the audit database, capturing timestamp, triggering agent/user, and reason string. This ledger enables forensic reconstruction of content evolution.
```bash
# View complete state history for a specific page
synthadoc lifecycle log payment-gateway-v2
```

```text
Slug                 From    To      Triggered By  Timestamp            Reason
----------------------------------------------------------------------------------------------------
payment-gateway-v2   null    draft   ingest        2026-03-10T14:22:01  initial compilation
payment-gateway-v2   draft   active  lint          2026-03-10T14:45:18  structural validation passed
payment-gateway-v2   active  stale   lint          2026-04-02T03:11:44  source hash mismatch detected
payment-gateway-v2   stale   draft   ingest        2026-04-02T09:30:12  forced re-ingest applied
payment-gateway-v2   draft   active  lint          2026-04-02T09:52:07  validation passed post-update
```
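An append-only ledger of this kind can be sketched with Python's built-in sqlite3 module. The table name, column names, and helper functions below are hypothetical; synthadoc's actual audit schema is not shown in this article. SQLite triggers reject UPDATE and DELETE, so rows can only ever be added.

```python
import sqlite3
from datetime import datetime, timezone


def open_ledger(path=":memory:"):
    """Create (or open) a hypothetical transitions table that refuses edits."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS transitions (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            slug         TEXT NOT NULL,
            from_state   TEXT,
            to_state     TEXT NOT NULL,
            triggered_by TEXT NOT NULL,
            ts           TEXT NOT NULL,
            reason       TEXT NOT NULL
        );
        -- Abort any attempt to rewrite or remove history.
        CREATE TRIGGER IF NOT EXISTS no_update BEFORE UPDATE ON transitions
        BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
        CREATE TRIGGER IF NOT EXISTS no_delete BEFORE DELETE ON transitions
        BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
        """
    )
    return conn


def record(conn, slug, from_state, to_state, triggered_by, reason):
    """Append one transition with a UTC timestamp; None maps to SQL NULL."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO transitions (slug, from_state, to_state, triggered_by, ts, reason)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (slug, from_state, to_state, triggered_by, ts, reason),
        )


def history(conn, slug):
    """Return (from, to, triggered_by, reason) rows in insertion order."""
    return conn.execute(
        "SELECT from_state, to_state, triggered_by, reason FROM transitions"
        " WHERE slug = ? ORDER BY id",
        (slug,),
    ).fetchall()
```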
2. Candidates Staging: The Pre-Admission Quarantine
Lifecycle management begins only after a page enters the wiki. Staging controls whether it enters at all. This quarantine layer prevents low-quality or unverified content from polluting search indices, context packs, and exports.
Staging Policies:
off: Direct admission to wiki/ as draft. No quarantine.
all: All new pages route to wiki/candidates/. Requires explicit promotion.
threshold: Pages meeting a minimum confidence rating bypass quarantine. Others route to wiki/candidates/.
Pages in wiki/candidates/ are invisible to search, excluded from lifecycle tracking, omitted from synthadoc status counts, and filtered out of all exports. They exist on disk but hold no operational weight until promoted.
```bash
# Configure threshold-based staging
synthadoc staging policy threshold --min-confidence high

# Review quarantined pages
synthadoc candidates list
```

```text
Candidates (4):
auth-service-migration   confidence: medium   ingested: 2026-05-15
rate-limiter-config      confidence: low      ingested: 2026-05-15
webhook-retry-logic      confidence: high     ingested: 2026-05-15
db-connection-pooling    confidence: medium   ingested: 2026-05-15
```

```bash
# Promote verified content, discard noise
synthadoc candidates promote webhook-retry-logic
synthadoc candidates discard rate-limiter-config
```
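The routing rule behind the three staging policies can be sketched as a pure function. This assumes the confidence levels shown in the listing (low, medium, high) are ordered in that sequence and that paths are relative to the project root; staging_destination and is_operational are hypothetical names, not synthadoc APIs.

```python
from pathlib import PurePosixPath

# Assumed ordering of the confidence labels seen in `synthadoc candidates list`.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def staging_destination(policy, confidence, min_confidence="high"):
    """Return the directory a newly compiled page would be written to."""
    if policy == "off":
        return PurePosixPath("wiki")
    if policy == "all":
        return PurePosixPath("wiki/candidates")
    if policy == "threshold":
        meets = CONFIDENCE_ORDER[confidence] >= CONFIDENCE_ORDER[min_confidence]
        return PurePosixPath("wiki") if meets else PurePosixPath("wiki/candidates")
    raise ValueError(f"unknown staging policy: {policy!r}")


def is_operational(page_path):
    """Candidates exist on disk but should be skipped by search, status and export."""
    return PurePosixPath(page_path).parts[:2] != ("wiki", "candidates")
```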
3. Zero-Cost Provenance Export
Export serialization operates entirely on stored wiki state: no prompts, no completions, no additional LLM calls. Four machine-readable formats are available, each suited to a different downstream consumer. The --status flag restricts serialization to pages in the requested lifecycle state, so only relevant states reach the export.
The JSON format is particularly valuable for instrumentation. It includes sentence-level provenance mapping, complete lifecycle history, and per-page compilation cost. This eliminates the need for custom tracking layers when building downstream tooling or compliance reports.
```json
{
  "slug": "payment-gateway-v2",
  "status": "active",
  "compilation_cost_usd": 0.0018,
  "provenance": [
    {
      "claim": "Webhook retries use exponential backoff with jitter.",
      "source_file": "raw_sources/stripe-docs.md",
      "line_range": [112, 128]
    },
    {
      "claim": "Idempotency keys must be UUIDv4 formatted.",
      "source_file": "raw_sources/api-spec.yaml",
      "line_range": [45, 51]
    }
  ],
  "lifecycle_history": [
    {"from": "draft", "to": "active", "timestamp": "2026-03-10T14:45:18", "actor": "lint"},
    {"from": "active", "to": "stale", "timestamp": "2026-04-02T03:11:44", "actor": "lint"}
  ]
}
```
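Because export is a pure read of stored records, the status filter reduces to a membership test followed by serialization. A minimal sketch, assuming each page is already loaded as a dict shaped like the JSON above; export_json and total_compilation_cost are hypothetical helpers, not synthadoc functions.

```python
import json


def export_json(pages, statuses=("active",)):
    """Serialize stored page records in the given states; no model calls involved."""
    wanted = set(statuses)
    selected = [page for page in pages if page["status"] in wanted]
    return json.dumps(selected, indent=2)


def total_compilation_cost(pages):
    """Sum the stored per-page cost field (missing values count as zero)."""
    return round(sum(page.get("compilation_cost_usd", 0.0) for page in pages), 6)
```

Defaulting to ("active",) matches the recommendation to feed only validated pages to downstream consumers; forensic exports pass other states explicitly.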
Architecture Rationale:
- Orthogonal Design: Staging and lifecycle operate independently. Staging controls admission; lifecycle controls state. This prevents coupling and allows teams to disable quarantine without breaking state tracking.
- Hash-Based Drift Detection: SHA-256 comparison provides deterministic staleness signaling without semantic analysis. It's fast, reproducible, and immune to LLM hallucination.
- Immutable Audit Ledger: Every transition is appended, never overwritten. This supports compliance requirements and enables rollback analysis.
- State-Aware Serialization: Filtering by --status ensures downstream consumers receive only validated content, reducing context window bloat and improving retrieval precision.
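Hash-based drift detection needs nothing beyond the standard library. The sketch below streams a source file through SHA-256 and compares it with the digest recorded at last ingest; file_sha256 and is_stale are hypothetical helper names, and where synthadoc stores the recorded hash is not covered here.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large sources need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_stale(path, recorded_hash):
    """True when the source bytes differ from the hash stored at last ingest."""
    return file_sha256(path) != recorded_hash
```

Any byte-level change flips the result, which is what makes the signal deterministic and also why it cannot tell a typo fix from a rewritten section.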
Pitfall Guide
1. Treating draft as Production-Ready
Explanation: Draft pages have not passed lint validation. They may contain structural inconsistencies, broken references, or incomplete claims.
Fix: Never export or index draft pages. Enforce active status as the minimum threshold for downstream consumption. Use synthadoc lint run to promote drafts systematically.
2. Bypassing Staging for Low-Confidence Ingests
Explanation: Disabling staging or setting an unrealistically low confidence threshold floods the wiki with unverified content. This increases lint queue backlog and pollutes search results.
Fix: Start with the threshold policy and --min-confidence high. Review candidates daily during the first month, and adjust the threshold only after establishing baseline quality metrics.
3. Exporting Unfiltered States to LLM Context Windows
Explanation: Dumping all pages into RAG context windows wastes tokens on stale, contradicted, or archived content. This degrades retrieval accuracy and increases costs.
Fix: Always use --status active for LLM context feeds. Reserve --status contradicted for forensic analysis and --status archived for historical audits.
4. Ignoring the contradicted State
Explanation: Contradicted pages indicate source conflicts. Leaving them unresolved creates ambiguity in downstream queries and breaks trust in the knowledge base.
Fix: Schedule weekly contradiction reviews. Use synthadoc export --status contradicted to isolate conflicts. Archive resolved pages or re-ingest with corrected sources.
5. Assuming Confidence Scores Replace Lint Validation
Explanation: Confidence ratings reflect generation quality, not structural validity. A high-confidence page can still fail lint checks due to formatting errors or missing citations.
Fix: Treat confidence as a staging filter, not a validation substitute. Always require lint passage before active promotion.
6. Misinterpreting Hash Mismatches as Content Errors
Explanation: A SHA-256 mismatch only indicates that the source file was modified, not that the compiled content is factually wrong. Minor formatting changes or comment updates also trigger staleness.
Fix: Pair hash detection with semantic review. Use synthadoc lifecycle log <slug> to trace transition reasons. Re-ingest only when substantive changes are confirmed.
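One inexpensive way to triage hash mismatches before semantic review is to compare a second, whitespace-normalized hash. This is a hypothetical pre-filter of our own, not a synthadoc feature: it only separates pure reflows from everything else, and anything labeled "substantive-candidate" still needs a human or semantic check before re-ingest.

```python
import hashlib
import re


def raw_sha256(text):
    """Exact hash, equivalent to the byte-level staleness check."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalized_sha256(text):
    """Hash after collapsing whitespace runs, so pure reflows compare equal."""
    canonical = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_drift(old_text, new_text):
    """Label a source change as unchanged, formatting-only, or worth reviewing."""
    if raw_sha256(old_text) == raw_sha256(new_text):
        return "unchanged"
    if normalized_sha256(old_text) == normalized_sha256(new_text):
        return "formatting-only"
    return "substantive-candidate"
```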
7. Neglecting Audit Log Rotation and Compliance
Explanation: Immutable audit trails grow indefinitely. Unmanaged logs consume storage and complicate compliance reporting.
Fix: Implement log archival policies. Export audit trails quarterly to cold storage. Retain active logs for 90 days for operational debugging.
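The 90-day split can be expressed as a partition over ledger rows by timestamp. A minimal sketch, assuming rows carry ISO 8601 timestamps like those in the lifecycle log; split_for_retention is a hypothetical helper, and writing the archived rows to cold storage is left out.

```python
from datetime import datetime, timedelta


def split_for_retention(records, now, retention_days=90):
    """Partition ledger rows into (keep_active, move_to_cold_storage).

    Rows are only sorted into two lists, never modified, so the archived
    copy remains a faithful record of the append-only ledger.
    """
    cutoff = now - timedelta(days=retention_days)
    keep, archive = [], []
    for row in records:
        ts = datetime.fromisoformat(row["timestamp"])
        (keep if ts >= cutoff else archive).append(row)
    return keep, archive
```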
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Rapid Prototyping | staging off, export --status active | Minimizes friction during early development. Accepts higher staleness risk for speed. | Low operational overhead, moderate token waste if stale pages leak. |
| Regulated Compliance | staging all, full audit export, weekly contradiction review | Ensures human validation, complete provenance, and defensible state tracking. | Higher manual review cost, zero LLM export overhead, audit-ready. |
| High-Velocity Dev | staging threshold --min-confidence high, daily lint, active-only exports | Balances automation with quality gates. Keeps context windows lean. | Moderate review cost, optimal token efficiency, fast feedback loop. |
| LLM Context Feeding | export --status active to JSON/llms.txt, filter by claim-level provenance | Eliminates stale/contradicted content from retrieval pipelines. Improves accuracy. | Zero additional API cost, reduced context window usage, higher retrieval precision. |
Configuration Template
```yaml
# synthadoc.config.yaml
wiki:
  name: "engineering-knowledge-base"
  root: "./wiki"
  candidates_dir: "./wiki/candidates"

staging:
  policy: "threshold"
  min_confidence: "high"

lifecycle:
  auto_promote_drafts: false
  lint_schedule: "0 2 * * *"  # Daily at 2 AM
  hash_algorithm: "sha256"

export:
  default_format: "json"
  include_provenance: true
  include_lifecycle_history: true
  include_compilation_cost: true
  status_filter: "active"  # Override via CLI flag

audit:
  retention_days: 90
  archive_path: "./audit-archive"
  log_rotation: "quarterly"
```
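Before relying on a template like this, it can help to sanity-check the values against the states and policies described earlier. The sketch assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load); validate_config is a hypothetical helper, and synthadoc's own validation rules are not documented here.

```python
import hashlib

VALID_POLICIES = {"off", "all", "threshold"}
VALID_CONFIDENCE = {"low", "medium", "high"}
VALID_STATES = {"draft", "active", "stale", "contradicted", "archived"}


def validate_config(cfg):
    """Return a list of problems found in an already-parsed config mapping."""
    problems = []
    staging = cfg.get("staging", {})
    if staging.get("policy") not in VALID_POLICIES:
        problems.append(f"staging.policy must be one of {sorted(VALID_POLICIES)}")
    if staging.get("policy") == "threshold" and staging.get("min_confidence") not in VALID_CONFIDENCE:
        problems.append("staging.min_confidence is required for the threshold policy")
    algo = cfg.get("lifecycle", {}).get("hash_algorithm", "sha256")
    if algo not in hashlib.algorithms_available:
        problems.append(f"lifecycle.hash_algorithm {algo!r} is not available")
    status = cfg.get("export", {}).get("status_filter", "active")
    if status not in VALID_STATES:
        problems.append(f"export.status_filter {status!r} is not a lifecycle state")
    return problems
```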
Quick Start Guide
- Initialize the wiki structure: Run synthadoc init --name my-project to generate the base directory layout and default configuration.
- Configure staging and lint: Edit synthadoc.config.yaml to set staging.policy: threshold and staging.min_confidence: high. Verify that the lint schedule matches your ingestion cadence.
- Ingest and review: Run synthadoc ingest --source ./raw_sources. Check quarantined pages with synthadoc candidates list, then promote verified content using synthadoc candidates promote <slug>.
- Validate and export: Run synthadoc lint run to promote drafts to active. Export verified content for downstream consumption with synthadoc export --format json --status active --output ./exports/wiki.json.