Manifest-First Indie Dev: Why I Replaced All My INDEX.md Files with YAML Frontmatter

Current Situation Analysis

Indie developers and solo technical operators typically manage 30–50+ markdown files across directories like reports/, products/, and INBOX/. The traditional approach relies on a manually maintained INDEX.md to catalog assets. This pattern fails rapidly due to three core failure modes:

Metadata Decoupling & Stale Indexes: INDEX.md lives outside the asset. Any file rename, move, or status update requires manual synchronization. Within 48 hours, the index drifts from reality, forcing daily maintenance overhead (~30 min/day).
Shallow Auto-Generation: Filename-based auto-indexing scripts eliminate manual work but strip semantic context. Filenames cannot encode priority, ETA, revenue forecasts, execution commands, or dependency graphs. The resulting index is structurally flat and programmatically useless.
Query & View Rigidity: Traditional indexes are static text. Extracting programmatic views (e.g., P0 todos ordered by eta_min, or dependency chains for launch blockers) requires custom parsing logic per request. There is no single source of truth, making dashboards, revenue tracking, and cross-referencing brittle.

A manually maintained index holds up to roughly 20–30 files. Beyond that, manual curation becomes a tax on shipping velocity, while auto-generated indexes lack the semantic depth required for operational decision-making.

WOW Moment: Key Findings

Approach	Daily maintenance	Semantic fields per file	Drift and queryability
Manual INDEX.md	30–45 min/day	1–2	~60% sync drift after 1 week
Filename Auto-INDEX	~5 min/day (script run)	0	~95% structural drift; no query capability
Manifest-First (YAML + Scanner)	0 min/day (views generated on demand)	13	0% drift; instant programmatic queries

Key Findings:

Overhead Elimination: Shifting metadata into YAML frontmatter and generating views on-demand via a 200-line Python scanner eliminates daily index maintenance entirely.
Semantic Density: 13 standardized fields (5 required, 8 optional) cover 95% of indie asset metadata, enabling revenue forecasting, execution routing, and dependency mapping.
ROI Timeline: About 9 hours of upfront work (schema design, scanner build, and manual or helper-assisted migration) saves ~30 min/day. At that rate the investment pays for itself in about 18 days (9 h ÷ 0.5 h/day), well within a month.
Architectural Sweet Spot: Optimal for solo/indie scale (1–3 authors, fewer than 100 assets). Multi-person teams require strict schema governance to avoid merge conflicts and validation drift.

Core Solution

The manifest-first pattern treats every file as a self-describing node. Metadata is embedded as YAML frontmatter, parsed by a lightweight scanner, and rendered into dynamic dashboard views.

1. Schema Design (5 Required + 8 Optional)

The schema enforces a consistent contract while remaining extensible. New categories or fields do not break existing assets; the scanner only consumes what is defined.

---
id: ios-pricing-decision         # URL-safe slug, unique
title: iOS Pricing Decision Report
category: ios-pricing             # one of N enums
priority: P0                      # P0 / P1 / P2 / P3
status: ready                     # ready / scaffold / draft / done / archived
# Optional below
eta_min: 30                       # estimated user-action time, in minutes
revenue_usd_month: "200-500"      # forecast monthly revenue contribution (USD range)
actions: [preview, copy-clipboard]  # dashboard buttons to surface
tags: [ios, pricing, paywall]     # free-form
ice_score: 6.48                   # Impact × Confidence × Ease
tier_price_usd: 19.0              # if it's a paid SKU
command: python orchestrator/foo.py  # if runnable
depends_on: [other-asset-id]      # graph relationships
live_url: https://gumroad.com/...   # if it's been published
---
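The 5-field contract can be enforced with a small validation sketch. It runs on a parsed manifest, meaning the plain dict a YAML loader would return. The priority and status sets mirror the comments in the schema above.

Several names are hypothetical and are here only for illustration: validate_manifest, CATEGORY_TRAITS, and the lowercase slug regex. Only the rule that gumroad-sku requires tier_price_usd comes from this article.

```python
import re

REQUIRED_FIELDS = ("id", "title", "category", "priority", "status")
PRIORITIES = {"P0", "P1", "P2", "P3"}
STATUSES = {"ready", "scaffold", "draft", "done", "archived"}
# Assumption: "URL-safe slug" = lowercase letters/digits joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Hypothetical per-category traits: extra fields a category must carry.
# Only gumroad-sku -> tier_price_usd is taken from the article.
CATEGORY_TRAITS: dict[str, set[str]] = {"gumroad-sku": {"tier_price_usd"}}


def validate_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems with a parsed manifest; [] means valid."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if not manifest.get(name)]
    slug = manifest.get("id")
    if slug and not SLUG_RE.fullmatch(str(slug)):
        problems.append(f"id is not a URL-safe slug: {slug!r}")
    priority = manifest.get("priority")
    if priority is not None and priority not in PRIORITIES:
        problems.append(f"invalid priority: {priority!r}")
    status = manifest.get("status")
    if status is not None and status not in STATUSES:
        problems.append(f"invalid status: {status!r}")
    # Category-specific required fields (trait-based validation)
    for name in sorted(CATEGORY_TRAITS.get(manifest.get("category"), set())):
        if manifest.get(name) is None:
            problems.append(f"category {manifest['category']!r} requires field: {name}")
    return problems
```

The function returns a list instead of raising. That means the same check could feed scanner warnings and could also back a pre-commit hook that exits non-zero when any file reports problems.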

2. Scanner Architecture (200-line Python)

The scanner traverses the directory tree, filters excluded paths, parses frontmatter, and maps raw YAML to a strict @dataclass. It supports .md, .py (docstring manifests), .sh (comment blocks), and .yaml (directory-level manifests).

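import os
from dataclasses import dataclass, field
from pathlib import Path

ROOT = Path(".")  # directory tree to scan (example value; set to your repo root)
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}  # example exclusions


@dataclass
class Asset:
    # Required frontmatter fields
    id: str
    title: str
    category: str
    priority: str
    status: str
    # Filled in by the scanner from the file itself
    path: str
    file_type: str
    # Optional frontmatter fields (the X | None syntax needs Python 3.10+)
    eta_min: int | None = None
    revenue_usd_month: str | None = None
    actions: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    ice_score: float | None = None
    tier_price_usd: float | None = None
    command: str | None = None
    depends_on: list[str] = field(default_factory=list)
    live_url: str | None = None


def scan_assets() -> list[Asset]:
    """Walk ROOT and return an Asset for every file that carries a valid manifest."""
    discovered = []
    for current_dir, dirnames, filenames in os.walk(ROOT):
        # Pruning dirnames in place stops os.walk from descending into skipped dirs
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in filenames:
            file_path = Path(current_dir) / filename
            # parse_frontmatter() and manifest_to_asset() are scanner helpers not shown here
            manifest = parse_frontmatter(file_path)
            if manifest:
                asset = manifest_to_asset(manifest, file_path)
                if asset:
                    discovered.append(asset)
    return discovered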

3. Dashboard & View Generation

The scanner output feeds a stateless rendering layer that generates three primary panels on-demand:

TODO Panel: Shows assets with category: todo or priority: P0 whose status is not done, ordered by eta_min.
Check Panel: Aggregates category: idea and category: roadmap for review cycles.
Run Panel: Surfaces assets with a command: field, enabling one-click execution via Flask/cron.
Cross-Cutting Views: Categories grid, LIVE panel (auto-extracts live_url), and Revenue tracker (pulls MRR per channel via a dedicated aggregator). Dependency graphs are built by resolving depends_on references at scan time.

Pitfall Guide

Schema Versioning Neglect: YAML frontmatter lacks native versioning. As the schema evolves, older assets may fail validation or misalign with new scanner logic. Best Practice: Add schema_version: 1 to new assets and implement scanner-side migration functions that normalize legacy fields before mapping.
Orphaned Dependency References: depends_on relies on raw string IDs. Typos or deleted assets create silent graph breaks. Best Practice: Implement a cross-asset health check during the scan phase that validates all depends_on targets exist in the discovered asset list and logs warnings for unresolved references.
Category Trait Enforcement Gaps: Relying on convention for required fields per category leads to inconsistent data (e.g., missing tier_price_usd for gumroad-sku). Best Practice: Build a trait-based validation layer that enforces field requirements dynamically based on the category enum, failing fast on schema violations.
Scanner Performance Degradation: Re-walking deep or large trees with os.walk on every request causes I/O bottlenecks and re-parses YAML that has not changed. Best Practice: Strictly maintain SKIP_DIRS, implement file-type whitelisting, and cache parsed manifests in memory or a lightweight SQLite/JSON store to avoid re-parsing unchanged files on subsequent runs.
Multi-User Schema Drift: This pattern assumes solo/indie scale. Team collaboration introduces merge conflicts, inconsistent field usage, and schema disagreements. Best Practice: Restrict to single-owner repos or enforce a strict PR-based schema review process with pre-commit hooks that validate frontmatter against the canonical schema.
Malformed Frontmatter Breakage: Invalid YAML syntax or incorrect indentation either crashes the scanner or makes parse_frontmatter return None, silently dropping the asset. Best Practice: Wrap parse_frontmatter in try/except blocks, log parsing errors with file paths, and degrade gracefully so one broken file doesn't crash the entire dashboard render.

Deliverables

Blueprint: MANIFEST_SCHEMA.md — Complete field definitions, type constraints, enum values, and scanner architecture diagram. Maps the 13-field contract to dashboard rendering logic and dependency resolution.
Checklist: Migration & Validation Protocol — Step-by-step guide for converting legacy INDEX.md repos: schema documentation, scanner implementation, manual/helper migration, cross-reference validation, and dashboard deployment. Includes break-even calculation template.
Configuration Templates:
- YAML Frontmatter Boilerplate (ready-to-paste header with all 13 fields)
- Scanner Config (ROOT, SKIP_DIRS, file-type filters, caching layer setup)
- Dashboard View Definitions (Flask route mappings for TODO/Check/Run panels, cron job templates for revenue aggregation)
Source Artifacts:
- Schema & Documentation: github.com/jiejuefuyou/autoapp-toolkit/blob/main/MANIFEST_SCHEMA.md
- Scanner & Dashboard Backend: github.com/jiejuefuyou/autoapp-toolkit/blob/main/dashboard/app.py
- Production-Ready Package: AutoApp Dashboard (Flask backend + 3-panel UI + cron cleanup) available under MIT-friendly licensing for buyer use.