I Audited 30 llms.txt Files in the Wild. 5 Anti-Patterns Are Already Forming.
Engineering llms.txt for Agent Consumption: A Production-Grade Implementation Guide
Current Situation Analysis
The llms.txt standard has transitioned from a proposal to a de facto requirement for developer-facing infrastructure. Data from May 2026 indicates approximately 844,000 domains have adopted the format, reflecting a 500% year-over-year growth trajectory. Despite this rapid adoption, implementation quality remains critically low.
An audit of 30 production domains across AI labs, infrastructure providers, and developer tooling revealed that 80% contained structural flaws that degrade agent utility. The primary failure mode is a category error: engineering teams frequently treat llms.txt as a secondary sitemap or a marketing landing page rather than a structured index for machine consumption.
This misalignment creates immediate technical debt. When a file exceeds the recommended 10KB threshold or lists URLs blocked by robots.txt, it wastes context window budget and confuses retrieval pipelines. While major model providers have not issued formal endorsements, the consumption layer is already adapting. IDE agents such as Cursor, Cline, and Continue, alongside AI search engines like Perplexity and ChatGPT Search, actively parse these files. The cost of a robust implementation is negligible compared to the risk of training agents to ignore your domain due to poor signal-to-noise ratios.
WOW Moment: Key Findings
The divergence between a naive implementation and an optimized one is measurable across three dimensions: context efficiency, parse success rate, and maintenance overhead. The highest-leverage optimization is the .md companion pattern, which bridges the gap between discovery and clean content ingestion.
| Implementation Strategy | Context Efficiency | Agent Parse Success | Maintenance Overhead |
|---|---|---|---|
| Flat Sitemap Dump | Low (High token waste) | Moderate (HTML noise) | Low (Initial) / High (Staleness) |
| Curated + .md Twins | High (Focused tokens) | High (Clean Markdown) | Medium (Build integration) |
| Per-Product Sharding | Very High (On-demand fetch) | Very High | High (Orchestration) |
Why this matters: The .md companion pattern, originally proposed by Jeremy Howard, allows agents to fetch clean Markdown without executing JavaScript or parsing navigation clutter. Only 20% of audited domains implemented this. Teams that adopted .md twins saw a significant reduction in parsing errors and improved retrieval accuracy for documentation-heavy queries.
Core Solution
The most reliable approach to llms.txt is automation. Hand-curated files inevitably suffer from staleness and human error. The recommended architecture involves a typed configuration source, a build-time generator, and validation against crawl policies.
Architecture Decisions
- Source of Truth: Define links and metadata in a TypeScript configuration file. This enables type safety and integration with existing build pipelines.
- Build-Time Generation: Generate llms.txt during the CI/CD process. This ensures the file always reflects the current state of the documentation.
- Markdown Companion Generation: Implement a mechanism to serve clean Markdown at a .md suffix. This can be achieved via build-time file generation or edge function routing.
- Robots.txt Validation: Include a validation step that cross-references llms.txt URLs against robots.txt rules for known AI crawlers.
Implementation: TypeScript Generator
The following example demonstrates a generator script that reads a configuration, enforces link limits, and outputs a compliant llms.txt.
llms.config.ts

```typescript
export interface LLMsLink {
  title: string;
  url: string;
  description: string;
  priority: 'high' | 'medium' | 'low';
}

export interface LLMsConfig {
  siteName: string;
  summary: string;
  primaryLinks: LLMsLink[];
  optionalLinks?: LLMsLink[];
  products?: {
    name: string;
    file: string;
    description: string;
  }[];
}

export const config: LLMsConfig = {
  siteName: "DevKit Platform",
  summary: "Comprehensive documentation and API references for the DevKit infrastructure suite.",
  primaryLinks: [
    {
      title: "Getting Started",
      url: "/docs/getting-started",
      description: "Installation and quickstart guide for new users.",
      priority: "high",
    },
    {
      title: "API Reference",
      url: "/docs/api",
      description: "Complete REST and GraphQL API documentation.",
      priority: "high",
    },
    {
      title: "Authentication",
      url: "/docs/auth",
      description: "OAuth2 and API key management guides.",
      priority: "medium",
    },
  ],
  optionalLinks: [
    {
      title: "Migration Guide v2",
      url: "/docs/migration-v2",
      description: "Steps to migrate from version 1 to version 2.",
      priority: "low",
    },
  ],
  products: [
    {
      name: "DevKit Compute",
      file: "/llms-compute.txt",
      description: "Documentation specific to the compute engine.",
    },
  ],
};
```
scripts/generate-llms.ts

```typescript
import fs from 'fs';
import path from 'path';
import { config } from '../llms.config';

const MAX_PRIMARY_LINKS = 20;
const MAX_FILE_SIZE_KB = 10;

// Append the .md suffix so agents fetch the clean Markdown companion.
function generateMarkdownUrl(url: string): string {
  return `${url}.md`;
}

// Fail the build when the primary section grows past the cap.
function validateLinks(links: typeof config.primaryLinks): void {
  if (links.length > MAX_PRIMARY_LINKS) {
    throw new Error(
      `Primary links exceed limit of ${MAX_PRIMARY_LINKS}. Current: ${links.length}. Move excess to optionalLinks.`
    );
  }
}

// Render the H1, blockquote summary, and link sections.
function buildContent(): string {
  validateLinks(config.primaryLinks);

  let content = `# ${config.siteName}\n\n`;
  content += `> ${config.summary}\n\n`;
  content += `## Primary Resources\n\n`;
  config.primaryLinks.forEach((link) => {
    const mdUrl = generateMarkdownUrl(link.url);
    content += `- [${link.title}](${mdUrl}): ${link.description}\n`;
  });

  if (config.optionalLinks && config.optionalLinks.length > 0) {
    // Named "Optional" to match the "## Optional" section recommended in the Pitfall Guide.
    content += `\n## Optional\n\n`;
    config.optionalLinks.forEach((link) => {
      const mdUrl = generateMarkdownUrl(link.url);
      content += `- [${link.title}](${mdUrl}): ${link.description}\n`;
    });
  }

  if (config.products && config.products.length > 0) {
    content += `\n## Product Documentation\n\n`;
    config.products.forEach((product) => {
      content += `- [${product.name}](${product.file}): ${product.description}\n`;
    });
  }

  return content;
}

function writeToFile(content: string): void {
  const outputPath = path.join(process.cwd(), 'public', 'llms.txt');
  // Create public/ if it does not exist yet, so a clean checkout does not fail.
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  fs.writeFileSync(outputPath, content, 'utf-8');

  const sizeBytes = Buffer.byteLength(content, 'utf-8');
  const sizeKB = sizeBytes / 1024;
  if (sizeKB > MAX_FILE_SIZE_KB) {
    console.warn(`Warning: llms.txt size (${sizeKB.toFixed(2)}KB) exceeds recommended ${MAX_FILE_SIZE_KB}KB.`);
  }
  console.log(`Generated llms.txt at ${outputPath} (${sizeKB.toFixed(2)}KB)`);
}

// Execution
const content = buildContent();
writeToFile(content);
```
Rationale
- Type Safety: The configuration interface prevents malformed entries and ensures descriptions are always present.
- Link Limits: The validator enforces the 20-link cap, forcing teams to prioritize content. Excess links must be moved to optionalLinks or sharded.
- Markdown Suffix: The generateMarkdownUrl function automatically appends .md, encouraging the companion pattern without manual effort.
- Sharding Support: The products array enables the Cloudflare pattern, where a root file links to per-product files, keeping individual files small and focused.
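The sharding idea can be sketched as a function that renders a root index plus one file per product. This is a minimal illustration, not part of the generator above: the ProductShard shape, the renderShards helper, and the "## Docs" section name are all assumptions.

```typescript
// Hypothetical shapes for illustration only.
interface ShardLink {
  title: string;
  url: string;
  description: string;
}

interface ProductShard {
  name: string;
  file: string; // output path of the per-product file, e.g. "/llms-compute.txt"
  description: string;
  links: ShardLink[];
}

// Returns output path -> file content: one root index plus one file per product.
function renderShards(
  siteName: string,
  summary: string,
  shards: ProductShard[]
): Record<string, string> {
  const outputs: Record<string, string> = {};
  let root = `# ${siteName}\n\n> ${summary}\n\n## Product Documentation\n\n`;

  for (const shard of shards) {
    // The root file only points at the shard; the shard carries the page links.
    root += `- [${shard.name}](${shard.file}): ${shard.description}\n`;

    let body = `# ${shard.name}\n\n> ${shard.description}\n\n## Docs\n\n`;
    for (const link of shard.links) {
      body += `- [${link.title}](${link.url}.md): ${link.description}\n`;
    }
    outputs[shard.file] = body;
  }

  outputs["/llms.txt"] = root;
  return outputs;
}

const shardFiles = renderShards("DevKit Platform", "Docs for the DevKit suite.", [
  {
    name: "DevKit Compute",
    file: "/llms-compute.txt",
    description: "Compute engine docs.",
    links: [
      { title: "Quickstart", url: "/docs/compute/quickstart", description: "First deployment." },
    ],
  },
]);
```

An agent that only needs compute documentation can then fetch /llms-compute.txt and skip the rest, which is what keeps each file small.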
Pitfall Guide
1. The Sitemap Trap
Explanation: Treating llms.txt as a comprehensive index of all pages. Files with hundreds of links waste context window budget and dilute the signal for high-value content.
Fix: Enforce a hard limit of 10 to 20 primary links. Use ## Optional for secondary content. For large documentation sets, shard by product or topic.
2. Robots.txt Inconsistency
Explanation: Listing URLs in llms.txt that are blocked by robots.txt for AI crawlers. This creates a contradiction where the index points to inaccessible resources.
Fix: Implement a CI check that diffs llms.txt URLs against robots.txt rules for agents like GPTBot and ClaudeBot. Ensure all listed paths are allowed.
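One way such a CI check might look is sketched below. It is deliberately simplified: rules are matched by path prefix only (no `*` or `$` wildcards), the longest matching rule wins, and links are assumed to be root-relative paths like those in the config above. The function names parseRobots, isAllowed, and findBlocked are hypothetical.

```typescript
interface RobotsRule {
  allow: boolean;
  path: string;
}

// Parse robots.txt into per-agent rule lists (agent names lowercased).
function parseRobots(robotsTxt: string): Map<string, RobotsRule[]> {
  const groups = new Map<string, RobotsRule[]>();
  let agents: string[] = [];
  let sawRule = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // A User-agent line after rules starts a new group.
      if (sawRule) {
        agents = [];
        sawRule = false;
      }
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
    } else if (field === "allow" || field === "disallow") {
      sawRule = true;
      if (value === "") continue; // an empty value matches nothing
      for (const agent of agents) {
        groups.get(agent)?.push({ allow: field === "allow", path: value });
      }
    }
  }
  return groups;
}

// Use the agent's own group if present, otherwise fall back to "*".
function isAllowed(groups: Map<string, RobotsRule[]>, agent: string, urlPath: string): boolean {
  const rules = groups.get(agent.toLowerCase()) ?? groups.get("*") ?? [];
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!urlPath.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = !!best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true;
}

// Collect "agent: path" entries for every llms.txt link an agent may not fetch.
function findBlocked(llmsTxt: string, robotsTxt: string, agents: string[]): string[] {
  const groups = parseRobots(robotsTxt);
  const blocked: string[] = [];
  const linkPattern = /\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(llmsTxt)) !== null) {
    for (const agent of agents) {
      if (!isAllowed(groups, agent, match[1])) blocked.push(`${agent}: ${match[1]}`);
    }
  }
  return blocked;
}

const sampleRobots = "User-agent: GPTBot\nDisallow: /docs/internal\n\nUser-agent: *\nDisallow:\n";
const sampleLlms =
  "# DevKit\n\n- [API](/docs/api.md): API reference\n- [Internal](/docs/internal/notes.md): Notes\n";
const blockedEntries = findBlocked(sampleLlms, sampleRobots, ["GPTBot", "ClaudeBot"]);
```

In CI, a non-empty result would fail the build. A production check would likely use a full robots.txt parser to handle wildcards and absolute URLs.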
3. HTML Parsing Barrier
Explanation: Providing only HTML URLs forces agents to parse navigation, ads, and scripts. This increases token usage and introduces noise.
Fix: Implement .md companions. For static sites, generate Markdown files at build time. For dynamic sites, use an edge function to return serialized Markdown on .md requests.
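The edge-function variant could be sketched as below. To stay runtime-agnostic it uses plain objects instead of a specific platform's Request and Response types, and it assumes a hypothetical markdownStore that maps each documentation route to Markdown already serialized by a build step or CMS export. A real deployment would adapt handleMarkdownRequest to its edge platform's handler signature.

```typescript
// Hypothetical response shape; real edge runtimes provide their own Response type.
interface EdgeResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Returns null for non-.md paths so the normal HTML handler can take over.
function handleMarkdownRequest(
  pathname: string,
  markdownStore: Map<string, string>
): EdgeResponse | null {
  if (!pathname.endsWith(".md")) return null;

  // "/docs/api.md" -> "/docs/api"
  const route = pathname.slice(0, -".md".length);
  const markdown = markdownStore.get(route);

  if (markdown === undefined) {
    return {
      status: 404,
      headers: { "content-type": "text/plain; charset=utf-8" },
      body: "Not found",
    };
  }
  return {
    status: 200,
    headers: { "content-type": "text/markdown; charset=utf-8" },
    body: markdown,
  };
}

const markdownStore = new Map([["/docs/api", "# API Reference\n\nEndpoints and schemas."]]);
const mdHit = handleMarkdownRequest("/docs/api.md", markdownStore);
const mdMiss = handleMarkdownRequest("/docs/missing.md", markdownStore);
const htmlPassThrough = handleMarkdownRequest("/docs/api", markdownStore);
```

Returning null for ordinary paths keeps the HTML site untouched, so the companion route can be added without changing existing pages.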
4. Narrative Contamination
Explanation: Including marketing copy, mission statements, or founder quotes in the file body. LLMs require structured pointers, not prose.
Fix: Restrict the file structure to H1, blockquote summary, and link lists. Remove all narrative content. The summary should be a functional description, not a pitch.
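A structural lint along these lines can catch narrative content before it ships. The sketch below applies this guide's rule, and additionally allows H2 section headings because the generator emits them. The lintStructure name and the exact link pattern are assumptions.

```typescript
// Flags any non-empty line that is not an H1, H2, blockquote, or link list item.
function lintStructure(llmsTxt: string): string[] {
  const problems: string[] = [];
  // Matches "- [Title](url)" with an optional ": description" suffix.
  const linkItem = /^- \[[^\]]+\]\([^)\s]+\)(: .+)?$/;
  let h1Count = 0;

  llmsTxt.split(/\r?\n/).forEach((line, index) => {
    const lineNumber = index + 1;
    if (line.trim() === "") return;
    if (line.startsWith("# ")) {
      h1Count += 1;
      if (h1Count > 1) problems.push(`line ${lineNumber}: more than one H1`);
      return;
    }
    if (line.startsWith("## ") || line.startsWith("> ") || linkItem.test(line)) return;
    problems.push(`line ${lineNumber}: unexpected content (prose or malformed link)`);
  });

  if (h1Count === 0) problems.push("missing H1 title");
  return problems;
}

const cleanFile =
  "# DevKit\n\n> Docs index.\n\n## Primary Resources\n\n- [API](/docs/api.md): API reference\n";
const contaminatedFile = cleanFile + "\nWe are on a mission to delight developers.\n";

const cleanProblems = lintStructure(cleanFile);
const contaminatedProblems = lintStructure(contaminatedFile);
```

Here cleanProblems is empty, while contaminatedProblems reports the marketing sentence on line 9.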
5. Staleness Decay
Explanation: Files that are shipped once and never updated. Links rot, product names change, and versions become obsolete.
Fix: Automate generation. Integrate with analytics to rotate "featured" links quarterly. Add a CI step that validates URLs return 200 status codes.
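The URL validation step might be sketched as follows. The status lookup is injected as a function so the logic can be exercised without network access. The findDeadLinks name, the StatusFetcher type, and the https://example.com base URL are assumptions for illustration.

```typescript
// Resolves a URL to its HTTP status code; injected so the check is testable offline.
type StatusFetcher = (url: string) => Promise<number>;

// Returns "url (status)" for every llms.txt link that does not answer with 200.
async function findDeadLinks(
  llmsTxt: string,
  baseUrl: string,
  getStatus: StatusFetcher
): Promise<string[]> {
  const targets: string[] = [];
  const linkPattern = /\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(llmsTxt)) !== null) {
    const link = match[1];
    // Absolute links are used as-is; root-relative links are joined to the base URL.
    targets.push(/^https?:\/\//.test(link) ? link : baseUrl.replace(/\/$/, "") + link);
  }

  const results = await Promise.all(
    targets.map(async (url) => ({
      url,
      // Treat network failures as status 0 so they are reported as dead links.
      status: await getStatus(url).catch(() => 0),
    }))
  );

  return results.filter((r) => r.status !== 200).map((r) => `${r.url} (${r.status})`);
}

// Stub fetcher standing in for real HTTP requests.
const stubStatus: StatusFetcher = async (url) => (url.endsWith("/docs/old.md") ? 404 : 200);

const staleSample =
  "# DevKit\n\n- [API](/docs/api.md): API reference\n- [Old](/docs/old.md): Removed page\n";
const deadLinksPromise = findDeadLinks(staleSample, "https://example.com", stubStatus);
```

In CI, getStatus could wrap the runtime's HTTP client, for example a HEAD request through the global fetch where it is available, and a non-empty result would fail the job.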
Production Bundle
Action Checklist
- Define llms.config.ts with typed links and metadata.
- Implement a build-time generator script to output llms.txt.
- Enforce a maximum of 20 primary links; move excess to optionalLinks.
- Add .md suffix generation for all primary URLs.
- Create a CI validation step to check robots.txt alignment.
- Remove all marketing copy; retain only H1, summary, and links.
- Configure sharding for multi-product domains using the products array.
- Schedule quarterly reviews to update links based on analytics data.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single Product, <50 Pages | Monolithic llms.txt | Simplicity outweighs sharding complexity. | Low |
| Multi-Product, >200 Pages | Sharded llms.txt | Prevents file bloat; enables targeted fetching. | Medium |
| High Churn Documentation | Dynamic Generation | Ensures freshness; reduces manual maintenance. | Medium |
| Legacy CMS | Edge Function .md | Avoids build-time complexity; serves clean content on demand. | Low |
Configuration Template
Copy this template to initialize your llms.txt configuration. Adjust the primaryLinks to reflect your most critical documentation.
```typescript
// llms.config.ts
// Assumes the LLMsLink and LLMsConfig interfaces shown earlier are declared above in this file.
export const config: LLMsConfig = {
  siteName: "Your Product Name",
  summary: "A concise, functional description of your platform and documentation scope.",
  primaryLinks: [
    {
      title: "Core Feature Guide",
      url: "/docs/core-feature",
      description: "Detailed guide for the primary use case.",
      priority: "high",
    },
    // Add up to 20 links max
  ],
  optionalLinks: [
    {
      title: "Advanced Configuration",
      url: "/docs/advanced",
      description: "Deep dive into configuration options.",
      priority: "low",
    },
  ],
  // Use products array for sharding
  products: [],
};
```
Quick Start Guide
- Install Dependencies: Ensure your project supports TypeScript and file system operations.
- Create Config: Add llms.config.ts to your project root with your site metadata and links.
- Add Generator: Copy scripts/generate-llms.ts and update imports to match your config.
- Integrate Build: Add the generator script to your build pipeline (e.g., npm run build or a CI step).
- Verify: Run the build and check public/llms.txt for correct formatting, link limits, and .md suffixes. Validate against robots.txt.
