Four small CLIs to make your site visible to AI engines
Most of the GEO/SEO tooling on the market right now reads like it was written to sell a course, not to solve a problem.
So I wrote four tools instead.
Four Node CLIs, zero runtime dependencies, MIT, each one does one thing. They all live under the @geosuite scope on npm, and the source is at github.com/TryGeoSuite.
Here's what they do, and the design call behind each one.
1. @geosuite/ai-crawler-bots
What it does: tells you whether GPTBot, ClaudeBot, PerplexityBot, and ~20 other AI crawlers can actually reach your site, and where the block is coming from when they can't.
npx @geosuite/ai-crawler-bots robots https://your-site.com
The non-obvious part: when a request comes back 403, the result distinguishes between an edge block (Cloudflare / CloudFront / Vercel / Akamai / Fastly / Netlify fingerprint in the response) and an origin block (no such fingerprint, so the block comes from your application or web server). The remediation is different in each case: edge means flipping a toggle in your CDN dashboard, origin means updating a config.
It also parses robots.txt with line-level provenance, so when a bot is Disallowed it tells you which line in which group did it. And it detects the # BEGIN Cloudflare Managed content … # END Cloudflare Managed Content markers Cloudflare injects when "Block AI Bots" is enabled: if your own rules would have allowed the bot but the managed block disallows it, the report says so.
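The provenance idea is simple to sketch. This is an illustrative reimplementation in vanilla Node (function names mine, wildcard handling omitted), not the package's actual code:

```javascript
// Sketch: parse robots.txt while remembering the line number of every rule,
// then report which line disallows a given user-agent for a given path.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  text.split(/\r?\n/).forEach((raw, i) => {
    const line = raw.replace(/#.*/, "").trim();
    const m = /^(user-agent|allow|disallow)\s*:\s*(.*)$/i.exec(line);
    if (!m) return;
    const key = m[1].toLowerCase();
    const value = m[2].trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group, per the REP spec.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (current) {
      current.rules.push({ key, value, lineNo: i + 1 });
      lastWasAgent = false;
    }
  });
  return groups;
}

// Which rule blocks `ua` for `path`? Longest match wins per the REP spec;
// this sketch keeps the provenance idea and skips * / $ wildcard handling.
function blockingLine(groups, ua, path) {
  const g =
    groups.find((g) => g.agents.includes(ua.toLowerCase())) ||
    groups.find((g) => g.agents.includes("*"));
  if (!g) return null;
  let best = null;
  for (const r of g.rules) {
    if (r.value && path.startsWith(r.value)) {
      if (!best || r.value.length > best.value.length) best = r;
    }
  }
  return best && best.key === "disallow" ? best : null;
}
```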
UA strings come from operator docs, not third-party SEO blogs that copy each other. We don't accept entries without a docs link.
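The edge-vs-origin call itself boils down to header inspection. A minimal sketch of the idea; the fingerprint table here is illustrative, not the package's actual list:

```javascript
// Sketch: classify a 403 as an edge (CDN) block or an origin block by
// looking for provider fingerprints among the response headers.
const EDGE_FINGERPRINTS = [
  { provider: "Cloudflare", header: "cf-ray" },
  { provider: "CloudFront", header: "x-amz-cf-id" },
  { provider: "Vercel", header: "x-vercel-id" },
  { provider: "Fastly", header: "x-served-by" },
  { provider: "Netlify", header: "x-nf-request-id" },
];

function classifyBlock(status, headers) {
  if (status !== 403) return { blocked: false };
  // Normalize header names; Node lowercases them, but be defensive.
  const lower = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v])
  );
  const hit = EDGE_FINGERPRINTS.find((f) => f.header in lower);
  return hit
    ? { blocked: true, layer: "edge", provider: hit.provider }
    : { blocked: true, layer: "origin" };
}
```

No fingerprint on a 403 proves nothing by itself, which is why the real report phrases the origin case as "no edge fingerprint found" rather than a certainty.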
2. @geosuite/schema-templates
What it does: ships 23 copy-paste-ready schema.org JSON-LD templates plus an offline structural validator.
npx @geosuite/schema-templates list
npx @geosuite/schema-templates show Product
JSON-LD is the cheapest, least ambiguous signal you can give an AI assistant about what your page is. It will not on its own make ChatGPT cite you (authority and freshness still matter), but it removes a class of avoidable failures. The AI no longer has to guess your prices, your author, or whether a number on the page is a benchmark or a typo.
I deliberately excluded fields that aren't truly recommended for each type. Padding templates with every optional schema.org property dilutes the signal. If you need a field that's not there, schema.org is the source of truth: add it yourself.
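For a sense of what "recommended fields only" means in practice, here is a hand-written minimal Product snippet in that spirit; the values and the exact field selection are illustrative, not the package's actual template:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "description": "One-sentence description of the product.",
  "image": "https://your-site.com/widget.jpg",
  "offers": {
    "@type": "Offer",
    "price": "19.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```

Drop it into a `<script type="application/ld+json">` tag in the page head and the price stops being something a model has to infer.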
There's also geosuite-schema fill <Type> --url <url> --ai if you want the LLM to populate placeholders from a real page, but the deterministic side (templates + validator) does not need a network or an API key.
3. @geosuite/llms-txt-generator
What it does: turns a sitemap.xml into an llms.txt file per the proposed standard at llmstxt.org.
npx @geosuite/llms-txt-generator https://your-site.com/sitemap.xml \
--name="Your Site" --enrich --out=public/llms.txt
llms.txt is intended to be the LLM-shaped equivalent of a sitemap: a curated, sectioned, markdown index of your most important pages. The format is small enough to be parsed by classical tooling (regex) and also legible to a model; that's the point.
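Concretely, per llmstxt.org, that means an H1, a blockquote summary, and H2 sections of link lists. A hand-written illustration of the shape, not actual generator output:

```markdown
# Your Site

> One-paragraph summary of what the site is and who it's for.

## Docs

- [Quickstart](https://your-site.com/docs/quickstart): Install and first run
- [API reference](https://your-site.com/docs/api): Endpoints and auth

## Optional

- [Changelog](https://your-site.com/changelog)
```

The "Optional" section is the spec's escape hatch: pages a model can skip when context is tight.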
The generator is deterministic. With --enrich it fetches each URL once and pulls <title> + <meta name="description"> via regex. No headless browser, no LLM dependency in the default path. (--ai is opt-in if you want the LLM to rewrite descriptions; we send only URL + title + meta, never the page body.)
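That regex extraction can be sketched in a few lines; these patterns are illustrative, not the package's actual ones:

```javascript
// Sketch: pull <title> and <meta name="description"> out of raw HTML with
// regexes alone. Handles both attribute orders for the meta tag; no DOM,
// no headless browser.
function extractMeta(html) {
  const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const desc =
    /<meta\s+[^>]*name=["']description["'][^>]*content=["']([^"']*)["']/i.exec(html) ||
    /<meta\s+[^>]*content=["']([^"']*)["'][^>]*name=["']description["']/i.exec(html);
  return {
    title: title ? title[1].trim() : null,
    description: desc ? desc[1].trim() : null,
  };
}
```

Regexes over HTML are famously fragile in general, but for these two well-behaved head tags they fail gracefully: worst case you get null and the entry ships without a description.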
Sitemap-index files are flattened automatically. Pass them like a flat sitemap.
4. @geosuite/sitemap-builder
What it does: crawls a site and emits a valid sitemap.xml. For sites that ship without one (more common than you'd think on custom builds).
npx @geosuite/sitemap-builder https://your-site.com --output sitemap.xml
BFS, same-origin only, three stacked caps: page count, depth, wall-clock budget. Whichever fires first wins. Drops obvious non-HTML extensions and fragment-only links. Output is sitemaps.org-compliant: <loc> plus optional <lastmod>, no <changefreq> or <priority> (both deprecated and ignored by every major engine).
Whole tool is around 250 lines of vanilla Node. No puppeteer, no cheerio, no axios. Just node:http, node:https, and a few regexes.
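The cap logic is the interesting part. A sketch of the same shape in vanilla Node, with fetchPage standing in for the HTTP helper; illustrative, not the package's code:

```javascript
// Sketch: same-origin BFS with three stacked caps (pages, depth, wall clock).
// Whichever cap fires first stops the crawl.
async function crawl(startUrl, fetchPage, { maxPages = 500, maxDepth = 3, budgetMs = 60_000 } = {}) {
  const origin = new URL(startUrl).origin;
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 0 }];
  const found = [];
  const deadline = Date.now() + budgetMs;

  while (queue.length) {
    if (found.length >= maxPages || Date.now() > deadline) break; // caps 1 and 3
    const { url, depth } = queue.shift();
    found.push(url);
    if (depth >= maxDepth) continue; // cap 2: record the page, stop expanding
    const html = await fetchPage(url);
    // [^"'#]+ rejects fragment-only links like href="#top" outright.
    for (const m of html.matchAll(/href=["']([^"'#]+)["']/g)) {
      let next;
      try { next = new URL(m[1], url); } catch { continue; }
      next.hash = "";
      const href = next.href;
      if (next.origin !== origin || seen.has(href)) continue;
      if (/\.(png|jpe?g|gif|svg|css|js|pdf|zip)(\?|$)/i.test(next.pathname)) continue;
      seen.add(href);
      queue.push({ url: href, depth: depth + 1 });
    }
  }
  return found;
}
```

BFS rather than DFS matters here: with a page cap in play, breadth-first guarantees the shallow (usually important) pages make the cut before deep archive pages do.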
The design choices, all in one place
- Zero runtime dependencies. The four packages combined add ~0 install footprint to your project. The only exception is llms-txt-generator, which depends on fast-xml-parser for the sitemap-index path, because writing your own XML parser is a footgun.
- AI mode is opt-in. Every CLI has a --ai flag. Without it, behaviour is fully deterministic. With it, payloads are minimal and structured (verdicts, titles, depths), never raw HTML or page bodies.
- One tool, one job. Composable via stdout/JSON. If you want to chain sitemap-builder into llms-txt-generator, that's a single pipe.
- Boring code. No clever metaprogramming. The whole stack is meant to be readable in an afternoon. If it isn't, that's a bug, not a feature.
Why open source the building blocks
The same checks power GeoSuite, the hosted product I'm building (history, alerts, dashboards, integrations into your content pipeline). But the building blocks belong open: I find it dishonest to sell a black box that does things any developer could verify.
If you find a bot UA missing (or worse, a wrong one), the place to send it is bots.json in ai-crawler-bots, with a link to the operator's docs. UA strings drift a couple of times per year per operator, and that file ages faster than anything else in the suite.
PRs and issues welcome. Especially the ones that prove me wrong.