href="/[pillar]">[Pillar Name]</a></li>
<li aria-current="page">[Cluster Name]</li>
</ol>
</nav>
<main>
<article>
<h1>[Primary Keyword Phrased as a Headline]</h1>
<p class="lede"><strong>[40 to 60 word direct answer.
Stated immediately, no preamble, no "in this article we will discuss".
Includes the primary keyword naturally.]</strong></p>
<p>[2 to 3 sentence expansion of the answer with one supporting fact or data point.]</p>
<h2>[Question matching primary sub query 1]</h2>
<p>[40 to 60 word direct answer to that sub question.]</p>
<p>[Supporting context, 100 to 200 words.]</p>
<h2>[Question matching primary sub query 2]</h2>
<ol>
<li>[Step 1]</li>
<li>[Step 2]</li>
<li>[Step 3]</li>
</ol>
<h2>[Question matching primary sub query 3]</h2>
<table>
<thead>
<tr><th>[Comparison Header]</th><th>Option A</th><th>Option B</th></tr>
</thead>
<tbody>
<tr><td>[Feature]</td><td>[Value]</td><td>[Value]</td></tr>
</tbody>
</table>
<h2>[Additional sub queries as H2]</h2>
<p>[Self contained answer.]</p>
<h2>Frequently Asked Questions</h2>
<h3>[FAQ Question 1]</h3>
<p>[40 to 60 word answer]</p>
<h3>[FAQ Question 2]</h3>
<p>[40 to 60 word answer]</p>
<aside class="author-bio">
<p>Written by [Author Name], [credentials, role].
[One sentence biography establishing relevant experience.]
Last updated [Month Day, Year].</p>
</aside>
</article>
</main>
<footer>
<!-- Footer with Crafted by ThatDeveloperGuy.com link -->
</footer>
</body>
</html>
```
7.2 The 40 to 60 Word Direct Answer
The lede paragraph is the single most important element on the page for AEO and AIO. Rules:
- Between 40 and 60 words. Count them.
- States the answer to the primary keyword question completely.
- Uses the primary keyword in the first sentence, naturally.
- No preamble. No "this guide will cover". No "before we dive in". Start with the answer.
- Wrapped in <p class="lede"><strong>...</strong></p> for visual emphasis and a parsing signal.
Example for "what is answer engine optimization":
Answer engine optimization (AEO) is the practice of structuring web content so that AI driven answer systems can extract, summarize, and cite it as the source of a direct answer. Unlike traditional SEO which optimizes for ten blue links, AEO targets featured snippets, AI Overviews, voice assistants, and conversational AI engines that synthesize answers from multiple sources.
That is 57 words. It defines the term, distinguishes it from SEO, and lists the surfaces. An AI Overview can extract any sentence and have it work standalone.
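Counting by hand is error prone, so the word window can be checked with a small helper. This is a minimal sketch, assuming a word is any whitespace separated token (a hyphenated term counts once). The names countWords and checkLede are illustrative and not part of any tool referenced in this guide.

```javascript
// Count whitespace-separated tokens in a candidate lede.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Report the word count and whether it falls inside the 40 to 60 word window.
function checkLede(text, min = 40, max = 60) {
  const words = countWords(text);
  return { words, ok: words >= min && words <= max };
}
```

In a browser console this could be run against the live page with checkLede(document.querySelector("p.lede").textContent).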
7.3 The 30 Percent Rule
44.2 percent of all LLM citations come from the first 30 percent of a page's text. Whatever the operator most wants cited must appear in the first third.
Practical application:
- For a 2,000 word cluster page, the first 600 words must contain the lede, the answer to the primary sub query, and at least one piece of original data or insight.
- Save methodology, edge cases, and deep examples for the back half.
- Never bury the primary answer below an introduction or context section.
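The 600 word figure is simply 30 percent of 2,000. A rough sketch of that arithmetic, plus a check that a key phrase lands inside the zone, might look like the following. It assumes plain page text as input and measures position by character offset as an approximation of word position; both function names are hypothetical.

```javascript
// Number of words that fall inside the first `share` of a page.
function extractionZoneWords(totalWords, share = 0.3) {
  return Math.round(totalWords * share);
}

// True when `phrase` first appears inside the first `share` of the page text.
// Position is measured in characters, which approximates word position.
function isInExtractionZone(pageText, phrase, share = 0.3) {
  const index = pageText.toLowerCase().indexOf(phrase.toLowerCase());
  if (index === -1) return false;
  return index < pageText.length * share;
}
```

For a 2,000 word page, extractionZoneWords(2000) gives the 600 word budget used in the example above.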
7.4 Heading as Question Pattern
Every H2 is phrased as a literal natural language question that matches a known sub query.
Wrong: Pricing
Right: How much does a custom website cost in 2026?
Wrong: Benefits
Right: Why does answer engine optimization matter for small businesses?
Wrong: Implementation
Right: How do you install schema markup on a static HTML site?
This pattern dramatically increases the probability of extraction because AI systems match heading text to sub query phrasing. The H2 becomes the citable anchor.
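A quick lint can flag H2 headings that are not phrased as questions. The sketch below assumes the page HTML is available as a string and uses a regular expression rather than a full HTML parser, which is adequate for a spot check but not for malformed markup. The opener word list and function names are assumptions chosen for illustration.

```javascript
// Pull the text of every <h2> out of an HTML string, stripping inner tags.
function extractH2s(html) {
  const matches = html.matchAll(/<h2[^>]*>([\s\S]*?)<\/h2>/gi);
  return Array.from(matches, (m) => m[1].replace(/<[^>]+>/g, "").trim());
}

// Words that typically open a natural language question (illustrative list).
const QUESTION_OPENERS =
  /^(who|what|when|where|why|how|which|is|are|can|do|does|should|will)\b/i;

function isQuestionHeading(text) {
  return text.endsWith("?") && QUESTION_OPENERS.test(text);
}

// Return the H2 headings that fail the question pattern.
function findNonQuestionH2s(html) {
  return extractH2s(html).filter((heading) => !isQuestionHeading(heading));
}
```

The "Frequently Asked Questions" heading from the page skeleton will be flagged by this check and can be treated as a known exception.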
7.5 Self Contained Section Design
Every section must make sense without surrounding context. AI systems often extract individual sections from longer articles, so cross references break extraction.
Forbidden:
- "As we discussed above..."
- "See the next section for..."
- "Building on the prior point..."
Required:
- Each H2 section restates necessary context in one sentence if needed.
- Definitions of jargon happen in the section that uses the jargon, not earlier.
- Every paragraph could be quoted standalone and remain accurate.
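The forbidden phrases lend themselves to an automated check. This is a minimal sketch, assuming section text is passed in as a plain string; the pattern list covers the three examples above plus close variants and would need extending for a real audit. The names are hypothetical.

```javascript
// Phrases that point outside the current section (illustrative, not exhaustive).
const CROSS_REFERENCE_PATTERNS = [
  /\bas (we )?(discussed|mentioned|noted|covered) (above|earlier|previously)\b/i,
  /\bsee the (next|previous|following) section\b/i,
  /\bbuilding on the (prior|previous) point\b/i,
];

// Return every sentence in the section that contains a cross reference.
function findCrossReferences(sectionText) {
  return sectionText
    .split(/(?<=[.!?])\s+/)
    .filter((sentence) => CROSS_REFERENCE_PATTERNS.some((p) => p.test(sentence)));
}
```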
Every page needs at least one element that other pages on the same topic do not have. Without information gain, the page is a paraphrase of the competition and earns zero citations.
Acceptable information gain elements (pick at least one, ideally two):
- Original survey or benchmark data. Run a survey through your client base or your own portfolio. Even 30 responses produces citable data nobody else has.
- Internal case study with real numbers. "We optimized X for [Client]. CTR moved from 2.1 percent to 5.7 percent over 90 days." Specific, verifiable.
- Proprietary framework or model. Joseph's SEO BUILD REFERENCE itself is an information gain asset. Frameworks become citable references.
- Calculator or interactive tool. A pricing calculator, a savings estimator, a compatibility checker. Drives both citations and conversions.
- Side by side comparison nobody has published. Compare three or more options on a dimension that prior articles ignored.
- First hand experience narrative. "I tested X for 60 days. Here is what happened by week." This supplies the Experience signal in E E A T.
- Data visualization built from public data. Take a public dataset and present it in a way nobody else has. Cited often by AI systems.
The information gain element appears in the first 30 percent of the page so it lands inside the citation extraction zone.
Provide critical answers in two formats simultaneously:
- Prose for AI Overviews and conversational AI engines.
- A list, table, or structured block for traditional featured snippets and rich results.
Example, "what are the steps to implement schema markup":
Prose first (40 to 60 word answer):
To implement schema markup, identify the schema type that matches your page (Article, Product, LocalBusiness, etc.), write the JSON LD and check it with the Schema.org validator, embed the JSON LD in the page head, validate with Google's Rich Results Test, then submit the URL to Search Console for re crawl.
Then numbered list:
1. Identify the schema type matching the page.
2. Write the JSON LD and check it with the Schema.org validator.
3. Embed the JSON LD in the page head as a script tag.
4. Validate with Google's Rich Results Test.
5. Submit the URL to Google Search Console for re crawl.
Both formats live on the same page. The prose wins AI Overview citations. The list wins featured snippets.
7.8 Author Attribution
Every content page has visible author attribution. Required elements:
- Author full name as a clickable link to the author's bio page.
- One sentence credential statement (degree, role, years of experience, certification).
- Last updated timestamp in human readable format AND ISO 8601 format embedded in dateModified of Article schema.
- Optional but recommended: photograph of the author, link to LinkedIn or other professional profile, link to Wikidata Q ID if available.
Joseph's author byline pattern:
Written by Joseph Anady, SDVOSB owner of ThatDeveloperGuy. BA Computer Engineering, Colorado State University. MA Cybersecurity. Service Disabled Veteran. Last updated [Month Day, Year]. Author profile.
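Both timestamp formats can be derived from a single source value so they never drift apart. The following is a sketch under stated assumptions: the input is an ISO 8601 string, the human readable form uses US English month names, and both are rendered in UTC. The function name formatLastUpdated is hypothetical.

```javascript
// Produce the visible "Last updated" text and the schema dateModified value
// from one input so the two always agree.
function formatLastUpdated(isoDate) {
  const date = new Date(isoDate);
  if (Number.isNaN(date.getTime())) {
    throw new RangeError(`Invalid date: ${isoDate}`);
  }
  return {
    dateModified: date.toISOString(),
    human: date.toLocaleDateString("en-US", {
      year: "numeric",
      month: "long",
      day: "numeric",
      timeZone: "UTC",
    }),
  };
}
```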
7.9 Lists, Tables, and Semantic HTML
Use native HTML elements for structured content. Styled <div> blocks carry no semantic meaning that AI systems can parse.
- Ordered steps: <ol><li>...</li></ol>. Never <div class="step">.
- Bullet lists: <ul><li>...</li></ul>.
- Comparisons: <table><thead><tr><th>...</th></tr></thead><tbody>...</tbody></table>. Tables limited to 3 to 6 rows and 2 to 4 columns for snippet extraction.
- Code: <pre><code>...</code></pre>.
- Definitions: <dl><dt>Term</dt><dd>Definition</dd></dl>.
- Quotes: <blockquote>...</blockquote> with <cite>Source</cite>.
7.10 Image and Video Treatment
Visual content earns image and video carousel placement when treated correctly:
- Every image has descriptive alt text including relevant keyword in natural phrasing.
- Filenames are kebab case and descriptive: quarterly-tax-payment-calendar-2026.png, not IMG_9374.png.
- Width and height attributes on every <img> tag to prevent layout shift.
- Loading lazy on below the fold images: loading="lazy".
- Decoding async on all images: decoding="async".
- For videos, use <video> element with poster image, or embed YouTube with VideoObject schema.
- WebP or AVIF format with PNG or JPEG fallback.
- ImageObject schema on hero images for the page.
7.11 Mobile First and Core Web Vitals
Every page must pass Core Web Vitals on mobile:
- Largest Contentful Paint (LCP): under 2.5 seconds.
- Interaction to Next Paint (INP): under 200 milliseconds.
- Cumulative Layout Shift (CLS): under 0.1.
Joseph's standing infrastructure on Bubbles already handles much of this through nginx HTTP/3, Brotli compression, and proper caching headers. Phase 3 page work must not introduce regression. Validation: PageSpeed Insights, Chrome DevTools Lighthouse, real device testing on at least one mid range Android.
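The three thresholds can be encoded once and applied to measured values from any of the validation tools. This is a minimal sketch; the metric key names (lcpMs, inpMs, cls) are assumptions for illustration and do not correspond to a specific tool's output format.

```javascript
// Thresholds from the list above: every metric must come in under its limit.
const VITALS_THRESHOLDS = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

// Return the names of metrics that fail. A missing metric counts as a failure.
function failingVitals(metrics) {
  return Object.keys(VITALS_THRESHOLDS).filter(
    (key) => !(metrics[key] < VITALS_THRESHOLDS[key])
  );
}
```

An empty array means the page passes; any returned names identify what to remediate before the Phase 3 gate.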
7.12 Phase 3 Completion Gate
Before Phase 4, confirm:
- Every priority page has a 40 to 60 word lede in the first paragraph.
- Every priority page has at least one information gain element in the first 30 percent.
- Every H2 on every priority page is phrased as a question matching a sub query.
- Every priority page has visible author attribution.
- Every priority page passes Core Web Vitals on mobile.
- No <div class="step"> or styled non semantic structures in extraction zones.
If gate fails, list which pages fail which criteria and remediate page by page.
8. Phase 4: Schema and Technical Implementation
Purpose: every page emits accurate, validated structured data that earns rich results where eligible and feeds AI systems with trust signals everywhere else.
8.1 The 2026 Schema Reality
By March 2026, the schema landscape has shifted significantly. The agent must internalize what works and what does not before writing any JSON LD.
Deprecated for rich results (still safe to keep but produces no visible result):
- HowTo (deprecated September 2023, removed from desktop and mobile)
- Practice Problem (deprecated January 2026)
- Dataset for general search (now only serves Dataset Search)
- Sitelinks Search Box (integrated into core search)
- SpecialAnnouncement (COVID era, deprecated)
- Q and A (deprecated January 2026)
- Book Actions, Course Info, Claim Review, Estimated Salary, Learning Video, Vehicle Listing (all retired)
Restricted to specific verticals only:
- FAQPage: rich results restricted primarily to government and authoritative health sites. Implementation still recommended for AI systems as a trust signal, just do not expect a SERP rich result for general business sites.
Demoted in March 2026 core update:
- Review schema on editorial comparison posts (use only on pages where the review is the primary content).
- FAQ on supplementary page sections (use only when FAQ is the primary purpose).
Fully supported and recommended:
- Organization (foundational, always implement)
- LocalBusiness (and its subtypes for service businesses)
- Person (for authors and team members)
- Article and NewsArticle and BlogPosting
- Product with Offers and AggregateRating
- Review (first party only, on primary content)
- Event
- Recipe (food only)
- Video and VideoObject
- BreadcrumbList (universal, always implement)
- Service (for service businesses)
- WebSite
- WebPage
- AboutPage, ContactPage, FAQPage (as page type signals)
- ItemList (for category pages)
- ImageObject (for hero images)
8.2 The Schema as Trust Signal Shift
Even when a schema type no longer triggers a rich result, the markup still matters. Google AI Mode reads structured data as a trust and entity verification signal during answer synthesis. Accurate schema increases citation probability independent of rich result eligibility.
Therefore: implement comprehensive, accurate schema on every page even when no rich result will display. Do not strip schema just because the rich result feature was deprecated. Strip only when the schema is inaccurate or misrepresentative.
8.3 The Mandatory Schema Set for Every Site
Every site Joseph builds emits, at minimum:
- Organization (sitewide, in head of every page or at minimum the homepage).
- WebSite (sitewide, with potentialAction for site search if applicable).
- BreadcrumbList (every page below the homepage).
- LocalBusiness or appropriate subtype (when the client has a physical service area).
- Person (one block per author or team member, on the bio page and embedded in Article schema author field).
- Article or BlogPosting or NewsArticle (every content page).
- Service (every service page).
- Product with Offers (every e commerce product page, when applicable).
- WebPage type signal (every page, indicating AboutPage, ContactPage, FAQPage as appropriate).
Optional based on content:
- Event when the page describes a dated event.
- Recipe when the page is a recipe.
- Review and AggregateRating when the page contains genuine first party reviews.
- VideoObject when the page contains a hero video.
- ImageObject for hero images.
All schema is JSON LD, embedded as <script type="application/ld+json"> blocks. Sitewide schema goes in the page head; page specific schema may go in the head or the body.
Format requirements:
- Double quoted strings, no single quotes.
- No trailing commas.
- 2 space indentation when human readable. Minified for production if performance dictates, but readability defaults to formatted.
- UTF 8 encoding.
- Use @graph arrays to combine multiple schema types in a single script block when they share context.
The MEGAMIND port 9090 sidecar pattern is deprecated. All schema is static inline JSON LD generated at build time or hand authored, never injected by a runtime sidecar.
8.5 Schema Nesting and the @graph Pattern
For pages that emit multiple schema types, use the @graph array pattern. Full code reference is in Appendix C, section 19.1. The pattern keeps related entities connected through @id references, which AI systems use for entity resolution.
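To make the @id linking concrete, here is a small sketch of how a graph might be assembled. It is not the Appendix C reference; the fragment identifiers (#organization, #website, #author, #article), the function name buildGraph, and the chosen properties are assumptions for illustration.

```javascript
// Build a minimal @graph in which entities point at each other by @id
// instead of repeating nested copies of the same data.
function buildGraph({ siteUrl, pageUrl, orgName, headline, authorName, datePublished, dateModified }) {
  const orgId = `${siteUrl}#organization`;
  const siteId = `${siteUrl}#website`;
  const personId = `${siteUrl}#author`;
  return {
    "@context": "https://schema.org",
    "@graph": [
      { "@type": "Organization", "@id": orgId, name: orgName, url: siteUrl },
      { "@type": "WebSite", "@id": siteId, url: siteUrl, name: orgName, publisher: { "@id": orgId } },
      { "@type": "Person", "@id": personId, name: authorName },
      {
        "@type": "Article",
        "@id": `${pageUrl}#article`,
        headline,
        datePublished,
        dateModified,
        author: { "@id": personId },
        publisher: { "@id": orgId },
        isPartOf: { "@id": siteId },
      },
    ],
  };
}
```

Serializing the result with JSON.stringify(graph, null, 2) yields double quoted strings, no trailing commas, and 2 space indentation, which lines up with the format requirements listed earlier.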
8.6 The Validation Pipeline
Every schema deployment passes through this validation pipeline. No exceptions.
- Schema.org validator (https://validator.schema.org/). Catches type and property errors.
- Google Rich Results Test (https://search.google.com/test/rich-results). Catches rich result eligibility issues.
- Content alignment check. Every value in the schema appears in the visible page content. Schema describes what is on the page, not what we wish were on the page. Misaligned schema can trigger manual actions.
- Date freshness check. dateModified reflects the actual last meaningful content change. datePublished is the original publication date.
- Quarterly deprecation recheck. Every quarter, review the changes announced on the Search Central blog and rerun the full validation pipeline against them.
The agent runs the full pipeline on every page after deployment. Failures must be resolved before the page is considered complete.
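The content alignment step is the one that the two online validators do not cover, so it benefits from a scripted check. The sketch below assumes the schema is available as a parsed object and the visible page text as a string. It skips JSON LD keywords, URLs, and ISO dates, since those are not expected to appear verbatim in body copy; that skip list is an assumption, and the function names are hypothetical.

```javascript
// Collect every string value in a schema object, ignoring @context, @type, @id.
function collectStrings(node, out = []) {
  if (typeof node === "string") {
    out.push(node);
  } else if (Array.isArray(node)) {
    node.forEach((item) => collectStrings(item, out));
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (!key.startsWith("@")) collectStrings(value, out);
    }
  }
  return out;
}

// URLs and ISO dates are machine values, not visible copy.
function isMachineValue(value) {
  return /^https?:\/\//.test(value) || /^\d{4}-\d{2}-\d{2}/.test(value);
}

// Return schema values that do not appear anywhere in the visible text.
function findMisalignedValues(schema, visibleText) {
  const haystack = visibleText.toLowerCase();
  return collectStrings(schema).filter(
    (value) => !isMachineValue(value) && !haystack.includes(value.toLowerCase())
  );
}
```

A schema price of "49" on a page whose visible text shows $59 would be returned as misaligned.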
8.7 Common Schema Mistakes to Avoid
- Marking up content that is not visible on the page (FAQ answers in schema but not in body).
- Using FAQ schema on pages where FAQ is a small supplementary section (will be ignored or manually demoted).
- Stale dateModified after content changes.
- Missing required properties (Article without author, Product without offers, LocalBusiness without address).
- Mismatched data (schema says price 49, page shows 59).
- Multiple @type Organization blocks on the same page without @id differentiation.
- Ignoring sameAs (omitting sameAs weakens entity resolution for AI systems).
- Using deprecated schema types (Practice Problem, Q and A, etc.) and expecting rich results.
- Implementing partial Product schema (missing AggregateRating, missing Offers): produces zero rich result lift.
- Referencing a Wikidata Q ID that does not exist or has been deleted.
8.8 Robots.txt and AI Crawler Access
The robots.txt file controls which crawlers can access the site. Modern sites must explicitly grant access to AI search engine crawlers, not just traditional search bots. Full default robots.txt template is in Appendix C section 19.10.
Notes on the default policy:
- ByteDance's Bytespider is commonly blocked due to aggressive crawl patterns and unclear data use.
- Google-Extended controls Google's training data use. Allow it unless the client has a specific reason to opt out.
- Each AI crawler must be listed explicitly. A blanket User-agent: * does not always cover newer bots.
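Listing crawlers explicitly is easier to keep consistent when the file is generated from a list. The following is a sketch only and is not the Appendix C template. The helper name buildRobotsTxt is hypothetical, and the crawler tokens shown in the usage note are assumptions about which bots a site might name.

```javascript
// Generate a robots.txt with one explicit group per crawler.
function buildRobotsTxt({ allow, block, sitemapUrl }) {
  const groups = [
    "User-agent: *\nAllow: /",
    ...allow.map((bot) => `User-agent: ${bot}\nAllow: /`),
    ...block.map((bot) => `User-agent: ${bot}\nDisallow: /`),
  ];
  return `${groups.join("\n\n")}\n\nSitemap: ${sitemapUrl}\n`;
}
```

An example call might pass allow: ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"] and block: ["Bytespider"], adjusted to the client's documented exclusions.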
8.9 The llms.txt Standard
An emerging standard (https://llmstxt.org) for explicitly signaling content available to LLMs. Place at root of site as /llms.txt. Format is markdown. Example template is in section 11.3.
This standard is not universally adopted by every AI engine yet, but adoption is growing and the cost of implementation is trivial.
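Because the format is plain markdown, the file can be generated from the same page registry that feeds the sitemap. This sketch follows the general shape described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections of links). It is not the section 11.3 template, and the function name and input shape are assumptions.

```javascript
// Build an llms.txt body: H1 title, blockquote summary, H2 sections of links.
function buildLlmsTxt({ title, summary, sections }) {
  const lines = [`# ${title}`, "", `> ${summary}`];
  for (const { heading, links } of sections) {
    lines.push("", `## ${heading}`, "");
    for (const { name, url, note } of links) {
      lines.push(note ? `- [${name}](${url}): ${note}` : `- [${name}](${url})`);
    }
  }
  return lines.join("\n") + "\n";
}
```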
8.10 Sitemaps
Every site has at minimum a primary sitemap.xml at the root, submitted to Google Search Console and Bing Webmaster Tools.
For sites with images and videos, separate image and video sitemaps are recommended via a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-images.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-videos.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
</sitemapindex>
Each <lastmod> value must reflect actual last modification.
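One way to keep lastmod honest is to generate the index from recorded modification dates rather than editing it by hand. This is a minimal sketch, assuming each entry already carries its real last modified date in YYYY-MM-DD form; the function names are hypothetical.

```javascript
// Escape characters that are not allowed raw inside XML text nodes.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap index from [{ loc, lastmod }] entries.
function buildSitemapIndex(entries) {
  const items = entries.map(({ loc, lastmod }) => {
    if (!/^\d{4}-\d{2}-\d{2}$/.test(lastmod)) {
      throw new RangeError(`lastmod must be YYYY-MM-DD, got: ${lastmod}`);
    }
    return `  <sitemap>\n    <loc>${escapeXml(loc)}</loc>\n    <lastmod>${lastmod}</lastmod>\n  </sitemap>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    "</sitemapindex>",
  ].join("\n") + "\n";
}
```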
8.11 Canonical URLs
Every page has a canonical URL. The canonical points to the preferred version of the URL when duplicates or near duplicates exist.
Joseph's standing decision on trailing slashes: trailing slash present on directories, absent on files. The rule is set sitewide and never mixed.
<link rel="canonical" href="https://example.com/services/web-development/">
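The trailing slash rule can be applied programmatically when canonical tags are generated at build time. The sketch below assumes that a final path segment containing a dot is a file and anything else is a directory, and that query strings and fragments never belong in a canonical; both are assumptions for illustration, and the function name is hypothetical.

```javascript
// Normalize a URL to the sitewide rule: slash on directories, none on files.
function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.search = ""; // assumption: canonicals never carry query strings
  url.hash = "";
  const lastSegment = url.pathname.split("/").pop();
  const looksLikeFile = lastSegment.includes(".");
  if (!looksLikeFile && !url.pathname.endsWith("/")) {
    url.pathname += "/";
  }
  return url.toString();
}
```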
8.12 Hreflang for Multi Language Sites
When the site serves multiple languages, every language version of a page declares its alternates:
<link rel="alternate" hreflang="en" href="https://example.com/services/web-development/">
<link rel="alternate" hreflang="es" href="https://example.com/es/servicios/desarrollo-web/">
<link rel="alternate" hreflang="x-default" href="https://example.com/services/web-development/">
Joseph's typical client base is monolingual English. This subsection applies only when target_languages in section 2 includes more than one entry.
8.13 Open Graph and Twitter Cards
Every page has Open Graph tags and Twitter Card tags. These do not affect Google ranking directly, but they do affect:
- How the page renders when shared on Facebook, LinkedIn, Slack, Discord.
- AI systems that consume social metadata for entity context.
- Click through rates from social referrals.
Required Open Graph properties: og:title, og:description, og:url, og:image, og:type, og:site_name.
Required Twitter properties: twitter:card, twitter:title, twitter:description, twitter:image.
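The two required sets overlap heavily, so they can be emitted from one page record. This is a sketch under stated assumptions: Open Graph tags use the property attribute and Twitter tags use the name attribute, the default card type is summary_large_image, and the function names are hypothetical.

```javascript
// Escape a value for safe use inside a double quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Emit the required Open Graph and Twitter Card tags from one page record.
function buildSocialMeta({ title, description, url, image, siteName, type = "website", card = "summary_large_image" }) {
  const og = {
    "og:title": title,
    "og:description": description,
    "og:url": url,
    "og:image": image,
    "og:type": type,
    "og:site_name": siteName,
  };
  const twitter = {
    "twitter:card": card,
    "twitter:title": title,
    "twitter:description": description,
    "twitter:image": image,
  };
  return [
    ...Object.entries(og).map(([p, c]) => `<meta property="${p}" content="${escapeAttr(c)}">`),
    ...Object.entries(twitter).map(([n, c]) => `<meta name="${n}" content="${escapeAttr(c)}">`),
  ].join("\n");
}
```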
8.14 Phase 4 Completion Gate
Before stack specific build, confirm:
- Every page passes Schema.org validator.
- Every page passes Google Rich Results Test.
- robots.txt allows the full AI crawler set per 8.8 (or has documented exclusions).
- llms.txt is present at root.
- sitemap.xml (or sitemap index) is present and submitted to GSC and Bing Webmaster Tools.
- Every page has a canonical URL declared.
- Open Graph and Twitter Card tags are present on every page.
- The MEGAMIND sidecar (port 9090) is verified disabled on the host.
If gate fails, identify the failed step and remediate before proceeding.
9. Stack Specific Build Instructions
The agent applies exactly one of the following subsections, matching the stack assigned in section 4. Other subsections are reference only for that engagement.
9.1 Static HTML on Bubbles (DEFAULT PATH)
This is the default stack for the majority of Joseph's client work. Static HTML files served by nginx on the Bubbles server (Debian, public IP 169.155.162.118, Tailscale 100.90.97.104).
File layout:
/var/www/[client-domain]/
├── public_html/
│   ├── index.html
│   ├── services/
│   │   ├── index.html
│   │   ├── web-development/index.html
│   │   ├── seo/index.html
│   │   └── [other-service]/index.html
│   ├── pillars/
│   │   └── [pillar-slug]/index.html
│   ├── clusters/
│   │   └── [cluster-slug]/index.html
│   ├── about/index.html
│   ├── contact/index.html
│   ├── team/
│   │   └── [author-slug]/index.html
│   ├── assets/
│   │   ├── css/styles.css
│   │   ├── js/main.js
│   │   └── img/[images]
│   ├── robots.txt
│   ├── llms.txt
│   ├── sitemap.xml
│   └── favicon.ico
└── nginx.conf
Build steps:
- Initialize directory structure:
sudo mkdir -p /var/www/[client-domain]/public_html
sudo chown -R www-data:www-data /var/www/[client-domain]
- Create the page skeleton (per section 7.1) for each pillar and cluster page.
- Inline JSON LD schema in the head of each page (per section 8). Use the @graph pattern. Reference Appendix C for the full code library.
- Apply the marker based sub_filter pattern for any selective injection. The pattern uses HTML comments as triggers:
<!-- TDG_SCHEMA_INSERT -->
This marker tells nginx where to inject any per page schema overrides. The default policy is no injection; schema is hand authored inline. The marker exists as an escape hatch for emergency sitewide updates.
- Configure nginx site:
# Redirect all HTTP traffic to the canonical HTTPS host.
server {
    listen 80;
    listen [::]:80;
    server_name [client-domain] www.[client-domain];
    return 301 https://[client-domain]$request_uri;
}

server {
    # HTTP/2 over TLS. The "http2" listen parameter is deprecated since
    # nginx 1.25.1 in favor of the standalone "http2 on;" directive.
    listen 443 ssl;
    listen [::]:443 ssl;
    http2 on;

    # HTTP/3 over QUIC. "reuseport" may be set on only one listen directive
    # per address and port across all server blocks on the host.
    listen 443 quic reuseport;
    listen [::]:443 quic reuseport;
    http3 on;
    add_header Alt-Svc 'h3=":443"; ma=86400';

    server_name [client-domain] www.[client-domain];

    ssl_certificate /etc/letsencrypt/live/[client-domain]/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/[client-domain]/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    root /var/www/[client-domain]/public_html;
    index index.html;

    # text/html is always compressed and must not be listed again,
    # otherwise nginx warns about a duplicate MIME type.
    brotli on;
    brotli_comp_level 6;
    brotli_types text/css text/javascript application/javascript application/json application/ld+json image/svg+xml;
    gzip on;
    gzip_vary on;
    gzip_types text/css text/javascript application/javascript application/json application/ld+json image/svg+xml;

    # Note: a location that declares its own add_header does not inherit the
    # server level add_header lines. Repeat any security headers that must
    # also apply to the two cache locations below.
    location ~* \.(jpg|jpeg|png|webp|avif|gif|svg|ico|css|js|woff|woff2|ttf|otf)$ {
        expires 30d;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    location ~* \.html$ {
        expires 1h;
        add_header Cache-Control "public, must-revalidate";
    }

    # Enforce the trailing slash on directory style URLs (paths with no dot).
    rewrite ^/([^.]+[^/])$ /$1/ permanent;

    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
    add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

    location / {
        try_files $uri $uri/ $uri.html =404;
    }

    location = /robots.txt {
        access_log off;
        log_not_found off;
    }

    location = /llms.txt {
        access_log off;
        log_not_found off;
    }

    location = /sitemap.xml {
        access_log off;
    }
}
- Validate and reload:
sudo nginx -t && sudo systemctl reload nginx
- Issue or renew SSL certificate:
sudo certbot --nginx -d [client-domain] -d www.[client-domain]
For wildcard SSL on Joseph's primary domains, use the existing wildcard cert pattern.
- Submit to search engines:
  - Google Search Console: add property, verify via DNS or HTML file.
  - Bing Webmaster Tools: add site, verify, import GSC settings.
  - Yandex Webmaster: add site if Russian market relevant.
  - Submit sitemap.xml to all three.
- Footer credit: confirm the string "Crafted by ThatDeveloperGuy.com." appears in the footer of every page.
- Post deployment validation:
curl -I https://[client-domain]/
curl https://[client-domain]/robots.txt
curl https://[client-domain]/sitemap.xml
curl https://[client-domain]/llms.txt
Demo site requirement:
Per Joseph's standing rules, demo sites are full static HTML, never JSX, never Python generated. Required tech stack:
- Three.js for 3D and 4D effects.
- GSAP plus ScrollTrigger for animations.
- Anime.js for micro animations.
- Canvas API for custom drawing.
- SVG animations.
- WebGL for advanced visuals.
- CSS3 hardware accelerated animations.
- Vanilla JavaScript only, no framework.
- All loaded from CDN.
Demo sites must include 3D and 4D effects, custom cursor, magnetic buttons, parallax, 20 plus psychology tactics, 20 plus visual tactics, 20 plus marketing tactics. The STAGE-1-DEMO-BUILD-PROMPT and STAGE-2-PRODUCTION-BUILD-PROMPT files govern these specifications and supersede this section when both are present.
9.2 SvelteKit on Bubbles
Used when the client needs server side rendering, modern app feel, or significant interactivity. Reference build: Ernie Tackett (Globe Life AIL).
Initialize:
npx sv create [client-project]
cd [client-project]
npm install
npm install -D @sveltejs/adapter-node
Configure adapter for Bubbles deployment:
In svelte.config.js:
import adapter from '@sveltejs/adapter-node';

export default {
  kit: {
    // Emit a standalone Node server into build/ with precompressed assets.
    adapter: adapter({
      out: 'build',
      precompress: true
    })
  }
};
Schema generation pattern:
Use server load functions to generate JSON LD at build or request time. Single source of truth per route in a +page.server.js file:
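export async function load({ params, url }) {
  // getPageData is a placeholder for the project's own content loader.
  const pageData = await getPageData(params.slug);
  const schema = {
    "@context": "https://schema.org",
    "@graph": [
      // Organization, WebSite, WebPage, BreadcrumbList, Article
      // generated from pageData
    ]
  };
  return {
    pageData,
    // Escape "<" so content containing "</script>" cannot close the tag early.
    schemaJsonLd: JSON.stringify(schema).replace(/</g, "\\u003c")
  };
}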
In the corresponding +page.svelte:
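<script>
  // Values returned by load() arrive on the data prop.
  // Svelte 4 syntax; in Svelte 5 runes mode use: let { data } = $props();
  export let data;
  $: ({ pageData, schemaJsonLd } = data);
</script>

<svelte:head>
  <title>{pageData.title}</title>
  <meta name="description" content={pageData.description} />
  <link rel="canonical" href={pageData.canonicalUrl} />
  {@html `<script type="application/ld+json">${schemaJsonLd}</script>`}
</svelte:head>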
Sitemap generation:
Create src/routes/sitemap.xml/+server.js that builds the sitemap from the route registry at build time.
Deploy to Bubbles:
npm run build
rsync -avz build/ root@[bubbles]:/var/www/[client-domain]/app/
Run as a systemd service behind nginx reverse proxy. nginx config:
location / {
    proxy_pass http://127.0.0.1:[port];
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
End with nginx -t && systemctl reload nginx.
9.3 Next.js or Astro on Bubbles
Used when the client needs static site generation with React or framework agnostic islands. Astro is preferred over Next.js for content heavy sites because of its zero JavaScript by default approach, which is closer to the static HTML ideal.
Astro initialization:
npm create astro@latest [client-project]
cd [client-project]
npx astro add sitemap
Schema component pattern:
Create src/components/SchemaJsonLd.astro:
---
const { schema } = Astro.props;
---
<script type="application/ld+json" set:html={JSON.stringify(schema)} />
Use in pages:
---
import SchemaJsonLd from '../components/SchemaJsonLd.astro';

const schema = {
  "@context": "https://schema.org",
  "@graph": [ /* ... */ ]
};
---
<html>
  <head>
    <SchemaJsonLd schema={schema} />
  </head>
</html>
Build and deploy:
npm run build
rsync -avz dist/ root@[bubbles]:/var/www/[client-domain]/public_html/
Then validate and reload nginx as in 9.1.
9.4 Hugo and Other SSG
Used for content heavy sites where the client wants markdown authoring and fast builds.
Hugo initialization:
hugo new site [client-project]
cd [client-project]
git init
git submodule add https://github.com/[theme-repo] themes/[theme-name]
Schema partial:
Create layouts/partials/schema.html:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "{{ .Site.BaseURL }}#organization",
      "name": "{{ .Site.Params.businessName }}",
      "url": "{{ .Site.BaseURL }}",
      "logo": "{{ .Site.BaseURL }}{{ .Site.Params.logo }}"
    }{{/* The comma sits inside the conditional so list pages emit no trailing comma. */}}{{ if .IsPage }},
    {
      "@type": "Article",
      "headline": "{{ .Title }}",
      "datePublished": "{{ .Date.Format "2006-01-02T15:04:05Z07:00" }}",
      "dateModified": "{{ .Lastmod.Format "2006-01-02T15:04:05Z07:00" }}",
      "author": { "@type": "Person", "name": "{{ .Params.author }}" }
    }{{ end }}
  ]
}
</script>
Include in layouts/_default/baseof.html head:
{{ partial "schema.html" . }}
Deploy: hugo builds to public/, rsync to Bubbles, reload nginx.
9.5 WordPress
Used when the client requires a CMS for non technical editors, or already has substantial WordPress investment. The default plugin stack:
- Yoast SEO or Rank Math Pro for SEO basics, schema generation, and sitemap.
- WP Rocket for caching when WordPress is on a non Bubbles host. On Bubbles, use the nginx caching layer instead.
- Imagify or ShortPixel for image optimization.
- Wordfence for security baseline.
Theme requirements:
- Block based theme or hybrid theme (avoid legacy classic themes for new builds).
- Theme must support custom block patterns for the page skeleton in section 7.1.
- Theme must allow inline JSON LD injection in the head, both sitewide and on a per page basis.
Schema strategy:
Yoast and Rank Math both generate base Organization, WebSite, BreadcrumbList, and Article schema. Override and supplement when necessary using the Code Snippets plugin or a child theme's functions.php:
function add_custom_jsonld() {
    if (is_singular('post') || is_page()) {
        $custom_schema = array(
            '@context' => 'https://schema.org',
            '@type'    => 'Service',
            'name'     => get_post_meta(get_the_ID(), '_service_name', true),
            'provider' => array(
                '@type' => 'Organization',
                '@id'   => home_url('/#organization')
            )
        );
        echo '<script type="application/ld+json">' . wp_json_encode($custom_schema) . '</script>';
    }
}
add_action('wp_head', 'add_custom_jsonld');
Caching on Bubbles WordPress:
nginx FastCGI cache configuration:
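# http context:
fastcgi_cache_path /var/cache/nginx/fastcgi levels=1:2 keys_zone=WORDPRESS:100m max_size=1g inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";
# server context: never cache or serve cached pages to logged in users or commenters.
set $skip_cache 0;
if ($http_cookie ~* "wordpress_logged_in|comment_author|wp-postpass") { set $skip_cache 1; }
location ~ \.php$ {
    fastcgi_cache WORDPRESS;
    fastcgi_cache_valid 200 60m;
    fastcgi_cache_bypass $skip_cache;
    fastcgi_no_cache $skip_cache;
    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
    include fastcgi_params;
    # Debian's fastcgi_params does not set SCRIPT_FILENAME, which PHP-FPM needs.
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}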
Finish by validating and reloading nginx as in 9.1.
9.6 Headless Shopify
Used for e commerce clients with custom frontend needs, large catalogs, or strong design requirements. Reference build: Sara White (Blue Paradise Dairy).
Architecture:
- Shopify backend for product data, inventory, checkout, payments.
- Custom frontend on Bubbles (typically SvelteKit or Astro) consuming Shopify Storefront API.
- Schema injected at build time from Shopify product data.
Storefront API setup:
- Generate Storefront API access token in Shopify admin.
- Install @shopify/storefront-api-client in the frontend project.
Schema mapping:
Map Shopify metafields to Product schema fields. In the frontend product page:
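// Assumes `product` has already been flattened from the Storefront API
// response, so images and variants are plain arrays rather than connections.
const productSchema = {
  "@context": "https://schema.org",
  "@type": "Product",
  "name": product.title,
  "description": product.description,
  "image": product.images.map(img => img.url),
  "sku": product.variants[0].sku,
  "brand": {
    "@type": "Brand",
    "name": product.vendor
  },
  "offers": {
    "@type": "Offer",
    "url": canonicalUrl,
    // Take the currency from the price itself instead of hardcoding one.
    "priceCurrency": product.priceRange.minVariantPrice.currencyCode,
    "price": product.priceRange.minVariantPrice.amount,
    "availability": product.availableForSale ? "https://schema.org/InStock" : "https://schema.org/OutOfStock"
  },
  // JSON.stringify drops undefined values, so the key vanishes when there are no reviews.
  "aggregateRating": product.metafields?.reviews ? {
    "@type": "AggregateRating",
    "ratingValue": product.metafields.reviews.average,
    "reviewCount": product.metafields.reviews.count
  } : undefined
};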
Faceted navigation canonicals:
Faceted product list pages must canonical to the unfaceted version to prevent index bloat and cannibalization.
<link rel="canonical" href="https://example.com/collections/all-products" />
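One way to derive the unfaceted canonical at build or render time is to strip facet parameters from the requested URL. The sketch below is Python for brevity and assumes facets live only in the query string; `unfaceted_canonical` and `keep_params` are illustrative names, not part of any Shopify API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unfaceted_canonical(url, keep_params=()):
    """Return `url` without facet parameters or fragment.

    Assumption: every query parameter is a facet (filter, sort, view)
    unless it is explicitly listed in `keep_params`.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


faceted = (
    "https://example.com/collections/all-products"
    "?filter.v.availability=1&sort_by=price-ascending"
)
print(f'<link rel="canonical" href="{unfaceted_canonical(faceted)}" />')
```

Pass `keep_params=("page",)` if paginated pages should keep their own canonical rather than collapsing to page one.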
9.7 Standard Shopify
Used for e commerce clients who want the full Shopify experience without a custom frontend.
Theme:
- Use a current generation theme (Dawn or a Dawn derivative).
- Customize via theme editor and Liquid template overrides.
SEO apps:
- JSON-LD for SEO for advanced schema beyond what Shopify generates by default.
- Smart SEO for bulk meta tag management.
Critical Shopify caveats:
- Shopify generates Product schema by default, but it is often incomplete (missing AggregateRating, missing Brand). Augment with custom schema via theme template.
- Collection pages cannibalize each other in many themes. Audit and either differentiate, canonicalize, or noindex secondary collection pages.
- Tag pages (/collections/all/tagged-X) should default to noindex unless specifically optimized.
- Shopify's robots.txt is partially editable as of 2024. Add custom directives in robots.txt.liquid.
Custom domain and SSL:
Shopify managed. Joseph's role is configuring the domain, not the SSL.
9.8 Custom Backend
Used rarely. Documented for completeness when a client has an existing custom application that the framework must integrate into.
Approach:
- Treat the custom backend as a content source.
- Apply the schema, structure, and tracking patterns from Phases 3 and 4 to the rendered output.
- If server side rendering is in use, inject schema in the head template.
- If client side rendering only, prerender critical pages or use a hybrid approach (Next.js, Nuxt, etc.).
Crawlability check:
Verify that AI crawlers can read the rendered content. Test with:
curl -A "GPTBot" https://[domain]/[page-path]
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1)" https://[domain]/[page-path]
If the response is JavaScript shell only, the agent must implement server side rendering or static prerendering before the page can earn citations. ChatGPT bot visits begin in reading mode (plain HTML, no JavaScript) about 46 percent of the time.
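To make the "JavaScript shell only" judgment repeatable, the raw response from the curl checks can be run through a small heuristic. This is a minimal sketch, not a definitive test: it counts words outside script and style tags, and the 50 word threshold and the function names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that is not inside script, style, or noscript tags."""

    SKIPPED = ("script", "style", "noscript")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_word_count(raw_html):
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def looks_like_js_shell(raw_html, min_words=50):
    """True when the server response carries almost no readable text."""
    return visible_word_count(raw_html) < min_words


shell = (
    "<html><head><title>App</title><script>var a = 1;</script></head>"
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
print(looks_like_js_shell(shell))
```

A page that passes this check can still be missing key sections, so treat a pass as necessary rather than sufficient.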
10. Off Page Authority and Brand Mentions
Purpose: earned signals beyond the website that drive AI citation rates. Distribution can lift AI citations by up to 325 percent versus owned site only.
10.1 The 2026 Off Page Stack
Every off page authority program covers seven channels:
- Earned media (PR). Press mentions in industry publications and local news.
- Podcast appearances. Both as guest and as host.
- YouTube presence. Own channel plus third party mentions.
- Reddit and forum participation. Authentic, value first.
- LinkedIn article publication. Long form thought leadership.
- Industry directory citations. Local citations for local businesses; industry directories for vertical authority.
- Wikipedia and Wikidata entity establishment. The strongest off page entity signal available.
YouTube mentions and branded web mentions are the top correlated factors with AI brand visibility per multiple 2025 to 2026 studies.
For Joseph's typical SDVOSB and small business clients:
- 3 to 5 earned media mentions per 12 month period as a minimum.
- Targets: local news (KY3, KOLR10, NWA Democrat Gazette, regional business journals), trade publications relevant to the client's vertical, podcast appearances at minimum quarterly.
- HARO replacement (Connectively, Featured, Qwoted, Help A B2B Writer) for source requests.
10.3 YouTube Strategy
For every client where the budget supports it:
- Own channel: minimum quarterly publication. Topic mix of educational, behind the scenes, and customer story.
- Schema: every video gets VideoObject schema on the page where embedded.
- Title and description optimized with primary keyword in first 60 characters of title and first 150 characters of description.
- Transcript published on the host page for AI extraction.
- Closed captions on every video.
Third party YouTube mentions:
- Pitch industry channels for guest appearances or interviews.
- Get clients featured in case study videos.
- Sponsor relevant niche channels when budget supports.
10.4 Reddit Strategy
Reddit is heavily indexed by Google AI Overviews and ChatGPT search. Authentic, value first participation drives both citations and mentions.
Rules:
- Never spam. Never drop links without context.
- Comment helpfully on relevant subreddits for 2 to 4 weeks before sharing any branded content.
- When sharing, include genuine context and acknowledge the connection to the brand.
- Maintain a brand account that participates as a recognizable voice over months.
10.5 Wikipedia and Wikidata
The strongest entity authority signal available. Wikidata Q ID is referenced by every major search engine and AI system for entity resolution.
Joseph's existing Wikidata entities:
- Joseph Anady: Q138610626
- MEGAMIND: Q138610666
For client work:
- Wikipedia article: requires notability. Possible for clients with clear notability (multi state operations, significant earned media, regulatory or community leadership). Most small businesses do not qualify.
- Wikidata entity: lower bar than Wikipedia. Can be created for any business with verifiable identity. Provides Q ID that can be referenced in sameAs of Organization schema.
The agent does not create speculative Wikipedia articles (conflict of interest, will be deleted). The agent does help clients establish Wikidata entities where appropriate.
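A sketch of how a Wikidata Q ID might be wired into Organization schema follows. The client name, URL, and Q ID are placeholders, and `organization_schema` is a hypothetical helper; the `/#organization` identifier mirrors the pattern used in the WordPress snippet earlier.

```python
import json
import re


def organization_schema(name, url, wikidata_qid, other_profiles=()):
    """Build Organization JSON-LD whose sameAs list starts with the Wikidata entity."""
    if not re.fullmatch(r"Q[1-9]\d*", wikidata_qid):
        raise ValueError(f"not a Wikidata Q ID: {wikidata_qid!r}")
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": url.rstrip("/") + "/#organization",
        "name": name,
        "url": url,
        "sameAs": [f"https://www.wikidata.org/wiki/{wikidata_qid}", *other_profiles],
    }


# Placeholder values for illustration only.
schema = organization_schema("Example Client LLC", "https://example.com/", "Q123456789")
print(json.dumps(schema, indent=2))
```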
10.6 Citation Building
Local citation building for local businesses:
- Google Business Profile (mandatory).
- Apple Business Connect.
- Bing Places.
- Yelp.
- Facebook Business Page.
- Instagram Business Account.
- LinkedIn Company Page.
- BBB profile.
- Industry specific directories (HomeAdvisor for home services, Avvo for legal, Healthgrades for medical, etc.).
- Local chamber of commerce.
Each citation:
- NAP (Name, Address, Phone) must be consistent across every directory.
- Categories must match across directories.
- Hours must match across directories.
- Photos uploaded.
- Description includes primary keyword and service area.
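For SDVOSB clients (Joseph's primary focus): also list on SAM.gov, the SBA Veteran Small Business Certification (VetCert) program, which replaced the VA's Vets First Verification Program in January 2023, and SDVOSB specific directories.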
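NAP consistency is easy to audit once listings are collected in one place. The following is a rough sketch under stated assumptions: listings are gathered by hand into a dictionary, phone numbers are US numbers, and the business data is fictitious. It does not reconcile abbreviations such as "St" versus "Street".

```python
import re


def _clean(text):
    """Lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def normalize_nap(name, address, phone):
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):  # assumption: US numbers
        digits = digits[1:]
    return (_clean(name), _clean(address), digits)


def inconsistent_listings(reference, listings):
    """Return the directories whose NAP differs from the reference NAP."""
    expected = normalize_nap(*reference)
    return sorted(
        directory
        for directory, nap in listings.items()
        if normalize_nap(*nap) != expected
    )


# Fictitious business used only to show the shape of the data.
reference = ("Example Plumbing", "123 Main St., Springfield, MO", "555-010-0199")
listings = {
    "yelp": ("Example Plumbing", "123 Main St, Springfield MO", "(555) 010-0199"),
    "bing_places": ("Example Plumbing", "123 Main St., Springfield, MO", "+1 555-010-0199"),
    "bbb": ("Example Plumbing LLC", "123 Main St., Springfield, MO", "555-010-0199"),
}
print(inconsistent_listings(reference, listings))
```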
10.7 Off Page Tracking
The agent tracks off page activity in a quarterly review:
off_page_quarter:
  quarter: "2026-Q2"
  earned_media_mentions:
    - publication: ""
      url: ""
      date: ""
      type: "" # quote, feature, byline, mention
  podcast_appearances:
    - podcast: ""
      episode_url: ""
      date: ""
  youtube_videos_published: 0
  youtube_third_party_mentions: 0
  reddit_threads_with_brand: 0
  linkedin_articles_published: 0
  new_directory_citations: 0
  wikipedia_status: ""
  wikidata_status: ""
11. AI Crawler Access and Governance
Detailed reference for the robots.txt and llms.txt patterns introduced in sections 8.8 and 8.9.
11.1 Known AI Crawlers (Q2 2026)
| Crawler | Operator | Purpose | Default Policy |
| --- | --- | --- | --- |
| Googlebot | Google | Classic search index | Allow |
| Google-Extended | Google | Training data and AI features | Allow |
| GoogleOther | Google | Various Google products | Allow |
| Bingbot | Microsoft | Bing search index | Allow |
| BingPreview | Microsoft | Bing preview generation | Allow |
| MSN Bot | Microsoft | Legacy MSN | Allow |
| GPTBot | OpenAI | ChatGPT training | Allow |
| OAI-SearchBot | OpenAI | ChatGPT live search | Allow |
| ChatGPT-User | OpenAI | User initiated browsing within ChatGPT | Allow |
| PerplexityBot | Perplexity | Perplexity index and answers | Allow |
| Perplexity-User | Perplexity | User initiated browsing | Allow |
| ClaudeBot | Anthropic | Claude training and search | Allow |
| Anthropic-AI | Anthropic | Anthropic services | Allow |
| Claude-User | Anthropic | User initiated browsing within Claude | Allow |
| Meta-ExternalAgent | Meta | Meta AI training and agent browsing | Allow |
| FacebookExternalHit | Meta | Open Graph fetch for sharing | Allow |
| Applebot | Apple | Siri, Spotlight, Apple Intelligence | Allow |
| Applebot-Extended | Apple | Apple AI training | Allow |
| Amazonbot | Amazon | Alexa | Allow |
| YandexBot | Yandex | Russian search index | Allow |
| Baiduspider | Baidu | Chinese search index | Allow |
| DuckDuckBot | DuckDuckGo | DuckDuckGo index supplement | Allow |
| Bytespider | ByteDance | TikTok and ByteDance products | Disallow (default) |
| Diffbot | Diffbot | Knowledge graph extraction | Allow |
| CCBot | Common Crawl | Open dataset | Allow (operator preference) |
When the operator overrides the default, document the override in the engagement notes with reasoning.
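The default policies can be kept as data and rendered into robots.txt groups, which makes overrides easy to diff. This is a sketch only: the dictionary below is a subset of the crawler list, `render_robots_txt` is a hypothetical helper, and the pattern in section 8.8 remains the reference.

```python
def render_robots_txt(policies, sitemap_url=None):
    """Render one robots.txt group per crawler from an allow/disallow map."""
    lines = []
    for agent, allowed in policies.items():
        lines.append(f"User-agent: {agent}")
        lines.append("Allow: /" if allowed else "Disallow: /")
        lines.append("")  # blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Subset of the default policies, for illustration.
DEFAULT_POLICIES = {
    "GPTBot": True,
    "OAI-SearchBot": True,
    "ClaudeBot": True,
    "PerplexityBot": True,
    "Bytespider": False,
}

print(render_robots_txt(DEFAULT_POLICIES, "https://example.com/sitemap.xml"))
```

An operator override then becomes a one line change to the dictionary, which is simple to record in the engagement notes.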
11.2 The Reading Mode Reality
About 46 percent of ChatGPT bot visits begin in reading mode: a plain HTML version of the page with no images, no CSS, no JavaScript, and no client side rendered content. This means:
- Critical content must render in server side HTML.
- Schema in JSON LD must be present in the initial HTML response, not injected by JavaScript.
- Navigation and internal linking must work without JavaScript.
- Single page applications without server side rendering or prerendering will be invisible to AI crawlers nearly half the time.
For Joseph's static HTML default stack, this is automatic. For SvelteKit and Next.js, server side rendering must be enabled. For pure client side React or Vue applications, the agent flags this as a critical fix before proceeding.
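Whether schema is present in the initial HTML response can be checked against the raw server output, before any JavaScript runs. A minimal sketch, assuming the HTML has already been fetched (for example with curl); `jsonld_types` is an illustrative name and the function does not unpack `@graph` containers.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(raw_html):
    """Return the @type of each top level JSON-LD item in the raw HTML."""
    collector = _JsonLdCollector()
    collector.feed(raw_html)
    collector.close()
    types = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # invalid JSON-LD is treated as absent
        items = data if isinstance(data, list) else [data]
        types.extend(item.get("@type") for item in items if isinstance(item, dict))
    return types


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article"}'
    "</script></head><body><h1>Title</h1></body></html>"
)
print(jsonld_types(page))
```

An empty list for a page that shows schema in the browser suggests the schema is being injected client side.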
11.3 llms.txt Implementation
Place at /llms.txt on the root of the site. The format is markdown.
Example for ThatDeveloperGuy.com:
# ThatDeveloperGuy
> Veteran owned web development and digital optimization agency based in
> Cassville Missouri serving Northwest Arkansas and Southwest Missouri.
> SDVOSB certified. Specializes in self hosted production websites with
> full SEO, AEO, AIO, and GEO stack implementation.
## Service Lines
- [Web Development Services](https://thatdeveloperguy.com/services/web-development/): Custom static, SvelteKit, and headless Shopify builds with full SEO and AI search optimization.
- [Search Engine Optimization](https://thatdeveloperguy.com/services/seo/): On page, technical, content, and off page SEO with quarterly reporting.
- [Answer Engine Optimization](https://thatdeveloperguy.com/services/aeo/): Featured snippet capture, People Also Ask wins, and voice assistant readiness.
- [AI Overview Optimization](https://thatdeveloperguy.com/services/aio/): Citation eligibility for Google AI Overviews and AI Mode.
- [Generative Engine Optimization](https://thatdeveloperguy.com/services/geo/): ChatGPT, Perplexity, Claude, and Bing Copilot citation building.
## Frameworks and References
- [SEO Search Appearance Framework](https://thatdeveloperguy.com/frameworks/seo-search-appearance/): The full 2026 framework this agency operates from.
- [SEO BUILD REFERENCE v2.4](https://thatdeveloperguy.com/frameworks/seo-build-reference/): The 14 tier Engine Optimization stack.
- [Stage 1 Demo Build Prompt](https://thatdeveloperguy.com/frameworks/stage-1-demo/): Cinematic single file demo specification.
- [Stage 2 Production Build Prompt](https://thatdeveloperguy.com/frameworks/stage-2-production/): Full production build specification.
## About
- [About Joseph Anady](https://thatdeveloperguy.com/about/): SDVOSB, BA Computer Engineering CSU, MA Cybersecurity, martial arts coach and competitor.
- [Contact](https://thatdeveloperguy.com/contact/): admin@thatdeveloperguy.com, 505.512.3662.
This file is regenerated whenever a major pillar is added or restructured. Quarterly review at minimum.
11.4 Server Log Analysis
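The agent periodically samples server logs to confirm AI crawlers are visiting and receiving successful responses. Sample bash check (assumes nginx's default combined log format):
# Print IP, path, status, and full user agent per AI crawler hit, then count duplicates.
# Google-Extended is a robots.txt token only and never appears as a user agent in logs.
sudo tail -n 10000 /var/log/nginx/access.log | \
grep -iE "GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|bingbot|YandexBot" | \
awk -F'"' '{split($1, pre, " "); split($2, req, " "); split($3, res, " "); print pre[1], req[2], res[1], $6}' | sort | uniq -c | sort -rn | head -50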
Review monthly. Anomalies (sudden spike, sudden drop, 404s on AI bot requests) trigger investigation.
12. Citation and Mention Tracking
Purpose: measure visibility on each surface independently and report the right metrics to clients. This section also contains the deep dive on the Google Search Console Performance report dimensions (queries, pages, countries, devices, search appearance).
12.1 Tracking Surfaces
Track these surfaces separately:
- Google classic SERP (rankings, impressions, clicks, CTR, average position).
- Google AI Overviews (impressions, citations, mentions).
- Google AI Mode (citations, mentions).
- Bing classic SERP.
- Bing Copilot.
- ChatGPT search.
- Perplexity.
- Claude with web access.
- DuckDuckGo (if relevant volume).
- Yandex (if relevant market).
- Baidu (if relevant market).
- Brave Search.
Google Search Console (primary, free):
- Performance report covers all five dimensions: Queries, Pages, Countries, Devices, Search Appearance.
- AI Overview impressions are included in overall web search type as of 2024 update.
- Daily data export via API or Looker Studio connector for historical archive (GSC retains 16 months in interface).
Bing Webmaster Tools (primary, free):
- Equivalent dimensions and metrics.
- Bing Copilot signals partially exposed in newer dashboard.
- Import GSC settings during setup to skip duplicate verification work.
Yandex Webmaster (when relevant):
- Russian language analytics.
Baidu Zhanzhang (when relevant):
- Chinese search analytics. Requires ICP license for full functionality.
Manual sampling protocol (weekly, mandatory):
- For each of the top 10 priority queries, run the query in: Google search, Google AI Mode, ChatGPT, Perplexity, Claude with web access, Bing Copilot.
- Record citation status (cited, mentioned without citation, not present).
- Record AI Overview presence.
- Record changes from prior week.
Third party tools (when budget allows):
- Semrush AI Toolkit: AI Overview tracking.
- Ahrefs AI Search: citation tracking across multiple AI engines.
- Profound: enterprise AI visibility platform.
- Geoptie: citation tracking platform.
The Google Search Console Performance report is the most important free tool for tracking SEO and AEO performance. The framework requires the operator to read this subsection in full. The screenshot in the engagement intake shows the standard interface: Queries, Pages, Countries, Devices, Search Appearance, and Days tabs across the table view, with Search Type, Date Range, and Add Filter controls above.
The report has four metrics across five dimensions plus filters.
The four metrics:
- Clicks. Number of times a user clicked through to the property from Google Search results. Going back to the SERP and clicking again counts as one click only. Clicks that stay inside Google Search (e.g., to the knowledge panel) do not count.
- Impressions. Number of times the site appeared in Search results. The link must be scrolled into view or visible to count, depending on the type of search element. A knowledge graph with multiple aggregated data items is one impression at the property level.
- CTR (Click Through Rate). Clicks divided by impressions. Reported as a percentage.
- Average position. Average position of the topmost result from the property for that query, page, country, or device. Position 1 is the top organic result. Positions beyond 30 are generally page 4 or deeper. Note: AI Overview citations affect impressions and clicks, but the position metric for AI Overview placement is not directly comparable to organic ranking position.
The five dimensions:
- Queries. What users typed (or said via voice) into Google Search. Query data is anonymized when fewer than a few dozen users searched for it over two to three months. Anonymized queries are not shown in the Queries tab. Aggregation: by property.
- Pages. Which canonical URL on the site appeared in results. The canonical is the URL Google chose to display, which may not be the URL the user lands on if there are redirects. Aggregation: by page.
- Countries. Originating country of the searcher. Helps validate whether the property is reaching the intended geography. Aggregation: by property.
- Devices. Desktop, Mobile, or Tablet. Mobile is dominant for most consumer sites. Aggregation: by property.
- Search Appearance. The SERP feature the result appeared in. Examples: AMP_BLUE_LINK, FAQ_RICH_RESULT, JOB_LISTING, RECIPE, REVIEW_SNIPPET, SITELINKS_SEARCHBOX, VIDEO, WEBLITE, MERCHANT_LISTINGS, PRODUCT_SNIPPETS. The full list expands as Google adds features. Aggregation: by page. This dimension cannot be combined with any other dimension in a single API query; you must run a two step query.
Filters:
- Search type. Web (default), Image, Video, News.
- Date range. Up to 16 months historical. 24 hour, 7 day, 28 day, 3 month, 6 month, 12 month, 16 month, custom.
- Query filter. Contains, equals, does not contain, regex.
- Page filter. Contains, equals (exact URL), does not contain, regex.
- Country filter. Single or multi country.
- Device filter. Single device.
- Search appearance filter. Single or multi feature.
Aggregation rules:
- Data grouped by Queries, Countries, Devices, or Dates is aggregated by property.
- Data grouped by Pages or Search Appearance is aggregated by page.
- This is why chart totals can differ from table totals when switching dimensions.
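A small worked example shows why the two aggregation modes disagree. It assumes a simplified model in which each search is represented by the list of site URLs shown for it; the URLs are hypothetical.

```python
from collections import Counter


def impressions_by_property(searches):
    """One impression per search in which any URL from the site appeared."""
    return sum(1 for urls in searches if urls)


def impressions_by_page(searches):
    """One impression per distinct URL shown in each search."""
    counts = Counter()
    for urls in searches:
        counts.update(set(urls))
    return counts


# Two searches: the first showed two URLs from the site, the second showed one.
searches = [
    ["/quarterly-taxes/", "/services/tax-prep/"],
    ["/quarterly-taxes/"],
]

by_page = impressions_by_page(searches)
print(impressions_by_property(searches))  # property level total
print(sum(by_page.values()))              # page level total
```

The property level total is 2 while the page level total is 3, because the first search counts once for the property but once for each of the two pages.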
Limits:
- 1,000 rows shown in the interface per query.
- 25,000 rows per API page request.
- 16 months historical retention.
- Anonymized queries dropped when filtering by query.
12.4 Working the Queries Dimension
What it answers: which keywords are bringing the site to user attention?
Workflow:
- Filter by date range (default: last 3 months).
- Sort by impressions descending. Take top 50.
- For each query: is the page that appeared in results the page we wanted to appear? If not, this is a sign of either cannibalization or content mis assignment. Flag for Phase 2 review.
- Sort by clicks descending. Take top 50. These are the actual traffic drivers. Confirm they map to the keyword to page map.
- Filter for queries with high impressions and low CTR (above 1,000 impressions, below 1.5 percent CTR). These are title and description optimization candidates.
- Filter for queries with average position 8 to 20. These are quick win candidates: results at the bottom of page one or on page two that can move up with content depth and on page improvements.
- Filter by query contains "?" or "what" or "how" or "why". These are AEO and AIO candidates.
- Compare branded versus non branded query share. Branded share above 50 percent suggests the site needs more non branded discovery work. Branded share below 20 percent suggests brand building is needed.
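The filters in this workflow can be applied to an exported query table in a few lines. This is a sketch under assumptions: rows are flattened dictionaries with `query`, `clicks`, `impressions`, `ctr` (as a fraction), and `position` keys, and branded share is measured by clicks, which the workflow above does not specify.

```python
import re

QUESTION_RE = re.compile(r"\?|\b(what|how|why)\b", re.IGNORECASE)


def title_candidates(rows, min_impressions=1000, max_ctr=0.015):
    """High impressions, low CTR: title and description optimization candidates."""
    return [r for r in rows if r["impressions"] > min_impressions and r["ctr"] < max_ctr]


def quick_wins(rows, best=8, worst=20):
    """Average position 8 to 20."""
    return [r for r in rows if best <= r["position"] <= worst]


def question_queries(rows):
    """Queries phrased as questions: AEO and AIO candidates."""
    return [r for r in rows if QUESTION_RE.search(r["query"])]


def branded_share(rows, brand_terms):
    """Fraction of clicks from queries containing a brand term."""
    total = sum(r["clicks"] for r in rows)
    if total == 0:
        return 0.0
    branded = sum(
        r["clicks"]
        for r in rows
        if any(term.lower() in r["query"].lower() for term in brand_terms)
    )
    return branded / total


# Invented rows for illustration.
rows = [
    {"query": "handled tax login", "clicks": 30, "impressions": 200, "ctr": 0.15, "position": 1.2},
    {"query": "what are quarterly estimated tax payments", "clicks": 10, "impressions": 2000, "ctr": 0.005, "position": 9.4},
    {"query": "tax preparer near me", "clicks": 60, "impressions": 900, "ctr": 0.0667, "position": 14.0},
]
print([r["query"] for r in title_candidates(rows)])
print(branded_share(rows, ["handled tax"]))
```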
Worked example using the handledtax.com screenshot pattern:
The screenshot shows 8 queries on handledtax.com over 3 months, all variations of the same question about quarterly estimated taxes:
- do estimated taxes have to be paid quarterly
- what are quarterly estimated tax payments
- are quarterly estimated taxes mandatory
- do quarterly taxes have to be paid on time
- do i need to pay quarterly taxes
- do you have to pay estimated taxes quarterly
- do i have to make estimated tax payments
- do quarterly estimated taxes have to be equal
Every query has 1 impression and 0 clicks.
What this pattern means:
- The same underlying question is being asked in 8 different ways. This is exactly what query fan out looks like at the user level.
- Each variant has only 1 impression, meaning the site appeared exactly once for each phrasing. The site is on the edge of visibility for these questions.
- Zero clicks across all variants means the position is too low or the snippet is not compelling enough to earn a click.
- The 8 queries represent a single sub query intent: "are quarterly estimated tax payments required?" That sub query has clear search demand (otherwise GSC would not be capturing impressions even at 1 each).
Recommended action:
- Build one comprehensive cluster page on the topic "quarterly estimated tax payments" under the relevant pillar (likely the small business or self employed tax pillar).
- Page structure: 40 to 60 word lede directly answering "are estimated taxes required to be paid quarterly?", followed by H2 questions covering each of the 8 variants and their underlying nuances (timing, equal payments, mandatory thresholds, who is exempt).
- Schema: Article with full author attribution per section 7.8. The author should have credentials in tax preparation (Amanda Emerdinger holds PTIN, which makes her a qualified author for this content).
- Internal link from the homepage to this page, plus from the relevant service pages.
- Submit URL via GSC URL Inspection tool after publication.
- Re check in 30 days. The 8 variant queries should consolidate impressions onto the new page, and CTR should rise as ranking position improves.
This is exactly the workflow Phase 2 (cluster mapping) is designed to trigger from GSC data.
12.5 Working the Pages Dimension
What it answers: which canonical URLs are doing the work?
Workflow:
- Sort by clicks descending. The top 20 pages drive most of the value. Keep them updated and well linked.
- Sort by impressions descending. Pages with high impressions and low CTR are extraction targets: they are already showing up, just not earning the click.
- Filter for pages not in your priority list. These are unintentional ranking pages. Sometimes they are golden (a thin blog post that accidentally ranks well, opportunity to expand). Sometimes they are noise (tag pages, search result pages, parameter URLs that should be canonicalized away).
- Sort by average position ascending (best first). The pages closest to position 1 are defending their rank. Confirm they have current information and strong linking.
- Compare year over year for content decay detection. Pages with declining position over 6 to 12 months need content refreshes.
- Cross reference with the Pillar and Cluster map from Phase 2. Every priority pillar and cluster page should appear in this report. If a priority page is absent, indexation is broken (check robots.txt, canonical tags, and the Page indexing report, formerly Index Coverage).
12.6 Working the Countries Dimension
What it answers: who geographically is finding the site?
Workflow:
- Confirm the top countries match the intended service area. For Joseph's typical client, expect United States dominant with possibly Canada, Mexico, and a few English speaking countries showing minor traffic.
- If high impressions are coming from countries the client does not serve, this can be an opportunity (international expansion) or noise (bot traffic, geographic anomalies). Investigate.
- If the client serves multiple countries, segment performance by country to find regional weakness or strength.
- Compare country distribution by language version of the site (when hreflang is in use).
- For local service businesses (Joseph's NWA and SW Missouri client base), state level targeting matters more than country. GSC reports geography at country level only; use GA4 region data or IP geolocation analysis for state segmentation.
12.7 Working the Devices Dimension
What it answers: how is performance differing by mobile, desktop, and tablet?
Workflow:
- Confirm mobile traffic is the majority for consumer sites. For B2B, desktop can be larger.
- Compare mobile CTR versus desktop CTR. If mobile CTR is significantly lower, suspect mobile UX issues, slow load times, or mobile specific design problems.
- Compare mobile average position versus desktop. They should be similar; large gaps suggest mobile usability issues affecting ranking.
- Filter top queries by device. Are the same queries driving traffic on both, or are mobile and desktop workloads diverging?
- Use the comparison filter to compare mobile current period versus prior period. Sudden mobile drops often correlate with Core Web Vitals regressions.
12.8 Working the Search Appearance Dimension
What it answers: which SERP features is the site earning, and how are they performing?
Workflow:
- List all search appearances with at least one impression. Compare to the schema implementation. If a schema type is implemented but no corresponding search appearance is showing, investigate (validation failure, eligibility issue, or just not chosen by Google).
- Compare search appearance CTR. Some appearances drive better CTR (Recipe, Video, Sitelinks, Merchant Listings) than others (basic blue link).
- After implementing a new schema type, check the Search Appearance report two to four weeks later to confirm the feature is appearing.
- Note that AMP_BLUE_LINK numbers should be near zero by 2026 (AMP is largely deprecated). If significant AMP impressions remain, the site needs to migrate off AMP.
- Note that some features inflate metrics. Google for Jobs counts both "job listing" and "job details" as clicks; only "job details" actually goes to the site.
To query Search Appearance via API: Search Appearance is not available alongside other dimensions in a single API request. Run a two step query: first request dimensions: [searchAppearance] only to see all appearance types with data; then run a second request filtering by a specific appearance type and adding any other dimensions.
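The two step query can be expressed as two request bodies for the Search Analytics query endpoint. The sketch builds the bodies only; sending them is left to whatever API client is in use, and the helper names are illustrative.

```python
def appearance_overview_request(start_date, end_date):
    """Step 1: list every search appearance type that has data."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["searchAppearance"],
    }


def appearance_detail_request(start_date, end_date, appearance, dimensions=("query", "page")):
    """Step 2: filter to one appearance type and group by other dimensions."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": list(dimensions),
        "dimensionFilterGroups": [
            {
                "filters": [
                    {
                        "dimension": "searchAppearance",
                        "operator": "equals",
                        "expression": appearance,
                    }
                ]
            }
        ],
    }


step_one = appearance_overview_request("2026-01-01", "2026-03-31")
step_two = appearance_detail_request("2026-01-01", "2026-03-31", "VIDEO")
print(step_one["dimensions"], step_two["dimensions"])
```

Run step one first, then call step two once per appearance type returned.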
12.9 Working the Days Dimension
The Days tab is for time series analysis. Workflow:
- Identify volatility in clicks or impressions correlating with known events (Google core updates, content publication dates, schema deployment dates).
- Use the comparison filter to compare current 28 day window to prior 28 day window. Highlight pages or queries with greater than 30 percent change.
- For sites with strong seasonality (tax services like handledtax.com, retail), overlay year over year data rather than month over month.
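The 28 day comparison can be automated against two exports. A minimal sketch, assuming each window is a dictionary mapping a page or query to its clicks; the 30 percent threshold comes from the workflow above, while treating a previously absent item as an infinite increase is an assumption.

```python
def significant_changes(current, previous, threshold=0.30):
    """Return items whose relative change between windows exceeds the threshold."""
    changes = {}
    for key in current.keys() | previous.keys():
        before = previous.get(key, 0)
        after = current.get(key, 0)
        if before == 0:
            if after > 0:
                changes[key] = float("inf")  # new in the current window
            continue
        delta = (after - before) / before
        if abs(delta) > threshold:
            changes[key] = delta
    return changes


# Invented click counts for two consecutive 28 day windows.
previous = {"/a/": 100, "/b/": 90, "/c/": 100, "/d/": 10}
current = {"/a/": 140, "/b/": 100, "/c/": 50, "/e/": 5}

for page, delta in sorted(significant_changes(current, previous).items()):
    print(page, delta)
```

For seasonal sites, pass the same window from the prior year as `previous` instead of the prior 28 days.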
Per section 2 reporting_to_client_cadence, the agent produces a report at the agreed cadence.
Standard monthly report structure:
# [Client Name] SEO and Search Appearance Report
## [Reporting Period]
## Executive Summary
[3 to 5 bullet points covering: total clicks, total impressions, citation count, key wins, key risks]
## Traffic Performance
[Table: clicks, impressions, CTR, average position, with month over month and year over year deltas]
## Top Performing Queries
[Table of top 20 queries by clicks]
## Top Performing Pages
[Table of top 20 pages by clicks]
## Search Appearance Performance
[Breakdown by search appearance type]
## Country and Device Performance
[Breakdown by country, breakdown by device, with deltas]
## AI Citation Status
[Manual sampling results: which queries earned citations on which surfaces]
## Brand Mention Volume
[Off page mentions detected during the period]
## Wins This Period
[Specific accomplishments with before and after metrics]
## Recommendations for Next Period
[3 to 7 prioritized actions for the next reporting cycle]
## Appendix: Raw Data
[Links to GSC export, full citation log, etc.]
For clients with the $397 monthly Full Visibility Stack tier, the report includes the full raw data appendix. For lower tiers, the appendix is summary only.
Purpose: create the original assets that earn citations and differentiate the site from competitors.
AI search engines are increasingly hostile to derivative content. When ten sites publish the same paraphrase of the same source, AI engines pick the original and ignore the rest. Information gain is the only durable moat in 2026 SEO.
Joseph's portfolio gives him unusual leverage here. With 130 plus production websites across multiple verticals, Joseph can produce benchmarks and surveys that nobody else can match. The framework requires using this leverage.
Asset types ranked by leverage and citation potential:
- Annual benchmark report. Aggregate data across the portfolio (or across the client's vertical). Published once per year, refreshed annually. Highest citation magnet.
- Original survey. Even 30 to 50 responses produces citable data. Quarterly cadence is reasonable.
- Internal case study with verifiable numbers. Pull a real client engagement (with permission), document the before, the work, and the after with specific metrics.
- Proprietary framework or methodology. This framework itself is an asset. Frameworks become reference resources.
- Calculator or interactive tool. Hosted on the site, available without registration. Drives both citations and conversions.
- First hand experience narrative. "I used X for 60 days, here is what I learned." Strongest E E A T signal possible.
- Unique side by side comparison. Compare three or more options on a dimension nobody else has measured.
- Public dataset visualization. Take a public dataset and present it in a way nobody else has. Government data, academic data, FOIA data.
13.3 Production Cadence
For a client on the $997 build plus $397 monthly tier:
- One major information gain asset per quarter.
- Two to three minor information gain elements per month embedded in cluster pages.
- One annual flagship asset (benchmark or survey).
For a client on the $597 build plus $250 monthly tier:
- One major information gain asset per year.
- One minor information gain element per month embedded in cluster pages.
For a client on the $2,997 enterprise tier:
- One major information gain asset per month.
- One minor information gain element per cluster page.
- Two annual flagship assets.
Producing the asset is half the work. Distribution is the other half.
For every major asset:
- Press release to local and trade media.
- LinkedIn article summarizing key findings.
- Twitter or X thread with the top 5 takeaways.
- Reddit submission to relevant subreddits with substantive context.
- Email to the client's full subscriber list.
- Outreach to 10 to 20 industry publications offering exclusive angles.
- Social media graphics for each top finding.
- Update llms.txt to point AI engines to the asset.
The asset should earn 3 to 5 earned media mentions in the first 90 days post publication. If it does not, the topic was wrong or the promotion was insufficient.
14. Surface Specific Optimization
Purpose: apply the surface specific tweaks that improve performance on each platform. This section covers Google's classic SERP, AI Overviews, AI Mode, plus Bing, DuckDuckGo, Yandex, Baidu, Brave Search, ChatGPT, Perplexity, Claude with web access, Bing Copilot, voice search, and image and video carousels.
14.1 Google Classic SERP
Google holds about 90 percent of global search market share. Every site optimizes for Google first, then layers other engines on top.
Specific tactics:
- Title tag: 50 to 60 characters, primary keyword in first 30 characters, brand at end.
- Meta description: 150 to 160 characters, includes primary keyword, ends with implicit or explicit call to action.
- Heading hierarchy: one H1, multiple H2, H3 nested within H2, no H1 inside body content.
- Internal linking density: every important page has at least 3 inbound internal links.
- External linking: cite authoritative sources where they support claims. Do not link out gratuitously.
- 301 redirects on every URL change. Never 302 for permanent moves.
- Submit URL changes via the URL Inspection tool.
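The title and meta description rules above lend themselves to an automated check. This sketch assumes "in first 30 characters" means the keyword starts within the first 30 characters, and it cannot judge whether a description ends with a call to action; the function names are illustrative.

```python
def check_title(title, keyword, brand):
    """Return a list of rule violations for a title tag."""
    issues = []
    if not 50 <= len(title) <= 60:
        issues.append(f"length {len(title)} outside 50 to 60")
    position = title.lower().find(keyword.lower())
    if position == -1:
        issues.append("primary keyword missing")
    elif position >= 30:
        issues.append("primary keyword starts after character 30")
    if not title.rstrip().lower().endswith(brand.lower()):
        issues.append("brand not at end")
    return issues


def check_meta_description(description, keyword):
    """Return a list of rule violations for a meta description."""
    issues = []
    if not 150 <= len(description) <= 160:
        issues.append(f"length {len(description)} outside 150 to 160")
    if keyword.lower() not in description.lower():
        issues.append("primary keyword missing")
    return issues


title = "Quarterly Estimated Tax Payments Explained | Handled Tax"
print(check_title(title, "quarterly estimated tax payments", "Handled Tax"))
print(check_title("Taxes", "quarterly estimated tax payments", "Handled Tax"))
```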
14.2 Google AI Overviews
Heavily favors pages already ranking in the top 10 organic, but 38 percent of citations now come from outside the top 10. AI Overviews appear on about 48 percent of all queries in Q1 2026, and over 70 percent of informational and how to queries.
Tactics:
- Strong Phase 3 page structure (lede, headings as questions, dual extraction layer).
- Schema accuracy and completeness per Phase 4.
- Information gain elements per section 13.
- Author attribution and visible E E A T signals.
- Avoid first person plural "we" voice when possible; institutional authority reads better as "the company" or third person.
- Content update cadence: review and update every cited page at least every 90 days.
14.3 Google AI Mode
Favors broad topical coverage over single high performing URLs. Personal Intelligence integration since January 2026 means Gmail and Calendar context can shape responses. Runs on Gemini 3 Pro. Has 75 million daily active users.
Tactics:
- Pillar and cluster architecture with 70 plus percent fan out coverage (per Phase 2).
- Comprehensive coverage of edge cases and adjacent topics.
- Strong entity signals (Wikidata, Organization schema, sameAs links).
- Multi format content within the cluster (text, video, images).
- Do not assume Personal Intelligence will favor your site. Build for the public retrieval path.
14.4 ChatGPT Search
Different ranking signals than Google. Specific findings:
- Pages with semantically relevant title and URL slug are more likely to get cited.
- Prefers focused shorter content. Pages covering 26 to 50 percent of fan out sub queries get cited more than pages covering 100 percent.
- 92 percent of the time, ChatGPT agents rely on the Bing Search API, so Bing visibility matters here specifically.
- 46 percent of ChatGPT bot visits begin in reading mode (plain HTML).
- 63 percent of ChatGPT agents leave immediately after landing (high bounce rate).
Tactics:
- Bing Webmaster Tools verification mandatory.
- Plain HTML rendering must work (no SPA without prerendering).
- URL slugs include primary keyword.
- Keep content focused per page; do not overload with unrelated subtopics.
14.5 Perplexity
Strong source citation, prefers academic and authoritative sources, favors recency.
Tactics:
- Cite primary sources liberally (academic papers, government data, industry reports).
- Date stamps on every published page.
- Author credentials prominent.
- Avoid affiliate heavy or thinly sourced content; Perplexity downranks it.
14.6 Claude with Web Access
Less publicly studied. Sample manually for the client's priority queries.
Tactics:
- General GEO best practices apply.
- Strong factual accuracy and citation discipline.
- Avoid promotional language; Claude prefers analytical and factual phrasing.
14.7 Bing Copilot
Tied closely to Bing core ranking. Microsoft Copilot integration across Windows, Office, and Edge browser increases Bing's reach beyond raw search market share numbers.
Tactics:
- Bing Webmaster Tools verification.
- Bing prefers official site signals and social media presence (LinkedIn, X presence helps).
- Bing has a stronger preference for institutional sources than Google.
- Optimization for Bing carries over to DuckDuckGo (which uses Bing's index in part).
14.8 Bing Classic
Bing holds about 4.98 percent global market share, higher in the US (about 7 percent), and is the default search engine on Windows and in the Edge browser. When combined with Yahoo (which uses Bing) and DuckDuckGo (which partially uses Bing), the Bing ecosystem reaches about 13 percent of US search.
Specific Bing differences from Google:
- Stronger weight on exact keyword matches in title and content.
- Higher preference for institutional sources (.gov, .edu, .org).
- Lower diversity of sources surfaced per query (fewer unique domains in top 10 versus Google).
- Greater weight on social signals (LinkedIn, X presence).
- Microsoft Advertising typically has lower paid search CPCs than Google Ads.
Tactics:
- Submit sitemaps to Bing Webmaster Tools.
- Maintain active LinkedIn Company Page with regular posts.
- Maintain active X (Twitter) presence.
- Earn institutional citations (industry associations, .gov references when applicable, .edu mentions).
14.9 DuckDuckGo
DuckDuckGo holds about 0.76 percent globally and 1.84 percent in the US. About 100 million daily searches. Privacy focused with no user tracking or personalization.
DuckDuckGo pulls results from Bing's index plus its own crawler (DuckDuckBot) plus hundreds of other sources including Wikipedia and Wolfram Alpha. There is no separate DuckDuckGo SEO playbook beyond Bing optimization, but a few specifics matter:
- Since DuckDuckGo does not personalize, on page relevance is more important than for personalized engines.
- DuckDuckGo's own DuckDuckBot crawler must be allowed in robots.txt.
- DuckDuckGo Lite (the text only version) is parsed by ChatGPT bots in reading mode, so Bing and DuckDuckGo visibility cascade into AI engine visibility.
14.10 Yandex
Yandex holds about 1.34 percent globally, 65 to 72 percent in Russia and several CIS countries.
Skip unless the client serves Russian language markets.
If relevant:
- Yandex Webmaster verification.
- Yandex Metrica analytics (separate from Google Analytics).
- Russian language content with proper transliteration.
- Local Yandex Maps presence for local businesses in Russia.
- Yandex Direct for paid search (lower CPCs than Google in Russian markets).
- Domain age, content freshness, and user behavior signals (CTR, dwell time) are weighted heavily in Yandex's ranking algorithm per the 2023 leaked ranking factors.
14.11 Baidu
Baidu holds about 0.55 percent globally, 53 plus percent in China.
Skip unless the client serves Chinese language markets.
If relevant:
- Baidu Zhanzhang (Webmaster Tools) verification.
- ICP license required for hosting in China for full Baidu support.
- Simplified Chinese content.
- Domain hosted on China based or China optimized infrastructure (latency from US hosted sites hurts Baidu rankings).
- Baidu Tongji analytics.
- Avoid heavy JavaScript reliance; Baidu's crawler is less capable than Googlebot at rendering JavaScript.
14.12 Brave Search
Brave Search has run on its own independent 30 billion page index since 2023. It handles about 50 million daily queries. It is privacy focused and used heavily by Brave browser users.
Tactics:
- Submit URLs via Brave Search Webmaster Tools.
- General SEO best practices apply.
- Smaller share but growing among privacy minded users, particularly in tech and crypto verticals.
14.13 Voice Search and Smart Speakers
Voice queries skew long tail and conversational. Optimizing for AEO directly addresses voice search.
Tactics:
- The 40 to 60 word lede answer should sound natural when read aloud.
- Avoid jargon and acronyms in the lede.
- Use natural language throughout (do not write for the keyword stuffing spider).
- Phone number and address visible in plain text on contact pages (voice assistants extract these directly).
- LocalBusiness schema with accurate geo coordinates.
14.14 Image and Video Carousels
Both are search appearance types worth tracking separately.
For images:
- ImageObject schema on every hero image.
- Descriptive alt text and filename.
- Width and height attributes.
- Modern formats (WebP, AVIF) with fallbacks.
- Image sitemap.
For videos:
- VideoObject schema on every page that embeds video.
- Hosting via YouTube for maximum AI engine visibility (multiple AI engines preferentially cite YouTube).
- Self hosted video as a secondary option only.
- Transcript published on the host page.
- Closed captions in the video file.
- Video sitemap.
14.15 Local Pack and Knowledge Panel
For local businesses, the Local Pack (the map of three local results) and the Knowledge Panel (the right side panel with business details) are the most valuable Google surfaces.
Tactics for Local Pack:
- Google Business Profile complete and verified.
- Categories aligned with primary services.
- Hours accurate.
- Photos updated quarterly.
- Reviews actively requested and responded to.
- LocalBusiness schema with geo coordinates matching GBP.
- NAP consistency across all directory citations.
- Service area defined accurately.
Tactics for Knowledge Panel:
- Strong Organization schema with all properties filled.
- Wikidata Q ID linked via sameAs.
- Wikipedia article when notability allows.
- Consistent name and branding across web (Wikipedia, social, GBP, Bing Places).
- Structured social links.
15. Audit Mode
Purpose: evaluate any site (fully built, partially built, or freshly inherited) against the framework. Output: pass, partial, fail, or N/A on each criterion plus remediation steps.
15.1 When to Run Audit Mode
- Initial engagement with a new client to establish baseline.
- Quarterly health check on retainer clients.
- Post Google core update verification (March, July, October, December cadence has been typical).
- Pre handoff QA after a Phase 4 build.
- Any time GSC shows a sudden 20 percent or worse drop in clicks or impressions.
15.2 Audit Inputs
Required:
- Domain to audit.
- GSC access (read at minimum).
- Bing Webmaster Tools access (read at minimum).
- File system access for self hosted sites (read at minimum).
- Hosting environment information.
Optional but improves audit quality:
- Google Analytics 4 access.
- Server log access.
- Ahrefs or Semrush access.
- Direct access to the CMS or static site source.
15.3 The 50 Criterion Audit
The audit is organized into five pillars: Eligibility, Coverage, Extractability, Distribution, and Measurement. Each criterion produces one of four results: PASS, PARTIAL, FAIL, or N/A. Each criterion has remediation steps when it does not pass.
Pillar A: Eligibility (12 criteria)
A1. Robots.txt allows the full AI crawler set.
Method: curl https://[domain]/robots.txt. Confirm GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, Google-Extended, Bingbot, YandexBot, Baiduspider, Applebot, Amazonbot, Meta-ExternalAgent are not blocked.
Pass: all listed crawlers allowed.
Partial: most allowed, one or two missing or unintentionally blocked.
Fail: significant AI crawlers blocked.
Remediation: update robots.txt per section 8.8 and Appendix C 19.10.
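The A1 check can be run offline against a saved copy of robots.txt. A minimal sketch using Python's standard library robots.txt parser; the crawler list is copied from the criterion, while the function names and the cut off between PARTIAL and FAIL (three or more blocked crawlers) are assumptions made for illustration:

```python
from urllib.robotparser import RobotFileParser

# Crawler list copied from criterion A1.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Bingbot", "YandexBot", "Baiduspider", "Applebot",
    "Amazonbot", "Meta-ExternalAgent",
]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the A1 crawlers that robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


def grade_a1(blocked: list[str]) -> str:
    # Assumption: "one or two" blocked maps to PARTIAL, three or more to FAIL.
    if not blocked:
        return "PASS"
    return "PARTIAL" if len(blocked) <= 2 else "FAIL"
```

Pass the body returned by the curl command as `robots_txt`, and replace the placeholder `url` with a priority page on the audited domain.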
A2. No CDN or proxy is interfering with crawler access.
Method: run curl -I https://[domain]/ and check the response for Cloudflare, Akamai, or Fastly headers.
Pass: no third party CDN in front.
Fail: Cloudflare or other proxy detected (against Joseph's standing rules).
Remediation: remove proxy, route through Bubbles nginx directly.
A3. SSL is active and current.
Method: curl -vI https://[domain]/ 2>&1 | grep -E "subject:|start date:|expire date:".
Pass: valid certificate, expiring more than 30 days out.
Partial: valid but expiring within 30 days.
Fail: invalid, expired, or self signed.
Remediation: certbot renewal or new issuance.
A4. HTTP/3 (QUIC) is supported.
Method: curl --http3 https://[domain]/ -I (requires curl with HTTP/3 support).
Pass: HTTP/3 negotiation succeeds.
Partial: HTTP/2 only.
Fail: HTTP/1.1 only.
Remediation: enable QUIC in nginx per section 9.1 config.
A5. Mobile renders correctly without JavaScript.
Method: curl -A "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36" https://[domain]/[priority-page] | grep -i "<h1\|<main\|<article".
Pass: critical content present in initial HTML.
Fail: empty or shell only HTML.
Remediation: implement server side rendering or static prerendering.
A6. Core Web Vitals pass on mobile.
Method: PageSpeed Insights at https://pagespeed.web.dev/ using the priority page URL.
Pass: LCP under 2.5s, INP under 200ms, CLS under 0.1.
Partial: one metric in needs improvement zone.
Fail: any metric in poor zone.
Remediation: image optimization, render blocking script reduction, layout stability fixes.
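The A6 grading can be expressed as a small classifier over the three field metrics. A minimal sketch: the "good" limits come from the criterion above, the "poor" limits (4.0 s, 500 ms, 0.25) are Google's published Core Web Vitals ranges rather than values stated in this framework, and treating more than one metric in the needs improvement zone as PARTIAL is an assumption:

```python
# (good limit from criterion A6, poor limit from Google's published CWV ranges)
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}


def zone(metric: str, value: float) -> str:
    """Map one metric value to good, needs improvement, or poor."""
    good_limit, poor_limit = THRESHOLDS[metric]
    if value < good_limit:
        return "good"
    if value > poor_limit:
        return "poor"
    return "needs improvement"


def grade_a6(lcp_s: float, inp_ms: float, cls: float) -> str:
    zones = [zone("lcp_s", lcp_s), zone("inp_ms", inp_ms), zone("cls", cls)]
    if "poor" in zones:
        return "FAIL"
    # Assumption: any number of needs improvement metrics (and none poor) is PARTIAL.
    if "needs improvement" in zones:
        return "PARTIAL"
    return "PASS"
```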
A7. Sitemap.xml is present and current.
Method: curl https://[domain]/sitemap.xml | head -50.
Pass: valid XML, lastmod dates within last 90 days for active pages.
Partial: present but stale lastmod.
Fail: missing or invalid.
Remediation: regenerate sitemap; submit to GSC and Bing.
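Reading lastmod dates by eye from the first 50 lines does not scale past small sitemaps. A minimal sketch that lists URLs with a missing or stale lastmod, assuming a standard urlset sitemap (not a sitemap index) and lastmod values that start with a full YYYY-MM-DD date; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_urls(sitemap_xml: str, today: date, max_age_days: int = 90) -> list[str]:
    """Return sitemap URLs whose lastmod is missing or older than max_age_days."""
    root = ET.fromstring(sitemap_xml)
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # Assumption: lastmod begins with YYYY-MM-DD; any time portion is ignored.
        if not lastmod or date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(loc)
    return stale
```

An empty result on active pages supports a PASS; a non-empty result on a valid sitemap corresponds to PARTIAL.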
A8. Llms.txt is present.
Method: curl -I https://[domain]/llms.txt.
Pass: returns 200 with valid content.
Partial: returns 200 but content is stale or thin.
Fail: returns 404.
Remediation: create llms.txt per section 11.3.
A9. Robots.txt is consistent with sitemap.
Method: cross reference robots.txt and sitemap.xml.
Pass: all sitemap URLs are crawlable per robots.txt.
Fail: sitemap includes URLs blocked by robots.txt.
Remediation: reconcile.
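The A9 cross reference reduces to testing every sitemap URL against the robots.txt rules. A minimal sketch using the standard library parser, assuming the sitemap URLs have already been extracted into a list and that the generic `*` user agent is the one being checked; the function name is illustrative:

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls_blocked(robots_txt: str, sitemap_urls: list[str], user_agent: str = "*") -> list[str]:
    """Return sitemap URLs that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]
```

An empty list is a PASS; any returned URL is a FAIL to reconcile, either by removing it from the sitemap or by unblocking it.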
A10. Canonical tags are present and correct on every priority page.
Method: spot check 10 priority pages for <link rel="canonical" href="...">.
Pass: all 10 have correct self referencing canonical (or correct cross page canonical for variants).
Fail: missing or incorrect canonicals.
Remediation: add or fix canonicals.
A11. Author schema and visible attribution on content pages.
Method: spot check 10 content pages for visible author byline AND Article author field.
Pass: 8 of 10 or better have both.
Partial: byline present but no schema, or vice versa.
Fail: neither present.
Remediation: implement author bios and Person schema per section 7.8.
A12. Last updated dateModified is accurate.
Method: spot check 10 pages: compare dateModified in schema to actual last edit date.
Pass: dateModified within 30 days of actual edit date on 8 of 10.
Partial: dateModified within 90 days.
Fail: dateModified is stale on most pages.
Remediation: implement automated dateModified updates per stack.
Pillar B: Coverage (10 criteria)
B1. Pillar architecture exists.
Method: review pillars.yaml or equivalent documentation.
Pass: at least one pillar with 8 plus cluster pages.
Partial: pillar identified but cluster count low.
Fail: no pillar architecture.
Remediation: complete Phase 2.
B2. Each pillar has a 3,000 to 5,000 word pillar page.
Method: spot check word count on each pillar URL.
Pass: all pillars meet word count.
Partial: most do.
Fail: pillars are thin.
Remediation: expand pillar pages per Phase 3.
B3. Cluster pages are 800 to 2,500 words.
Method: spot check 10 cluster URLs.
Pass: 8 of 10 in range.
Partial: average is in range but variance high.
Fail: cluster pages are thin or bloated inappropriately.
Remediation: rewrite outliers.
B4. Sub query coverage rate is 70 percent or higher across top 10 priority pillars.
Method: review sub query coverage map.
Pass: 70 percent or higher.
Partial: 50 to 69 percent.
Fail: below 50 percent.
Remediation: add cluster pages for unanswered sub queries.
B5. No primary keyword is targeted by two pages.
Method: GSC export plus url to keyword map cross reference.
Pass: no cannibalization.
Partial: one or two flagged cases under remediation.
Fail: multiple unresolved cannibalization cases.
Remediation: complete Phase 2 cannibalization remediation.
B6. Every cluster page links back to its pillar.
Method: spot check 10 cluster pages for outbound link to pillar with descriptive anchor.
Pass: all 10.
Partial: most have link, anchor text generic.
Fail: cluster pages do not link to pillar.
Remediation: implement internal linking plan.
B7. Pillar links to all cluster pages.
Method: review pillar pages for outbound links.
Pass: pillar links to all clusters.
Partial: pillar links to most.
Fail: pillar links to few or none.
Remediation: expand pillar internal linking.
B8. Breadcrumbs are present on every page.
Method: spot check 10 pages for visible breadcrumb navigation AND BreadcrumbList schema.
Pass: 9 of 10 or better.
Fail: breadcrumbs missing on most pages.
Remediation: implement breadcrumbs per template.
B9. Each priority page includes information gain.
Method: spot check 10 priority pages for at least one information gain element in first 30 percent.
Pass: 7 of 10 or better.
Partial: 4 to 6 of 10.
Fail: 3 or fewer.
Remediation: build information gain assets per section 13.
B10. Internal linking density meets target.
Method: crawl the site and compute average inbound internal links per priority page.
Pass: average 3 plus inbound links per priority page.
Partial: average of at least 2 but fewer than 3.
Fail: average below 2.
Remediation: expand internal linking systematically.
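Once a crawl has produced a list of internal links, the B10 average is a short computation. A minimal sketch, assuming the crawl output is a list of (source, target) URL pairs; counting each source page once per target and ignoring self links are assumptions, as are the function names:

```python
from collections import Counter


def average_inbound_links(links: list[tuple[str, str]], priority_pages: list[str]) -> float:
    """Average number of distinct pages linking to each priority page."""
    # set() drops repeated links between the same pair; self links are skipped.
    inbound = Counter(target for source, target in set(links) if source != target)
    return sum(inbound[page] for page in priority_pages) / len(priority_pages)


def grade_b10(average: float) -> str:
    if average >= 3:
        return "PASS"
    if average >= 2:
        return "PARTIAL"
    return "FAIL"
```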
Pillar C: Extractability (12 criteria)
C1. Every priority page has a 40 to 60 word lede.
Method: spot check 10 priority pages. Count words in the first paragraph.
Pass: 9 of 10 in range.
Partial: 7 or 8 of 10 in range.
Fail: most have preamble before answer, or lede is too short or too long.
Remediation: rewrite ledes per section 7.2.
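Counting lede words by hand across 10 pages is error prone. A minimal sketch that pulls the first `<p class="lede">` paragraph (the markup described in section 7.2) out of a page and counts its words; the class and function names are hypothetical, and pages that mark the lede differently would need a different selector:

```python
from html.parser import HTMLParser


class LedeExtractor(HTMLParser):
    """Collect the text of the first <p class="lede"> element."""

    def __init__(self):
        super().__init__()
        self.in_lede = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and not self.done and "lede" in classes:
            self.in_lede = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_lede:
            self.in_lede = False
            self.done = True

    def handle_data(self, data):
        if self.in_lede:
            self.parts.append(data)


def lede_word_count(html: str) -> int:
    parser = LedeExtractor()
    parser.feed(html)
    return len("".join(parser.parts).split())


def lede_in_range(html: str) -> bool:
    return 40 <= lede_word_count(html) <= 60
```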
C2. Lede contains the primary keyword in the first sentence.
Method: spot check 10 priority pages.
Pass: 9 of 10.
Fail: keyword absent from first sentence.
Remediation: rewrite ledes.
C3. Every H2 is phrased as a question matching a sub query.
Method: spot check 10 priority pages. Read H2 list.
Pass: 80 percent of H2s are questions.
Partial: 50 to 79 percent.
Fail: most H2s are nouns or generic labels.
Remediation: rewrite H2s per section 7.4.
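The C3 percentage can be computed from page source. A minimal sketch, assuming an H2 counts as a question when its text ends with a question mark (it cannot judge whether the question matches a sub query); the class and function names are illustrative:

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data


def question_h2_percent(html: str) -> float:
    collector = H2Collector()
    collector.feed(html)
    headings = [h.strip() for h in collector.headings if h.strip()]
    if not headings:
        return 0.0
    # Assumption: a heading is a question if it ends with "?".
    questions = sum(1 for h in headings if h.endswith("?"))
    return 100 * questions / len(headings)


def grade_c3(percent: float) -> str:
    if percent >= 80:
        return "PASS"
    if percent >= 50:
        return "PARTIAL"
    return "FAIL"
```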
C4. Lists use semantic HTML, not styled divs.
Method: view source on 10 priority pages.
Pass: all lists are <ul> or <ol>.
Fail: lists are styled divs.
Remediation: refactor to semantic HTML.
C5. Tables use semantic HTML.
Method: view source.
Pass: all tabular content uses <table>, <thead>, <tbody>, <tr>, <th>, <td>.
Fail: tables are styled divs.
Remediation: refactor.
C6. The 30 percent rule is honored on priority pages.
Method: identify the citable answer on each priority page. Confirm it appears within the first third of the page text.
Pass: 8 of 10.
Fail: most pages bury the answer.
Remediation: restructure pages.
C7. Sections are self contained.
Method: read 10 random sections from priority pages out of context.
Pass: each section reads as a complete, standalone answer.
Fail: sections require context from earlier sections.
Remediation: rewrite to remove cross references.
C8. Schema is present on every priority page.
Method: spot check 10 priority pages. Look for <script type="application/ld+json">.
Pass: 10 of 10.
Fail: missing on some pages.
Remediation: implement schema per Phase 4.
C9. Schema validates against Schema.org and Google Rich Results Test.
Method: run Google Rich Results Test on 10 priority pages.
Pass: 9 of 10 pass with no errors.
Partial: warnings present but no errors.
Fail: errors present.
Remediation: fix schema errors.
C10. Schema content matches visible page content.
Method: spot check fields like price, name, address, dateModified against visible content.
Pass: full alignment on 9 of 10.
Fail: schema describes content not visible on page.
Remediation: align schema with visible content.
C11. No deprecated schema types present.
Method: scan pages for HowTo, Practice Problem, Q and A, Book Action, Course Info, Estimated Salary, Vehicle Listing, Special Announcement, Claim Review, Learning Video.
Pass: none present, OR present but documented as intentional.
Fail: deprecated types present and producing zero rich result lift.
Remediation: remove or replace deprecated schema.
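The C11 scan can be automated by collecting every @type value from a page's JSON-LD. A minimal sketch: the names in the Method line are Google feature names, so their mapping to schema.org @type values is an assumption, and the default set below includes only a few mappings (HowTo, QAPage, SpecialAnnouncement, ClaimReview); extend it per engagement. Class and function names are hypothetical:

```python
import json
from html.parser import HTMLParser

# Assumption: partial mapping of deprecated features to schema.org @type values.
DEPRECATED_TYPES = frozenset({"HowTo", "QAPage", "SpecialAnnouncement", "ClaimReview"})


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data


def collect_types(node, found: set) -> None:
    """Walk parsed JSON-LD recursively and record every @type value."""
    if isinstance(node, dict):
        value = node.get("@type")
        if isinstance(value, str):
            found.add(value)
        elif isinstance(value, list):
            found.update(v for v in value if isinstance(v, str))
        for child in node.values():
            collect_types(child, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)


def deprecated_types_in(html: str, deprecated=DEPRECATED_TYPES) -> list[str]:
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        try:
            collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # Invalid JSON-LD is a C9 finding, not a C11 finding.
    return sorted(found & set(deprecated))
```

The same collector also answers C8: a page with an empty `blocks` list has no JSON-LD schema at all.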
C12. Open Graph and Twitter Card tags present.
Method: spot check 10 priority pages.
Pass: og:title, og:description, og:url, og:image, twitter:card present on all.
Fail: missing on most pages.
Remediation: add tags.
Pillar D: Distribution (8 criteria)
D1. Earned media count meets target.
Method: review off page tracking log for last 12 months.
Pass: 3 plus earned media mentions.
Partial: 1 or 2.
Fail: 0.
Remediation: launch earned media outreach program.
D2. YouTube presence active.
Method: confirm own YouTube channel exists with content from last 90 days; confirm at least one third party mention.
Pass: own channel active plus third party mention.
Partial: one without the other.
Fail: no YouTube presence.
Remediation: launch YouTube content cadence.
D3. Reddit and forum mentions exist.
Method: search Reddit for client brand, capture mentions in last 12 months.
Pass: 3 plus authentic mentions.
Partial: 1 or 2.
Fail: 0.
Remediation: launch authentic Reddit participation.
D4. LinkedIn article cadence active.
Method: review LinkedIn for client author publishing.
Pass: at least 1 article per quarter.
Fail: no LinkedIn publishing.
Remediation: build LinkedIn editorial calendar.
D5. Wikipedia entity exists or has been formally proposed.
Method: search Wikipedia for the brand.
Pass: live Wikipedia article exists.
Partial: Wikipedia draft exists or notability is being built.
Fail: no Wikipedia presence and no plan.
Remediation: assess notability; if eligible, draft article.
D6. Wikidata entity exists.
Method: search Wikidata for the brand.
Pass: Wikidata Q ID assigned and linked from Organization schema sameAs.
Partial: Q ID exists but not linked from schema.
Fail: no Wikidata entity.
Remediation: create Wikidata entity per section 10.5.
D7. Local citation parity (NAP consistency) for local businesses.
Method: compare Google Business Profile, Apple Business Connect, Bing Places, Yelp, Facebook for NAP consistency.
Pass: full NAP match across all five.
Partial: minor differences (formatting, abbreviation).
Fail: significant differences (different addresses, different phone numbers).
Remediation: standardize NAP across all directories.
D8. Industry directory citations match the vertical.
Method: review citation profile against industry standard directory list.
Pass: client is listed on appropriate vertical specific directories.
Partial: some directories present, others missing.
Fail: no vertical specific citations.
Remediation: build out citation profile per vertical.
Pillar E: Measurement (8 criteria)
E1. Google Search Console verified.
Method: log in to GSC, confirm property exists.
Pass: verified.
Fail: not verified.
Remediation: verify GSC property.
E2. Bing Webmaster Tools verified.
Method: log in to BWT.
Pass: verified.
Fail: not verified.
Remediation: verify and import GSC settings.
E3. Google Analytics 4 active.
Method: confirm GA4 property exists with active data.
Pass: GA4 active with data flowing.
Partial: GA4 exists but data is sparse or misconfigured.
Fail: no GA4 or no data.
Remediation: implement or fix GA4.
E4. Google Business Profile active and verified.
Method: search GBP for the client.
Pass: verified, with hours, photos, and recent reviews.
Partial: verified but incomplete.
Fail: not claimed or not verified.
Remediation: claim and complete GBP.
E5. Manual citation sampling protocol active.
Method: confirm weekly sampling log exists.
Pass: sampling log active and current.
Partial: sampling log exists but is intermittent.
Fail: no sampling.
Remediation: implement weekly sampling.
E6. AI Overview tracking active.
Method: confirm GSC Performance reporting captures AI Overview impressions; confirm tracking tool (Semrush, Ahrefs, manual) is in use.
Pass: tracking active across at least Google AI Overviews and Bing Copilot.
Partial: GSC only.
Fail: no AI Overview tracking.
Remediation: set up tracking.
E7. Reporting cadence is on schedule.
Method: review reporting log against agreed cadence.
Pass: reports delivered on time for last 3 cycles.
Partial: 1 or 2 missed.
Fail: regular missed reports.
Remediation: re establish cadence.
E8. Citation rate is being measured and reported.
Method: review reports for citation rate metric.
Pass: citation rate appears in reports with month over month delta.
Partial: citation count appears but not rate.
Fail: no citation tracking in reports.
Remediation: add citation rate to reporting template.
15.4 Audit Output
The agent produces two outputs: a markdown report for the client, and a JSON report for programmatic handling.
Markdown report structure:
# SEO and Search Appearance Audit
## [Client Name]
## [Audit Date]
## Executive Summary
- Overall score: X of 50 PASS, X PARTIAL, X FAIL, X N/A
- Critical issues: [count and one line summary of the top 3]
- Quick wins: [count and one line summary of the top 3]
## Pillar Scores
- Eligibility: X of 12
- Coverage: X of 10
- Extractability: X of 12
- Distribution: X of 8
- Measurement: X of 8
## Detailed Findings
### Pillar A: Eligibility
[For each criterion: status, evidence, remediation steps]
### Pillar B: Coverage
[same pattern]
### Pillar C: Extractability
[same pattern]
### Pillar D: Distribution
[same pattern]
### Pillar E: Measurement
[same pattern]
## Prioritized Remediation Plan
[Top 10 actions in priority order, with effort estimate and expected impact]
## Appendix: Audit Methodology
[Brief explanation of the framework version and methods used]
JSON report structure:
{
  "audit_metadata": {
    "framework_version": "2.0",
    "domain": "",
    "audited_at": "",
    "auditor": "",
    "engagement_id": ""
  },
  "summary": {
    "total_criteria": 50,
    "pass": 0,
    "partial": 0,
    "fail": 0,
    "na": 0,
    "score_percent": 0
  },
  "pillar_scores": {
    "eligibility": { "max": 12, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "coverage": { "max": 10, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "extractability": { "max": 12, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "distribution": { "max": 8, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "measurement": { "max": 8, "pass": 0, "partial": 0, "fail": 0, "na": 0 }
  },
  "criteria": [
    {
      "id": "A1",
      "pillar": "eligibility",
      "name": "Robots.txt allows the full AI crawler set",
      "status": "PASS",
      "evidence": "",
      "remediation": [],
      "effort_hours": 0,
      "impact": "high"
    }
  ],
  "prioritized_remediation": []
}
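The summary block of the JSON report can be derived from the criteria array rather than filled in by hand. A minimal sketch, assuming each criterion carries one of the four status strings (PASS, PARTIAL, FAIL, N/A); the framework does not define score_percent here, so the formula below (PASS count as a share of applicable criteria) is an assumption, as is the function name:

```python
from collections import Counter


def summarize(criteria: list[dict]) -> dict:
    """Build the report's summary object from a list of criterion results."""
    counts = Counter(criterion["status"] for criterion in criteria)
    applicable = len(criteria) - counts["N/A"]
    return {
        "total_criteria": len(criteria),
        "pass": counts["PASS"],
        "partial": counts["PARTIAL"],
        "fail": counts["FAIL"],
        "na": counts["N/A"],
        # Assumption: score is the share of applicable criteria that PASS.
        "score_percent": round(100 * counts["PASS"] / applicable) if applicable else 0,
    }
```

Filtering `criteria` by its `pillar` field before calling `summarize` gives the counts needed for each pillar_scores entry.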
15.5 Partial Install Audit
When the operator declares the engagement as "partial install" (the framework was started but not completed by a prior team), the audit follows the same 50 criterion structure, but:
- N/A is heavily used for criteria that depend on prior phases not yet complete.
- The remediation plan is structured as a phased re completion plan rather than 50 independent fixes.
- Output includes a clear "starting point" summary identifying which phase the client is currently in.
15.6 Audit Mode Phase Gate
Audit mode does not have a forward gate (the audit is the deliverable). It does have an exit criterion:
- All 50 criteria evaluated.
- Both markdown and JSON reports produced.
- Prioritized remediation plan delivered.
- Client has reviewed and accepted the audit.
16. Maintenance Schedule
Purpose: the framework is not a one time implementation. Maintenance is what compounds the gains over time.
16.1 Weekly
- Manual citation sampling on top 10 priority queries across Google AI Overviews, Google AI Mode, ChatGPT, Perplexity, Claude, Bing Copilot.
- Spot check schema validity on any pages modified in the last 7 days.
- Review GSC for new query opportunities (queries with 10 plus impressions and no current page assignment).
- Review GSC for sudden CTR drops (greater than 30 percent week over week on any priority page).
- Triage and respond to any Search Console manual actions or warnings.
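The weekly CTR drop review can be scripted against two GSC exports. A minimal sketch, assuming each export has been reduced to a mapping of page URL to (clicks, impressions), and reading "greater than 30 percent week over week" as a relative drop in CTR; the function names and data shape are illustrative assumptions:

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click through rate, or 0.0 when there were no impressions."""
    return clicks / impressions if impressions else 0.0


def flag_ctr_drops(
    previous: dict[str, tuple[int, int]],
    current: dict[str, tuple[int, int]],
    threshold: float = 0.30,
) -> list[str]:
    """Return pages whose CTR fell by more than `threshold` relative to last week."""
    flagged = []
    for page, (prev_clicks, prev_impressions) in previous.items():
        if page not in current:
            continue
        before = ctr(prev_clicks, prev_impressions)
        after = ctr(*current[page])
        # Relative drop: (before - after) / before.
        if before > 0 and (before - after) / before > threshold:
            flagged.append(page)
    return flagged
```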
16.2 Monthly
- Update keyword to page map with any new GSC queries that have hit 10 plus impressions during the month.
- Run cannibalization audit against latest 30 days of GSC data.
- Refresh dateModified on pages with substantive content updates during the month.
- Check for newly deprecated schema properties announced in Search Central blog.
- Server log analysis for AI crawler behavior anomalies.
- Bing Webmaster Tools and Yandex Webmaster review (if applicable).
- Off page activity log update.
- Client report production per cadence.
16.3 Quarterly
- Full sub query coverage audit per pillar.
- Topic cluster health review (consolidate, delete, or reroute thin pages).
- Off page brand mention count refresh.
- Page structure review of top 20 cited or near cited pages.
- Wikidata entity refresh.
- llms.txt content refresh.
- Re run audit mode against all 50 criteria.
- Internal linking density spot check.
- Information gain asset publication (per tier cadence).
16.4 After Every Google Core Update
- Re evaluate top 20 pages for citation rate changes.
- Identify any pages that lost rich results to deprecation.
- Check schema property updates announced in Search Central blog.
- Compare ranking volatility to industry benchmarks.
- Adjust strategy if a pattern emerges (E E A T strengthened, freshness weighted higher, etc.).
16.5 Annually
- Full author bio and credential refresh across all content.
- Domain pillar architecture review.
- Annual flagship information gain asset publication.
- Framework version bump if Google or major LLMs have shipped meaningful changes.
- Client retainer pricing review.
- Hosting and infrastructure capacity review.
17. Appendix A: Deprecated Schema Reference
Schema types deprecated as of March 2026. Listed for reference; do not implement on new pages.
| Schema Type | Deprecated | Reason | Replacement |
| --- | --- | --- | --- |
| HowTo | September 2023 | Rich result removed from desktop and mobile | Article with ordered list, no rich result expected |
| Practice Problem | January 2026 | Limited adoption | Article or Quiz schema |
| Dataset (general search) | January 2026 | Now only serves Dataset Search | Keep if relevant for Dataset Search |
| Sitelinks Search Box | January 2026 | Integrated into core search | None needed |
| SpecialAnnouncement | January 2026 | COVID era specific | Event schema if applicable |
| Q and A | January 2026 | Overlap with FAQPage and Forum | FAQPage if appropriate vertical |
| Book Actions | January 2026 | Low adoption | Product or Article schema |
| Course Info | January 2026 | Low adoption | Article or Event schema |
| Claim Review | January 2026 | Restricted to fact checkers | Article schema |
| Estimated Salary | January 2026 | Low adoption | JobPosting with baseSalary |
| Learning Video | January 2026 | Replaced by VideoObject | VideoObject |
| Vehicle Listing | January 2026 | Low adoption | Product schema |
Note: removing deprecated schema does not improve rankings (no penalty for keeping it). Removal is housekeeping. The action that matters is not implementing these on new pages.
18. Appendix B: Sub Query Generation Templates
Reusable prompts for generating fan out sub queries.
18.1 General Purpose Sub Query Prompt
Generate 15 likely sub queries that an AI search system would run
when given the prompt: "[PRIMARY_KEYWORD]".
Cover these angles when applicable:
- Definition: what is X
- Cost or pricing: how much does X cost
- Comparison: X versus Y, X alternatives
- How to: how to do X, step by step X
- When: when should I do X, when does X happen
- Where: where do I get X, where can X be done
- Who: who needs X, who provides X
- Why: why does X matter, why does X happen
- Recent changes: latest X, X in 2026, X updates
- Pros and cons: benefits of X, downsides of X
- Common mistakes: X mistakes, errors with X
- Local variations: X near me, X in [location]
- Regulatory considerations: X law, X compliance
- Examples: X example, sample X
- Outcome and expectations: results of X, what to expect from X
Output as a JSON array of strings, one sub query per element.
Plain text only, no commentary, no explanation.
18.2 Local Service Business Variant
Generate 15 likely sub queries that an AI search system would run
when a user in [CITY], [STATE] asks about: "[PRIMARY_KEYWORD]".
Cover these angles:
- Local cost variations
- Local provider names and recommendations
- Local regulations and licensing
- Local hours and availability
- Local emergency or same day options
- Service area boundaries
- Insurance and payment specific to the region
- Local reviews and reputation
- Comparison to nearby cities
- Distance and travel considerations
Output as a JSON array of strings.
18.3 E Commerce Variant
Generate 15 likely sub queries that an AI search system would run
when a user is shopping for: "[PRODUCT_KEYWORD]".
Cover these angles:
- Best for use case: best X for [scenario]
- Comparison shopping: X vs Y, X alternatives
- Sizing and fit (when applicable)
- Material and construction
- Warranty and support
- Reviews and durability
- Where to buy: cheapest, fastest shipping, in stock
- Used or refurbished options
- Compatible accessories
- Common defects or known issues
- Return policy considerations
- Brand reputation
- Price drops or sales
Output as a JSON array of strings.
18.4 YMYL Variant (Legal, Medical, Financial)
Generate 15 likely sub queries that an AI search system would run
when a user asks about a YMYL topic: "[YMYL_KEYWORD]".
Cover these angles:
- Definition with disclaimers
- Symptoms, signs, or indicators (medical) or warning signs (legal, financial)
- Causes
- Treatment, remedies, or solutions
- Prevention
- When to consult a professional
- Costs and insurance
- Regulations and legal considerations
- Risks of self diagnosis or self help
- Reputable sources and second opinions
- Recent research or rulings
- Statistics and prevalence
Note: YMYL content requires strong E E A T signals.
Recommend professional consultation in answers.
Cite primary authoritative sources (NIH, AMA, IRS, court rulings, etc.).
Output as a JSON array of strings.
Generate 15 likely sub queries that an AI search system would run
when a user asks about: "[TAX_KEYWORD]".
Cover these angles:
- Definition and basic rules
- Who must comply (income thresholds, filing status)
- When payments or filings are due (specific dates and quarters)
- How payments are calculated
- Payment methods (electronic, check, IRS Direct Pay)
- Penalties for non compliance and how they accrue
- Safe harbor rules and exemptions
- State versus federal differences
- Self employed versus W 2 employee differences
- How to estimate when income is variable
- What forms are required (1040 ES, Schedule SE, etc.)
- Common filing mistakes
- When to consult a tax professional
- Recent IRS guidance and rule changes
- Examples for different income scenarios
Output as a JSON array of strings.
19. Appendix C: Code Snippet Library
Reusable code blocks referenced from sections 7 through 14. Copy and adapt; do not modify in place in this document.
19.1 Organization Schema with @graph Pattern (Universal)
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "Organization",
"@id": "https://example.com/#organization",
"name": "Example Business Inc",
"alternateName": "Example",
"url": "https://example.com/",
"logo": {
"@type": "ImageObject",
"url": "https://example.com/assets/img/logo.png",
"width": 600,
"height": 200
},
"description": "[One sentence description of the business]",
"telephone": "+1-555-555-5555",
"email": "info@example.com",
"address": {
"@type": "PostalAddress",
"streetAddress": "123 Main Street",
"addressLocality": "Cassville",
"addressRegion": "MO",
"postalCode": "65625",
"addressCountry": "US"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": 36.6781,
"longitude": -93.8722
},
"sameAs": [
"https://www.facebook.com/examplebiz",
"https://www.linkedin.com/company/examplebiz",
"https://twitter.com/examplebiz",
"https://www.youtube.com/@examplebiz",
"https://www.wikidata.org/wiki/QXXXXXXX"
],
"founder": { "@type": "Person", "name": "Founder Name" },
"foundingDate": "2020-01-15",
"areaServed": [
{ "@type": "State", "name": "Missouri" },
{ "@type": "State", "name": "Arkansas" }
]
},
{
"@type": "WebSite",
"@id": "https://example.com/#website",
"url": "https://example.com/",
"name": "Example Business",
"publisher": { "@id": "https://example.com/#organization" }
},
{
"@type": "WebPage",
"@id": "https://example.com/page-slug/#webpage",
"url": "https://example.com/page-slug/",
"name": "Page Title",
"isPartOf": { "@id": "https://example.com/#website" },
"datePublished": "2026-01-15T08:00:00-06:00",
"dateModified": "2026-05-03T14:30:00-05:00"
}
]
}
</script>
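Once a schema block like this is deployed, it can help to pull the JSON-LD back out of the served HTML and read it. The sketch below is one way to do that; `extract_jsonld` is a hypothetical name, and it assumes the opening and closing script tags each sit on their own line, as they do in these snippets.

```shell
# Hypothetical helper: print the body of each JSON-LD script block found in
# HTML read from stdin, followed by a "---" separator line.
# Assumes the opening and closing script tags sit on their own lines.
extract_jsonld() {
  awk '
    /<script[^>]*application\/ld\+json[^>]*>/ { inside = 1; next }
    inside && /<\/script>/ { inside = 0; print "---"; next }
    inside { print }'
}
```

Usage: `curl -s "https://example.com/" | extract_jsonld`. Minified pages that put the tag and the JSON on a single line would need a different approach.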
19.2 LocalBusiness Schema (Service Area Business)
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "ProfessionalService",
"@id": "https://example.com/#localbusiness",
"name": "Example Professional Service",
"image": "https://example.com/assets/img/office.jpg",
"telephone": "+1-555-555-5555",
"priceRange": "$$",
"address": {
"@type": "PostalAddress",
"streetAddress": "123 Main Street",
"addressLocality": "Cassville",
"addressRegion": "MO",
"postalCode": "65625",
"addressCountry": "US"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": 36.6781,
"longitude": -93.8722
},
"url": "https://example.com/",
"openingHoursSpecification": [
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "08:00",
"closes": "17:00"
}
],
"areaServed": [
{
"@type": "GeoCircle",
"geoMidpoint": {
"@type": "GeoCoordinates",
"latitude": 36.6781,
"longitude": -93.8722
},
"geoRadius": "80467"
}
]
}
</script>
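Schema.org documents `geoRadius` as a value in meters unless a unit is given. The `80467` in the snippet above is consistent with a 50 mile service radius, although that is an inference and the snippet does not state its unit. A small sketch for converting a radius in whole miles, using a hypothetical function name:

```shell
# Hypothetical helper: convert whole miles to meters for geoRadius.
# 1 mile = 1609.344 meters; integer arithmetic, rounded to the nearest meter.
miles_to_meters() {
  echo $(( ($1 * 1609344 + 500) / 1000 ))
}

miles_to_meters 50   # prints 80467
```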
19.3 Person Schema (Author or Team Member)
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Person",
"@id": "https://example.com/team/joseph-anady/#person",
"name": "Joseph Anady",
"givenName": "Joseph",
"familyName": "Anady",
"url": "https://example.com/team/joseph-anady/",
"image": "https://example.com/assets/img/team/joseph-anady.jpg",
"jobTitle": "Founder and Lead Developer",
"worksFor": { "@id": "https://example.com/#organization" },
"alumniOf": [
{
"@type": "CollegeOrUniversity",
"name": "Colorado State University"
}
],
"hasCredential": [
{
"@type": "EducationalOccupationalCredential",
"credentialCategory": "degree",
"name": "BA Computer Engineering"
},
{
"@type": "EducationalOccupationalCredential",
"credentialCategory": "degree",
"name": "MA Cybersecurity"
},
{
"@type": "EducationalOccupationalCredential",
"credentialCategory": "certification",
"name": "Service Disabled Veteran Owned Small Business (SDVOSB)"
}
],
"sameAs": [
"https://www.linkedin.com/in/josephanady",
"https://www.wikidata.org/wiki/Q138610626"
]
}
</script>
19.4 Article Schema (Content Page)
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"@id": "https://example.com/blog/article-slug/#article",
"headline": "[Article Headline, under 110 characters]",
"description": "[Article description, under 250 characters]",
"image": ["https://example.com/assets/img/article-hero.jpg"],
"datePublished": "2026-05-03T08:00:00-05:00",
"dateModified": "2026-05-03T14:30:00-05:00",
"author": { "@id": "https://example.com/team/joseph-anady/#person" },
"publisher": { "@id": "https://example.com/#organization" },
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://example.com/blog/article-slug/"
},
"articleSection": "[Section name]",
"keywords": ["keyword one", "keyword two", "keyword three"],
"wordCount": 2400,
"inLanguage": "en-US"
}
</script>
19.5 Product Schema (E Commerce)
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"@id": "https://example.com/products/product-slug/#product",
"name": "Product Name",
"image": [
"https://example.com/assets/img/product-1.jpg",
"https://example.com/assets/img/product-2.jpg"
],
"description": "[Product description]",
"sku": "PROD-12345",
"mpn": "MFR-67890",
"brand": { "@type": "Brand", "name": "Brand Name" },
"offers": {
"@type": "Offer",
"url": "https://example.com/products/product-slug/",
"priceCurrency": "USD",
"price": "997",
"priceValidUntil": "2026-12-31",
"availability": "https://schema.org/InStock",
"itemCondition": "https://schema.org/NewCondition",
"seller": { "@id": "https://example.com/#organization" }
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.8",
"reviewCount": "47",
"bestRating": "5",
"worstRating": "1"
}
}
</script>
19.6 Service Schema with Tier Catalog
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Service",
"@id": "https://example.com/services/web-development/#service",
"name": "Custom Website Development",
"description": "[Description of the service]",
"provider": { "@id": "https://example.com/#organization" },
"areaServed": [
{ "@type": "State", "name": "Missouri" },
{ "@type": "State", "name": "Arkansas" }
],
"serviceType": "Web Development",
"offers": {
"@type": "Offer",
"priceCurrency": "USD",
"price": "997"
},
"hasOfferCatalog": {
"@type": "OfferCatalog",
"name": "Web Development Tiers",
"itemListElement": [
{
"@type": "Offer",
"itemOffered": { "@type": "Service", "name": "Custom Website" },
"price": "597",
"priceCurrency": "USD"
},
{
"@type": "Offer",
"itemOffered": { "@type": "Service", "name": "Website plus SEO and AEO" },
"price": "797",
"priceCurrency": "USD"
},
{
"@type": "Offer",
"itemOffered": { "@type": "Service", "name": "Full Digital Presence" },
"price": "997",
"priceCurrency": "USD"
}
]
}
}
</script>
19.7 BreadcrumbList Schema
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{ "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com/" },
{ "@type": "ListItem", "position": 2, "name": "Services", "item": "https://example.com/services/" },
{ "@type": "ListItem", "position": 3, "name": "Web Development" }
]
}
</script>
Note: the last item has no item property because it represents the current page.
19.8 VideoObject Schema
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "VideoObject",
"name": "[Video Title]",
"description": "[Video Description]",
"thumbnailUrl": "https://example.com/assets/img/video-thumb.jpg",
"uploadDate": "2026-05-03T08:00:00-05:00",
"duration": "PT5M30S",
"contentUrl": "https://example.com/assets/video/video.mp4",
"embedUrl": "https://www.youtube.com/embed/VIDEO_ID",
"publisher": { "@id": "https://example.com/#organization" }
}
</script>
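The `duration` value uses ISO 8601 duration notation, so `PT5M30S` means 5 minutes 30 seconds. When the video length is only known in seconds, a conversion along these lines may help; `seconds_to_iso8601` is a hypothetical name and the sketch handles hours, minutes, and seconds only.

```shell
# Hypothetical helper: convert a length in whole seconds to an ISO 8601
# duration string suitable for the VideoObject "duration" property.
seconds_to_iso8601() {
  total=$1
  h=$(( total / 3600 ))
  m=$(( (total % 3600) / 60 ))
  s=$(( total % 60 ))
  out="PT"
  if [ "$h" -gt 0 ]; then out="${out}${h}H"; fi
  if [ "$m" -gt 0 ]; then out="${out}${m}M"; fi
  # Emit seconds when nonzero, or when nothing else was emitted (PT0S)
  if [ "$s" -gt 0 ] || [ "$out" = "PT" ]; then out="${out}${s}S"; fi
  echo "$out"
}

seconds_to_iso8601 330   # prints PT5M30S
```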
19.9 FAQPage Schema (Use Sparingly per 8.1)
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "[Question text]",
"acceptedAnswer": {
"@type": "Answer",
"text": "[Answer text, full sentence answer]"
}
}
]
}
</script>
Reminder: as of March 2026, FAQ rich results are restricted primarily to government and authoritative health sites. Other sites still benefit from FAQPage schema as an AI trust signal, but should not expect a SERP rich result.
19.10 Robots.txt Default
# robots.txt for [domain]
# Last updated: [YYYY-MM-DD]
# A crawler that matches a named group follows only that group and ignores
# the * group, so every allowed agent shares one group with the Disallow rules.
User-agent: Googlebot
User-agent: Google-Extended
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: YandexBot
User-agent: Baiduspider
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: ClaudeBot
User-agent: Anthropic-AI
User-agent: Claude-User
User-agent: Meta-ExternalAgent
User-agent: Amazonbot
User-agent: CCBot
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/
Disallow: /cart/
User-agent: Bytespider
Disallow: /
Sitemap: https://[domain]/sitemap.xml
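Under the Robots Exclusion Protocol (RFC 9309), a crawler obeys the most specific group that names it and falls back to `*` only when no named group matches. It is therefore worth knowing which crawlers a live robots.txt names explicitly. The sketch below checks that for one crawler token; `agent_has_group` is a hypothetical name, and matching is case insensitive.

```shell
# Hypothetical helper: exit 0 when robots.txt (read from stdin) contains a
# User-agent line naming the given crawler token, otherwise exit 1.
agent_has_group() {
  awk -v agent="$1" '
    BEGIN { want = tolower(agent); found = 0 }
    {
      line = $0
      sub(/#.*/, "", line)                 # drop comments
      if (tolower(line) ~ /^[ \t]*user-agent[ \t]*:/) {
        sub(/^[^:]*:[ \t]*/, "", line)     # keep the value after the colon
        sub(/[ \t\r]+$/, "", line)         # trim trailing whitespace
        if (tolower(line) == want) found = 1
      }
    }
    END { exit(found ? 0 : 1) }'
}
```

Usage: `curl -s "https://example.com/robots.txt" | agent_has_group GPTBot && echo "GPTBot is named explicitly"`.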
19.11 Llms.txt Template
# [Site Name]
> [One sentence description of the site and its core authority area]
## [Primary Section, e.g., Service Lines]
- [Title 1](https://[domain]/url-1/): [Brief description]
- [Title 2](https://[domain]/url-2/): [Brief description]
## [Secondary Section, e.g., Resources]
- [Title 3](https://[domain]/url-3/): [Brief description]
## About
- [About Page](https://[domain]/about/): [One sentence summary]
- [Contact](https://[domain]/contact/): [contact info]
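If the page inventory already lives in a spreadsheet, each section of the template can be generated rather than typed. A minimal sketch, assuming a tab separated export with title, URL, and description columns (the function name `llms_section` and that column layout are assumptions, not part of the llms.txt proposal):

```shell
# Hypothetical helper: print one llms.txt section from tab separated rows
# of "title<TAB>url<TAB>description" read from stdin.
llms_section() {
  # $1 = section heading
  tab=$(printf '\t')
  printf '## %s\n' "$1"
  while IFS="$tab" read -r title url desc; do
    if [ -n "$title" ]; then
      printf '%s\n' "- [$title]($url): $desc"
    fi
  done
}
```

Usage: `llms_section "Service Lines" < services.tsv >> llms.txt`, where `services.tsv` is a hypothetical file name. Empty columns are not supported, because consecutive tabs are read as one separator.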
19.12 Sitemap Index XML
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://[domain]/sitemap-pages.xml</loc>
<lastmod>YYYY-MM-DD</lastmod>
</sitemap>
<sitemap>
<loc>https://[domain]/sitemap-images.xml</loc>
<lastmod>YYYY-MM-DD</lastmod>
</sitemap>
<sitemap>
<loc>https://[domain]/sitemap-videos.xml</loc>
<lastmod>YYYY-MM-DD</lastmod>
</sitemap>
</sitemapindex>
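The index above can be produced at build time so that `lastmod` never goes stale. A sketch under the assumption that all child sitemaps share one modification date; `sitemap_index` is a hypothetical name.

```shell
# Hypothetical helper: print a sitemap index for the given child sitemaps.
sitemap_index() {
  # $1 = domain, $2 = lastmod (YYYY-MM-DD), remaining args = sitemap file names
  domain=$1
  lastmod=$2
  shift 2
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
  printf '%s\n' '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  for name in "$@"; do
    printf '<sitemap>\n<loc>https://%s/%s</loc>\n<lastmod>%s</lastmod>\n</sitemap>\n' \
      "$domain" "$name" "$lastmod"
  done
  printf '%s\n' '</sitemapindex>'
}
```

Usage: `sitemap_index example.com "$(date +%Y-%m-%d)" sitemap-pages.xml sitemap-images.xml sitemap-videos.xml > sitemap.xml`.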
19.13 Standard HTML Page Template (Static Stack)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>[Primary Keyword] | [Brand Name]</title>
<meta name="description" content="[150 to 160 character description with primary keyword in first 90 characters]">
<link rel="canonical" href="https://[domain]/[path]/">
<meta property="og:title" content="[Title]">
<meta property="og:description" content="[Description]">
<meta property="og:url" content="https://[domain]/[path]/">
<meta property="og:image" content="https://[domain]/assets/img/og-image.jpg">
<meta property="og:type" content="article">
<meta property="og:site_name" content="[Brand Name]">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="[Title]">
<meta name="twitter:description" content="[Description]">
<meta name="twitter:image" content="https://[domain]/assets/img/twitter-card.jpg">
<!-- Sitewide Organization and WebSite @graph (see 19.1), BreadcrumbList (see 19.7) -->
<!-- Page specific Article schema (see 19.4) -->
<link rel="stylesheet" href="/assets/css/styles.css">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<header>
<!-- Site nav -->
</header>
<nav aria-label="Breadcrumb">
<ol>
<li><a href="/">Home</a></li>
<li><a href="/[pillar]/">[Pillar Name]</a></li>
<li aria-current="page">[Current Page]</li>
</ol>
</nav>
<main>
<article>
<h1>[Primary Keyword Phrased as a Headline]</h1>
<p class="lede"><strong>[40 to 60 word direct answer]</strong></p>
<p>[2 to 3 sentence expansion]</p>
<h2>[Question matching sub query 1]</h2>
<p>[40 to 60 word answer]</p>
<p>[Supporting context]</p>
<h2>[Question matching sub query 2]</h2>
<ol>
<li>[Step 1]</li>
<li>[Step 2]</li>
<li>[Step 3]</li>
</ol>
<h2>[Question matching sub query 3]</h2>
<table>
<thead>
<tr><th>Feature</th><th>Option A</th><th>Option B</th></tr>
</thead>
<tbody>
<tr><td>[Row]</td><td>[Value]</td><td>[Value]</td></tr>
</tbody>
</table>
<h2>Frequently Asked Questions</h2>
<h3>[FAQ 1]</h3>
<p>[Answer]</p>
<h3>[FAQ 2]</h3>
<p>[Answer]</p>
<aside class="author-bio">
<p>Written by <a href="/team/[author-slug]/">[Author Name]</a>, [credentials].
[One sentence biography]. Last updated [Month Day, Year].</p>
</aside>
</article>
</main>
<footer>
<p>Crafted by <a href="https://thatdeveloperguy.com/">ThatDeveloperGuy.com</a>.</p>
</footer>
<script src="/assets/js/main.js" defer></script>
</body>
</html>
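The template sets a character budget for the meta description (150 to 160 characters). A quick length check before publishing might look like the sketch below; `check_length` is a hypothetical name, and the count is exact for ASCII text only, since some shells count bytes rather than characters.

```shell
# Hypothetical helper: report whether a string fits a character budget.
check_length() {
  # $1 = minimum, $2 = maximum, $3 = text to measure
  len=${#3}
  if [ "$len" -ge "$1" ] && [ "$len" -le "$2" ]; then
    echo "ok ($len)"
  else
    echo "out of range ($len, expected $1 to $2)"
  fi
}
```

Usage: `check_length 150 160 "$DESCRIPTION"`. The same function covers the Article headline limit from 19.4 with `check_length 1 109 "$HEADLINE"`.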
19.14 Audit Bash Helpers
#!/bin/bash
# Quick eligibility audit script for any domain
DOMAIN="$1"
if [ -z "$DOMAIN" ]; then
echo "Usage: $0 <domain>"
exit 1
fi
echo "=== Robots.txt check ==="
curl -s "https://$DOMAIN/robots.txt" | head -50
echo ""
echo "=== Llms.txt check ==="
curl -sI "https://$DOMAIN/llms.txt" | head -3
echo ""
echo "=== Sitemap check ==="
curl -sI "https://$DOMAIN/sitemap.xml" | head -3
echo ""
echo "=== Canonical check (homepage) ==="
curl -s "https://$DOMAIN/" | grep -i 'rel="canonical"' | head -3
echo ""
echo "=== Schema presence check (homepage) ==="
SCHEMA_COUNT=$(curl -s "https://$DOMAIN/" | grep -c 'application/ld+json')
echo "Schema script tags found: $SCHEMA_COUNT"
echo ""
echo "=== HTTP version check ==="
curl -sI --http2 "https://$DOMAIN/" | head -1
curl -sI --http3 "https://$DOMAIN/" 2>/dev/null | head -1
echo ""
echo "=== Mobile rendering check ==="
curl -s -A "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36" "https://$DOMAIN/" | \
grep -ic '<h1\|<main\|<article'
echo ""
echo "=== AI crawler reading mode test (GPTBot) ==="
curl -s -A "GPTBot" "https://$DOMAIN/" | head -100 | grep -ic '<h1\|<main\|<article'
echo ""
echo "=== Server log AI crawler activity (last 10000 lines) ==="
if [ -r /var/log/nginx/access.log ]; then
tail -n 10000 /var/log/nginx/access.log 2>/dev/null | \
grep -E "GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|Google-Extended|Bingbot" | \
awk '{print $12, $13, $14, $15}' | sort | uniq -c | sort -rn | head -20
else
echo "Log file not readable from this context"
fi
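The log check above picks the user agent out by whitespace field position, which captures only its first four words. An alternative sketch splits each line on double quotes instead, which yields the whole user agent string as field 6 in nginx's `combined` log format (an assumption about the server's log format; `count_ai_crawlers` is a hypothetical name). Google-Extended is left out of the pattern because it is a robots.txt control token and does not appear as a separate user agent string in requests.

```shell
# Hypothetical helper: count requests per AI or search crawler user agent.
# Reads access log lines in nginx "combined" format from stdin.
count_ai_crawlers() {
  awk -F'"' '$6 ~ /GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|Bingbot/ { print $6 }' |
    sort | uniq -c | sort -rn
}
```

Usage: `tail -n 10000 /var/log/nginx/access.log | count_ai_crawlers | head -20`.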
19.15 Citation Sampling Script
#!/bin/bash
# Manual citation sampling helper
# Run this weekly with a list of priority queries
QUERIES_FILE="$1"
if [ -z "$QUERIES_FILE" ] || [ ! -r "$QUERIES_FILE" ]; then
echo "Usage: $0 <queries.txt>"
exit 1
fi
OUTPUT_FILE="citation-sample-$(date +%Y%m%d).md"
cat > "$OUTPUT_FILE" << EOF
# Citation Sample
## $(date +%Y-%m-%d)
For each query, manually run on each surface and record:
- Cited (URL appears as a source)
- Mentioned (brand name appears in answer text without link)
- Not present
| Query | Google AI Overview | Google AI Mode | ChatGPT | Perplexity | Claude | Bing Copilot |
|-------|--------------------|----------------|---------|------------|--------|--------------|
EOF
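# The extra test keeps a final line that has no trailing newline
while IFS= read -r query || [ -n "$query" ]; do
# Skip blank lines so they do not become empty table rows
[ -z "$query" ] && continue
echo "| $query | | | | | | |" >> "$OUTPUT_FILE"
done < "$QUERIES_FILE"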
echo "Created $OUTPUT_FILE. Fill in manually and commit to engagement notes."
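Once the table has been filled in by hand, the per surface citation count can be tallied instead of counted by eye. The sketch below assumes cells use the labels from the script above (`Cited`, `Mentioned`, `Not present`) and that queries contain no pipe characters; `citation_rate` is a hypothetical name.

```shell
# Hypothetical helper: print "cited/total" for one surface column of the
# filled-in citation sample table read from stdin.
citation_rate() {
  # $1 = surface column number (1 = Google AI Overview ... 6 = Bing Copilot)
  awk -F'|' -v col="$1" '
    # Data rows only: skip prose, the header row, and the dashed separator row
    NF >= 8 && $2 !~ /^ *Query *$/ && $2 !~ /^-+$/ {
      total++
      if ($(col + 2) ~ /Cited/) cited++
    }
    END { printf "%d/%d\n", cited, total }'
}
```

Usage: `citation_rate 3 < citation-sample-20260503.md` reports the ChatGPT column for that day's sample.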
#!/bin/bash
# GSC export helper using Google Search Console API
# Requires gcloud auth and Search Console API enabled
PROPERTY="$1"
START_DATE="$2"
END_DATE="$3"
if [ -z "$PROPERTY" ] || [ -z "$START_DATE" ] || [ -z "$END_DATE" ]; then
echo "Usage: $0 <property-url> <start-date> <end-date>"
echo "Example: $0 'https://example.com/' 2026-04-01 2026-04-30"
exit 1
fi
# Get OAuth token. The token must carry a Search Console scope
# (for example https://www.googleapis.com/auth/webmasters.readonly);
# a default gcloud login may not include it.
TOKEN=$(gcloud auth print-access-token)
# URL encode the property once and reuse it in the endpoint path
ENCODED_PROPERTY=$(printf %s "$PROPERTY" | jq -sRr @uri)
ENDPOINT="https://searchconsole.googleapis.com/webmasters/v3/sites/$ENCODED_PROPERTY/searchAnalytics/query"
# export_dimension <dimension> <file-label>: write one dimension to a JSON file
export_dimension() {
curl -s -X POST "$ENDPOINT" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"startDate\": \"$START_DATE\",
\"endDate\": \"$END_DATE\",
\"dimensions\": [\"$1\"],
\"rowLimit\": 25000
}" > "gsc-$2-$START_DATE-to-$END_DATE.json"
}
# Export queries dimension
export_dimension query queries
# Export pages dimension
export_dimension page pages
# Export search appearance dimension (cannot combine with other dimensions per API constraint)
export_dimension searchAppearance search-appearance
echo "Exports written to current directory."
echo "Note: Search Appearance must be queried separately and cannot be combined with other dimensions."
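The Search Analytics API expects `startDate` and `endDate` in `YYYY-MM-DD` form, and the export script above passes its arguments through unchecked. A small guard such as the following could run before the requests are sent; `is_iso_date` is a hypothetical name, and it checks the shape of the string only, not whether the date exists on the calendar.

```shell
# Hypothetical helper: succeed when the argument looks like YYYY-MM-DD.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_iso_date "$START_DATE" || { echo "Start date must be YYYY-MM-DD"; exit 1; }`.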
20. Appendix D: 2026 Data Citations
The framework's strategic decisions are anchored in measured industry data. Sources are listed here for verification and update. Refresh this section quarterly as new studies are published.
20.1 AI Overview and AI Mode Coverage
- AI Overviews appear on about 48 percent of all Google searches (Q1 2026). Source: Digital Applied, BrightEdge AI Overview Impact Report.
- AI Overviews appear on 70 plus percent of informational and how to queries. Source: BrightEdge.
- Google AI Mode launched in limited availability in May 2025, expanded globally through 2025, and reached 75 million daily active users by January 2026, processing over 1 billion queries per month. Source: Digital Applied, ALM Corp.
- AI Mode runs on Gemini 3 Pro with Personal Intelligence integration since January 22, 2026. Source: ALM Corp.
20.2 Click and Conversion Behavior
- 93 percent of AI Mode queries result in zero clicks. Source: Seer Interactive 25.1 million impression study.
- Organic CTR drops up to 61 percent on queries with AI Overviews. Source: Seer Interactive (1.76 percent baseline to 0.61 percent with AI Overview).
- Sites cited in AI Overviews see 35 percent more clicks than non cited top 10 results. Source: Seer Interactive.
- Cited visitors convert at about 23 times the rate of standard search visitors. Source: Seer Interactive, Alhena, GeoLikeAPro.
- 58.5 percent of all searches now end without a click. Source: SparkToro/Datos Q2 2025 zero click study.
20.3 Citation Decoupling
- 68 percent of pages cited in AI Overviews are NOT in the top 10 organic results. Source: Surfer SEO December 2025 study of 173,902 URLs across 10,000 keywords.
- Only 38 percent of pages cited in AI Overviews also rank in the top 10, down from 76 percent seven months earlier. Source: Ahrefs February 2026 study of 863,000 keywords.
- 25 to 39 percent overlap between traditional Google rankings and AI search citations. Source: Mike King, SparkToro Office Hours, January 2026.
- Only 13.7 percent citation overlap between AI Overviews and AI Mode. Source: Ahrefs December 2025.
- Brands relying solely on traditional SEO miss 87.5 to 89.8 percent of AI citation opportunities. Source: Ekamoira Topical Coverage Gap research, synthesizing Mike King and Surfer SEO data.
20.4 Citation Volatility
- AI Overview content changes 70 percent of the time for the same query. Source: Ahrefs November 2025.
- 45.5 percent of AI Overview citations get replaced when the answer regenerates. Source: Ahrefs November 2025.
- AI Mode self overlap on the same query run three times: 9.2 percent. Source: SE Ranking August 2025.
- There is less than a 1 in 100 chance that ChatGPT or Google AI returns the same brand list twice across 100 runs. Source: SparkToro January 2026.
20.5 Mention Versus Citation
- AI Mode cites sources 76.3 percent of the time, mentions brands 37.6 percent. Source: Growth Memo April 2026.
- AI Overviews cite sources 84.9 percent of the time, mention brands 61 percent. Source: Growth Memo April 2026.
- AI systems use content aggregators (Medium, Wikipedia, Wired) as sources but rarely mention them. Source: Growth Memo April 2026.
20.6 Query Fan Out
- Google AI Mode fires 9 to 11 parallel sub queries per user prompt; some studies measure up to 16. Source: upGrowth, SE Ranking, SEO.com.
- ChatGPT runs 2.3 to 2.8 sub queries per prompt. Source: upGrowth.
- E commerce: 18 to 22 sub queries per prompt with 61 percent citation rate. Source: Go Fish Digital cited via Wellows.
- Healthcare: 22 to 28 sub queries with 48 percent citation rate. Source: Wellows.
- Finance: 16 to 20 sub queries with 52 percent citation rate. Source: Wellows.
20.7 Content Format and Citation Patterns
- 44.2 percent of all LLM citations come from the first 30 percent of a page's text. Source: Position Digital April 2026.
- Pages above 20,000 characters average about 10 AI citations each. Pages under 500 characters average 2.39. Source: Digital Applied.
- ChatGPT prefers focused shorter content; pages covering 26 to 50 percent of fan out sub queries get cited more than pages covering 100 percent. Source: Growth Memo April 2026.
- Pages with semantically relevant title and URL slug are more likely to be cited by ChatGPT. Source: Ahrefs April 2026.
- Pages cited by AI are 25.7 percent fresher than pages surfaced by traditional search. Source: Ahrefs.
20.8 Off Page and Distribution
- Earned media distribution can lift AI citations by up to 325 percent versus owned site only. Source: Stacker December 2025.
- YouTube mentions and branded web mentions are the top correlated factors with AI brand visibility across ChatGPT, AI Mode, and AI Overviews. Source: Ahrefs December 2025.
- 92 percent of the time, ChatGPT agents rely on the Bing Search API. Source: Search Engine Land October 2025.
- 46 percent of ChatGPT bot visits begin in reading mode (plain HTML). Source: Search Engine Land October 2025.
- 63 percent of ChatGPT agents leave immediately after landing. Source: Search Engine Land October 2025.
20.9 Long Tail and Keyword Statistics
- 91.8 percent of all searches are long tail (3 plus words). Source: Whitehat SEO B2B Guide, multiple corroborating studies.
- Long tail keywords convert at 2.5 times the rate of head terms. Source: Yotpo, W3era, multiple e commerce studies.
- AI search intent breakdown: Informational 34.28 percent, Comparative/Selection 23.82 percent, Acquisition 16.44 percent. Source: Ignite Visibility benchmarks.
20.10 Schema and Rich Results
- HowTo rich results deprecated September 2023, removed from desktop and mobile. Source: Google Search Central.
- Schema types deprecated as of January 2026: Practice Problem, Dataset (general search), Sitelinks Search Box, SpecialAnnouncement, Q and A, Book Actions, Course Info, Claim Review, Estimated Salary, Learning Video, Vehicle Listing. Source: Google Search Central blog November 2025 announcement, ALM Corp coverage.
- March 2026 core update: FAQ rich result impressions dropped nearly 50 percent. HowTo rich results disappeared from supplementary content. Review schema demoted on editorial comparison posts. Source: Digital Applied March 2026.
- FAQPage rich results restricted primarily to government and authoritative health sites since 2023. Source: Google Search Central.
- Pages with structured data earn 35 percent higher CTR from rich results when displayed. Source: Digital Applied.
20.11 Search Engine Market Share (March 2026)
- Google: 90.01 percent global. Source: StatCounter March 2026.
- Bing: 4.98 percent global, about 7 percent US. Source: StatCounter, Microsoft Q2 FY26 earnings.
- Yandex: 1.34 percent global, 65 to 72 percent in Russia. Source: StatCounter.
- Yahoo: 1.39 percent global (uses Bing index). Source: StatCounter.
- DuckDuckGo: 0.76 percent global, 1.84 percent US. Source: StatCounter, DuckDuckGo Traffic.
- Baidu: 0.55 percent global, 53 percent in China. Source: StatCounter.
- Brave Search: independent 30 billion page index, 50 million daily queries. Source: Brave.
20.12 AI Engine Usage
- ChatGPT: 800 million weekly active users, 2 to 2.5 billion daily prompts, 65 percent of which qualify as search. Source: OpenAI public reporting, Jasper.
- Perplexity: 33 million plus monthly active users. Source: Perplexity public reporting.
- Total search usage (combined search engines plus LLM search) up 26 percent worldwide and 16 percent in US. Source: Graphite March 2026.
End of Framework Document
This document is version 2.0, last updated 2026-05-03. The next scheduled review is 2026-08-03 (quarterly cadence). Framework version bumps occur when Google or major LLM providers ship meaningful changes to ranking, citation, or schema behavior.
For corrections, additions, or vertical specific extensions, contact admin@thatdeveloperguy.com.
Crafted by ThatDeveloperGuy.com.