
SEO-Search-Appearance

By Codcompass Team · 91 min read

Comprehensive playbook for queries, pages, countries, devices, search appearance, keywords, keyword phrases, SEO, AEO, AIO, GEO, multi engine optimization, citation tracking, and audit. Single file, standalone, agent executable.

Document version: 2.0
Last updated: 2026-05-03
Maintained by: ThatDeveloperGuy
Authoring authority: Joseph Anady, SDVOSB, BA Computer Engineering CSU, MA Cybersecurity
File ID: framework-seo-search-appearance-v2
Supersedes: framework-searchappearance.md v1.0


0. Agent Operating Instructions

This file is designed to be read and executed by an AI agent (Claude Code, MEGAMIND, or any LLM with file system and shell access) AND read as reference by a human practitioner. Both audiences are first class.

0.1 How an Agent Reads This File

When an agent is handed this file with a build instruction, it walks the file in this order, never skipping ahead:

  1. Read sections 0 through 3 fully (operating instructions, purpose, intake, theory).
  2. Complete the intake in section 2 by gathering answers from the human operator or from prior context.
  3. Run the stack selection decision tree in section 4 and record the assigned stack path.
  4. Walk sections 5 through 8 (Phase 1 through Phase 4) in order. Do not advance until the completion gate at the end of each phase passes.
  5. Jump to the assigned stack subsection in section 9 and execute the stack specific build steps.
  6. Apply sections 10 through 14 in order (off page authority, crawler access, tracking, information gain, surface specific tuning).
  7. Run the full audit in section 15. Output both markdown and JSON reports.
  8. Schedule maintenance per section 16.
  9. Reference appendices A through D as needed.

0.2 Phase Gates

Each phase ends with a completion gate. The gate is a list of conditions that must all be true before the next phase begins. The agent must not proceed when any gate condition fails. If a gate fails, the agent reports which condition failed, what the agent attempted, and what the operator must clarify or fix.
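
A minimal sketch of the gate behavior described above, assuming each condition can be expressed as a named check that returns true or false. The GateCondition class, the evaluate_gate function, and the sample conditions are hypothetical names used for illustration, not part of the framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateCondition:
    name: str                    # human readable label for the condition
    check: Callable[[], bool]    # returns True when the condition holds


def evaluate_gate(conditions: list[GateCondition]) -> list[str]:
    """Return the names of failed conditions. An empty list means the gate passes."""
    return [c.name for c in conditions if not c.check()]


# Hypothetical conditions for illustration only.
gate = [
    GateCondition("exactly one stack path assigned", lambda: True),
    GateCondition("decision rationale documented", lambda: False),
]

failed = evaluate_gate(gate)
if failed:
    # The agent stops here and reports every failed condition to the operator.
    print("Gate failed:", failed)
```

Returning the full list of failures, rather than stopping at the first one, lets the agent report every condition the operator must fix in a single pass.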

0.3 Standing Rules That Apply Everywhere

These rules apply to every phase, every stack, every output. Violating any of them fails the audit and must be remediated before sign off.

  1. No dashes of any kind in written content. No em dashes, no en dashes, no hyphens used as punctuation. Use commas, periods, or rewrite the sentence. Compound modifiers in body copy are written without hyphens. Code, URLs, file names, and CSS class names retain hyphens because those are technical identifiers, not written content.
  2. Static inline JSON LD only. No FastAPI sidecar. No external schema service. The MEGAMIND port 9090 sidecar pattern is deprecated for all client work.
  3. No third party CDN or proxy in front of the site. No Cloudflare. No Akamai. No Fastly. All caching, performance, and HTTP/3 work happens at the Bubbles nginx layer.
  4. Footer credit on every client site. The exact string "Crafted by ThatDeveloperGuy.com." appears in the footer of every client site Joseph builds.
  5. Always validate and reload after Bubbles changes. Every nginx configuration change ends with nginx -t && systemctl reload nginx. Bulk HTML inserts use systemctl restart nginx.
  6. All demo sites are static HTML, never JSX, never Python generated. Required demo tech stack is in section 9.1.
  7. Pricing convention. Custom prices end in 7 (597, 797, 997, 1497, 1997, 2497, 2997). Monthly tiers are 250, 397, and 500. Client specified prices override the convention.
  8. Use the right language for the job. HTML, CSS, and JavaScript directly for web. Bash for scripting. Never use Python to generate HTML pages.
  9. Email signature on outbound communication. Joseph Anady, ThatDeveloperGuy.com, admin@thatdeveloperguy.com, 505.512.3662.
  10. Web development, SEO, AEO, AIO, GEO, and digital presence services only. Never mention computer repair in any client facing material.

0.4 Output Conventions

When this framework produces deliverables:

  • Markdown files use UTF 8 encoding, LF line endings, blank line at end of file.
  • JSON outputs are pretty printed with 2 space indentation.
  • JSON LD uses double quoted strings, no trailing commas, wrapped in <script type="application/ld+json"> for HTML embedding.
  • Filenames use lowercase letters, numbers, and hyphens. No spaces.
  • Audit reports use the naming convention audit-[domain]-[YYYYMMDD].md and audit-[domain]-[YYYYMMDD].json.
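
The naming and JSON conventions above can be sketched as follows. This is an illustration under stated assumptions: the domain is inserted into the filename as given (lowercased, dots kept), and the function names audit_filenames and to_pretty_json are hypothetical.

```python
import json
from datetime import date


def audit_filenames(domain: str, run_date: date) -> tuple[str, str]:
    """Build the markdown and JSON report names: audit-[domain]-[YYYYMMDD]."""
    stamp = run_date.strftime("%Y%m%d")
    base = f"audit-{domain.lower()}-{stamp}"
    return base + ".md", base + ".json"


def to_pretty_json(report: dict) -> str:
    """Serialize a report with 2 space indentation, per the output conventions."""
    return json.dumps(report, indent=2)


md_name, json_name = audit_filenames("Example.com", date(2026, 5, 3))
print(md_name)
print(json_name)
print(to_pretty_json({"domain": "example.com", "passed": True}))
```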

0.5 When the Agent Lacks Information

If the agent encounters a required intake field with no answer, the agent stops, lists the missing fields, and asks the operator. The agent does not invent values, does not pick defaults silently, and does not proceed with placeholders that look real.

0.6 Audit Mode Versus Build Mode

This file supports two modes of operation:

  • Build mode: agent is constructing or upgrading a site. Agent walks sections 0 through 16 in order.
  • Audit mode: agent is evaluating an existing site, fully built or partially installed. Agent walks sections 0 through 4 to gather context, then jumps to section 15 (Audit) and runs the full or partial audit. Remediation steps point back to the relevant phase or stack section.

The operator declares the mode at the start of the engagement. If the mode is unclear, the agent asks.


1. Document Purpose

This framework is the operational standard for making a website visible, citable, and conversion ready across the full 2026 search and answer engine landscape. It covers the traditional Google SERP, Google AI Overviews, Google AI Mode, Bing, DuckDuckGo, Yandex, Baidu, Brave Search, ChatGPT search, Perplexity, Claude with web access, Bing Copilot, and the People Also Ask, Knowledge Panel, image carousel, video carousel, and local pack surfaces.

It produces four primary outputs:

  1. A keyword to page map that eliminates cannibalization and assigns one primary intent per URL.
  2. A pillar and cluster content architecture that wins query fan out coverage across AI search engines.
  3. A schema and technical implementation that earns rich results where eligible and provides AI trust signals everywhere else.
  4. A citation, mention, and visibility tracking system that proves the work is moving the right metrics.

The framework is built for use across Joseph's full client portfolio (currently 130 plus production websites on Bubbles, plus headless and CMS clients), and it is built to be applied to any site type from a one page local service business to a multi state e commerce platform.

The framework does not chase rankings as the only metric. In 2026, ranking and citation have decoupled. A December 2025 Surfer SEO study of 173,902 URLs found that 68 percent of pages cited in AI Overviews are NOT in the top 10 organic results. An Ahrefs February 2026 study of 863,000 keywords found that only 38 percent of pages cited in AI Overviews also rank in the top 10, down from 76 percent seven months earlier. The framework therefore optimizes both targets in parallel and tracks them independently.


2. Client Variables Intake

The agent fills in this YAML block before doing anything else. Every field has a definition. If the operator does not know a value, the agent asks. Defaults are listed where reasonable.

# ============================================
# SEO AND SEARCH APPEARANCE INTAKE
# ============================================

# Section A: Identity
business_name: ""
primary_domain: ""
all_owned_domains: []                     # primary plus all redirects and aliases
brand_terms: []                           # exact strings users search for the brand
service_or_product_lines: []              # 3 to 7 main offerings
geographic_service_area:
  primary_city: ""
  primary_state: ""
  service_radius_miles: 0
  additional_metros: []
  national_service: false
target_audience_personas: []              # 1 to 4 primary buyer personas

# Section B: Mode and Engagement
engagement_mode: ""                       # build, rebuild, audit, partial_audit
engagement_scope: ""                      # full_site, single_page, content_cluster, technical_only
client_tier: ""                           # 597, 797, 997, 1497, 1997, 2497, 2997, custom
monthly_tier: ""                          # 250, 397, 500, none
sdvosb_relevant: false                    # SDVOSB cert applicable to client work

# Section C: Tech Stack
tech_stack_current: ""                    # static_html, sveltekit, nextjs, astro, hugo, wordpress, shopify_headless, shopify_standard, custom, none
tech_stack_preferred: ""                  # if rebuild, what we are moving to
hosting_environment: ""                   # bubbles, third_party, shopify_managed, other
bubbles_subdomain_or_path: ""             # if hosted on Bubbles, where it lives
ssl_status: ""                            # active, missing, wildcard
http_version: ""                          # 1.1, 2, 3
cms_required: false
sso_required: false

# Section D: Existing SEO Posture
google_search_console_verified: false
gsc_property_url: ""
bing_webmaster_tools_verified: false
yandex_webmaster_verified: false
baidu_zhanzhang_verified: false
google_analytics_4_active: false
google_business_profile_active: false
gbp_categories: []
gbp_review_count: 0
gbp_average_rating: 0.0

# Section E: AI Citation Baseline
current_ai_overview_citation_rate: 0     # percent of target queries where domain is cited
current_chatgpt_citation_rate: 0
current_perplexity_citation_rate: 0
current_aimode_citation_rate: 0
current_brand_mentions_in_aio: 0          # times brand name appears without citation
current_brand_mentions_in_chatgpt: 0

# Section F: Query Fan Out and Coverage
sub_query_coverage_audit_complete: false
fan_out_gap_count: 0                      # sub queries domain does not yet answer
topical_authority_score: 0                # 0 to 100 internal score
known_pillar_topics: []
known_cluster_topics: []

# Section G: Content Eligibility Baseline
total_indexed_pages: 0
average_content_freshness_days: 0
pages_with_published_dates: 0
pages_with_lastmod_accurate: 0
pages_with_author_schema: 0
pages_with_first_party_data: 0            # original research, surveys, benchmarks
average_word_count_pillar_pages: 0
average_word_count_cluster_pages: 0

# Section H: Distribution Footprint
earned_media_count_12mo: 0
podcast_appearances_12mo: 0
linkedin_articles_12mo: 0
youtube_brand_mentions: 0                 # videos referencing brand (own or third party)
reddit_brand_mentions: 0
wikipedia_entity_status: ""               # none, draft, live, contested, deleted

# Section I: Entity Clarity
wikidata_q_id: ""
sameas_count: 0                           # cross platform identity links count
google_knowledge_panel_status: ""         # present, partial, missing
organization_schema_present: false
person_schema_for_authors_present: false

# Section J: AI Crawler Access Posture
allow_gptbot: true
allow_oai_searchbot: true
allow_chatgpt_user: true
allow_perplexitybot: true
allow_claudebot: true
allow_google_extended: true
allow_bingbot: true
allow_yandexbot: true
allow_baiduspider: true
allow_amazonbot: true
allow_applebot: true
allow_meta_externalagent: true
allow_bytespider: false                   # ByteDance, often blocked
robots_txt_path: "/robots.txt"
llms_txt_present: false                   # emerging standard

# Section K: Compliance and Constraints
ymyl_content_present: false               # Your Money Your Life category
hipaa_relevant: false
pci_relevant: false
sox_relevant: false
gdpr_relevant: false
ccpa_relevant: false
sec_disclosure_relevant: false
ada_compliance_required: true             # default true for all client work
target_languages: ["en"]
hreflang_required: false

# Section L: Reporting Cadence
reporting_to_client_cadence: ""           # weekly, biweekly, monthly, quarterly
client_dashboard_required: false
dashboard_format: ""                      # markdown_email, looker_studio, custom_html


The agent does not move past this section until every field has a value or an explicit "unknown" with a reason.
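
A minimal sketch of that check, assuming the intake block has already been parsed into a nested dictionary. It treats empty strings, empty lists, and null values as unanswered. Numeric zeros and false values are accepted as given, which is an assumption, since the intake uses them as defaults. The function name missing_fields and the sample values are hypothetical.

```python
def missing_fields(intake: dict, prefix: str = "") -> list[str]:
    """List dotted paths of intake fields that still have no answer."""
    missing = []
    for key, value in intake.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested blocks such as geographic_service_area.
            missing.extend(missing_fields(value, prefix=path + "."))
        elif value is None or value == "" or value == []:
            missing.append(path)
    return missing


# Sample intake fragment for illustration only.
intake = {
    "business_name": "",
    "primary_domain": "example.com",
    "brand_terms": [],
    "geographic_service_area": {"primary_city": "", "national_service": False},
    "wikidata_q_id": "unknown: operator has not checked Wikidata yet",
}

gaps = missing_fields(intake)
print(gaps)
```

An explicit "unknown" string with a reason counts as answered here, which matches the rule that the agent never fills a gap silently.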


3. 2026 Search and Answer Engine Theory

This section establishes the mental model. The strategy decisions in later sections only make sense if the agent and operator share this model.

3.1 The Three Surfaces Model

Search visibility in 2026 happens on three structurally different surfaces. Optimization rules differ across them.

Surface 1, Classic SERP. The original ten blue links, plus rich results, featured snippets, People Also Ask, image carousels, video carousels, local pack, knowledge panels, sitelinks, and direct answer boxes. This surface still exists on Google, Bing, DuckDuckGo, Yandex, Baidu, and Brave. It is increasingly compressed below AI Overviews on Google. CTR for the number one organic position has dropped by up to 61 percent on queries that show AI Overviews.

Surface 2, AI Overviews and inline AI summaries. Google's AI Overviews appear at the top of about 48 percent of all search queries in Q1 2026, with coverage above 70 percent for informational and how to queries. Bing has similar inline AI summaries via Copilot. These overviews cite their sources with linked references, but 83 to 93 percent of queries showing AI Overviews end without a click. Sites cited in AI Overviews see roughly 35 percent more clicks than non cited top 10 results, and those visitors convert at about 23 times the rate of standard search traffic.

Surface 3, AI Mode and external answer engines. Standalone conversational interfaces with no blue links. Google AI Mode, ChatGPT, Perplexity, Claude with web access, Bing Copilot, Meta AI. Google AI Mode runs on Gemini 3 Pro since January 2026 and has 75 million daily active users. On these surfaces, you are either cited or invisible. There is no consolation prize for ranking.

Each surface has different volatility, different conversion economics, and different optimization priorities. Tracking and reporting must separate them.

3.2 Query Fan Out

The single most important 2026 concept. AI search engines do not search for the literal user query. They decompose the query into multiple parallel sub queries, run each one against the index simultaneously, then synthesize a single answer from the union of results.

Volumes by platform:

  • Google AI Mode: 9 to 16 sub queries per user prompt.
  • Google AI Overviews: 8 to 12 sub queries.
  • ChatGPT search: 2.3 to 2.8 sub queries.
  • Perplexity: 4 to 8 sub queries.
  • Claude with web access: 2 to 6 sub queries.

Volumes by industry (representative):

  • E commerce: 18 to 22 sub queries with 61 percent citation rate.
  • Healthcare: 22 to 28 sub queries with 48 percent citation rate (YMYL drag).
  • Finance: 16 to 20 sub queries with 52 percent citation rate.
  • Local services: 10 to 16 sub queries.

Strategic implication. A page that ranks moderately for ten related sub queries will outperform a page that ranks number one for the head term but appears nowhere for related queries. Brands optimizing only for traditional rankings miss roughly 88 to 90 percent of AI citation opportunities.

Practical method. For every primary keyword, the agent generates the fan out via three methods, in this order:

  1. Manual inspection of Google AI Mode's exposed sub queries panel for the head term.
  2. LLM prompting: "Generate 12 likely sub queries that an AI search system would run when given the prompt: [query]. Output as a JSON array. Include angle variations for cost, comparison, definition, how to, when, where, who, why, alternatives, recent changes, and pros and cons."
  3. People Also Ask harvest from Google for the head term, recursively two levels deep.

The fan out becomes the input for cluster page topic assignment in Phase 2.
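
The LLM prompting method (method 2) can be sketched as a prompt builder plus a parser for the returned JSON array. The call to the model itself is left out because it depends on the provider. The function names build_fanout_prompt and parse_fanout_response are hypothetical, and the deduplication step is an added assumption rather than something the framework specifies.

```python
import json

ANGLES = (
    "cost, comparison, definition, how to, when, where, who, why, "
    "alternatives, recent changes, and pros and cons"
)


def build_fanout_prompt(query: str, count: int = 12) -> str:
    """Fill the fan out prompt template from section 3.2 with a head term."""
    return (
        f"Generate {count} likely sub queries that an AI search system would run "
        f"when given the prompt: {query}. Output as a JSON array. "
        f"Include angle variations for {ANGLES}."
    )


def parse_fanout_response(raw: str) -> list[str]:
    """Parse the model output, keeping unique non empty strings in order."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of strings")
    seen, result = set(), []
    for item in data:
        if not isinstance(item, str):
            continue  # skip anything that is not a string
        text = " ".join(item.split())  # collapse stray whitespace
        if text and text.lower() not in seen:
            seen.add(text.lower())
            result.append(text)
    return result


prompt = build_fanout_prompt("roof repair")
sub_queries = parse_fanout_response('["Roof cost", "roof  cost", "Roof repair vs replace", 5]')
print(sub_queries)
```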

3.3 Citation Decoupling

In 2026, ranking on page one and being cited in AI answers are two separate goals. They overlap, but only partially.

  • 38 percent of pages cited in AI Overviews also rank in the top 10 (down from 76 percent in mid 2025).
  • 25 to 39 percent overlap between traditional Google rankings and AI search citations across platforms.
  • AI Mode and AI Overviews share only 13.7 percent citation overlap, even though both are Google products.
  • ChatGPT prefers focused shorter content; pages covering 26 to 50 percent of fan out sub queries get cited more than pages covering 100 percent.

The framework therefore tracks ranking and citation as distinct metrics and optimizes each independently.

3.4 Citation Volatility

Citations are probabilistic, not deterministic. The operator and the client must understand this from the start.

  • AI Overview content changes 70 percent of the time for the same query.
  • When AI Overviews regenerate, 45.5 percent of citations get replaced.
  • AI Mode self overlap on the same query run three times: 9.2 percent.
  • There is less than a 1 in 100 chance that ChatGPT or Google AI returns the same brand list twice across 100 runs.

The goal is to maximize citation probability across many queries and many runs, not to lock down a single result. The framework optimizes for citation rate, not citation in any single instance.

3.5 Mention Versus Citation

Two different visibility events. Both matter. Track both.

Citation: the AI system links to your URL as a source. AI Mode cites sources 76.3 percent of the time. AI Overviews cite 84.9 percent of the time.

Mention: the AI system names your brand in the answer text without linking. AI Mode mentions brands 37.6 percent of the time. AI Overviews mention 61 percent.

Mentions correlate strongly with off site brand authority signals: YouTube references, podcast appearances, Reddit threads, earned media. Citations correlate more with on site signals: schema accuracy, content extractability, freshness, and topical depth.
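
Because both events are probabilistic, tracking works on rates over many runs rather than single results. The sketch below labels one captured AI answer as a citation, a mention, or neither, then computes rates across runs. How the answers and their source URLs are captured is outside this sketch, and the names classify_answer and visibility_rates are hypothetical.

```python
from urllib.parse import urlparse


def classify_answer(answer_text: str, cited_urls: list[str],
                    domain: str, brand_terms: list[str]) -> str:
    """Return 'citation', 'mention', or 'none' for one AI answer."""
    domain = domain.lower()
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        # A link to the domain or any of its subdomains counts as a citation.
        if host == domain or host.endswith("." + domain):
            return "citation"
    text = answer_text.lower()
    # Brand named in the answer text without a link counts as a mention.
    if any(term.lower() in text for term in brand_terms):
        return "mention"
    return "none"


def visibility_rates(labels: list[str]) -> dict[str, float]:
    """Percent of runs that produced a citation or a mention."""
    total = len(labels)
    if total == 0:
        return {"citation_rate": 0.0, "mention_rate": 0.0}
    return {
        "citation_rate": 100 * labels.count("citation") / total,
        "mention_rate": 100 * labels.count("mention") / total,
    }


# Four hypothetical runs of the same query.
labels = [
    classify_answer("See Acme Roofing.", ["https://www.acme.example/roofs"], "acme.example", ["Acme Roofing"]),
    classify_answer("Acme Roofing is one option.", ["https://other.example/"], "acme.example", ["Acme Roofing"]),
    classify_answer("Several local firms exist.", [], "acme.example", ["Acme Roofing"]),
    classify_answer("Compare three quotes.", ["https://other.example/"], "acme.example", ["Acme Roofing"]),
]
rates = visibility_rates(labels)
print(labels, rates)
```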

3.6 SEO, AEO, AIO, GEO, LLMO Vocabulary

Multiple acronyms describe overlapping concepts. The framework uses the following definitions consistently:

SEO (Search Engine Optimization): the foundational discipline of making a website crawlable, indexable, relevant, authoritative, and fast. Still required. Without SEO, none of the others work.

AEO (Answer Engine Optimization): optimizing content to win direct answer slots. Featured snippets, People Also Ask, knowledge panels, voice assistant answers. Output is your page being the answer source on a classic SERP.

AIO (AI Overview Optimization): optimizing specifically for Google's AI Overviews. Heavily favors pages already ranking in the top 10 organic, plus pages with extractable structure and clear entity signals.

GEO (Generative Engine Optimization): optimizing to be cited by generative AI systems. Broader than AIO. Covers ChatGPT, Perplexity, Claude, Bing Copilot, Meta AI, Google AI Mode, and AI Overviews. GEO includes off site signals (earned media, brand mentions, YouTube presence) that AIO does not.

LLMO (Large Language Model Optimization): the technical subset of GEO that focuses on how LLMs retrieve, parse, and cite content. Robots.txt access, llms.txt, schema as trust signal, content structure for retrieval augmented generation pipelines.

Search Everywhere Optimization: the umbrella concept covering SEO plus AEO plus AIO plus GEO plus social search (Reddit, TikTok, YouTube, Pinterest) plus voice search. The framework's output is Search Everywhere Optimization in practice, even though the framework itself is structured around the more granular disciplines.

The operator does not need to memorize these distinctions to follow the framework. The phases below cover all of them. The vocabulary exists to align with industry terminology when communicating with clients and reading source material.

3.7 The 2026 Numbers Worth Memorizing

These statistics anchor every strategic decision in the framework. Sources noted in Appendix D.

  • AI Overviews appear on about 48 percent of all Google searches (Q1 2026).
  • AI Overviews appear on 70 plus percent of informational and how to queries.
  • Organic CTR drops by up to 61 percent on AI Overview queries.
  • Cited pages get a 35 percent click lift versus non cited top 10 results.
  • Cited pages convert at about 23 times the rate of standard search visitors.
  • 93 percent of AI Mode queries result in zero clicks.
  • 91.8 percent of all searches are long tail (3 plus words).
  • 58.5 percent of all searches now end without a click to any external site.
  • 44.2 percent of all LLM citations come from the first 30 percent of a page's text.
  • Pages above 20,000 characters average about 10 AI citations each. Pages under 500 characters average 2.39.
  • AI cites pages that are 25.7 percent fresher than traditional search surfaces.
  • Earned media distribution can lift AI citations by up to 325 percent versus owned site only.
  • YouTube mentions and branded web mentions are the top correlated factors with AI brand visibility.
  • Long tail keywords convert at 2.5 times the rate of head terms.

3.8 Phase Gate for Section 3

Before moving to Phase 1 (section 5), confirm:

  1. The operator and the client both understand the three surfaces model.
  2. The operator can explain query fan out in one paragraph.
  3. The operator has set client expectations on citation volatility (citations are probabilistic).
  4. The operator understands the SEO plus AEO plus GEO layered model.
  5. The agent has read and internalized section 3.7's numbers.

If any condition fails, return to that subsection.


4. Stack Selection Decision Tree

The agent walks this tree at the start of every build engagement. The output is exactly one assigned stack path. The agent then jumps to the matching subsection of section 9.

4.1 Inputs

Pulled from section 2 intake:

  • engagement_mode
  • tech_stack_current
  • tech_stack_preferred
  • hosting_environment
  • cms_required
  • sso_required
  • engagement_scope
  • service_or_product_lines (does it include e commerce?)

4.2 The Decision Tree

START
  |
  +-- Q1: Is this an audit only engagement?
  |     YES -> Skip stack selection. Jump to section 15 (Audit Mode).
  |     NO  -> continue
  |
  +-- Q2: Is the client already on a stack we cannot replace?
  |     (existing client investment, contractual hosting, internal team requirement)
  |     YES -> Use the existing stack. Match it to section 9 subsection.
  |             - WordPress  -> 9.5
  |             - Shopify standard -> 9.7
  |             - Headless Shopify -> 9.6
  |             - Other custom -> 9.8
  |     NO  -> continue
  |
  +-- Q3: Is this primarily an e commerce site?
  |     YES -> continue to e commerce branch
  |     NO  -> continue to non e commerce branch
  |
  +-- E COMMERCE BRANCH
  |     Q4: Does the client have existing Shopify investment or specifically request it?
  |       YES, with custom frontend needs -> Headless Shopify (9.6)
  |       YES, simple catalog -> Standard Shopify (9.7)
  |       NO -> Static HTML on Bubbles with Stripe integration (9.1) for catalogs under 50 SKUs
  |             OR Headless Shopify (9.6) for catalogs over 50 SKUs
  |
  +-- NON E COMMERCE BRANCH
  |     Q5: Does the client need a CMS for non technical editors?
  |       YES -> WordPress (9.5)
  |       NO  -> continue
  |
  |     Q6: Does the client need server side rendering for personalization or auth?
  |       YES, modern app feel -> SvelteKit on Bubbles (9.2) or Next.js on Bubbles (9.3)
  |       NO  -> continue
  |
  |     Q7: Is the site primarily content (blog, documentation, knowledge base)?
  |       YES -> Hugo or Astro static (9.3 / 9.4)
  |       NO  -> Static HTML on Bubbles (9.1) [DEFAULT]
  |
END


4.3 Default and Tiebreaker

When the tree produces a tie or the client expresses no preference, the default is Static HTML on Bubbles (section 9.1). Reasons: maximum control, fastest performance, simplest schema injection, no build pipeline complexity, lowest hosting cost, no JavaScript framework version churn, ideal for AI crawler reading mode.
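
The tree in 4.2 can be sketched as a single function. This is an illustration, not the framework's implementation: the parameter names are hypothetical, a catalog of exactly 50 SKUs is routed to Headless Shopify as an assumption (the tree only says under 50 and over 50), and branches where the tree lists two sections return both so the operator can make the final choice.

```python
from typing import Optional

# Section 9 subsections for stacks the client is locked into (tree question Q2).
LOCKED_STACKS = {
    "wordpress": "9.5",
    "shopify_standard": "9.7",
    "shopify_headless": "9.6",
    "custom": "9.8",
}


def select_stack(audit_only: bool, locked_stack: Optional[str], ecommerce: bool,
                 shopify_investment: bool = False, custom_frontend: bool = False,
                 sku_count: int = 0, cms_required: bool = False,
                 ssr_required: bool = False, content_site: bool = False) -> list[str]:
    """Return the candidate section numbers produced by the decision tree."""
    if audit_only:                       # Q1: skip stack selection
        return ["15"]
    if locked_stack is not None:         # Q2: existing stack cannot be replaced
        return [LOCKED_STACKS[locked_stack]]
    if ecommerce:                        # Q3 and Q4: e commerce branch
        if shopify_investment:
            return ["9.6"] if custom_frontend else ["9.7"]
        return ["9.1"] if sku_count < 50 else ["9.6"]
    if cms_required:                     # Q5
        return ["9.5"]
    if ssr_required:                     # Q6: SvelteKit or Next.js
        return ["9.2", "9.3"]
    if content_site:                     # Q7: sections as written in the tree
        return ["9.3", "9.4"]
    return ["9.1"]                       # default: static HTML on Bubbles


print(select_stack(audit_only=False, locked_stack=None, ecommerce=False))
```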

4.4 Document the Decision

The agent records the assigned stack path in the engagement notes:

stack_decision:
  assigned_path: ""                       # one of 9.1 through 9.8
  decision_rationale: ""                  # one paragraph explaining the tree path taken
  fallback_paths: []                      # alternates if the primary cannot be applied
  decided_by: ""                          # agent_auto, joseph_manual, client_required
  decided_at: ""                          # ISO 8601 timestamp


4.5 Phase Gate for Section 4

Before moving to Phase 1, confirm:

  1. Exactly one stack path is assigned.
  2. The decision rationale is documented in plain language.
  3. The hosting environment matches the assigned stack.

5. Phase 1: Intent and Sub Query Research

Purpose: produce a complete, intent classified, sub query mapped keyword universe for the client.

5.1 Step 1, Seed Generation

The agent generates seed keywords from five sources, in this order:

  1. Client interview answers. From section 2: brand_terms, service_or_product_lines, target_audience_personas. Each item becomes one or more seed keywords.
  2. Existing GSC queries. If GSC is verified, export the last 16 months of query data. Every query with at least one impression is a candidate seed. Anonymized queries are excluded.
  3. Competitor reverse engineering. Pick three to five direct competitors. Use Ahrefs, Semrush, or free alternatives (Ubersuggest, SERP Ninja) to extract their top ranking and top trafficked keywords.
  4. Customer language harvest. Pull from sales transcripts, support tickets, contact form messages, Yelp and Google reviews of the client and competitors. Customer phrasing is the most accurate seed source for transactional and informational intent.
  5. Industry vocabulary scan. Read the top three industry publications, the top two trade associations, and the top two regulatory bodies for the client's vertical. Extract terms of art that buyers and decision makers use.

Output: a seed list of 30 to 200 terms, captured in seeds.csv with columns term, source, intent_guess, notes.

5.2 Step 2, Expansion

For each seed, expand using:

  1. Google Autocomplete Alphabet Soup. Type the seed plus each letter a through z. Capture every suggested completion. Repeat with the seed followed by question modifiers (who, what, when, where, why, how, can, does, is, are).
  2. People Also Ask harvest. Search the seed in Google. Click each PAA box twice to expand the tree. Capture all questions revealed.
  3. Related Searches. Capture the suggestions at the bottom of the SERP.
  4. Google AI Mode sub query inspection. Run the seed in AI Mode. Inspect the sub queries panel. Capture all listed sub queries.
  5. LLM expansion prompt. Send the following prompt to a strong LLM (Claude Opus, GPT 5, Gemini 3 Pro):
Generate 15 likely sub queries that an AI search system would run
when given the prompt: "[SEED]". Cover these angles when applicable:
cost, comparison, definition, how to, when, where, who, why,
alternatives, recent changes, pros and cons, common mistakes,
local variations, regulatory considerations, examples.
Output as a JSON array of strings. Plain text only, no commentary.


  6. Reddit and Quora scrape. Search the seed on Reddit and Quora. Capture the literal question phrasings of top posts.

Output: an expanded keyword universe of 500 to 5,000 terms, captured in keywords-raw.csv with columns term, source, parent_seed, length_words, language.
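
The Autocomplete Alphabet Soup step can be sketched as a query generator. It only builds the strings to type or submit. Collecting the suggestions is left to the operator or to whatever tool is in use. The function name alphabet_soup_queries is hypothetical.

```python
import string

# Question modifiers listed in step 1 of section 5.2.
QUESTION_MODIFIERS = ["who", "what", "when", "where", "why", "how", "can", "does", "is", "are"]


def alphabet_soup_queries(seed: str) -> list[str]:
    """Build the seed plus each letter a to z, then the seed plus each question modifier."""
    seed = seed.strip()
    letter_queries = [f"{seed} {letter}" for letter in string.ascii_lowercase]
    question_queries = [f"{seed} {modifier}" for modifier in QUESTION_MODIFIERS]
    return letter_queries + question_queries


queries = alphabet_soup_queries("roof repair")
print(len(queries), queries[:3], queries[-3:])
```

Each seed yields 36 probe queries, so a seed list of 200 terms produces 7,200 probes before any suggestions are captured.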

5.3 Step 3, Intent Classification

Every keyword gets exactly one intent subtype from the expanded taxonomy below: twelve subtypes grouped under the four type model (informational, navigational, commercial, transactional), which is preserved as the parent classification. Local and YMYL are separate tags that combine with the intent subtype.

Informational (parent)

  • info_definition: what is X. who is X. define X.
  • info_howto: how to do X. step by step X.
  • info_explanation: why does X happen. how does X work.
  • info_comparison_neutral: X versus Y as a learning question, not a buying decision.

Navigational (parent)

  • nav_brand: brand name search, login pages, official site lookup.
  • nav_branded_product: brand plus specific product or service name.

Commercial (parent)

  • comm_research: best X for Y. top X. X reviews. X alternatives.
  • comm_comparison_buying: X versus Y when the searcher is choosing between two specific options.
  • comm_pricing: how much does X cost. X price. X cost in 2026.

Transactional (parent)

  • trans_buy: buy X. order X. X near me with intent to purchase.
  • trans_book: book X. schedule X. appointment for X.
  • trans_contact: hire X. contact X. quote for X.

Local (parent, can combine with the above)

  • local_modifier: any keyword with a city, neighborhood, ZIP, or "near me" modifier. Tag with the geographic modifier in a separate column.

YMYL (Your Money Your Life, parent flag)

  • ymyl: any keyword in finance, health, legal, safety, or major life decision categories. Tag separately. YMYL keywords have stricter E E A T expectations.

Output: keywords-classified.csv with columns term, intent, intent_subtype, ymyl_flag, geo_modifier, length_words, language, parent_seed.
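
A first pass classifier can be sketched with keyword patterns taken from the examples above. This is a rough heuristic, not the framework's method: the rule list, its ordering, and the function name classify_intent are assumptions, and anything it cannot match (including X versus Y comparisons, which need human judgment) is left unclassified for manual review.

```python
import re

# Hypothetical patterns drawn from the examples in 5.3. The first match wins.
RULES = [
    ("transactional", "trans_buy", r"\b(buy|order)\b"),
    ("transactional", "trans_book", r"\b(book|schedule|appointment)\b"),
    ("transactional", "trans_contact", r"\b(hire|contact|quote)\b"),
    ("commercial", "comm_pricing", r"\b(how much|cost|price)\b"),
    ("commercial", "comm_research", r"\b(best|top|reviews?|alternatives)\b"),
    ("informational", "info_howto", r"\b(how to|step by step)\b"),
    ("informational", "info_definition", r"^(what is|who is|define)\b"),
    ("informational", "info_explanation", r"^(why|how does)\b"),
]

# Detects only "near me" and five digit ZIP codes. City names need a lookup list.
GEO_PATTERN = re.compile(r"\bnear me\b|\b\d{5}\b")


def classify_intent(term: str) -> dict:
    """Return intent, intent_subtype, and geo_modifier for one keyword."""
    text = " ".join(term.lower().split())
    geo = GEO_PATTERN.search(text)
    result = {
        "term": term,
        "intent": "unclassified",
        "intent_subtype": "",
        "geo_modifier": geo.group(0) if geo else "",
    }
    for parent, subtype, pattern in RULES:
        if re.search(pattern, text):
            result["intent"], result["intent_subtype"] = parent, subtype
            break
    return result


for keyword in ["how much does roof repair cost", "what is a heat pump", "buy heat pump near me"]:
    print(classify_intent(keyword))
```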

5.4 Step 4, Volume and Difficulty Enrichment

For each classified keyword, enrich with:

  1. Search volume estimate. From Ahrefs, Semrush, Ubersuggest, or Google Keyword Planner. Mark zero volume keywords explicitly. Zero volume does not mean zero value (see 5.6).
  2. Keyword difficulty (KD). Provider scoring (Ahrefs KD, Semrush KD, Moz Difficulty). Normalize to a 0 to 100 scale.
  3. CPC. Provider average cost per click in USD. Useful as a commercial intent proxy.
  4. SERP feature presence. Featured snippet present yes/no, AI Overview present yes/no, People Also Ask present yes/no, video carousel present yes/no, image carousel present yes/no, local pack present yes/no, knowledge panel present yes/no.
  5. Top 10 competitor domain list. Capture the ten domains currently ranking, plus their domain authority. This drives the citation worthiness score in 5.5.

Output: keywords-enriched.csv adding columns volume, difficulty, cpc, has_featured_snippet, has_ai_overview, has_paa, has_video_carousel, has_image_carousel, has_local_pack, has_knowledge_panel, top10_domains, top10_avg_da.

5.5 Step 5, Citation Worthiness Scoring

For each keyword, the agent computes a 0 to 100 citation worthiness score. This score predicts how likely the client can win a citation given current authority.

Components and weights:

  • Domain authority match (0 to 30): if client DA is within 10 points of the average top 10 DA, score 30. If within 20, score 20. If within 30, score 10. Beyond 30, score 0.
  • Content depth match (0 to 30): if the client's existing content on this topic is within 25 percent of the average top 10 word count, score 30. Within 50 percent, score 20. Within 75 percent, score 10. Beyond, score 0. New site: score 0.
  • Freshness match (0 to 20): if the client has updated content on this topic within the last 90 days, score 20. Within 180 days, score 15. Within 365 days, score 10. Older, score 0. No content yet, score 5 (greenfield is better than stale).
  • Entity clarity (0 to 20): if the client has a Wikidata Q ID, Organization schema, and at least three sameAs links, score 20. Each missing element subtracts 7 points, with a floor of 0.

Composite score at or above 60 means pursue first. Scores 40 to 59 are second tier, pursue after the first tier is shipped. Scores below 40 are deferred until authority improves.
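
The scoring rules above can be sketched as follows. Several readings are assumptions: "within N points" and "within N percent" are treated as how far the client falls short of the top 10 average (a client at or above the average gets full points), a client with no content passes None for word count and update age, and the entity clarity component is floored at 0. The function names are hypothetical.

```python
from typing import Optional


def band_score(shortfall: float, bands: list[tuple[float, int]]) -> int:
    """Return the points for the first band whose limit covers the shortfall."""
    for limit, points in bands:
        if shortfall <= limit:
            return points
    return 0


def citation_worthiness(client_da: float, avg_top10_da: float,
                        client_words: Optional[int], avg_top10_words: float,
                        days_since_update: Optional[int],
                        has_wikidata: bool, has_org_schema: bool,
                        sameas_count: int) -> int:
    # Domain authority match, 0 to 30.
    da = band_score(avg_top10_da - client_da, [(10, 30), (20, 20), (30, 10)])

    # Content depth match, 0 to 30. No existing content scores 0.
    if client_words is None or avg_top10_words <= 0:
        depth = 0
    else:
        shortfall_pct = 100 * (avg_top10_words - client_words) / avg_top10_words
        depth = band_score(shortfall_pct, [(25, 30), (50, 20), (75, 10)])

    # Freshness match, 0 to 20. No content yet scores 5.
    if days_since_update is None:
        fresh = 5
    else:
        fresh = band_score(days_since_update, [(90, 20), (180, 15), (365, 10)])

    # Entity clarity, 0 to 20. Each missing element subtracts 7, floored at 0.
    missing = [has_wikidata, has_org_schema, sameas_count >= 3].count(False)
    entity = max(0, 20 - 7 * missing)

    return da + depth + fresh + entity


def priority_tier(score: int) -> str:
    """Map a composite score to the pursuit tier."""
    if score >= 60:
        return "first"
    if score >= 40:
        return "second"
    return "deferred"


score = citation_worthiness(40, 45, 1500, 1800, 100, True, True, 3)
print(score, priority_tier(score))
```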

Output: keywords-prioritized.csv adding column citation_worthiness_score, sorted descending.

5.6 Step 6, Zero Volume Keyword Treatment

Zero volume keywords are NOT discarded. In 2026, many high intent and AI critical queries register as zero volume in keyword tools because the tools cannot see the long tail and conversational queries that AI systems handle.

The agent retains zero volume keywords if any of the following are true:

  1. The keyword is a literal sub query from a fan out for a higher volume head term.
  2. The keyword matches the client's customer language harvest from 5.1.
  3. The keyword is a long tail variation of a known commercial intent term.
  4. The keyword has high specificity that signals purchase or contact intent (model numbers, locations, dated events).

Zero volume keywords get a flag zv_high_intent in the prioritized output and are mapped to cluster pages in Phase 2.
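
A minimal sketch of the retention check, assuming the four conditions have already been evaluated upstream and arrive as booleans. The ZeroVolumeSignals record and its field names are hypothetical; only the zv_high_intent flag name comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class ZeroVolumeSignals:
    """Hypothetical record; one boolean per retention condition in 5.6."""
    is_fanout_sub_query: bool       # literal sub query from a head term fan out
    in_customer_language: bool      # matches the customer language harvest
    is_long_tail_commercial: bool   # long tail variation of a commercial term
    has_purchase_specificity: bool  # model numbers, locations, dated events


def zero_volume_flag(volume: int, s: ZeroVolumeSignals) -> str:
    """Return 'zv_high_intent' for a retained zero volume keyword, else ''."""
    if volume > 0:
        return ""  # not a zero volume keyword, so no flag applies
    retained = any((
        s.is_fanout_sub_query,
        s.in_customer_language,
        s.is_long_tail_commercial,
        s.has_purchase_specificity,
    ))
    return "zv_high_intent" if retained else ""
```

Any single true condition is enough to retain the keyword, which matches the "any of the following" wording.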

5.7 Step 7, Sub Query Mapping

For each priority keyword from 5.5, the agent generates the fan out using the method in section 3.2. The fan out is recorded as:

keyword: "[primary keyword]"
intent: "[from 5.3]"
fanout:
  - sub_query: ""
    intent_subtype: ""
    answered_by_existing_url: ""        # path or empty
    target_url: ""                      # to be assigned in Phase 2
    answer_word_budget: 0               # estimated words needed for adequate answer


This file is the input for Phase 2.

5.8 Phase 1 Completion Gate

Before Phase 2, confirm:

  1. keywords-prioritized.csv exists with at least 200 keywords (or all keywords for very small businesses).
  2. Every keyword has an intent classification, a citation worthiness score, and a zero volume flag.
  3. The top 30 priority keywords each have a fan out file.
  4. The operator has reviewed the top 30 priority keywords and signed off.

If the gate fails, identify which step's output is missing and complete that step.


6. Phase 2: Pillar Cluster Mapping with Cannibalization Detection

Purpose: assign every priority keyword and every fan out sub query to exactly one URL on the site, preventing cannibalization and building a topic cluster architecture that wins fan out coverage.

6.1 Step 1, Pillar Topic Identification

From the prioritized keyword list, identify pillar topics. A pillar topic is a head subject area with at least 8 supporting cluster topics. Typical client portfolio pillars:

  • Service line pillars (one per service_or_product_lines item).
  • Geographic pillars (one per major city or region served).
  • Audience pillars (one per primary persona).
  • Compliance or methodology pillars (when relevant: HIPAA, OSHA, SDVOSB contracting, etc.).

Rule: a pillar covers a topic broad enough to support 3,000 to 5,000 words at the head level, with enough sub topics to host 8 to 12 cluster pages.

For Joseph's typical client mix, expect 3 to 8 pillars per site. Heavy programmatic sites (real estate, legal directories) can have many more.

Output: pillars.yaml:

pillars:
  - id: ""
    title: ""
    primary_keyword: ""
    target_url: ""
    estimated_word_count: 0               # 3000 to 5000
    supporting_clusters: []               # IDs of clusters that point to this pillar
    related_pillars: []                   # IDs of sibling pillars


6.2 Step 2, Cluster Page Assignment

Each pillar gets a list of cluster pages. Each cluster page targets one primary sub query and 2 to 4 related secondary sub queries from the fan out.

Rules:

  • One primary keyword per cluster page.
  • One primary sub query per cluster page.
  • The cluster page primary keyword must NOT be the same as its pillar's primary keyword.
  • Each cluster page is 800 to 2,500 words depending on sub query depth.
  • Each cluster page links back to its pillar with descriptive anchor text including the pillar's primary keyword.
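
The mechanical parts of these rules can be checked before a cluster is written. The sketch below is an illustration under assumptions: it works on plain dictionaries that use the clusters.yaml and pillars.yaml field names, the validate_cluster function is hypothetical, and the anchor text rule is left out because it needs the rendered page.

```python
from typing import Dict, List


def validate_cluster(cluster: Dict, pillar: Dict) -> List[str]:
    """Return a list of rule violations; an empty list means the cluster passes."""
    problems = []

    primary = cluster.get("primary_keyword", "").strip().lower()
    if not primary:
        problems.append("missing primary_keyword")
    if not cluster.get("primary_sub_query", "").strip():
        problems.append("missing primary_sub_query")

    # The cluster must not reuse its pillar's primary keyword.
    pillar_primary = pillar.get("primary_keyword", "").strip().lower()
    if primary and primary == pillar_primary:
        problems.append("primary_keyword duplicates the pillar's primary keyword")

    # 2 to 4 related secondary sub queries per cluster page.
    secondary = cluster.get("secondary_sub_queries", [])
    if not 2 <= len(secondary) <= 4:
        problems.append("expected 2 to 4 secondary sub queries")

    # 800 to 2,500 words depending on sub query depth.
    words = cluster.get("estimated_word_count", 0)
    if not 800 <= words <= 2500:
        problems.append("estimated_word_count outside 800 to 2500")

    return problems
```

Returning a list of messages rather than a boolean lets the agent report every failed rule in one pass.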

Output: clusters.yaml:

clusters:
  - id: ""
    pillar_id: ""
    title: ""
    primary_keyword: ""
    primary_sub_query: ""
    secondary_sub_queries: []
    target_url: ""
    estimated_word_count: 0
    intent: ""
    intent_subtype: ""
    citation_worthiness_score: 0


6.3 Step 3, Existing URL Reconciliation

For sites that are not greenfield, the agent maps every existing URL to either a pillar, a cluster, or a "to deprecate" bucket.

For each existing URL:

  1. Crawl the page and extract its actual primary keyword (from H1, title tag, and most prominent body text).
  2. Match it to the pillar or cluster whose primary_keyword aligns most closely.
  3. If no pillar or cluster is a reasonably close match, mark the URL for review: either re scope it to fit a cluster, consolidate it into another page, or deprecate it.

Output: url-reconciliation.csv with columns existing_url, current_primary_keyword, assigned_pillar_or_cluster_id, action (keep, rescope, consolidate, deprecate), redirect_target, notes.

6.4 Step 4, Cannibalization Detection

For every primary keyword in the prioritized list, the agent runs the cannibalization check.

Method A, GSC export based (preferred when GSC is available):

  1. Export the last 90 days of GSC Performance data broken down by both query and page (for example through the Search Analytics API with the query and page dimensions).
  2. For each keyword, list all URLs that have at least one impression for that query.
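  3. For the URLs competing on the same query, compute the Herfindahl Hirschman Index (HHI) of the click distribution: the sum of each URL's squared share of clicks. HHI runs from 1/n (n URLs splitting clicks evenly) to 1.0 (one URL takes every click), so two URLs alone can never score below 0.5.
  4. HHI below 0.5 (clicks spread across three or more URLs with none dominant) flags the keyword as cannibalized.
  5. HHI between 0.5 and 0.7 is borderline, flag for review. An even or near even split between two URLs lands here.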
  6. HHI above 0.7 (one URL dominates) is acceptable.
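
As a sketch of the calculation in Method A, the functions below compute HHI from per URL click counts for one query and apply the thresholds listed in the steps. The function names and the input shape (a dictionary of URL to clicks) are assumptions for illustration; reading the GSC export is left out.

```python
from typing import Dict, Optional


def hhi(clicks_by_url: Dict[str, int]) -> Optional[float]:
    """Herfindahl Hirschman Index: sum of squared click shares across URLs.

    Returns None when the query has no clicks, since shares are undefined.
    """
    total = sum(clicks_by_url.values())
    if total == 0:
        return None
    return sum((clicks / total) ** 2 for clicks in clicks_by_url.values())


def classify(index: float) -> str:
    """Apply the thresholds from the steps above."""
    if index < 0.5:
        return "cannibalized"
    if index <= 0.7:
        return "borderline"
    return "acceptable"


# Three URLs sharing clicks 40/30/30: 0.16 + 0.09 + 0.09 = 0.34
split = hhi({"/a": 40, "/b": 30, "/c": 30})
# One URL taking 90 percent of clicks: 0.81 + 0.01 = 0.82
dominant = hhi({"/a": 90, "/b": 10})
```

With these inputs, the three way split classifies as cannibalized and the 90/10 split as acceptable. An exact 50/50 split between two URLs gives 0.5, which falls in the borderline band.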

Method B, manual SERP check (when GSC is unavailable):

  1. Run a site:[domain] [keyword] search in Google.
  2. If two or more URLs from the client's domain appear in the top 10, flag for review.
  3. Inspect each result. If both URLs target the same primary keyword, cannibalization is confirmed.
  4. If both URLs target the same query but for clearly different intents (one informational, one transactional), this is allowed and tracked as intentional dual targeting.

Method C, semantic similarity scan (preventive):

  1. For every URL pair in the cluster map, compute embedding similarity using OpenAI or sentence transformer embeddings.
  2. Pairs above 0.85 cosine similarity are flagged as semantically overlapping.
  3. Investigate manually. Either differentiate the angle, consolidate, or accept and track.
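
The pairwise scan in Method C can be sketched as below. The embeddings are assumed to be precomputed by whichever model is in use and passed in as plain lists of floats; no embedding API is called here, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_overlaps(
    embeddings: Dict[str, List[float]], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """Return (url_a, url_b, similarity) for every pair above the threshold."""
    flagged = []
    for url_a, url_b in combinations(sorted(embeddings), 2):
        similarity = cosine(embeddings[url_a], embeddings[url_b])
        if similarity > threshold:
            flagged.append((url_a, url_b, round(similarity, 3)))
    return flagged
```

The scan is quadratic in the number of URLs, which is fine for a cluster map of a few hundred pages. Each flagged pair still needs the manual review described in step 3.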

Output: cannibalization-flags.csv with columns keyword, url_a, url_b, hhi, similarity, recommended_action, status.

6.5 Step 5, Cannibalization Remediation

For every flagged cannibalization, the agent applies one of these five fixes:

  1. 301 redirect consolidation. When two URLs target the same primary keyword and intent, merge content into the stronger page and 301 redirect the weaker URL. This is the default when in doubt.
  2. Re optimization. When two URLs cover related but genuinely distinct intents, rewrite each to clearly own its sub intent. Update headings, primary keyword usage, and internal links.
  3. Canonical tag. When both URLs must remain accessible (product variants, technical near duplicates), set rel=canonical on the weaker URL pointing to the stronger. Note: Google treats canonicals as hints, not directives. Use only when 301 is not viable.
  4. Noindex. When a page must remain on the site for users (internal reference, form thank you page) but should not compete in search, add <meta name="robots" content="noindex,follow">. Verify in GSC after deployment.
  5. Delete and redirect. When a page is thin, outdated, and has no remaining utility, delete it and 301 redirect to the most relevant surviving page. If the old URL must drop out of results quickly, request a removal in GSC; that removal is temporary, so keep the redirect in place.

Output: cannibalization-remediation-log.csv with columns flagged_keyword, action_taken, executed_at, by, notes.

6.6 Step 6, Sub Query Coverage Map

For every priority pillar, build a sub query coverage map showing which sub queries are answered by which URLs.

pillar_id: ""
total_sub_queries: 0
sub_queries_answered: 0
coverage_rate: 0.0                        # answered / total
sub_query_assignments:
  - sub_query: ""
    assigned_url: ""
    answer_strength: ""                   # full, partial, missing
    word_count_dedicated: 0
    last_updated: ""


Target: 70 percent or higher coverage rate for the top 10 priority pillars (or for every pillar when the site has fewer than 10) before Phase 3 begins.
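
A minimal sketch of the coverage rate calculation over the sub_query_assignments list. The text does not say whether a "partial" answer counts as answered, so this sketch counts only "full" by default and exposes that choice as a parameter; both the function name and that default are assumptions.

```python
from typing import Dict, List, Sequence


def coverage_rate(
    assignments: List[Dict], counted: Sequence[str] = ("full",)
) -> float:
    """Share of sub queries whose answer_strength is in `counted`."""
    total = len(assignments)
    if total == 0:
        return 0.0
    answered = sum(1 for a in assignments if a.get("answer_strength") in counted)
    return answered / total


# Seven full answers, two partial, one missing.
sample = (
    [{"answer_strength": "full"}] * 7
    + [{"answer_strength": "partial"}] * 2
    + [{"answer_strength": "missing"}]
)
rate = coverage_rate(sample)
meets_target = rate >= 0.70
```

With the strict default the sample sits exactly on the 70 percent target. Passing counted=("full", "partial") raises it to 90 percent, so the choice should be agreed with the operator before the gate is evaluated.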

6.7 Step 7, Internal Linking Plan

The cluster architecture requires intentional internal linking. The agent produces an internal linking plan:

  1. Every cluster page links to its pillar with descriptive anchor text including the pillar's primary keyword.
  2. Every pillar links to every cluster page in its cluster, organized by sub topic.
  3. Cluster pages within the same pillar link to each other where the topics relate.
  4. Cluster pages link to clusters in sibling pillars where relevant.
  5. The homepage links to every pillar.
  6. The footer links to the top three to five pillars (mega footer pattern).
  7. Breadcrumbs are present on every cluster page.

Output: internal-linking-plan.csv with columns from_url, to_url, anchor_text, link_type (pillar, cluster, sibling, footer, breadcrumb), priority.
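
The rules that follow mechanically from pillars.yaml and clusters.yaml (rules 1, 2, and 5) can be generated as rows for internal-linking-plan.csv. The sketch below is illustrative: build_link_plan is a hypothetical helper, the numeric priority values are an assumption because the text does not define a scale, and rules 3, 4, 6, and 7 are left to editorial judgment or templates.

```python
from typing import Dict, List


def build_link_plan(
    pillars: List[Dict], clusters: List[Dict], home_url: str = "/"
) -> List[Dict]:
    """Emit rows with the internal-linking-plan.csv columns for rules 1, 2, and 5."""
    rows = []
    pillar_by_id = {p["id"]: p for p in pillars}

    def row(from_url: str, to_url: str, anchor: str, link_type: str, priority: int) -> Dict:
        return {
            "from_url": from_url,
            "to_url": to_url,
            "anchor_text": anchor,
            "link_type": link_type,
            "priority": priority,  # assumed scale: 1 is most important
        }

    # Rule 5: the homepage links to every pillar.
    for p in pillars:
        rows.append(row(home_url, p["target_url"], p["title"], "pillar", 1))

    for c in clusters:
        p = pillar_by_id[c["pillar_id"]]
        # Rule 1: cluster links to its pillar, anchor includes the pillar's primary keyword.
        rows.append(row(c["target_url"], p["target_url"], p["primary_keyword"], "pillar", 1))
        # Rule 2: pillar links to every cluster page in its cluster.
        rows.append(row(p["target_url"], c["target_url"], c["title"], "cluster", 2))

    return rows
```

The returned rows can be written out with csv.DictWriter using the column order from the output line above.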

6.8 Phase 2 Completion Gate

Before Phase 3, confirm:

  1. pillars.yaml exists with at least one pillar.
  2. clusters.yaml exists with at least 8 clusters per pillar.
  3. url-reconciliation.csv has every existing URL mapped or marked for action.
  4. cannibalization-flags.csv shows zero remaining unresolved flags.
  5. Sub query coverage rate is 70 percent or higher across the top 10 priority pillars (or every pillar when the site has fewer than 10).
  6. internal-linking-plan.csv exists and has been reviewed.

If the gate fails, identify the failed step and remediate.
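
The gate conditions can be evaluated over already loaded data, as in this sketch. It assumes the YAML and CSV files have been parsed into lists of dictionaries elsewhere, that a resolved flag carries status "resolved" (the text does not list status values), and that the 70 percent target applies to each pillar individually. The phase2_gate name and its parameters are hypothetical.

```python
from typing import Dict, List


def phase2_gate(
    pillars: List[Dict],
    clusters: List[Dict],
    reconciliation: List[Dict],
    flags: List[Dict],
    coverage_by_pillar: Dict[str, float],
    linking_plan_reviewed: bool,
) -> List[str]:
    """Return the failed gate conditions; an empty list means the gate passes."""
    failures = []

    if not pillars:
        failures.append("1: pillars.yaml has no pillars")

    # Count clusters per pillar and report pillars with fewer than 8.
    counts = {p["id"]: 0 for p in pillars}
    for c in clusters:
        if c["pillar_id"] in counts:
            counts[c["pillar_id"]] += 1
    thin = sorted(pid for pid, n in counts.items() if n < 8)
    if thin:
        failures.append(f"2: fewer than 8 clusters for {thin}")

    if any(not r.get("action") for r in reconciliation):
        failures.append("3: url-reconciliation.csv has URLs without an action")

    # Assumed convention: only status == "resolved" counts as resolved.
    if any(f.get("status") != "resolved" for f in flags):
        failures.append("4: unresolved cannibalization flags remain")

    low = sorted(pid for pid, rate in coverage_by_pillar.items() if rate < 0.70)
    if low:
        failures.append(f"5: coverage below 70 percent for {low}")

    if not linking_plan_reviewed:
        failures.append("6: internal-linking-plan.csv missing or not reviewed")

    return failures
```

Each failure message starts with the gate condition number, so the agent can report which condition failed as section 0.2 requires.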


7. Phase 3: Page Structure for Multi Surface Extraction

Purpose: every page is structured so a featured snippet, an AI Overview, an AI Mode citation, a People Also Ask answer, a voice assistant answer, or a knowledge panel can extract the right content cleanly.

7.1 The Mandatory Page Skeleton

Every content page (pillar or cluster) follows this skeleton. Deviations require justification.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>[Primary Keyword] | [Brand Name]</title>
  <meta name="description" content="[Direct answer in 150 to 160 characters. Includes primary keyword in first 90 characters.]">
  <link rel="canonical" href="[full URL]">

  <meta property="og:title" content="[Title]">
  <meta property="og:description" content="[Same as meta description or expanded variant]">
  <meta property="og:url" content="[URL]">
  <meta property="og:image" content="[Image URL]">
  <meta property="og:type" content="[website, article, product, etc.]">
  <meta property="og:site_name" content="[Brand Name]">

  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="[Title]">
  <meta name="twitter:description" content="[Description]">
  <meta name="twitter:image" content="[Image URL]">

  <!-- Inline JSON-LD blocks here. See Appendix C. -->
</head>
<body>
  <header>...</header>

  <nav aria-label="Breadcrumb">
    <ol>
      <li><a href="/">Home</a></li>
      <li><a
