A Vercel Catch-All Rewrite Caused 190 Pages to Canonicalize to the Homepage

Current Situation Analysis

Single-page applications (SPAs) dominate modern frontend development, but their routing architecture creates a fundamental mismatch with how search engine crawlers operate. Crawlers expect unique HTML documents per URL. SPAs expect JavaScript execution to render content dynamically. When these two paradigms collide on edge platforms like Vercel, indexing failures follow unless the deployment is explicitly designed around the mismatch.

The industry pain point is straightforward: developers deploy client-side routers with a catch-all rewrite rule to handle deep linking. The platform serves the application shell (index.html) for every unmatched path. This works flawlessly in the browser, but it breaks crawler expectations. When a bot requests /category/product-42, it receives the homepage HTML. If that HTML contains a static canonical tag pointing to the root domain, the crawler interprets every route as a duplicate of the homepage.

This problem is systematically overlooked because modern development tooling masks the server response. React hydration happens in milliseconds. Browser DevTools show the fully rendered DOM. Lighthouse scores remain high. The failure state only exists in the raw HTTP response layer, which is invisible to standard frontend testing workflows.

Data from production deployments consistently shows this pattern. Sites running catch-all rewrites without build-time HTML generation routinely accumulate hundreds of URLs in Google Search Console's "Discovered — currently not indexed" bucket. These pages aren't penalized or blocked by robots.txt. They are simply deprioritized because the crawler receives duplicate canonical signals across dozens of distinct paths. The result is a silent indexing bleed that compounds over time, wasting crawl budget and suppressing organic visibility.

WOW Moment: Key Findings

The architectural shift from dynamic catch-all rewrites to build-time static generation fundamentally changes how crawlers interact with your application. The difference isn't marginal; it's structural.

Deployment Strategy	Initial HTML Payload	Canonical Tag Accuracy	Crawl Budget Efficiency	Indexation Latency
Catch-All Rewrite (CSR)	Homepage shell (~15KB)	Always points to `/`	Wasted on duplicate signals	Indefinite (stuck in discovery)
Build-Time Pre-render	Route-specific HTML (~8-12KB)	Matches actual URL	Optimized per unique page	24-72 hours post-deploy

This finding matters because it decouples indexing success from runtime JavaScript execution. By emitting actual HTML files during the build phase, you align server responses with crawler expectations. The catch-all rewrite becomes a true fallback for missing routes rather than a blanket override. Crawl budget shifts from processing duplicate canonical signals to discovering unique content. Indexation latency drops from indefinite to predictable, enabling organic traffic to scale alongside product growth.

Core Solution

The fix requires replacing runtime route resolution with build-time HTML generation. The architecture follows a four-stage pipeline: route registration, data hydration, static emission, and edge fallback configuration.

Step 1: Route Registry Definition

Instead of scattering route definitions across components, centralize them in a typed manifest. This registry acts as the single source of truth for both the application router and the prerender script.

// src/config/routeRegistry.ts
export interface RouteDefinition {
  path: string;
  type: 'static' | 'dynamic';
  meta: {
    title: string;
    description: string;
    noindex?: boolean;
  };
  dataFetcher?: (params: Record<string, string>) => Promise<Record<string, unknown>>;
}

export const ROUTE_REGISTRY: RouteDefinition[] = [
  {
    path: '/about',
    type: 'static',
    meta: { title: 'About Platform', description: 'Company overview and mission.' }
  },
  {
    path: '/products/:slug',
    type: 'dynamic',
    meta: { title: 'Product Details', description: 'Technical specifications and pricing.' },
    dataFetcher: async (params) => {
      const response = await fetch(`https://api.example.com/products/${params.slug}`);
      return response.json();
    }
  }
];
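
A registry entry such as /products/:slug is a pattern, not a URL, so something has to turn it into concrete paths before any HTML can be written. The sketch below shows one way to do that. The helper name resolveRoutePath is hypothetical and not part of the original registry; it assumes parameters are written as :name segments.

```typescript
// Hypothetical helper (not in the original registry): expand a ':param'
// pattern into a concrete URL path using the supplied values.
function resolveRoutePath(
  pattern: string,
  params: Record<string, string>
): string {
  return pattern.replace(/:([A-Za-z_]\w*)/g, (_match, name: string) => {
    const value = params[name];
    if (value === undefined || value === '') {
      // Fail loudly: an unresolved pattern must never become a file path.
      throw new Error(`Missing value for ":${name}" in pattern "${pattern}"`);
    }
    // Encode so a slug with spaces or slashes stays a single path segment.
    return encodeURIComponent(value);
  });
}
```

Calling resolveRoutePath('/products/:slug', { slug: 'product-42' }) yields /products/product-42, which is the key the prerender step needs for both the output file and the canonical URL.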

Step 2: Build-Time Data Hydration

The prerender script reads the registry, executes data fetchers for dynamic routes, and prepares payload objects. This happens during the CI/CD pipeline, not at request time.

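// scripts/buildHtmlManifest.ts
import { ROUTE_REGISTRY } from '../src/config/routeRegistry';

// slugsByRoute is an assumed input: the concrete slugs to prerender for each
// dynamic pattern, e.g. { '/products/:slug': ['product-42'] }. With no slugs
// supplied, dynamic routes are skipped rather than rendered as a placeholder.
export async function hydrateRoutes(
  slugsByRoute: Record<string, string[]> = {}
): Promise<Map<string, { html: string; meta: any }>> {
  const output = new Map<string, { html: string; meta: any }>();

  for (const route of ROUTE_REGISTRY) {
    if (route.type === 'dynamic' && route.dataFetcher) {
      // The pattern itself carries no slug value, so each concrete slug
      // becomes its own entry keyed by the resolved path.
      for (const slug of slugsByRoute[route.path] ?? []) {
        const payload = await route.dataFetcher({ slug });
        output.set(route.path.replace(':slug', slug), {
          html: '', // populated in next step
          meta: { ...route.meta, ...payload, type: route.type }
        });
      }
      continue;
    }

    output.set(route.path, {
      html: '', // populated in next step
      // type is copied into meta because the renderer branches on meta.type
      meta: { ...route.meta, type: route.type }
    });
  }

  return output;
}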

Step 3: Static HTML Emission

The renderer generates complete HTML documents with accurate meta tags, canonical URLs, and injected body content. This ensures crawlers receive meaningful text without JavaScript execution.

// scripts/renderStaticHtml.ts

// Escape values before interpolating them into markup or attribute values.
function escapeHtml(value: unknown): string {
  return String(value ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

export function generateDocument(path: string, meta: any): string {
  const canonicalUrl = `https://example.com${path}`;
  const robotsDirective = meta.noindex ? 'noindex, nofollow' : 'index, follow';
  const title = escapeHtml(meta.title);
  const description = escapeHtml(meta.description);

  // In the registry, `type` lives on the route rather than inside `meta`;
  // the hydration step has to copy it into `meta` or this branch never runs.
  const bodyContent = meta.type === 'dynamic'
    ? `<article><h1>${title}</h1><p>${description}</p><div id="root"></div></article>`
    : `<div id="root"></div>`;

  return `<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>${title}</title>
  <meta name="description" content="${description}">
  <meta name="robots" content="${robotsDirective}">
  <link rel="canonical" href="${canonicalUrl}" />
  <script type="module" src="/assets/index.js"></script>
</head>
<body>
  ${bodyContent}
</body>
</html>`;
}

Step 4: Edge Fallback Configuration

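Vercel checks the filesystem before it applies rewrites, so pre-rendered files in the output directory are served first and the catch-all only handles paths with no matching file. The rewrite rule itself does not change; what changes is that real HTML files now exist for every registered route.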

{
  "rewrites": [
    { "source": "/(.*)", "destination": "/index.html" }
  ],
  "cleanUrls": true,
  "trailingSlash": false
}
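
The precedence described above can be pictured with a small model. This is a deliberately simplified sketch, not Vercel's routing implementation: the function name resolveRequest and the candidate order are assumptions chosen to illustrate "file first, shell second".

```typescript
// Simplified model of "static file first, catch-all second".
// `files` holds paths relative to the output directory, e.g. 'about/index.html'.
function resolveRequest(files: Set<string>, requestPath: string): string {
  // Strip leading and trailing slashes so '/about/' and '/about' match.
  const clean = requestPath.replace(/^\/+|\/+$/g, '');
  const candidates =
    clean === ''
      ? ['index.html']
      : [clean, `${clean}.html`, `${clean}/index.html`];

  for (const candidate of candidates) {
    if (files.has(candidate)) return candidate; // a real file wins
  }
  return 'index.html'; // catch-all rewrite: serve the SPA shell
}
```

With only index.html deployed, every path resolves to the shell, which is the failure described earlier. Once about/index.html exists, a request for /about resolves to that file and the rewrite never fires.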

Architecture Rationale

Why build-time generation over SSR? Server-side rendering introduces runtime complexity, cold start latency, and increased infrastructure costs. Build-time generation produces static assets that Vercel's edge network caches globally. It eliminates server execution for 95% of routes while guaranteeing crawler-ready HTML.

Why centralized route registry? Scattered route definitions cause drift between the application router and the prerender script. A single registry ensures that adding a route to the app automatically registers it for static generation. It also enables type-safe meta tag management and consistent noindex handling.

Why inject body content at build time? Crawlers prioritize visible text in the initial HTML payload. Empty #root divs force JavaScript execution, which increases crawl cost and delays indexing. Pre-injecting titles, descriptions, and structural markup ensures immediate content recognition.

Pitfall Guide

1. Hardcoded Canonical Tags in Application Shell

Explanation: The application shell's index.html often carries a static canonical pointing to the root domain. The default Vite React template does not ship one, so it is usually added by hand or by an SEO snippet. When the catch-all serves this shell for every route, crawlers receive identical canonical signals across all paths. Fix: Generate canonical tags per route during the build phase. Never rely on runtime JavaScript to inject <link rel="canonical"> for SEO-critical pages.

2. Silent Data Fetch Failures at Build Time

Explanation: If the API endpoint is unreachable during CI/CD, the prerender script may fall back to empty payloads or placeholder text. The build succeeds, but generated HTML contains thin or duplicate content. Fix: Implement strict error handling in the hydration step. Fail the build if dynamic data fetchers return empty responses or non-200 status codes. Log row counts and payload sizes for verification.
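
A minimal sketch of the strict handling described above, assuming the API returns a JSON object per route. The names assertUsablePayload, readStrict, and ResponseLike are hypothetical; ResponseLike only mirrors the subset of the standard fetch Response that the check needs.

```typescript
type Payload = Record<string, unknown>;

// Subset of the fetch Response interface used by the check (assumed shape).
interface ResponseLike {
  status: number;
  json(): Promise<unknown>;
}

// Throw instead of letting thin or placeholder data reach the renderer.
function assertUsablePayload(
  routePath: string,
  status: number,
  payload: unknown
): Payload {
  if (status !== 200) {
    throw new Error(`${routePath}: API responded with HTTP ${status}`);
  }
  if (payload === null || typeof payload !== 'object' || Array.isArray(payload)) {
    throw new Error(`${routePath}: expected a JSON object`);
  }
  if (Object.keys(payload).length === 0) {
    throw new Error(`${routePath}: API returned an empty payload`);
  }
  return payload as Payload;
}

// Read and validate a response; any thrown error should fail the build.
async function readStrict(
  routePath: string,
  response: ResponseLike
): Promise<Payload> {
  const body = response.status === 200 ? await response.json() : null;
  return assertUsablePayload(routePath, response.status, body);
}
```

A data fetcher could then return readStrict(path, await fetch(url)) and log Object.keys(result).length alongside the serialized size, so a sudden drop in either number is visible in the CI output.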

3. Over-Prerendering Low-Value Routes

Explanation: Generating static HTML for every possible URL, including out-of-scope or thin-content pages, wastes build time and dilutes crawl budget. Googlebot will crawl and index pages that provide no user value. Fix: Implement a noindex flag in the route registry. Exclude ghost routes, pagination fragments, and out-of-geography pages from the index. Use meta: { noindex: true } to emit <meta name="robots" content="noindex, nofollow">.

4. Assuming Browser Hydration Equals Crawler Visibility

Explanation: React hydration masks server response issues. A page may render perfectly in Chrome while serving duplicate HTML to crawlers. Standard E2E tests won't catch this. Fix: Add raw HTTP verification to your CI pipeline. Use curl or node-fetch to request deployed URLs and assert canonical tags, meta descriptions, and body content before merging.
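
The raw HTTP check can be sketched as two small functions that work on the response body as a string, with no DOM and no JavaScript execution. The names extractCanonical and checkCanonical are hypothetical, and the regular expressions assume the canonical tag is written with quoted attributes as in the renderer shown earlier.

```typescript
// Pull the canonical href out of raw HTML text.
function extractCanonical(html: string): string | null {
  const tag = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*>/i);
  if (!tag) return null;
  const href = tag[0].match(/\bhref=["']([^"']+)["']/i);
  return href?.[1] ?? null;
}

// Return an error message when the canonical does not match the requested
// route, or null when the page is correct.
function checkCanonical(
  origin: string,
  routePath: string,
  html: string
): string | null {
  const expected = `${origin}${routePath}`;
  const actual = extractCanonical(html);
  if (actual === null) {
    return `${routePath}: no canonical tag in raw HTML`;
  }
  if (actual !== expected) {
    return `${routePath}: canonical is ${actual}, expected ${expected}`;
  }
  return null;
}
```

In CI, the html argument would be the text of a plain HTTP response from a preview deployment. A shell that canonicalizes /category/product-42 to the homepage produces a non-null message, which is the signal to fail the job.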

5. Misconfigured Vercel Fallback Priority

Explanation: Vercel serves static files before applying rewrites. If your prerender script outputs files to the wrong directory or uses incorrect path formatting, no static file matches the request and the catch-all serves the shell instead. Fix: Ensure output paths match Vercel's static file resolution rules. Use dist/[path]/index.html structure. Set cleanUrls: true so .html extensions are dropped from URLs, and trailingSlash: false so slash-terminated URLs redirect to a single form.
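
The dist/[path]/index.html convention can be captured in one function so that every writer in the pipeline agrees on it. This is a sketch; the name outputFileFor is hypothetical, and it builds the path with forward slashes rather than Node's path module to stay self-contained.

```typescript
// Map a concrete route path to its output file under the build directory.
function outputFileFor(outputDir: string, routePath: string): string {
  if (routePath.includes(':')) {
    // An unresolved pattern such as /products/:slug is not a real URL.
    throw new Error(`Unresolved route pattern: ${routePath}`);
  }
  const segments = routePath.split('/').filter((segment) => segment.length > 0);
  if (segments.includes('..')) {
    throw new Error(`Route escapes the output directory: ${routePath}`);
  }
  return [outputDir, ...segments, 'index.html'].join('/');
}
```

Note that a registered / route maps to dist/index.html, the same file as the SPA shell, so it likely needs separate handling to avoid overwriting the fallback.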

6. Ignoring Crawl Budget Allocation

Explanation: Crawlers have finite request budgets per domain. Indexing hundreds of duplicate or thin pages consumes budget that should be allocated to high-value content. Fix: Audit GSC crawl stats regularly. Use noindex strategically on low-traffic routes. Prioritize prerendering for pages with existing impression data or strategic business value.

7. Missing Route Registration in CI/CD

Explanation: Developers add routes to the application router but forget to update the prerender registry. The new page works in the browser but falls back to the homepage shell for crawlers. Fix: Integrate route validation into the build pipeline. Compare the application router configuration against the prerender registry. Fail the build if unregistered routes are detected.
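
One way to implement that comparison is a pure function over two lists of path patterns. How the router's paths are collected depends on the routing library and is left out here; the function name findUnregisteredRoutes is hypothetical.

```typescript
// Return router paths that have no matching entry in the prerender registry.
function findUnregisteredRoutes(
  routerPaths: string[],
  registryPaths: string[]
): string[] {
  // Treat '/pricing/' and '/pricing' as the same route; leave '/' alone.
  const normalize = (p: string): string =>
    p.length > 1 ? p.replace(/\/+$/, '') : p;

  const registered = new Set(registryPaths.map(normalize));
  return routerPaths.map(normalize).filter((p) => !registered.has(p));
}
```

A build script can call this with the router's patterns and ROUTE_REGISTRY.map((r) => r.path), then exit with a non-zero code when the returned array is not empty.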

Production Bundle

Action Checklist

Audit existing routes: Export all client-side routes and cross-reference with GSC indexed pages
Implement centralized route registry: Define paths, meta tags, and data fetchers in a single typed module
Add noindex flag support: Configure the HTML renderer to emit robots directives based on route priority
Validate build-time data fetches: Implement strict error handling and payload size logging in the hydration step
Configure Vercel fallback rules: Ensure static files take precedence over catch-all rewrites
Add raw HTTP verification to CI: Test canonical tags and meta descriptions against deployed URLs before merge
Submit updated sitemap to GSC: Queue high-value routes for recrawling after deployment

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Marketing site with static content	Build-time pre-render	Zero runtime cost, instant edge caching, guaranteed crawler visibility	Minimal (build time only)
Dynamic application with user-specific data	CSR + selective pre-render	Pre-render public routes, keep auth/dashboard routes client-side	Low (hybrid build pipeline)
High-frequency content updates	SSR or ISR	Build-time regeneration is too slow for real-time data changes	Moderate (compute + bandwidth)
Legacy SPA with 1000+ routes	Phased pre-render migration	Migrate high-traffic routes first, apply noindex to low-value paths	Low (incremental CI changes)

Configuration Template

// vercel.json
{
  "buildCommand": "npm run build && node scripts/buildPipeline.mjs",
  "outputDirectory": "dist",
  "cleanUrls": true,
  "trailingSlash": false,
  "rewrites": [
    { "source": "/(.*)", "destination": "/index.html" }
  ],
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=0, must-revalidate" }
      ]
    }
  ]
}

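// scripts/buildPipeline.mjs
// Assumes the TypeScript sources from Steps 2 and 3 are compiled to .mjs files
// next to this script; adjust the import paths to match your compiler output.
import { hydrateRoutes } from './buildHtmlManifest.mjs';
import { generateDocument } from './renderStaticHtml.mjs';
import { writeFileSync, mkdirSync } from 'fs';
import { join, dirname } from 'path';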

async function runPipeline() {
  const routeMap = await hydrateRoutes();
  const outputDir = 'dist';

  for (const [routePath, { meta }] of routeMap) {
    const html = generateDocument(routePath, meta);
    const filePath = join(outputDir, routePath, 'index.html');
    
    mkdirSync(dirname(filePath), { recursive: true });
    writeFileSync(filePath, html, 'utf-8');
    
    console.log(`✓ Generated: ${routePath} | Size: ${(html.length / 1024).toFixed(2)}KB`);
  }
}

runPipeline().catch(err => {
  console.error('Build pipeline failed:', err);
  process.exit(1);
});

Quick Start Guide

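Install dependencies: Add typescript to your project. The scripts shown here rely only on Node's built-in fs and path modules and the global fetch available in Node 18+; fs-extra and undici are optional conveniences. Initialize the route registry module with your existing client-side paths.
Configure build script: Run the prerender pipeline after the Vite build, either by setting the package.json build script to vite build && node scripts/buildPipeline.mjs or by chaining it in vercel.json's buildCommand as in the configuration template. Do one or the other, otherwise the pipeline runs twice. Ensure the output directory matches Vercel's expected structure.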
Test locally: Run npm run build, then start a local server with npx serve dist. Use curl -s http://localhost:3000/your-route | grep canonical to verify accurate meta tags.
Deploy and verify: Push to Vercel. After deployment, request a few routes via raw HTTP. Confirm canonical tags match actual URLs and noindex flags apply to low-value pages.
Submit to GSC: Use the URL Inspection tool to request indexing for high-priority routes. Monitor the "Discovered — currently not indexed" bucket for reduction over the next 7-14 days.
