A Vercel Catch-All Rewrite Caused 190 Pages to Canonicalize to the Homepage
Current Situation Analysis
Single-page applications (SPAs) dominate modern frontend development, but their routing architecture creates a fundamental mismatch with how search engine crawlers operate. Crawlers expect a unique HTML document per URL. SPAs expect JavaScript execution to render content dynamically. When these two models collide on edge platforms like Vercel, indexing failures are likely unless the deployment is explicitly designed around them.
The industry pain point is straightforward: developers deploy client-side routers with a catch-all rewrite rule to handle deep linking. The platform serves the application shell (index.html) for every unmatched path. This works flawlessly in the browser, but it breaks crawler expectations. When a bot requests /category/product-42, it receives the homepage HTML. If that HTML contains a static canonical tag pointing to the root domain, the crawler interprets every route as a duplicate of the homepage.
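To make the failure concrete, here is a simplified model of a catch-all rewrite. It is not Vercel's implementation, and the shell markup and URLs are hypothetical. Extracting the canonical from the raw response, the way a non-rendering fetch sees it, yields the same value for every route.

```typescript
// Hypothetical SPA shell with a hardcoded canonical, served by a catch-all rewrite
const SHELL_HTML = `<!DOCTYPE html><html><head>
<link rel="canonical" href="https://example.com/" />
</head><body><div id="root"></div></body></html>`;

// Simplified catch-all: any path without a matching static file gets the shell
function serve(path: string, staticFiles: Map<string, string>): string {
  return staticFiles.get(path) ?? SHELL_HTML;
}

// Pull the canonical href out of raw HTML (assumes rel appears before href)
export function extractCanonical(html: string): string | null {
  const match = html.match(/<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  return match ? match[1] : null;
}

const staticFiles = new Map<string, string>(); // nothing prerendered
for (const path of ['/', '/category/product-42', '/about']) {
  console.log(path, '->', extractCanonical(serve(path, staticFiles)));
}
// Every path reports https://example.com/ as its canonical
```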
This problem is systematically overlooked because modern development tooling masks the server response. React hydration happens in milliseconds. Browser DevTools show the fully rendered DOM. Lighthouse scores remain high. The failure state only exists in the raw HTTP response layer, which is invisible to standard frontend testing workflows.
The pattern is common in production deployments. Sites running catch-all rewrites without build-time HTML generation routinely accumulate hundreds of URLs in Google Search Console's "Discovered – currently not indexed" bucket. These pages aren't penalized or blocked by robots.txt; they are deprioritized because the crawler receives the same canonical signal across dozens of distinct paths. The result is a silent indexing bleed that compounds over time, wasting crawl budget and suppressing organic visibility.
WOW Moment: Key Findings
The architectural shift from dynamic catch-all rewrites to build-time static generation fundamentally changes how crawlers interact with your application. The difference isn't marginal; it's structural.
| Deployment Strategy | Initial HTML Payload | Canonical Tag Accuracy | Crawl Budget Efficiency | Indexation Latency |
|---|---|---|---|---|
| Catch-All Rewrite (CSR) | Homepage shell (~15KB) | Always points to / | Wasted on duplicate signals | Indefinite (stuck in discovery) |
| Build-Time Pre-render | Route-specific HTML (~8-12KB) | Matches actual URL | Optimized per unique page | 24-72 hours post-deploy |
This finding matters because it decouples indexing success from runtime JavaScript execution. By emitting actual HTML files during the build phase, you align server responses with crawler expectations. The catch-all rewrite becomes a true fallback for missing routes rather than a blanket override. Crawl budget shifts from processing duplicate canonical signals to discovering unique content. Indexation latency drops from indefinite to predictable, enabling organic traffic to scale alongside product growth.
Core Solution
The fix requires replacing runtime route resolution with build-time HTML generation. The architecture follows a four-stage pipeline: route registration, data hydration, static emission, and edge fallback configuration.
Step 1: Route Registry Definition
Instead of scattering route definitions across components, centralize them in a typed manifest. This registry acts as the single source of truth for both the application router and the prerender script.
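```typescript
// src/config/routeRegistry.ts
export interface RouteMeta {
  title: string;
  description: string;
  noindex?: boolean;
}

export interface RouteDefinition {
  path: string; // '/about' or a pattern such as '/products/:slug'
  type: 'static' | 'dynamic';
  meta: RouteMeta;
  // Dynamic routes must list the concrete params to prerender.
  // getParams is a hypothetical field; back it with your own API or CMS.
  getParams?: () => Promise<Record<string, string>[]>;
  dataFetcher?: (params: Record<string, string>) => Promise<Record<string, unknown>>;
}

async function fetchJson<T>(url: string): Promise<T> {
  const response = await fetch(url);
  // Fail the build on non-2xx responses instead of emitting thin HTML
  if (!response.ok) throw new Error(`GET ${url} failed with ${response.status}`);
  return (await response.json()) as T;
}

export const ROUTE_REGISTRY: RouteDefinition[] = [
  {
    path: '/about',
    type: 'static',
    meta: { title: 'About Platform', description: 'Company overview and mission.' }
  },
  {
    path: '/products/:slug',
    type: 'dynamic',
    meta: { title: 'Product Details', description: 'Technical specifications and pricing.' },
    // Assumed list endpoint returning [{ slug: string }, ...]
    getParams: async () =>
      (await fetchJson<{ slug: string }[]>('https://api.example.com/products'))
        .map(({ slug }) => ({ slug })),
    dataFetcher: (params) =>
      fetchJson<Record<string, unknown>>(
        `https://api.example.com/products/${encodeURIComponent(params.slug)}`
      )
  }
];
```
Step 2: Build-Time Data Hydration
The prerender script reads the registry, expands each dynamic pattern into concrete paths, executes the data fetchers, and prepares payload objects. This happens during the CI/CD pipeline, not at request time.
```typescript
// scripts/buildHtmlManifest.ts
import { ROUTE_REGISTRY } from '../src/config/routeRegistry';

export type RouteOutput = { html: string; meta: Record<string, unknown> };

// Replace ':param' segments with concrete values, e.g. '/products/:slug' -> '/products/widget-42'
function toConcretePath(pattern: string, params: Record<string, string>): string {
  return pattern.replace(/:(\w+)/g, (_, name: string) => {
    if (!(name in params)) throw new Error(`Missing param "${name}" for ${pattern}`);
    return encodeURIComponent(params[name]);
  });
}

export async function hydrateRoutes(): Promise<Map<string, RouteOutput>> {
  const output = new Map<string, RouteOutput>();
  for (const route of ROUTE_REGISTRY) {
    if (route.type === 'static') {
      output.set(route.path, { html: '', meta: { ...route.meta, type: route.type } });
      continue;
    }
    const { getParams, dataFetcher } = route;
    if (!getParams || !dataFetcher) {
      throw new Error(`Dynamic route ${route.path} needs getParams and dataFetcher`);
    }
    for (const params of await getParams()) {
      const payload = await dataFetcher(params);
      output.set(toConcretePath(route.path, params), {
        html: '', // populated in the next step
        // Payload fields (e.g. a product-specific title) override the route defaults
        meta: { ...route.meta, ...payload, type: route.type }
      });
    }
  }
  return output;
}
```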
Step 3: Static HTML Emission
The renderer generates complete HTML documents with accurate meta tags, canonical URLs, and injected body content. This ensures crawlers receive meaningful text without JavaScript execution.
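```typescript
// scripts/renderStaticHtml.ts
// Escape text before interpolating it into element content or attribute values
function escapeHtml(value: unknown): string {
  return String(value ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

export function generateDocument(
  routePath: string,
  meta: Record<string, unknown>,
  // Vite hashes entry filenames by default (index-[hash].js); pass the real
  // path from the built dist/index.html, or pin the name in your Vite config
  entrySrc = '/assets/index.js'
): string {
  const canonicalUrl = escapeHtml(`https://example.com${routePath}`);
  const robotsDirective = meta.noindex ? 'noindex, nofollow' : 'index, follow';
  const title = escapeHtml(meta.title);
  const description = escapeHtml(meta.description);
  // Crawler-visible content lives inside #root; the first root.render() call
  // from createRoot clears it, so users never see it duplicated
  const bodyContent = `<div id="root"><article><h1>${title}</h1><p>${description}</p></article></div>`;
  return `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>${title}</title>
<meta name="description" content="${description}">
<meta name="robots" content="${robotsDirective}">
<link rel="canonical" href="${canonicalUrl}" />
<script type="module" src="${entrySrc}"></script>
</head>
<body>
${bodyContent}
</body>
</html>`;
}
```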
Step 4: Edge Fallback Configuration
Vercel checks the filesystem before applying rewrites, so a prerendered file such as about/index.html is served directly and the catch-all only handles paths with no matching static file. The rewrite itself needs no extra priority configuration:
```json
{
  "rewrites": [
    { "source": "/(.*)", "destination": "/index.html" }
  ],
  "cleanUrls": true,
  "trailingSlash": false
}
```
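The lookup order can be modeled in a few lines. This is a simplified sketch of filesystem-first resolution, not Vercel's routing code, and the file set is hypothetical. It shows why a prerendered about/index.html wins over the rewrite while an unregistered path still falls back to the shell.

```typescript
// Simplified model of filesystem-first routing: static files win, rewrites apply only on a miss.
// Not Vercel's implementation; trailing slashes are simply normalized here, whereas
// trailingSlash: false on Vercel answers them with a 308 redirect.
function resolve(requestPath: string, files: Set<string>): string {
  const clean = requestPath.replace(/\/+$/, '') || '/';
  const rel = clean.slice(1); // '/about' -> 'about'
  const candidates = clean === '/' ? ['index.html'] : [rel, `${rel}.html`, `${rel}/index.html`];
  for (const candidate of candidates) {
    if (files.has(candidate)) return candidate;
  }
  // Catch-all rewrite: { "source": "/(.*)", "destination": "/index.html" }
  return 'index.html';
}

const files = new Set(['index.html', 'about/index.html', 'assets/index.js']);
console.log(resolve('/about', files)); // about/index.html (prerendered file wins)
console.log(resolve('/assets/index.js', files)); // assets/index.js
console.log(resolve('/pricing', files)); // index.html (fallback shell)
```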
Architecture Rationale
Why build-time generation over SSR? Server-side rendering introduces runtime complexity, cold-start latency, and higher infrastructure costs. Build-time generation produces static assets that Vercel's edge network caches globally, eliminating server execution for every prerendered route while guaranteeing crawler-ready HTML.
Why centralized route registry? Scattered route definitions cause drift between the application router and the prerender script. A single registry ensures that adding a route to the app automatically registers it for static generation. It also enables type-safe meta tag management and consistent noindex handling.
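One way to keep the router and the prerender script in sync is to have the client router consume the same registry. The sketch below compiles registry patterns into matchers; the inline ROUTES array is a stand-in for ROUTE_REGISTRY, and matchRoute is a hypothetical helper, not part of any router library.

```typescript
interface RoutePattern { path: string }

// Stand-in for ROUTE_REGISTRY (only the path field is needed here)
const ROUTES: RoutePattern[] = [{ path: '/about' }, { path: '/products/:slug' }];

// Compile '/products/:slug' into ^/products/([^/]+)$ and remember the param names
function compile(pattern: string): { regex: RegExp; names: string[] } {
  const names: string[] = [];
  const source = pattern.replace(/:(\w+)/g, (_, name: string) => {
    names.push(name);
    return '([^/]+)';
  });
  return { regex: new RegExp(`^${source}$`), names };
}

// Return the matching registry entry and decoded params, or null for unregistered paths
export function matchRoute(pathname: string) {
  for (const route of ROUTES) {
    const { regex, names } = compile(route.path);
    const m = regex.exec(pathname);
    if (m) {
      const params = Object.fromEntries(names.map((n, i) => [n, decodeURIComponent(m[i + 1])]));
      return { route, params };
    }
  }
  return null;
}

console.log(matchRoute('/products/widget-42')?.params.slug); // widget-42
console.log(matchRoute('/missing')); // null
```

Because both the router and the prerender script read the same patterns, a route added in one place cannot silently go missing from the other.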
Why inject body content at build time? Crawlers prioritize visible text in the initial HTML payload. Empty #root divs force JavaScript execution, which increases crawl cost and delays indexing. Pre-injecting titles, descriptions, and structural markup ensures immediate content recognition.
Pitfall Guide
1. Hardcoded Canonical Tags in Application Shell
Explanation: Many SPA shells carry a static canonical tag in index.html pointing to the root domain (the stock Vite React template does not include one, but it is a common addition). When the catch-all serves this shell for every route, crawlers receive identical canonical signals across all paths.
Fix: Generate canonical tags dynamically during the build phase. Never rely on runtime JavaScript to inject <link rel="canonical"> for SEO-critical pages.
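For paths that still fall through to the SPA shell, one option is to remove the hardcoded canonical from the built shell so unregistered routes at least stop declaring themselves duplicates of the homepage. A minimal sketch; stripCanonical is a hypothetical helper, and the regex assumes a conventional single-tag <link rel="canonical" ...> form.

```typescript
// Remove any <link rel="canonical" ...> tag (and trailing whitespace) from an HTML string
export function stripCanonical(html: string): string {
  return html.replace(/<link\b[^>]*\brel=["']canonical["'][^>]*>\s*/gi, '');
}

const shell = '<head><link rel="canonical" href="https://example.com/" />\n<title>App</title></head>';
console.log(stripCanonical(shell)); // <head><title>App</title></head>
```

In a post-build step you would read dist/index.html with readFileSync from node:fs, pass it through stripCanonical, and write it back with writeFileSync.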
2. Silent Data Fetch Failures at Build Time
Explanation: If the API endpoint is unreachable during CI/CD, the prerender script may fall back to empty payloads or placeholder text. The build succeeds, but the generated HTML contains thin or duplicate content.
Fix: Implement strict error handling in the hydration step. Fail the build if dynamic data fetchers return empty responses or non-200 status codes. Log row counts and payload sizes for verification.
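A minimal guard for the hydration step, assuming payloads arrive as plain objects. Status-code checks belong in the fetcher itself (checking response.ok before parsing). assertPayload and its 200-byte threshold are hypothetical; tune both to your data.

```typescript
// Throw (and so fail the build) when a dynamic payload is missing or suspiciously small
export function assertPayload(
  routePath: string,
  payload: Record<string, unknown> | null | undefined,
  minBytes = 200 // hypothetical threshold for "thin" content
): number {
  if (!payload || Object.keys(payload).length === 0) {
    throw new Error(`Empty payload for ${routePath}`);
  }
  const bytes = new TextEncoder().encode(JSON.stringify(payload)).length;
  if (bytes < minBytes) {
    throw new Error(`Payload for ${routePath} is only ${bytes} bytes (min ${minBytes})`);
  }
  // Log field count and size so CI output can be compared between builds
  console.log(`${routePath}: ${Object.keys(payload).length} fields, ${bytes} bytes`);
  return bytes;
}
```

Calling it on every dataFetcher result inside hydrateRoutes turns a silent thin-content build into a failed one.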
3. Over-Prerendering Low-Value Routes
Explanation: Generating static HTML for every possible URL, including out-of-scope or thin-content pages, wastes build time and dilutes crawl budget. Googlebot may then crawl and index pages that provide no user value.
Fix: Implement a noindex flag in the route registry. Exclude ghost routes, pagination fragments, and out-of-geography pages from the index. Use meta: { noindex: true } to emit <meta name="robots" content="noindex, nofollow">.
4. Assuming Browser Hydration Equals Crawler Visibility
Explanation: React hydration masks server response issues. A page may render perfectly in Chrome while serving duplicate HTML to crawlers. Standard E2E tests won't catch this.
Fix: Add raw HTTP verification to your CI pipeline. Use curl or node-fetch to request deployed URLs and assert canonical tags, meta descriptions, and body content before merging.
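A sketch of that CI check using the fetch built into Node 18+. The URL list you pass in and the specific checks are assumptions; checkSeoHtml and verifyDeployment are hypothetical helpers, and the regexes assume conventional attribute order (rel/name before href/content).

```typescript
// Pure check so it can be unit-tested without network access
export function checkSeoHtml(url: string, html: string): string[] {
  const errors: string[] = [];
  const canonical = html.match(/<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i)?.[1];
  if (canonical !== url) errors.push(`canonical is ${canonical ?? 'missing'}, expected ${url}`);
  if (!/<meta[^>]*name=["']description["'][^>]*content=["'][^"']+["']/i.test(html)) {
    errors.push('meta description missing or empty');
  }
  // Require some text in the body, not just an empty #root
  const body = html.match(/<body[^>]*>([\s\S]*)<\/body>/i)?.[1] ?? '';
  if (body.replace(/<[^>]+>/g, '').trim().length === 0) errors.push('body has no text content');
  return errors;
}

// Fetch each deployed URL as raw HTML (no JavaScript execution) and fail on any error
export async function verifyDeployment(urls: string[]): Promise<void> {
  let failed = false;
  for (const url of urls) {
    const res = await fetch(url, { redirect: 'manual' });
    const errors = res.status === 200 ? checkSeoHtml(url, await res.text()) : [`HTTP ${res.status}`];
    for (const e of errors) console.error(`${url}: ${e}`);
    failed ||= errors.length > 0;
  }
  if (failed) throw new Error('SEO verification failed');
}
```

Passing URLs in exactly the form the canonical should take (no trailing slash, matching trailingSlash: false) keeps the equality check strict.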
5. Misconfigured Vercel Fallback Priority
Explanation: Vercel serves static files before applying rewrites. If your prerender script writes files to the wrong directory or uses incorrect path formatting, no static file matches the request and the catch-all rewrite serves the homepage shell instead.
Fix: Ensure output paths match Vercel's static file resolution rules. Use the dist/[path]/index.html structure. Verify that trailingSlash: false is set so /about/ redirects to /about rather than existing as a separate URL; cleanUrls: true additionally redirects .html paths to their extensionless form.
6. Ignoring Crawl Budget Allocation
Explanation: Crawlers have finite request budgets per domain. Indexing hundreds of duplicate or thin pages consumes budget that should be allocated to high-value content.
Fix: Audit GSC crawl stats regularly. Use noindex strategically on low-traffic routes. Prioritize prerendering for pages with existing impression data or strategic business value.
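Prioritization can be driven by a Search Console export. The sketch assumes a hypothetical map of path to impression count (for example, built from a Performance report export) and an arbitrary threshold. New pages have no impressions yet, so list them in the strategic set to keep them indexable.

```typescript
interface Partition { prerender: string[]; noindex: string[] }

// impressions: path -> impression count from a GSC export (hypothetical input)
export function partitionByImpressions(
  paths: string[],
  impressions: Map<string, number>,
  strategic: Set<string> = new Set(), // business-critical paths kept regardless of data
  threshold = 10 // hypothetical cut-off
): Partition {
  const result: Partition = { prerender: [], noindex: [] };
  for (const path of paths) {
    const count = impressions.get(path) ?? 0;
    (count >= threshold || strategic.has(path) ? result.prerender : result.noindex).push(path);
  }
  // Highest-impression paths first so they are generated and submitted first
  result.prerender.sort((a, b) => (impressions.get(b) ?? 0) - (impressions.get(a) ?? 0));
  return result;
}
```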
7. Missing Route Registration in CI/CD
Explanation: Developers add routes to the application router but forget to update the prerender registry. The new page works in the browser but falls back to the homepage shell for crawlers.
Fix: Integrate route validation into the build pipeline. Compare the application router configuration against the prerender registry. Fail the build if unregistered routes are detected.
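A minimal version of that check, assuming you can list the router's paths as strings (for example, from a route config array). The arrays in the example are hypothetical, and "/" is ignored by default on the assumption that the homepage is served by the shell itself.

```typescript
// Router paths with no registry entry, and registry entries the router no longer has
export function diffRoutes(routerPaths: string[], registryPaths: string[]) {
  const registry = new Set(registryPaths);
  const router = new Set(routerPaths);
  return {
    unregistered: routerPaths.filter((p) => !registry.has(p)),
    orphaned: registryPaths.filter((p) => !router.has(p)),
  };
}

// Fail the build when the router knows a path the prerender registry does not
export function assertAllRegistered(routerPaths: string[], registryPaths: string[], ignore: string[] = ['/']): void {
  const { unregistered } = diffRoutes(routerPaths.filter((p) => !ignore.includes(p)), registryPaths);
  if (unregistered.length > 0) {
    throw new Error(`Routes missing from the prerender registry: ${unregistered.join(', ')}`);
  }
}

// Hypothetical example: '/pricing' was added to the router but not to the registry
console.log(diffRoutes(['/about', '/products/:slug', '/pricing'], ['/about', '/products/:slug']).unregistered); // [ '/pricing' ]
```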
Production Bundle
Action Checklist
- Audit existing routes: Export all client-side routes and cross-reference with GSC indexed pages
- Implement centralized route registry: Define paths, meta tags, and data fetchers in a single typed module
- Add noindex flag support: Configure the HTML renderer to emit robots directives based on route priority
- Validate build-time data fetches: Implement strict error handling and payload size logging in the hydration step
- Configure Vercel fallback rules: Ensure static files take precedence over catch-all rewrites
- Add raw HTTP verification to CI: Test canonical tags and meta descriptions against deployed URLs before merge
- Submit updated sitemap to GSC: Queue high-value routes for recrawling after deployment
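For the sitemap step, a sketch that emits only indexable paths. It assumes the hydrated route map from Step 2 (path to meta with an optional noindex flag) and the example.com origin used elsewhere; buildSitemap is hypothetical. The namespace is the standard sitemaps.org 0.9 schema.

```typescript
// Build a sitemap.xml string from prerendered routes, skipping noindex pages
export function buildSitemap(
  routes: Map<string, { meta: { noindex?: unknown } }>,
  origin = 'https://example.com'
): string {
  const escapeXml = (s: string) =>
    s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;').replace(/'/g, '&apos;');
  const urls = Array.from(routes)
    .filter(([, { meta }]) => !meta.noindex)
    .map(([path]) => `  <url><loc>${escapeXml(origin + path)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

Keeping noindex pages out of the sitemap avoids sending Google contradictory signals about the same URL.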
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Marketing site with static content | Build-time pre-render | Zero runtime cost, instant edge caching, guaranteed crawler visibility | Minimal (build time only) |
| Dynamic application with user-specific data | CSR + selective pre-render | Pre-render public routes, keep auth/dashboard routes client-side | Low (hybrid build pipeline) |
| High-frequency content updates | SSR or ISR | Build-time regeneration is too slow for real-time data changes | Moderate (compute + bandwidth) |
| Legacy SPA with 1000+ routes | Phased pre-render migration | Migrate high-traffic routes first, apply noindex to low-value paths | Low (incremental CI changes) |
Configuration Template
vercel.json:
```json
{
  "buildCommand": "npm run build && node scripts/buildPipeline.mjs",
  "outputDirectory": "dist",
  "cleanUrls": true,
  "trailingSlash": false,
  "rewrites": [
    { "source": "/(.*)", "destination": "/index.html" }
  ],
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=0, must-revalidate" }
      ]
    }
  ]
}
```
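```javascript
// scripts/buildPipeline.mjs
// Assumes the Step 2 and Step 3 TypeScript modules are compiled to ESM (.mjs)
// next to this file; adjust the import paths to match your compile output.
import { hydrateRoutes } from './buildHtmlManifest.mjs';
import { generateDocument } from './renderStaticHtml.mjs';
import { writeFileSync, mkdirSync } from 'fs';
import { join, dirname } from 'path';

async function runPipeline() {
  const routeMap = await hydrateRoutes();
  const outputDir = 'dist';
  for (const [routePath, { meta }] of routeMap) {
    // dist/index.html is the SPA shell the catch-all rewrite serves; never overwrite it
    if (routePath === '/') {
      console.warn('Skipping "/": it would overwrite the fallback shell');
      continue;
    }
    const html = generateDocument(routePath, meta);
    // '/about' -> dist/about/index.html
    const filePath = join(outputDir, routePath, 'index.html');
    mkdirSync(dirname(filePath), { recursive: true });
    writeFileSync(filePath, html, 'utf-8');
    console.log(`✓ Generated: ${routePath} | Size: ${(Buffer.byteLength(html) / 1024).toFixed(2)}KB`);
  }
}

runPipeline().catch((err) => {
  console.error('Build pipeline failed:', err);
  process.exit(1);
});
```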
Quick Start Guide
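- Install dependencies: Add typescript to your project, plus a way to compile or run the TypeScript scripts (for example tsc or tsx). The scripts shown otherwise use only Node built-ins and the global fetch available in Node 18+. Initialize the route registry module with your existing client-side paths.
- Configure build script: Set the build script in package.json to vite build; the buildCommand in vercel.json then runs node scripts/buildPipeline.mjs. If you chain both steps in package.json instead, reduce buildCommand to npm run build so the pipeline does not run twice. Ensure the output directory matches Vercel's expected structure.
- Test locally: Run npm run build && node scripts/buildPipeline.mjs, then start a local server with npx serve dist. Use curl -s http://localhost:3000/your-route | grep canonical to verify accurate meta tags.
- Deploy and verify: Push to Vercel. After deployment, request a few routes via raw HTTP. Confirm canonical tags match actual URLs and noindex flags apply to low-value pages.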
- Submit to GSC: Use the URL Inspection tool to request indexing for high-priority routes. Monitor the "Discovered – currently not indexed" bucket for reduction over the next 7-14 days.
