Disallow: /admin/
Disallow: /internal/
Allow: /_next/static/
User-agent: Google-Extended
Disallow: /
Sitemap: ${baseUrl}/sitemap.xml
`.trim();
};
**Dynamic Sitemap Generation**
Sitemaps provide a roadmap for crawlers. They must only contain URLs returning a `200` status code and exclude pages marked with `noindex`. Including invalid URLs confuses crawlers and wastes crawl budget. For large sites, generate sitemaps dynamically to ensure accuracy.
```typescript
// lib/sitemap-builder.ts
import { db } from '@/database';

interface SitemapEntry {
  url: string;
  lastModified: Date;
  changeFrequency: 'daily' | 'weekly' | 'monthly';
  priority: number;
}

export async function generateSiteMap(): Promise<SitemapEntry[]> {
  // Only published, indexable pages belong in the sitemap
  const activeRoutes = await db.pages.findMany({
    where: { status: 'published', isIndexed: true },
    select: { slug: true, updatedAt: true, isFeatured: true }
  });

  // Slugs are assumed to start with "/"
  return activeRoutes.map((route): SitemapEntry => ({
    url: `https://example.com${route.slug}`,
    lastModified: route.updatedAt,
    changeFrequency: route.isFeatured ? 'daily' : 'weekly',
    priority: route.isFeatured ? 0.9 : 0.6
  }));
}
```
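The entries returned by `generateSiteMap()` still have to be serialized before a crawler can read them. The following is a minimal sketch, assuming the standard sitemaps.org `<urlset>` format; `toSitemapXml` and `escapeXml` are hypothetical helper names, not part of the code above.

```typescript
// Mirrors the SitemapEntry shape used by generateSiteMap()
interface SitemapEntry {
  url: string;
  lastModified: Date;
  changeFrequency: 'daily' | 'weekly' | 'monthly';
  priority: number;
}

// Escape the five characters that are special in XML text and attributes
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: serialize entries into a sitemaps.org <urlset> document
function toSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries.map(entry => [
    '  <url>',
    `    <loc>${escapeXml(entry.url)}</loc>`,
    `    <lastmod>${entry.lastModified.toISOString()}</lastmod>`,
    `    <changefreq>${entry.changeFrequency}</changefreq>`,
    `    <priority>${entry.priority.toFixed(1)}</priority>`,
    '  </url>',
  ].join('\n'));

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

Escaping matters because URLs with query strings contain `&`, which is invalid inside XML text unless encoded.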
**Crawl and Index Budget Management**
Crawl budget is the number of URLs Googlebot can and wants to crawl on a site within a given period. Wasting it on low-value pages means important content is crawled less often. Common budget drains include faceted navigation that generates thousands of parameter combinations, paginated archives, and duplicate content across HTTP/HTTPS or www/non-www variants.
Mitigate budget waste by:
- Implementing canonical tags to consolidate duplicate content.
- Using `robots.txt` to disallow parameter-heavy paths that do not provide unique value.
- Consolidating URL structures to reduce redundancy.
- Monitoring "index budget," the ratio of quality indexed pages to total pages, to ensure search engines are focusing on valuable content.
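Consolidation usually starts with deciding what the canonical form of a URL is. The sketch below shows one way to compute the value for a canonical tag. The list of ignored parameters and the `canonicalUrl` helper are assumptions made for illustration; which parameters are safe to drop depends on the site.

```typescript
// Assumption: these query parameters never change page content on this site
const IGNORED_PARAMS = new Set(['utm_source', 'utm_medium', 'utm_campaign', 'sort', 'sessionid']);

// Hypothetical helper: collapse protocol, host, and parameter variants into one URL
function canonicalUrl(rawUrl: string, canonicalHost = 'example.com'): string {
  const url = new URL(rawUrl);
  url.protocol = 'https:';        // folds HTTP/HTTPS variants together
  url.hostname = canonicalHost;   // folds www/non-www variants together
  url.hash = '';

  // Keep only meaningful parameters, sorted so parameter order cannot create duplicates
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, key) => {
    if (!IGNORED_PARAMS.has(key.toLowerCase())) kept.push([key, value]);
  });
  kept.sort(([a], [b]) => a.localeCompare(b));
  url.search = new URLSearchParams(kept).toString();

  return url.toString();
}
```

The returned string would go into `<link rel="canonical" href="...">` on every variant of the page.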
2. Rendering: Eliminating the JavaScript Queue
Client-side rendering introduces a rendering queue where Googlebot must fetch the HTML, wait for JavaScript to download, execute it, and then render the content. This delay can prevent timely indexing.
**Rendering Decision Framework**
- Static Generation (SSG/ISR): Use for marketing pages, blogs, and documentation. Content is pre-rendered at build time or regenerated periodically. Googlebot receives fully rendered HTML in the initial response, so indexing does not wait on the rendering queue.
- Server-Side Rendering (SSR): Use for dynamic pages requiring user-specific data or real-time updates. The server renders HTML on each request, providing complete content to crawlers without client-side execution.
- Client-Side Rendering (CSR): Reserve for application interfaces like admin panels or dashboards that do not require indexing.
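The framework above reduces to two questions per page. A minimal sketch, assuming a hypothetical `PageTraits` shape; real projects may weigh more factors, such as build time or cache invalidation.

```typescript
type RenderingStrategy = 'SSG/ISR' | 'SSR' | 'CSR';

// Hypothetical description of a page, used only for this illustration
interface PageTraits {
  needsIndexing: boolean;    // should the page appear in search results?
  perRequestData: boolean;   // user-specific or real-time content?
}

function chooseRenderingStrategy(page: PageTraits): RenderingStrategy {
  // Pages that are never indexed can stay client-rendered
  if (!page.needsIndexing) return 'CSR';
  // Indexed pages get server-produced HTML, static where the data allows it
  return page.perRequestData ? 'SSR' : 'SSG/ISR';
}
```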
**React Server Components and Hydration**
Even with SSR, hydration (the process of attaching event listeners to server-rendered HTML) can hurt performance. Heavy hydration payloads increase JavaScript execution time, which can worsen Interaction to Next Paint (INP). React Server Components (RSC) address this by rendering on the server and sending no JavaScript for those components to the client.
Adopt a disciplined approach to component boundaries. Only mark components as client-side when interactivity is required.
```tsx
// components/product-catalog.tsx
// Server Component: Renders on server, zero client JS
import { getProducts } from '@/lib/api';

export async function ProductCatalog({ categoryId }: { categoryId: string }) {
  const products = await getProducts(categoryId);
  return (
    <div className="grid">
      {products.map(product => (
        <article key={product.id}>
          <h2>{product.name}</h2>
          <p>{product.description}</p>
        </article>
      ))}
    </div>
  );
}
```
```tsx
// components/add-to-cart.tsx
// Client Component: Only ships JS for interactivity
'use client';
import { useState } from 'react';

export function AddToCart({ productId }: { productId: string }) {
  const [loading, setLoading] = useState(false);
  return (
    <button
      onClick={() => setLoading(true)}
      disabled={loading}
    >
      {loading ? 'Adding...' : 'Add to Cart'}
    </button>
  );
}
```
Google evaluates three Core Web Vitals (LCP, INP, and CLS) at the 75th percentile of real-user (field) data. Optimizing them helps maintain search visibility.
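To make "at the 75th percentile" concrete, the sketch below checks a set of field samples against the targets used in this guide (2.5 seconds for LCP, 200ms for INP, 0.1 for CLS). The nearest-rank percentile method and the choice to treat a value exactly at the threshold as passing are assumptions; reporting tools may compute the percentile differently.

```typescript
type Metric = 'LCP' | 'INP' | 'CLS';

// Targets from this guide: LCP and INP in milliseconds, CLS is unitless
const GOOD_THRESHOLDS: Record<Metric, number> = { LCP: 2500, INP: 200, CLS: 0.1 };

// Nearest-rank 75th percentile (an assumption; tools may interpolate instead)
function percentile75(samples: number[]): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}

// A metric passes when 75% of sampled page views are at or below the target
function passesAtP75(metric: Metric, samples: number[]): boolean {
  return percentile75(samples) <= GOOD_THRESHOLDS[metric];
}
```

The practical consequence is that a fast median is not enough: the slowest quarter of real visits decides whether a page passes.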
**Largest Contentful Paint (LCP)**
LCP measures the time to render the largest visible element. Target: 2.5 seconds or less.
- Preload critical resources: Preload the LCP image and set `fetchpriority="high"` so the browser downloads it first.
- Modern image formats: Serve AVIF or WebP formats with fallbacks to reduce payload size.
- Critical CSS inlining: Inline CSS required for above-the-fold content to prevent render-blocking delays.
- Server response time: Optimize Time to First Byte (TTFB) by using CDNs and aggressive caching. TTFB over 600ms severely impacts LCP.
```html
<!-- Preload the LCP image with high priority; browsers without AVIF support skip it -->
<link
  rel="preload"
  as="image"
  href="/hero.avif"
  type="image/avif"
  fetchpriority="high"
/>
<!-- Responsive image with modern formats and a JPEG fallback -->
<picture>
  <source srcset="/hero.avif" type="image/avif" />
  <source srcset="/hero.webp" type="image/webp" />
  <img
    src="/hero.jpg"
    alt="Hero Banner"
    width="1200"
    height="600"
    loading="eager"
    fetchpriority="high"
  />
</picture>
```
**Interaction to Next Paint (INP)**
INP measures responsiveness to user interactions. Target: 200ms or less. INP replaced First Input Delay (FID) and evaluates all interactions, not just the first.
- Break up long tasks: JavaScript tasks exceeding 50ms block the main thread, delaying interaction responses. Use `scheduler.yield()` to hand control back to the browser so it can process pending interactions. Not every browser supports it yet, so feature-detect it and fall back to `setTimeout`.
- Audit third-party scripts: Third-party scripts (analytics, chat widgets, ads) often execute heavy JavaScript on interaction. Defer non-critical scripts or load them only after user engagement.
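```typescript
// lib/task-scheduler.ts
// Break up heavy computation so pending interactions are not blocked
const yieldToMain = (): Promise<void> =>
  // scheduler.yield() is not available in every browser, so fall back to setTimeout
  (globalThis as any).scheduler?.yield?.() ?? new Promise(resolve => setTimeout(resolve, 0));

export async function processBatch(items: string[]): Promise<void> {
  let lastYield = performance.now();
  for (const item of items) {
    performHeavyOperation(item);
    // Yield once about 50ms of work has accumulated since the last yield
    if (performance.now() - lastYield >= 50) {
      await yieldToMain();
      lastYield = performance.now();
    }
  }
}

function performHeavyOperation(item: string): void {
  // Simulate computation
  console.log(`Processing ${item}`);
}
```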
**Cumulative Layout Shift (CLS)**
CLS measures visual stability. Target: 0.1 or less.
- Define dimensions: Always specify `width` and `height` attributes for images and videos to reserve space before loading.
- Font loading strategies: Use `font-display: swap` to prevent invisible text during font loading, but ensure fallback fonts have similar metrics to minimize shift.
- Dynamic content: Avoid inserting content above existing content unless triggered by user interaction. Use skeleton loaders to reserve space for async content.
Pitfall Guide
1. The Staging Leak
Explanation: Deploying a staging `robots.txt` to production. Staging files often contain `Disallow: /`, which blocks all crawling.
Fix: Implement CI/CD checks that validate `robots.txt` content before deployment. Use environment-specific configuration files and ensure production builds generate the correct policy.
2. The Asset Blockade
Explanation: Blocking CSS and JavaScript directories in `robots.txt` to save crawl budget. This prevents Googlebot from rendering the page, causing it to misinterpret content and structure.
Fix: Allow all static assets required for rendering. Only block routes that do not contribute to page presentation, such as API endpoints or administrative paths.
3. Hydration Overload
Explanation: Marking entire component trees as client-side, causing excessive JavaScript execution during hydration. This increases INP and degrades performance.
Fix: Adopt React Server Components by default. Only use `'use client'` for components requiring state, event handlers, or browser APIs. Push client boundaries as deep as possible.
4. Sitemap Pollution
Explanation: Including URLs that return `404` errors or are marked `noindex` in the sitemap. This wastes crawl budget and confuses search engines.
Fix: Validate all URLs in the sitemap generation process. Filter out unpublished, deleted, or non-indexable pages. Regularly audit sitemaps for broken links.
5. INP Blindness
Explanation: Ignoring long JavaScript tasks that block the main thread. Developers often focus on load time but neglect interaction responsiveness.
Fix: Profile main thread activity using performance tools. Break up long tasks with `scheduler.yield()` or move the work to Web Workers. Defer non-critical JavaScript execution.
6. AI Training Ambiguity
Explanation: Failing to configure the `Google-Extended` user agent. With no rule in place, the default is to allow, so content may be used for model training without an explicit decision.
Fix: Explicitly configure `Google-Extended` in `robots.txt` based on organizational policy. Document the decision and ensure it aligns with content licensing agreements.
7. Crawl Budget Burn
Explanation: Allowing faceted navigation or URL parameters to generate thousands of low-value pages. This dilutes crawl budget and index quality.
Fix: Implement canonical tags to consolidate duplicate content. Use `robots.txt` to disallow parameter combinations that do not provide unique value. Limit faceted navigation to essential filters.
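The CI/CD validation suggested for The Staging Leak and The Asset Blockade could look roughly like the sketch below. `auditRobotsTxt` is a hypothetical helper and `/_next/static/` is an assumed asset path. The check only inspects `Disallow` rules in the `User-agent: *` group and ignores wildcards and `Allow` overrides, so treat it as a first line of defense, not a full robots.txt parser.

```typescript
interface RobotsIssue {
  rule: string;
  message: string;
}

// Hypothetical CI check: flag rules that block the whole site or rendering assets
function auditRobotsTxt(content: string, assetPrefixes: string[] = ['/_next/static/']): RobotsIssue[] {
  const issues: RobotsIssue[] = [];
  let agents: string[] = [];
  let lastWasAgent = false;

  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group; otherwise a new group starts
      agents = lastWasAgent ? [...agents, value.toLowerCase()] : [value.toLowerCase()];
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (field !== 'disallow' || !agents.includes('*')) continue;

    if (value === '/') {
      issues.push({ rule: rawLine.trim(), message: 'blocks all crawling' });
    } else if (value !== '' && assetPrefixes.some(prefix => prefix.startsWith(value))) {
      issues.push({ rule: rawLine.trim(), message: 'blocks static assets needed for rendering' });
    }
  }
  return issues;
}
```

A deployment script could fail the build whenever the returned array is non-empty for the production file.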
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Marketing Landing Page | Static Generation (SSG) | No rendering delay for crawlers, minimal JS, highest performance. | Low build cost, high SEO return. |
| E-Commerce Product Page | Server-Side Rendering (SSR) | Dynamic pricing/inventory, fast indexing, good INP. | Moderate server cost, high conversion impact. |
| Blog/Documentation | Incremental Static Regeneration (ISR) | Balance of freshness and performance, no rendering delay for crawlers. | Low cost, scalable. |
| Admin Dashboard | Client-Side Rendering (CSR) | No indexing required, rich interactivity acceptable. | Low SEO impact, high dev flexibility. |
| User Search Results | SSR with Streaming | Real-time data, progressive rendering improves LCP. | Higher server load, better UX. |
Configuration Template
```txt
# robots.txt
User-agent: *
Allow: /
Allow: /_next/static/
Disallow: /api/
Disallow: /admin/
Disallow: /internal/

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
```
```typescript
// lib/sitemap-config.ts
export const sitemapConfig = {
  maxUrlsPerFile: 50000,
  changeFrequency: 'weekly',
  priority: {
    home: 1.0,
    featured: 0.9,
    standard: 0.6,
    archive: 0.3
  },
  excludePatterns: [
    /\/admin\//,
    /\/api\//,
    /\/draft\//,
    /\/noindex\//
  ]
};
```
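One way these settings might be consumed is sketched below: excluded URLs are dropped first, then the remainder is split into files of at most `maxUrlsPerFile` entries. `partitionSitemapUrls` is a hypothetical helper, and the `config` object is a trimmed copy of the template so the example runs on its own.

```typescript
// Trimmed copy of the template's settings so the sketch runs standalone
const config = {
  maxUrlsPerFile: 50000,
  excludePatterns: [/\/admin\//, /\/api\//, /\/draft\//, /\/noindex\//],
};

// Hypothetical helper: drop excluded URLs, then split the rest into per-file chunks
function partitionSitemapUrls(
  urls: string[],
  maxUrlsPerFile: number = config.maxUrlsPerFile,
  excludePatterns: RegExp[] = config.excludePatterns,
): string[][] {
  const allowed = urls.filter(url => !excludePatterns.some(pattern => pattern.test(url)));
  const files: string[][] = [];
  for (let i = 0; i < allowed.length; i += maxUrlsPerFile) {
    files.push(allowed.slice(i, i + maxUrlsPerFile));
  }
  return files;
}
```

When more than one chunk is produced, each chunk would become its own sitemap file, listed in a sitemap index.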
Quick Start Guide
- Verify Crawl Access: Run `curl -I https://example.com/robots.txt` to ensure the file is accessible and correctly configured. Check that static assets are not blocked.
- Generate Sitemap: Execute your sitemap generation script and validate the output XML. Submit the sitemap URL to Google Search Console.
- Audit Rendering: Identify public-facing pages using CSR. Migrate critical pages to SSR or SSG. Ensure Server Components are used where possible.
- Optimize LCP: Add `fetchpriority="high"` to your hero image. Verify TTFB is under 600ms using performance monitoring tools.
- Profile INP: Run a Lighthouse audit and check for long tasks. Implement `scheduler.yield()` for any computation exceeding 50ms. Defer third-party scripts.