How to Generate Link Previews Like Slack (Without the Edge-Case Hell)

Building a Production-Ready URL Preview Engine: Security, Fallbacks, and Implementation Patterns

Current Situation Analysis

Rich link previews (cards that display a title, description, and thumbnail when a URL is shared) are a standard expectation in modern communication platforms. They turn raw text into recognizable, clickable content, and they are common in chat applications, comment systems, and content management tools.

Despite the ubiquity of this feature, implementing it reliably is deceptively complex. Many engineering teams underestimate the operational burden, assuming that fetching a URL and parsing HTML meta tags is a trivial task. This assumption leads to fragile implementations that break under real-world conditions.

The core issue is that the public web is hostile to scrapers. A significant portion of the web lacks structured metadata, forcing clients to implement complex fallback logic. More critically, allowing user-submitted URLs to trigger server-side requests introduces a severe security vulnerability: Server-Side Request Forgery (SSRF). Attackers can exploit naive preview generators to probe internal networks, access cloud metadata endpoints (e.g., http://169.254.169.254), or interact with localhost services.

SSRF is widely recognized as a top web security risk; the OWASP Top 10 (2021) gives it its own category, A10:2021 Server-Side Request Forgery. Link preview features are a classic attack vector because they inherently require the server to make outbound requests based on untrusted input. Without rigorous validation, a simple "fetch and parse" function becomes a gateway for internal network enumeration and data exfiltration.

WOW Moment: Key Findings

The decision to build a custom preview engine versus leveraging a managed service hinges on a trade-off between control, security liability, and maintenance overhead. The following comparison highlights why naive implementations are unsustainable in production environments handling user-generated content.

Approach	SSRF Risk Profile	Metadata Fallback Coverage	Maintenance Overhead	Latency Characteristics
Naive Fetch	Critical	None (Fails on missing tags)	Low	Unbounded (Hangs on slow targets)
Robust Custom Engine	Mitigated	Full (OG → Twitter → DOM)	High	Controlled (Timeouts/Caps enforced)
Managed Service	Managed by Provider	Full	None	Variable (Depends on provider SLA)

Why this matters: The "Naive Fetch" approach is not merely incomplete; it is a security liability. The "Robust Custom Engine" requires implementing IP validation on every redirect hop, byte limits, and comprehensive fallback chains. For applications accepting user input, the engineering cost to achieve parity with a managed service often outweighs the benefits, making third-party APIs the pragmatic choice for many teams.

Core Solution

Building a resilient link unfurling service requires a layered defense strategy. The implementation must address input validation, safe network transport, robust parsing, and metadata normalization. Below is a TypeScript implementation demonstrating the architectural patterns required for a production-grade engine. It targets Node.js 18 or later, which provides the global fetch and AbortSignal.timeout APIs used in section 2.

1. Input Sanitization and SSRF Pre-Checks

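Before any network request, the URL must be validated: allowlist the protocol, resolve the hostname, and check every resulting address. The check has to run on resolved addresses rather than on the URL string, because a hostname can point anywhere. Resolving first does not by itself prevent DNS rebinding, however. In a rebinding attack, the attacker's domain answers the validation lookup with a public IP and a later lookup with a private one; if the HTTP client resolves the hostname again when it connects, the check is bypassed. Fully closing that window requires connecting to the exact address that was validated (see "DNS Rebinding Attacks" in the Pitfall Guide).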

import * as net from 'net';
import * as dns from 'dns';
import { promisify } from 'util';

const resolve4 = promisify(dns.resolve4);

interface PreviewConfig {
  maxRedirects: number;
  timeoutMs: number;
  maxBytes: number;
  allowedProtocols: string[];
}

class LinkUnfurler {
  private config: PreviewConfig;

  constructor(config: Partial<PreviewConfig> = {}) {
    this.config = {
      maxRedirects: 5,
      timeoutMs: 3000,
      maxBytes: 2 * 1024 * 1024, // 2MB cap
      allowedProtocols: ['http:', 'https:'],
      ...config,
    };
  }

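  // Non-public ranges for IPv4 and IPv6: "this network", private, carrier-grade
  // NAT, loopback, link-local (which includes the 169.254.169.254 cloud
  // metadata endpoint), IPv6 unspecified/loopback, unique-local, and link-local.
  // Node documents that net.BlockList matches IPv4-mapped IPv6 notation
  // (::ffff:a.b.c.d) against IPv4 entries.
  private static readonly blockedRanges: net.BlockList = (() => {
    const list = new net.BlockList();
    list.addSubnet('0.0.0.0', 8, 'ipv4');
    list.addSubnet('10.0.0.0', 8, 'ipv4');
    list.addSubnet('100.64.0.0', 10, 'ipv4');
    list.addSubnet('127.0.0.0', 8, 'ipv4');
    list.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local / cloud metadata
    list.addSubnet('172.16.0.0', 12, 'ipv4');
    list.addSubnet('192.168.0.0', 16, 'ipv4');
    list.addAddress('::', 'ipv6');
    list.addAddress('::1', 'ipv6');
    list.addSubnet('fc00::', 7, 'ipv6');
    list.addSubnet('fe80::', 10, 'ipv6');
    return list;
  })();

  private isPrivateOrLoopback(ip: string): boolean {
    const family = net.isIP(ip);
    if (family === 0) return true; // not a valid IP address: fail closed
    return LinkUnfurler.blockedRanges.check(ip, family === 6 ? 'ipv6' : 'ipv4');
  }

  private async validateTargetUrl(urlString: string): Promise<URL> {
    const url = new URL(urlString);

    if (!this.config.allowedProtocols.includes(url.protocol)) {
      throw new Error('Protocol not allowed');
    }

    // Resolve every A and AAAA record; dns.lookup returns IP literals as-is.
    // URL.hostname keeps the brackets around IPv6 literals, so strip them.
    const host = url.hostname.replace(/^\[(.*)\]$/, '$1');
    const records = await dns.promises.lookup(host, { all: true });
    const isBlocked =
      records.length === 0 ||
      records.some(record => this.isPrivateOrLoopback(record.address));

    if (isBlocked) {
      throw new Error('Target resolves to blocked IP range');
    }

    // This narrows, but does not close, the DNS rebinding window:
    // fetch() resolves the hostname again when it connects.
    return url;
  }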
  // unfurl() (section 2) and extractMetadata() (section 3) complete this class.
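Checking parsed and resolved values, rather than the raw string, matters because the same host can be spelled many ways. As a quick standalone illustration (not part of the class above), the WHATWG URL parser behind Node's global URL canonicalizes several alternate spellings of 127.0.0.1, so a denylist that only searches the input for "127.0.0.1" or "localhost" would miss all of them. The sample URLs are arbitrary.

```typescript
// Alternate spellings of a loopback address that a naive string match misses.
const samples = [
  'http://2130706433/', // 127.0.0.1 written as a single decimal number
  'http://0x7f000001/', // the same number in hexadecimal
  'http://0177.0.0.1/', // octal first octet
  'http://127.1/', // shortened dotted form
  'http://[::ffff:127.0.0.1]/', // IPv4-mapped IPv6 literal
];

// URL.hostname is the canonical form that DNS resolution and IP checks see
const hostnames = samples.map(s => new URL(s).hostname);

samples.forEach((s, i) => {
  console.log(`${s} -> ${hostnames[i]}`);
});
```

The first four all become 127.0.0.1. The last becomes [::ffff:7f00:1], which only reaches an IP check as an IPv6 address once the brackets are stripped, so the blocklist has to handle IPv4-mapped IPv6 as well.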

2. Safe Fetching with Redirect and Size Controls

Network requests must be bounded. Implementing a hard timeout prevents resource exhaustion from slow targets. A byte cap ensures that multi-megabyte responses do not consume excessive memory. Redirects must be followed manually to re-validate the destination IP on every hop.

  async unfurl(targetUrl: string): Promise<PreviewResult> {
    // One deadline for the whole unfurl: every hop plus the body download
    // (DNS lookups inside validateTargetUrl are not interrupted by it)
    const signal = AbortSignal.timeout(this.config.timeoutMs);
    let currentUrl = new URL(targetUrl);

    // The initial request plus at most maxRedirects redirects
    for (let hop = 0; hop <= this.config.maxRedirects; hop++) {
      // Re-validate protocol and resolved IPs on every hop, including the first
      currentUrl = await this.validateTargetUrl(currentUrl.toString());

      const response = await fetch(currentUrl.toString(), {
        redirect: 'manual',
        signal,
      });

      if (response.status >= 300 && response.status < 400) {
        const location = response.headers.get('location');
        await response.body?.cancel(); // release the connection
        if (!location) throw new Error('Redirect without location header');
        // Location may be relative; resolve it against the current hop
        currentUrl = new URL(location, currentUrl);
        continue;
      }

      if (!response.ok) {
        await response.body?.cancel();
        throw new Error(`HTTP ${response.status}`);
      }
      if (!response.body) {
        throw new Error('Response has no body');
      }

      // Stream the body, aborting as soon as the byte cap is exceeded
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let receivedBytes = 0;
      let responseBody = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        receivedBytes += value.byteLength;
        if (receivedBytes > this.config.maxBytes) {
          await reader.cancel();
          throw new Error('Response exceeds byte limit');
        }
        responseBody += decoder.decode(value, { stream: true });
      }
      responseBody += decoder.decode(); // flush any buffered partial character

      // currentUrl is the page that actually answered, after all redirects
      return this.extractMetadata(responseBody, currentUrl);
    }

    throw new Error('Too many redirects');
  }

3. Metadata Extraction and Fallback Chain

Parsing must handle missing tags and relative URLs. The extraction logic should implement a deterministic fallback chain: Open Graph properties take precedence, followed by Twitter Card tags, then standard HTML elements. Relative image URLs must be resolved against the final resolved URL to ensure assets load correctly in the client.

  private extractMetadata(html: string, baseUrl: URL): PreviewResult {
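    // Simplified parsing logic; production should use a robust HTML parser.
    // Matches <meta> tags whose property/name comes before or after content,
    // and reads double- and single-quoted values separately so that a value
    // such as "Don't panic" is not cut off at the apostrophe.
    const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const getMeta = (property: string): string | null => {
      const key = escapeRegExp(property);
      const value = `content=(?:"([^"]*)"|'([^']*)')`;
      const patterns = [
        new RegExp(`<meta[^>]*?(?:property|name)=["']${key}["'][^>]*?${value}`, 'i'),
        new RegExp(`<meta[^>]*?${value}[^>]*?(?:property|name)=["']${key}["']`, 'i'),
      ];
      for (const pattern of patterns) {
        const match = html.match(pattern);
        const content = (match?.[1] ?? match?.[2])?.trim();
        if (content) return content;
      }
      return null;
    };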

    const title = getMeta('og:title') || getMeta('twitter:title') || 
                  html.match(/<title[^>]*>([^<]+)<\/title>/i)?.[1]?.trim() || null;

    const description = getMeta('og:description') || getMeta('twitter:description') || 
                        getMeta('description') || null;

    let imageUrl = getMeta('og:image') || getMeta('twitter:image') || null;
    
    // Resolve relative URLs
    if (imageUrl) {
      try {
        imageUrl = new URL(imageUrl, baseUrl).toString();
      } catch {
        imageUrl = null;
      }
    }

    return {
      resolvedUrl: baseUrl.toString(),
      title,
      description,
      imageUrl,
      // Additional fields like favicon or themeColor can be extracted similarly
    };
  }
}

interface PreviewResult {
  resolvedUrl: string;
  title: string | null;
  description: string | null;
  imageUrl: string | null;
}
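The relative-URL rule is easiest to see with concrete inputs. The sketch below wraps the same new URL(value, base) call that extractMetadata uses in a hypothetical resolveAssetUrl helper and adds one guard the code above does not have: it keeps only http: and https: results, since a page can also supply a javascript: or data: value that parses as a valid absolute URL. The base URL and inputs are made-up examples.

```typescript
// Hypothetical helper: resolve an og:image / twitter:image value against the
// final page URL and keep only http(s) results.
function resolveAssetUrl(raw: string | null, finalUrl: URL): string | null {
  if (!raw) return null;
  try {
    const resolved = new URL(raw.trim(), finalUrl);
    return resolved.protocol === 'http:' || resolved.protocol === 'https:'
      ? resolved.toString()
      : null; // drops javascript:, data:, file:, and similar schemes
  } catch {
    return null; // not parseable even relative to the base
  }
}

// The URL that actually served the HTML, after redirects
const finalUrl = new URL('https://example.com/blog/post');
const examples = [
  '/assets/preview.png', // root-relative
  'preview.png', // relative to the /blog/ directory
  '//cdn.example.com/preview.png', // protocol-relative: inherits https:
  'javascript:alert(1)', // valid absolute URL, but not an image to load
];
const results = examples.map(e => resolveAssetUrl(e, finalUrl));
console.log(results);
```

Note that 'preview.png' resolves against the directory of the final URL (/blog/), not the site root, which is why resolving against the pre-redirect URL can produce a different, broken path.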

Pitfall Guide

SSRF via Redirect Hops
- Explanation: Validating the initial URL's IP is insufficient. An attacker can provide a public URL that redirects to an internal service.
- Fix: Resolve and validate the IP address on every redirect hop before following the Location header.
DNS Rebinding Attacks
- Explanation: An attacker controls a domain that resolves to a public IP during the initial check but switches to a private IP during the fetch.
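- Fix: Resolving the IP immediately before the request is not enough on its own, because the HTTP client resolves the hostname again when it connects. Resolve once, validate every returned address, and connect to a validated address: for example, pass a validating lookup function to Node's http.request/https.request, or connect to the IP directly while sending the original hostname in the Host header (and as the TLS server name for HTTPS). Alternatively, route all preview traffic through an egress proxy that enforces the IP policy.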
Infinite Redirect Loops
- Explanation: Malicious or misconfigured servers can create redirect cycles that exhaust client resources.
- Fix: Enforce a strict maximum redirect count (e.g., 5). Abort the request if the limit is reached.
Memory Exhaustion via Large Payloads
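- Explanation: Fetching a multi-gigabyte file can exhaust memory, and a response that trickles in slowly can tie up connections and workers indefinitely.
- Fix: Implement a hard byte cap on the response body: stream the response and abort as soon as the limit is exceeded. Pair it with an overall deadline that also covers the body download; a connection timeout alone does not stop a server that sends bytes slowly.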
Broken Image Links from Relative URLs
- Explanation: og:image often contains relative paths like /assets/preview.png. Rendering these without resolution results in broken images.
- Fix: Always resolve relative URLs against the final resolved URL using the URL constructor.
Incomplete Fallback Chains
- Explanation: Relying solely on Open Graph tags causes previews to fail for sites using Twitter Cards or standard HTML metadata.
- Fix: Implement a prioritized fallback chain: Open Graph → Twitter Card → <title>/<meta description> → DOM extraction.
Blocking Legitimate Content
- Explanation: Some sites block requests with default User-Agent strings or require specific headers.
- Fix: Send a descriptive User-Agent that identifies your bot and points to a page explaining it (Slack's unfurler, for example, identifies itself as Slackbot-LinkExpanding). Monitor for 403 Forbidden responses and adjust headers if necessary, while respecting robots.txt where applicable.
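One way to implement the pinning fix for DNS rebinding in Node.js is a validating lookup function. The sketch below assumes Node's http/https request APIs, which accept a lookup option that replaces dns.lookup for that connection; wiring the same idea into the global fetch would likely require a custom undici dispatcher and is not shown. safeLookup, isBlocked, and the EBLOCKED error code are hypothetical names, and the blocklist is a subset for illustration.

```typescript
import * as dns from 'node:dns';
import * as net from 'node:net';

// Hypothetical subset of blocked ranges for this sketch; reuse your full list.
const blocked = new net.BlockList();
blocked.addSubnet('0.0.0.0', 8, 'ipv4');
blocked.addSubnet('10.0.0.0', 8, 'ipv4');
blocked.addSubnet('127.0.0.0', 8, 'ipv4');
blocked.addSubnet('169.254.0.0', 16, 'ipv4');
blocked.addSubnet('172.16.0.0', 12, 'ipv4');
blocked.addSubnet('192.168.0.0', 16, 'ipv4');
blocked.addAddress('::1', 'ipv6');
blocked.addSubnet('fc00::', 7, 'ipv6');
blocked.addSubnet('fe80::', 10, 'ipv6');

function isBlocked(address: string): boolean {
  const family = net.isIP(address);
  if (family === 0) return true; // not an IP address: fail closed
  return blocked.check(address, family === 6 ? 'ipv6' : 'ipv4');
}

type LookupCallback = (
  err: NodeJS.ErrnoException | null,
  address: string | dns.LookupAddress[],
  family?: number,
) => void;

// Same calling convention as dns.lookup(hostname, options, callback).
// The socket connects to whatever this function returns, so the address
// that passed the check is the address that gets used.
function safeLookup(
  hostname: string,
  options: dns.LookupOptions,
  callback: LookupCallback,
): void {
  dns.lookup(hostname, { ...options, all: true }, (err, addresses) => {
    if (err) {
      callback(err, '');
      return;
    }
    // Reject the host if any of its addresses is non-public
    if (addresses.length === 0 || addresses.some(a => isBlocked(a.address))) {
      const error: NodeJS.ErrnoException = new Error(`Blocked address for ${hostname}`);
      error.code = 'EBLOCKED'; // hypothetical code, not defined by Node
      callback(error, '');
      return;
    }
    // Callers such as net.connect may request every address (all: true)
    if (options.all) {
      callback(null, addresses);
    } else {
      callback(null, addresses[0].address, addresses[0].family);
    }
  });
}
```

Pass it as the lookup option of each request (for example https.get(url, { lookup: safeLookup }, onResponse)) and keep following redirects manually so every hop goes through it. Because only name resolution is overridden, the request still uses the original hostname for the Host header and TLS certificate checks.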

Production Bundle

Action Checklist

Implement IP Validation: Block private, loopback, link-local, and cloud metadata ranges for both IPv4 and IPv6, including IPv4-mapped IPv6 addresses.
Enforce Redirect Limits: Set a maximum redirect count and validate IPs on every hop.
Set Resource Caps: Configure hard timeouts (e.g., 3000ms) and byte limits (e.g., 2MB).
Build Fallback Chain: Ensure extraction logic covers Open Graph, Twitter Cards, and HTML fallbacks.
Resolve Relative URLs: Normalize all asset URLs against the final response URL.
Add Caching: Cache preview results to reduce latency and load on target servers.
Rate Limiting: Protect your unfurling service from abuse by limiting requests per user/IP.
Error Handling: Return consistent error shapes for timeouts, blocked IPs, and parse failures.
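The error-handling item can be met by converting whatever the engine throws into one response shape. The LinkUnfurler code above throws plain Error objects with free-text messages; this sketch assumes a hypothetical UnfurlError class that carries a machine-readable code, and also maps two built-in failures: the error named TimeoutError that an AbortSignal.timeout() abort produces, and the TypeError with code ERR_INVALID_URL that Node's URL constructor throws. All type names and codes are illustrative.

```typescript
// Illustrative error codes; adapt them to your API.
type UnfurlErrorCode =
  | 'INVALID_URL'
  | 'BLOCKED_TARGET'
  | 'TIMEOUT'
  | 'TOO_LARGE'
  | 'TOO_MANY_REDIRECTS'
  | 'UPSTREAM_HTTP'
  | 'INTERNAL';

// Hypothetical error type the engine would throw instead of a bare Error
class UnfurlError extends Error {
  readonly code: UnfurlErrorCode;
  constructor(code: UnfurlErrorCode, message: string) {
    super(message);
    this.name = 'UnfurlError';
    this.code = code;
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

type UnfurlFailure = { ok: false; error: { code: UnfurlErrorCode; message: string } };

function toFailure(err: unknown): UnfurlFailure {
  const fail = (code: UnfurlErrorCode, message: string): UnfurlFailure => ({
    ok: false,
    error: { code, message },
  });

  if (err instanceof UnfurlError) return fail(err.code, err.message);

  const props = typeof err === 'object' && err !== null ? (err as { name?: unknown; code?: unknown }) : {};
  // AbortSignal.timeout() aborts with an error whose name is 'TimeoutError'
  if (props.name === 'TimeoutError') return fail('TIMEOUT', 'Target did not respond in time');
  // Node's URL constructor throws a TypeError with code ERR_INVALID_URL
  if (props.code === 'ERR_INVALID_URL') return fail('INVALID_URL', 'Not a valid URL');

  // Anything else: do not leak internal details to the client
  return fail('INTERNAL', 'Preview unavailable');
}
```

Keeping the client-facing message generic for INTERNAL errors avoids echoing internal hostnames or stack details back to whoever submitted the URL.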

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Internal Tools / Trusted URLs	Custom Engine	Low SSRF risk; full control over parsing logic.	Engineering time; low infra cost.
User-Generated Content	Managed Service	Eliminates SSRF liability; handles edge cases automatically.	Subscription cost; reduced dev time.
High Volume / Cost Sensitive	Custom + Aggressive Caching	Reduces external requests; scales with infra.	Higher infra complexity; low marginal cost.
Dynamic / JS-Heavy Sites	Headless Browser Service	Static HTML fetch fails on SPAs; requires rendering.	High compute cost; slower latency.

Configuration Template

Use this configuration structure to parameterize your unfurling service. Adjust values based on your latency requirements and risk tolerance.

{
  "unfurler": {
    "network": {
      "timeoutMs": 3000,
      "maxRedirects": 5,
      "maxBytes": 2097152,
      "allowedProtocols": ["http:", "https:"],
      "blockedIpRanges": [
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",
        "169.254.0.0/16"
      ]
    },
    "parsing": {
      "fallbackOrder": ["og", "twitter", "html"],
      "resolveRelativeUrls": true,
      "maxTitleLength": 100,
      "maxDescriptionLength": 300
    },
    "caching": {
      "enabled": true,
      "ttlSeconds": 86400,
      "staleWhileRevalidate": true
    }
  }
}
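The template is only data: as written, LinkUnfurler takes its timeout, byte, and redirect limits through the constructor, hardcodes its IP checks, and never reads blockedIpRanges or the parsing length limits. A sketch of how a service might consume those two parts, using Node's net.BlockList for the CIDR list and a code-point-safe truncation helper; buildBlockList and truncate are hypothetical names. The sample ranges add IPv6 entries (fc00::/7 here; ::1/128 and fe80::/10 are worth adding too) that the template's IPv4-only list lacks.

```typescript
import * as net from 'node:net';

// Build a net.BlockList from CIDR strings such as "10.0.0.0/8" or "fc00::/7".
function buildBlockList(ranges: string[]): net.BlockList {
  const list = new net.BlockList();
  for (const range of ranges) {
    const [address, prefixText] = range.split('/');
    const family = net.isIP(address);
    const prefix = Number(prefixText);
    if (family === 0 || !Number.isInteger(prefix)) {
      throw new Error(`Invalid CIDR range in config: ${range}`);
    }
    // addSubnet validates the prefix length for the given family
    list.addSubnet(address, prefix, family === 6 ? 'ipv6' : 'ipv4');
  }
  return list;
}

// Cut text to at most `max` code points, ending with an ellipsis when shortened.
function truncate(text: string | null, max: number): string | null {
  if (text === null) return null;
  const chars = Array.from(text.trim());
  return chars.length <= max ? chars.join('') : chars.slice(0, max - 1).join('') + '\u2026';
}

// Example wiring with sample ranges
const blocked = buildBlockList(['10.0.0.0/8', '127.0.0.0/8', '169.254.0.0/16', 'fc00::/7']);
console.log(blocked.check('169.254.169.254', 'ipv4'));
console.log(truncate('A very long page title', 10));
```

Splitting on code points with Array.from, rather than slicing the string directly, avoids cutting an emoji or other astral character in half at the truncation boundary.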

Quick Start Guide

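Initialize the Service: Instantiate LinkUnfurler, overriding any defaults you need (timeoutMs, maxBytes, maxRedirects, allowedProtocols). The blocked IP ranges live inside the class rather than in the constructor options.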
```
const unfurler = new LinkUnfurler({ timeoutMs: 2000, maxBytes: 1048576 });
```

Call the Unfurl Method: Pass the user-provided URL to the service. Wrap in try/catch to handle validation errors.

try {
  const preview = await unfurler.unfurl('https://example.com/article');
  console.log(preview.title);
} catch (err) {
  // Handle SSRF blocks, timeouts, or parse errors
}

Render the Result: Use the returned resolvedUrl, title, description, and imageUrl to render the preview card in your UI.
Implement Caching: Store results in Redis or a similar store keyed by the input URL to avoid redundant fetches.
Monitor Metrics: Track success rates, latency, and error types (e.g., SSRF blocks vs. timeouts) to tune configuration and detect abuse.
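The caching steps can be sketched without Redis: an in-memory TTL cache, a normalized cache key, and de-duplication of concurrent requests for the same URL. In a multi-instance deployment, a shared store such as Redis would replace the Map. cacheKey and TtlCache are hypothetical names; the key normalization (dropping the fragment and sorting query parameters) is an assumed policy, and sorting can merge URLs that a few servers treat differently. The staleWhileRevalidate option from the configuration template is not shown.

```typescript
// Normalize a URL into a cache key. The URL parser already lowercases the
// scheme and host and drops default ports; this also removes the fragment
// (never sent to the server) and sorts query parameters.
function cacheKey(raw: string): string {
  const url = new URL(raw);
  url.hash = '';
  url.searchParams.sort();
  return url.toString();
}

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly inflight = new Map<string, Promise<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return a cached value, or run `load` once even if many callers ask at
  // the same time. Failed loads are not cached.
  getOrLoad(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.get(key);
    if (hit !== undefined) return Promise.resolve(hit);

    const pending = this.inflight.get(key);
    if (pending) return pending;

    const promise = (async () => {
      const value = await load();
      this.set(key, value);
      return value;
    })();
    this.inflight.set(key, promise);
    // Free the in-flight slot once the load settles, success or failure
    promise.then(
      () => this.inflight.delete(key),
      () => this.inflight.delete(key),
    );
    return promise;
  }
}
```

With this in place, a call such as cache.getOrLoad(cacheKey(input), () => unfurler.unfurl(input)) serves repeat shares from memory and collapses a burst of identical requests into a single outbound fetch.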
