Next.js · 2026-05-11 · 69 min read

How I Built a Free Markdown-to-PDF Converter in the Browser

By Chen Penghui

Zero-Server Document Generation: A Browser-First Architecture

Current Situation Analysis

The standard model for document conversion tools relies on a client-server architecture: the user uploads a file, the server processes it using headless browsers or specialized libraries, and the result is returned. While functional, this model introduces significant friction points that are increasingly unacceptable in modern development workflows.

The Privacy and Compliance Gap
Regulatory frameworks like GDPR and HIPAA impose strict rules on where personal data may be sent and who may process it. Sending document content to a third-party server, even temporarily, creates a compliance liability. Organizations handling sensitive intellectual property or personal data often block these tools entirely, forcing developers to build internal solutions or accept security risks.

Economic and Scalability Friction
Server-side conversion is computationally expensive. Headless Chrome instances or PDF generation libraries consume significant CPU and memory. As traffic grows, infrastructure costs scale linearly. Developers must implement rate limiting, queue management, and storage cleanup to prevent resource exhaustion. For simple conversion tasks, this overhead is disproportionate.

The Overlooked Capability of Modern Browsers
Many engineers assume client-side conversion is impossible due to historical limitations. However, modern JavaScript engines (V8, SpiderMonkey, JavaScriptCore) are now fast enough to change this. Libraries like remark, html2pdf.js, and docx are plain JavaScript and let a full document pipeline run in the browser. Markdown parsing can run on the main thread or in a Web Worker, while html2pdf.js needs the DOM and therefore runs on the main thread. Either way, no remote processing is required.

WOW Moment: Key Findings

The shift to a browser-first architecture fundamentally changes the trade-off matrix. The following comparison highlights the operational differences between traditional server-side processing and a zero-server client implementation.

| Approach | Latency Profile | Data Privacy | Infrastructure Cost | Scalability Limit | Output Quality |
| --- | --- | --- | --- | --- | --- |
| Server-Side | Network RTT + Queue + Processing | High Risk (Data leaves device) | Linear scaling with usage | CPU/Memory bottlenecks | Vector PDF, Full CSS support |
| Client-Side | Instant (Local processing) | Zero Risk (Data stays local) | Static hosting only | Browser memory limits | Rasterized PDF, CSS subset |

Why This Matters
For applications prioritizing privacy, instant feedback, and cost efficiency, client-side generation is superior. The "rasterized PDF" trade-off is often acceptable for documentation, reports, and notes, where text selection is less critical than visual fidelity. This architecture enables "offline-first" capabilities and allows developers to offer premium features without incurring per-conversion costs.

Core Solution

The architecture follows a pipeline pattern: Input → AST Transformation → Format-Specific Serialization. Each stage is isolated, allowing for lazy loading of heavy dependencies and independent optimization.
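The lazy-loading part of this pattern can be sketched without any of the real libraries. The helper below is a minimal illustration, not the article's implementation: `createLazyLoader`, `loadPdfStage`, and `loadCount` are hypothetical names, and the async function stands in for a real `() => import('…')` call.

```typescript
// Hypothetical helper: wraps a loader so the heavy stage is fetched on first
// use only. Repeat calls share the same in-flight or resolved promise.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((error: unknown) => {
        cached = undefined; // forget the failure so a later call can retry
        throw error;
      });
    }
    return cached;
  };
}

// Stand-in for a dynamic import of an export stage (assumption: in a real
// app this would be something like `() => import('./pdf-stage')`).
let loadCount = 0;
const loadPdfStage = createLazyLoader(async () => {
  loadCount += 1;
  return { format: 'pdf' as const };
});
```

Calling `loadPdfStage()` from several export buttons triggers a single load; the counter exists only to make that visible.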

1. Markdown Parsing Pipeline

The foundation is the unified ecosystem, which provides a robust AST-based transformation pipeline. We configure the processor to support GitHub-Flavored Markdown (GFM), ensuring compatibility with tables, task lists, and strikethrough syntax.

import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import remarkRehype from 'remark-rehype';
import rehypeStringify from 'rehype-stringify';

export class MarkdownPipeline {
  // Built once in the constructor and reused for every conversion.
  private processor;

  constructor() {
    this.processor = unified()
      .use(remarkParse)
      .use(remarkGfm)
      // Pass raw HTML through; sanitize the output if the input is untrusted.
      .use(remarkRehype, { allowDangerousHtml: true })
      .use(rehypeStringify, { allowDangerousHtml: true });
  }

  async toHtml(source: string): Promise<string> {
    const result = await this.processor.process(source);
    return String(result.value);
  }
}

Rationale: Building the processor once in the constructor lets every call to toHtml reuse the same instance. The allowDangerousHtml flag is necessary for rendering raw HTML embedded in Markdown, though sanitization should be applied if user input is untrusted.

2. Rasterized PDF Generation

Client-side PDF generation relies on html2pdf.js, which combines html2canvas and jsPDF. This approach renders the DOM to a canvas and embeds it as an image in the PDF. While this produces a rasterized document rather than a vector one, it supports complex CSS layouts that vector libraries often struggle with.

import html2pdf from 'html2pdf.js';

export interface PdfConfig {
  filename: string;
  margin: number;
  scale: number;
}

export async function generatePdfSnapshot(
  container: HTMLElement,
  config: PdfConfig
): Promise<Blob> {
  const options = {
    margin: config.margin,
    filename: config.filename,
    image: { type: 'jpeg', quality: 0.98 },
    html2canvas: { scale: config.scale, useCORS: true },
    jsPDF: { unit: 'mm', format: 'a4', orientation: 'portrait' }
  };

  // outputPdf('blob') resolves with the generated PDF as a Blob.
  return html2pdf().set(options).from(container).outputPdf('blob');
}

Rationale: Returning a Blob instead of triggering an immediate download gives the application control over the output, enabling previews or further processing. The scale parameter adjusts resolution; a scale of 2 provides retina-quality output.
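The cost of a higher scale is easy to underestimate, because pixel count grows with the square of the scale factor. The sketch below estimates the backing memory of the rasterized canvas. It assumes an A4 page at the CSS reference density of 96 px per inch (about 794 × 1123 px), an uncompressed RGBA canvas, and a container rendered as one tall canvas; the function and constant names are hypothetical.

```typescript
// A4 (210 mm x 297 mm) at 96 CSS px per inch, rounded to whole pixels.
const A4_CSS_PX = { width: 794, height: 1123 };
const BYTES_PER_PIXEL = 4; // RGBA, 8 bits per channel

// Bytes needed for a canvas covering the given CSS area at the given scale.
function estimateCanvasBytes(cssWidth: number, cssHeight: number, scale: number): number {
  const pixelWidth = Math.ceil(cssWidth * scale);
  const pixelHeight = Math.ceil(cssHeight * scale);
  return pixelWidth * pixelHeight * BYTES_PER_PIXEL;
}

// Assumption: the whole document is rasterized as a single tall canvas.
function estimateDocumentBytes(pageCount: number, scale: number): number {
  return estimateCanvasBytes(A4_CSS_PX.width, A4_CSS_PX.height * pageCount, scale);
}

// One page at scale 2: 1588 x 2246 px x 4 bytes = 14,266,592 bytes (~13.6 MiB).
const onePageAt2x = estimateCanvasBytes(A4_CSS_PX.width, A4_CSS_PX.height, 2);
```

Under these assumptions, doubling the scale quadruples memory, and a ten-page document at scale 2 needs roughly 136 MiB for the canvas alone. Browsers also cap canvas dimensions, so long documents may need a lower scale.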

3. Word Document Construction

For DOCX export, the docx library offers a programmatic API to build documents. Unlike the HTML-to-PDF path, this generates a native Word file containing real, selectable text rather than an image of the page.

import { Document, Packer, Paragraph, TextRun, HeadingLevel } from 'docx';

export async function buildDocxPackage(
  sections: Array<{ title: string; content: string }>
): Promise<Blob> {
  const children = sections.flatMap(section => [
    new Paragraph({
      children: [new TextRun({ text: section.title, bold: true })],
      heading: HeadingLevel.HEADING_1,
      spacing: { after: 200 }
    }),
    new Paragraph({
      children: [new TextRun(section.content)],
      spacing: { after: 400 }
    })
  ]);

  const doc = new Document({
    sections: [{ children }]
  });

  return Packer.toBlob(doc);
}

Rationale: This builder pattern allows for structured document generation. The docx library handles the complex XML packaging internally, ensuring compatibility with Microsoft Word and LibreOffice.
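The builder expects an array of title/content pairs, which has to come from somewhere. One simple way to derive it, sketched below, is to split the Markdown source at top-level headings. `splitIntoSections` and `DocSection` are hypothetical names, and the sketch makes simplifying assumptions: only `# ` headings start a section, text before the first heading is dropped, and a `# ` line inside a fenced code block would be misread as a heading.

```typescript
interface DocSection {
  title: string;
  content: string;
}

// Collects the lines gathered for one section into the final shape.
function finishSection(section: { title: string; lines: string[] }): DocSection {
  return { title: section.title, content: section.lines.join('\n').trim() };
}

// Splits Markdown at level-1 headings; deeper headings stay in the content.
function splitIntoSections(markdown: string): DocSection[] {
  const sections: DocSection[] = [];
  let current: { title: string; lines: string[] } | undefined;

  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^#\s+(.*)$/.exec(line);
    if (heading) {
      if (current) sections.push(finishSection(current));
      current = { title: heading[1].trim(), lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
  }
  if (current) sections.push(finishSection(current));
  return sections;
}
```

The result can be passed straight to `buildDocxPackage`. A production version would more likely walk the Markdown AST from the parsing pipeline instead of matching lines.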

4. Mind Map Hierarchy Extraction

Mind map generation requires parsing Markdown into a hierarchical tree structure. markmap-lib provides a transformer that extracts headings and lists into a node graph, which can then be rendered by markmap-view.

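import { Transformer } from 'markmap-lib';

export interface MindMapResult {
  // Root node of the heading/list tree produced by the transformer.
  hierarchy: ReturnType<Transformer['transform']>['root'];
  // Styles and scripts required by the features detected in the source.
  assets: ReturnType<Transformer['getUsedAssets']>;
}

export function extractMindMapHierarchy(source: string): MindMapResult {
  const transformer = new Transformer();
  const { root, features } = transformer.transform(source);

  return {
    hierarchy: root,
    // `features` records which plugins the source used; getUsedAssets
    // resolves them to the assets the renderer has to load.
    assets: transformer.getUsedAssets(features)
  };
}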

Rationale: Separating the transformation from rendering allows the hierarchy to be used in different contexts, such as interactive SVGs or static exports.
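As an illustration of reusing the hierarchy outside the renderer, the sketch below flattens a tree into an indented text outline. It assumes each node exposes a `content` string and an optional `children` array; `MindMapNode` and `toOutline` are hypothetical names, and real node content may contain HTML that would need stripping first.

```typescript
// Assumed minimal node shape; the real node type carries more fields.
interface MindMapNode {
  content: string;
  children?: MindMapNode[];
}

// Depth-first walk that emits one indented bullet per node.
function toOutline(node: MindMapNode, depth = 0): string[] {
  const line = `${'  '.repeat(depth)}- ${node.content}`;
  const childLines = (node.children ?? []).flatMap(child => toOutline(child, depth + 1));
  return [line, ...childLines];
}
```

Joining the returned lines with newlines gives a static export that needs no SVG at all.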

Pitfall Guide

Building client-side conversion tools introduces unique challenges. The following pitfalls are common in production environments.

  1. Main Thread Blocking

    • Explanation: Heavy parsing or rendering tasks can freeze the UI, causing jank or unresponsiveness.
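    • Fix: Offload Markdown parsing and other DOM-free work to Web Workers. Use comlink to simplify worker communication. html2canvas reads the live DOM, so the rasterization step has to stay on the main thread; show a progress indicator while it runs. For smaller tasks, use requestIdleCallback to defer non-critical work.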
  2. Rasterized PDF Limitations

    • Explanation: PDFs generated via html2canvas are images. Text cannot be selected or searched, and file sizes may be larger for text-heavy documents.
    • Fix: Accept this trade-off for layout fidelity. Post-processing cannot recover text from an image, so if selectable text is required, either draw the text directly with a PDF library such as pdf-lib or switch to server-side generation for vector output. Document this limitation clearly to users.
  3. Memory Leaks with Large Documents

    • Explanation: Creating large DOM trees or holding references to blobs can exhaust browser memory, especially on mobile devices.
    • Fix: Implement chunking for large inputs. Ensure DOM elements are removed after export. Use URL.revokeObjectURL to free blob URLs. Monitor memory usage during stress testing.
  4. CSS Support in html2canvas

    • Explanation: html2canvas does not support all CSS properties. Features like backdrop-filter, complex gradients, or certain flexbox layouts may render incorrectly.
    • Fix: Test exports with a comprehensive style guide. Provide fallback styles for unsupported features. Use the ignoreElements option to exclude non-essential UI elements from the export.
  5. Internationalization and Font Embedding

    • Explanation: Client-side PDFs may fail to render CJK characters or special symbols if fonts are not loaded. html2canvas relies on system fonts, which vary by device.
    • Fix: Embed web fonts using @font-face and ensure they are loaded before export. Use document.fonts.ready to wait for font loading. For docx, ensure the library supports Unicode text runs.
  6. Offline Availability

    • Explanation: Users expect tools to work without connectivity. Client-side tools are naturally suited for this, but dependencies must be cached.
    • Fix: Implement a Service Worker to cache static assets and library bundles. Use workbox to manage caching strategies. Ensure the application functions fully when offline.
  7. Security Risks with Raw HTML

    • Explanation: Allowing raw HTML in Markdown can lead to XSS attacks if the content is rendered in a context where it interacts with the DOM.
    • Fix: Sanitize HTML using dompurify before processing. Use a sandboxed iframe for preview if necessary. Validate input on the client side, even though no server is involved.
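For the memory pitfall above, one way to make sure blob URLs are always released is to scope them to a callback. The helper below is a sketch under stated assumptions: `withObjectUrl` and `ObjectUrlApi` are hypothetical names, and the URL API is passed in as a parameter so the helper can be exercised outside a browser (in a browser, the global `URL` fits this shape for `Blob` values).

```typescript
// The two static methods of the browser's URL object that the helper needs.
interface ObjectUrlApi<B> {
  createObjectURL(blob: B): string;
  revokeObjectURL(url: string): void;
}

// Creates an object URL, hands it to `use`, and revokes it afterwards,
// even when `use` throws.
function withObjectUrl<B, R>(blob: B, api: ObjectUrlApi<B>, use: (url: string) => R): R {
  const url = api.createObjectURL(blob);
  try {
    return use(url);
  } finally {
    api.revokeObjectURL(url);
  }
}

// Browser usage (not executed here):
//   withObjectUrl(pdfBlob, URL, url => {
//     const link = document.createElement('a');
//     link.href = url;
//     link.download = 'document.pdf';
//     link.click();
//   });
```

This synchronous form only fits consumers that are done with the URL when the callback returns. If the URL is read asynchronously, for example by an image preview, revoke it only after loading has finished.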

Production Bundle

Action Checklist

  • Implement Lazy Loading: Load export libraries (html2pdf, docx, markmap) only when the user initiates an export to reduce initial bundle size.
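  • Add Error Handling: Wrap export functions in try-catch blocks and display user-friendly error messages for memory or parsing failures. React error boundaries do not catch errors thrown in event handlers or async code, so the try-catch is what covers export failures.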
  • Optimize Performance: Use useMemo for parsed ASTs and debounce editor input to prevent unnecessary re-processing.
  • Configure i18n: Set up locale routing and ensure font support for all target languages. Test exports with multi-byte characters.
  • Enable Offline Mode: Add a Service Worker to cache dependencies and allow full functionality without network access.
  • Add Progress Indicators: Show loading states during export to provide feedback, especially for large documents.
  • Test Memory Usage: Profile the application with large inputs to identify and fix memory leaks.
  • Document Trade-offs: Clearly communicate the rasterized PDF limitation and privacy benefits to users.
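The debounce item in the checklist can be sketched without a framework. This is a minimal trailing-edge debounce, not a replacement for a tested utility: `debounce`, `TimerApi`, and `realTimers` are hypothetical names, and the timer functions are injectable purely so the behavior can be checked without waiting on a real clock.

```typescript
// Minimal timer surface; defaults to the platform's setTimeout/clearTimeout.
interface TimerApi {
  set(callback: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: TimerApi = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Delays `fn` until `waitMs` has passed without another call; only the
// arguments of the most recent call are used.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  timers: TimerApi = realTimers
): (...args: A) => void {
  let handle: unknown = undefined;
  return (...args: A): void => {
    if (handle !== undefined) timers.clear(handle);
    handle = timers.set(() => {
      handle = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

Wrapping the editor's change handler this way means the Markdown is re-parsed once after the user pauses, instead of on every keystroke. The delay itself is a tuning choice.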

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Privacy-Sensitive Docs | Client-Side | Data never leaves the device; zero compliance risk. | Free (Static hosting) |
| Vector PDF Required | Server-Side | Client-side tools produce rasterized output; server can use vector libraries. | Infrastructure costs scale |
| High-Volume Processing | Client-Side | No server bottlenecks; scales infinitely with user devices. | Free (Static hosting) |
| Complex OCR Needs | Server-Side | Browser OCR is limited; server can use Tesseract or cloud APIs. | Infrastructure costs scale |
| Offline-First App | Client-Side | Native support for offline operation via Service Workers. | Free (Static hosting) |

Configuration Template

Use this template as a starting point for a Next.js configuration for client-side document generation. The export libraries are deliberately not listed under webpack externals: on the client build that would remove them from the bundle altogether and expect them to exist as globals at runtime. Keeping them out of the initial bundle is handled by dynamic import() at the call site instead.

// next.config.js
const nextConfig = {
  reactStrictMode: true,
  // No `externals` entry for html2pdf.js, docx, or markmap-lib:
  // webpack splits them into separate chunks when they are loaded
  // through dynamic import() in the export handlers.
  // Optimize images and assets
  images: {
    formats: ['image/avif', 'image/webp'],
  },
};

module.exports = nextConfig;

Quick Start Guide

  1. Initialize Project: Create a new Next.js app with TypeScript support.

    npx create-next-app@latest doc-gen-tool --typescript
    cd doc-gen-tool
    
  2. Install Dependencies: Add the core libraries for parsing and export.

    npm install unified remark-parse remark-gfm remark-rehype rehype-stringify html2pdf.js docx markmap-lib
    
  3. Create Pipeline Module: Implement the MarkdownPipeline and export functions as shown in the Core Solution section.

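  4. Build UI: Create a simple interface with a Markdown editor and export buttons. Load export libraries with a dynamic import() inside the export handler. next/dynamic is meant for React components, and html2pdf.js is a plain function rather than a component.

    const { default: html2pdf } = await import('html2pdf.js');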
    
  5. Test and Deploy: Run the application locally, test exports with various Markdown inputs, and deploy to a static hosting provider like Vercel or Netlify. No server configuration is required.