How I Built a Free Markdown-to-PDF Converter in the Browser
Zero-Server Document Generation: A Browser-First Architecture
Current Situation Analysis
The standard model for document conversion tools relies on a client-server architecture: the user uploads a file, the server processes it using headless browsers or specialized libraries, and the result is returned. While functional, this model introduces significant friction points that are increasingly unacceptable in modern development workflows.
The Privacy and Compliance Gap
Regulatory frameworks like GDPR and HIPAA restrict how personal and health data may be processed, transferred, and stored. Sending document content to a third-party server, even temporarily, creates a compliance liability. Organizations handling sensitive intellectual property or personal data often block these tools entirely, forcing developers to build internal solutions or accept security risks.
Economic and Scalability Friction
Server-side conversion is computationally expensive. Headless Chrome instances or PDF generation libraries consume significant CPU and memory. As traffic grows, infrastructure costs scale linearly. Developers must implement rate limiting, queue management, and storage cleanup to prevent resource exhaustion. For simple conversion tasks, this overhead is disproportionate.
The Overlooked Capability of Modern Browsers
Many engineers assume client-side conversion is impossible due to historical limitations. However, modern JavaScript engines (V8, SpiderMonkey, JavaScriptCore) and the evolution of the WebAssembly ecosystem have shifted this reality. Libraries like remark, html2pdf.js, and docx now allow full document pipelines to run entirely within the main thread or Web Workers, eliminating the need for remote processing.
WOW Moment: Key Findings
The shift to a browser-first architecture fundamentally changes the trade-off matrix. The following comparison highlights the operational differences between traditional server-side processing and a zero-server client implementation.
| Approach | Latency Profile | Data Privacy | Infrastructure Cost | Scalability Limit | Output Quality |
|---|---|---|---|---|---|
| Server-Side | Network RTT + Queue + Processing | High Risk (Data leaves device) | Linear scaling with usage | CPU/Memory bottlenecks | Vector PDF, Full CSS support |
| Client-Side | Local processing only (no network round trip) | Low Risk (Data stays on the device) | Static hosting only | Browser memory limits | Rasterized PDF, CSS subset |
Why This Matters
For applications prioritizing privacy, instant feedback, and cost efficiency, client-side generation is superior. The "rasterized PDF" trade-off is often acceptable for documentation, reports, and notes, where text selection is less critical than visual fidelity. This architecture enables "offline-first" capabilities and allows developers to offer premium features without incurring per-conversion costs.
Core Solution
The architecture follows a pipeline pattern: Input → AST Transformation → Format-Specific Serialization. Each stage is isolated, allowing for lazy loading of heavy dependencies and independent optimization.
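One way to picture the lazy-loading side of this pipeline is a small registry that loads a serializer only the first time its format is requested. The sketch below is illustrative rather than the article's implementation: `Exporter`, `ExporterRegistry`, and the format names are hypothetical, and the loaders stand in for dynamic `import()` calls.

```typescript
// Hypothetical types: a serializer turns intermediate HTML into some output string.
type Exporter = (html: string) => Promise<string>;
type Loader = () => Promise<Exporter>;

class ExporterRegistry {
  private loaders = new Map<string, Loader>();
  private cache = new Map<string, Promise<Exporter>>();

  // Register a loader; nothing is loaded until the format is first used.
  register(format: string, loader: Loader): void {
    this.loaders.set(format, loader);
  }

  // Load the serializer on first use and reuse the same promise afterwards.
  private load(format: string): Promise<Exporter> {
    const cached = this.cache.get(format);
    if (cached) return cached;
    const loader = this.loaders.get(format);
    if (!loader) throw new Error(`No exporter registered for "${format}"`);
    const pending = loader();
    // Forget failed loads so a later attempt can retry.
    pending.catch(() => this.cache.delete(format));
    this.cache.set(format, pending);
    return pending;
  }

  async run(format: string, html: string): Promise<string> {
    const exporter = await this.load(format);
    return exporter(html);
  }
}
```

In a real application a loader would typically wrap a dynamic import of the module that holds the heavy dependency, so the library is fetched only when the user first exports to that format.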
1. Markdown Parsing Pipeline
The foundation is the unified ecosystem, which provides a robust AST-based transformation pipeline. We configure the processor to support GitHub-Flavored Markdown (GFM), ensuring compatibility with tables, task lists, and strikethrough syntax.
```typescript
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import remarkRehype from 'remark-rehype';
import rehypeStringify from 'rehype-stringify';

export class MarkdownPipeline {
  // Build the processor once so every conversion reuses the same plugin chain.
  private readonly processor = unified()
    .use(remarkParse)                                   // Markdown -> mdast
    .use(remarkGfm)                                     // tables, task lists, strikethrough
    .use(remarkRehype, { allowDangerousHtml: true })    // mdast -> hast, keeping raw HTML
    .use(rehypeStringify, { allowDangerousHtml: true }); // hast -> HTML string

  async toHtml(source: string): Promise<string> {
    const file = await this.processor.process(source);
    return String(file);
  }
}
```
Rationale: The unified package exports a unified() factory (there is no createProcessor export). Building the processor once lets every call to toHtml reuse the same instance. The allowDangerousHtml flag is necessary for rendering raw HTML embedded in Markdown, though sanitization should be applied if user input is untrusted.
2. Rasterized PDF Generation
Client-side PDF generation relies on html2pdf.js, which combines html2canvas and jsPDF. This approach renders the DOM to a canvas and embeds it as an image in the PDF. While this produces a rasterized document rather than a vector one, it supports complex CSS layouts that vector libraries often struggle with.
```typescript
import html2pdf from 'html2pdf.js';

export interface PdfConfig {
  filename: string;
  margin: number; // page margin, in the jsPDF unit configured below (mm)
  scale: number;  // html2canvas render scale; higher is sharper but uses more memory
}

export async function generatePdfSnapshot(
  container: HTMLElement,
  config: PdfConfig
): Promise<Blob> {
  const options = {
    margin: config.margin,
    filename: config.filename,
    image: { type: 'jpeg', quality: 0.98 },
    html2canvas: { scale: config.scale, useCORS: true },
    jsPDF: { unit: 'mm', format: 'a4', orientation: 'portrait' }
  };

  // outputPdf('blob') resolves with the PDF as a Blob instead of triggering a download.
  return html2pdf().set(options).from(container).outputPdf('blob');
}
```
Rationale: Returning a Blob instead of triggering an immediate download gives the application control over the output, enabling previews or further processing. The scale parameter adjusts resolution; a scale of 2 provides retina-quality output.
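The cost of a higher scale can be estimated with simple arithmetic: a canvas stores four bytes per pixel, and both dimensions grow with the scale factor. The helper below is a rough sketch under stated assumptions (A4 pages at 96 CSS pixels per inch, uncompressed RGBA backing store); the function names are hypothetical and real browser memory use will differ.

```typescript
// A4 at 96 CSS pixels per inch: 210 mm x 297 mm is roughly 794 x 1123 px.
const A4_CSS_PX = { width: 794, height: 1123 };

// Bytes needed for an RGBA canvas backing store at a given render scale.
function estimateCanvasBytes(cssWidth: number, cssHeight: number, scale: number): number {
  const pixelWidth = Math.round(cssWidth * scale);
  const pixelHeight = Math.round(cssHeight * scale);
  return pixelWidth * pixelHeight * 4; // 4 bytes per pixel (R, G, B, A)
}

// Approximate canvas memory, in MiB, for a document of `pages` A4 pages.
function estimateDocumentMiB(pages: number, scale: number): number {
  const perPage = estimateCanvasBytes(A4_CSS_PX.width, A4_CSS_PX.height, scale);
  return (perPage * pages) / (1024 * 1024);
}
```

Under these assumptions one A4 page needs about 3.4 MiB at scale 1 and about 13.6 MiB at scale 2, so doubling the scale quadruples the canvas memory. Browsers also cap canvas dimensions, which is worth keeping in mind for long documents.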
3. Word Document Construction
For DOCX export, the docx library offers a programmatic API to build documents. Unlike the rasterized HTML-to-PDF path, this generates a native Word file whose text remains selectable and editable.
```typescript
import { Document, Packer, Paragraph, TextRun, HeadingLevel } from 'docx';

export async function buildDocxPackage(
  sections: Array<{ title: string; content: string }>
): Promise<Blob> {
  // Each section becomes a heading paragraph followed by a body paragraph.
  const children = sections.flatMap(section => [
    new Paragraph({
      children: [new TextRun({ text: section.title, bold: true })],
      heading: HeadingLevel.HEADING_1,
      spacing: { after: 200 }
    }),
    new Paragraph({
      children: [new TextRun(section.content)],
      spacing: { after: 400 }
    })
  ]);

  const doc = new Document({
    sections: [{ children }]
  });

  // Packer serializes the document into the zipped OOXML package.
  return Packer.toBlob(doc);
}
```
Rationale: This builder pattern allows for structured document generation. The docx library handles the complex XML packaging internally, ensuring compatibility with Microsoft Word and LibreOffice.
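The `sections` argument of buildDocxPackage has to come from somewhere. As a minimal sketch, assuming the input uses top-level `# ` headings as section boundaries, a plain string splitter could produce it. `DocSection` and `splitIntoSections` are hypothetical names, and this deliberately naive version does not account for `# ` lines inside fenced code blocks; walking the Markdown AST would be more robust.

```typescript
interface DocSection { title: string; content: string }

type OpenSection = { title: string; lines: string[] };

// Close an open section: join its lines and trim surrounding blank space.
function finish(section: OpenSection): DocSection {
  return { title: section.title, content: section.lines.join('\n').trim() };
}

// Naive splitter: every line starting with "# " opens a new section.
function splitIntoSections(markdown: string): DocSection[] {
  const sections: DocSection[] = [];
  let current: OpenSection | null = null;

  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^#\s+(.*)$/.exec(line);
    if (heading) {
      if (current) sections.push(finish(current));
      current = { title: heading[1].trim(), lines: [] };
    } else if (current) {
      // Deeper headings ("## ...") and body text stay inside the current section.
      current.lines.push(line);
    }
    // Text before the first heading is dropped in this sketch.
  }
  if (current) sections.push(finish(current));
  return sections;
}
```

The result can be passed straight to buildDocxPackage, since it matches the `{ title, content }` shape that function expects.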
4. Mind Map Hierarchy Extraction
Mind map generation requires parsing Markdown into a hierarchical tree structure. markmap-lib provides a transformer that extracts headings and lists into a node graph, which can then be rendered by markmap-view.
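```typescript
import { Transformer } from 'markmap-lib';

export interface MindMapResult {
  hierarchy: ReturnType<Transformer['transform']>['root'];
  assets: ReturnType<Transformer['getUsedAssets']>;
}

export function extractMindMapHierarchy(source: string): MindMapResult {
  const transformer = new Transformer();
  // transform() returns the node tree plus the set of features the source uses.
  const { root, features } = transformer.transform(source);
  return {
    hierarchy: root,
    // getUsedAssets() maps those features to the styles and scripts the renderer needs.
    assets: transformer.getUsedAssets(features)
  };
}
```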
Rationale: Separating the transformation from rendering allows the hierarchy to be used in different contexts, such as interactive SVGs or static exports.
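To illustrate a non-visual use of the hierarchy, the sketch below flattens a node tree into an indented text outline. It assumes each node exposes `content` and `children`, which matches recent markmap releases as far as I know but should be checked against the installed version; `OutlineNode` and `toOutline` are hypothetical names.

```typescript
// Assumed node shape: `content` holds the label, `children` the nested nodes.
interface OutlineNode { content: string; children?: OutlineNode[] }

// Depth-first walk that emits one indented bullet line per node.
function toOutline(node: OutlineNode, depth = 0): string[] {
  const line = `${'  '.repeat(depth)}- ${node.content}`;
  const nested = (node.children ?? []).flatMap(child => toOutline(child, depth + 1));
  return [line, ...nested];
}
```

Joining the returned lines with newlines gives a static outline that can be exported as plain text. Note that node content produced from Markdown may contain HTML markup, which this sketch passes through unchanged.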
Pitfall Guide
Building client-side conversion tools introduces unique challenges. The following pitfalls are common in production environments.
Main Thread Blocking
- Explanation: Heavy parsing or rendering tasks can freeze the UI, causing jank or unresponsiveness.
- Fix: Offload parsing to Web Workers, using comlink to simplify worker communication. Rendering that needs the DOM, such as html2canvas, has to stay on the main thread. For smaller tasks, use requestIdleCallback to defer non-critical work.
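Where a worker is impractical, work can still be split into batches that yield back to the event loop. The sketch below uses `setTimeout` as a portable stand-in for requestIdleCallback; `processInChunks` and its default chunk size are assumptions for illustration, not part of any library mentioned here.

```typescript
// Process items in small batches, yielding to the event loop between batches.
async function processInChunks<T, R>(
  items: readonly T[],
  handle: (item: T) => R,
  chunkSize = 50
): Promise<R[]> {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  const results: R[] = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      results.push(handle(item));
    }
    // Let the browser paint and handle input before the next batch.
    await new Promise<void>(resolve => setTimeout(resolve, 0));
  }
  return results;
}
```

This keeps individual tasks short but does not reduce total work, so it suits moderately sized inputs; truly heavy parsing still belongs in a worker.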
Rasterized PDF Limitations
- Explanation: PDFs generated via html2canvas are images. Text cannot be selected or searched, and file sizes may be larger for text-heavy documents.
- Fix: Accept this trade-off for layout fidelity. If vector text is required, consider pdf-lib for post-processing or switch to server-side generation for vector output. Document this limitation clearly to users.
Memory Leaks with Large Documents
- Explanation: Creating large DOM trees or holding references to blobs can exhaust browser memory, especially on mobile devices.
- Fix: Implement chunking for large inputs. Ensure DOM elements are removed after export. Use URL.revokeObjectURL to free blob URLs. Monitor memory usage during stress testing.
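Forgetting to revoke blob URLs is easy when the export path has several exits. One hedged way to make the cleanup automatic is a helper that creates the URL, hands it to a callback, and always revokes it afterwards. `withObjectUrl` and `ObjectUrlApi` are hypothetical names; the URL API is injectable only so the helper can be exercised outside a browser.

```typescript
// Minimal slice of the URL API that the helper depends on.
interface ObjectUrlApi {
  createObjectURL(blob: Blob): string;
  revokeObjectURL(url: string): void;
}

async function withObjectUrl<T>(
  blob: Blob,
  use: (url: string) => T | Promise<T>,
  api: ObjectUrlApi = URL
): Promise<T> {
  const url = api.createObjectURL(blob);
  try {
    return await use(url);
  } finally {
    // Always release the URL, even if the callback throws.
    api.revokeObjectURL(url);
  }
}
```

The callback should stay pending for as long as the URL is needed, for example until a preview is closed, because the URL stops working as soon as it is revoked.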
CSS Support in html2canvas
- Explanation: html2canvas does not support all CSS properties. Features like backdrop-filter, complex gradients, or certain flexbox layouts may render incorrectly.
- Fix: Test exports with a comprehensive style guide. Provide fallback styles for unsupported features. Use the ignoreElements option to exclude non-essential UI elements from the export.
Internationalization and Font Embedding
- Explanation: Client-side PDFs may fail to render CJK characters or special symbols if fonts are not loaded. html2canvas relies on system fonts, which vary by device.
- Fix: Embed web fonts using @font-face and ensure they are loaded before export. Use document.fonts.ready to wait for font loading. For docx, ensure the library supports Unicode text runs.
Offline Availability
- Explanation: Users expect tools to work without connectivity. Client-side tools are naturally suited for this, but dependencies must be cached.
- Fix: Implement a Service Worker to cache static assets and library bundles. Use workbox to manage caching strategies. Ensure the application functions fully when offline.
Security Risks with Raw HTML
- Explanation: Allowing raw HTML in Markdown can lead to XSS attacks if the content is rendered in a context where it interacts with the DOM.
- Fix: Sanitize the generated HTML using dompurify before it is inserted into the DOM. Use a sandboxed iframe for preview if necessary. Validate input on the client side, even though no server is involved.
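The ordering matters here: sanitizing the Markdown source is not enough, because the dangerous markup only exists after conversion. A minimal sketch of that ordering, with both steps injected as functions, might look like this. `renderSafePreview` and `Transform` are hypothetical names; in the browser the sanitizer could be a thin wrapper around DOMPurify's `sanitize` function.

```typescript
type Transform = (input: string) => string | Promise<string>;

// Convert first, sanitize last: the sanitizer must see the HTML that will reach the DOM.
async function renderSafePreview(
  markdown: string,
  toHtml: Transform,
  sanitize: Transform
): Promise<string> {
  const rawHtml = await toHtml(markdown);
  return sanitize(rawHtml);
}
```

Only the return value of this function should ever be assigned to the preview container; the intermediate raw HTML is never exposed to callers.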
Production Bundle
Action Checklist
- Implement Lazy Loading: Load export libraries (html2pdf, docx, markmap) only when the user initiates an export to reduce initial bundle size.
- Add Error Boundaries: Wrap export functions in try-catch blocks and display user-friendly error messages for memory or parsing failures.
- Optimize Performance: Use useMemo for parsed ASTs and debounce editor input to prevent unnecessary re-processing.
- Configure i18n: Set up locale routing and ensure font support for all target languages. Test exports with multi-byte characters.
- Enable Offline Mode: Add a Service Worker to cache dependencies and allow full functionality without network access.
- Add Progress Indicators: Show loading states during export to provide feedback, especially for large documents.
- Test Memory Usage: Profile the application with large inputs to identify and fix memory leaks.
- Document Trade-offs: Clearly communicate the rasterized PDF limitation and privacy benefits to users.
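For the debouncing item in the checklist, a small framework-independent helper is usually enough. The version below is a sketch, not a drop-in from any library: `debounce` is a hypothetical local function that delays the call until input has been quiet for `waitMs` milliseconds and passes along only the most recent arguments.

```typescript
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    // Restart the countdown on every call; only the last call survives.
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

Wrapping the editor's change handler this way means the Markdown pipeline runs once per pause in typing instead of once per keystroke.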
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Privacy-Sensitive Docs | Client-Side | Document content never leaves the device, which removes the third-party processing risk. | Free (Static hosting) |
| Vector PDF Required | Server-Side | Client-side tools produce rasterized output; server can use vector libraries. | Infrastructure costs scale |
| High-Volume Processing | Client-Side | No server bottlenecks; each conversion runs on the user's own device, so capacity grows with the user base. | Free (Static hosting) |
| Complex OCR Needs | Server-Side | Browser OCR is limited; server can use Tesseract or cloud APIs. | Infrastructure costs scale |
| Offline-First App | Client-Side | Native support for offline operation via Service Workers. | Free (Static hosting) |
Configuration Template
Use this template as a starting point for a Next.js configuration for client-side document generation. The export libraries need no special webpack handling: loading them through dynamic import() already places them in separate chunks. Listing them under externals would instead drop them from the client bundle entirely and break the import at runtime.

```javascript
// next.config.js
const nextConfig = {
  reactStrictMode: true,
  // Optimize images and assets
  images: {
    formats: ['image/avif', 'image/webp'],
  },
};

module.exports = nextConfig;
```
Quick Start Guide
Initialize Project: Create a new Next.js app with TypeScript support.

```bash
npx create-next-app@latest doc-gen-tool --typescript
cd doc-gen-tool
```

Install Dependencies: Add the core libraries for parsing and export.

```bash
npm install unified remark-parse remark-gfm remark-rehype rehype-stringify html2pdf.js docx markmap-lib
```

Create Pipeline Module: Implement the MarkdownPipeline and export functions as shown in the Core Solution section.

Build UI: Create a simple interface with a Markdown editor and export buttons. Load export libraries with a dynamic import() inside the export handler; next/dynamic is meant for React components, not plain libraries.

```typescript
// Inside an async export handler, so the library loads only on demand
const { default: html2pdf } = await import('html2pdf.js');
```

Test and Deploy: Run the application locally, test exports with various Markdown inputs, and deploy to a static hosting provider like Vercel or Netlify. No server configuration is required.
