AI/ML · 2026-05-12 · 84 min read

How I Wired Claude AI into My SaaS and What Actually Worked vs. What Was Just Hype

By Dilip Joshi

Architecting Multilingual Document Generation on Serverless: LLM Translation, Script Rendering, and Rule-Based Routing

Current Situation Analysis

The modern SaaS landscape is saturated with AI integration mandates. Engineering teams routinely reach for large language models to handle document personalization, template routing, and multilingual output. The assumption is straightforward: if an LLM can generate coherent text, it can reliably transform, format, and deliver structured documents across languages and scripts. In practice, this assumption collapses under the weight of typographic complexity, cultural nuance, and serverless runtime constraints.

The core pain point is not AI capability; it is architectural mismatch. Developers treat LLMs as universal formatting engines, expecting them to preserve proper nouns, academic credentials, and culturally specific identifiers while simultaneously handling layout constraints. They also assume serverless PDF generation behaves identically across Latin and non-Latin scripts. Neither assumption holds in production.

India’s linguistic landscape exemplifies this gap. With 22 officially recognized languages and multiple distinct scripts (Devanagari, Gurmukhi, Tamil, Urdu/Nastaliq, etc.), a single document pipeline must handle right-to-left rendering, complex conjunct characters, vertical matras, and OpenType shaping rules. Standard PDF libraries that rely on font subsetting routinely drop conjuncts, rendering text as placeholder rectangles. Server-side rendering engines that lack browser-grade text shaping fail to adjust character forms based on neighboring glyphs. Line-height calculations break when diacritical marks extend beyond baseline metrics.

Simultaneously, teams overcomplicate recommendation logic. When faced with template selection, the instinct is to deploy vector similarity search or fine-tuned classifiers. In reality, deterministic scoring based on explicit user attributes (religion, language, regional origin, stylistic preference) delivers higher accuracy, lower latency, and zero inference cost.

The data from production deployments consistently shows:

  • Prompt engineering for culturally aware translation requires 10–15 iterations before stabilizing, primarily due to reactive fixes rather than systematic evaluation.
  • Headless Chromium on Lambda introduces cold start penalties that spike from ~200ms to 3–5 seconds without provisioned concurrency.
  • DynamoDB access patterns designed after implementation force painful schema migrations and query rewrites.
  • Rule-based template routing outperforms LLM-based classification in both speed and consistency when input features are categorical and well-defined.

This is not a critique of AI. It is a blueprint for separating semantic transformation (where LLMs excel) from structural rendering and deterministic routing (where traditional code dominates).

WOW Moment: Key Findings

The most valuable insight from production deployments is that AI should be scoped strictly to text transformation, while infrastructure and deterministic logic handle structure, layout, and routing. The following comparison quantifies this separation across three core subsystems.

| Approach | Latency (p95) | Cost per 1k Requests | Script/Cultural Fidelity | Maintenance Overhead |
| --- | --- | --- | --- | --- |
| LLM Translation (Bedrock) | 800–1200ms | $0.04–$0.08 | High (with structured prompts) | Medium (prompt versioning, eval harness) |
| Rule-Based Template Routing | <15ms | $0.0001 | Perfect (deterministic) | Low (configuration files) |
| Headless Chromium PDF | 2500–4000ms (cold) / 150ms (warm) | $0.002–$0.005 | High (browser-grade shaping) | Medium (font embedding, Lambda layer size) |

Why this matters: The table reveals a clear architectural boundary. LLMs handle semantic translation but introduce latency and cost. Rule-based scoring eliminates inference overhead entirely for categorical routing. Headless Chromium solves complex script rendering but demands cold-start mitigation. Recognizing these boundaries prevents over-engineering, reduces AWS spend, and ensures typographic accuracy across Indic and RTL scripts.

Core Solution

Building a production-ready multilingual document pipeline requires separating concerns into four distinct stages: profile normalization, deterministic routing, semantic translation, and server-side rendering. Each stage uses the most appropriate tool for the job.

Step 1: Profile Normalization & Deterministic Routing

Template selection should never rely on probabilistic models when input data is categorical. Instead, map user attributes to weighted template tags and compute a deterministic score.

interface UserProfile {
  religion: string;
  language: string;
  region: string;
  stylePreference: 'traditional' | 'modern' | 'minimal';
}

interface Template {
  id: string;
  tags: string[];
  weight: Record<string, number>;
}

const TEMPLATES: Template[] = [
  { id: 'tmpl_royal_hindu', tags: ['traditional', 'hindu', 'north-indian'], weight: { traditional: 3, hindu: 2, 'north-indian': 2 } },
  { id: 'tmpl_modern_minimal', tags: ['modern', 'minimal', 'universal'], weight: { modern: 3, minimal: 3 } },
  { id: 'tmpl_muslim_elegant', tags: ['traditional', 'muslim', 'south-indian'], weight: { traditional: 2, muslim: 3, 'south-indian': 2 } },
];

export function scoreTemplates(profile: UserProfile): string[] {
  // Normalize profile attributes into the same tag vocabulary the templates use.
  const attrs = new Set<string>([
    profile.stylePreference,
    profile.religion.toLowerCase(),
    profile.region.toLowerCase(),
  ]);

  const scored = TEMPLATES.map(tmpl => ({
    id: tmpl.id,
    // Sum the template's declared weight for every tag the profile matches,
    // falling back to 1 for a matching tag with no explicit weight.
    score: tmpl.tags.reduce(
      (acc, tag) => acc + (attrs.has(tag) ? tmpl.weight[tag] ?? 1 : 0),
      0
    ),
  }));

  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map(r => r.id);
}

Architecture Rationale: This approach eliminates API calls, guarantees sub-20ms response times, and remains fully auditable. Vector search or classification models introduce unnecessary latency and cost when categorical matching suffices.

Step 2: Context-Aware Translation Pipeline

LLMs excel at semantic transformation but require strict guardrails to preserve proper nouns, academic degrees, and cultural identifiers. Use structured output enforcement and explicit preservation rules.

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({ region: 'us-east-1' });

const SYSTEM_PROMPT = `
You are a technical translator. Convert the following JSON profile into the target language.
RULES:
1. Preserve proper names, academic degrees (B.Tech, M.Sc, etc.), and institutional names exactly as provided.
2. Do not translate cultural identifiers like Gotra, Caste, or Clan. Keep them in original script or transliterate consistently.
3. Maintain JSON structure. Output only valid JSON.
4. Adapt honorifics and familial terms to match target language conventions.
`;

interface TranslationRequest {
  sourceJson: Record<string, string>;
  targetLanguage: string;
}

export async function translateProfile(req: TranslationRequest): Promise<Record<string, string>> {
  const payload = {
    anthropic_version: 'bedrock-2023-05-31',
    max_tokens: 2048,
    system: SYSTEM_PROMPT,
    messages: [
      {
        role: 'user',
        content: `Target: ${req.targetLanguage}\nSource: ${JSON.stringify(req.sourceJson, null, 2)}`
      }
    ]
  };

  const command = new InvokeModelCommand({
    modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
    contentType: 'application/json',
    accept: 'application/json',
    body: JSON.stringify(payload)
  });

  const response = await client.send(command);
  const decoded = new TextDecoder().decode(response.body);
  const parsed = JSON.parse(decoded);
  const content = parsed.content[0].text;
  
  // Extract JSON from a markdown code fence if present; otherwise fall back to
  // the outermost brace-delimited span so any leading prose is discarded.
  const fenced = content.match(/```json\s*([\s\S]*?)```/);
  const raw = fenced ? fenced[1] : (content.match(/\{[\s\S]*\}/)?.[0] ?? content);
  return JSON.parse(raw);
}

Architecture Rationale: AWS Bedrock provides consistent latency and enterprise-grade logging. Enforcing JSON output prevents structural drift. The system prompt explicitly isolates preservation rules, reducing prompt iteration cycles from 15+ to 3–4 when paired with an evaluation harness.

Step 3: Server-Side PDF Generation with Chromium

Complex scripts require browser-grade text shaping. Serverless Lambda functions can run headless Chromium via optimized layers, but cold starts must be managed.

import chromium from '@sparticuz/chromium';
import puppeteer from 'puppeteer-core';
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });

export async function generatePDF(htmlContent: string, fileName: string): Promise<string> {
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
  });

  try {
    const page = await browser.newPage();
    await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
    
    const pdfBuffer = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '15mm', bottom: '15mm', left: '10mm', right: '10mm' },
    });

    const bucketName = process.env.PDF_BUCKET!;
    const key = `documents/${fileName}.pdf`;

    await s3.send(new PutObjectCommand({
      Bucket: bucketName,
      Key: key,
      Body: pdfBuffer,
      ContentType: 'application/pdf',
    }));

    return await getSignedUrl(s3, new GetObjectCommand({ Bucket: bucketName, Key: key }), { expiresIn: 3600 });
  } finally {
    await browser.close();
  }
}

Architecture Rationale: @sparticuz/chromium provides a Lambda-optimized binary that respects OpenType shaping, handles Devanagari conjuncts, and correctly renders Urdu Nastaliq. S3 storage with presigned URLs decouples generation from delivery, enabling CloudFront caching and CDN optimization.
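The `htmlContent` passed to `generatePDF` must carry its own fonts, since the Lambda environment ships none of the Indic families. A minimal builder sketch, assuming the font file is loaded elsewhere and passed in as base64 (the `DocFont` name, data-URI embedding, and line-height value are illustrative assumptions, not part of the pipeline above):

```typescript
// Sketch of an HTML builder for generatePDF. Font name, base64 embedding, and
// the CSS values are illustrative assumptions.
export function buildHtml(
  bodyHtml: string,
  lang: string,
  fontBase64: string,
  dir: 'ltr' | 'rtl' = 'ltr'
): string {
  return `<!DOCTYPE html>
<html lang="${lang}" dir="${dir}">
<head>
<meta charset="utf-8">
<style>
  @font-face {
    font-family: 'DocFont';
    /* Embed the full font so Chromium never falls back to missing glyphs,
       the usual cause of placeholder rectangles. */
    src: url(data:font/ttf;base64,${fontBase64}) format('truetype');
  }
  body {
    font-family: 'DocFont', sans-serif;
    line-height: 1.9; /* headroom for stacked matras and diacritics */
  }
</style>
</head>
<body>${bodyHtml}</body>
</html>`;
}
```

Setting `lang` and `dir` on the root element lets Chromium select the right shaping and bidi behavior; an Urdu document would be built with `buildHtml(body, 'ur', font, 'rtl')`.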

Pitfall Guide

1. Blind LLM Translation of Proper Nouns & Degrees

Explanation: Language models default to literal translation or semantic approximation. "B.Tech Computer Science" may become a descriptive phrase in Hindi, and names like "Priya Sharma" may be transliterated inconsistently or translated entirely. Fix: Implement explicit preservation rules in the system prompt. Use JSON schema validation to enforce structure. Run an evaluation harness that flags altered proper nouns before deployment.
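One way to make the "flag altered proper nouns" step concrete is a post-translation structural check. A minimal sketch, where the field names and the `PRESERVE_KEYS` list are assumptions for illustration:

```typescript
// Fields that must come back byte-identical from translation. The exact list
// is an illustrative assumption; a real pipeline would derive it per schema.
const PRESERVE_KEYS = ['name', 'degree', 'institution'];

export function validateTranslation(
  source: Record<string, string>,
  translated: Record<string, string>
): string[] {
  const violations: string[] = [];

  // 1. Structure: every source key must survive translation.
  for (const key of Object.keys(source)) {
    if (!(key in translated)) violations.push(`missing key: ${key}`);
  }

  // 2. Preservation: protected fields must be unchanged.
  for (const key of PRESERVE_KEYS) {
    if (key in source && translated[key] !== source[key]) {
      violations.push(`altered preserved field: ${key}`);
    }
  }
  return violations;
}
```

Running this check before rendering turns a silent transliteration drift into a hard, loggable failure.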

2. Ignoring OpenType Shaping for Indic Scripts

Explanation: Devanagari and other Indic scripts rely on contextual shaping. A standalone character changes form when combined with vowel signs or consonant clusters. Standard PDF libraries that subset fonts or skip shaping engines render broken text. Fix: Use browser-based rendering (Chromium/Firefox) for server-side PDF generation. Embed full font families rather than relying on subsetting. Test with conjunct-heavy strings before production rollout.
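A conjunct-heavy fixture for that pre-rollout test might look like the following sketch; the sample words are ordinary Hindi/Urdu vocabulary, and the helper simply confirms each Devanagari sample contains a virama (U+094D), i.e. actually exercises conjunct shaping:

```typescript
// Illustrative smoke-test fixture: strings that break shaping-unaware renderers.
export const DEVANAGARI_CONJUNCTS = [
  'श्री',       // śr conjunct with long ī matra
  'विद्यार्थी',  // multiple conjuncts and matras in one word
  'क्षत्रिय',    // kṣa and tra conjuncts
  'ज्ञान',       // jña conjunct
];

// RTL samples for Nastaliq rendering checks.
export const URDU_SAMPLES = ['تعلیم', 'خاندان'];

// Sanity check usable in CI: every Devanagari sample must contain a virama,
// otherwise the fixture is not testing conjunct formation at all.
export function allExerciseConjuncts(samples: string[]): boolean {
  return samples.every(s => s.includes('\u094D'));
}
```

Render these through the full pipeline and inspect the PDF manually once; afterwards, a pixel-diff against the approved output catches shaping regressions automatically.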

3. DynamoDB Schema Design After Implementation

Explanation: DynamoDB requires access patterns to be defined before table creation. Designing queries after writing application code forces schema migrations, GSI additions, and query rewrites that break existing flows. Fix: Map all query patterns upfront. Use single-table design with composite keys (PK/SK). Validate access patterns with a query matrix before provisioning tables.
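A sketch of what "map all query patterns upfront" can look like in code, with entity names and key shapes as illustrative assumptions derived from a query matrix:

```typescript
// Single-table key builders, written BEFORE any application code. Assumed
// query matrix (illustrative):
//   "get profile by user"       -> PK = USER#<id>, SK = PROFILE
//   "list documents for a user" -> PK = USER#<id>, SK begins_with DOC#
//   "get document by id" (GSI1) -> GSI1PK = DOC#<docId>
export const keys = {
  profile: (userId: string) => ({ PK: `USER#${userId}`, SK: 'PROFILE' }),

  document: (userId: string, docId: string, createdAt: string) => ({
    PK: `USER#${userId}`,
    SK: `DOC#${createdAt}#${docId}`, // ISO timestamp prefix keeps items sortable
    GSI1PK: `DOC#${docId}`,
    GSI1SK: 'META',
  }),

  documentsPrefix: (userId: string) => ({
    PK: `USER#${userId}`,
    SKPrefix: 'DOC#', // pair with begins_with() in a KeyConditionExpression
  }),
};
```

Because every query the application will ever run resolves to one of these builders, a new access pattern that doesn't fit is caught at design time rather than as a production migration.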

4. Reactive Prompt Engineering Without Evaluation Harness

Explanation: Adding rules to prompts reactively when edge cases break leads to 10+ iterations, prompt bloat, and inconsistent outputs. Fix: Build a deterministic test suite with 50+ edge cases (names, degrees, cultural terms, RTL strings). Run prompts against this suite after every change. Track pass/fail rates and semantic drift metrics.
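A minimal harness along these lines can be sketched as follows, with the case shape as an assumption and `translate` standing in for whichever prompt version is under test:

```typescript
// Deterministic eval harness sketch. EvalCase and the pass criterion are
// illustrative assumptions; `translate` is the prompt-under-test.
interface EvalCase {
  input: Record<string, string>;
  mustPreserve: string[]; // substrings that must survive verbatim
}

export async function runHarness(
  cases: EvalCase[],
  translate: (input: Record<string, string>) => Promise<Record<string, string>>
): Promise<{ passed: number; failed: number }> {
  let passed = 0;
  for (const c of cases) {
    const out = await translate(c.input);
    const text = JSON.stringify(out);
    // A case passes only if every protected substring survives verbatim.
    if (c.mustPreserve.every(term => text.includes(term))) passed++;
  }
  return { passed, failed: cases.length - passed };
}
```

Run after every prompt change and track the pass rate over time; a drop pinpoints the exact edit that introduced drift, which reactive tweaking never does.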

5. Underestimating Cold Start Impact on Headless Browsers

Explanation: Loading a full Chromium binary in Lambda increases initialization time to 3–5 seconds. Users experience timeouts or degraded UX on first request. Fix: Enable provisioned concurrency for PDF generation functions. Use Lambda SnapStart (if supported) or keep-alive pings. Cache frequently requested templates to reduce cold start frequency.
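A keep-alive ping only helps if the handler short-circuits before doing real work. A sketch, where the `warmer` marker field is an assumption of this example rather than any Lambda convention:

```typescript
// Keep-warm sketch: a scheduled EventBridge rule invokes the function with a
// marker payload so the Chromium-laden execution environment stays alive.
export function isWarmerEvent(event: unknown): boolean {
  return typeof event === 'object' && event !== null && (event as any).warmer === true;
}

export async function handler(
  event: unknown
): Promise<{ statusCode: number; body: string }> {
  if (isWarmerEvent(event)) {
    // Return before touching Puppeteer: the point is only to keep the
    // container (and its loaded Chromium layer) resident.
    return { statusCode: 200, body: 'warm' };
  }
  // ... real PDF generation path would go here ...
  return { statusCode: 200, body: 'generated' };
}
```

Provisioned concurrency remains the stronger guarantee; the ping is the cheap fallback for low-traffic stages where paying for reserved environments isn't justified.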

6. Over-Engineering Recommendation Systems

Explanation: Deploying vector databases or fine-tuned classifiers for template routing adds latency, cost, and maintenance overhead when categorical matching suffices. Fix: Use weighted tag scoring for deterministic routing. Reserve LLMs for semantic translation and content generation only. Revisit ML routing only when user preferences become highly unstructured.

Production Bundle

Action Checklist

  • Map all DynamoDB access patterns before provisioning tables; validate with a query matrix
  • Implement rule-based template scoring using weighted tags; avoid LLM routing for categorical data
  • Configure Bedrock translation with explicit preservation rules and JSON schema enforcement
  • Build an evaluation harness with 50+ edge cases for prompt stability tracking
  • Deploy headless Chromium via Lambda-optimized layers; test Devanagari/Urdu shaping early
  • Enable provisioned concurrency for PDF generation functions to mitigate cold starts
  • Store generated PDFs in S3 with presigned URLs; route through CloudFront for CDN caching
  • Instrument PostHog session recordings to identify UX friction points in document flows

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Low traffic (<10k docs/month) | Serverless Lambda + on-demand concurrency | Pay-per-use model minimizes idle costs | Low ($0.50–$2.00/mo infra) |
| High concurrency (>50k docs/month) | Provisioned concurrency + CloudFront caching | Eliminates cold starts; reduces Lambda invocations | Medium ($15–$40/mo concurrency) |
| Strict compliance/audit requirements | Rule-based routing + structured JSON output | Deterministic behavior simplifies auditing | Low (no inference costs) |
| Multilingual RTL/Indic scripts | Headless Chromium PDF generation | Browser-grade shaping handles complex typography | Medium (Lambda memory + layer size) |
| Rapid prototyping/MVP | Traditional PDF library + Latin-only fallback | Faster iteration; defer script complexity | Low (initially) |

Configuration Template

# serverless.yml (simplified)
service: vedadocs-pdf-pipeline

provider:
  name: aws
  runtime: nodejs20.x
  region: us-east-1
  memorySize: 1024
  timeout: 30
  environment:
    PDF_BUCKET: ${env:PDF_BUCKET}
    BEDROCK_REGION: us-east-1

functions:
  translateProfile:
    handler: src/translate.handler
    memorySize: 512
    timeout: 15
    environment:
      BEDROCK_MODEL_ID: anthropic.claude-3-sonnet-20240229-v1:0

  generatePDF:
    handler: src/pdf.handler
    memorySize: 2048
    timeout: 30
    provisionedConcurrency: 5
    layers:
      - arn:aws:lambda:${self:provider.region}:764866452798:layer:chrome-aws-lambda:38
    events:
      - http:
          path: /generate
          method: post

resources:
  Resources:
    PdfBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${env:PDF_BUCKET}
        LifecycleConfiguration:
          Rules:
            - Id: ExpireOldDocs
              Status: Enabled
              ExpirationInDays: 30

Quick Start Guide

  1. Initialize the project: Run npm init -y && npm install @aws-sdk/client-bedrock-runtime @aws-sdk/client-s3 @sparticuz/chromium puppeteer-core to install core dependencies.
  2. Configure AWS credentials: Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION in your environment. Ensure IAM policies allow bedrock:InvokeModel, s3:PutObject, and s3:GetObject.
  3. Deploy the stack: Use serverless deploy or AWS CDK to provision Lambda functions, S3 bucket, and IAM roles. Verify provisioned concurrency is active for the PDF function.
  4. Test with edge cases: Submit a JSON profile containing proper nouns, academic degrees, and Devanagari/Urdu strings. Validate translation preservation and PDF rendering accuracy.
  5. Monitor & iterate: Attach PostHog for session tracking and Sentry for error logging. Run the evaluation harness weekly to catch prompt drift before it impacts users.