AI/ML · 2026-05-12 · 84 min read

How I Wired Claude AI into My SaaS and What Actually Worked vs. What Was Just Hype

By Dilip Joshi

Architecting Multilingual Document Generation on Serverless: LLM Translation, Script Rendering, and Rule-Based Routing

Current Situation Analysis

The modern SaaS landscape is saturated with AI integration mandates. Engineering teams routinely reach for large language models to handle document personalization, template routing, and multilingual output. The assumption is straightforward: if an LLM can generate coherent text, it can reliably transform, format, and deliver structured documents across languages and scripts. In practice, this assumption collapses under the weight of typographic complexity, cultural nuance, and serverless runtime constraints.

The core pain point is not AI capability; it is architectural mismatch. Developers treat LLMs as universal formatting engines, expecting them to preserve proper nouns, academic credentials, and culturally specific identifiers while simultaneously handling layout constraints. They also assume serverless PDF generation behaves identically across Latin and non-Latin scripts. Neither assumption holds in production.

India’s linguistic landscape exemplifies this gap. With 22 officially recognized languages and multiple distinct scripts (Devanagari, Gurmukhi, Tamil, Urdu/Nastaliq, etc.), a single document pipeline must handle right-to-left rendering, complex conjunct characters, vertical matras, and OpenType shaping rules. Standard PDF libraries that rely on font subsetting routinely drop conjuncts, rendering text as placeholder rectangles. Server-side rendering engines that lack browser-grade text shaping fail to adjust character forms based on neighboring glyphs. Line-height calculations break when diacritical marks extend beyond baseline metrics.

Simultaneously, teams overcomplicate recommendation logic. When faced with template selection, the instinct is to deploy vector similarity search or fine-tuned classifiers. In reality, deterministic scoring based on explicit user attributes (religion, language, regional origin, stylistic preference) delivers higher accuracy, lower latency, and zero inference cost.

The data from production deployments consistently shows:

  • Prompt engineering for culturally aware translation requires 10–15 iterations before stabilizing, primarily due to reactive fixes rather than systematic evaluation.
  • Headless Chromium on Lambda introduces cold start penalties that spike from ~200ms to 3–5 seconds without provisioned concurrency.
  • DynamoDB access patterns designed after implementation force painful schema migrations and query rewrites.
  • Rule-based template routing outperforms LLM-based classification in both speed and consistency when input features are categorical and well-defined.

This is not a critique of AI. It is a blueprint for separating semantic transformation (where LLMs excel) from structural rendering and deterministic routing (where traditional code dominates).

WOW Moment: Key Findings

The most valuable insight from production deployments is that AI should be scoped strictly to text transformation, while infrastructure and deterministic logic handle structure, layout, and routing. The following comparison quantifies this separation across three core subsystems.

| Approach | Latency (p95) | Cost per 1k Requests | Script/Cultural Fidelity | Maintenance Overhead |
| --- | --- | --- | --- | --- |
| LLM Translation (Bedrock) | 800–1200ms | $0.04–$0.08 | High (with structured prompts) | Medium (prompt versioning, eval harness) |
| Rule-Based Template Routing | <15ms | $0.0001 | Perfect (deterministic) | Low (configuration files) |
| Headless Chromium PDF | 2500–4000ms (cold) / 150ms (warm) | $0.002–$0.005 | High (browser-grade shaping) | Medium (font embedding, Lambda layer size) |

Why this matters: The table reveals a clear architectural boundary. LLMs handle semantic translation but introduce latency and cost. Rule-based scoring eliminates inference overhead entirely for categorical routing. Headless Chromium solves complex script rendering but demands cold-start mitigation. Recognizing these boundaries prevents over-engineering, reduces AWS spend, and ensures typographic accuracy across Indic and RTL scripts.

Core Solution

Building a production-ready multilingual document pipeline requires separating concerns into four distinct stages: profile normalization, deterministic routing, semantic translation, and server-side rendering. Each stage uses the most appropriate tool for the job.

Step 1: Profile Normalization & Deterministic Routing

Template selection should never rely on probabilistic models when input data is categorical. Instead, map user attributes to weighted template tags and compute a deterministic score.

interface UserProfile {
  religion: string;
  language: string;
  region: string;
  stylePreference: 'traditional' | 'modern' | 'minimal';
}

interface Template {
  id: string;
  tags: string[];
  weight: Record<string, number>;
}

const TEMPLATES: Template[] = [
  { id: 'tmpl_royal_hindu', tags: ['traditional', 'hindu', 'north-indian'], weight: { traditional: 3, hindu: 2, 'north-indian': 2 } },
  { id: 'tmpl_modern_minimal', tags: ['modern', 'minimal', 'universal'], weight: { modern: 3, minimal: 3 } },
  { id: 'tmpl_muslim_elegant', tags: ['traditional', 'muslim', 'south-indian'], weight: { traditional: 2, muslim: 3, 'south-indian': 2 } },
];

export function scoreTemplates(profile: UserProfile): string[] {
  // Normalize profile attributes into the same tag vocabulary the templates use.
  const attrs = new Set<string>([
    profile.stylePreference,
    profile.religion.toLowerCase(),
    profile.region.toLowerCase(),
  ]);

  const scored = TEMPLATES.map(tmpl => ({
    id: tmpl.id,
    // Sum the template's declared weight for every tag the profile matches,
    // falling back to 1 for a matching tag with no explicit weight.
    score: tmpl.tags.reduce(
      (acc, tag) => acc + (attrs.has(tag) ? tmpl.weight[tag] ?? 1 : 0),
      0
    ),
  }));

  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map(r => r.id);
}

Architecture Rationale: This approach eliminates API calls, guarantees sub-20ms response times, and remains fully auditable. Vector search or classification models introduce unnecessary latency and cost when categorical matching suffices.

Step 2: Context-Aware Translation Pipeline

LLMs excel at semantic transformation but require strict guardrails to preserve proper nouns, academic degrees, and cultural identifiers. Use structured output enforcement and explicit preservation rules.

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({ region: 'us-east-1' });

const SYSTEM_PROMPT = `
You are a technical translator. Convert the following JSON profile into the target language.
RULES:
1. Preserve proper names, academic degrees (B.Tech, M.Sc, etc.), and institutional names exactly as provided.
2. Do not translate cultural identifiers like Gotra, Caste, or Clan. Keep them in original script or transliterate consistently.
3. Maintain JSON structure. Output only valid JSON.
4. Adapt honorifics and familial terms to match target language conventions.
`;

interface TranslationRequest {
  sourceJson: Record<string, string>;
  targetLanguage: string;
}

export async function translateProfile(req: TranslationRequest): Promise<Record<string, string>> {
  const payload = {
    anthropic_version: 'bedrock-2023-05-31',
    max_tokens: 2048,
    system: SYSTEM_PROMPT,
    messages: [
      {
        role: 'user',
        content: `Target: ${req.targetLanguage}\nSource: ${JSON.stringify(req.sourceJson, null, 2)}`
      }
    ]
  };

  const command = new InvokeModelCommand({
    modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
    contentType: 'application/json',
    accept: 'application/json',
    body: JSON.stringify(payload)
  });

  const response = await client.send(command);
  const decoded = new TextDecoder().decode(response.body);
  const parsed = JSON.parse(decoded);
  const content = parsed.content[0].text;
  
  // Extract JSON from a markdown code fence if present; otherwise fall back to
  // the outermost brace-delimited span so any leading prose is discarded.
  const fenced = content.match(/```json\s*([\s\S]*?)```/);
  const raw = fenced ? fenced[1] : (content.match(/\{[\s\S]*\}/)?.[0] ?? content);
  return JSON.parse(raw);
}

Architecture Rationale: AWS Bedrock provides consistent latency and enterprise-grade logging. Enforcing JSON output prevents structural drift. The system prompt explicitly isolates preservation rules, reducing prompt iteration cycles from 15+ to 3–4 when paired with an evaluation harness.

Step 3: Server-Side PDF Generation with Chromium

Complex scripts require browser-grade text shaping. Serverless Lambda functions can run headless Chromium via optimized layers, but cold starts must be managed.

import chromium from '@sparticuz/chromium';
import puppeteer from 'puppeteer-core';
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });

export async function generatePDF(htmlContent: string, fileName: string): Promise<string> {
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
  });

  try {
    const page = await browser.newPage();
    await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
    
    const pdfBuffer = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '15mm', bottom: '15mm', left: '10mm', right: '10mm' },
    });

    const bucketName = process.env.PDF_BUCKET!;
    const key = `documents/${fileName}.pdf`;

    await s3.send(new PutObjectCommand({
      Bucket: bucketName,
      Key: key,
      Body: pdfBuffer,
      ContentType: 'application/pdf',
    }));

    return await getSignedUrl(s3, new GetObjectCommand({ Bucket: bucketName, Key: key }), { expiresIn: 3600 });
  } finally {
    await browser.close();
  }
}

Architecture Rationale: @sparticuz/chromium provides a Lambda-optimized binary that respects OpenType shaping, handles Devanagari conjuncts, and correctly renders Urdu Nastaliq. S3 storage with presigned URLs decouples generation from delivery, enabling CloudFront caching and CDN optimization.
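The `htmlContent` passed to `generatePDF` must carry its own fonts, since the Lambda environment ships none of the Indic families. A minimal builder sketch, assuming the font file is loaded elsewhere and passed in as base64 (the `DocFont` name, data-URI embedding, and line-height value are illustrative assumptions, not part of the pipeline above):

```typescript
// Sketch of an HTML builder for generatePDF. Font name, base64 embedding, and
// the CSS values are illustrative assumptions.
export function buildHtml(
  bodyHtml: string,
  lang: string,
  fontBase64: string,
  dir: 'ltr' | 'rtl' = 'ltr'
): string {
  return `<!DOCTYPE html>
<html lang="${lang}" dir="${dir}">
<head>
<meta charset="utf-8">
<style>
  @font-face {
    font-family: 'DocFont';
    /* Embed the full font so Chromium never falls back to missing glyphs,
       the usual cause of placeholder rectangles. */
    src: url(data:font/ttf;base64,${fontBase64}) format('truetype');
  }
  body {
    font-family: 'DocFont', sans-serif;
    line-height: 1.9; /* headroom for stacked matras and diacritics */
  }
</style>
</head>
<body>${bodyHtml}</body>
</html>`;
}
```

Setting `lang` and `dir` on the root element lets Chromium select the right shaping and bidi behavior; an Urdu document would be built with `buildHtml(body, 'ur', font, 'rtl')`.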

Pitfall Guide

1. Blind LLM Translation of Proper Nouns & Degrees

Explanation: Language models default to literal translation or semantic approximation. "B.Tech Computer Science" may become a descriptive phrase in Hindi, and names like "Priya Sharma" may be transliterated inconsistently or translated entirely. Fix: Implement explicit preservation rules in the system prompt. Use JSON schema validation to enforce structure. Run an evaluation harness that flags altered proper nouns before deployment.
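One way to make the "flag altered proper nouns" step concrete is a post-translation structural check. A minimal sketch, where the field names and the `PRESERVE_KEYS` list are assumptions for illustration:

```typescript
// Fields that must come back byte-identical from translation. The exact list
// is an illustrative assumption; a real pipeline would derive it per schema.
const PRESERVE_KEYS = ['name', 'degree', 'institution'];

export function validateTranslation(
  source: Record<string, string>,
  translated: Record<string, string>
): string[] {
  const violations: string[] = [];

  // 1. Structure: every source key must survive translation.
  for (const key of Object.keys(source)) {
    if (!(key in translated)) violations.push(`missing key: ${key}`);
  }

  // 2. Preservation: protected fields must be unchanged.
  for (const key of PRESERVE_KEYS) {
    if (key in source && translated[key] !== source[key]) {
      violations.push(`altered preserved field: ${key}`);
    }
  }
  return violations;
}
```

Running this check before rendering turns a silent transliteration drift into a hard, loggable failure.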

2. Ignoring OpenType Shaping for Indic Scripts

Explanation: Devanagari and other Indic scripts rely on contextual shaping. A standalone character changes form when combined with vowel signs or consonant clusters. Standard PDF libraries that subset fonts or skip shaping engines render broken text. Fix: Use browser-based rendering (Chromium/Firefox) for server-side PDF generation. Embed full font families rather than relying on subsetting. Test with conjunct-heavy strings before production rollout.
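A conjunct-heavy fixture for that pre-rollout test might look like the following sketch; the sample words are ordinary Hindi/Urdu vocabulary, and the helper simply confirms each Devanagari sample contains a virama (U+094D), i.e. actually exercises conjunct shaping:

```typescript
// Illustrative smoke-test fixture: strings that break shaping-unaware renderers.
export const DEVANAGARI_CONJUNCTS = [
  'श्री',       // śr conjunct with long ī matra
  'विद्यार्थी',  // multiple conjuncts and matras in one word
  'क्षत्रिय',    // kṣa and tra conjuncts
  'ज्ञान',       // jña conjunct
];

// RTL samples for Nastaliq rendering checks.
export const URDU_SAMPLES = ['تعلیم', 'خاندان'];

// Sanity check usable in CI: every Devanagari sample must contain a virama,
// otherwise the fixture is not testing conjunct formation at all.
export function allExerciseConjuncts(samples: string[]): boolean {
  return samples.every(s => s.includes('\u094D'));
}
```

Render these through the full pipeline and inspect the PDF manually once; afterwards, a pixel-diff against the approved output catches shaping regressions automatically.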

3. DynamoDB Schema Design After Implementation

Explanation: DynamoDB requires access patterns to be defined before table creation. Designing queries after writing application code forces schema migrations, GSI additions, and query rewrites that break existing flows. Fix: Map all query patterns upfront. Use single-table design with composite keys (PK/SK). Validate access patterns with a query matrix before provisioning tables.
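A sketch of what "map all query patterns upfront" can look like in code, with entity names and key shapes as illustrative assumptions derived from a query matrix:

```typescript
// Single-table key builders, written BEFORE any application code. Assumed
// query matrix (illustrative):
//   "get profile by user"       -> PK = USER#<id>, SK = PROFILE
//   "list documents for a user" -> PK = USER#<id>, SK begins_with DOC#
//   "get document by id" (GSI1) -> GSI1PK = DOC#<docId>
export const keys = {
  profile: (userId: string) => ({ PK: `USER#${userId}`, SK: 'PROFILE' }),

  document: (userId: string, docId: string, createdAt: string) => ({
    PK: `USER#${userId}`,
    SK: `DOC#${createdAt}#${docId}`, // ISO timestamp prefix keeps items sortable
    GSI1PK: `DOC#${docId}`,
    GSI1SK: 'META',
  }),

  documentsPrefix: (userId: string) => ({
    PK: `USER#${userId}`,
    SKPrefix: 'DOC#', // pair with begins_with() in a KeyConditionExpression
  }),
};
```

Because every query the application will ever run resolves to one of these builders, a new access pattern that doesn't fit is caught at design time rather than as a production migration.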

4. Reactive Prompt Engineering Without Evaluation Harness

Explanation: Adding rules to prompts reactively when edge cases break leads to 10+ iterations, prompt bloat, and inconsistent outputs. Fix: Build a deterministic test suite with 50+ edge cases (names, degrees, cultural terms, RTL strings). Run prompts against this suite after every change. Track pass/fail rates and semantic drift metrics.
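A minimal harness along these lines can be sketched as follows, with the case shape as an assumption and `translate` standing in for whichever prompt version is under test:

```typescript
// Deterministic eval harness sketch. EvalCase and the pass criterion are
// illustrative assumptions; `translate` is the prompt-under-test.
interface EvalCase {
  input: Record<string, string>;
  mustPreserve: string[]; // substrings that must survive verbatim
}

export async function runHarness(
  cases: EvalCase[],
  translate: (input: Record<string, string>) => Promise<Record<string, string>>
): Promise<{ passed: number; failed: number }> {
  let passed = 0;
  for (const c of cases) {
    const out = await translate(c.input);
    const text = JSON.stringify(out);
    // A case passes only if every protected substring survives verbatim.
    if (c.mustPreserve.every(term => text.includes(term))) passed++;
  }
  return { passed, failed: cases.length - passed };
}
```

Run after every prompt change and track the pass rate over time; a drop pinpoints the exact edit that introduced drift, which reactive tweaking never does.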

5. Underestimating Cold Start Impact on Headless Browsers

Explanation: Loading a full Chromium binary in Lambda increases initialization time to 3–5 seconds. Users experience timeouts or degraded UX on first request. Fix: Enable provisioned concurrency for PDF generation functions. Use Lambda SnapStart (if supported) or keep-alive pings. Cache frequently requested templates to reduce cold start frequency.
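A keep-alive ping only helps if the handler short-circuits before doing real work. A sketch, where the `warmer` marker field is an assumption of this example rather than any Lambda convention:

```typescript
// Keep-warm sketch: a scheduled EventBridge rule invokes the function with a
// marker payload so the Chromium-laden execution environment stays alive.
export function isWarmerEvent(event: unknown): boolean {
  return typeof event === 'object' && event !== null && (event as any).warmer === true;
}

export async function handler(
  event: unknown
): Promise<{ statusCode: number; body: string }> {
  if (isWarmerEvent(event)) {
    // Return before touching Puppeteer: the point is only to keep the
    // container (and its loaded Chromium layer) resident.
    return { statusCode: 200, body: 'warm' };
  }
  // ... real PDF generation path would go here ...
  return { statusCode: 200, body: 'generated' };
}
```

Provisioned concurrency remains the stronger guarantee; the ping is the cheap fallback for low-traffic stages where paying for reserved environments isn't justified.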

6. Over-Engineering Recommendation Systems

Explanation: Deploying vector databases or fine-tuned classifiers for template routing adds latency, cost, and maintenance overhead when categorical matching suffices. Fix: Use weighted tag scoring for deterministic routing. Reserve LLMs for semantic translation and content generation only. Revisit ML routing only when user preferences become highly unstructured.

Production Bundle

Action Checklist

  • Map all DynamoDB access patterns before provisioning tables; validate with a query matrix
  • Implement rule-based template scoring using weighted tags; avoid LLM routing for categorical data
  • Configure Bedrock translation with explicit preservation rules and JSON schema enforcement
  • Build an evaluation harness with 50+ edge cases for prompt stability tracking
  • Deploy headless Chromium via Lambda-optimized layers; test Devanagari/Urdu shaping early
  • Enable provisioned concurrency for PDF generation functions to mitigate cold starts
  • Store generated PDFs in S3 with presigned URLs; route through CloudFront for CDN caching
  • Instrument PostHog session recordings to identify UX friction points in document flows

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Low traffic (<10k docs/month) | Serverless Lambda + on-demand concurrency | Pay-per-use model minimizes idle costs | Low ($0.50–$2.00/mo infra) |
| High concurrency (>50k docs/month) | Provisioned concurrency + CloudFront caching | Eliminates cold starts; reduces Lambda invocations | Medium ($15–$40/mo concurrency) |
| Strict compliance/audit requirements | Rule-based routing + structured JSON output | Deterministic behavior simplifies auditing | Low (no inference costs) |
| Multilingual RTL/Indic scripts | Headless Chromium PDF generation | Browser-grade shaping handles complex typography | Medium (Lambda memory + layer size) |
| Rapid prototyping/MVP | Traditional PDF library + Latin-only fallback | Faster iteration; defer script complexity | Low (initially) |

Configuration Template

# serverless.yml (simplified)
service: vedadocs-pdf-pipeline

provider:
  name: aws
  runtime: nodejs20.x
  region: us-east-1
  memorySize: 1024
  timeout: 30
  environment:
    PDF_BUCKET: ${env:PDF_BUCKET}
    BEDROCK_REGION: us-east-1

functions:
  translateProfile:
    handler: src/translate.handler
    memorySize: 512
    timeout: 15
    environment:
      BEDROCK_MODEL_ID: anthropic.claude-3-sonnet-20240229-v1:0

  generatePDF:
    handler: src/pdf.handler
    memorySize: 2048
    timeout: 30
    provisionedConcurrency: 5
    layers:
      - arn:aws:lambda:${self:provider.region}:764866452798:layer:chrome-aws-lambda:38
    events:
      - http:
          path: /generate
          method: post

resources:
  Resources:
    PdfBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${env:PDF_BUCKET}
        LifecycleConfiguration:
          Rules:
            - Id: ExpireOldDocs
              Status: Enabled
              ExpirationInDays: 30

Quick Start Guide

  1. Initialize the project: Run npm init -y && npm install @aws-sdk/client-bedrock-runtime @aws-sdk/client-s3 @sparticuz/chromium puppeteer-core to install core dependencies.
  2. Configure AWS credentials: Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION in your environment. Ensure IAM policies allow bedrock:InvokeModel, s3:PutObject, and s3:GetObject.
  3. Deploy the stack: Use serverless deploy or AWS CDK to provision Lambda functions, S3 bucket, and IAM roles. Verify provisioned concurrency is active for the PDF function.
  4. Test with edge cases: Submit a JSON profile containing proper nouns, academic degrees, and Devanagari/Urdu strings. Validate translation preservation and PDF rendering accuracy.
  5. Monitor & iterate: Attach PostHog for session tracking and Sentry for error logging. Run the evaluation harness weekly to catch prompt drift before it impacts users.