ts into memory before transmission. The client reads from disk or buffer and pipes directly to the network socket.
2. Explicit Sequence Preservation: The API merges files in the exact order they appear in the multipart payload. The client constructs the form data sequentially to guarantee deterministic output.
3. Binary Response Handling: Successful responses return raw PDF bytes. Error responses return JSON. The client must inspect the Content-Type header before parsing to prevent binary corruption.
4. Exponential Backoff: Rate limits (429) and transient server errors (5xx) require automatic retry with exponential backoff and jitter. Hard failures (400, 401) must bypass retry logic to avoid wasting requests.
TypeScript Implementation
import fs from 'fs';
import { FormData } from 'formdata-node';
import { fileFromPath } from 'formdata-node/file-from-path';
import { fetch, Response } from 'undici';

interface MergeConfig {
  apiKey: string;
  endpoint: string;
  maxRetries?: number;
  baseDelayMs?: number;
}

export class DocumentAssembler {
  private readonly config: MergeConfig;

  constructor(config: MergeConfig) {
    this.config = { maxRetries: 3, baseDelayMs: 1000, ...config };
  }

  async consolidate(inputPaths: string[], outputDestination: string): Promise<void> {
    const form = new FormData();
    // Append sequentially so the parts keep the order of inputPaths.
    for (const filePath of inputPaths) {
      const file = await fileFromPath(filePath);
      form.append('files', file);
    }
    const response = await this.executeWithRetry(form);
    await this.persistResponse(response, outputDestination);
  }

  private async executeWithRetry(form: FormData, attempt = 0): Promise<Response> {
    try {
      const res = await fetch(this.config.endpoint, {
        method: 'POST',
        // No manual Content-Type: fetch derives the multipart boundary from the FormData body.
        headers: { 'X-API-Key': this.config.apiKey },
        body: form,
      });

      // Rate limits and server errors are retried with jittered exponential backoff.
      if (res.status === 429 || res.status >= 500) {
        if (attempt < this.config.maxRetries!) {
          const delay = this.config.baseDelayMs! * Math.pow(2, attempt) + Math.random() * 500;
          await new Promise(resolve => setTimeout(resolve, delay));
          return this.executeWithRetry(form, attempt + 1);
        }
        throw new Error(`Merge failed after ${attempt + 1} attempts. Status: ${res.status}`);
      }

      // Remaining client errors (400, 401, ...) fail fast without retrying.
      if (!res.ok) {
        // Error bodies are expected to be JSON; fall back to the status text if parsing fails.
        const errorBody = (await res.json().catch(() => null)) as { message?: string } | null;
        throw new Error(`API rejection: ${errorBody?.message || res.statusText}`);
      }
      return res;
    } catch (err) {
      if (err instanceof TypeError && err.message.includes('fetch')) {
        throw new Error(`Network failure during merge: ${err.message}`);
      }
      throw err;
    }
  }

  private async persistResponse(response: Response, destination: string): Promise<void> {
    // Confirm the payload is a PDF before writing anything to disk.
    const contentType = response.headers.get('content-type') || '';
    if (!contentType.includes('application/pdf')) {
      throw new Error(`Unexpected response type: ${contentType}. Expected PDF binary.`);
    }
    const buffer = Buffer.from(await response.arrayBuffer());
    await fs.promises.writeFile(destination, buffer);
  }
}
Python Equivalent
import os
import time
import random
import requests
from typing import List


class PdfBatchProcessor:
    def __init__(self, api_key: str, endpoint: str, max_retries: int = 3, base_delay: float = 1.0):
        self.api_key = api_key
        self.endpoint = endpoint
        self.max_retries = max_retries
        self.base_delay = base_delay

    def consolidate(self, source_paths: List[str], output_path: str) -> None:
        handles = []
        files = []
        try:
            # Open files in input order; the multipart parts are sent in this order.
            for p in source_paths:
                handle = open(p, 'rb')
                handles.append(handle)
                files.append(('files', (os.path.basename(p), handle, 'application/pdf')))
            response = self._execute_with_retry(files)
            self._save_output(response, output_path)
        finally:
            # Close every handle, even if the request or the write fails.
            for handle in handles:
                handle.close()

    def _execute_with_retry(self, files: list, attempt: int = 0) -> requests.Response:
        # requests reads each handle to the end while encoding the multipart body,
        # so rewind before every attempt or a retry would upload empty files.
        for _, (_, handle, _) in files:
            handle.seek(0)
        try:
            resp = requests.post(
                self.endpoint,
                headers={"X-API-Key": self.api_key},
                files=files,
                timeout=30,
            )
        except requests.exceptions.RequestException as exc:
            raise RuntimeError(f"Network error during consolidation: {exc}") from exc

        # Rate limits and server errors are retried with jittered exponential backoff.
        if resp.status_code == 429 or resp.status_code >= 500:
            if attempt < self.max_retries:
                wait = self.base_delay * (2 ** attempt) + random.uniform(0, 0.5)
                time.sleep(wait)
                return self._execute_with_retry(files, attempt + 1)
            raise RuntimeError(f"Merge exhausted retries. HTTP {resp.status_code}")

        # Remaining client errors (400, 401, ...) fail fast as requests.HTTPError.
        resp.raise_for_status()
        return resp

    def _save_output(self, response: requests.Response, destination: str) -> None:
        # Confirm the payload is a PDF before writing anything to disk.
        content_type = response.headers.get("Content-Type", "")
        if "application/pdf" not in content_type:
            raise ValueError(f"Invalid response format: {content_type}")
        with open(destination, "wb") as f:
            f.write(response.content)
Why This Structure Works
- Retry Logic Isolation: Rate limits (429) and server errors (5xx) trigger backoff. Client errors (400, 401) fail fast. This prevents wasting quota on malformed requests or invalid credentials.
- Content-Type Validation: The API returns JSON on failure but binary on success. Checking the header before parsing prevents silent corruption.
- Sequential Form Construction: The API merges files in upload order. Iterating through inputPaths and appending directly to the form guarantees deterministic output without requiring explicit index parameters.
- Direct Binary Persistence: arrayBuffer() and response.content read the whole response body into memory as raw bytes, which are then written to disk without any intermediate string conversion. For very large merged outputs, streaming the body to disk in chunks keeps memory usage flat.
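For large merged outputs, the buffered write can be replaced with a chunked one. The sketch below is one possible approach and not part of the clients above: `save_stream` is a hypothetical helper that writes any iterable of byte chunks to disk, staging the data in a temporary file so a failed transfer never leaves a truncated PDF at the final path.

```python
import os
import tempfile
from typing import Iterable


def save_stream(chunks: Iterable[bytes], destination: str) -> int:
    """Write byte chunks to `destination` and return the number of bytes written.

    Hypothetical helper: data is staged in a temporary file in the same
    directory and renamed into place only after every chunk has arrived.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    written = 0
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    tmp.write(chunk)
                    written += len(chunk)
        os.replace(tmp_path, destination)  # rename within the same directory
    except BaseException:
        os.remove(tmp_path)  # discard the partial file on any failure
        raise
    return written
```

With `requests`, the chunks could come from `resp.iter_content(chunk_size=65536)` on a request posted with `stream=True`, after the status code and Content-Type have been checked.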
Pitfall Guide
1. Assuming Asynchronous Upload Order Preserves Merge Sequence
Explanation: Multipart parts are sent in the order they were appended to the form. If files are appended from concurrent callbacks, that order depends on which read finishes first, not on the order of the input array.
Fix: Construct the form data in a single sequential loop, awaiting each file before appending the next. Avoid appending from inside Promise.all() callbacks or other concurrent tasks.
2. Parsing Error Responses as Binary PDFs
Explanation: A 401 or 429 response returns JSON. Writing this directly to a .pdf file corrupts the output and masks the actual error.
Fix: Always check response.ok or status_code before reading the body. Inspect Content-Type to confirm binary payload before persistence.
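The check order described in this fix can be sketched as a small guard. This is an illustration under assumptions, not the API's documented contract: `extract_pdf_bytes` is a hypothetical helper, and the `message` field simply mirrors the error shape the TypeScript client reads.

```python
import json


def extract_pdf_bytes(status: int, content_type: str, body: bytes) -> bytes:
    """Return `body` only if the response is a successful PDF payload.

    Hypothetical helper; the `message` field is an assumption about the
    API's JSON error format.
    """
    # 1. Status first: error bodies must never reach the output file.
    if not 200 <= status < 300:
        try:
            detail = json.loads(body).get("message", "")
        except (ValueError, AttributeError):
            detail = ""  # body was not a JSON object
        raise RuntimeError(f"API rejection (HTTP {status}): {detail or 'no details'}")
    # 2. Then the declared payload type.
    if "application/pdf" not in content_type.lower():
        raise ValueError(f"Expected application/pdf, got {content_type!r}")
    return body
```

Only the bytes returned by this guard would then be written to the `.pdf` destination.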
3. Retrying Rate-Limited Requests Without Backoff
Explanation: The API returns 429 when quota is exceeded. Retrying immediately, without reading Retry-After or applying exponential backoff, keeps hitting the limit and triggers cascading failures.
Fix: Implement jittered exponential backoff. Log the status code and pause execution before retrying. Never retry on 400/401.
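A minimal sketch of that retry policy follows. The helper names `should_retry` and `retry_wait` are hypothetical, and whether this API actually sends a Retry-After header is an assumption; when the header is absent or is not a plain number of seconds, the function falls back to exponential backoff.

```python
import random
from typing import Optional


def should_retry(status: int) -> bool:
    """Retry rate limits and server errors; fail fast on everything else."""
    return status == 429 or 500 <= status < 600


def retry_wait(attempt: int, base_delay: float = 1.0,
               retry_after: Optional[str] = None, max_jitter: float = 0.5) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based).

    Uses Retry-After when it holds a whole number of seconds (the header
    may also be an HTTP date, which this sketch ignores); otherwise uses
    base_delay * 2**attempt. Random jitter is added in both cases.
    """
    jitter = random.uniform(0, max_jitter)
    if retry_after is not None:
        try:
            return float(max(0, int(retry_after.strip()))) + jitter
        except ValueError:
            pass  # not a number of seconds; use backoff instead
    return base_delay * (2 ** attempt) + jitter
```

A caller would check `should_retry(resp.status_code)` first and then sleep for `retry_wait(attempt, retry_after=resp.headers.get("Retry-After"))`.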
4. Loading Entire Files Into Memory Before Transmission
Explanation: Reading files with fs.readFileSync() or open().read() before attaching to the form doubles memory usage (once for the buffer, once for the HTTP payload).
Fix: Use stream-based file readers (fs.createReadStream, fileFromPath, or context-managed file handles) to pipe data directly to the network layer.
5. Hardcoding API Credentials
Explanation: Embedding keys in source control exposes them to repository leaks, branch history, and CI/CD logs.
Fix: Load credentials from environment variables at runtime. Validate presence before initializing the client. Rotate keys regularly.
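For the Python client, the same presence check could look like the sketch below. `load_api_key` is a hypothetical helper; the variable name `PDF_MERGE_API_KEY` is taken from the configuration template in this article.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 name: str = "PDF_MERGE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; refusing to start without credentials.")
    return key
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.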
6. Skipping Pre-Upload Validation
Explanation: Uploading corrupted or non-PDF files triggers a 400 response. The API rejects invalid binaries, but the rejected request still counts against the quota.
Fix: Validate file extensions and optionally check PDF magic bytes (%PDF-) before constructing the request. Fail fast locally to preserve quota.
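A local pre-check along these lines might look as follows. It is a deliberately cheap sketch: `looks_like_pdf` is a hypothetical helper that inspects only the extension and the first five bytes, so it filters obvious mistakes but does not prove the file is a valid PDF.

```python
import os

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Cheap local check: `.pdf` extension and a `%PDF-` header at byte 0."""
    if os.path.splitext(path)[1].lower() != ".pdf":
        return False
    try:
        with open(path, "rb") as f:
            return f.read(len(PDF_MAGIC)) == PDF_MAGIC
    except OSError:
        return False  # missing or unreadable file
```

Running this over the input paths before building the form lets the client reject bad inputs without spending a request.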
7. Blocking the Event Loop During Large Transfers
Explanation: Synchronous file I/O or blocking network calls freeze the main thread, degrading throughput for concurrent requests.
Fix: Use async/await patterns, non-blocking streams, and dedicated worker threads if processing hundreds of files simultaneously.
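In Python, the equivalent problem is calling a blocking client from inside an asyncio event loop. One possible mitigation, sketched below, is to hand the blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+). The names `merge_in_thread`, `demo`, and `slow_merge` are illustrative; `slow_merge` stands in for a blocking call such as `PdfBatchProcessor.consolidate`.

```python
import asyncio
import time
from typing import Callable, List


async def merge_in_thread(merge: Callable[[List[str], str], None],
                          paths: List[str], output: str) -> None:
    """Run a blocking merge call in a worker thread so the event loop stays free."""
    await asyncio.to_thread(merge, paths, output)


async def demo() -> List[str]:
    events: List[str] = []

    def slow_merge(paths: List[str], output: str) -> None:
        time.sleep(0.2)  # stand-in for a blocking upload
        events.append("merge done")

    async def heartbeat() -> None:
        await asyncio.sleep(0.05)
        events.append("loop still responsive")

    # The heartbeat can only run on time if the merge does not block the loop.
    await asyncio.gather(
        merge_in_thread(slow_merge, ["a.pdf", "b.pdf"], "out.pdf"),
        heartbeat(),
    )
    return events
```

Calling `slow_merge` directly inside a coroutine would instead stall every other task until the upload finished.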
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| <100 merges/month, strict latency requirements | Local Processing | Eliminates network hop; full control over parsing | Free (compute cost only) |
| 100-10,000 merges/month, multi-language stack | API Offloading | Removes native dependency chains; deterministic ordering | $5-$15/month |
| >10,000 merges/month, enterprise compliance | API Offloading (Business Tier) | Scales horizontally; handles binary parsing isolation | $30/month |
| Real-time document generation with custom layouts | Local Processing + API Merge | Generate pages locally, merge via API for final assembly | Hybrid (compute + API) |
Configuration Template
// config/pdf-merge.ts
import dotenv from 'dotenv';

dotenv.config();

export const PDF_MERGE_CONFIG = {
  endpoint: 'https://www.forgelab.africa/api/pdf/merge',
  apiKey: process.env.PDF_MERGE_API_KEY || '',
  maxRetries: 3,
  baseDelayMs: 1000,
  requestTimeoutMs: 30000,
  validation: {
    checkMagicBytes: true,
    allowedExtensions: ['.pdf'],
  },
  logging: {
    enabled: true,
    level: 'info',
  },
};

// Fail at startup rather than on the first request if the key is missing.
if (!PDF_MERGE_CONFIG.apiKey) {
  throw new Error('PDF_MERGE_API_KEY is required in environment variables.');
}
Quick Start Guide
- Install Dependencies: Add formdata-node, undici (or node-fetch), and dotenv to your project. For Python, install requests.
- Set Environment Variables: Export PDF_MERGE_API_KEY with your provider credential. Never commit this value.
- Initialize the Client: Import the configuration and instantiate DocumentAssembler or PdfBatchProcessor with your endpoint and retry settings.
- Execute Merge: Pass an array of local file paths and an output destination. The client streams uploads, handles retries, and writes the consolidated PDF to disk.
- Verify Output: Check the file size, open the document, and confirm page sequence matches your input array. Monitor logs for retry events or quota warnings.