conversion uses query parameters for simplicity, while file uploads use multipart/form-data POST requests. This separation lets the backend optimize its processing pipeline for each input type.
2. Zero-Auth Design: Removing API keys reduces integration friction. Access control is managed via IP-based rate limiting, which is sufficient for most development and moderate production workloads.
3. Format Agnosticism: The backend supports 11 formats, including HTML, PDF, DOCX, PPTX, XLSX, CSV, JSON, and plain text. This eliminates the need for format-specific client logic.
TypeScript Implementation
The following implementation provides a type-safe wrapper for the normalization API. It covers URL conversion (for web pages and remote files) and direct file uploads, validates every response, and exposes the token metrics.
```typescript
interface ConversionMetrics {
  originalTokens: number;
  normalizedTokens: number;
  savingsPercent: number;
}

interface ConversionResponse {
  success: boolean;
  markdown: string;
  tokens: ConversionMetrics;
}

interface NormalizationError extends Error {
  status: number;
  details?: string;
}

class MarkdownNormalizer {
  private readonly baseUrl: string;

  constructor(baseUrl: string = 'https://md.replyfast.co.uk') {
    this.baseUrl = baseUrl;
  }

  /**
   * Converts a remote URL to Markdown.
   * Supports HTML pages and direct links to supported file formats.
   */
  async convertUrl(sourceUrl: string): Promise<ConversionResponse> {
    const endpoint = `${this.baseUrl}/api/convert`;
    // URLSearchParams percent-encodes the value, so '&', '#' and '?' in sourceUrl are safe.
    const params = new URLSearchParams({ url: sourceUrl });
    const response = await fetch(`${endpoint}?${params}`, {
      method: 'GET',
      headers: { Accept: 'application/json' },
    });
    return this.handleResponse(response);
  }

  /**
   * Converts a file via direct upload.
   * Useful for local files or streams where a public URL is unavailable.
   */
  async convertFile(file: File | Blob, filename: string): Promise<ConversionResponse> {
    const endpoint = `${this.baseUrl}/api/upload`;
    const formData = new FormData();
    formData.append('file', file, filename);
    // Do not set Content-Type manually: fetch adds the multipart boundary itself.
    const response = await fetch(endpoint, {
      method: 'POST',
      body: formData,
    });
    return this.handleResponse(response);
  }

  /**
   * Converts a remote file (for example, a direct link to a PDF) by URL.
   * The API fetches the URL itself, so it must be publicly reachable:
   * URLs on internal networks or behind authentication will fail.
   */
  async convertFileUrl(fileUrl: string): Promise<ConversionResponse> {
    const endpoint = `${this.baseUrl}/api/convert/file`;
    const params = new URLSearchParams({ url: fileUrl });
    const response = await fetch(`${endpoint}?${params}`, {
      method: 'GET',
      headers: { Accept: 'application/json' },
    });
    return this.handleResponse(response);
  }

  // Throws on HTTP errors and on 200 responses whose payload reports failure.
  private async handleResponse(response: Response): Promise<ConversionResponse> {
    if (!response.ok) {
      const errorBody = await response.text().catch(() => '');
      const error = new Error(
        `Normalization failed: ${response.status} ${response.statusText}`
      ) as NormalizationError;
      error.status = response.status;
      error.details = errorBody;
      throw error;
    }
    const payload = (await response.json()) as ConversionResponse;
    if (!payload.success) {
      throw new Error('Conversion reported failure in response payload.');
    }
    return payload;
  }
}

// Usage example
async function processDocument(): Promise<void> {
  const normalizer = new MarkdownNormalizer();
  try {
    // Convert a web page
    const webResult = await normalizer.convertUrl('https://example.com/article');
    console.log(`Web conversion saved ${webResult.tokens.savingsPercent}% tokens.`);

    // Upload a local PDF ('...' is placeholder content; supply real PDF bytes)
    const pdfBlob = new Blob(['...'], { type: 'application/pdf' });
    const fileResult = await normalizer.convertFile(pdfBlob, 'report.pdf');
    console.log(`File conversion complete. Markdown length: ${fileResult.markdown.length}`);
  } catch (error) {
    console.error('Normalization error:', error);
  }
}
```
Rationale for Implementation Choices
- Class-Based Wrapper: Encapsulating the API calls in a `MarkdownNormalizer` class leaves room for extensions such as caching layers or custom retry policies without scattering logic throughout the application.
- Strict Response Validation: The `handleResponse` method checks both the HTTP status code and the `success` boolean in the payload. This dual validation prevents silent failures where the API returns 200 OK but reports a conversion failure in the body.
- Metric Extraction: The response includes token metrics. Extracting these allows the application to log cost savings and make dynamic decisions, such as truncating content if the normalized output still exceeds model limits.
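The truncation decision described under Metric Extraction could look roughly like the sketch below. It assumes the API-reported `normalizedTokens` describes the returned `markdown` and that tokens are spread evenly across characters, which is only a heuristic; `truncateToTokenBudget` and `maxTokens` are hypothetical names, and the estimate inherits the tokenizer mismatch discussed under Token Counting Discrepancies.

```typescript
interface ConversionMetrics {
  originalTokens: number;
  normalizedTokens: number;
  savingsPercent: number;
}

/**
 * Hypothetical helper: trims Markdown so its estimated token count fits maxTokens.
 * Assumes a uniform characters-per-token ratio derived from the API's count.
 */
function truncateToTokenBudget(
  markdown: string,
  metrics: ConversionMetrics,
  maxTokens: number,
): string {
  if (metrics.normalizedTokens <= maxTokens || markdown.length === 0) {
    return markdown; // Already within budget, according to the API's own count.
  }
  // Estimate characters per token from the reported count.
  const charsPerToken = markdown.length / metrics.normalizedTokens;
  let cut = Math.floor(maxTokens * charsPerToken);
  // Prefer to stop at the last paragraph break before the cut, if there is one.
  const paragraphBreak = markdown.lastIndexOf('\n\n', cut);
  if (paragraphBreak > 0) {
    cut = paragraphBreak;
  }
  return markdown.slice(0, cut);
}

// Example: 400 characters reported as 100 tokens, with a 50-token budget.
const sample = 'a'.repeat(400);
const trimmed = truncateToTokenBudget(
  sample,
  { originalTokens: 300, normalizedTokens: 100, savingsPercent: 66.7 },
  50,
);
console.log(trimmed.length); // 200
```

Cutting at a paragraph break keeps the output valid Markdown in most cases; for precise limits, re-count the trimmed text with the target model's tokenizer.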
Pitfall Guide
Production deployments of normalization APIs require careful handling of edge cases and constraints. The following pitfalls are common in real-world implementations.
- Rate Limit Violations
  - Explanation: The API enforces IP-based limits of 500 URL conversions and 100 file conversions per day. Requests beyond these limits are rejected.
  - Fix: Cache results for static URLs so repeat requests do not consume quota, and track daily usage on the client. Exponential backoff with jitter helps with transient rejections but cannot restore a spent daily quota; for sustained high volume, move to a self-hosted parser.
- URL Encoding Errors
  - Explanation: Passing URLs that contain special characters (e.g., `&`, `#`, `?`) without encoding can corrupt the request or cause the API to misread its parameters. For example, an unencoded `&` inside the source URL is parsed as the start of a new query parameter.
  - Fix: Encode parameter values with `encodeURIComponent` when building query strings by hand. The TypeScript example above uses `URLSearchParams`, which encodes values automatically; do not combine the two, or the URL will be double-encoded.
- SPA and Dynamic Content Limitations
  - Explanation: Single Page Applications (SPAs) that rely heavily on client-side JavaScript may not render fully during conversion. The API may capture the initial HTML shell rather than the rendered content.
  - Fix: Verify conversion results for SPAs. If content is missing, consider a headless browser solution for those specific sources, or request server-side rendered (SSR) versions of the pages.
- Token Counting Discrepancies
  - Explanation: The API reports token savings based on its internal counting method. Different LLM providers use different tokenizers, so the reported savings may not match the exact token count in your target model.
  - Fix: Treat API metrics as estimates. For precise cost calculations, run the normalized Markdown through the target model's tokenizer before inference.
- File Size and Timeout Risks
  - Explanation: Large files and complex documents take longer to process, which can lead to timeouts or degraded performance.
  - Fix: Check file sizes on the client and reject files above a reasonable threshold before uploading. Monitor conversion latency and set explicit timeouts on your fetch calls; `fetch` has no timeout option, so pass an `AbortController` signal instead.
- Security and Data Privacy
  - Explanation: Sending documents to a third-party API, even a free one, exposes their contents to external processing. Documents that are sensitive or contain PII should not be processed without review.
  - Fix: Implement data classification checks before calling the API. For sensitive data, use local conversion libraries or self-hosted instances. Never send credentials, personal data, or proprietary code to public endpoints.
- Assuming Universal Format Support
  - Explanation: Although the API supports 11 formats, specific variants or corrupted files may still fail conversion.
  - Fix: Handle conversion failures gracefully. Implement fallback logic, such as attempting plain text extraction or alerting the user when a file cannot be normalized.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Prototyping / MVP | Zero-Auth API | Fast integration, no setup, free tier sufficient | Zero |
| High-Volume RAG | Self-Hosted Parser | Avoid rate limits, full control over processing | Infrastructure costs |
| Sensitive Documents | Local Conversion | Data privacy, compliance requirements | Development time |
| Multi-Format Ingestion | Zero-Auth API | Supports 11 formats, reduces client complexity | Zero |
| Real-Time UI Updates | Client-Side Parsing | Lower latency for user interactions | Bundle size increase |
Configuration Template
Use this configuration object to manage normalization settings in your application.
```typescript
interface NormalizationConfig {
  baseUrl: string;
  timeoutMs: number;
  retryAttempts: number;
  retryDelayMs: number;
  maxFileSizeBytes: number;
  cacheEnabled: boolean;
  cacheTtlSeconds: number;
}

const defaultConfig: NormalizationConfig = {
  baseUrl: 'https://md.replyfast.co.uk',
  timeoutMs: 5000,
  retryAttempts: 2,
  retryDelayMs: 1000,
  maxFileSizeBytes: 10 * 1024 * 1024, // 10 MB
  cacheEnabled: true,
  cacheTtlSeconds: 3600,
};

// MarkdownNormalizer only consumes baseUrl; the remaining fields must be
// enforced by surrounding code (timeouts, retries, size checks, caching).
const normalizer = new MarkdownNormalizer(defaultConfig.baseUrl);
```
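The template passes only `baseUrl` to `MarkdownNormalizer`, and the class does not read the other fields. One way to honor `timeoutMs`, `retryAttempts`, `retryDelayMs`, and `maxFileSizeBytes` is a small wrapper like the one below. This is a sketch under stated assumptions: `fetchWithPolicy`, `assertFileSize`, `PolicyConfig`, and `FetchFn` are hypothetical names, retries use exponential backoff with full jitter, and only network errors, timeouts, 429, and 5xx responses are treated as retryable.

```typescript
interface PolicyConfig {
  timeoutMs: number;
  retryAttempts: number; // retries after the first attempt
  retryDelayMs: number; // base delay for exponential backoff
  maxFileSizeBytes: number;
}

type FetchFn = (input: string, init?: RequestInit) => Promise<Response>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Hypothetical guard: rejects oversized uploads before they leave the client. */
function assertFileSize(file: Blob, config: PolicyConfig): void {
  if (file.size > config.maxFileSizeBytes) {
    throw new Error(`File is ${file.size} bytes; limit is ${config.maxFileSizeBytes}.`);
  }
}

/**
 * Hypothetical fetch wrapper: per-attempt timeout (until response headers arrive)
 * plus retries with exponential backoff and full jitter.
 */
async function fetchWithPolicy(
  url: string,
  init: RequestInit,
  config: PolicyConfig,
  fetchFn: FetchFn = fetch,
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= config.retryAttempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), config.timeoutMs);
    try {
      const response = await fetchFn(url, { ...init, signal: controller.signal });
      // Only 429 and 5xx are retried; every other status is returned as-is.
      if (response.status !== 429 && response.status < 500) {
        return response;
      }
      lastError = new Error(`HTTP ${response.status}`);
      if (attempt === config.retryAttempts) {
        return response; // Out of retries: let the caller inspect the error response.
      }
    } catch (error) {
      lastError = error; // Network failure or timeout (abort).
    } finally {
      clearTimeout(timer);
    }
    if (attempt < config.retryAttempts) {
      // Full jitter: wait a random time up to retryDelayMs * 2^attempt.
      await sleep(Math.random() * config.retryDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Usage (assumes defaultConfig from the template above):
// assertFileSize(pdfBlob, defaultConfig);
// const res = await fetchWithPolicy(`${defaultConfig.baseUrl}/api/upload`,
//   { method: 'POST', body: formData }, defaultConfig);
```

A `FormData` body can be re-sent on each retry, but a one-shot stream body cannot, so rebuild stream-based requests per attempt. Because the timer is cleared once headers arrive, reading a large response body is not covered by `timeoutMs`.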
Quick Start Guide
- Initialize the Client: Import the `MarkdownNormalizer` class and instantiate it with the default base URL.
- Convert a URL: Call `convertUrl` with the target URL. The method returns a promise that resolves to the Markdown content and metrics.
- Upload a File: For local files, create a `File` or `Blob` object and call `convertFile` with the file and a filename.
- Process the Output: Extract the `markdown` field from the response and pass it to your LLM or storage system.
- Monitor Results: Check the `tokens` object in the response to track token savings, and spot-check the Markdown for conversions that may have lost content.
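The monitoring step can be made concrete with a small check over each response. This is a sketch: `checkConversion` and its `minMarkdownChars` threshold are hypothetical, and the checks are cheap heuristics that flag results for review rather than proof of a good conversion.

```typescript
interface ConversionMetrics {
  originalTokens: number;
  normalizedTokens: number;
  savingsPercent: number;
}

interface ConversionResponse {
  success: boolean;
  markdown: string;
  tokens: ConversionMetrics;
}

/** Hypothetical quality check: returns human-readable warnings (empty if none). */
function checkConversion(result: ConversionResponse, minMarkdownChars = 200): string[] {
  const warnings: string[] = [];
  const { originalTokens, normalizedTokens, savingsPercent } = result.tokens;
  const length = result.markdown.trim().length;

  if (length === 0) {
    warnings.push('Markdown is empty; the source may be an SPA shell or an unsupported file.');
  } else if (length < minMarkdownChars) {
    warnings.push(`Markdown is under ${minMarkdownChars} characters; verify the content was captured.`);
  }
  if (normalizedTokens >= originalTokens) {
    warnings.push('No token savings reported; normalization may not help for this source.');
  }
  if (savingsPercent < 0 || savingsPercent > 100) {
    warnings.push(`savingsPercent is out of range: ${savingsPercent}.`);
  }
  return warnings;
}
```

In practice, something like `checkConversion(await normalizer.convertUrl(url)).forEach((w) => console.warn(w))` would surface suspect conversions in logs without blocking the pipeline.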
This normalization strategy provides a low-friction, cost-effective way to prepare diverse content sources for LLM consumption. By offloading format conversion to a specialized API, developers can focus on building intelligent applications while keeping context windows clean and efficient.