eparates media handling from routing logic. A managed telephony provider captures audio, converts it to text, and forwards the transcript to a stateless webhook. The webhook invokes an LLM for intent classification, maps the result to a destination, and returns routing instructions to the media server.
Architecture Rationale
- Stateless Webhooks: Telephony sessions are ephemeral. Storing call state in application memory breaks horizontal scaling and introduces race conditions during failover. Each webhook must be self-contained.
- Managed Media Gateway: RTP streaming, echo cancellation, and audio buffering require specialized infrastructure. Delegating this to a provider reduces server memory footprint and eliminates WebRTC complexity.
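- LLM as Classifier: Instead of training custom NLP models, a lightweight LLM with a constrained system prompt acts as a near-deterministic router. Temperature is set to zero and output is restricted to a fixed set of labels. Zero temperature reduces output variation but does not fully eliminate it, so responses are still normalized before routing.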
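To make the stateless principle concrete, here is a minimal sketch in which the whole routing decision is a function of its inputs. The `CallPayload` and `Action` shapes and the injected `classify` function are hypothetical stand-ins for the provider payload and the LLM call shown in Step 2, not a documented provider schema.

```typescript
// Hypothetical shapes; they mirror the Step 2 webhook but are not a provider schema.
type Intent = 'sales' | 'support' | 'billing' | 'unknown';

interface CallPayload {
  callId: string;
  transcript: string;
}

type Action =
  | { type: 'talk'; text: string }
  | { type: 'transfer'; destination: string };

// The classifier and routing table are passed in; nothing is read from or
// written to module-level state, so any replica can serve any request.
async function handleCall(
  payload: CallPayload,
  classify: (transcript: string) => Promise<Intent>,
  routes: Readonly<Record<Intent, string>>,
): Promise<Action[]> {
  let intent: Intent;
  try {
    intent = await classify(payload.transcript);
  } catch {
    intent = 'unknown'; // degrade to reception rather than dropping the call
  }
  return [
    { type: 'talk', text: 'Routing your call. Please hold.' },
    { type: 'transfer', destination: routes[intent] },
  ];
}
```

Because `classify` is injected, the same function can be exercised with a stub in tests and with the OpenAI-backed classifier in production.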
Step 1: Provision Telephony Endpoint
Acquire a number and authenticate with the media provider.
```bash
# Sign up for an account and retrieve an access token
curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"username":"dev_team","password":"secure_pass","email":"infra@company.com"}'

# Export the returned token for subsequent requests
export MEDIA_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

# Provision a US number in the 415 area code
curl -X POST https://api.voipbin.net/v1.0/numbers \
  -H "Authorization: Bearer $MEDIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"country_code":"US","area_code":"415"}'
```
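The same provisioning call can be issued from TypeScript, for example in a setup script. This sketch uses only the endpoint and body fields from the curl command above. The `FetchLike` type and the injected `fetchFn` parameter are illustrative, so the function can be tested without network access, and the response is left as `unknown` because its schema is not shown here.

```typescript
// Minimal fetch-compatible signature so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const API_BASE = 'https://api.voipbin.net/v1.0';

// Mirrors `curl -X POST .../numbers` above and fails loudly on non-2xx responses.
async function provisionNumber(
  fetchFn: FetchLike,
  token: string,
  countryCode: string,
  areaCode: string,
): Promise<unknown> {
  if (!token) throw new Error('Access token is empty (is MEDIA_TOKEN set?)');
  const res = await fetchFn(`${API_BASE}/numbers`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ country_code: countryCode, area_code: areaCode }),
  });
  if (!res.ok) throw new Error(`Number provisioning failed with HTTP ${res.status}`);
  return res.json();
}
```

In Node 18 or later, the global `fetch` should be usable as `fetchFn`, e.g. `await provisionNumber(fetch, process.env.MEDIA_TOKEN ?? '', 'US', '415')`.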
Step 2: Stateless Webhook Implementation (TypeScript)
The webhook receives the STT transcript, classifies intent, and returns routing actions.
```typescript
import express, { Request, Response } from 'express';
import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config();

const app = express();
app.use(express.json());

// Bound inference time so a slow or unavailable model cannot leave the caller in silence.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY, timeout: 3000, maxRetries: 1 });

interface RoutingDestination {
  department: string;
  phoneNumber: string;
}

const ROUTING_MAP: Record<string, RoutingDestination> = {
  sales: { department: 'Sales', phoneNumber: '+14155550100' },
  support: { department: 'Technical Support', phoneNumber: '+14155550101' },
  billing: { department: 'Billing & Accounts', phoneNumber: '+14155550102' },
  unknown: { department: 'Main Reception', phoneNumber: '+14155550199' },
};

const EMERGENCY_NUMBER = '+14155550911';
const PRIORITY_SIGNALS = new Set(['urgent', 'outage', 'down', 'emergency', 'critical']);

// Whole-word match: "down" triggers on "the site is down" but not on "download".
function hasPrioritySignal(transcript: string): boolean {
  return transcript.toLowerCase().split(/[^a-z]+/).some((word) => PRIORITY_SIGNALS.has(word));
}

async function classifyIntent(transcript: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: 'You are a telephony routing classifier. Return exactly one word: sales, support, billing, or unknown. Do not add punctuation or explanations.'
      },
      { role: 'user', content: transcript }
    ],
    temperature: 0,
    max_tokens: 5
  });
  const rawOutput = response.choices[0]?.message?.content ?? '';
  // Normalize casing, whitespace, and stray punctuation before the map lookup.
  return rawOutput.trim().toLowerCase().replace(/[^a-z]/g, '');
}

app.post('/webhook/call-handler', async (req: Request, res: Response) => {
  try {
    // Payload path assumed for this provider; adjust it to the provider's webhook schema.
    const transcript = req.body?.speech?.results?.[0]?.text ?? '';
    if (!transcript) {
      res.status(400).json({ error: 'Missing transcript payload' });
      return;
    }

    // Priority bypass: skip the LLM entirely when a critical signal is present.
    if (hasPrioritySignal(transcript)) {
      res.json({
        actions: [
          { type: 'talk', text: 'Acknowledging urgent request. Routing to on-call engineering immediately.' },
          { type: 'transfer', destination: EMERGENCY_NUMBER }
        ]
      });
      return;
    }

    const intent = await classifyIntent(transcript);
    // Own-property check so a label such as "constructor" cannot resolve to Object.prototype.
    const destination = Object.prototype.hasOwnProperty.call(ROUTING_MAP, intent)
      ? ROUTING_MAP[intent]
      : ROUTING_MAP.unknown;
    res.json({
      actions: [
        { type: 'talk', text: `Routing you to ${destination.department}. Please hold.` },
        { type: 'transfer', destination: destination.phoneNumber }
      ]
    });
  } catch (error) {
    console.error('Routing failure:', error);
    // Graceful degradation: fall back to the main desk rather than dropping the call.
    res.json({
      actions: [
        { type: 'talk', text: 'Connecting you to our main reception desk.' },
        { type: 'transfer', destination: ROUTING_MAP.unknown.phoneNumber }
      ]
    });
  }
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Telephony webhook running on port ${PORT}`));
```
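The handler trusts `req.body` to have a specific shape, and a non-string `text` only surfaces as an exception that lands in the catch block. A small runtime guard makes that contract explicit. The payload path is the one the handler already reads and remains an assumption about the provider's schema; `extractTranscript` is a hypothetical helper name.

```typescript
// Assumed payload shape (same path the handler reads):
// { speech: { results: [{ text: string }] } }
// Returns the trimmed transcript, or null if the shape or content is unusable.
function extractTranscript(body: unknown): string | null {
  if (typeof body !== 'object' || body === null) return null;
  const speech = (body as { speech?: unknown }).speech;
  if (typeof speech !== 'object' || speech === null) return null;
  const results = (speech as { results?: unknown }).results;
  if (!Array.isArray(results) || results.length === 0) return null;
  const first: unknown = results[0];
  if (typeof first !== 'object' || first === null) return null;
  const text = (first as { text?: unknown }).text;
  if (typeof text !== 'string') return null;
  const trimmed = text.trim();
  return trimmed.length > 0 ? trimmed : null; // whitespace-only counts as missing
}
```

In the handler, `const transcript = extractTranscript(req.body);` followed by the existing 400 branch when it returns `null` would replace the optional-chaining lookup.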
Step 3: Connect the Provider to the Webhook
Configure the VoIPBin dashboard or API to route inbound calls to the webhook endpoint. The provider handles:
- RTP stream capture
- Real-time STT transcription
- TTS playback for confirmation messages
- Call transfer execution based on JSON response
Step 4: Multi-Turn Context Handling (Optional)
For ambiguous inputs, accumulate conversation turns before classification.
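```typescript
// Reuses the `openai` client from Step 2. Turns must arrive with the request
// (or from an external store) to keep the webhook stateless.
interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
}

async function classifyWithHistory(turns: ConversationTurn[]): Promise<string> {
  const messages = [
    {
      role: 'system' as const,
      // Same strict output contract as Step 2; a looser prompt invites replies such as
      // "Route to sales", which normalize to "routetosales" and miss the map.
      content: 'You are a telephony routing classifier. Read the whole conversation and return exactly one word: sales, support, billing, or unknown. Do not add punctuation or explanations.'
    },
    ...turns
  ];
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    temperature: 0,
    max_tokens: 5
  });
  const label = response.choices[0]?.message?.content?.trim().toLowerCase().replace(/[^a-z]/g, '');
  return label || 'unknown';
}
```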
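Because the webhook is stateless, the turn history has to arrive with each request (or from an external store), and it grows with every exchange. A hypothetical helper such as `buildClassifierMessages` caps that history so prompt size and latency stay bounded; the `maxTurns` default of 6 is an illustrative value, not a tuned one.

```typescript
interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
}

type ClassifierMessage = { role: 'system'; content: string } | ConversationTurn;

// Drops empty turns, keeps only the most recent `maxTurns`, and prepends the
// system prompt, producing plain { role, content } messages.
function buildClassifierMessages(
  systemPrompt: string,
  turns: readonly ConversationTurn[],
  maxTurns = 6,
): ClassifierMessage[] {
  const nonEmpty = turns.filter((turn) => turn.content.trim().length > 0);
  // slice(-0) would return every turn, so handle a zero or negative cap explicitly.
  const recent = maxTurns > 0 ? nonEmpty.slice(-maxTurns) : [];
  return [{ role: 'system', content: systemPrompt }, ...recent];
}
```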
Pitfall Guide
1. Unhandled LLM Output Drift
Explanation: Language models occasionally append punctuation, casing variations, or conversational filler despite strict prompts. Direct dictionary lookups fail when the output contains "sales." or "Sales".
Fix: Normalize output aggressively. Strip whitespace, convert to lowercase, and remove non-alphabetic characters before routing. Implement a fallback catch-all for unrecognized tokens.
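One way to implement that normalization, sketched with a hypothetical `normalizeIntent` helper, is to tokenize the completion and accept the first token that is a known label. This differs slightly from the Step 2 code, which deletes every non-letter character: deletion turns "sales department" into "salesdepartment", while tokenizing recovers `sales`. Taking the first matching token is a heuristic; a reply such as "not sales" would still map to `sales`, which is one reason the prompt forbids extra words.

```typescript
const INTENTS = ['sales', 'support', 'billing', 'unknown'] as const;
type Intent = (typeof INTENTS)[number];

function isIntent(value: string): value is Intent {
  return (INTENTS as readonly string[]).includes(value);
}

// Lowercases the raw completion, splits it on anything that is not a letter,
// and returns the first token that is a known label. Everything else,
// including empty or null output, falls back to 'unknown'.
function normalizeIntent(raw: string | null | undefined): Intent {
  const tokens = (raw ?? '').toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    if (isIntent(token)) return token;
  }
  return 'unknown';
}
```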
2. Implicit State Assumptions in Webhooks
Explanation: Developers often store call context in server memory or local variables between webhook invocations. This breaks horizontal scaling and causes routing failures during load balancer rotation.
Fix: Treat every webhook request as independent. Pass all necessary context (transcript, call ID, previous turns) in the payload. Use external state stores (Redis, DynamoDB) only if multi-turn persistence is required.
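When multi-turn persistence is needed, the store can sit behind an interface keyed by call ID so the webhook itself holds nothing between requests. The `TurnStore` interface below is hypothetical; a production implementation would wrap a Redis or DynamoDB client. The in-memory class exists only to exercise the logic locally and would reintroduce exactly the problem described above if deployed behind a load balancer.

```typescript
interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
}

// Hypothetical storage contract, keyed by the provider's call ID.
interface TurnStore {
  get(callId: string): Promise<ConversationTurn[]>;
  append(callId: string, turn: ConversationTurn): Promise<void>;
}

// Local-testing stand-in with a per-call TTL. Not for multi-instance deployments.
class InMemoryTurnStore implements TurnStore {
  private readonly entries = new Map<string, { turns: ConversationTurn[]; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(callId: string): Promise<ConversationTurn[]> {
    const entry = this.entries.get(callId);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(callId); // expired calls leave no residue
      return [];
    }
    return [...entry.turns]; // copy so callers cannot mutate stored history
  }

  async append(callId: string, turn: ConversationTurn): Promise<void> {
    const turns = await this.get(callId);
    turns.push(turn);
    // Each append refreshes the TTL, so active calls stay alive.
    this.entries.set(callId, { turns, expiresAt: this.now() + this.ttlMs });
  }
}

// The webhook reads and writes history only through the interface.
async function recordUserTurn(store: TurnStore, callId: string, transcript: string): Promise<ConversationTurn[]> {
  await store.append(callId, { role: 'user', content: transcript });
  return store.get(callId);
}
```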
3. Ignoring Audio Latency Feedback Loops
Explanation: STT and LLM inference can add roughly 1-2 seconds of silence. Callers may interpret silence as a dropped call and hang up before routing completes.
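Fix: Return a talk action that confirms receipt of input, and keep the webhook's response time short. Use brief, reassuring phrases like "Processing your request" or "Routing you now". Note that a talk action in the webhook response only plays once the response arrives, so it masks classification latency only if the provider can also play a holding prompt before the webhook is invoked; check whether your provider supports this.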
4. Single-Turn Classification Overconfidence
Explanation: Callers often provide fragmented or vague statements on the first attempt. Classifying immediately on incomplete input increases misrouting rates.
Fix: Implement a confidence threshold or multi-turn accumulation. If the transcript is under five words or highly ambiguous, prompt the caller for clarification before invoking the LLM.
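A word-count gate like this can run before any inference. In the sketch below the five-word threshold comes from the text; the re-prompt wording, the `attempt` counter, and the `maxAttempts` cap are assumptions. Because the webhook is stateless, the attempt count would have to travel in the request payload or an external store, and the flow assumes the provider re-invokes the webhook with the caller's next utterance after the talk action plays.

```typescript
type Action =
  | { type: 'talk'; text: string }
  | { type: 'transfer'; destination: string };

// Threshold taken from the guidance above; tune it against real call transcripts.
const MIN_WORDS = 5;

function countWords(transcript: string): number {
  return transcript.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

// Returns a re-prompt when the transcript is too short to classify, or null
// when the caller said enough (or has already been re-prompted enough times)
// and the LLM classifier should run.
function clarificationPrompt(transcript: string, attempt: number, maxAttempts = 2): Action[] | null {
  if (countWords(transcript) >= MIN_WORDS) return null;
  if (attempt >= maxAttempts) return null; // avoid looping the caller indefinitely
  return [{ type: 'talk', text: 'Could you tell me a little more about what you need help with?' }];
}
```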
5. Hardcoded Destination Maps
Explanation: Embedding phone numbers directly in routing logic creates deployment friction. Changing a department number requires code changes, testing, and redeployment.
Fix: Externalize routing tables to environment variables, configuration files, or a lightweight database. Load mappings at startup or cache them with TTL-based invalidation.
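The TTL-based option can be sketched as a small cache around an injected loader. `CachedRoutingTable`, its `load` callback (which might read a JSON file such as the configuration template in this section, or query a database), and the choice to keep serving the previous table when a reload fails are all assumptions of this sketch.

```typescript
type RoutingTable = Record<string, string>;

// Serves lookups from memory and reloads the table once it is older than ttlMs.
class CachedRoutingTable {
  private table: RoutingTable | null = null;
  private loadedAt = 0;
  private readonly load: () => Promise<RoutingTable>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<RoutingTable>, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async resolve(department: string, fallbackKey = 'fallback'): Promise<string | undefined> {
    if (this.table === null || this.now() - this.loadedAt >= this.ttlMs) {
      try {
        this.table = await this.load();
        this.loadedAt = this.now();
      } catch (err) {
        // Keep serving the stale table if one exists; fail only on the first load.
        if (this.table === null) throw err;
      }
    }
    const table = this.table;
    if (table === null) throw new Error('Routing table unavailable');
    return Object.prototype.hasOwnProperty.call(table, department) ? table[department] : table[fallbackKey];
  }
}
```

Because a failed reload leaves `loadedAt` unchanged, the next lookup retries the load, so a recovered config store is picked up without a restart.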
6. Missing Circuit Breakers for LLM Outages
Explanation: OpenAI API rate limits, network partitions, or model degradation can cause classification timeouts. Without fallback logic, calls hang or drop.
Fix: Implement request timeouts (e.g., 3 seconds), retry logic with exponential backoff, and a synchronous fallback route. If the LLM fails, route directly to a human operator or main desk.
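The timeout, backoff, and fallback steps compose as wrappers around the classification call. `withTimeout`, `retryWithBackoff`, and `classifyOrFallback` are hypothetical helpers, and the defaults (3000 ms timeout, one retry, 200 ms base delay) are illustrative. `withTimeout` stops waiting but does not cancel the underlying HTTP request; the OpenAI Node SDK also accepts its own `timeout` and `maxRetries` client options, which is the simpler route when only that call needs protecting.

```typescript
// Rejects if `task` has not settled within `ms`. The underlying work keeps
// running; this only stops the caller from waiting on it.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Retries `fn` up to `retries` extra times, doubling the delay after each failure.
async function retryWithBackoff<T>(fn: () => Promise<T>, retries: number, baseDelayMs: number): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(() => resolve(), delayMs));
    }
  }
}

// Any timeout or error after the last retry resolves to the fallback label,
// so the caller is always routed somewhere.
async function classifyOrFallback(
  classify: () => Promise<string>,
  fallback: string,
  { timeoutMs = 3000, retries = 1, baseDelayMs = 200 } = {},
): Promise<string> {
  try {
    return await retryWithBackoff(() => withTimeout(classify(), timeoutMs), retries, baseDelayMs);
  } catch {
    return fallback;
  }
}
```

Worst-case latency is roughly `(retries + 1) * timeoutMs` plus the backoff delays, about 6.2 seconds with these defaults, so on a live call the retry budget should stay small.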
7. Priority Keyword Collision
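Explanation: Relying solely on LLM classification for urgent requests adds unnecessary latency; a caller saying "system is down" should bypass inference entirely. The keyword check can itself collide, though: naive substring matching treats "download" or "downgrade" as containing "down" and sends routine calls to the on-call line.
Fix: Run keyword detection synchronously before LLM invocation, matching whole words with a pre-compiled regex or a Set lookup over tokenized words. This takes inference out of the critical path, so emergency routing latency can stay under 50 ms.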
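The pre-compiled regex variant might look like the following sketch. The signal list matches the Step 2 code; `compilePrioritySignals` and `isPriority` are hypothetical names. Word-boundary anchors avoid substring collisions such as "down" inside "download", and the pattern deliberately has no `g` flag, because `RegExp.prototype.test` on a global regex keeps `lastIndex` state between calls.

```typescript
// Escapes regex metacharacters so each signal is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds one case-insensitive, whole-word pattern from a list of plain-word
// signals. Compile it once at startup and reuse it for every request.
function compilePrioritySignals(signals: readonly string[]): RegExp {
  if (signals.length === 0) return /(?!)/; // never matches
  const alternation = signals.map(escapeRegExp).join('|');
  return new RegExp(`\\b(?:${alternation})\\b`, 'i');
}

const PRIORITY_PATTERN = compilePrioritySignals(['urgent', 'outage', 'down', 'emergency', 'critical']);

function isPriority(transcript: string): boolean {
  return PRIORITY_PATTERN.test(transcript);
}
```

Whole-word matching still cannot read negation: "this is not urgent" is routed as a priority call. Favoring false positives over missed emergencies may be an acceptable trade-off for an on-call bypass.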
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume inbound support | LLM Intent Router | Handles natural language, reduces menu navigation time, scales horizontally | Moderate (LLM inference costs offset by reduced agent handle time) |
| Regulated/Compliance-heavy routing | Hybrid DTMF + LLM | DTMF ensures auditability; LLM handles post-authentication intent | Low-Moderate (dual infrastructure, but compliance-safe) |
| Low-budget/internal routing | Static DTMF Tree | Zero inference costs, predictable latency, simple maintenance | Minimal (audio recording + telephony fees only) |
| Multilingual customer base | LLM Intent Router | Native language support without tree duplication or audio re-recording | Moderate (single classifier replaces N language trees) |
| Emergency/On-call escalation | Keyword Bypass + Direct Transfer | Sub-50ms routing decision, no inference dependency | Minimal (bypasses LLM entirely) |
Configuration Template
```json
{
  "routing": {
    "departments": {
      "sales": "+14155550100",
      "support": "+14155550101",
      "billing": "+14155550102",
      "fallback": "+14155550199",
      "emergency": "+14155550911"
    },
    "priority_signals": ["urgent", "outage", "down", "emergency", "critical", "production"],
    "llm": {
      "model": "gpt-4o-mini",
      "temperature": 0,
      "max_tokens": 5,
      "timeout_ms": 3000,
      "system_prompt": "You are a telephony routing classifier. Return exactly one word: sales, support, billing, or unknown. Do not add punctuation or explanations."
    },
    "fallback": {
      "on_timeout": true,
      "on_error": true,
      "destination": "fallback"
    }
  }
}
```
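Loading this template at startup and validating it there turns a typo in a phone number into a deploy-time error instead of a failed transfer mid-call. The sketch validates only a subset of fields; `parseRoutingConfig` and `destinationFor` are hypothetical helpers, and mapping the classifier's `unknown` label onto the `fallback` entry is an assumption that bridges the template's key names with the labels used in Step 2.

```typescript
interface RoutingConfig {
  routing: {
    departments: Record<string, string>;
    priority_signals: string[];
    llm: { model: string; temperature: number; max_tokens: number; timeout_ms: number; system_prompt: string };
    fallback: { on_timeout: boolean; on_error: boolean; destination: string };
  };
}

// E.164: '+', a non-zero leading digit, at most 15 digits in total.
const E164 = /^\+[1-9]\d{1,14}$/;

// Parses the JSON template and checks the fields routing depends on.
// Throws with a descriptive message so startup fails loudly.
function parseRoutingConfig(json: string): RoutingConfig {
  const cfg = JSON.parse(json) as Partial<RoutingConfig> | null;
  const routing = cfg?.routing;
  if (!routing || typeof routing.departments !== 'object' || routing.departments === null) {
    throw new Error('routing.departments is required');
  }
  for (const [name, phone] of Object.entries(routing.departments)) {
    if (typeof phone !== 'string' || !E164.test(phone)) {
      throw new Error(`routing.departments.${name} is not an E.164 number`);
    }
  }
  const fallbackKey = routing.fallback?.destination;
  if (typeof fallbackKey !== 'string' || !Object.prototype.hasOwnProperty.call(routing.departments, fallbackKey)) {
    throw new Error('routing.fallback.destination must name a department');
  }
  if (!Array.isArray(routing.priority_signals) || !routing.priority_signals.every((s) => typeof s === 'string')) {
    throw new Error('routing.priority_signals must be an array of strings');
  }
  return cfg as RoutingConfig;
}

// Maps a classifier label to a phone number; unrecognized labels (including
// 'unknown') resolve to the configured fallback department.
function destinationFor(cfg: RoutingConfig, intent: string): string {
  const { departments, fallback } = cfg.routing;
  return Object.prototype.hasOwnProperty.call(departments, intent)
    ? departments[intent]
    : departments[fallback.destination];
}
```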
Quick Start Guide
- Initialize Project: `npm init -y && npm install express openai dotenv @types/express @types/node typescript ts-node`
- Configure Environment: Create `.env` with `OPENAI_API_KEY`, `MEDIA_TOKEN`, and `PORT=3000`.
- Run Locally: Save the Step 2 webhook as `src/server.ts`, then run `npx ts-node src/server.ts`.
- Expose Webhook: `npx localtunnel --port 3000` (copy the public URL).
- Wire Telephony: Set the VoIPBin inbound webhook to `<tunnel-url>/webhook/call-handler`, dial the provisioned number, and verify intent-based routing.