ElevenLabs Cloud Voice Agent with Asterisk SIP Integration

By Codcompass Team·2026-05-07·6 min read

Current Situation Analysis

Building a production-grade AI voice agent traditionally requires stitching together four independent components: a streaming STT engine (e.g., Deepgram), an LLM (e.g., Groq), a TTS engine (e.g., Cartesia), and a custom Python AudioSocket server to manage real-time audio routing, barge-in detection, and format conversion. This local stack delivers sub-250ms latency and full component control, but introduces severe operational friction:

High Development Overhead: Requires custom audio pipeline engineering, WebSocket management, and state synchronization across services.
Maintenance Drift: Version incompatibilities between STT/LLM/TTS APIs, codec mismatches, and dependency updates demand continuous DevOps attention.
Infrastructure Scaling Costs: Handling concurrent calls requires load-balanced media servers, RTP relay management, and dedicated GPU/CPU resources for inference.
SIP Integration Complexity: Bridging telephony networks to AI pipelines often requires custom gateways, transcoding layers, and fragile dialplan logic.

For organizations routing moderate call volumes (hundreds/day) through existing PBX infrastructure, the local approach is over-engineered. The failure mode typically manifests as delayed time-to-market, unmanaged latency spikes from transcoding, and unsustainable operational overhead. A managed cloud conversational AI platform eliminates the audio socket layer, bundles STT/LLM/TTS into a single SIP-native endpoint, and delegates real-time media handling to the provider, allowing teams to focus on business logic and dynamic context injection instead of pipeline plumbing.

WOW Moment: Key Findings

Approach	Setup Time	Avg. Latency	Maintenance Overhead	Cost per 1k Calls	Dev Effort (Person-Days)
Local Stack (AudioSocket + Deepgram + Groq + Cartesia)	14–21 days	~250ms	High (4+ services, codec tuning, barge-in logic)	$12–18	40–60
ElevenLabs Cloud (Managed ConvAI + SIP Trunk)	2–4 hours	~300ms	Near-zero (managed STT/LLM/TTS, native G711 ulaw)	$25–35	2–5

Key Findings:

Single-Agent Multi-Brand Deployment: Dynamic context injection via webhook tools allows one agent configuration to serve dozens of company brands by resolving DID-specific variables (company_name, trade_type, callout_fee) at call start.
Zero Transcoding Overhead: Native ulaw_8000 support on both ASR and TTS sides eliminates media conversion bottlenecks and preserves telephony-grade audio fidelity.
SIP-Native Routing: Direct SIP trunking to sip.rtc.elevenlabs.io:5060 bypasses custom audio sockets, leveraging standard Asterisk dialplan logic for overflow, failover, and DID-based routing.
Tool-Driven State Management: Server tools act as deterministic hooks for business logic (context lookup, booking creation), keeping the LLM focused on conversation flow while backend systems handle data persistence.

Core Solution

Architecture Overview

                    TELEPHONE NETWORK
                          |
                     SIP Trunk (inbound)
                          |
              +-----------v-----------+
              |      ASTERISK PBX      |
              |                        |
              |  DID arrives           |
              |  ViciDial inbound      |
              |  group processes it    |
              |                        |
              |  If no agents free:    |
              |  overflow to           |
              |  elevenlabs_ai ext     |
              |                        |
              |  Dial(SIP/elevenlabs/  |
              |    ${DID},120,tT)      |
              +-----------+------------+
                          |
                   SIP INVITE (G711 ulaw)
                   To: DID@sip.rtc.elevenlabs.io
                   From: CallerID
                          |
              +-----------v------------+
              |   ELEVENLABS CLOUD     |
              |                        |
              |  1. Agent starts       |
              |  2. Calls getCallCtx   |----> YOUR WEBHOOK SERVER
              |     webhook            |<---- { company, trade, fee }
              |  3. Greets caller      |
              |  4. Conversation...    |
              |  5. Calls createBook   |----> YOUR WEBHOOK SERVER
              |     webhook            |<---- { booking_id: 42 }
              |  6. Confirms & hangs   |
              |     up                 |
              +------------------------+
                                             YOUR SERVER
                                        +-------------------+
                                        | Webhook Endpoints |
                                        |                   |
                                        | /did_context.php  |
                                        |   - Lookup DID    |
                                        |   - Check repeat  |
                                        |   - Return context|
                                        |                   |
                                        | /create_booking   |
                                        |   .php            |
                                        |   - Validate data |
                                        |   - INSERT into   |
                                        |     bookings DB   |
                                        |   - Return ID     |
                                        +-------------------+
                                               |
                                        +------v------+
                                        |   MariaDB   |
                                        | did_company |
                                        |   _map      |
                                        | ai_agent    |
                                        |   _bookings |
                                        +-------------+

Key Concepts

Server Tools (Webhooks): HTTP endpoints attached to the agent. The LLM invokes them based on tool descriptions and conversation state. Responses populate dynamic variables used throughout the call.
Dynamic Variables: Webhook outputs (company_name, callout_fee) become scoped variables. A single agent serves multiple brands by resolving context per DID.
SIP Trunk: ElevenLabs exposes sip.rtc.elevenlabs.io:5060. Asterisk routes calls as a SIP peer. DID and CallerID flow through SIP headers, enabling context-aware routing.
G711 ulaw at 8kHz: Standard telephony codec. Native ulaw_8000 support removes transcoding overhead and ensures compatibility with legacy PBX infrastructure.
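
To make the per-DID resolution concrete, here is a minimal Python sketch of the lookup a context webhook might perform. It is an illustration under assumptions, not the article's actual /did_context.php: the DID_MAP dictionary stands in for the did_company_map table, and the function name, sample numbers, brands, and the is_repeat_customer key are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BrandContext:
    company_name: str
    trade_type: str
    callout_fee: str


# Hypothetical in-memory stand-in for the did_company_map table.
DID_MAP = {
    "15550100": BrandContext("Acme Plumbing", "plumbing", "89"),
    "15550101": BrandContext("Bright Spark Electric", "electrical", "99"),
}


def resolve_call_context(did, caller_id, did_map, repeat_callers):
    """Return the per-call variables for one DID, or None if the DID is unmapped."""
    brand = did_map.get(did)
    if brand is None:
        return None
    # Build a fresh dict on every call so nothing is shared between calls.
    context = asdict(brand)
    context["is_repeat_customer"] = caller_id in repeat_callers
    return context


if __name__ == "__main__":
    print(resolve_call_context("15550100", "15550199", DID_MAP, {"15550199"}))
    print(resolve_call_context("15559999", "15550199", DID_MAP, set()))
```

Returning None for an unmapped DID leaves the caller free to decide on a fallback, such as a generic greeting or a transfer back to the queue.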

ElevenLabs Account and API Setup

# Store your API key (never commit this to git)
export EL_API_KEY="YOUR_EL_API_KEY"

# List available voices via API
curl -s "https://api.elevenlabs.io/v1/voices" \
  -H "xi-api-key: ${EL_API_KEY}" | python3 -m json.tool | head -50

Key API Endpoints

Endpoint	Method	Purpose
`/tools`	POST	Create a server tool (webhook)
`/agents/create`	POST	Create a new agent
`/agents/{id}`	PATCH	Update an existing agent
`/agents/{id}`	GET	Get agent details
`/phone-numbers`	POST	Import a phone number for SIP
`/conversations`	GET	List past conversations

Creating Webhook Server Tools

Server tools enable deterministic backend interaction. Each tool is an HTTP endpoint that receives structured parameters and returns JSON.
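
As one illustration of an endpoint that receives structured parameters and returns JSON, the sketch below shows how a booking handler could validate a tool call before persisting it. The field names, status codes, and the insert_booking callback are assumptions made for this example; the real parameter schema is whatever you declare in the tool definition, and the real handler in this article is a PHP script backed by MariaDB.

```python
import json
import re

# Hypothetical parameter names; use the ones declared in your tool definition.
REQUIRED_FIELDS = ("did", "caller_id", "customer_name", "address", "preferred_time")
PHONE_PATTERN = re.compile(r"\+?\d{7,15}")


def validate_booking(payload):
    """Return a list of problems; an empty list means the payload is usable."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    caller_id = payload.get("caller_id")
    if isinstance(caller_id, str) and caller_id.strip():
        if not PHONE_PATTERN.fullmatch(caller_id.strip()):
            errors.append("caller_id must be 7-15 digits, optionally prefixed with +")
    return errors


def handle_create_booking(raw_body, insert_booking):
    """Parse and validate a tool call body, then return (status, json_text)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})
    errors = validate_booking(payload)
    if errors:
        return 422, json.dumps({"error": "validation failed", "details": errors})
    # insert_booking is supplied by the caller and returns the new row ID.
    booking_id = insert_booking(payload)
    return 200, json.dumps({"booking_id": booking_id})
```

Passing the database write in as a callback keeps the validation logic testable without a live database; in production it would wrap a parameterized INSERT.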

Tool 1: getCallContext. Invoked at conversation start. Accepts DID and CallerID; returns company context, trade type, callout fee, and a repeat-customer flag.

#!/bin/bash
# Create the getCallContext server tool via ElevenLabs API

# Fail fast if the key was not exported beforehand
EL_API_KEY="${EL_API_KEY:?Set EL_API_KEY in the environment first}"
EL_BASE="https://api.elevenlabs.io/v1/convai"
WEBHOOK_HOST="https://YOUR_SERVER_DOMAIN"
WEBHOOK_API_KEY="YOUR_WEBHOOK_API_KEY"

python3 -c "
import json,

Asterisk SIP Trunk & Dialplan Routing

Configure Asterisk to route overflow calls to ElevenLabs via SIP:

SIP Peer: Point to sip.rtc.elevenlabs.io:5060 with ulaw codec priority.
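Dialplan Logic: Use Dial(SIP/elevenlabs/${DID},120,tT) to pass the DID dynamically, with a 120-second ring timeout. The t and T options allow the called and calling party, respectively, to transfer the call using the DTMF sequence defined in features.conf.
Context Injection: The DID carried in the SIP To: header is passed to the getCallContext webhook at call start, which resolves brand-specific variables before the agent speaks.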

Agent Creation & Prompt Engineering

Define system prompt with explicit tool invocation rules.
Map dynamic variables (${company_name}, ${trade_type}) to webhook responses.
Enforce strict conversation workflows: greeting → context validation → objection handling → booking creation → confirmation.
Use ElevenLabs API to POST agent configuration, attach tools, and provision SIP phone numbers.
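
A mismatch between the placeholders in the prompt and the keys the webhook returns is easy to miss until a live call. The sketch below is a local pre-flight check, assuming the `${name}` placeholder notation used in this article; the platform performs the real substitution itself, so adjust the pattern to whatever syntax your agent configuration expects. The function names are hypothetical.

```python
import re

# Matches ${name} placeholders as written in this article.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_variables(prompt, context):
    """Return placeholder names used in the prompt that the context lacks."""
    missing = []
    for name in PLACEHOLDER.findall(prompt):
        if name not in context and name not in missing:
            missing.append(name)
    return missing


def render_prompt(prompt, context):
    """Substitute every placeholder, refusing to render if any is unresolved."""
    missing = missing_variables(prompt, context)
    if missing:
        raise KeyError("unresolved variables: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), prompt)


if __name__ == "__main__":
    prompt = (
        "You answer calls for ${company_name}, a ${trade_type} business. "
        "The callout fee is ${callout_fee}."
    )
    print(missing_variables(prompt, {"company_name": "Acme Plumbing"}))
```

Running this against a sample webhook response for each DID before deployment catches a brand whose row is missing a column.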

Pitfall Guide

Webhook Security & Validation: Exposing endpoints without API key validation, IP whitelisting, or rate limiting invites abuse and DDoS. Always verify Authorization headers and implement request signing.
SIP Codec Mismatch: Forcing linear PCM, G722, or any other codec instead of G711 ulaw triggers transcoding overhead, increases latency, and degrades telephony audio quality. Enforce ulaw_8000 in both Asterisk and ElevenLabs SIP profiles.
Dynamic Variable Scope Leakage: Failing to reset or namespace context variables between calls causes cross-call data contamination. Ensure webhooks return fresh payloads per SIP INVITE and avoid caching DID mappings without TTL expiration.
Vague Tool Descriptions: Imprecise tool documentation leads to premature or missed webhook invocations. Provide explicit parameter schemas, return types, and invocation triggers in the tool definition.
Unbounded Concurrency & Cost Blindness: Without call duration limits, overflow thresholds, or fallback routing, concurrent call spikes can rapidly escalate cloud AI costs. Cap call length in Asterisk (for example with the Dial() L() or S() options, or Set(TIMEOUT(absolute)=...)), and add queue limits and human-agent handoff rules.
Missing Conversation State Logging: Failing to log webhook requests/responses and ElevenLabs conversation IDs makes debugging failed bookings or hallucinated responses impossible. Centralize logs with correlation IDs tied to SIP Call-ID.
Ignoring Barge-In & Silence Handling: Cloud agents handle barge-in natively, but it still needs testing: interruption and silence behavior can feel unnatural on noisy lines, and misconfigured stability or similarity_boost voice parameters can make speech sound robotic or inconsistent. Tune these settings per use case and test with real telephony noise profiles.
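
For the webhook security item, the following sketch shows two checks a handler could run before touching the database: a constant-time API key comparison and an HMAC-SHA256 body signature. The Bearer scheme and the idea that the caller sends a hex signature alongside the body are assumptions for illustration; use whatever header names and signing scheme your tool configuration actually sends.

```python
import hashlib
import hmac


def api_key_is_valid(headers, expected_key):
    """Check an 'Authorization: Bearer <key>' header in constant time."""
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False
    token = supplied[len(prefix):]
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), expected_key.encode())


def sign_body(body, secret):
    """Return the hex HMAC-SHA256 of the raw request body (both as bytes)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def signature_is_valid(body, supplied_signature, secret):
    """Recompute the signature over the raw body and compare in constant time."""
    expected = sign_body(body, secret)
    return hmac.compare_digest(expected.encode(), supplied_signature.encode())


if __name__ == "__main__":
    secret = b"example-signing-secret"
    body = b'{"did": "15550100", "caller_id": "15550199"}'
    signature = sign_body(body, secret)
    print(signature_is_valid(body, signature, secret))
    print(api_key_is_valid({"Authorization": "Bearer wrong"}, "YOUR_WEBHOOK_API_KEY"))
```

Verify the signature over the raw bytes as received, before JSON parsing, since re-serializing the payload can change whitespace and break the comparison.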

Deliverables

Architecture Blueprint: Complete SIP-to-Webhook data flow diagram, component interaction map, and state management sequence for multi-brand dynamic context injection.
Deployment Checklist: Pre-flight validation steps covering API key provisioning, SIP trunk registration, codec alignment, webhook TLS validation, database schema initialization, and dialplan syntax verification.
Configuration Templates:
- Asterisk sip.conf & extensions.conf snippets for ElevenLabs SIP peer and DID-based overflow routing
- Webhook server stubs (/did_context.php, /create_booking.php) with parameter validation and MariaDB insertion logic
- ElevenLabs agent JSON payload template with tool attachments, dynamic variable mappings, and voice configuration
- Database schema DDL for did_company_map and ai_agent_bookings tables
