r real-time audio streaming.
- Direct provider billing eliminates platform markup, reducing per-minute costs by 40-60% compared to commercial alternatives.
- Plugin-based architecture abstracts VAD and codec transmuxing, cutting deployment time from weeks to hours while preserving full infrastructure control.
Sweet Spot: Ideal for Python developers and telecom engineers deploying production voice agents who require predictable unit economics, native interruption handling, and zero-middleman architecture.
Core Solution
Siphon bridges traditional SIP telephony with modern AI media pipelines by abstracting WebRTC negotiation, VAD, and SIP signaling into a unified Python framework. The architecture routes inbound/outbound SIP trunks directly to LiveKit rooms, where audio is processed through pluggable STT, LLM, and TTS modules. State management and barge-in detection are handled natively by LiveKit's media engine, eliminating custom polling loops.
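To make the "pluggable STT, LLM, and TTS" idea concrete, here is a minimal sketch of one conversational turn flowing through three interchangeable stages. It is not Siphon's code: the `Pipeline` class, the three `Protocol` interfaces, and the stub plugins are all hypothetical names invented for illustration, and a real media pipeline streams audio frames rather than handling whole byte strings.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical plugin interfaces; Siphon's real plugin classes may look different.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Pipeline:
    """One conversational turn: caller audio in, agent audio out."""

    stt: STT
    llm: LLM
    tts: TTS

    def turn(self, caller_audio: bytes) -> bytes:
        text = self.stt.transcribe(caller_audio)  # speech -> text
        answer = self.llm.reply(text)             # text -> response text
        return self.tts.synthesize(answer)        # response text -> speech


# Stub plugins so the sketch runs without any provider account.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class CannedLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = Pipeline(stt=EchoSTT(), llm=CannedLLM(), tts=BytesTTS())
```

Because each stage only has to satisfy a small interface, swapping one provider for another leaves the rest of the pipeline untouched, which is the property the plugin architecture is after.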
Prerequisites
- Python 3.10+
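- A Twilio or Telnyx SIP trunk
- LiveKit credentials (URL, API key, and API secret)
- An OpenAI API key, plus Deepgram (STT) and Cartesia (TTS) keys for the plugins used in this walkthrough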
Step 1: Installation & Setup
First, install the Siphon package with pip.
pip install siphon-ai
Next, create a .env file in your project root to hold your provider keys.
Because Siphon is self-hosted, you pay providers like OpenAI and LiveKit directly, with no middleman fees.
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_key
LIVEKIT_API_SECRET=your_livekit_secret
OPENAI_API_KEY=sk-yourkey
DEEPGRAM_API_KEY=yourkey
FROM_NUMBER=+15551234567
SIP_USERNAME=your_sip_user
SIP_PASSWORD=your_sip_pass
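A missing or blank variable tends to surface later as a confusing authentication error, so it can help to fail fast at startup. The sketch below checks the variable names from the .env file above; the helper names `missing_env_vars` and `check_env` are made up for this example and are not part of Siphon.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "FROM_NUMBER",
    "SIP_USERNAME",
    "SIP_PASSWORD",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not (env.get(name) or "").strip()]


def check_env(env=None):
    """Raise RuntimeError listing every missing variable, if any."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_env()` right after `load_dotenv()` would report every missing key in one message instead of failing on the first provider call.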
Step 2: Defining the Agent
Siphon abstracts away the complex WebRTC media pipelines and Voice Activity Detection (VAD).
You just need to define how your agent behaves using Siphon's plugin architecture.
from siphon.agent import Agent
from siphon.plugins import openai, cartesia, deepgram

# Define the Agent
agent = Agent(
    agent_name="Receptionist",
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    system_instructions="You are a helpful dental receptionist. Help the user book an appointment.",
)
Step 3: Triggering an Outbound Call
Siphon keeps outbound SIP signaling simple. If you don't already have a trunk ID set up, you can trigger a call programmatically with your SIP credentials, and Siphon will reuse an existing outbound trunk or create one.
import os

from dotenv import load_dotenv
from siphon.telephony.outbound import Call

# Load the provider keys and SIP credentials from .env
load_dotenv()

# Instantiate the outbound dialing sequence with SIP credentials
call = Call(
    agent_name="Receptionist",
    sip_trunk_setup={
        "name": "telnyx-primary",
        "sip_address": "sip.telnyx.com",
        "sip_number": os.getenv("FROM_NUMBER"),
        "sip_username": os.getenv("SIP_USERNAME"),
        "sip_password": os.getenv("SIP_PASSWORD"),
    },
    # Placeholder destination; replace with the full E.164 number you want to dial
    number_to_call="+15550200",
)

# Execute the asynchronous dial and bridge to the LiveKit WebRTC room
call.start()
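SIP providers generally expect the destination in E.164 form, so a small guard in front of the dial step can catch formatting mistakes before any call is placed. This is a sketch that assumes syntactic validation is enough for your use case; `is_e164` and `normalize_number` are hypothetical helpers, not part of Siphon, and they do not check that a number is actually assigned or long enough for a given country.

```python
import re

# E.164 syntax: a leading "+", a non-zero first digit, at most 15 digits overall.
_E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """True if the string is syntactically valid E.164."""
    return _E164.fullmatch(number) is not None


def normalize_number(number: str) -> str:
    """Strip common separators, then require valid E.164 syntax."""
    cleaned = re.sub(r"[\s\-().]", "", number)
    if not is_e164(cleaned):
        raise ValueError(f"Not a valid E.164 number: {number!r}")
    return cleaned
```

Passing `normalize_number(user_input)` as the destination would reject values such as a number without the leading `+` before the trunk ever sees them.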
Step 4: Handling State and Interruptions
One of the hardest things to build in Voice AI is handling interruptions (barge-ins).
Because Siphon uses LiveKit's WebRTC engine natively, it halts TTS output as soon as it detects human speech. Run your script, and you will have a natural, low-latency conversation with your AI, hosted entirely on your own infrastructure.
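To see what barge-in handling amounts to, the sketch below races a simulated TTS playback task against a "speech detected" event and cancels playback when the caller speaks first. It illustrates the general pattern with plain asyncio only; it is not how Siphon or LiveKit implement it, and `speak`, `run_turn`, and `demo` are names invented for this example.

```python
import asyncio


async def speak(chunks, played):
    """Simulated TTS playback: emits one chunk per tick until cancelled."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.01)


async def run_turn(chunks, speech_detected):
    """Play chunks, stopping early if speech_detected is set during playback."""
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    barge_in = asyncio.create_task(speech_detected.wait())

    # Whichever finishes first decides the outcome of the turn.
    done, _ = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and playback not in done

    # Cancel the loser and wait for both tasks to settle.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played, interrupted


async def demo():
    speech = asyncio.Event()
    # Simulate the caller starting to talk 30 ms into a 100 ms utterance.
    asyncio.get_running_loop().call_later(0.03, speech.set)
    chunks = [f"chunk-{i}" for i in range(10)]
    return await run_turn(chunks, speech)


if __name__ == "__main__":
    played, interrupted = asyncio.run(demo())
    print(f"interrupted={interrupted}, chunks played={len(played)}")
```

The important detail is that playback is cancelled rather than left to finish, so the agent stops talking mid-sentence instead of talking over the caller.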
Check out the full documentation and repository, and drop us a star if this saves you money:
GitHub: [https://github.com/blackdwarftech/siphon]
Siphon Website: [https://siphon.blackdwarf.in/docs]
Pitfall Guide
- SIP Trunk Registration Failures: Incorrect SIP_USERNAME/SIP_PASSWORD or a missing FROM_NUMBER typically surfaces as repeated 401/407 authentication challenges or a 403 Forbidden rejection. Best practice: Validate credentials using SIPp or provider CLI tools before initializing the Siphon Call object, and ensure your trunk allows outbound registration from your server IP.
- WebRTC NAT/Firewall Blocking: LiveKit requires UDP ports (default 7882+) and TCP fallback. Corporate or cloud firewalls often drop these, causing silent media failures. Best practice: Deploy a TURN server, configure livekit.yaml with explicit udp_port/tcp_port ranges, and verify STUN/TURN connectivity before bridging SIP.
- VAD False Positives/Negatives: Default Voice Activity Detection may trigger on line noise or miss low-volume speech, causing premature TTS cuts or delayed responses. Best practice: Tune VAD sensitivity thresholds per deployment environment, test with telephony codecs (G.711 μ-law/A-law), and implement hysteresis to prevent chattering.
- Async Call Lifecycle Mismanagement: call.start() runs asynchronously but lacks built-in retry or state monitoring in minimal examples. Best practice: Wrap calls in asyncio task groups, implement heartbeat/keep-alive pings, and attach LiveKit room state listeners to gracefully handle drops or network partitions.
- Codec Transmuxing Mismatches: SIP trunks typically use G.711/G.722, while STT/TTS providers expect 16kHz/24kHz PCM or Opus. Best practice: Rely on Siphon's internal transmuxer, but verify sample rate alignment in provider configs. Explicitly set sample_rate and channels in STT/TTS plugins to avoid silent or distorted audio.
- Stateless Conversation Drift: LLM agents lose context across SIP sessions if conversation history isn't persisted. Best practice: Integrate Redis or PostgreSQL to store session IDs, dialogue turns, and user context. Inject historical context into system_instructions or LLM prompts on reconnect.
- Provider Rate Limiting & Quota Exhaustion: Direct API calls bypass SaaS throttling but hit OpenAI/Deepgram/Cartesia limits abruptly. Best practice: Implement exponential backoff, token budgeting, and fallback providers. Monitor usage via provider dashboards and set up alerting on 429/503 responses.
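Two of the best practices above, VAD hysteresis and exponential backoff, are compact enough to sketch. The first class uses separate "on" and "off" thresholds so a speech probability hovering around one cutoff does not make the gate chatter; the second function computes full-jitter exponential backoff delays for retrying after 429/503 responses. The threshold values, the defaults, and the names `HysteresisGate` and `backoff_delays` are illustrative assumptions, not Siphon or LiveKit settings.

```python
import random


class HysteresisGate:
    """Speech/no-speech gate with separate on and off thresholds."""

    def __init__(self, on_threshold=0.6, off_threshold=0.4):
        if off_threshold >= on_threshold:
            raise ValueError("off_threshold must be below on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.speaking = False

    def update(self, score):
        """Feed one VAD probability in [0, 1]; return the current speaking state."""
        if not self.speaking and score >= self.on_threshold:
            self.speaking = True
        elif self.speaking and score <= self.off_threshold:
            self.speaking = False
        # Scores between the two thresholds leave the state unchanged.
        return self.speaking


def backoff_delays(retries, base=0.5, cap=8.0, rng=random.random):
    """Full-jitter backoff: delay i is drawn from [0, min(cap, base * 2**i)]."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

With the default thresholds, a score sequence of 0.5, 0.65, 0.5, 0.45, 0.39 only flips the gate on at 0.65 and off at 0.39; the readings in between are absorbed by the gap. Passing a fixed `rng` makes the backoff schedule deterministic for testing.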
Deliverables
- 📘 Production Deployment Blueprint: Architecture diagram mapping SIP trunk → Siphon worker → LiveKit room → AI plugins. Includes network flow, media transcoding paths, and high-availability scaling strategies (horizontal worker scaling, Redis-backed session state, LiveKit cluster routing).
- ✅ Pre-Flight Verification Checklist: Step-by-step validation sequence covering environment variable integrity, SIP trunk registration test, LiveKit room token generation, VAD calibration, codec alignment, and end-to-end barge-in simulation.
- ⚙️ Configuration Templates:
  - .env production template with secret rotation placeholders
  - agent_config.yaml structure for dynamic plugin routing, VAD thresholds, and LLM temperature/context windows
  - docker-compose.yml for containerized Siphon + LiveKit + Redis stack with health checks and resource limits