Difficulty: Intermediate · Read time: 9 min

Deploying Hermes Agent on Ubuntu 26.04

By Codcompass Team

Self-Hosting Autonomous AI Agents: A Production-Ready Guide to Hermes Agent with Traefik and Secure LLM Routing

Current Situation Analysis

The rapid adoption of AI agents has created a bifurcation in deployment strategies. Organizations and developers are increasingly moving away from monolithic SaaS AI platforms toward self-hosted solutions that offer data sovereignty, cost predictability, and deep integration capabilities. The primary pain point in this transition is infrastructure complexity. Deploying an autonomous agent requires more than running a binary; it demands a robust orchestration layer for HTTPS termination, secure authentication, persistent state management, and flexible LLM routing.

Many existing deployment guides treat AI agents as ephemeral scripts, neglecting production requirements such as automated certificate management, resource isolation, and secure credential handling. This oversight leads to fragile deployments that break under load, expose sensitive data, or incur unexpected costs due to inefficient LLM routing.

Hermes Agent, developed by Nous Research, addresses these gaps by providing a persistent, tool-enabled agent framework. Unlike stateless chatbots, Hermes maintains memory across sessions and supports tool execution. It integrates natively with communication platforms like Telegram, Discord, Slack, and WhatsApp. Crucially, it supports any OpenAI-compatible API, allowing operators to route inference to cost-effective providers like Vultr Serverless Inference or keep workloads entirely on-premise. The challenge lies in assembling the surrounding infrastructure to support these capabilities securely and scalably.

WOW Moment: Key Findings

Self-hosting an agent framework like Hermes fundamentally alters the economics and security posture of AI operations. The following comparison highlights the operational advantages of a production-grade self-hosted deployment versus standard SaaS alternatives.

| Feature | SaaS AI Agent Platform | Self-Hosted Hermes Agent (Production) |
| --- | --- | --- |
| Data Residency | Cloud-hosted; data processed by third-party vendors. | On-premise/VPC; data remains within operator control. |
| LLM Flexibility | Vendor-locked; limited to supported models. | Open; supports any OpenAI-compatible endpoint (e.g., Vultr, local, OpenRouter). |
| Cost Structure | Per-seat or high-volume API fees; unpredictable scaling. | Fixed infrastructure cost plus variable LLM cost; optimized routing reduces spend. |
| Tool Access | Restricted to platform-approved integrations. | Full system access; custom tools and local file system interaction. |
| Persistence | Often session-based or limited retention. | Native persistent memory; context retained across restarts and sessions. |
| Latency | Dependent on external API latency and rate limits. | Local inference options; fewer network hops for internal tools. |

Why this matters: By deploying Hermes with a hardened infrastructure stack, teams gain the ability to run autonomous workflows on sensitive data without exfiltration risks. The support for OpenAI-compatible APIs allows dynamic switching between models based on cost or performance requirements, enabling significant savings when using providers like Vultr Serverless Inference for high-volume tasks while reserving premium models for complex reasoning.

Core Solution

This solution outlines a production-ready deployment of Hermes Agent using Docker Compose. The architecture employs Traefik v3 as the edge router for automatic HTTPS and dynamic routing, a lightweight Nginx container for dashboard authentication, and Docker volumes for persistent agent memory.

1. Infrastructure Preparation

Ubuntu 26.04 provides a stable base for containerized workloads. We install Docker Engine from the official repository to ensure access to the latest security patches and features.

Install Docker Engine:

# Install prerequisites for secure repository access
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release

# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Configure the repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker packages
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Configure user permissions to avoid sudo for docker commands
sudo usermod -aG docker $USER
newgrp docker

Note: Running newgrp docker applies the group change to the current session without requiring a logout. In production, verify Docker is running with docker info.
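The install can be sanity-checked with a short script. This is a sketch that degrades gracefully when the docker CLI is not yet on PATH (for example, before the group change takes effect):

```shell
# Sanity-check the Docker install; prints a hint instead of failing
# when the docker CLI is not yet available in this session.
if command -v docker >/dev/null 2>&1; then
  docker --version
  docker compose version
  STATUS=ok
else
  echo "docker not on PATH; log out and back in (or run 'newgrp docker') and retry"
  STATUS=missing
fi
echo "check result: $STATUS"
```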

2. Project Scaffolding and Security

Create a dedicated directory for the deployment. We use a .env file to manage configuration variables, separating secrets from the compose definition.

mkdir -p ~/hermes-deploy/data
cd ~/hermes-deploy

Environment Configuration:

Create .env with the following variables. This structure isolates domain settings, ACME contact info, and authentication credentials.

# Domain configuration
HERMES_DOMAIN=agent.yourdomain.com
ACME_EMAIL=ops@yourdomain.com

# Authentication
ADMIN_USER=agent_admin
# Generate hash: htpasswd -nbB <user> <pass> | cut -d: -f2
# If this value is interpolated in docker-compose.yml, escape each $ as $$
ADMIN_HASH=$2y$10$YourGeneratedBcryptHashHere

Best Practice: Never store plaintext passwords. Use htpasswd or openssl to generate bcrypt hashes for the ADMIN_HASH variable, and write the resulting user:hash pair to ./data/auth/.htpasswd, which the auth gateway mounts. Restrict file permissions with chmod 600 .env.
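Creating the credentials file can be scripted. The sketch below uses openssl (apr1/MD5-crypt, which Traefik's BasicAuth middleware also accepts) so it works without apache2-utils; if htpasswd is installed, `htpasswd -cbB` produces the stronger bcrypt format instead. The username and password here are placeholders:

```shell
# Sketch: create the htpasswd file consumed by the auth gateway.
# 'agent_admin' and 'change-me' are placeholder credentials.
mkdir -p data/auth
HASH=$(openssl passwd -apr1 'change-me')
printf '%s:%s\n' agent_admin "$HASH" > data/auth/.htpasswd
chmod 600 data/auth/.htpasswd
cat data/auth/.htpasswd
```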

3. Orchestration with Docker Compose

The docker-compose.yml file defines three services: the edge router, the Hermes core, and an authentication gateway. This separation of concerns allows independent scaling and security policies.

services:
  edge-router:
    image: traefik:v3
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.letsencrypt.acme.email=${ACME_EMAIL}
      - --certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
      - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
      - --api.dashboard=false
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./data/traefik:/data
    restart: unless-stopped
    networks:
      - hermes-net

  hermes-core:
    image: nousresearch/hermes-agent:latest
    volumes:
      - ./data/hermes:/opt/data
    labels:
      - traefik.enable=true
      - traefik.http.routers.hermes.rule=Host(`${HERMES_DOMAIN}`)
      - traefik.http.routers.hermes.entrypoints=websecure
      - traefik.http.routers.hermes.tls.certresolver=letsencrypt
      - traefik.http.routers.hermes.middlewares=auth-gateway
    restart: unless-stopped
    networks:
      - hermes-net

  auth-gateway:
    image: nginx:alpine
    volumes:
      - ./data/auth/.htpasswd:/etc/nginx/.htpasswd:ro
    labels:
      - traefik.enable=true
      - traefik.http.middlewares.auth-gateway.basicauth.usersfile=/etc/nginx/.htpasswd
    restart: unless-stopped
    networks:
      - hermes-net

networks:
  hermes-net:
    driver: bridge


Architecture Rationale:

  • Traefik v3: Chosen for dynamic configuration via Docker labels and built-in ACME support. The tlschallenge method verifies domain ownership without requiring DNS provider credentials, simplifying deployment.
  • Service Isolation: The auth-gateway service supplies the basic authentication middleware. This keeps auth logic decoupled from the Hermes core, allowing the agent to focus on inference and tool execution.
  • Persistent Volumes: The ./data/hermes:/opt/data mapping ensures that agent memory, configuration, and tool states survive container restarts.
  • Security: exposedbydefault=false ensures only explicitly labeled services are routable. The Docker socket is mounted read-only (:ro) to minimize the attack surface.

4. LLM Backend Configuration

Hermes Agent requires an LLM provider to function. The framework supports any OpenAI-compatible endpoint. For cost-effective production workloads, Vultr Serverless Inference provides a high-performance, OpenAI-compatible API.

Configure the Model:

docker run -it --rm \
  -v "$(pwd)/data/hermes:/opt/data" \
  nousresearch/hermes-agent:latest model

The interactive CLI will prompt for provider selection. Choose the OpenAI-compatible option and input:

  • API Key: Your provider's secret key.
  • Base URL: For Vultr, this is typically https://inference.vultr.com/v1.
  • Model ID: The specific model identifier supported by the provider.

Production Tip: Store API keys in a secure vault or Docker secrets for large-scale deployments. Rotate keys regularly and monitor usage via the provider's dashboard to prevent unexpected costs.
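Before binding a key to Hermes, a quick probe of the provider's /models endpoint catches bad keys and wrong base URLs early. This is a sketch: the base URL defaults to the Vultr example above, and API_KEY must be exported first (the network call is skipped when no key is set):

```shell
# Probe an OpenAI-compatible endpoint; skips the network call when no key is set.
BASE_URL="${BASE_URL:-https://inference.vultr.com/v1}"
if [ -z "${API_KEY:-}" ]; then
  echo "API_KEY is not set; export it and rerun to probe $BASE_URL/models"
else
  # -f makes curl fail on HTTP errors such as 401 (bad key)
  curl -fsS --max-time 10 -H "Authorization: Bearer $API_KEY" "$BASE_URL/models" | head -c 400
  echo
fi
```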

5. Deployment and Verification

Launch Services:

docker compose up -d

Verify Status:

docker compose ps

Ensure all services report Up status. Check Traefik logs for certificate issuance:

docker compose logs edge-router | grep -i "acme"

Access the Agent:

Navigate to https://agent.yourdomain.com. The browser will prompt for credentials defined in the .env file. Once authenticated, the dashboard loads. Send a test message to verify LLM connectivity. A successful response confirms the routing chain: Browser → Traefik → Auth Gateway → Hermes Core → LLM Provider.
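The front of that chain can also be spot-checked from any workstation. The sketch below checks DNS first so a missing A record is reported plainly instead of surfacing as a confusing TLS error; agent.yourdomain.com is the placeholder from the .env file:

```shell
# Spot-check DNS and the HTTPS front door for the agent domain.
DOMAIN="${HERMES_DOMAIN:-agent.yourdomain.com}"
if getent hosts "$DOMAIN" >/dev/null 2>&1; then
  # An unauthenticated request typically returns 401 from the basic-auth middleware
  curl -sSI --max-time 10 "https://$DOMAIN" | head -n 1
else
  echo "DNS for $DOMAIN is not resolving yet; verify the A record"
fi
```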

Pitfall Guide

Deploying self-hosted AI agents introduces specific operational risks. The following pitfalls are common in production environments and include mitigation strategies.

| Pitfall | Explanation | Fix |
| --- | --- | --- |
| Volume Permission Mismatch | The Hermes container runs as a non-root user. If the host directory is owned by root, the agent cannot write memory or config files, causing silent failures. | Run `sudo chown -R 1000:1000 ./data/hermes` (or the UID/GID specified in the Dockerfile) before starting the container. |
| ACME Rate Limiting | Traefik may hit Let's Encrypt rate limits if certificates are requested too frequently, often due to misconfigured domains or repeated restarts during development. | Use the `--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory` flag during testing. Switch to production only after verification. |
| LLM API Key Exposure | Storing API keys in plain text within .env files accessible to all users on the host compromises the LLM account and incurs financial risk. | Restrict .env permissions (`chmod 600`). For multi-user hosts, use Docker secrets or a vault integration. Audit access logs on the LLM provider side. |
| Memory State Corruption | Abrupt termination of the container (e.g., `kill -9`) can corrupt the persistent memory database, leading to data loss or agent instability. | Always use `docker compose down` or `docker compose stop` for graceful shutdowns. Implement automated backups of the ./data/hermes volume. |
| Firewall Blocking | Ubuntu's UFW may block inbound traffic on ports 80 and 443, preventing Traefik from serving requests or completing ACME challenges. | Run `sudo ufw allow 80/tcp` and `sudo ufw allow 443/tcp`. Verify rules with `sudo ufw status`. |
| Dashboard Auth Bypass | Weak passwords or misconfigured auth files can allow unauthorized access to the agent dashboard, potentially exposing tools or memory. | Use strong bcrypt hashes. Regularly rotate credentials. Consider adding IP allowlisting via Traefik middleware for internal deployments. |
| Resource Exhaustion | Hermes Agent with persistent memory and tool execution can consume significant RAM/CPU, especially with complex tool chains or large context windows. | Set resource limits in docker-compose.yml using `deploy.resources.limits`. Monitor usage with `docker stats` and adjust based on workload. |

Production Bundle

This section provides actionable assets for deploying and managing Hermes Agent in production environments.

Action Checklist

  • DNS Verification: Ensure HERMES_DOMAIN points to the server IP and propagation is complete before deployment.
  • Credential Generation: Create bcrypt hashes for ADMIN_HASH and verify format compatibility with Nginx.
  • Volume Initialization: Pre-create data directories and set ownership to match container UID/GID.
  • Firewall Configuration: Open ports 80 and 443; restrict SSH access to trusted IPs.
  • LLM Connectivity Test: Validate API key and endpoint responsiveness before integrating with Hermes.
  • Backup Strategy: Configure automated backups for ./data/hermes and ./data/traefik/acme.json.
  • Monitoring Setup: Integrate with Prometheus/Grafana or use docker stats to track resource consumption.
  • Update Policy: Schedule regular updates for Docker images and Ubuntu packages to patch security vulnerabilities.
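The backup item in the checklist above can be scripted. This is a minimal sketch using the directory layout from this guide (schedule it via cron or a systemd timer on the server); the mkdir makes it safe to dry-run before the stack exists:

```shell
# Minimal backup sketch: archive agent memory and the Traefik ACME store.
# mkdir -p makes the script safe to dry-run before the stack is deployed.
mkdir -p data/hermes data/traefik
STAMP=$(date +%Y%m%d-%H%M%S)
OUT="hermes-backup-$STAMP.tar.gz"
tar czf "$OUT" data/hermes data/traefik
echo "wrote $OUT"
```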

Decision Matrix

Select the appropriate LLM routing strategy based on operational requirements.

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| High-Volume, Cost-Sensitive | Vultr Serverless Inference | OpenAI-compatible API with competitive pricing; scalable inference. | Low variable cost; predictable scaling. |
| Data Privacy Critical | Local LLM (e.g., Ollama) | Inference runs on-premise; no data leaves the network. | High upfront hardware cost; zero API fees. |
| Complex Reasoning Tasks | Premium Cloud Model (e.g., OpenAI) | State-of-the-art performance for difficult queries. | High per-token cost; use sparingly. |
| Hybrid Workloads | Router/Proxy (e.g., LiteLLM) | Route requests based on task complexity; optimize cost/performance. | Moderate infrastructure cost; optimized API spend. |
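For the hybrid row, a router such as LiteLLM exposes a single OpenAI-compatible endpoint in front of several backends, so Hermes only needs one base URL. The fragment below is a hypothetical sketch; the logical model names and the Vultr model ID are placeholders to adapt:

```yaml
# Hypothetical LiteLLM proxy config: two logical models behind one endpoint.
model_list:
  - model_name: fast                # route high-volume tasks here
    litellm_params:
      model: openai/placeholder-vultr-model-id
      api_base: https://inference.vultr.com/v1
      api_key: os.environ/VULTR_API_KEY
  - model_name: smart               # reserve for complex reasoning
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```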

Configuration Template

Use this template as a baseline for production deployments. Customize variables and add resource limits as needed.

# docker-compose.prod.yml
services:
  edge-router:
    image: traefik:v3
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.letsencrypt.acme.email=${ACME_EMAIL}
      - --certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
      - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
      - --api.dashboard=false
      - --log.level=INFO
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./data/traefik:/data
    restart: always
    networks:
      - hermes-net

  hermes-core:
    image: nousresearch/hermes-agent:latest
    volumes:
      - ./data/hermes:/opt/data
    environment:
      - TZ=UTC
    labels:
      - traefik.enable=true
      - traefik.http.routers.hermes.rule=Host(`${HERMES_DOMAIN}`)
      - traefik.http.routers.hermes.entrypoints=websecure
      - traefik.http.routers.hermes.tls.certresolver=letsencrypt
      - traefik.http.routers.hermes.middlewares=auth-gateway
    restart: always
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
    networks:
      - hermes-net

  auth-gateway:
    image: nginx:alpine
    volumes:
      - ./data/auth/.htpasswd:/etc/nginx/.htpasswd:ro
    labels:
      - traefik.enable=true
      - traefik.http.middlewares.auth-gateway.basicauth.usersfile=/etc/nginx/.htpasswd
    restart: always
    networks:
      - hermes-net

networks:
  hermes-net:
    driver: bridge

Quick Start Guide

  1. Prepare Environment: Install Docker on Ubuntu 26.04, create project directory, and generate .env with domain and auth credentials.
  2. Initialize Volumes: Create data directories and set ownership to prevent permission errors.
  3. Deploy Stack: Run docker compose -f docker-compose.prod.yml up -d to start services.
  4. Configure LLM: Execute docker run -it --rm -v "$(pwd)/data/hermes:/opt/data" nousresearch/hermes-agent:latest model to set up the inference backend.
  5. Verify Access: Open https://<HERMES_DOMAIN> in a browser, authenticate, and test agent responsiveness.

By following this guide, you establish a secure, scalable, and cost-efficient foundation for running autonomous AI agents. The architecture supports persistent memory, flexible LLM routing, and robust security, enabling production-grade AI workflows without vendor lock-in.