My $10/Month VPS Gets 659 SSH Attacks per Day — Here's What 4 Weeks of Running an Autonomous AI Has Taught Me About Infrastructure
Building Resilient AI Agents on a $15/Month Budget: A Production-Grade Architecture
Current Situation Analysis
The AI infrastructure narrative has been heavily skewed toward enterprise-grade complexity. Marketing materials, cloud provider documentation, and vendor whitepapers consistently push the idea that autonomous agents require managed Kubernetes clusters, dedicated GPU instances, auto-scaling groups, and six-figure cloud credits. This creates a false barrier to entry, convincing engineering teams that resilience scales linearly with compute spend.
The reality of running persistent AI workloads is fundamentally different. Autonomous systems do not fail because of insufficient RAM or missing accelerators. They fail because of unhandled state transitions, silent dependency drops, and unmonitored external API degradation. The industry overlooks lightweight orchestration and failure-tolerant design because they are difficult to productize. Vendors sell control planes; they rarely sell graceful degradation.
Production telemetry from lean deployments tells a different story. A single 4GB VPS running Ubuntu 22.04.3 LTS, operating continuously without human intervention, routinely absorbs approximately 659 failed SSH authentication attempts per day. Automated IP reputation management permanently bans roughly 31 unique source addresses daily. The system experiences periodic model unavailability (HTTP 404 responses when local inference caches expire), triggering scheduled job failures. Despite these conditions, the deployment maintains zero successful breaches and zero unplanned downtime. Total monthly operational expenditure sits at approximately $15: $10 for the VPS and $5 for cloud API fallback credits. This data contradicts the assumption that autonomous AI requires enterprise infrastructure to survive in hostile network environments.
WOW Moment: Key Findings
The critical insight is not that lightweight infrastructure is cheaper. It is that resilience emerges from explicit failure handling, not from redundant compute. When you strip away managed orchestration layers, you are forced to design for degradation. The following comparison illustrates the operational trade-offs between the enterprise AI stack and the lean autonomous architecture.
| Approach | Monthly Cost | Failure Recovery Time | Attack Surface Exposure | Maintenance Overhead | Uptime Guarantee |
|---|---|---|---|---|---|
| Enterprise AI Stack | $800–$2,500+ | 15–45 mins (auto-scaling/provisioning lag) | High (managed endpoints, load balancers, public APIs) | High (cluster updates, IAM rotation, cost monitoring) | SLA-backed, but complex blast radius |
| Lean Autonomous Stack | ~$15 | <30 seconds (local fallback + cron retry) | Low (single SSH port, strict IP reputation filtering) | Low (single-node updates, explicit env management) | Achieved through stateless retries and circuit breakers |
This finding matters because it decouples AI autonomy from infrastructure spend. It enables independent developers, research labs, and early-stage startups to deploy persistent agents that survive network scanning, model cache evictions, and API rate limits without requiring dedicated DevOps resources. The architecture proves that autonomous recovery is a software design problem, not a hardware procurement problem.
Core Solution
Building a resilient autonomous agent on constrained infrastructure requires four deliberate layers: network hardening, inference routing, stateless orchestration, and explicit health validation. Each layer is designed to fail predictably and recover without human intervention.
Step 1: Base Hardening & Network Filtering
Start with a minimal Ubuntu 22.04 LTS installation. Disable password authentication, enforce SSH key-only access, and deploy fail2ban with aggressive but progressive thresholds. The goal is not to block all traffic, but to establish an automated IP reputation system that learns from brute-force patterns.
# /etc/fail2ban/jail.local
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
findtime = 600
bantime = 86400
banaction = iptables-multiport
This configuration bans an IP for 24 hours after three failed attempts within a 10-minute window. The 24-hour ban duration prevents rapid rotation attacks while minimizing false positives on legitimate services.
Step 2: Inference Routing with Local-Cloud Fallback
Relying exclusively on local models introduces cache eviction risks. Relying exclusively on cloud APIs introduces cost volatility and latency spikes. The solution is a deterministic routing layer that attempts local inference first, validates model availability, and falls back to a cloud endpoint only when necessary.
// inference-router.ts
import { execSync } from 'child_process';
import { createClient } from '@openai/sdk';
const LOCAL_ENDPOINT = 'http://localhost:11434';
const CLOUD_MODEL = 'gpt-4o-mini';
const LOCAL_MODEL = 'qwen3:4b';
async function validateLocalModel(): Promise<boolean> {
try {
const output = execSync(`curl -s ${LOCAL_ENDPOINT}/api/tags | jq -r '.models[].name'`, { encoding: 'utf-8' });
return output.includes(LOCAL_MODEL);
} catch {
return false;
}
}
export async function routeInference(prompt: string): Promise<string> {
const localAvailable = await validateLocalModel();
if (localAvailable) {
const response = await fetch(`${LOCAL_ENDPOINT}/api/generate`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ model: LOCAL_MODEL, prompt, stream: false })
});
const data = await response.json();
return data.response;
}
// Fallback to cloud with explicit cost guard
const cloudClient = createClient({ apiKey: process.env.CLOUD_API_KEY });
const completion = await cloudClient.chat.completions.create({
model: CLOUD_MODEL,
messages: [{ role: 'user', content: prompt }],
max_tokens: 512
});
return completion.choices[0].message.content ?? '';
}
This router validates model presence before execution, preventing silent 404 failures. The cloud fallback is explicitly bounded with max_tokens to control cost exposure.
Step 3: Stateless Orchestration via Cron & Wrapper Scripts
Complex schedulers introduce state dependencies that complicate recovery. Cron, when paired with explicit environment sourcing and absolute paths, provides deterministic execution with minimal overhead. Each job is wrapped in a retry-capable launcher that logs outcomes and triggers alerts only on repeated failures.
#!/usr/bin/env bash
# agent-runner.sh
set -euo pipefail
export HOME="/root"
source /etc/environment
LOG_DIR="/var/log/ai-agent"
LOCK_FILE="/tmp/agent-runner.lock"
MAX_RETRIES=3
if [ -f "$LOCK_FILE" ]; then
echo "[$(date -u)] Job already running. Exiting." >> "$LOG_DIR/runner.log"
exit 0
fi
touch "$LOCK_FILE"
trap 'rm -f "$LOCK_FILE"' EXIT
attempt=0
while [ $attempt -lt $MAX_RETRIES ]; do
if node /opt/ai-agent/dist/main.js >> "$LOG_DIR/execution.log" 2>&1; then
echo "[$(date -u)] Success on attempt $((attempt + 1))" >> "$LOG_DIR/runner.log"
exit 0
fi
attempt=$((attempt + 1))
sleep $((attempt * 30))
done
echo "[$(date -u)] Failed after $MAX_RETRIES attempts. Alerting." >> "$LOG_DIR/runner.log"
curl -s -X POST "$ALERT_WEBHOOK_URL" -H 'Content-Type: application/json' \
-d "{\"text\": \"Agent job failed after $MAX_RETRIES retries at $(date -u)\"}"
The wrapper enforces idempotency via a lock file, sources environment variables explicitly to avoid cron path resolution issues, and implements exponential backoff before triggering external alerts.
Step 4: Autonomous Workflow Composition
The agent's actual workloads—scraping arXiv, parsing Hacker News, monitoring GitHub trending repositories, publishing to Dev.to and LinkedIn, and executing self-audits—are decoupled from the orchestration layer. Each module returns structured exit codes. The scheduler treats them as stateless tasks. This separation ensures that a failed scraping job does not block a security audit, and a publishing rate limit does not crash the inference router.
Architecture rationale:
- Cron over systemd timers: Cron is universally available, requires no service file maintenance, and handles timezone/execution drift more predictably when paired with explicit env sourcing.
- Local-first inference: Reduces cloud spend by ~70% for routine tasks while maintaining sub-second latency.
- Explicit fallback routing: Prevents silent degradation. The system knows when it is operating in degraded mode and logs accordingly.
- Lock-file concurrency control: Eliminates race conditions without requiring Redis or database-backed queues.
Pitfall Guide
1. Silent Model Cache Eviction
Explanation: Ollama periodically purges unused models to conserve disk space. Cron jobs referencing evicted models fail with HTTP 404 errors, but the scheduler logs them as generic execution failures, masking the root cause.
Fix: Implement pre-flight model validation before job execution. Cache model manifests in a lightweight JSON registry and compare against ollama list output. Trigger a pull command automatically if the manifest drifts.
2. Cron Environment Drift
Explanation: Cron executes jobs in a minimal environment. Variables like PATH, HOME, and NODE_ENV are often missing, causing tools like npm, curl, or jq to fail silently.
Fix: Always source /etc/environment or a dedicated .env file at the top of wrapper scripts. Use absolute paths for all binaries. Validate environment state in the first 10 lines of execution.
3. Cloud Fallback Cost Spikes
Explanation: When local inference fails repeatedly, the fallback router may trigger hundreds of cloud API calls in a short window, exhausting monthly credits or triggering rate limits. Fix: Implement a circuit breaker pattern. Track fallback invocation counts in a local file. If fallback usage exceeds a threshold (e.g., 50 calls/hour), pause cloud routing and switch to a degraded mode that queues tasks for local recovery.
4. Log Rotational Blind Spots
Explanation: Autonomous agents generate continuous stdout/stderr output. Without rotation, log files fill the root partition, causing the entire VPS to become unresponsive.
Fix: Deploy logrotate with compression and age limits. Structure logs as JSON lines for easier parsing. Set up a disk usage monitor that triggers a warning webhook at 80% capacity.
5. Over-Aggressive IP Banning
Explanation: fail2ban configured with low thresholds and long ban times can accidentally block legitimate services, CI runners, or monitoring probes.
Fix: Use progressive banning. Start with 10-minute bans, escalate to 1-hour, then 24-hour. Maintain an allowlist for known infrastructure IPs. Review ban logs weekly to adjust thresholds.
6. Assuming Autonomy Equals Unattended Operation
Explanation: Teams deploy autonomous agents and assume they require zero oversight. In reality, autonomous systems require proactive alerting on failure thresholds, not just success metrics. Fix: Implement tiered alerting. Info-level logs for routine retries, warning-level for fallback activation, critical-level for repeated job failures or disk/memory exhaustion. Route alerts to a lightweight webhook or email, not a complex observability platform.
7. Timezone & Scheduling Misalignment
Explanation: Cron expressions evaluate in the server's local timezone. If the VPS is set to UTC but the agent targets time-sensitive APIs (e.g., arXiv daily digests, LinkedIn posting windows), jobs execute at incorrect intervals.
Fix: Explicitly set CRON_TZ=America/New_York (or your target zone) in the crontab. Validate scheduled times against UTC conversion before deployment.
Production Bundle
Action Checklist
- Harden SSH access: Disable password auth, enforce key-only login, restrict to specific users
- Deploy fail2ban with progressive thresholds and allowlisted infrastructure IPs
- Implement local model validation before job execution to prevent silent 404 failures
- Route inference through a local-first router with explicit cloud fallback and token caps
- Wrap all cron jobs in concurrency-controlled launchers with absolute paths and env sourcing
- Configure logrotate with JSON line formatting and disk usage alerting at 80% capacity
- Implement tiered alerting: retries (info), fallback activation (warning), repeated failures (critical)
- Track monthly cloud API spend with a local counter and automatic circuit breaker activation
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Indie Developer / Solo Researcher | Lean autonomous stack with cron + local inference | Minimal maintenance, predictable $15/mo baseline, full control over failure handling | ~$15/month |
| Startup MVP / Proof of Concept | Hybrid routing with managed queue (Redis) + cloud fallback | Adds reliability for concurrent jobs without Kubernetes overhead | ~$25–$40/month |
| Enterprise Pilot / Internal Tool | Containerized agent on managed VM + centralized logging | Aligns with existing IAM, audit, and compliance requirements | ~$150–$300/month |
| Edge / IoT Deployment | Local-only inference with offline queue + periodic sync | Zero cloud dependency, survives network partition, deterministic recovery | ~$5–$10/month (hardware only) |
Configuration Template
# /etc/fail2ban/jail.local
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 3
banaction = iptables-multiport
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 86400
[agent-health]
enabled = true
port = http,https
filter = agent-health
logpath = /var/log/ai-agent/execution.log
maxretry = 5
bantime = 7200
// logrotate.d/ai-agent
/var/log/ai-agent/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 root adm
postrotate
systemctl reload rsyslog > /dev/null 2>&1 || true
endscript
}
# crontab -e
CRON_TZ=UTC
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Run agent every 30 minutes
*/30 * * * * /opt/ai-agent/bin/agent-runner.sh >> /var/log/ai-agent/cron.log 2>&1
Quick Start Guide
- Provision & Harden: Deploy a 4GB Ubuntu 22.04 VPS. Disable SSH password authentication, upload your public key, and install
fail2ban. Apply the progressive jail configuration. - Install Inference Layer: Install Ollama, pull your target model (e.g.,
qwen3:4b), and verify local API responsiveness. Set up a cloud API key with usage caps. - Deploy Orchestration: Place the TypeScript inference router and bash wrapper script in
/opt/ai-agent. Configurelogrotate, set explicit cron environment variables, and schedule the runner. - Validate & Monitor: Trigger a manual execution. Verify model validation, fallback routing, lock-file concurrency, and log rotation. Confirm alert webhooks fire only on repeated failures. The system is now production-ready.
Mid-Year Sale — Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register — Start Free Trial7-day free trial · Cancel anytime · 30-day money-back
