Node.js vs Bun vs Go - A Multi-Layer HTTP Benchmark
Runtime Overhead vs. Network Reality: A Layered Performance Analysis of Node.js, Bun, and Go
Current Situation Analysis
Engineering teams frequently face pressure to migrate runtimes based on viral benchmarks claiming "blazing fast" performance. The industry pain point is not a lack of data, but a prevalence of misleading data. Most public benchmarks test idealized conditions that do not reflect production constraints, leading to architectural decisions based on event loop speed rather than system throughput.
This problem is overlooked because developers often conflate micro-benchmark efficiency with macro-system performance. A runtime that serves static JSON in microseconds may still underperform in production due to garbage collection pauses, inter-process communication (IPC) overhead, or network stack inefficiencies. Furthermore, comparisons often suffer from implementation bias, where one language is optimized while others use stock patterns.
Data from layered testing reveals that runtime differences are highly context-dependent. In CPU-bound scenarios, Bun demonstrated a ~55% throughput advantage over Node.js in multi-process configurations and nearly 2x the throughput of Go on single-core cloud instances. However, these margins narrowed sharply when network I/O became the constraint. Over a WiFi network, all runtimes fell into a band of 7,900 to 12,800 RPS and Bun's lead over Go disappeared, showing that hardware limitations can mask runtime efficiency. Additionally, Node.js exhibited significant tail latency spikes (up to 2,000 ms) and request timeouts under load, suggesting that raw throughput numbers can hide stability risks if garbage collection is not tuned.
WOW Moment: Key Findings
The most critical insight from this analysis is the convergence point. While runtimes diverge significantly in isolated environments, they converge rapidly once external bottlenecks are introduced. The following table contrasts performance across four distinct constraint layers, highlighting where runtime choice actually matters.
| Environment | Constraint Layer | Node.js (4 Cores) | Bun (4 Cores) | Go (4 Cores) | Primary Bottleneck |
|---|---|---|---|---|---|
| Localhost | Event Loop / Syscall | ~110,000 RPS | ~170,000 RPS | ~115,000 RPS | Runtime Overhead |
| Cloud 1-Core | Single Core CPU | ~11,700 RPS | ~25,400 RPS | ~13,900 RPS | Runtime Overhead |
| Cloud 4-Core | Multi-Core CPU | ~31,000 RPS | ~53,400 RPS | ~37,600 RPS | Runtime Overhead |
| LAN / WiFi | Network I/O | ~7,900 RPS | ~12,500 RPS | ~12,800 RPS | Network Hardware |
Why this matters: The data shows that Bun offers the highest raw efficiency, with a CPU cost per request of 0.0072%, compared to Go at 0.0090% and Node.js at 0.0129%. However, the LAN test demonstrates that if your infrastructure relies on constrained network paths, optimizing the runtime yields diminishing returns. The decision to migrate should be driven by CPU-bound workloads or latency-sensitive single-core operations, not generic throughput assumptions.
Core Solution
To make informed runtime decisions, engineers must adopt a layered evaluation strategy that isolates event loop performance from network and I/O constraints. This approach prevents the "localhost trap" and ensures comparisons are architecturally equivalent.
1. Architectural Equivalence in Implementation
A common error in benchmarking is comparing a runtime that uses kernel-level socket sharing against a primary/worker architecture that adds IPC overhead, without disclosing the difference. True comparison requires matching the concurrency model.
Bun: Kernel-Level Socket Sharing
Bun can set reusePort so that several Bun processes bind the same port and the kernel distributes incoming connections among them (per Bun's documentation, this load balancing is effective on Linux only). Each Bun.serve instance handles requests on a single thread, so using every core still means running one process per core. What this model avoids is handing connections from a primary process to workers over IPC.
// bun-server.ts
// reusePort sets SO_REUSEPORT so that several Bun processes can bind port 3000.
// The kernel balances incoming connections across those processes (Linux only).
// Start one process per core to use all cores.
const server = Bun.serve({
  port: 3000,
  reusePort: true, // Lets multiple Bun processes share this port
  fetch(req: Request) {
    return new Response(
      JSON.stringify({ message: "Hello from Bun" }),
      { headers: { "Content-Type": "application/json" } }
    );
  },
});
console.log(`Listening on ${server.hostname}:${server.port}`);
Node.js: Multi-Process Clustering
Node.js requires explicit clustering to utilize multiple cores. This introduces IPC overhead between the primary and worker processes, which can impact latency under bursty traffic.
// node-server.ts
// Explicit multi-process architecture.
// The primary process manages workers; IPC overhead exists.
import cluster from 'cluster';
import http from 'http';
import os from 'os';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);
  // Fork one worker per logical CPU.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // Replace any worker that exits so capacity stays constant.
  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died. Restarting.`);
    cluster.fork();
  });
} else {
  // Workers share the server port
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ message: "Hello from Node" }));
  });
  server.listen(3000, () => {
    console.log(`Worker ${process.pid} started`);
  });
}
Go: M:N Scheduler
Go's runtime automatically multiplexes goroutines onto OS threads, utilizing all available cores without manual clustering or IPC overhead.
// go-server.go
// M:N scheduler handles concurrency automatically.
// No explicit clustering required.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/json", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// Serialize on every request so the work matches the JavaScript servers.
		json.NewEncoder(w).Encode(map[string]string{
			"message": "Hello from Go",
		})
	})
	// ListenAndServe only returns on failure, so surface the error.
	log.Fatal(http.ListenAndServe(":3000", nil))
}
2. Normalizing Payload Complexity
Benchmarks often favor Go by using pre-rendered byte slices, while JavaScript runtimes serialize JSON dynamically. This creates an unfair advantage. To ensure accuracy, all runtimes should perform equivalent work.
- Unfair Go Pattern: w.Write([]byte(`{"message":"Hello"}`))
- Fair Go Pattern: json.NewEncoder(w).Encode(payload)
- Impact: Pre-rendering can artificially inflate Go throughput by 15-20%. Always verify that the code under test performs the same serialization and validation logic.
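The same distinction exists on the JavaScript side, so one way to keep the comparison honest is to confirm that a pre-rendered body and a dynamically serialized body are byte-identical, then benchmark the same variant in every runtime. The following is a minimal sketch; `Payload`, `renderStatic`, and `renderDynamic` are hypothetical names introduced for illustration and are not part of the benchmark's code.

```typescript
// Hypothetical payload shape, used only for this illustration.
type Payload = { message: string };

// Pre-rendered variant: the body is serialized once at startup.
const PRE_RENDERED_BODY: string = JSON.stringify({ message: "Hello" });

function renderStatic(): string {
  return PRE_RENDERED_BODY;
}

// Dynamic variant: the body is serialized on every request.
function renderDynamic(payload: Payload): string {
  return JSON.stringify(payload);
}

// Both variants must produce the same bytes; otherwise the benchmark
// compares different responses rather than different runtimes.
const sameBytes: boolean =
  renderStatic() === renderDynamic({ message: "Hello" });
console.log(sameBytes ? "variants are equivalent" : "variants differ");
```

If the Go server under test uses the pre-rendered pattern, the JavaScript servers would return the equivalent of `renderStatic()`. If it uses the encoder, they would call the equivalent of `renderDynamic()`.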
3. Isolation and Resource Pinning
Cloud benchmarks must eliminate CPU migration and cache invalidation. Using CPU quotas (--cpus) allows the container to float across physical cores, introducing noise. Use CPU pinning (--cpuset-cpus) for deterministic results.
# Correct isolation for cloud benchmarks
docker run --rm --cpuset-cpus="0-3" -m="512m" -p 3000:3000 my-runtime-image
Pitfall Guide
1. The Loopback Illusion
Explanation: Testing over localhost measures memory bus speed and loopback interface efficiency, not real-world network performance. Results here can be 10x higher than network-constrained tests.
Fix: Always include a network-constrained phase using a physical NIC or datacenter network to validate results.
2. The Pre-rendered JSON Trap
Explanation: Using static byte arrays in Go while JavaScript runtimes serialize objects creates an implementation bias. This favors Go but does not reflect real application logic.
Fix: Ensure all runtimes perform dynamic serialization or apply the same optimization to all languages.
3. reusePort vs. Process Clustering
Explanation: Comparing Bun processes that share a port through reusePort (kernel-level connection distribution) to Node's cluster module (a primary process coordinating workers over IPC) ignores IPC overhead. Bun's results may appear superior due to architecture, not just runtime speed.
Fix: Disclose the concurrency model. For strict equivalence, spawn multiple Bun processes or compare single-process modes.
4. GC Blind Spots and Tail Latency
Explanation: High average throughput can mask garbage collection pauses. Node.js showed max latencies of 2,000 ms and request timeouts, suggesting GC pressure under load.
Fix: Monitor p99/p99.9 latency and GC metrics. Tune Node flags like --max-old-space-size and --optimize-for-size before drawing conclusions.
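To see why averages hide stalls, a short sketch of a nearest-rank percentile calculation may help. The latency samples below are hypothetical, chosen only to mimic a mostly fast service with a couple of 2,000 ms stalls; `percentile` is an illustrative helper, not part of any benchmark tool.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical latencies in milliseconds: 98 fast requests and 2 stalls.
const latenciesMs: number[] = [
  ...new Array<number>(98).fill(5),
  2000,
  2000,
];

const meanMs =
  latenciesMs.reduce((sum, v) => sum + v, 0) / latenciesMs.length;

console.log(`mean: ${meanMs} ms`);
console.log(`p50:  ${percentile(latenciesMs, 50)} ms`);
console.log(`p99:  ${percentile(latenciesMs, 99)} ms`);
```

In this made-up sample the median stays at the fast-path latency while p99 lands on the stall, which is the pattern the mean alone would obscure.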
5. CPU Percentage Fallacy
Explanation: High CPU usage (e.g., Node at 400%) is often misinterpreted as inefficiency. In multi-process setups, this indicates all workers are saturated, which is desirable under load.
Fix: Calculate CPU cost per request (CPU% / RPS) to measure true efficiency. Bun achieved 0.0072% per request vs. Node's 0.0129%.
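The calculation is a single division, sketched below. The example assumes that the 400% CPU figure for Node.js pairs with its ~31,000 RPS result on four cloud cores; that pairing is an inference, though it does reproduce the 0.0129% quoted above. `cpuCostPerRequest` is a hypothetical helper name.

```typescript
// CPU cost per request: total CPU percentage divided by requests per second.
function cpuCostPerRequest(cpuPercent: number, rps: number): number {
  if (rps <= 0) throw new Error("rps must be positive");
  return cpuPercent / rps;
}

// Assumed pairing: Node.js at 400% CPU serving ~31,000 RPS on 4 cloud cores.
const nodeCost = cpuCostPerRequest(400, 31000);
console.log(`${nodeCost.toFixed(4)}% per request`);
```

A lower value means less CPU consumed for each request served, regardless of how high the aggregate CPU percentage looks.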
6. Single-Run Statistical Error
Explanation: Running a benchmark once produces noise, not signal. Variance in system load, network jitter, and scheduler behavior can skew results.
Fix: Execute 5+ runs per configuration. Report median, standard deviation, and confidence intervals. Discard outliers.
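A small sketch of how the per-configuration summary could be computed is shown below. The five RPS values are hypothetical and exist only to exercise the functions; `median` and `sampleStdDev` are illustrative helpers, and confidence intervals and outlier handling are left out.

```typescript
// Median of a list of values (average of the two middle values when even).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("no values");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Sample standard deviation (n - 1 in the denominator).
function sampleStdDev(values: number[]): number {
  if (values.length < 2) throw new Error("need at least two values");
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const sumSquares = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(sumSquares / (values.length - 1));
}

// Hypothetical RPS results from five runs of one configuration.
const runsRps: number[] = [31200, 30800, 31000, 29500, 31500];

console.log(`median RPS: ${median(runsRps)}`);
console.log(`std dev:    ${sampleStdDev(runsRps).toFixed(1)}`);
```

Reporting the median alongside the spread makes a single slow run (29,500 in this made-up sample) visible without letting it dominate the headline number.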
7. Network Hardware Masking
Explanation: Using consumer-grade hardware (e.g., WiFi 3 adapters) can bottleneck all runtimes to the same low throughput, hiding runtime differences.
Fix: Use wired connections or high-speed datacenter networks (10 Gbps+) to ensure the runtime, not the NIC, is the limiting factor.
Production Bundle
Action Checklist
- Pin CPU Resources: Use --cpuset-cpus in Docker to prevent CPU migration and cache invalidation during tests.
- Normalize Code Complexity: Ensure all runtimes perform equivalent serialization, validation, and business logic. Avoid pre-rendered payloads.
- Match Concurrency Models: Compare architectural equivalents (e.g., multi-process vs. multi-process) or explicitly disclose differences.
- Tune Garbage Collection: For Node.js, apply V8 tuning flags (--max-old-space-size, --optimize-for-size) and monitor GC pauses.
- Analyze Tail Latency: Report p99 and p99.9 latency, not just averages. High max latency indicates stability risks.
- Calculate CPU Efficiency: Compute CPU cost per request to compare true resource utilization across runtimes.
- Test Network Constraints: Include a phase with physical network I/O to validate performance under realistic conditions.
- Run Statistical Samples: Execute 5+ runs per configuration and report median values with standard deviation.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Volatility, Low Logic | Bun | Lowest per-request overhead (0.0072% CPU). Fastest single-core throughput. Drop-in Node compatibility. | Low |
| Max Efficiency, Low Latency | Go | Predictable performance. No IPC overhead. Excellent CPU efficiency (0.0090% CPU). Compiled binary. | Medium |
| Ecosystem, Stability, Tooling | Node.js | Mature ecosystem. APM integration. Debugging tools. Higher CPU cost (0.0129%) but proven reliability. | Low |
| Network-Bound Workloads | Any | Network hardware dominates performance. Runtime choice has minimal impact on throughput. | N/A |
| Single-Core Cloud Instances | Bun | Roughly 2x the throughput of Node.js and Go on a single core (~25,400 vs. ~11,700 and ~13,900 RPS). Maximizes limited resources. | Low |
| Multi-Core Cloud Instances | Bun | Highest 4-core throughput (~53,400 RPS in the cloud, ~170,000 RPS on localhost). Efficient kernel-level socket distribution. | Low |
Configuration Template
Use this Docker Compose setup for fair, isolated benchmarking with statistical rigor.
# docker-compose.benchmark.yml
version: '3.8'
services:
  target:
    image: ${RUNTIME_IMAGE}
    cpuset: "0-3"
    mem_limit: 512m
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - GOMAXPROCS=4
  loadgen:
    image: alpine:latest
    depends_on:
      - target
    command: >
      sh -c "
      apk add --no-cache wrk &&
      wrk -t2 -c200 -d30s --latency --timeout 2s http://target:3000/json
      "
Quick Start Guide
- Isolate the Environment: Deploy target and load generator in the same datacenter or use pinned Docker containers. Ensure network hardware is not the bottleneck.
- Normalize the Code: Implement equivalent logic across runtimes. Avoid pre-rendered payloads. Match concurrency models or document differences.
- Execute Benchmarks: Run wrk with statistical flags (--latency, --timeout). Perform 5+ runs per configuration.
- Analyze Results: Calculate median RPS, p99 latency, and CPU cost per request. Identify bottlenecks (runtime vs. network).
- Make Decision: Use the Decision Matrix to select the runtime based on workload characteristics, cost, and ecosystem requirements.
