How I built a quantum-inspired route optimizer in pure Python (no quantum hardware)

Classical QUBO Optimization: Solving Combinatorial Routing Problems with Simulated Annealing

Current Situation Analysis

Combinatorial optimization problems like the Travelling Salesman Problem (TSP) and Vehicle Routing Problem (VRP) represent a persistent bottleneck in logistics, scheduling, and resource allocation. As the number of decision variables grows, the solution space expands factorially, rendering exact solvers computationally intractable for instances beyond trivial sizes. Engineering teams frequently resort to greedy heuristics or local search methods (e.g., 2-opt), which offer speed but provide no guarantee of solution quality and often converge to suboptimal local minima.

A significant misconception in the industry is that Quadratic Unconstrained Binary Optimization (QUBO) formulations are exclusively the domain of quantum annealing hardware. This perception leads many developers to overlook a powerful alternative: classical simulation of quantum-inspired algorithms. QUBO provides a universal mathematical language for optimization that can be solved efficiently on standard CPUs using techniques like Simulated Annealing (SA). By decoupling the problem formulation from the solver backend, teams can prototype solutions classically and maintain a migration path to quantum hardware without rewriting core logic.

Empirical evidence suggests that for small-to-medium instances (typically N < 50 variables), classical QUBO solvers can achieve competitive solution quality with execution times measured in milliseconds, making them viable for real-time decision support systems where traditional solvers struggle with latency or where a unified optimization interface is preferred.

WOW Moment: Key Findings

The strategic advantage of the QUBO approach lies in its formulation versatility and solver agnosticism. The following comparison highlights where classical QUBO with Simulated Annealing fits within the optimization landscape:

Approach	Optimality Guarantee	Scalability Limit	Implementation Complexity	Hardware Dependency
Brute Force	Exact	N ≤ 10	Low	None
Greedy / 2-Opt	None	N ≤ 1000	Low	None
QUBO + Simulated Annealing	Probabilistic	N ≤ 50	Medium	None (Classical)
Quantum Annealing	Probabilistic	N ≤ 5000	High	Specialized Hardware

Key Insight: QUBO + Simulated Annealing offers a "sweet spot" for medium-complexity problems. It provides a structured, constraint-based formulation that is more rigorous than ad-hoc heuristics, while remaining entirely software-defined. This enables rapid iteration on problem constraints without the overhead of specialized infrastructure, and the formulation maps directly to quantum annealers if hardware becomes available or necessary for scaling.

Core Solution

Implementing a QUBO-based optimizer involves three distinct phases: mathematical formulation, solver implementation, and result decoding. This section outlines a production-grade architecture using pure Python.

1. Mathematical Formulation

QUBO problems are defined by the objective function:

$$ \text{Minimize } E(x) = \sum_{i} Q_{ii} x_i + \sum_{i<j} Q_{ij} x_i x_j $$

Where $x_i \in {0, 1}$ are binary variables and $Q$ is a coefficient matrix. The goal is to find the bitstring $x$ that minimizes energy $E$.

For TSP, we use one-hot encoding. Let $x_{c,t}$ be a binary variable where $c$ is the city index and $t$ is the timestep. $x_{c,t} = 1$ indicates city $c$ is visited at time $t$.

Constraints:

Visit each city exactly once: $\sum_{t} x_{c,t} = 1$ for all $c$.
Visit exactly one city per timestep: $\sum_{c} x_{c,t} = 1$ for all $t$.

Objective: Minimize total distance: $\sum_{t} \sum_{c} \sum_{d} \text{dist}(c,d) \cdot x_{c,t} \cdot x_{d,t+1}$.

These constraints are converted into penalty terms added to the objective function with a penalty coefficient $P$.

2. Implementation Architecture

The implementation uses a sparse representation for the QUBO matrix to optimize memory usage and a modular solver engine with configurable cooling schedules.

Binary Quadratic Model: A sparse dictionary-based model avoids the $O(N^2)$ memory overhead of dense matrices.

from typing import Dict, Tuple, List, Optional
import random

class BinaryQuadraticModel:
    """Sparse representation of a QUBO problem."""
    
    def __init__(self):
        # Keys are tuples (i, j) with i <= j; values are coefficients
        self._coeffs: Dict[Tuple[int, int], float] = {}
    
    def set_term(self, i: int, j: int, weight: float) -> None:
        """Set the coefficient for the interaction between variables i and j."""
        if i > j:
            i, j = j, i
        if weight == 0.0:
            self._coeffs.pop((i, j), None)
        else:
            self._coeffs[(i, j)] = weight
    
    def get_energy(self, sample: List[int]) -> float:
        """Compute the energy of a given binary sample."""
        energy = 0.0
        for (i, j), weight in self._coeffs.items():
            if sample[i] and sample[j]:
                energy += weight
        return energy
    
    def num_variables(self) -> int:
        if not self._coeffs:
            return 0
        return max(max(i, j) for i, j in self._coeffs) + 1

TSP Encoder: Encodes city coordinates into a QUBO model with penalty terms.

import math

def encode_tsp(
    cities: List[Tuple[float, float]], 
    penalty: float
) -> BinaryQuadraticModel:
    """Encode TSP instance into a QUBO model."""
    n = len(cities)
    model = BinaryQuadraticModel()
    
    # Precompute distance matrix
    dist_matrix = [
        [math.hypot(cities[i][0] - cities[j][0], cities[i][1] - cities[j][1]) 
         for j in range(n)] 
        for i in range(n)
    ]
    
    # Helper to map (city, time) to variable index
    def var_idx(c: int, t: int) -> int:
        return c * n + t
    
    # Objective: Minimize distance
    for t in range(n):
        t_next = (t + 1) % n
        for c in range(n):
            for d in range(n):
                if c == d:
                    continue
                weight = dist_matrix[c][d]
                i, j = var_idx(c, t), var_idx(d, t_next)
                model.set_term(i, j, weight)
    
    # Constraint 1: Each city visited exactly once
    for c in range(n):
        for t in range(n):
            for t2 in range(t + 1, n):
                i, j = var_idx(c, t), var_idx(c, t2)
                model.set_term(i, j, 2 * penalty)
            i = var_idx(c, t)
            model.set_term(i, i, -penalty)
    
    # Constraint 2: One city per timestep
    for t in range(n):
        for c in range(n):
            for c2 in range(c + 1, n):
                i, j = var_idx(c, t), var_idx(c2, t)
                model.set_term(i, j, 2 * penalty)
            i = var_idx(c, t)
            model.set_term(i, i, -penalty)
            
    return model

Simulated Annealing Solver: Implements a thermal annealing process with multi-restart capability and adaptive beta scheduling.

from dataclasses import dataclass

@dataclass
class AnnealingParams:
    beta_start: float = 0.1
    beta_end: float = 10.0
    steps: int = 1000
    restarts: int = 5

class ThermalAnnealer:
    """Simulated Annealing solver for QUBO problems."""
    
    def __init__(self, params: Optional[AnnealingParams] = None):
        self.params = params or AnnealingParams()
    
    def solve(self, model: BinaryQuadraticModel) -> Tuple[List[int], float]:
        """Run multi-restart simulated annealing."""
        best_sample = None
        best_energy = float('inf')
        
        n_vars = model.num_variables()
        
        for _ in range(self.params.restarts):
            sample = [random.randint(0, 1) for _ in range(n_vars)]
            current_energy = model.get_energy(sample)
            
            for step in range(self.params.steps):
                # Logarithmic cooling schedule
                beta = self.params.beta_start * (
                    (self.params.beta_end / self.params.beta_start) ** 
                    (step / self.params.steps)
                )
                
                # Flip a random bit
                flip_idx = random.randint(0, n_vars - 1)
                sample[flip_idx] = 1 - sample[flip_idx]
                new_energy = model.get_energy(sample)
                
                delta = new_energy - current_energy
                
                # Acceptance criterion
                if delta < 0 or random.random() < math.exp(-beta * delta):
                    current_energy = new_energy
                else:
                    sample[flip_idx] = 1 - sample[flip_idx]
                
                if current_energy < best_energy:
                    best_energy = current_energy
                    best_sample = sample.copy()
                    
        return best_sample, best_energy

3. Architecture Decisions

Sparse Representation: Using a dictionary for coefficients reduces memory footprint significantly for large problems where the QUBO matrix is sparse. This is critical for TSP, where interactions are local in the encoding.
Logarithmic Cooling: The beta schedule uses a logarithmic progression rather than linear. This allows for rapid exploration early in the process and finer exploitation later, which empirically yields better convergence for combinatorial landscapes.
Multi-Restart: Simulated Annealing is stochastic. Running multiple independent restarts and selecting the best result mitigates the risk of converging to a poor local minimum in a single run.
Penalty Coefficient: The penalty $P$ must be tuned. If $P$ is too low, constraints may be violated; if too high, the energy landscape becomes dominated by constraint satisfaction, obscuring the distance optimization. A sweep of $P$ relative to the maximum edge weight is recommended.

Pitfall Guide

1. Penalty Hyperparameter Sensitivity

Explanation: The penalty coefficient controls the trade-off between feasibility and optimality. An improperly tuned penalty can lead to invalid routes or suboptimal distances. Fix: Perform a penalty sweep. Start with $P$ slightly larger than the maximum edge weight in the graph. Validate that the resulting sample satisfies all constraints. If constraints are violated, increase $P$; if the solution quality degrades, decrease $P$.

2. Variable Explosion for Large Instances

Explanation: TSP encoding requires $N^2$ variables. For $N=100$, this results in 10,000 bits. Classical SA performance degrades significantly beyond a few thousand variables due to the exponential growth of the search space. Fix: For $N > 50$, consider decomposition strategies, hybrid approaches (e.g., using SA to refine segments of a heuristic solution), or switching to specialized solvers like OR-Tools. QUBO is best suited for small-to-medium instances.

3. Symmetry Redundancy

Explanation: TSP has rotational symmetry; the route A-B-C is equivalent to B-C-A. This redundancy increases the effective search space without adding value. Fix: Fix the first city to timestep 0. This reduces the variable count and eliminates rotational symmetry, improving solver efficiency. Implement this by removing variables associated with the fixed city and timestep.

4. Cooling Schedule Traps

Explanation: A cooling schedule that decreases temperature too quickly can trap the solver in a local minimum. Conversely, a schedule that stays hot too long wastes computation. Fix: Use adaptive schedules or logarithmic cooling. Monitor the acceptance rate; if it drops to zero too early, the schedule is too aggressive. Implement a warm-up phase to equilibrate the system before cooling begins.

5. Constraint Validation Omission

Explanation: Developers often assume the solver returns a valid solution. However, SA may return samples that violate constraints, especially if penalties are insufficient. Fix: Always include a validation step post-solve. Check that the one-hot constraints are satisfied. If not, flag the solution as invalid and trigger a re-run with adjusted parameters.

6. Random Seed Management

Explanation: Non-deterministic behavior makes debugging and reproducibility difficult in production environments. Fix: Expose a seed parameter in the solver configuration. Use deterministic seeding for testing and regression validation. In production, allow random seeding to ensure diverse exploration across requests.

Production Bundle

Action Checklist

Define Penalty Range: Establish a baseline penalty based on graph metrics (e.g., max edge weight) and plan a sweep.
Implement Validation: Add a function to verify one-hot constraints on all solver outputs.
Configure Multi-Restart: Set restarts ≥ 5 to ensure robustness against stochastic variance.
Benchmark Baselines: Compare QUBO+SA results against a greedy heuristic to quantify value add.
Monitor Energy Convergence: Log energy trajectories to detect premature convergence or oscillation.
Tune Cooling Schedule: Adjust beta_start, beta_end, and steps based on instance size and latency requirements.
Handle Symmetry: Fix the starting city to reduce variable count and search space.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
N ≤ 10, Exact Solution Required	Brute Force	Guaranteed optimality; computationally feasible.	Low compute cost.
10 < N ≤ 50, Fast Iteration	QUBO + Simulated Annealing	Balanced quality/speed; unified formulation; no hardware dependency.	Moderate compute cost; high developer velocity.
N > 50, Production Routing	OR-Tools / LKH	Specialized solvers scale better for large instances.	Low compute cost; higher integration complexity.
Quantum Migration Planned	QUBO + SA (Prototype)	Formulation maps directly to quantum annealers.	Zero migration cost for formulation.

Configuration Template

# annealing_config.py
from dataclasses import dataclass

@dataclass
class SolverConfig:
    """Configuration for the thermal annealer."""
    
    # Annealing parameters
    beta_start: float = 0.05
    beta_end: float = 15.0
    steps: int = 2000
    restarts: int = 10
    
    # Penalty tuning
    penalty_multiplier: float = 1.2  # Multiplier relative to max edge weight
    
    # Execution control
    seed: int = None
    timeout_ms: int = 5000
    
    def get_penalty(self, max_edge_weight: float) -> float:
        return max_edge_weight * self.penalty_multiplier

Quick Start Guide

Define Cities: Create a list of (x, y) coordinates representing stops.
```
cities = [(0, 0), (2, 3), (5, 1), (8, 4), (3, 7)]
```

Encode Problem: Generate the QUBO model with a tuned penalty.

max_dist = max(
    math.hypot(c1[0]-c2[0], c1[1]-c2[1]) 
    for c1 in cities for c2 in cities
)
penalty = max_dist * 1.5
model = encode_tsp(cities, penalty)

Solve: Run the annealer with configured parameters.

params = AnnealingParams(steps=1500, restarts=8)
annealer = ThermalAnnealer(params)
best_sample, best_energy = annealer.solve(model)

Decode Route: Convert the binary sample back to a city sequence.

n = len(cities)
route = []
for t in range(n):
    for c in range(n):
        if best_sample[c * n + t]:
            route.append(cities[c])
            break
print("Optimized Route:", route)

Validate: Ensure constraints are met before using the route.

assert len(set(route)) == n, "Route contains duplicate cities"

Mid-Year Sale — Unlock Full Article