cles. Each particle carries a weight proportional to its reward score, normalized across the population. The target distribution is the prior program distribution tilted by the reward function.
2. Adaptive Parent Resampling: Instead of deterministically selecting top-k programs, the system computes the Effective Sample Size (ESS). When ESS drops below a threshold, particles are resampled with replacement proportional to their weights. This prevents weight degeneracy and maintains population diversity.
3. Mixture Mutation with Acceptance: New candidates are generated by applying a mixture of mutation operators (e.g., syntax-aware edits, parameter perturbations, structural rewrites). Each mutation is paired with a Metropolis-Hastings acceptance step that compares the reward delta against a temperature-scaled threshold. Rejected mutations are discarded, preserving the stationary distribution.
4. Automatic Convergence Control: A convergence monitor tracks the empirical variance of particle weights and the rate of reward improvement. Using finite-sample complexity bounds, the system calculates the minimum LLM calls required to reach a target approximation error. When the observed error falls within the bound, the search terminates automatically.
TypeScript Implementation
The following implementation demonstrates the core loop. It uses distinct interfaces, explicit typing, and production-ready patterns.
interface ProgramParticle {
id: string;
source: string;
reward: number;
weight: number;
generation: number;
}
interface SamplerConfig {
populationSize: number;
essThreshold: number;
mutationMix: MutationOperator[];
convergenceTolerance: number;
maxCalls: number;
temperature: number;
}
type MutationOperator = (program: string) => string;
class RewardTiltedSampler {
private particles: ProgramParticle[] = [];
private callCount: number = 0;
private config: SamplerConfig;
constructor(config: SamplerConfig) {
this.config = config;
this.initializePopulation();
}
private initializePopulation(): void {
this.particles = Array.from({ length: this.config.populationSize }, (_, i) => ({
id: `p_${i}`,
source: this.generateInitialProgram(),
reward: 0,
weight: 1 / this.config.populationSize,
generation: 0,
}));
}
async run(): Promise<ProgramParticle> {
while (this.callCount < this.config.maxCalls) {
await this.evaluateRewards();
this.normalizeWeights();
if (this.shouldResample()) {
this.resampleParticles();
}
const mutated = await this.applyMixtureMutation();
this.acceptOrReject(mutated);
if (this.checkConvergence()) {
break;
}
}
return this.particles.reduce((best, p) => (p.reward > best.reward ? p : best));
}
private async evaluateRewards(): Promise<void> {
for (const p of this.particles) {
p.reward = await this.computeReward(p.source);
this.callCount++;
}
}
private normalizeWeights(): void {
const totalReward = this.particles.reduce((sum, p) => sum + p.reward, 0);
for (const p of this.particles) {
p.weight = p.reward / totalReward;
}
}
private shouldResample(): boolean {
const ess = 1 / this.particles.reduce((sum, p) => sum + p.weight ** 2, 0);
return ess < this.config.essThreshold * this.config.populationSize;
}
private resampleParticles(): void {
const cumulative = this.particles.reduce<number[]>((acc, p) => {
acc.push((acc.length ? acc[acc.length - 1] : 0) + p.weight);
return acc;
}, []);
this.particles = this.particles.map(() => {
const r = Math.random();
const idx = cumulative.findIndex(c => c >= r);
const parent = { ...this.particles[idx] };
parent.id = `p_${Math.random().toString(36).slice(2, 8)}`;
parent.generation++;
return parent;
});
}
private async applyMixtureMutation(): Promise<ProgramParticle[]> {
return Promise.all(
this.particles.map(async (p) => {
const operator = this.config.mutationMix[Math.floor(Math.random() * this.config.mutationMix.length)];
const mutatedSource = operator(p.source);
const mutatedReward = await this.computeReward(mutatedSource);
this.callCount++;
return { ...p, source: mutatedSource, reward: mutatedReward };
})
);
}
private acceptOrReject(mutated: ProgramParticle[]): void {
for (let i = 0; i < this.particles.length; i++) {
const delta = mutated[i].reward - this.particles[i].reward;
const acceptanceProb = Math.min(1, Math.exp(delta / this.config.temperature));
if (Math.random() < acceptanceProb) {
this.particles[i] = mutated[i];
}
}
}
private checkConvergence(): boolean {
const weightVariance = this.particles.reduce((sum, p) => sum + (p.weight - 1 / this.config.populationSize) ** 2, 0);
return weightVariance < this.config.convergenceTolerance;
}
private async computeReward(source: string): Promise<number> {
// Placeholder for external evaluation pipeline
return Math.random() * 10;
}
private generateInitialProgram(): string {
return "// initial stub";
}
}
Architecture Decisions & Rationale
- Weight normalization over raw reward selection: Normalizing weights ensures the resampling step operates on a valid probability distribution. Raw reward thresholds introduce bias when reward scales shift across generations.
- ESS-based resampling trigger: Fixed-generation resampling wastes compute when the population is already diverse. ESS dynamically measures weight concentration and triggers resampling only when degeneracy threatens search stability.
- Mixture mutation operators: Single-operator mutation creates narrow search trajectories. A mixture (e.g., AST-level edits, hyperparameter jitter, control-flow restructuring) maintains coverage across the program space.
- Temperature-scaled acceptance: The acceptance step mirrors Metropolis-Hastings dynamics. Higher temperatures early in the search encourage exploration; decaying temperature later enforces exploitation. This prevents premature convergence while preserving the stationary distribution.
- Variance-based convergence check: Tracking weight variance provides a lightweight proxy for approximation error. When variance drops below the tolerance, the particle set adequately represents the reward-tilted distribution, and further LLM calls yield diminishing returns.
Pitfall Guide
1. Weight Degeneracy Ignored
Explanation: Without resampling, a single high-reward program dominates the population. Subsequent mutations only explore its immediate neighborhood, collapsing diversity.
Fix: Implement ESS monitoring and trigger resampling when ESS < threshold Γ populationSize. Always renormalize weights after resampling to maintain probability mass.
2. Deterministic Mutation Selection
Explanation: Cycling through mutation operators in a fixed order creates periodic search patterns that miss stochastic optima.
Fix: Sample operators stochastically per particle. Weight the mixture toward operators that historically yield higher acceptance rates, but retain a baseline probability for all operators.
3. Hard-Coded Iteration Limits
Explanation: Stopping after N generations ignores the actual state of the search. You either terminate early (missing convergence) or run past it (wasting tokens).
Fix: Replace generation caps with convergence monitoring. Use finite-sample bounds to calculate the minimum calls required for a target error, and halt when empirical variance satisfies the bound.
4. Unbounded Reward Scaling
Explanation: If reward functions return unbounded or highly skewed values, weight normalization becomes unstable. A single outlier can zero out all other particles.
Fix: Clip or normalize rewards before weight calculation. Use rank-based scoring or sigmoid transformation to bound the reward landscape. Always validate reward distributions before resampling.
5. Temperature Mismanagement
Explanation: Fixed temperature either accepts too many low-quality mutations (high temp) or rejects valid exploratory steps (low temp).
Fix: Implement geometric temperature decay. Start at a temperature that yields ~60-70% acceptance, then decay by a factor of 0.95-0.98 per generation. Monitor acceptance rate and adjust decay dynamically.
6. Missing Acceptance Filtering
Explanation: Applying mutations without acceptance control breaks detailed balance. The particle distribution drifts away from the reward-tilted target, invalidating convergence guarantees.
Fix: Always pair mutation with a Metropolis-Hastings acceptance step. Compute reward delta, apply temperature scaling, and stochastically accept or reject. Discard rejected mutations entirely.
7. LLM Call Budget Blindness
Explanation: Treating LLM inference as free compute leads to unbounded costs. Without explicit call tracking, convergence checks cannot enforce budget constraints.
Fix: Maintain a strict call counter. Integrate it into the convergence monitor so that termination triggers on either statistical convergence or budget exhaustion. Log call distribution across resampling, mutation, and evaluation phases.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-cost LLM endpoint | SMC with strict ESS resampling + early convergence halt | Minimizes redundant calls by focusing budget on high-probability regions | Reduces token spend by 30-50% vs heuristic loops |
| Strict deadline / fixed budget | SMC with budget-aware termination bound | Guarantees termination before budget exhaustion while maximizing approximation quality | Predictable cost ceiling; avoids overruns |
| Complex, multimodal search space | Mixture mutation with high initial temperature + adaptive decay | Maintains exploration across disjoint optima before concentrating on best region | Higher initial cost, lower long-term waste |
| Need mathematical guarantees | SMC with finite-sample complexity monitoring | Provides explicit error bounds and convergence proofs | Slight overhead for variance tracking; negligible vs unbounded search |
Configuration Template
{
"populationSize": 64,
"essThreshold": 0.6,
"mutationMix": [
"ast_refactor",
"hyperparameter_jitter",
"control_flow_rewrite"
],
"convergenceTolerance": 0.005,
"maxCalls": 1200,
"temperature": 1.2,
"temperatureDecay": 0.97,
"rewardNormalization": "rank_based",
"checkpointInterval": 5,
"logging": {
"trackESS": true,
"trackAcceptanceRate": true,
"trackCallDistribution": true
}
}
Quick Start Guide
- Initialize the sampler: Load the configuration template and instantiate
RewardTiltedSampler with your target population size and mutation operators.
- Wire the reward pipeline: Replace
computeReward with your evaluation function. Ensure it returns bounded, deterministic scores and logs execution time.
- Run the search: Call
await sampler.run(). The system will automatically resample, mutate, accept/reject, and monitor convergence.
- Inspect termination: Check logs for
convergenceTolerance satisfaction or maxCalls exhaustion. Retrieve the highest-weight particle as the final program.
- Validate output: Run the selected program against held-out test cases. If reward variance remains high, adjust
temperatureDecay or expand the mutation mixture and re-run.