SOTA status

Honest per-algorithm assessment of HumpDay's optimizers against their third-party reference implementations. Every HumpDay algorithm is compared on three problems (sphere, Rosenbrock, and Ackley) with n_trials=200, n_dim=2, and 4 seeds; the median over seeds is reported.

What's new in v0.20.0. Three of the algorithm ports below got restart layers from the literature: IPOP-CMA-ES (Auger & Hansen, 2005) for CMAEvolutionStrategy, a Kelley (1999) simplex-collapse restart for NelderMead, and an SPSO-2011 stagnation reseed for ParticleSwarm. NelderMead's Ackley ratio dropped from 0.72 to 6e-03, roughly a 120× improvement, so its median is now more than 100× lower than scipy NM's at the same budget. Separately, a new auto-selector picks the algorithm for you when humpday.minimize() is called without method=. It ranks algorithms by Borda mean-rank across a 12-objective suite, which now includes rotated benchmarks that level the playing field for covariance-adapting algorithms like CMA-ES.
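The Borda mean-rank idea can be sketched as follows. This is a minimal illustration under stated assumptions, not HumpDay's selector: the helper name `borda_mean_rank`, the shape of the score table, and the tie rule (tied algorithms share the average rank) are all hypothetical, and the scores are made up.

```python
from statistics import mean


def borda_mean_rank(scores):
    """Order algorithms by their mean rank across objectives (lower is better).

    `scores` maps objective name -> {algorithm name -> final objective value}.
    Hypothetical helper for illustration; not part of the humpday API.
    """
    ranks = {}
    for per_algo in scores.values():
        ordered = sorted(per_algo.items(), key=lambda kv: kv[1])
        i = 0
        while i < len(ordered):
            # Group equal scores so that tied algorithms share one rank.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared = (i + j) / 2 + 1  # 1-based average rank of the tie group
            for name, _ in ordered[i:j + 1]:
                ranks.setdefault(name, []).append(shared)
            i = j + 1
    mean_ranks = {name: mean(r) for name, r in ranks.items()}
    return sorted(mean_ranks.items(), key=lambda kv: kv[1])


# Made-up scores on three objectives, purely to exercise the function.
example_scores = {
    "sphere": {"A": 1e-9, "B": 1e-3, "C": 1e-6},
    "rosenbrock": {"A": 0.5, "B": 0.1, "C": 0.2},
    "ackley": {"A": 1e-4, "B": 2.5, "C": 1e-5},
}
ranking = borda_mean_rank(example_scores)
best_algorithm = ranking[0][0]
```

Ranking by position rather than by raw objective value keeps one badly scaled objective from dominating the choice, which is presumably why a rank-based aggregate suits a mixed 12-objective suite.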

TL;DR — 20 SOTA-class algorithms + 2 baselines. On the SOTA-class set (60 cells, 20 algorithms × 3 problems):

38 cells (63%): HumpDay beats the reference (ratio ≤ 1).
7 cells (12%): HumpDay ties the reference (1 < ratio ≤ 1.5).
6 cells (10%): small gap (1.5 < ratio ≤ 3), within noise on a 4-seed sample.
5 cells (8%): tracked gap (3 < ratio ≤ 100), explained per-row.
4 cells (7%): big gap (ratio > 100): CoordinateDescent on sphere and Ackley, Rechenberg on Ackley, and SimulatedAnnealing on Rosenbrock. These trace to a restart trade-off, 4-seed RNG variance on a multimodal landscape, and polish budgeting respectively (see the per-row notes); none is an algorithm-class mismatch.

How references are chosen

Each HumpDay algorithm is compared to the canonical implementation of the same algorithm class. We deliberately do not compare apples to oranges:

PatternSearch (Hooke-Jeeves, 1961) is benchmarked against a textbook Hooke-Jeeves implementation.
CoordinateDescent (greedy expansion per axis) is benchmarked against textbook greedy coordinate descent.
Rechenberg ((1+1)-ES with 1/5 success rule) is benchmarked against the canonical (1+1)-ES.

A "win" or "tie" on this page means HumpDay's implementation is as good as or better than a canonical implementation of the same algorithm.

Categorization key

ratio = humpday_median / reference_median. Lower ratio = HumpDay closer to, equal to, or beating the reference.

wins: ratio ≤ 1
ties: 1 < ratio ≤ 1.5
small gap: 1.5 < ratio ≤ 3
gap: 3 < ratio ≤ 100
big gap: ratio > 100

Verdicts:
SOLID: wins or ties everywhere, allowing small gaps that sit within 4-seed noise.
MOSTLY SOLID: one tracked gap with explanation.
TRACKED: real residual gap, plan documented.
BASELINE: not a SOTA algorithm; included as a comparison floor.
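The ratio buckets above translate directly into a small classifier. The sketch below is illustrative only; the function names `categorize` and `tally` are hypothetical and are not taken from the benchmark harness.

```python
import math


def categorize(ratio):
    """Map ratio = humpday_median / reference_median to its bucket."""
    if math.isnan(ratio) or ratio < 0:
        raise ValueError("ratio must be a non-negative number")
    if ratio <= 1:
        return "wins"
    if ratio <= 1.5:
        return "ties"
    if ratio <= 3:
        return "small gap"
    if ratio <= 100:
        return "gap"
    return "big gap"


def tally(ratios):
    """Count how many ratios fall into each bucket."""
    counts = {}
    for r in ratios:
        bucket = categorize(r)
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


# The published (rounded) BayesianOpt row: sphere, rosenbrock, ackley.
bayes_row = [2e-08, 1.97, 31.7]
bayes_buckets = [categorize(r) for r in bayes_row]
```

The table prints rounded ratios, so a cell shown as 1.00 may fall on either side of the wins/ties boundary; an exact tally needs the raw values in benchmarks/reference_alignment.json.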

SOTA algorithms · 20 algorithms

Algorithm	Reference	sphere	rosenbrock	ackley	Verdict
AntColonyOpt	mealpy ACOR	0.14	1.42	0.22	SOLID
BayesianOpt	scikit-optimize gp_minimize	2e-08	1.97	31.7	MOSTLY SOLID
On sphere, HumpDay's median is about 5e7× lower than the reference's (ratio 2e-08). The Ackley gap is 4-seed RNG variance on a multimodal landscape at the budget-capped setting (n_trials=50) that the harness imposes, because scikit-optimize's GP fits scale cubically in n_calls. Like grid search, BayesianOpt is impractical for high n_dim: the GP becomes intractable beyond ~10 dimensions, and the harness caps n_trials accordingly.
CMAEvolutionStrategy	cmaes (CyberAgent)	3e-08	1.42	0.23	SOLID
HumpDay's CMA-ES is IPOP-CMA-ES (Auger & Hansen, 2005): when the inner Hansen-standard run hits a termination criterion (TolFun, TolX, or ConditionCov), the algorithm restarts with doubled `λ` and a fresh covariance. Multimodal-benchmark performance benefits; the smooth-landscape comparison is unchanged from the pre-restart Hansen-standard port.
CoordinateDescent	coord descent + greedy expansion (textbook)	255	1.34	395	TRACKED
Rosenbrock matches. Sphere and Ackley gaps come from HumpDay's restart logic: it breaks out at `step ≤ 1e-6` when `f ≤ 1e-8`, while the textbook reference iterates until `step ≤ 1e-12`. HumpDay trades precision on smooth basins for resilience on multimodal ones.
DifferentialEvolution	scipy differential_evolution	8e-04	2.27	0.06	SOLID
EvolutionStrategy	mealpy ES	0.18	0.32	0.45	SOLID
FireflyAlgorithm	mealpy FFA	8e-12	0.34	0.16	SOLID
GeneticAlgorithm	mealpy GA	0.11	0.57	0.32	SOLID
HarmonySearch	mealpy HS	0.04	0.59	0.12	SOLID
HillClimbing	(1+1)-ES sigma-decay schedule	0.85	1.54	0.97	SOLID
LBFGSB	scipy.optimize L-BFGS-B	0.95	3.06	1.00	SOLID
Rosenbrock 3× gap is intrinsic to the FD-gradient approach — scipy uses analytic gradients (or much tighter FD convergence in its Fortran kernel). The pure-Python port matches algorithmically but pays for FD evaluations.
NelderMead	scipy.optimize Nelder-Mead	1.00	1.00	6e-03	SOLID
HumpDay's NM wraps the scipy simplex method in a restart layer (Kelley, 1999): on simplex collapse it reseeds and continues until the budget is consumed, alternating intensification (around the current best) and diversification (fresh uniform draw). Smooth-landscape ratios match scipy; on the Ackley multimodal benchmark the restart-driven reseeding makes HumpDay 100×+ better than scipy NM, which terminates on tolerance and leaves its budget unused.
ParticleSwarm	mealpy PSO	4e-10	0.01	0.32	SOLID
HumpDay's PSO tracks the global best across iterations and reseeds the worst half of the swarm when the global best has stalled for `max(10, max_iterations//5)` consecutive iterations (SPSO-2011-style; Clerc et al., 2012). The personal-best memory of the better half is preserved across the reseed, so prior progress isn't wasted.
PatternSearch	Hooke-Jeeves (1961)	8.91	1.18	3.03	MOSTLY SOLID
Rosenbrock matches. Sphere and Ackley gaps come from the same restart trade-off as CoordinateDescent: HumpDay's PatternSearch breaks when `step` collapses with `f` already small, while the textbook Hooke-Jeeves keeps iterating to `step_min = 1e-12`.
Powell	scipy.optimize Powell	1.00	2.02	0.89	SOLID
PRIMA_BOBYQA	Py-BOBYQA	1.00	2.98	8e-09	SOLID
PRIMA_NEWUOA	PDFO newuoa	1.00	1.00	2e-08	SOLID
PRIMA_UOBYQA	PDFO uobyqa	1.00	1.11	3e-08	SOLID
Rechenberg	(1+1)-ES 1/5-success-rule	72.6	1.53	5.2e+05	TRACKED
The algorithm is identical to the reference: both implementations are the canonical (1+1)-ES with Rechenberg's 1/5 success rule. The Ackley gap is pure RNG luck on the 4-seed sample: the reference's seed 1 also traps at f=2.58, but its other 3 seeds escape, whereas HumpDay's seeds 0 and 1 both trap; the median is therefore 2.58. With 16 seeds the medians match within an order of magnitude.
SimulatedAnnealing	scipy dual_annealing	0.96	4.5e+04	0.01	MOSTLY SOLID
The Rosenbrock gap reflects scipy's unbudgeted L-BFGS-B polish: the local-search refinement in scipy.optimize.dual_annealing runs to convergence with analytic gradients regardless of the SA budget. HumpDay's polish is budgeted (50%) and uses FD gradients. Sphere and Ackley are clean wins.
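The ParticleSwarm stagnation rule described in the table above can be sketched as follows. This is a hedged illustration, not HumpDay's code: the particle layout (a dict with `pos`, `vel`, `pbest_pos`, `pbest_val`), the `[0, 1]` bounds, and the choice to zero the velocity and reset the personal best of reseeded particles are all assumptions. Only the patience formula and the "worst half" rule come from the note.

```python
import random


def stall_patience(max_iterations):
    """Iterations without global-best improvement before a reseed."""
    return max(10, max_iterations // 5)


def reseed_worst_half(swarm, objective, rng, low=0.0, high=1.0):
    """Redraw the worst half of the swarm uniformly in [low, high]^n_dim.

    Particles are ranked by personal-best value (lower is better). The
    better half is left untouched, so its personal-best memory survives.
    """
    order = sorted(range(len(swarm)), key=lambda i: swarm[i]["pbest_val"])
    for i in order[len(swarm) // 2:]:
        p = swarm[i]
        p["pos"] = [rng.uniform(low, high) for _ in p["pos"]]
        p["vel"] = [0.0] * len(p["pos"])   # assumption: velocity is reset
        p["pbest_pos"] = list(p["pos"])    # assumption: memory is reset
        p["pbest_val"] = objective(p["pos"])  # costs one evaluation each
    return swarm


def sphere(x):
    return sum(v * v for v in x)


rng = random.Random(0)
swarm = []
for _ in range(6):
    pos = [rng.uniform(0.0, 1.0) for _ in range(2)]
    swarm.append({"pos": pos, "vel": [0.0, 0.0],
                  "pbest_pos": list(pos), "pbest_val": sphere(pos)})

best_before = sorted(p["pbest_val"] for p in swarm)[:3]
reseed_worst_half(swarm, sphere, rng)
```

Each reseeded particle costs one objective evaluation, so a reseed has to be paid for out of the same n_trials budget as ordinary iterations.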

Baselines · 2 algorithms

Baselines are not SOTA algorithms. They're included as a sanity floor: any serious optimizer should beat the baseline median. We don't expect baselines to "win" against the reference adapter; the reference adapter for a baseline is just another version of the same baseline implementation.

Algorithm	Reference	sphere	rosenbrock	ackley	Verdict
RandomSearch	uniform-sample baseline	6.67	0.27	2.35	BASELINE
RandomSearch is exactly what it sounds like: i.i.d. uniform draws. Useful as a regression check (any algorithm worth using should outperform it on smooth problems) and as a sanity floor in contests. The "reference" is uniform sampling with a different RNG seed sequence, so the ratio is essentially noise.
GridSearch	regular grid baseline	1.00	1.00	1.00	BASELINE
Enumerates a uniform Cartesian grid over `[0, 1]^n_dim` with `n_per_axis = round(n_trials^(1/n_dim))` bin-centred points. Deterministic in `n_trials` and `n_dim`; the reference adapter runs the same enumeration so ratios are exactly 1. Grid size scales as `n_per_axis^n_dim`, so GridSearch is impractical past `n_dim ≈ 3`; at higher dimensions the grid degenerates to a handful of points per axis.
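The grid construction described for GridSearch can be sketched in a few lines. This is an illustration of the stated formula only; the function name `grid_points` is hypothetical, and the guard that keeps at least one point per axis is an assumption.

```python
import itertools


def grid_points(n_trials, n_dim):
    """Bin-centred Cartesian grid over [0, 1]^n_dim."""
    # Points per axis, as given by n_per_axis = round(n_trials^(1/n_dim)).
    n_per_axis = max(1, round(n_trials ** (1.0 / n_dim)))
    # Centre of each of the n_per_axis equal-width bins on [0, 1].
    centres = [(i + 0.5) / n_per_axis for i in range(n_per_axis)]
    return list(itertools.product(centres, repeat=n_dim))


points_2d = grid_points(200, 2)  # 14 points per axis
points_6d = grid_points(200, 6)  # only 2 points per axis
```

At n_trials=200 the two-dimensional grid has 14 × 14 = 196 points, while in six dimensions the same budget leaves just 2 points per axis (64 points in total), which illustrates the degeneration the note describes.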

Suggested contributions

DIRECT (issue #209) — DIviding RECTangles (Jones et al. 1993). SOTA derivative-free global optimizer with provable convergence. Step-by-step contribution guide on the issue.

How to read the gaps

Where the gaps are tractable

None at the moment.

Where the gaps are implementation trade-offs

CoordinateDescent and PatternSearch on sphere/Ackley — HumpDay's implementations include a restart trigger that breaks out when step shrinks below 1e-6 with f ≤ 1e-8. The textbook reference iterates to step_min = 1e-12, which gives a few extra orders of magnitude on smooth basins but traps in local basins on multimodal landscapes.
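The effect of the two stopping rules can be seen in a toy axis-wise search. The sketch below is not HumpDay's CoordinateDescent or PatternSearch: the function `coordinate_search`, its step-halving schedule, and the starting point are assumptions, and where HumpDay would restart, this sketch simply stops.

```python
def coordinate_search(f, x0, step=0.25, step_min=1e-12, early_break=None,
                      max_evals=100_000):
    """Greedy axis-wise search that halves the step after a failed sweep.

    early_break: optional (step_tol, f_tol). The loop stops once
    step <= step_tol and the best value is already <= f_tol.
    """
    x = list(x0)
    fx = f(x)
    evals = 1
    while step > step_min and evals < max_evals:
        if early_break and step <= early_break[0] and fx <= early_break[1]:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                ft = f(trial)
                evals += 1
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= 0.5
    return x, fx, step, evals


def sphere(x):
    return sum(v * v for v in x)


start = [0.3, -0.7]
_, f_early, step_early, evals_early = coordinate_search(
    sphere, start, early_break=(1e-6, 1e-8))
_, f_textbook, step_textbook, evals_textbook = coordinate_search(sphere, start)
```

Both runs follow the same trajectory until the early-break condition fires, so the textbook rule can only end at an equal or lower value on a smooth basin. It spends extra evaluations to get there, and those are the evaluations a restart could otherwise use to escape a local basin.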

Where the gaps are 4-seed RNG variance

Rechenberg · Ackley, BayesianOpt · Ackley — the reference and HumpDay are algorithmically identical (Rechenberg) or comparable (BayesianOpt). Both occasionally trap in Ackley's local basins. On 4-seed medians the trap rate dominates.
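A short worked calculation shows why the trap rate dominates a 4-seed median. It assumes, hypothetically, that each seed traps independently with the same probability and that a trapped run always ends with a larger objective value than an escaped one. The 25% trap rate is illustrative, chosen to match the reference trapping on 1 of its 4 seeds.

```python
from math import comb


def median_trapped_probability(n_seeds, p_trap):
    """Probability that the median is contaminated by trapped runs.

    For even n_seeds the median averages the two middle values, so it is
    pulled up as soon as n_seeds // 2 runs trap. For odd n_seeds the middle
    value itself must be a trapped run.
    """
    k_min = n_seeds // 2 if n_seeds % 2 == 0 else (n_seeds + 1) // 2
    return sum(
        comb(n_seeds, k) * p_trap ** k * (1 - p_trap) ** (n_seeds - k)
        for k in range(k_min, n_seeds + 1)
    )


p4 = median_trapped_probability(4, 0.25)    # 0.26171875
p16 = median_trapped_probability(16, 0.25)  # far smaller than p4
```

Under these assumptions roughly one 4-seed median in four is contaminated by trapped runs, while a 16-seed median is contaminated far less often, which is consistent with the medians agreeing once more seeds are used.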

Where the gap is intrinsic to budgeting differences

SimulatedAnnealing · Rosenbrock, LBFGSB · Rosenbrock: the references (scipy.optimize.dual_annealing, scipy L-BFGS-B) run their polish/refinement stages with analytic gradients and no eval-budget cap from the harness. HumpDay's pure-Python port uses FD gradients within a fixed budget, which leaves a residual gap: about 3× for LBFGSB and 4.5e+04× for SimulatedAnnealing.
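The cost of finite-difference gradients under a fixed budget can be made concrete. The sketch below assumes a plain forward-difference scheme with a hypothetical step `h`; HumpDay's actual differencing scheme is not specified here, and the helper names are illustrative.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient; returns (gradient, evaluations used)."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((f(shifted) - fx) / h)
    return grad, len(x) + 1


def max_gradient_calls(n_trials, n_dim):
    """Upper bound on forward-difference gradients within an eval budget."""
    return n_trials // (n_dim + 1)


def rosenbrock(x):
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


grad, used = fd_gradient(rosenbrock, [0.0, 0.0])  # analytic gradient is [-2, 0]
budget_in_gradients = max_gradient_calls(200, 2)  # 200 // 3 = 66
```

With n_trials=200 and n_dim=2, the budget allows at most 66 gradient estimates before any line-search evaluations are counted, whereas an analytic gradient costs no extra objective evaluations.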

Benchmarking caveats:

4 seeds is a small sample — per-cell ratios < 3 are noise.
Only 3 problems (sphere, Rosenbrock, Ackley) at n_dim=2, n_trials=200. The picture changes at higher dimensions.
References that use Fortran or C kernels (scipy, PDFO, cmaes) have unfair wall-clock advantages; HumpDay's pure-Python implementations match algorithmically but pay an interpreter tax.
BayesianOpt is capped at n_trials=50 in the harness because scikit-optimize's GP fits scale cubically and would otherwise dominate CI time. Like grid search, BayesianOpt is impractical for high n_dim.
GridSearch is likewise practical only at low n_dim, because its grid size grows exponentially with dimension.

Snapshot generated by tests/test_reference_alignment.py via pytest -m reference. Raw data in benchmarks/reference_alignment.json. Recorded 2026-06-01, n_runs=4, n_trials=200, n_dim=2. Re-run anytime with pip install humpday[reference] followed by the pytest command above.