Bayesian Optimization - Implementation Details

← Back to Optimizer Competition

Bayesian Optimization fits a Gaussian-process surrogate to the observed (x, f(x)) pairs and proposes the next evaluation point by maximising an acquisition function. HumpDay's BayesianOpt uses an RBF kernel and the Expected Improvement acquisition. After the GP-EI loop exhausts its share of the budget, HumpDay finishes with a short L-BFGS-B polish from the best observed point — the same pattern scikit-optimize's gp_minimize uses, and the reason HumpDay's BayesianOpt reaches machine precision on smooth basins where the GP itself would plateau at the kernel's noise floor.

HumpDay's BayesianOpt at a glance

Initial design: n_initial = min(5, max(2, n_dim)) uniform random samples.
Surrogate: Gaussian process with an RBF kernel; jitter on the kernel matrix for numerical stability.
Acquisition: Expected Improvement, optimised by random sampling + a short local refinement of the best candidate.
Polish: reserve min(20·n_dim, n_trials/2) evaluations for an L-BFGS-B polish from the best observed point at the end.

Interactive 3D Visualization

See Bayesian Optimization in action on 3D optimization surfaces:

Loading 3D visualization...

Requires WebGL support

Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.

Auto-selection metadata. Consulted by humpday.minimize() when picking a default algorithm.

Overhead tier 4: very heavy (~100ms/iter; O(n_obs³) GP fit). Eligible when measured eval_time ≥ 10 ms.
Dimensional cap: n_dim ≤ 10 — GP cost O(n_obs³) + RBF kernel degeneracy.
Minimum trials: max(30, 5n) — GP needs initialisation samples to amortise its O(n_obs³) fits.

Implementation Details

Component	Details	Links
Algorithmic Foundation	Sequential Model-Based Optimization Gaussian Process regression + acquisition functions Expected Improvement, Upper Confidence Bound Developed: 1960s-70s (Kushner, Močkus)	Tutorial Paper
Reference Implementation	scikit-optimize (skopt) Sequential model-based optimization library Gaussian Process surrogate with various acquisition functions Package: scikit-optimize	Skopt Docs GitHub
HumpDay Python	HumpDay `BayesianOpt` RBF-kernel GP surrogate + Expected Improvement acquisition + L-BFGS-B polish reserved at the end. No required dependencies (numpy is used transparently when present for kernel construction; a pure-Python loop path covers the no-numpy install). File: `humpday/optimizers/evolutionary_algorithms.py`	Source
HumpDay JavaScript	Browser `BayesianOpt` Mirrors the Python port (RBF + EI + L-BFGS-B polish). Used in the contest and the in-browser visualizer. Class: `BayesianOpt`	JS Port

Performance characteristics

Best for: Expensive black-box objectives where every evaluation matters. The GP-EI loop is sample-efficient; the L-BFGS-B polish closes the residual on smooth basins to machine precision.
Worst for: High dimensions — the GP kernel cost grows as O(n_samples³) and the RBF surrogate degenerates past a dozen or so dimensions. Like grid search, BayesianOpt is impractical past n_dim ≈ 10.
Per outer iteration: rebuild the kernel matrix, optimise EI, evaluate one point. Then at the end, min(20·n_dim, n_trials/2) evaluations on an L-BFGS-B polish.
Relative to scikit-optimize gp_minimize at n_trials=200, n_dim=2: HumpDay wins by ~10²¹× on sphere (machine precision vs ref's 1e-8), and matches on Rosenbrock. The polish is the difference. See SOTA status for the live numbers.

Educational Resources

Core Components

Surrogate Model: Gaussian Process regression with uncertainty
Acquisition Function: Expected Improvement, Upper Confidence Bound, Entropy Search
Kernel Functions: RBF, Matérn, polynomial kernels
Hyperparameter Learning: Maximum likelihood estimation
Multi-Point Selection: Batch acquisition for parallel evaluation

Acquisition Functions

Expected Improvement (EI): E[max(f(x) - f_best, 0)]
Upper Confidence Bound (UCB): μ(x) + β*σ(x)
Probability of Improvement (PI): P[f(x) > f_best + ξ]
Entropy Search: Information gain about global optimum
Knowledge Gradient: Expected value of perfect information

Applications

Hyperparameter Optimization: Machine learning model tuning
Experimental Design: Scientific experiments and A/B testing
Engineering Optimization: CAD parameter tuning
Drug Discovery: Molecular property optimization
Robotics: Control parameter learning
AutoML: Architecture search and pipeline optimization

Advanced Techniques

Multi-Task GP: Transfer learning between related problems
Multi-Objective BO: Pareto optimization with GP models
Constrained BO: Handling feasibility constraints
High-Dimensional BO: Random embeddings, additive models
Distributed BO: Asynchronous parallel optimization

Mathematical Foundation

Gaussian Processes: Non-parametric Bayesian regression
Regret Bounds: Theoretical convergence guarantees
Information Theory: Mutual information and entropy
Decision Theory: Optimal stopping and value of information

Bayesian Optimization (scikit-optimize)