L-BFGS-B - Implementation Details

← Back to Optimizer Competition

HumpDay's LBFGSB is a faithful pure-Python port of scipy's _minimize_lbfgsb with four elements that distinguish it from a textbook two-loop L-BFGS:

Bound-aware direction projection — direction components that would push x past an active bound are zeroed (the simple-bounds reduction of the Cauchy-point step).
Projected-gradient sup-norm convergence test (pgtol).
f-tolerance termination — stop when (f_old − f_new) < factr·eps_mach·max(|f|, 1), scipy's headline criterion.
Feasibility-capped initial step — cap step length so x + step·direction stays in [0,1]ⁿ before backtracking.

HumpDay is a derivative-free library — algorithms see only f(x), never an analytical gradient. Where scipy's L-BFGS-B uses analytic gradients (or a tighter Fortran-backed FD), HumpDay computes a central finite-difference gradient (2·n_dim evaluations) per L-BFGS iteration. That's the only structural deviation from the reference algorithm.

Interactive 3D Visualization

See L-BFGS-B in action on 3D optimization surfaces:

Loading 3D visualization...

Requires WebGL support

Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.

Auto-selection metadata. Consulted by humpday.minimize() when picking a default algorithm.

Overhead tier 1: light (~100µs/iter). Eligible when measured eval_time ≥ 10 µs.
Dimensional cap: no cap — linear-in-dim or better.
Minimum trials: 10 — baseline minimum.

Implementation Details

Component	Details	Links
Original Algorithm	Byrd, Lu, Nocedal & Zhu Limited-memory BFGS with bound constraints Quasi-Newton method for large-scale problems Published: 1995	Original Paper
Reference Implementation	SciPy optimize.minimize Method: 'L-BFGS-B' scipy.optimize.minimize(method='L-BFGS-B') Package: scipy	SciPy Docs Source
HumpDay Python	HumpDay `LBFGSB` Pure-Python port of scipy's `_minimize_lbfgsb`: two-loop recursion + bound-aware direction projection + projected-gradient pgtol + factr·eps_mach termination + feasibility-capped Armijo line search. FD gradient (2·n_dim evals per iteration); memory `m = min(10, n_dim)`. File: `humpday/optimizers/scipy_algorithms.py`	Source
HumpDay JavaScript	Browser `LBFGSB` Mirrors the Python port line-for-line. Also shared via `BaseOptimizer._lbfgsPolish` as the polish stage for DE / SA / BayesianOpt / PSO / CMA-ES / FA. Class: `LBFGSB`	JS Port

Performance characteristics

Best for: Smooth, near-quadratic basins where the L-BFGS curvature estimate is informative. Routinely the polish stage of choice at the end of population-based methods.
Worst for: Multimodal landscapes — L-BFGS-B is local, not global. The bound-aware direction projection keeps it from wandering off the cube but it has no escape from a wrong basin.
Per iteration: central FD gradient (2·n_dim evaluations) + one or more line-search trials. Memory m = min(10, n_dim).
Relative to scipy.optimize L-BFGS-B at n_trials=200, n_dim=2: ties on sphere and Ackley; a small residual on Rosenbrock from the FD-gradient cost. See SOTA status for the live numbers.
Reused as a polish stage by DE, SA, BayesianOpt, PSO, CMA-ES, and FireflyAlgorithm via BaseOptimizer._lbfgs_polish.

Educational Resources

Algorithm Components

Limited Memory: Store only m recent (s,y) pairs instead of full Hessian
BFGS Update: Quasi-Newton Hessian approximation using gradients
Two-Loop Recursion: Efficient computation of search direction
Bound Constraints: Active set method for box constraints
Line Search: Wolfe conditions for step length selection

Mathematical Foundation

BFGS Formula: H_{k+1} = H_k + (1 + (y^T H_k y)/(s^T y)) * (s s^T)/(s^T y) - (s y^T H_k + H_k y s^T)/(s^T y)
Two-Loop: q = ∇f_k, then backwards and forwards loops over stored vectors
Initial Hessian: H_0 = γI where γ = (s^T y)/(y^T y)
Bound Projection: Project gradient onto feasible cone
Cauchy Point: Piecewise linear minimizer along projected gradient

️ Key Parameters

Memory Size (m): Usually 5-20 (s,y) pairs stored
Line Search: Wolfe conditions c1=1e-4, c2=0.9
Gradient Tolerance: ||∇f|| < gtol (typically 1e-5)
Function Tolerance: Relative change in f < ftol
Max Iterations: Prevent infinite loops

L-BFGS vs L-BFGS-B

L-BFGS: Unconstrained optimization only
L-BFGS-B: Extends to simple bound constraints
Active Set: L-BFGS-B identifies and handles active bounds
Subspace: Optimization in subspace of free variables
Complexity: Similar O(mn) complexity per iteration

Applications

Machine Learning: Training large neural networks
Parameter Estimation: Maximum likelihood estimation
Signal Processing: Sparse signal reconstruction
Engineering Design: Structural optimization with bounds
Portfolio Optimization: Asset allocation with constraints
Image Processing: Image restoration and deconvolution

Why L-BFGS-B is Popular

Memory Efficient: O(mn) storage vs O(n²) for full Newton
Fast Convergence: Superlinear on smooth problems
Handles Bounds: Natural box constraint support
Robust Implementation: Well-tested in SciPy
Scalable: Works well on high-dimensional problems
Default Choice: Often first choice for smooth bounded optimization