L-BFGS-B

Byrd-Lu-Nocedal-Zhu (1995) limited-memory quasi-Newton with simple bounds, on finite-difference gradients

HumpDay's LBFGSB is a faithful pure-Python port of scipy's _minimize_lbfgsb with four elements that distinguish it from a textbook two-loop L-BFGS:
  1. Bound-aware direction projection — direction components that would push x past an active bound are zeroed (the simple-bounds reduction of the Cauchy-point step).
  2. Projected-gradient sup-norm convergence test (pgtol).
  3. f-tolerance termination — stop when (f_old − f_new) < factr·eps_mach·max(|f|, 1), scipy's headline criterion.
  4. Feasibility-capped initial step — cap step length so x + step·direction stays in [0,1]n before backtracking.
HumpDay is a derivative-free library — algorithms see only f(x), never an analytical gradient. Where scipy's L-BFGS-B uses analytic gradients (or a tighter Fortran-backed FD), HumpDay computes a central finite-difference gradient (2·n_dim evaluations) per L-BFGS iteration. That's the only structural deviation from the reference algorithm.

Interactive 3D Visualization

See L-BFGS-B in action on 3D optimization surfaces:

Loading 3D visualization...

Requires WebGL support

Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.

Implementation Details

Component Details Links
Original Algorithm Byrd, Lu, Nocedal & Zhu
Limited-memory BFGS with bound constraints
Quasi-Newton method for large-scale problems
Published: 1995
Original Paper
Reference Implementation SciPy optimize.minimize
Method: 'L-BFGS-B'
scipy.optimize.minimize(method='L-BFGS-B')
Package: scipy
SciPy Docs Source
HumpDay Python HumpDay LBFGSB
Pure-Python port of scipy's _minimize_lbfgsb: two-loop recursion + bound-aware direction projection + projected-gradient pgtol + factr·eps_mach termination + feasibility-capped Armijo line search.
FD gradient (2·n_dim evals per iteration); memory m = min(10, n_dim).
File: humpday/optimizers/scipy_algorithms.py
Source
HumpDay JavaScript Browser LBFGSB
Mirrors the Python port line-for-line. Also shared via BaseOptimizer._lbfgsPolish as the polish stage for DE / SA / BayesianOpt / PSO / CMA-ES / FA.
Class: LBFGSB
JS Port

Performance characteristics

  • Best for: Smooth, near-quadratic basins where the L-BFGS curvature estimate is informative. Routinely the polish stage of choice at the end of population-based methods.
  • Worst for: Multimodal landscapes — L-BFGS-B is local, not global. The bound-aware direction projection keeps it from wandering off the cube but it has no escape from a wrong basin.
  • Per iteration: central FD gradient (2·n_dim evaluations) + one or more line-search trials. Memory m = min(10, n_dim).
  • Relative to scipy.optimize L-BFGS-B at n_trials=200, n_dim=2: ties on sphere and Ackley; a small residual on Rosenbrock from the FD-gradient cost. See SOTA status for the live numbers.
  • Reused as a polish stage by DE, SA, BayesianOpt, PSO, CMA-ES, and FireflyAlgorithm via BaseOptimizer._lbfgs_polish.

Educational Resources

Algorithm Components

  • Limited Memory: Store only m recent (s,y) pairs instead of full Hessian
  • BFGS Update: Quasi-Newton Hessian approximation using gradients
  • Two-Loop Recursion: Efficient computation of search direction
  • Bound Constraints: Active set method for box constraints
  • Line Search: Wolfe conditions for step length selection

Mathematical Foundation

  • BFGS Formula: H_{k+1} = H_k + (1 + (y^T H_k y)/(s^T y)) * (s s^T)/(s^T y) - (s y^T H_k + H_k y s^T)/(s^T y)
  • Two-Loop: q = ∇f_k, then backwards and forwards loops over stored vectors
  • Initial Hessian: H_0 = γI where γ = (s^T y)/(y^T y)
  • Bound Projection: Project gradient onto feasible cone
  • Cauchy Point: Piecewise linear minimizer along projected gradient

️ Key Parameters

  • Memory Size (m): Usually 5-20 (s,y) pairs stored
  • Line Search: Wolfe conditions c1=1e-4, c2=0.9
  • Gradient Tolerance: ||∇f|| < gtol (typically 1e-5)
  • Function Tolerance: Relative change in f < ftol
  • Max Iterations: Prevent infinite loops

L-BFGS vs L-BFGS-B

  • L-BFGS: Unconstrained optimization only
  • L-BFGS-B: Extends to simple bound constraints
  • Active Set: L-BFGS-B identifies and handles active bounds
  • Subspace: Optimization in subspace of free variables
  • Complexity: Similar O(mn) complexity per iteration

Applications

  • Machine Learning: Training large neural networks
  • Parameter Estimation: Maximum likelihood estimation
  • Signal Processing: Sparse signal reconstruction
  • Engineering Design: Structural optimization with bounds
  • Portfolio Optimization: Asset allocation with constraints
  • Image Processing: Image restoration and deconvolution

Why L-BFGS-B is Popular

  • Memory Efficient: O(mn) storage vs O(n²) for full Newton
  • Fast Convergence: Superlinear on smooth problems
  • Handles Bounds: Natural box constraint support
  • Robust Implementation: Well-tested in SciPy
  • Scalable: Works well on high-dimensional problems
  • Default Choice: Often first choice for smooth bounded optimization