Coordinate Descent minimises a black-box objective by sweeping one coordinate at a time. Each sweep tries a step of size
step in the ± direction on each axis; on the first improving move, the algorithm greedily expands — it keeps stepping in the same direction with the same step size while the objective keeps improving. After a full sweep with no improvement the step is halved.
HumpDay's coordinate descent at a glance
- Greedy expansion per axis: on first improvement at a coordinate, keep stepping in that direction with the same step until f stops improving.
- Step halving: when a full coordinate sweep finds no improvement, halve the step.
- Restart on multimodal landscapes: when
step ≤ 1e-6withf > 1e-8the run is presumed trapped in a local basin and reinitialises from a random point. On smooth basins (sphere, Rosenbrock) the run exits naturally before the restart fires. - Initial step
0.1, floor1e-12, on the unit cube[0, 1]n.
Interactive 3D Visualization
See Coordinate Descent in action on 3D optimization surfaces:
Loading 3D visualization...
Requires WebGL support
Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.
Implementation Details
| Component | Details | Links |
|---|---|---|
| Algorithm Foundation |
Coordinate descent (cyclic) One of the oldest derivative-free methods; predates the optimization literature. HumpDay's variant uses Brent-style greedy expansion per axis (no inner line-search subroutine) and step halving on failed sweeps. Reference: Wright (2015), "Coordinate descent algorithms", Math. Programming 151(1) |
Wikipedia |
| Reference Implementation |
Custom Implementation No single standard package (widely implemented) Various coordinate selection strategies Self-reference: Algorithm-specific variants |
Boyd et al. |
| HumpDay Python |
HumpDay CoordinateDescentCyclic sweeps, greedy expansion on the first improving axis step, halve step on failed sweeps, restart trigger on multimodal trapping. Pure Python; no required dependencies. File: humpday/optimizers/search_algorithms.py
|
Source |
| HumpDay JavaScript |
Browser CoordinateDescentSame algorithm as the Python port (greedy expansion + restart). Used in the contest and the in-browser visualizer. Class: CoordinateDescent
|
JS Implementation |
Performance characteristics
- Best for: Smooth, near-axis-aligned objectives where greedy expansion gets traction.
- Worst for: Curved valleys and highly multimodal landscapes — coordinate moves zigzag down a curved valley (Rosenbrock) and trap easily in Ackley-style basins.
- Dimensions: Each sweep is O(n_dim) function evaluations; the algorithm scales cheaply with dimension but precision suffers as the basin shape becomes less axis-aligned.
- Convergence: Geometric reduction of the step size; a single run halts when step falls to the floor (or the restart trigger fires).
- Relative to a textbook greedy coordinate descent reference at
n_trials=200, n_dim=2: matches on Rosenbrock; the restart trigger leaves a precision gap on sphere and Ackley (see SOTA status for the live numbers).
Educational Resources
- Coordinate Descent Wikipedia
- Proximal Algorithms (Boyd et al.)
- Coordinate Descent Converges Faster with Gauss-Southwell Rule
- CMU Convex Optimization Notes
Algorithm Variants
- Cyclic Coordinate Descent: Fixed order through coordinates
- Random Coordinate Descent: Random coordinate selection
- Gauss-Southwell: Choose coordinate with largest gradient component
- Block Coordinate Descent: Update blocks of coordinates simultaneously
- Accelerated Coordinate Descent: Momentum-based improvements
Mathematical Foundation
- Update Rule: x_i^(k+1) = arg min_z f(x_1^(k+1), ..., x_{i-1}^(k+1), z, x_{i+1}^k, ..., x_n^k)
- Separability: f(x) = Σᵢ fᵢ(xᵢ) + interaction terms
- Convergence Rate: O(1/k) for convex, linear for strongly convex
- Line Search: Exact or Armijo backtracking along coordinate
️ Coordinate Selection Strategies
- Cyclic (Round-Robin): i = (k mod n) + 1
- Random: i ~ Uniform{1, 2, ..., n}
- Gauss-Southwell: i = arg max_j |∂f/∂x_j|
- Lipschitz-Based: Probability ∝ Lipschitz constants
- Greedy: Choose coordinate with maximum expected decrease
Algorithm Pseudocode
function coordinateDescent(f, x0, maxIter, tolerance):
x = x0.copy()
n = length(x)
for k in 1 to maxIter:
x_old = x.copy()
for i in 1 to n: // Cyclic coordinate selection
// Optimize f along coordinate i
x[i] = lineSearch(lambda t: f(..., x[i-1], t, x[i+1], ...))
if ||x - x_old|| < tolerance:
break
return x
When Coordinate Descent Excels
- Separable Functions: f(x) = Σᵢ fᵢ(xᵢ) with minimal coupling
- Sparse Problems: Most coordinates have little impact
- Large Scale: Memory efficient, good for high dimensions
- Non-smooth Functions: Handles piecewise linear objectives
- Constrained Problems: Easy to handle box constraints
️ Limitations
- Coupling: Slow on strongly coupled variables
- Conditioning: Poor performance on ill-conditioned problems
- Non-Convex: Can get trapped in local minima
- Step Size: May require careful line search tuning
- Coordinate System: Performance depends on variable scaling
Applications
- Lasso Regression: L1-regularized least squares
- SVM Training: Sequential minimal optimization (SMO)
- Matrix Factorization: Alternating least squares
- Deep Learning: Per-layer parameter updates
- Game Theory: Best response dynamics
- Signal Processing: Sparse coding and denoising
Modern Extensions
- Proximal Coordinate Descent: Handle non-smooth regularizers
- Stochastic Coordinate Descent: Random sampling with convergence guarantees
- Parallel Coordinate Descent: Update multiple coordinates simultaneously
- Adaptive Methods: Automatically adjust step sizes
- Accelerated Variants: Nesterov-style momentum acceleration