Coordinate Descent - Implementation Details

← Back to Optimizer Competition

Coordinate Descent minimises a black-box objective by sweeping one coordinate at a time. Each sweep tries a step of size step in the ± direction on each axis; on the first improving move, the algorithm greedily expands — it keeps stepping in the same direction with the same step size while the objective keeps improving. After a full sweep with no improvement the step is halved.

HumpDay's coordinate descent at a glance

Greedy expansion per axis: on first improvement at a coordinate, keep stepping in that direction with the same step until f stops improving.
Step halving: when a full coordinate sweep finds no improvement, halve the step.
Restart on multimodal landscapes: when step ≤ 1e-6 with f > 1e-8 the run is presumed trapped in a local basin and reinitialises from a random point. On smooth basins (sphere, Rosenbrock) the run exits naturally before the restart fires.
Initial step 0.1, floor 1e-12, on the unit cube [0, 1]ⁿ.

Interactive 3D Visualization

See Coordinate Descent in action on 3D optimization surfaces:

Loading 3D visualization...

Requires WebGL support

Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.

Auto-selection metadata. Consulted by humpday.minimize() when picking a default algorithm.

Overhead tier 1: light (~100µs/iter). Eligible when measured eval_time ≥ 10 µs.
Dimensional cap: no cap — linear-in-dim or better.
Minimum trials: 10 — baseline minimum.

Implementation Details

Component	Details	Links
Algorithm Foundation	Coordinate descent (cyclic) One of the oldest derivative-free methods; predates the optimization literature. HumpDay's variant uses Brent-style greedy expansion per axis (no inner line-search subroutine) and step halving on failed sweeps. Reference: Wright (2015), "Coordinate descent algorithms", Math. Programming 151(1)	Wikipedia
Reference Implementation	Custom Implementation No single standard package (widely implemented) Various coordinate selection strategies Self-reference: Algorithm-specific variants	Boyd et al.
HumpDay Python	HumpDay `CoordinateDescent` Cyclic sweeps, greedy expansion on the first improving axis step, halve step on failed sweeps, restart trigger on multimodal trapping. Pure Python; no required dependencies. File: `humpday/optimizers/search_algorithms.py`	Source
HumpDay JavaScript	Browser `CoordinateDescent` Same algorithm as the Python port (greedy expansion + restart). Used in the contest and the in-browser visualizer. Class: `CoordinateDescent`	JS Implementation

Performance characteristics

Best for: Smooth, near-axis-aligned objectives where greedy expansion gets traction.
Worst for: Curved valleys and highly multimodal landscapes — coordinate moves zigzag down a curved valley (Rosenbrock) and trap easily in Ackley-style basins.
Dimensions: Each sweep is O(n_dim) function evaluations; the algorithm scales cheaply with dimension but precision suffers as the basin shape becomes less axis-aligned.
Convergence: Geometric reduction of the step size; a single run halts when step falls to the floor (or the restart trigger fires).
Relative to a textbook greedy coordinate descent reference at n_trials=200, n_dim=2: matches on Rosenbrock; the restart trigger leaves a precision gap on sphere and Ackley (see SOTA status for the live numbers).

Educational Resources

Algorithm Variants

Cyclic Coordinate Descent: Fixed order through coordinates
Random Coordinate Descent: Random coordinate selection
Gauss-Southwell: Choose coordinate with largest gradient component
Block Coordinate Descent: Update blocks of coordinates simultaneously
Accelerated Coordinate Descent: Momentum-based improvements

Mathematical Foundation

Update Rule: x_i^(k+1) = arg min_z f(x_1^(k+1), ..., x_{i-1}^(k+1), z, x_{i+1}^k, ..., x_n^k)
Separability: f(x) = Σᵢ fᵢ(xᵢ) + interaction terms
Convergence Rate: O(1/k) for convex, linear for strongly convex
Line Search: Exact or Armijo backtracking along coordinate

️ Coordinate Selection Strategies

Cyclic (Round-Robin): i = (k mod n) + 1
Random: i ~ Uniform{1, 2, ..., n}
Gauss-Southwell: i = arg max_j |∂f/∂x_j|
Lipschitz-Based: Probability ∝ Lipschitz constants
Greedy: Choose coordinate with maximum expected decrease

Algorithm Pseudocode

function coordinateDescent(f, x0, maxIter, tolerance):
    x = x0.copy()
    n = length(x)

    for k in 1 to maxIter:
        x_old = x.copy()

        for i in 1 to n:  // Cyclic coordinate selection
            // Optimize f along coordinate i
            x[i] = lineSearch(lambda t: f(..., x[i-1], t, x[i+1], ...))

        if ||x - x_old|| < tolerance:
            break

    return x

When Coordinate Descent Excels

Separable Functions: f(x) = Σᵢ fᵢ(xᵢ) with minimal coupling
Sparse Problems: Most coordinates have little impact
Large Scale: Memory efficient, good for high dimensions
Non-smooth Functions: Handles piecewise linear objectives
Constrained Problems: Easy to handle box constraints

️ Limitations

Coupling: Slow on strongly coupled variables
Conditioning: Poor performance on ill-conditioned problems
Non-Convex: Can get trapped in local minima
Step Size: May require careful line search tuning
Coordinate System: Performance depends on variable scaling

Applications

Lasso Regression: L1-regularized least squares
SVM Training: Sequential minimal optimization (SMO)
Matrix Factorization: Alternating least squares
Deep Learning: Per-layer parameter updates
Game Theory: Best response dynamics
Signal Processing: Sparse coding and denoising

Modern Extensions

Proximal Coordinate Descent: Handle non-smooth regularizers
Stochastic Coordinate Descent: Random sampling with convergence guarantees
Parallel Coordinate Descent: Update multiple coordinates simultaneously
Adaptive Methods: Automatically adjust step sizes
Accelerated Variants: Nesterov-style momentum acceleration