A regularized nonlinear least-squares problem

In this tutorial, we show how to model and solve the nonconvex nonsmooth least-squares problem

\[ \min_{x \in \mathbb{R}^2} \tfrac{1}{2} \sum_{i=1}^m \big(y_i - x_1 e^{x_2 t_i}\big)^2 + \lambda \|x\|_0.\]

This problem models the fit of an exponential curve to noisy data; the $\lambda \|x\|_0$ term penalizes the number of nonzero parameters.

Modelling the problem

We first formulate the objective function as the sum of a smooth function $f$ and a nonsmooth regularizer $h$:

\[ \tfrac{1}{2} \sum_{i=1}^m \big(y_i - x_1 e^{x_2 t_i}\big)^2 + \lambda \|x\|_0 = f(x) + h(x),\]

where

\[\begin{align*} f(x) &:= \tfrac{1}{2} \sum_{i=1}^m \big(y_i - x_1 e^{x_2 t_i}\big)^2,\\ h(x) &:= \lambda\|x\|_0. \end{align*}\]
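Because $f$ is a sum of squares, it is convenient to write $f(x) = \tfrac{1}{2}\|F(x)\|^2$, where $F : \mathbb{R}^2 \to \mathbb{R}^m$ collects the residuals $F_i(x) := y_i - x_1 e^{x_2 t_i}$. Its gradient then takes the familiar least-squares form

\[ \nabla f(x) = J(x)^T F(x), \]

where $J(x)$ is the Jacobian of $F$ at $x$. Least-squares solvers exploit this structure rather than treating $f$ as a generic smooth function.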

To model $f$, we use ADNLPModels.jl, which computes derivatives via automatic differentiation. For the nonsmooth regularizer $h$, we use ProximalOperators.jl. We then wrap the smooth model and the regularizer into a RegularizedNLPModel.

using ADNLPModels
using ProximalOperators
using Random
using RegularizedProblems

Random.seed!(0)

# Generate synthetic nonlinear least-squares data
m = 100
t = range(0, 1, length=m)
a_true, b_true = 2.0, -1.0
y = [a_true * exp(b_true * ti) + 0.1*randn() for ti in t]

# Starting point
x0 = [1.0, 0.0]   # [a, b]

# Define nonlinear residuals
function F(x)
  a, b = x
  return [yi - a*exp(b*ti) for (ti, yi) in zip(t, y)]
end

# Build ADNLSModel
f_model = ADNLSModel(F, x0, m, name = "nonlinear LS model of f")

# Get the regularizer from ProximalOperators.jl
λ = 0.01
h = NormL0(λ)

# Wrap into a RegularizedNLPModel
regularized_pb = RegularizedNLPModel(f_model, h)
Smooth model: ADNLSModel - Nonlinear least-squares model with automatic differentiation backend ADModelBackend{
  ForwardDiffADGradient,
  ForwardDiffADHvprod,
  EmptyADbackend,
  EmptyADbackend,
  EmptyADbackend,
  SparseADHessian,
  EmptyADbackend,
  ForwardDiffADHvprod,
  ForwardDiffADJprod,
  ForwardDiffADJtprod,
  SparseADJacobian,
  SparseADHessian,
}
  Problem name: nonlinear LS model of f
   All variables: ████████████████████ 2      All constraints: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0        All residuals: ████████████████████ 100   
            free: ████████████████████ 2                 free: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0               linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
           lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0            nonlinear: ████████████████████ 100   
           upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                 nnzj: (  0.00% sparsity)   200   
         low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0              low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                 nnzh: ( 33.33% sparsity)   2     
           fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
          infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0               infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
            nnzh: (  0.00% sparsity)   3               linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
                                                    nonlinear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
                                                         nnzj: (------% sparsity)         

  Counters:
             obj: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                 grad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                 cons: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
        cons_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0             cons_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                 jcon: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
           jgrad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                  jac: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0              jac_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
         jac_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                jprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0            jprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
       jprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0               jtprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0           jtprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
      jtprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                 hess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0                hprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
           jhess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0               jhprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0             residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
    jac_residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0       jprod_residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0      jtprod_residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     
   hess_residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0       jhess_residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0       hprod_residual: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0     

Regularizer: ProximalOperators.NormL0{Float64}(0.01)

Selected variables: 1:2
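
Before solving, we can sanity-check the regularizer. The proximal mapping of $\lambda\|\cdot\|_0$ is hard thresholding: with step length $\gamma$, every entry with $|x_i| \le \sqrt{2\gamma\lambda}$ is set to zero. A minimal sketch using the prox function from ProximalOperators.jl:

# Hard thresholding: with γ = 1 and λ = 0.01, entries below √(2γλ) ≈ 0.14 vanish
z, hz = prox(h, [0.05, 1.0], 1.0)
# z ≈ [0.0, 1.0]: the small entry is thresholded away
# hz = λ × (number of nonzeros of z) = 0.01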

Solving the problem

We can now choose one of the available solvers to solve the problem defined above. For least-squares problems, LM or LMTR are usually the most appropriate choices, as both exploit the residual structure of $f$.
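
Both are Levenberg–Marquardt-type methods: at each iteration, the step $s$ approximately minimizes a Gauss–Newton model of $f$ plus the regularizer. As a sketch, up to implementation details, the LM subproblem reads

\[ \min_{s \in \mathbb{R}^2} \; \tfrac{1}{2}\|F(x) + J(x)s\|^2 + \tfrac{\sigma}{2}\|s\|^2 + h(x + s), \]

where $\sigma > 0$ is the regularization parameter reported in the solver log below, while LMTR replaces the quadratic regularization term with a trust-region constraint $\|s\| \le \Delta$.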

using RegularizedOptimization

# LM is a Levenberg–Marquardt method with quadratic regularization.
out = LM(regularized_pb, verbose = 1, atol = 1e-4)
println("LM converged after $(out.iter) iterations.")
[ Info:  outer  inner     f(x)     h(x)  √(ξ1/ν)        ρ        σ      ‖x‖      ‖s‖     ‖J‖² LM
[ Info:      0     25  1.0e+01  1.0e-02  2.6e+01  8.4e-01  7.4e-04  1.0e+00  1.5e+00  1.3e+02 =
[ Info:      1     14  2.0e+00  2.0e-02  1.3e+01  9.9e-01  7.4e-04  2.2e+00  3.0e-01  5.2e+01 ↘
[ Info:      2     15  3.8e-01  2.0e-02  1.5e+00  1.0e+00  2.5e-04  2.2e+00  3.9e-02  7.1e+01 ↘
[ Info:      3     17  3.6e-01  2.0e-02  1.7e-02  1.0e+00  8.2e-05  2.2e+00  7.9e-04  6.8e+01 ↘
[ Info:      4      0  3.6e-01  2.0e-02  2.5e-05  1.0e+00  2.7e-05  2.2e+00  3.7e-07  6.8e+01
[ Info: LM: terminating with √(ξ1/ν) = 2.5352037067312497e-5
LM converged after 4 iterations.
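
The returned out is a GenericExecutionStats object from SolverCore.jl. Besides iter and solution, it carries other useful fields; a minimal sketch, assuming the standard SolverCore fields:

# Inspect the solver statistics
out.status        # termination status, e.g. :first_order
out.solution      # final iterate [a, b]
out.elapsed_time  # wall-clock time in seconds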

We can visualize the solution with Plots.jl:

using Plots

# Extract estimated parameters
a_est, b_est = out.solution

# True curve
y_true = [a_true * exp(b_true * ti) for ti in t]

# Estimated curve
y_est = [a_est * exp(b_est * ti) for ti in t]

# Plot
scatter(t, y, label="Noisy data", legend=:bottomleft)
plot!(t, y_true, label="True model", lw=2)
plot!(t, y_est, label="Fitted model", lw=2, ls=:dash)
(Plot: noisy data with the true and fitted exponential curves.)
# We can choose LMTR instead, which is a trust-region method.
out = LMTR(regularized_pb, verbose = 1, atol = 1e-4)
println("LMTR converged after $(out.iter) iterations.")
[ Info:  outer  inner     f(x)     h(x)  √(ξ1/ν)        ρ        Δ      ‖x‖      ‖s‖        ν LMTR
[ Info:      0     16  1.0e+01  1.0e-02  2.6e+01  9.1e-01  1.0e+00  1.0e+00  1.0e+00  7.9e-03 ↗
[ Info:      1      6  1.5e+00  2.0e-02  1.2e+01  1.0e+00  3.0e+00  1.8e+00  2.2e-01  1.6e-02 ↗
[ Info:      2     18  3.6e-01  2.0e-02  1.7e-01  1.0e+00  3.0e+00  2.0e+00  1.3e-02  1.5e-02 ↗
[ Info:      3      4  3.6e-01  2.0e-02  3.9e-04  1.0e+00  3.0e+00  2.0e+00  2.9e-05  1.5e-02 ↗
[ Info:      4      0  3.6e-01  2.0e-02  8.6e-05  1.0e+00  3.0e+00  2.0e+00  9.8e-07  1.5e-02
[ Info: LMTR: terminating with √(ξ1/ν) = 8.638767447185872e-5
LMTR converged after 4 iterations.
We plot the LMTR solution in the same way:

# Extract estimated parameters
a_est, b_est = out.solution

# Estimated curve
y_est = [a_est * exp(b_est * ti) for ti in t]

# Plot
scatter(t, y, label="Noisy data", legend=:bottomleft)
plot!(t, y_true, label="True model", lw=2)
plot!(t, y_est, label="Fitted model", lw=2, ls=:dash)
(Plot: noisy data with the true and LMTR-fitted exponential curves.)
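
Finally, since we generated the data ourselves, we can compare the recovered parameters with the ground truth; a quick check, not part of the model above:

# Compare estimates with the true parameters used to generate the data
println("a: true = $a_true, estimated = $(round(a_est, digits = 3))")
println("b: true = $b_true, estimated = $(round(b_est, digits = 3))")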