How to switch backend in ADNLPModels
ADNLPModels allows the use of different backends to compute the derivatives required within NLPModel API. It uses ForwardDiff.jl, ReverseDiff.jl, and more via optional depencies.
The backend information is in a structure ADNLPModels.ADModelBackend in the attribute adbackend of a ADNLPModel, it can also be accessed with get_adbackend.
The functions used internally to define the NLPModel API and the possible backends are defined in the following table:
| Functions | FowardDiff backends | ReverseDiff backends | Zygote backends | Enzyme backend | Sparse backend |
|---|---|---|---|---|---|
gradient and gradient! | ForwardDiffADGradient/GenericForwardDiffADGradient | ReverseDiffADGradient/GenericReverseDiffADGradient | ZygoteADGradient | EnzymeReverseADGradient | – |
jacobian | ForwardDiffADJacobian | ReverseDiffADJacobian | ZygoteADJacobian | SparseEnzymeADJacobian | SparseADJacobian |
hessian | ForwardDiffADHessian | ReverseDiffADHessian | ZygoteADHessian | SparseEnzymeADHessian | SparseADHessian/SparseReverseADHessian |
Jprod | ForwardDiffADJprod/GenericForwardDiffADJprod | ReverseDiffADJprod/GenericReverseDiffADJprod | ZygoteADJprod | EnzymeReverseADJprod | – |
Jtprod | ForwardDiffADJtprod/GenericForwardDiffADJtprod | ReverseDiffADJtprod/GenericReverseDiffADJtprod | ZygoteADJtprod | EnzymeReverseADJtprod | – |
Hvprod | ForwardDiffADHvprod/GenericForwardDiffADHvprod | ReverseDiffADHvprod/GenericReverseDiffADHvprod | – | EnzymeReverseADHvprod | – |
directional_second_derivative | ForwardDiffADGHjvprod | – | – | – | – |
The functions hess_structure!, hess_coord!, jac_structure! and jac_coord! defined in ad.jl are generic to all the backends for now.
using ADNLPModels
f(x) = sum(x)
x0 = ones(2)
ADNLPModel(f, x0, show_time = true)ADNLPModel - Model with automatic differentiation backend ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}
Problem name: Generic
All variables: ████████████████████ 2 All constraints: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
free: ████████████████████ 2 free: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nnzh: (100.00% sparsity) 0 linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nonlinear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nnzj: (------% sparsity)
Counters:
obj: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 grad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 cons: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
cons_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 cons_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jcon: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jgrad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jac: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jac_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jac_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jtprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jtprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jtprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 hess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 hprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jhess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jhprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
The keyword show_time is set to true to display the time needed to instantiate each backend. For unconstrained problem, there is no need to compute derivatives of constraints so an EmptyADbackend is used for Jacobian computations.
Examples
We now present a serie of practical examples. For simplicity, we focus here on unconstrained optimization problem. All these examples can be generalized to problems with bounds, constraints or nonlinear least-squares.
Use another backend
As shown in Tutorial, it is very straightforward to instantiate an ADNLPModel using an objective function and an initial guess.
using ADNLPModels, NLPModels
f(x) = sum(x)
x0 = ones(3)
nlp = ADNLPModel(f, x0)
grad(nlp, nlp.meta.x0) # returns the gradient at x03-element Vector{Float64}:
1.0
1.0
1.0Thanks to the backends inside ADNLPModels.jl, it is easy to change the backend for one (or more) function using the kwargs presented in the table above.
nlp = ADNLPModel(f, x0, gradient_backend = ADNLPModels.ReverseDiffADGradient)
grad(nlp, nlp.meta.x0) # returns the gradient at x0 using `ReverseDiff`3-element Vector{Float64}:
1.0
1.0
1.0It is also possible to try some new implementation for each function. First, we define a new ADBackend structure.
struct NewADGradient <: ADNLPModels.ADBackend end
function NewADGradient(
nvar::Integer,
f,
ncon::Integer = 0,
c::Function = (args...) -> [];
kwargs...,
)
return NewADGradient()
endMain.NewADGradientThen, we implement the desired functions following the table above.
ADNLPModels.gradient(adbackend::NewADGradient, f, x) = rand(Float64, size(x))
function ADNLPModels.gradient!(adbackend::NewADGradient, g, f, x)
g .= rand(Float64, size(x))
return g
endFinally, we use the homemade backend to compute the gradient.
nlp = ADNLPModel(sum, ones(3), gradient_backend = NewADGradient)
grad(nlp, nlp.meta.x0) # returns the gradient at x0 using `NewADGradient`3-element Vector{Float64}:
0.38399685823932317
0.5121112858339084
0.2542564037970457Change backend
Once an instance of an ADNLPModel has been created, it is possible to change the backends without re-instantiating the model.
using ADNLPModels, NLPModels
f(x) = 100 * (x[2] - x[1]^2)^2 + (x[1] - 1)^2
x0 = 3 * ones(2)
nlp = ADNLPModel(f, x0)
get_adbackend(nlp) # returns the `ADModelBackend` structure that regroup all the various backends.ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}There are currently two ways to modify instantiated backends. The first one is to instantiate a new ADModelBackend and use set_adbackend! to modify nlp.
adback = ADNLPModels.ADModelBackend(nlp.meta.nvar, nlp.f, gradient_backend = ADNLPModels.ForwardDiffADGradient)
set_adbackend!(nlp, adback)
get_adbackend(nlp)ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}The alternative is to use set_adbackend! and pass the new backends via kwargs. In the second approach, it is possible to pass either the type of the desired backend or an instance as shown below.
set_adbackend!(
nlp,
gradient_backend = ADNLPModels.ForwardDiffADGradient,
jtprod_backend = ADNLPModels.GenericForwardDiffADJtprod(),
)
get_adbackend(nlp)ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
GenericForwardDiffADJtprod,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}Support multiple precision without having to recreate the model
One of the strength of ADNLPModels.jl is the type flexibility. Let's assume, we first instantiate an ADNLPModel with a Float64 initial guess.
using ADNLPModels, NLPModels
f(x) = 100 * (x[2] - x[1]^2)^2 + (x[1] - 1)^2
x0 = 3 * ones(2) # Float64 initial guess
nlp = ADNLPModel(f, x0)ADNLPModel - Model with automatic differentiation backend ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}
Problem name: Generic
All variables: ████████████████████ 2 All constraints: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
free: ████████████████████ 2 free: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nnzh: ( 0.00% sparsity) 3 linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nonlinear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nnzj: (------% sparsity)
Counters:
obj: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 grad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 cons: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
cons_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 cons_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jcon: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jgrad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jac: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jac_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jac_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jtprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jtprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jtprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 hess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 hprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jhess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jhprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
Then, the gradient will return a vector of Float64.
x64 = rand(2)
grad(nlp, x64)2-element Vector{Float64}:
-78.92730450282966
126.35092159076886It is now possible to move to a different type, for instance Float32, while keeping the instance nlp.
x0_32 = ones(Float32, 2)
set_adbackend!(nlp, gradient_backend = ADNLPModels.ForwardDiffADGradient, x0 = x0_32)
x32 = rand(Float32, 2)
grad(nlp, x32)2-element Vector{Float64}:
2.2938292026519775
-1.793980598449707