PartitionedStructures.jl: Tutorial

This tutorial shows how PartitionedStructures.jl can define partitioned quasi-Newton approximations, by using partitioned vectors and partitioned matrices. Those partitioned structures are strongly related to partially-separable functions.

What are the partially-separable structure and partitioned quasi-Newton approximations?

The partitioned quasi-Newton methods exploit the partially-separable structure of $f:\R^n \to \R$

\[ f(x) = \sum_{i=1}^N f_i (U_i x) , \; f_i : \R^{n_i} \to \R, \; U_i \in \R^{n_i \times n},\; n_i \ll n,\]

as a sum of element functions $f_i$. The gradient $\nabla f$ and the Hessian $\nabla^2 f$

\[\nabla f(x) = \sum_{i=1}^N U_i^\top \nabla f_i (U_i x), \quad \nabla^2 f(x) = \sum_{i=1}^N U_i^\top \nabla^2 f_i (U_i x) U_i,\]

accumulate the element derivatives $\nabla f_i$ and $\nabla^2 f_i$.

The partitioned structure of the Hessian allows the definition of partitioned quasi-Newton approximations $B \approx \nabla^2 f$, such that $B$ accumulates the element Hessian approximations $B_i \approx \nabla^2 f_i$

\[B = \sum_{i=1}^N U_i^\top B_i U_i\]

The partitioned quasi-Newton approximations respect the sparsity structure of $\nabla^2 f$, which is not the case of classical quasi-Newton approximations (e.g. BFGS, SR1). Moreover, the rank of a partitioned update may be proportional to the number of elements $N$, whereas classical quasi-Newton updates are low-rank. A partitioned quasi-Newton update may update every element Hessian approximation $B_i$ at each step $s$. Updating $B_i$ requires $U_i s$ and $\nabla f_i (U_i (x+s)) - \nabla f_i (U_i x)$, so we have to store an approximation and such vectors for every element.
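The accumulation $B = \sum_{i=1}^N U_i^\top B_i U_i$ can be illustrated with explicit selection matrices (an illustrative sketch with made-up element approximations B1 and B2, not library code):

```julia
using LinearAlgebra

# Illustrative sketch: accumulate two 2×2 element approximations B1, B2
# into a 3×3 matrix B = U1'*B1*U1 + U2'*B2*U2.
U1 = [1.0 0.0 0.0; 0.0 1.0 0.0]  # selects variables 1 and 2
U2 = [0.0 1.0 0.0; 0.0 0.0 1.0]  # selects variables 2 and 3
B1 = [2.0 1.0; 1.0 2.0]
B2 = [2.0 3.0; 3.0 2.0]
B = U1' * B1 * U1 + U2' * B2 * U2
# B[1, 3] == 0.0: variables 1 and 3 never appear in the same element
```

The zero entries of $B$ correspond to pairs of variables that no element function shares.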

Reference

  • A. Griewank and Ph. L. Toint. Partitioned variable metric updates for large structured optimization problems. Numer. Math., 39:119–137, 1982. DOI: 10.1007/BF01399316

Example: the partitioned structure of a quadratic

Let's take the quadratic f as an example

f(x) = x[1]^2 + x[1]*x[2] + x[2]^2 + x[3]^2 + 3x[2]*x[3]
f (generic function with 1 method)

f can be considered as the sum of two element functions

f1(y) = y[1]^2 + y[1]*y[2]
f2(y) = y[1]^2 + y[2]^2 + 3y[1]*y[2]
f2 (generic function with 1 method)

Define $U_1$ and $U_2$ indicating which variables are required by each element function:

U1 = [1 0 0; 0 1 0]
U2 = [0 1 0; 0 0 1]
2×3 Matrix{Int64}:
 0  1  0
 0  0  1

However, dense matrices raise memory issues for large problems. Instead, we represent each $U_i$ compactly by the indices of the variables it selects

U1 = [1, 2]
U2 = [2, 3]
2-element Vector{Int64}:
 2
 3

By gathering the different $U_i$ together

U = [U1, U2]
2-element Vector{Vector{Int64}}:
 [1, 2]
 [2, 3]

we define the function f_pss exploiting the partially-separable structure as

f_pss(x, U) = f1(x[U[1]]) + f2(x[U[2]])
x0 = [2., 3., 4.]

f(x0) == f_pss(x0, U)
true

Similarly, we can compute the gradient and the element gradients, and make explicit how the gradient is partitioned

∇f(x) = [2x[1] + x[2], x[1] + 2x[2] + 3x[3], 2x[3] + 3x[2]]
∇f1(y) = [2y[1] + y[2], y[1]]
∇f2(y) = [2y[1] + 3y[2], 2y[2] + 3y[1]]
function ∇f_pss(x, U)
  gradient = zeros(length(x))
  gradient[U[1]] .+= ∇f1(x[U[1]])
  gradient[U[2]] .+= ∇f2(x[U[2]])
  return gradient
end
∇f(x0) == ∇f_pss(x0, U)
true

However, ∇f_pss accumulates the element gradients directly and does not store the value of each element gradient ∇f1, ∇f2. We would like to store every element gradient, so that we can later update partitioned quasi-Newton approximations. Thus, we define from U a partitioned vector that stores each element gradient and forms $\nabla f$ when required:

using PartitionedStructures
U = [U1, U2]
n = length(x0)
# create the partitioned vector
partitioned_gradient_x0 = create_epv(U)
Elemental_pv{Float64}(2, 3, Elemental_elt_vec{Float64}[Elemental_elt_vec{Float64}([0.6325983422646918, 0.15907213325892944], [1, 2], 2), Elemental_elt_vec{Float64}([0.10602901995509595, 0.4297342009350201], [2, 3], 2)], [0.0, 0.0, 0.0], [[1], [1, 2], [2]], [1, 2, 3])

We set the value of each element vector to the corresponding element gradient

# return every element gradient
vector_gradient_element(x, U) = [∇f1(x[U[1]]), ∇f2(x[U[2]])] :: Vector{Vector{Float64}}
# set each element vector to its corresponding element gradient
set_epv!(partitioned_gradient_x0, vector_gradient_element(x0, U))

# Build the gradient vector
build_v!(partitioned_gradient_x0)

# with the same value as the gradient
get_v(partitioned_gradient_x0) == ∇f(x0)
true

Approximate the Hessian $\nabla^2 f$

There are at least two ways to approximate $\nabla^2 f$: a classical quasi-Newton approximation (e.g. BFGS) and a partitioned quasi-Newton approximation (e.g. PBFGS). Both methods are presented below, and we expose the sparse structure of the partitioned approximation.

Quasi-Newton approximation of the quadratic (BFGS)

In the case of the BFGS method, you approximate the Hessian from the step s = x1 - x0

x1 = [1., 2., 2.]
s = x1 .- x0
3-element Vector{Float64}:
 -1.0
 -1.0
 -2.0

the gradient difference y

y = ∇f(x1) .- ∇f(x0)
3-element Vector{Float64}:
 -3.0
 -9.0
 -7.0

and the approximation B, initially set to the identity

B = [ i==j ? 1. : 0. for i in 1:n, j in 1:n]
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0

By applying the BFGS update, you satisfy the secant equation Bs = y

# PartitionedStructures.jl exports a BFGS implementation
B_BFGS = BFGS(s,y,B)

using LinearAlgebra
# numerical verification of the secant equation
atol = sqrt(eps(eltype(s)))
norm(B_BFGS * s - y) < atol
true

but the approximation B_BFGS is dense:

B_BFGS
3×3 Matrix{Float64}:
 1.17949   0.871795  0.474359
 0.871795  3.94872   2.08974
 0.474359  2.08974   2.21795

which makes it impractical for large-scale problems.
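For reference, the dense BFGS formula can be sketched in a few lines of Julia. The hypothetical bfgs_update below is a minimal textbook implementation, not the library's BFGS, and it assumes the curvature condition $y^\top s > 0$ holds:

```julia
using LinearAlgebra

# Textbook BFGS update: B⁺ = B - (B s s' B)/(s' B s) + (y y')/(y' s).
# Hypothetical helper; assumes the curvature condition y's > 0.
function bfgs_update(B, s, y)
    Bs = B * s
    B - (Bs * Bs') / dot(s, Bs) + (y * y') / dot(y, s)
end

s = [-1.0, -1.0, -2.0]
y = [-3.0, -9.0, -7.0]
B1 = bfgs_update(Matrix{Float64}(I, 3, 3), s, y)
norm(B1 * s - y) < sqrt(eps())  # the secant equation holds
```

Since the two rank-one corrections are outer products, the update preserves symmetry; it produces dense matrices in general, which motivates the partitioned alternative below.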

Partitioned quasi-Newton approximation of the quadratic (PBFGS)

In order to make a sparse quasi-Newton approximation of $\nabla^2 f$, you may define a partitioned matrix with the same partially-separable structure as partitioned_gradient_x0, where each element matrix is initialized to the identity

partitioned_matrix = epm_from_epv(partitioned_gradient_x0)
Elemental_pm{Float64}(2, 3, Elemental_em{Float64}[Elemental_em{Float64}(2, [1, 2], [1.0 0.0; 0.0 1.0], Counter_elt_mat(0, 0, 0, 0, 0, 0), false, false, [6.95221969188094e-310, 6.9522196902944e-310]), Elemental_em{Float64}(2, [2, 3], [1.0 0.0; 0.0 1.0], Counter_elt_mat(0, 0, 0, 0, 0, 0), false, false, [0.0, 0.0])], sparse(Int64[], Int64[], Float64[], 3, 3), sparse(Int64[], Int64[], Float64[], 3, 3), [[1], [1, 2], [2]], [1, 2, 3])

It can be visualized with the Matrix constructor

Matrix(partitioned_matrix)
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  2.0  0.0
 0.0  0.0  1.0

The second diagonal entry accumulates two contributions of value 1.0, one from each initial element approximation.
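This accumulation can be mimicked with plain index vectors (an illustrative sketch; the hypothetical accumulate_elements is not part of the package, and only reproduces the sum of element blocks):

```julia
using LinearAlgebra

# Hypothetical helper: sum each element block into the
# rows/columns selected by its index vector.
function accumulate_elements(U::Vector{Vector{Int}}, Bis::Vector{Matrix{Float64}}, n::Int)
    B = zeros(n, n)
    for (indices, Bi) in zip(U, Bis)
        B[indices, indices] .+= Bi
    end
    B
end

U = [[1, 2], [2, 3]]
element_matrices = [Matrix{Float64}(I, 2, 2), Matrix{Float64}(I, 2, 2)]
accumulated = accumulate_elements(U, element_matrices, 3)
# entry (2, 2) receives 1.0 from each element, hence 2.0
```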

Then you compute the partitioned gradient at x1

partitioned_gradient_x1 = similar(partitioned_gradient_x0)
set_epv!(partitioned_gradient_x1, vector_gradient_element(x1, U))
Elemental_pv{Float64}(2, 3, Elemental_elt_vec{Float64}[Elemental_elt_vec{Float64}([4.0, 1.0], [1, 2], 2), Elemental_elt_vec{Float64}([10.0, 10.0], [2, 3], 2)], [6.95220890896384e-310, 6.95220890896384e-310, 6.95220890896384e-310], [[1], [1, 2], [2]], [1, 2, 3])

and compute the elementwise difference of the partitioned gradients partitioned_gradient_x1 - partitioned_gradient_x0

# copy to avoid side effects on partitioned_gradient_x0
partitioned_gradient_difference = copy(partitioned_gradient_x0)

# apply in place a unary minus to every element gradient
minus_epv!(partitioned_gradient_difference)

# add in place the element vector of partitioned_gradient_x1
# to the corresponding element vector of partitioned_gradient_difference
add_epv!(partitioned_gradient_x1, partitioned_gradient_difference)

# compute the vector y
build_v!(partitioned_gradient_difference)
get_v(partitioned_gradient_difference) == y
true

Then you can define the partitioned quasi-Newton PBFGS update

# apply the partitioned PBFGS update to partitioned_matrix
# and return Matrix(partitioned_matrix)
B_PBFGS = update(partitioned_matrix, partitioned_gradient_difference, s; name=:pbfgs, verbose=true)
3×3 Matrix{Float64}:
 2.75  0.25     0.0
 0.25  4.45909  2.14545
 0.0   2.14545  2.42727

which respects the sparsity structure of ∇²f.

In addition, update() reports the number of elements that were updated, not updated, or untouched, as long as the user doesn't set verbose=false. The partitioned update satisfies the secant equation

norm(B_PBFGS*s - y) < atol
true

which may also be verified with

# compute the partitioned matrix-vector product
Bs = mul_epm_vector(partitioned_matrix, s)
norm(Bs - y) < atol
true

Other partitioned quasi-Newton approximations

There exist two categories of partitioned quasi-Newton updates. In the first category, each element Hessian $\nabla^2 f_i$ is approximated with a dense matrix (e.g. PBFGS). In the second category, each element Hessian $\nabla^2 f_i$ is approximated with a quasi-Newton linear operator.

Partitioned quasi-Newton operators

Once the partitioned matrix is allocated,

partitioned_matrix_PBFGS = epm_from_epv(partitioned_gradient_x0)
partitioned_matrix_PSR1 = epm_from_epv(partitioned_gradient_x0)
partitioned_matrix_PSE = epm_from_epv(partitioned_gradient_x0)
Elemental_pm{Float64}(2, 3, Elemental_em{Float64}[Elemental_em{Float64}(2, [1, 2], [1.0 0.0; 0.0 1.0], Counter_elt_mat(0, 0, 0, 0, 0, 0), false, false, [6.9522196987263e-310, 6.95221974173494e-310]), Elemental_em{Float64}(2, [2, 3], [1.0 0.0; 0.0 1.0], Counter_elt_mat(0, 0, 0, 0, 0, 0), false, false, [6.95221259809826e-310, 6.95221259809826e-310])], sparse(Int64[], Int64[], Float64[], 3, 3), sparse(Int64[], Int64[], Float64[], 3, 3), [[1], [1, 2], [2]], [1, 2, 3])

you can apply to it any of the three partitioned updates: PBFGS, PSR1, or PSE (the default):

  • PBFGS updates each element approximation with BFGS;
  • PSR1 updates each element approximation with SR1;
  • PSE updates each element approximation with BFGS if possible, or with SR1 otherwise.

B_PBFGS = update(partitioned_matrix_PBFGS, partitioned_gradient_difference, s; name=:pbfgs)
B_PSR1 = update(partitioned_matrix_PSR1, partitioned_gradient_difference, s; name=:psr1)
# name=:pse by default
B_PSE = update(partitioned_matrix_PSE, partitioned_gradient_difference, s)
3×3 Matrix{Float64}:
 2.75  0.25     0.0
 0.25  4.45909  2.14545
 0.0   2.14545  2.42727

All these approximations satisfy the secant equation as long as every element approximation is updated

norm(mul_epm_vector(partitioned_matrix_PBFGS, s) - y) < atol
true
norm(mul_epm_vector(partitioned_matrix_PSR1, s) - y) < atol
true
norm(mul_epm_vector(partitioned_matrix_PSE, s) - y) < atol
true

Limited-memory partitioned quasi-Newton operators

These operators make the partitioned quasi-Newton methods applicable to a partially-separable function with large $n_i$, whose element Hessian approximations can't be stored as dense matrices. The limited-memory partitioned quasi-Newton operators allocate, for each element Hessian approximation, a quasi-Newton operator LBFGS or LSR1 defined in LinearOperators.jl. Three approximations are defined:

  • PLBFGS: each element approximation is an LBFGSOperator;
  • PLSR1: each element approximation is an LSR1Operator;
  • PLSE: each element approximation may be either an LBFGSOperator or an LSR1Operator.

Contrary to the partitioned quasi-Newton operators, each limited-memory partitioned quasi-Newton operator has a distinct type

partitioned_linear_operator_PLBFGS = eplo_lbfgs_from_epv(partitioned_gradient_x0)
partitioned_linear_operator_PLSR1 = eplo_lsr1_from_epv(partitioned_gradient_x0)
partitioned_linear_operator_PLSE = eplo_lose_from_epv(partitioned_gradient_x0)
Elemental_plo{Float64}(2, 3, Union{Elemental_elo_bfgs{Float64}, Elemental_elo_sr1{Float64}}[Elemental_elo_bfgs{Float64}(2, [1, 2], Linear operator
  nrow: 2
  ncol: 2
  eltype: Float64
  symmetric: true
  hermitian: true
  nprod:   0
  ntprod:  0
  nctprod: 0

, Counter_elt_mat(0, 0, 0, 0, 0, 0), false), Elemental_elo_bfgs{Float64}(2, [2, 3], Linear operator
  nrow: 2
  ncol: 2
  eltype: Float64
  symmetric: true
  hermitian: true
  nprod:   0
  ntprod:  0
  nctprod: 0

, Counter_elt_mat(0, 0, 0, 0, 0, 0), false)], sparse(Int64[], Int64[], Float64[], 3, 3), sparse(Int64[], Int64[], Float64[], 3, 3), [[1], [1, 2], [2]], [1, 2, 3])

The distinct types simplify the update method, since no name argument is required to determine which update is applied

B_PLBFGS = update(partitioned_linear_operator_PLBFGS, partitioned_gradient_difference, s)
norm(B_PLBFGS * s - y) < atol
true
B_PLSE = update(partitioned_linear_operator_PLSE, partitioned_gradient_difference, s)
norm(B_PLSE * s - y) < atol
true
B_PLSR1 = update(partitioned_linear_operator_PLSR1, partitioned_gradient_difference, s)
norm(B_PLSR1 * s - y) < atol
true
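Putting the pieces together, one partitioned quasi-Newton step for the quadratic above could be sketched as follows, using only functions demonstrated in this tutorial (the step s is fixed here for illustration, rather than produced by a line search or trust region):

```julia
using PartitionedStructures, LinearAlgebra

# Element gradients and partitioned structure of the quadratic
∇f1(y) = [2y[1] + y[2], y[1]]
∇f2(y) = [2y[1] + 3y[2], 2y[2] + 3y[1]]
U = [[1, 2], [2, 3]]
element_gradients(x) = [∇f1(x[U[1]]), ∇f2(x[U[2]])]

x = [2.0, 3.0, 4.0]
partitioned_gradient = create_epv(U)
set_epv!(partitioned_gradient, element_gradients(x))
partitioned_matrix = epm_from_epv(partitioned_gradient)

s = [-1.0, -1.0, -2.0]              # fixed step for illustration
partitioned_difference = copy(partitioned_gradient)
minus_epv!(partitioned_difference)  # start from -∇f_i(U_i x)
partitioned_gradient_next = similar(partitioned_gradient)
set_epv!(partitioned_gradient_next, element_gradients(x + s))
add_epv!(partitioned_gradient_next, partitioned_difference)

# PSE update by default; returns the dense matrix for inspection
B = update(partitioned_matrix, partitioned_difference, s; verbose=false)
build_v!(partitioned_difference)
norm(B * s - get_v(partitioned_difference)) < sqrt(eps())  # secant equation
```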

That's it, you have all the tools to implement a partitioned quasi-Newton method. Enjoy!

Tips

The main difficulty in defining the partitioned structures is specifying the $f_i$ and $U_i$. To address this issue, you may want to take a look at ExpressionTreeForge.jl, which automatically detects the partially-separable structure of an ADNLPModel or a JuMP model.