Matrix-free operators
All methods are matrix-free, which means that you only need to provide operator-vector products.
The `A` or `B` input arguments of Krylov.jl solvers can be any object that represents a linear operator. That object must implement `mul!` for multiplication with a vector, `size()` and `eltype()`. For certain methods it must also implement `adjoint()`.
Some methods only require `A * v` products, whereas others also require `A' * u` products. In the latter case, `adjoint(A)` must also be implemented.
| `A * v` | `A * v` and `A' * u` |
|---|---|
| CG, CR, CAR | CGLS, CRLS, CGNE, CRMR |
| SYMMLQ, CG-LANCZOS, MINRES, MINRES-QLP, MINARES | LSLQ, LSQR, LSMR, LNLQ, CRAIG, CRAIGMR |
| DIOM, FOM, DQGMRES, GMRES, FGMRES, BLOCK-GMRES | BiLQ, QMR, BiLQR, USYMLQ, USYMQR, TriLQR |
| CGS, BICGSTAB | TriCG, TriMR |
GPMR is the only method that requires `A * v` and `B * w` products.
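For instance, here is a minimal sketch of a custom operator that only implements `mul!`, `size()` and `eltype()`, which is enough for the methods of the first column such as CG. The type `DiagonalOperator` and the data below are purely illustrative and not part of Krylov.jl:

using Krylov, LinearAlgebra

# Hypothetical operator that models multiplication by Diagonal(d) without storing a matrix
struct DiagonalOperator
    d::Vector{Float64}
end

Base.size(A::DiagonalOperator) = (length(A.d), length(A.d))
Base.eltype(A::DiagonalOperator) = Float64
LinearAlgebra.mul!(y::Vector, A::DiagonalOperator, v::Vector) = (y .= A.d .* v)

A = DiagonalOperator([1.0, 2.0, 3.0])
b = [2.0, 4.0, 6.0]
x, stats = cg(A, b)  # x ≈ [2.0, 2.0, 2.0]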
Preconditioners `M`, `N`, `C`, `D`, `E` or `F` can also be linear operators and must implement `mul!` or `ldiv!`.
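For example, a diagonal (Jacobi) preconditioner can be passed as `M` to CG because `Diagonal` matrices implement `mul!`. The sketch below uses made-up data; `M` is applied as an approximation of the inverse of `A`:

using Krylov, LinearAlgebra

A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]

# Jacobi (diagonal) preconditioner, applied through mul!
M = Diagonal(1 ./ diag(A))

x, stats = cg(A, b, M=M)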
We strongly recommend LinearOperators.jl to model matrix-free operators, but other packages such as LinearMaps.jl, DiffEqOperators.jl or your own operator can be used as well.
With LinearOperators.jl, operators are defined as

A = LinearOperator(type, nrows, ncols, symmetric, hermitian, prod, tprod, ctprod)

where `type` is the operator element type; `nrows` and `ncols` are its dimensions; `symmetric` and `hermitian` should be set to `true` or `false`; `prod(y, v)`, `tprod(y, w)` and `ctprod(y, u)` are called when writing `mul!(y, A, v)`, `mul!(y, transpose(A), w)`, and `mul!(y, A', u)`, respectively.
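As a quick sketch of this constructor, the three product functions below simply wrap an explicit matrix `B` (only to keep the example short), so that `opB`, `transpose(opB)` and `opB'` reproduce the corresponding matrix products. The names `B` and `opB` are illustrative:

using LinearOperators, LinearAlgebra

B = [1.0 2.0; 3.0 4.0; 5.0 6.0]
opB = LinearOperator(Float64, 3, 2, false, false,
                     (y, v) -> mul!(y, B, v),             # y ← B * v
                     (y, w) -> mul!(y, transpose(B), w),  # y ← Bᵀ * w
                     (y, u) -> mul!(y, B', u))            # y ← Bᴴ * u

opB * ones(2) ≈ B * ones(2)    # true
opB' * ones(3) ≈ B' * ones(3)  # true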
See the tutorial and the detailed documentation for more information on LinearOperators.jl.
Examples
In the field of nonlinear optimization, finding critical points of a continuous function frequently involves linear systems with a Hessian or Jacobian as coefficient matrix. Materializing such operators as matrices is expensive in terms of operations and memory consumption, and is unreasonable for high-dimensional problems. However, it is often possible to implement efficient Hessian-vector and Jacobian-vector products, for example with the help of automatic differentiation tools, and to use them within Krylov solvers. We now illustrate variants with explicit matrices and with matrix-free operators for two well-known optimization methods.
Example 1: Newton's Method for convex optimization
At each iteration of Newton's method applied to a $\mathcal{C}^2$ strictly convex function $f : \mathbb{R}^n \rightarrow \mathbb{R}$, a descent direction is determined by minimizing the quadratic Taylor model of $f$:
\[\min_{d \in \mathbb{R}^n}~~f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2}~d^T \nabla^2 f(x_k) d\]
which is equivalent to solving the symmetric and positive-definite system
\[\nabla^2 f(x_k) d = -\nabla f(x_k).\]
The system above can be solved with the conjugate gradient method as follows, using the explicit Hessian:
using ForwardDiff, Krylov
xk = -ones(4)
f(x) = (x[1] - 1)^2 + (x[2] - 2)^2 + (x[3] - 3)^2 + (x[4] - 4)^2
g(x) = ForwardDiff.gradient(f, x)
H(x) = ForwardDiff.hessian(f, x)
d, stats = cg(H(xk), -g(xk))
The explicit Hessian can be replaced by a linear operator that only computes Hessian-vector products:
using ForwardDiff, LinearOperators, Krylov
xk = -ones(4)
f(x) = (x[1] - 1)^2 + (x[2] - 2)^2 + (x[3] - 3)^2 + (x[4] - 4)^2
g(x) = ForwardDiff.gradient(f, x)
H(y, v) = ForwardDiff.derivative!(y, t -> g(xk + t * v), 0)
opH = LinearOperator(Float64, 4, 4, true, true, (y, v) -> H(y, v))
cg(opH, -g(xk))
([2.0, 3.0, 4.0, 5.0], SimpleStats
niter: 1
solved: true
inconsistent: false
residuals: []
Aresiduals: []
κ₂(A): []
timer: 621.38ms
status: solution good enough given atol and rtol
)
Example 2: The Gauss-Newton Method for Nonlinear Least Squares
At each iteration of the Gauss-Newton method applied to a nonlinear least-squares objective $f(x) = \tfrac{1}{2}\| F(x)\|^2$ where $F : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is $\mathcal{C}^1$, we solve the subproblem:
\[\min_{d \in \mathbb{R}^n}~~\tfrac{1}{2}~\|J(x_k) d + F(x_k)\|^2,\]
where $J(x)$ is the Jacobian of $F$ at $x$.
An appropriate iterative method to solve the above linear least-squares problem is LSMR. We could pass the explicit Jacobian to LSMR as follows:
using ForwardDiff, Krylov
xk = ones(2)
F(x) = [x[1]^4 - 3; exp(x[2]) - 2; log(x[1]) - x[2]^2]
J(x) = ForwardDiff.jacobian(F, x)
d, stats = lsmr(J(xk), -F(xk))
However, the explicit Jacobian can be replaced by a linear operator that only computes Jacobian-vector and transposed Jacobian-vector products:
using LinearAlgebra, ForwardDiff, LinearOperators, Krylov
xk = ones(2)
F(x) = [x[1]^4 - 3; exp(x[2]) - 2; log(x[1]) - x[2]^2]
J(y, v) = ForwardDiff.derivative!(y, t -> F(xk + t * v), 0)
Jᵀ(y, u) = ForwardDiff.gradient!(y, x -> dot(F(x), u), xk)
opJ = LinearOperator(Float64, 3, 2, false, false, (y, v) -> J(y, v),
(y, w) -> Jᵀ(y, w),
(y, u) -> Jᵀ(y, u))
lsmr(opJ, -F(xk))
([0.49889007728348445, -0.2594343430903828], LsmrStats
niter: 2
solved: true
inconsistent: true
residuals: []
Aresiduals: []
residual: 0.022490204087080457
Aresidual: 2.0849695585298754e-15
κ₂(A): 1.2777264193293687
‖A‖F: 5.328138145631234
xNorm: 0.5623144027914095
timer: 629.91ms
status: found approximate minimum least-squares solution
)
Example 3: Solving the Poisson equation with FFT and IFFT
In applications related to partial differential equations (PDEs), linear systems can arise from discretizing differential operators. Storing such operators as explicit matrices is computationally expensive and unnecessary when matrix-free methods can be used, particularly with structured grids.
The FFT is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, transforming data from the spatial domain to the frequency domain. In the context of solving PDEs, it simplifies the application of differential operators like the Laplacian by converting derivatives into algebraic operations.
For a function $u(x)$ discretized on a periodic grid with $n$ points, the FFT of $u$ is:
\[\hat{u}_k = \sum_{j=0}^{n-1} u_j e^{-i k x_j},\]
where $\hat{u}_k$ represents the Fourier coefficient for the wavenumber $k$, and $u_j$ is the value of $u$ at the grid point $x_j = \frac{j L}{n}$ on a periodic domain of length $L$. The inverse FFT (IFFT) reconstructs $u$ from its Fourier coefficients:
\[u_j = \frac{1}{n} \sum_{k=0}^{n-1} \hat{u}_k e^{i k x_j}.\]
In Fourier space, the Laplacian operator $\frac{d^2}{dx^2}$ becomes a simple multiplication by $-k^2$, where $k$ is the wavenumber derived from the grid size. This transforms the Poisson equation $\frac{d^2 u(x)}{dx^2} = f(x)$ into an algebraic equation in the frequency domain:
\[-k^2 \hat{u}_k = \hat{f}_k.\]
By solving for $\hat{u}_k$ and applying the inverse FFT, we can recover the solution $u(x)$ efficiently.
The inverse FFT (IFFT) is used to convert data from the frequency domain back to the spatial domain. Once the solution in frequency space is obtained by dividing the Fourier coefficients $\hat{f}_k$ by $-k^2$, the IFFT is applied to transform the result back to the original grid points in the spatial domain.
This example consists of solving the 1D Poisson equation on a periodic domain $[0, 4\pi]$:
\[\frac{d^2 u(x)}{dx^2} = f(x),\]
where $u(x)$ is the unknown solution, and $f(x)$ is the given source term. We solve this equation using FFTW.jl to compute the matrix-free action of the Laplacian within the conjugate gradient solver.
using FFTW, Krylov, LinearAlgebra
# Define the problem size and domain
n = 32768 # Number of grid points (2^15)
L = 4π # Length of the domain
x = LinRange(0, L, n+1)[1:end-1] # Periodic grid (excluding the last point)
# Define the source term f(x)
f = sin.(x)
# Define a matrix-free operator using FFT and IFFT
struct FFTPoissonOperator
n::Int
L::Float64
complex::Bool
k::Vector{Float64} # Store Fourier wave numbers
end
function FFTPoissonOperator(n::Int, L::Float64, complex::Bool)
if complex
k = Vector{Float64}(undef, n)
else
k = Vector{Float64}(undef, n÷2 + 1)
end
k[1] = 0 # DC component -- f(x) = sin(x) has a mean of 0
for j in 1:(n÷2)
k[j+1] = 2 * π * j / L # Positive wave numbers
end
if complex
for j in 1:(n÷2 - 1)
k[n-j+1] = -2 * π * j / L # Negative wave numbers
end
end
return FFTPoissonOperator(n, L, complex, k)
end
Base.size(A::FFTPoissonOperator) = (A.n, A.n)
function Base.eltype(A::FFTPoissonOperator)
type = A.complex ? ComplexF64 : Float64
return type
end
function LinearAlgebra.mul!(y::Vector, A::FFTPoissonOperator, u::Vector)
# Transform the input vector `u` to the frequency domain using `fft` or `rfft`.
# If the operator is complex, use the full FFT; otherwise, use the real FFT.
if A.complex
u_hat = fft(u)
else
u_hat = rfft(u)
end
# In Fourier space, apply the Laplacian by multiplying each coefficient by -k^2
# (the symbol of the second derivative); this is the operator-vector product, not a solve.
u_hat .= -u_hat .* (A.k .^ 2)
# Transform the result back to the spatial domain using `ifft` or `irfft`.
# If the operator is complex, use the full inverse FFT; otherwise, use the inverse real FFT.
if A.complex
y .= ifft(u_hat)
else
y .= irfft(u_hat, A.n)
end
return y
end
# Create the matrix-free operator for the Poisson equation
complex = false
A = FFTPoissonOperator(n, L, complex)
# Solve the linear system using CG
u_sol, stats = cg(A, f, atol=1e-10, rtol=0.0, verbose=1)
# The exact solution is u(x) = -sin(x)
u_star = -sin.(x)
u_star ≈ u_sol
true
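As a sanity check, the same solution can be obtained directly in frequency space by dividing the Fourier coefficients of $f$ by $-k^2$ and setting the $k = 0$ mode to zero. This sketch reuses `n`, `L`, `f` and `u_star` from above and is not part of the original example:

f_hat = rfft(f)
k = [2π * j / L for j in 0:length(f_hat)-1]

u_hat = similar(f_hat)
u_hat[1] = 0                                     # k = 0 mode: enforce a zero-mean solution
u_hat[2:end] = f_hat[2:end] ./ (-k[2:end] .^ 2)  # algebraic solve in the frequency domain

u_direct = irfft(u_hat, n)
u_direct ≈ u_star  # true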
Preconditioners can also be implemented as abstract operators. For instance, we could compute the Cholesky factorization of $M$ and $N$ and create linear operators that perform the forward and backsolves.
Krylov methods combined with factorization-free operators reduce computation time and memory requirements considerably by avoiding the need to build and store the system matrix. In the field of partial differential equations, the implementation of high-performance factorization-free operators and assembly-free preconditioning is a subject of active research.
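As an illustration of that idea, the sketch below (with made-up data and the illustrative names `P`, `F` and `opM`) factors a symmetric positive-definite preconditioner once with Cholesky and wraps the forward and back substitutions in a linear operator passed to CG:

using LinearAlgebra, LinearOperators, Krylov

A = [4.0 1.0 1.0; 1.0 3.0 0.0; 1.0 0.0 2.0]
b = [1.0, 2.0, 3.0]

# SPD preconditioner (here the tridiagonal part of A), factored once as P = LLᵀ
P = [4.0 1.0 0.0; 1.0 3.0 0.0; 0.0 0.0 2.0]
F = cholesky(P)

# Applying opM performs the forward and back solves with L and Lᵀ, i.e. it computes P⁻¹ v
opM = LinearOperator(Float64, 3, 3, true, true, (y, v) -> ldiv!(y, F, v))

x, stats = cg(A, b, M=opM)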
Example 4: Solving the Poisson equation with a matrix-free multigrid preconditioner
Consider the Poisson equation in a two-dimensional domain $\Omega$:
\[-\Delta u = f \quad \text{in } \Omega,\]
with Dirichlet boundary conditions, where $u$ is the unknown solution and $f$ is a given source term.
The Laplacian operator $\Delta$ can be discretized using finite differences or finite elements, leading to a large linear system of the form:
\[A \mathbf{u} = \mathbf{f},\]
where $A$ is a sparse matrix representing the discretized Laplacian, $\mathbf{u}$ is the vector of unknowns, and $\mathbf{f}$ is the discretized source term.
Instead of explicitly forming the matrix $A$, we will implement the matrix-vector product as a function that applies the discretized Laplacian to a vector.
Using the standard five-point finite-difference stencil, the product of $A$ (the discretized negative Laplacian, consistent with $-\Delta u = f$) with a vector can be written componentwise as:
\[(A \mathbf{v})_{i,j} = \frac{1}{h^2} \left( 4 \mathbf{v}_{i,j} - \mathbf{v}_{i+1,j} - \mathbf{v}_{i-1,j} - \mathbf{v}_{i,j+1} - \mathbf{v}_{i,j-1} \right),\]
where $h$ is the grid spacing, and $\mathbf{v}_{i,j}$ represents the value at grid point $(i, j)$.
For a grid defined on a domain $[0,1] \times [0,1]$ with dimensions $N \times N$, where $N$ is the number of points along each dimension, the values of the grid can be represented in a vector $v$ of length $N^2$. The mapping from a 2D grid index $(i, j)$ to a 1D vector index is done using the following formula:
\[v[(i-1) \times N + j]\]
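For example, with $N = 64$, the grid point $(i, j) = (3, 5)$ is stored at $v[(3-1) \times 64 + 5] = v[133]$.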
The multigrid approach, which is employed as a preconditioner, involves a hierarchy of grid levels, and the solution process typically follows these steps:
- Smoothing:
- Reduce high-frequency errors using a relaxation method such as Jacobi or Gauss-Seidel.
Given the linear system $A \mathbf{u} = \mathbf{f}$, a Jacobi iteration can be written as:
\[\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + D^{-1} (\mathbf{f} - A \mathbf{u}^{(k)}),\]
where $D$ is the diagonal part of $A$.
In contrast, the Gauss-Seidel method updates the solution sequentially, using the most recent values as soon as they are computed. The Gauss-Seidel iteration can be expressed as:
\[\mathbf{u}_i^{(k+1)} = \frac{1}{A_{ii}} \left( f_i - \sum_{j=1}^{i-1} A_{ij} \mathbf{u}_j^{(k+1)} - \sum_{j=i+1}^{N} A_{ij} \mathbf{u}_j^{(k)} \right),\]
where $A_{ii}$ is the diagonal element of $A$ corresponding to the $i$-th equation. This approach allows for the incorporation of updated values into subsequent calculations, often leading to faster convergence than the Jacobi method, especially for diagonally dominant or symmetric positive definite matrices.
- Restriction:
- Transfer the residual $\mathbf{r} = \mathbf{f} - A \mathbf{u}$ to a coarser grid using the restriction operator $R$:
\[\mathbf{r}_c = R \mathbf{r}.\]
- Coarse-grid Correction:
- On the coarse grid, solve the system approximately:
\[A_c \mathbf{e}_c = \mathbf{r}_c.\]
- Prolongation:
- Transfer the coarse-grid correction back to the fine grid using the prolongation operator $P$:
\[\mathbf{u} = \mathbf{u} + P \mathbf{e}_c.\]
- Post-smoothing:
- Apply another smoothing step to reduce any new high-frequency errors introduced by the coarse-grid correction.
By incorporating the multigrid method as a preconditioner, we can achieve improved convergence rates for iterative solvers, particularly for problems characterized by large linear systems, such as the Poisson equation.
using SparseArrays, LinearAlgebra, Krylov
# Define parameters
N = 64 # Grid size
h = 1.0 / (N + 1) # Grid spacing
f = zeros(N, N) # Source term
u = zeros(N, N) # Initial guess
# Define the source term (e.g., a simple function)
for i in 1:N
for j in 1:N
f[i, j] = sin(π * i * h) * sin(π * j * h) # Example source term
end
end
# Matrix-vector product with the discretized negative Laplacian (the operator A in -Δu = f).
# The grid holds the interior unknowns; neighbors outside the grid are zero (homogeneous Dirichlet conditions).
# The grid size and spacing are deduced from v so that the same routine can be reused on coarser grids.
function laplacian_matrix_vector_product(v)
    n = size(v, 1)
    hn = 1.0 / (n + 1)
    result = zeros(size(v))
    for i in 1:n
        for j in 1:n
            up    = i < n ? v[i+1, j] : 0.0
            down  = i > 1 ? v[i-1, j] : 0.0
            right = j < n ? v[i, j+1] : 0.0
            left  = j > 1 ? v[i, j-1] : 0.0
            result[i, j] = (4 * v[i, j] - up - down - right - left) / hn^2
        end
    end
    return result
end
# Define a function for the Jacobi smoothing step.
# The diagonal of the discrete negative Laplacian is 4 / h², so each sweep applies u ← u + (h² / 4) (f - A u).
function jacobi_smoothing(u, f, iterations=10)
    n = size(u, 1)
    hn = 1.0 / (n + 1)
    for _ in 1:iterations
        Au = laplacian_matrix_vector_product(u)  # product with the values from the previous sweep
        u .+= hn^2 .* (f .- Au) ./ 4             # Jacobi update rule
    end
    return u
end
# Define the restriction operator (averaging of each 2×2 block of the fine grid)
function restriction(r)
    return 0.25 * (r[1:2:end, 1:2:end] + r[2:2:end, 1:2:end] + r[1:2:end, 2:2:end] + r[2:2:end, 2:2:end])
end
# Define the prolongation operator (piecewise-constant injection: each coarse value is copied to the matching 2×2 block of the fine grid)
function prolongation(ec)
    nc = size(ec, 1)
    u = zeros(2 * nc, 2 * nc)
    for i in 1:nc
        for j in 1:nc
            u[2i-1:2i, 2j-1:2j] .= ec[i, j]
        end
    end
    return u
end
# Main multigrid V-cycle function
function multigrid_v_cycle(u, f, iterations=10)
# 1. Pre-smoothing
u = jacobi_smoothing(u, f, iterations)
# 2. Compute residual
r = f - laplacian_matrix_vector_product(u)
# 3. Restrict the residual to the coarse grid
rc = restriction(r)
# 4. Solve the coarse-grid problem (approximate)
ec = zeros(size(rc)) # Initial guess for the coarse grid correction
ec = jacobi_smoothing(ec, rc, iterations) # Use Jacobi for coarse correction
# 5. Prolongate correction back to fine grid
u += prolongation(ec)
# 6. Post-smoothing
u = jacobi_smoothing(u, f, iterations)
return u
end
# Run the multigrid V-cycle
u = multigrid_v_cycle(u, f)
# TO DO: use multigrid_v_cycle as a preconditioner within a Krylov solver.
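One possible way to address this, sketched below under the conventions of the code above: reshape vectors of length $N^2$ to $N \times N$ grids, wrap the discretized negative Laplacian and one V-cycle (started from a zero initial guess) as operators, and pass the V-cycle as the preconditioner `M` of `cg`. The names `opA`, `opM` and `u_vec` are illustrative. Note that CG requires both operators to be symmetric positive definite, which holds only approximately for this simple V-cycle, so this is a starting point rather than a robust preconditioner.

using LinearOperators

# Matrix-free negative Laplacian acting on vectors of length N^2
opA = LinearOperator(Float64, N^2, N^2, true, true,
                     (y, v) -> y .= vec(laplacian_matrix_vector_product(reshape(v, N, N))))

# One multigrid V-cycle with a zero initial guess, used as an approximate inverse of opA
opM = LinearOperator(Float64, N^2, N^2, true, true,
                     (y, r) -> y .= vec(multigrid_v_cycle(zeros(N, N), reshape(r, N, N))))

u_vec, stats = cg(opA, vec(f), M=opM, verbose=1)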