The most expensive procedures in Krylov methods are matrix-vector products (y ← Ax) and vector operations (dot products, vector norms, y ← αx + βy), so they directly determine the efficiency of the methods provided by Krylov.jl. In this section, we present various optimizations based on multithreading to speed up these procedures. Multithreading is a form of shared-memory parallelism that uses multiple cores to perform tasks.
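As an illustration, here is a minimal sketch of these kernels; the matrix A, the vectors x and y, and the dimensions are assumptions made for the example.

using LinearAlgebra, SparseArrays

n = 10_000
A = sprand(n, n, 0.001)   # hypothetical sparse matrix
x = rand(n)
y = rand(n)

mul!(y, A, x)             # matrix-vector product y ← Ax
s = dot(x, y)             # dot product
r = norm(y)               # vector norm
axpby!(2.0, x, 3.0, y)    # y ← αx + βy with α = 2 and β = 3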
Multi-threaded BLAS and LAPACK operations
The number of BLAS and LAPACK threads can be set with an environment variable before launching Julia:
OPENBLAS_NUM_THREADS=N julia # Set the number of OpenBLAS threads to N
MKL_NUM_THREADS=N julia # Set the number of MKL threads to N
If you don't know the maximum number of threads available on your computer, you can obtain it with
NMAX = Sys.CPU_THREADS
and define the number of OpenBLAS/MKL threads at runtime with
BLAS.set_num_threads(N) # 1 ≤ N ≤ NMAX
BLAS.get_num_threads()
The recommended number of BLAS threads is the number of physical cores, not logical cores, which is in general N = NMAX / 2.
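For example, the following sketch applies this recommendation; it assumes that hyper-threading is enabled, so that Sys.CPU_THREADS counts two logical cores per physical core.

using LinearAlgebra

NMAX = Sys.CPU_THREADS            # number of logical cores
BLAS.set_num_threads(NMAX ÷ 2)    # assume 2 logical cores per physical core
BLAS.get_num_threads()            # check the new setting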
By default, Julia ships with OpenBLAS, but it's also possible to use Intel MKL BLAS and LAPACK with MKL.jl. You can check which BLAS library is currently in use with
using LinearAlgebra
BLAS.vendor() # use BLAS.get_config() for Julia ≥ 1.7
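If you want MKL instead of OpenBLAS, a minimal sketch (assuming Julia ≥ 1.7 and that MKL.jl is installed) is to load MKL.jl before running any BLAS or LAPACK operation:

using MKL             # switches the default BLAS/LAPACK backend to Intel MKL
using LinearAlgebra
BLAS.get_config()     # should now report MKL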
Multi-threaded sparse matrix-vector products
For sparse matrices, the Julia implementation of mul! in the SparseArrays library is not parallelized. A significant speed-up can be observed with the multithreaded mul! of MKLSparse.jl.
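As a sketch, assuming MKLSparse.jl is installed, loading the package is enough for subsequent sparse products to dispatch to MKL's multithreaded kernels; the matrix and vectors below are hypothetical.

using SparseArrays, LinearAlgebra
using MKLSparse       # overloads mul! for SparseMatrixCSC with BLAS element types

A = sprand(10_000, 10_000, 0.001)
x = rand(10_000)
y = similar(x)
mul!(y, A, x)         # now performed by MKL's multithreaded sparse kernels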
It's also possible to implement a generic multithreaded Julia version. For instance, the following function can be used for symmetric matrices
using Base.Threads
using SparseArrays

function threaded_mul!(y::Vector{T}, A::SparseMatrixCSC{T}, x::Vector{T}) where T <: Number
  A.m == A.n || error("A is not a square matrix!")
  # Each thread accumulates y[i] from column i of A, i.e. (Aᵀx)[i],
  # which equals (Ax)[i] when A is symmetric.
  @threads for i = 1 : A.n
    tmp = zero(T)
    @inbounds for j = A.colptr[i] : (A.colptr[i+1] - 1)
      tmp += A.nzval[j] * x[A.rowval[j]]
    end
    @inbounds y[i] = tmp
  end
  return y
end
and wrapped inside a linear operator to solve symmetric linear systems
using LinearOperators
n, m = size(A)   # dimensions of the symmetric sparse matrix A defined above
sym = herm = true
T = eltype(A)
opA = LinearOperator(T, n, m, sym, herm, (y, v) -> threaded_mul!(y, A, v))
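The operator opA can then be passed to a solver of Krylov.jl; a minimal sketch with minres, which only requires A to be symmetric, assuming a hypothetical right-hand side b of matching size, is

using Krylov
b = rand(n)                 # hypothetical right-hand side
x, stats = minres(opA, b)   # solve Ax = b with MINRES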
To enable multi-threading with Julia, you can start julia with the environment variable JULIA_NUM_THREADS or the options -t and --threads
julia -t auto # alternative: --threads auto
julia -t N # alternative: --threads N
JULIA_NUM_THREADS=N julia
Afterwards, you can verify the number of threads available to Julia with
using Base.Threads
nthreads()
The following benchmarks illustrate the time required, in seconds, to compute 1000 sparse matrix-vector products with symmetric matrices from the SuiteSparse Matrix Collection. The computer used for the benchmarks has 2 physical cores and Julia was launched with JULIA_NUM_THREADS=2.
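Such a benchmark can be reproduced along the following lines; this sketch uses threaded_mul! defined above together with a synthetic symmetric matrix rather than a SuiteSparse matrix, and the size and density are assumptions.

using SparseArrays, LinearAlgebra

n = 10_000
A = sprand(n, n, 0.001)
A = A + A'                     # make the matrix symmetric
x = rand(n)
y = similar(x)

# time 1000 products with the single-threaded mul! and the threaded version
t_serial   = @elapsed for _ = 1:1000; mul!(y, A, x); end
t_threaded = @elapsed for _ = 1:1000; threaded_mul!(y, A, x); end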