Rational Krylov for Stieltjes matrix functions: convergence and pole selection
Leonardo Robol, UniPI (joint work with S. Massei, EPFL). 13 February 2020, Montecatini, Convegno GNCS
GNCS project "Giovani ricercatori"
• GNCS "Giovani ricercatori" project, 2018/2019.
• Title: "Projection methods for matrix equations and linear systems with operators defined via sums of Kronecker products, and solutions with rank structure."
• Support for participation in the conferences ILAS2019 (Rio de Janeiro) and ICIAM2019 (Valencia).
Motivation: Fractional diffusion equations
We are concerned with the "fractional equivalent" of the 1D/2D Laplacian. That is, instead of considering

∂²u/∂x² = f(x),   Δu = f(x, y)

we deal with

∂^α u/∂x^α = f(x),   Δ^α u = f(x, y)

for 1 < α < 2.
Fractional diffusion allows modeling nonlocal behavior.
• Useful in describing anomalous diffusion phenomena (plasma physics, financial markets, ...)
• Several (non-equivalent) definitions in the literature: Riemann–Liouville, Caputo, Grünwald–Letnikov, Matrix Transform Method
Motivation: Matrix Transform Method
Let A be the usual discretization of the Laplacian operator.
One way to define the discrete operator M representing the derivative of order α is to impose that

M^(2/α) = A ⟺ M = A^(α/2)   (1 < α < 2)

• This follows by "composition" of derivatives.
• The problem is recast as computing x := f(A)v, with f(z) = z^(−α/2).
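As a concrete illustration (not from the slides), the matrix transform evaluation x = A^(−α/2)v can be carried out densely via an eigendecomposition for small problems; the matrix A, vector v, and α below are placeholder data.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

n, alpha = 50, 1.5
# standard 3-point finite-difference 1D Laplacian (Dirichlet boundary conditions)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
v = np.ones(n)

# x = f(A) v with f(z) = z^(-alpha/2), via the eigendecomposition of the SPD matrix A
lam, Q = np.linalg.eigh(A)
x = Q @ (lam ** (-alpha / 2) * (Q.T @ v))

# dense cross-check (feasible only for small n): solve A^(alpha/2) x = v
x_ref = np.linalg.solve(fractional_matrix_power(A, alpha / 2), v)
print(np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))
```

For large n this dense O(n³) approach is infeasible, which is what motivates the projection methods below.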
Evaluation of matrix functions
Let f(z): Ω → R be an analytic function on Ω ⊆ R⁺.
Problem 1. Given a symmetric positive definite matrix A ∈ R^(n×n) with eigenvalues in [λmin, λmax] ⊂ Ω and v ∈ R^n, compute

x := f(A)v.

Applications: systems of ODEs, exponential integrators, fractional diffusion problems, network communicability measures, control theory, ...
Subspace projection methods
Let U ⊂ R^n be an ℓ-dimensional subspace (ℓ ≪ n) with an orthogonal basis U = [u1 | ... | uℓ], and let Aℓ = U*AU, vℓ = U*v be the projections of A and v on U.

• Linear systems: x = A^(−1)v ≈ xℓ := U Aℓ^(−1) vℓ.
• Matrix functions: x = f(A)v ≈ xℓ := U f(Aℓ) vℓ.

If we use the Krylov subspace U = Kℓ(A, v) := span{v, Av, ..., A^(ℓ−1)v}, then

‖x − xℓ‖₂ ≤ C · min_{p∈Pℓ₋₁} max_{z∈[λmin,λmax]} |p(z) − f(z)|,

where Pℓ := polynomials of degree ≤ ℓ and C is independent of A and f.
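A minimal dense sketch of this projection scheme, assuming A symmetric (the test matrix below is illustrative, not from the slides):

```python
import numpy as np

def krylov_fun(A, v, ell, f):
    """Approximate f(A) v by projection onto the polynomial Krylov space K_ell(A, v).

    Arnoldi with full (two-pass) reorthogonalization; f acts on the small projection."""
    n = len(v)
    U = np.zeros((n, ell))
    U[:, 0] = v / np.linalg.norm(v)
    for j in range(1, ell):
        w = A @ U[:, j - 1]
        w -= U[:, :j] @ (U[:, :j].T @ w)   # orthogonalize against the current basis
        w -= U[:, :j] @ (U[:, :j].T @ w)   # second pass for numerical stability
        U[:, j] = w / np.linalg.norm(w)
    A_ell = U.T @ A @ U
    v_ell = U.T @ v
    lam, Q = np.linalg.eigh(A_ell)        # A symmetric, so A_ell is symmetric too
    return U @ (Q @ (f(lam) * (Q.T @ v_ell)))

# quick check on a small, well-conditioned SPD matrix
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = Q @ np.diag(np.linspace(1.0, 4.0, 200)) @ Q.T   # eigenvalues in [1, 4]
v = rng.standard_normal(200)
lam, QA = np.linalg.eigh(A)
x = QA @ (lam ** -0.5 * (QA.T @ v))                 # reference A^(-1/2) v
x20 = krylov_fun(A, v, 20, lambda z: z ** -0.5)
print(np.linalg.norm(x - x20))
```

With the spectrum confined to [1, 4] the polynomial approximation converges quickly; the next slide shows how this degrades when the spectral interval is wide.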
Polynomial approximation does not always work...
Sometimes the convergence of polynomial approximation struggles, e.g. x = A^(−1/2)v, [λmin, λmax] ≈ [10^(−5), 4], n = 1000.
[Plot: ‖x − xℓ‖₂ versus ℓ, for ℓ up to 200; the error decays very slowly.]
Rational Krylov method
Rational Krylov subspace. Given Σℓ := {σ1, ..., σℓ} ⊂ C it is defined as¹

RKℓ(A, v, Σℓ) := qℓ(A)^(−1) Kℓ₊₁(A, v) = { p(A) qℓ(A)^(−1) v : p(z) ∈ Pℓ } = span{v, (σ1 I − A)^(−1)v, ..., (σℓ I − A)^(−1)v},

where qℓ(z) := ∏_j (z − σj).

If U = RKℓ(A, v, Σℓ) we get a problem of rational approximation with fixed poles:

‖x − xℓ‖₂ ≤ C · min_{r = p/qℓ, p∈Pℓ} max_{z∈[λmin,λmax]} |r(z) − f(z)|.

¹ The last equality is valid only for distinct poles.
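The subspace can be built directly from the shifted solves; a dense sketch (the diagonal test matrix and the log-spaced poles below are illustrative choices, not the optimized poles discussed later):

```python
import numpy as np

def rational_krylov_fun(A, v, sigmas, f):
    """Approximate f(A) v by projection onto RK_ell(A, v, {sigma_1, ..., sigma_ell}).

    Dense sketch: each pole contributes one shifted solve (sigma_j I - A)^{-1} v."""
    n = len(v)
    basis = [v]
    for s in sigmas:
        basis.append(np.linalg.solve(s * np.eye(n) - A, v))
    U, _ = np.linalg.qr(np.column_stack(basis))   # orthogonal basis of the space
    A_ell = U.T @ A @ U
    lam, Q = np.linalg.eigh(A_ell)
    return U @ (Q @ (f(lam) * (Q.T @ (U.T @ v))))

# sanity check: poles on the negative real axis handle an ill-conditioned spectrum
A = np.diag(np.logspace(-3, 0, 300))              # SPD, eigenvalues in [1e-3, 1]
v = np.ones(300)
x_ref = np.diag(A) ** -0.5 * v                    # exact A^(-1/2) v for diagonal A
sigmas = -np.logspace(-3, 0, 12)                  # illustrative (not optimal) poles
x12 = rational_krylov_fun(A, v, sigmas, lambda z: z ** -0.5)
print(np.linalg.norm(x12 - x_ref) / np.linalg.norm(x_ref))
```

Even these naive log-spaced poles give a small relative error with only 12 solves, far beyond what polynomial Krylov achieves on such a wide spectral interval.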
Rational approximation does not always work...
• Uniform approximations of highly oscillatory functions on the whole positive real line are not possible for polynomials or rational functions.
• In general, a number of steps dependent on the norm of the operator is needed for convergence².
• E.g. x = cos(A)v, where ‖A‖₂ ≈ 10^5, n = 1000.
[Plot: ‖x − xℓ‖₂ versus ℓ, for ℓ up to 100, for extended Krylov and rational Krylov; the error stagnates for both.]
² V. Grimm, M. Hochbruck. Rational approximation to trigonometric operators, BIT 2006.
Stieltjes functions
A favourable case is when f(z): R⁺ → R is defined via a Stieltjes integral:

1. f(z) = ∫₀^∞ 1/(z + t) dμ(t)   (Cauchy–Stieltjes/Markov function)
2. f(z) = ∫₀^∞ e^(−zt) dμ(t)   (Laplace–Stieltjes function)

where dμ(t) is a non-negative measure on R⁺.

Examples of functions in these classes are

1. z^(−α) = (sin(απ)/π) ∫₀^∞ t^(−α)/(z + t) dt, α ∈ (0, 1);   log(1 + z)/z;   (e^(−t√z) − 1)/z.
2. e^(−z);   (1 − e^(−z))/z;   φⱼ(z) := ∫₀^∞ e^(−tz) [max{1 − t, 0}]^(j−1)/(j − 1)! dt.
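The first representation can be checked numerically; a small sketch (using the substitution t = u², valid for α = 1/2, to remove the endpoint singularity of the integrand):

```python
import numpy as np
from scipy.integrate import quad

alpha, z = 0.5, 2.0
# z^(-alpha) = sin(alpha*pi)/pi * integral_0^inf t^(-alpha)/(z+t) dt
# substituting t = u^2 turns the integrand into the smooth 2/(z + u^2)
val, _ = quad(lambda u: 2.0 / (z + u * u), 0, np.inf)
val *= np.sin(alpha * np.pi) / np.pi
print(val, z ** -alpha)  # both equal 2^(-1/2) ≈ 0.7071
```

This is exactly the structure exploited below: f(A)v becomes an integral of shifted resolvents (A + tI)^(−1)v against the measure.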
Stieltjes functions
• Laplace–Stieltjes functions on (0, ∞) are also known as completely monotone functions, i.e. those such that (−1)^j f^(j)(z) > 0 for all z > 0.
• By considering dμ(t) to be a finite sum of Dirac deltas, we have that rational functions of the form

f(z) = Σ_{j=1}^h αj/(z − βj),   αj > 0, βj < 0,

are Cauchy–Stieltjes.
• It holds (Bernstein's theorem³) that z^(−1) is Cauchy–Stieltjes, and Cauchy–Stieltjes ⊂ Laplace–Stieltjes.
3S. Bernstein. Sur les fonctions absolument monotones, Acta Mathematica 1929
Evaluation of Stieltjes matrix functions
Let f(z): R⁺ → R be an analytic function defined via a Stieltjes integral:

f(z) = ∫₀^∞ g(t, z) dμ(t),   g(t, z) ∈ { 1/(z + t), e^(−zt) }.

Problem 1. Given a symmetric positive definite matrix A ∈ R^(n×n) with eigenvalues in [λmin, λmax] ⊂ Ω and v ∈ R^n, compute

x := f(A)v.
Goal. Provide selection strategies for Σℓ and estimates of the error ‖x − xℓ‖₂.
Outline
• Problem 1: x = f(A)v
  • Cauchy–Stieltjes case
  • Laplace–Stieltjes case
• Problem 2: x = f(I ⊗ A + B ⊗ I) vec(low-rank matrix)
  • Laplace–Stieltjes case
  • Cauchy–Stieltjes case
Cauchy-Stieltjes functions: Sketching the idea
• By writing the integral formulation of f we get:

f(A)v = ∫₀^∞ (A + tI)^(−1) v dμ(t) ≈ ∫₀^∞ U (Aℓ + tI)^(−1) vℓ dμ(t).

• So we need a space RK(A, v, Σℓ) that simultaneously approximates (A + tI)^(−1)v well for every t > 0.
• Problem: Krylov subspaces are not shift invariant (apart from polynomial Krylov).
Cauchy-Stieltjes functions: Sketching the idea
• Approximating (A + tI)^(−1) for all t ≥ 0 is linked to approximating 1/(λ + t) on the strip [λmin, λmax] × [0, ∞).
• We consider a skeleton approximation of the form

1/(λ + t) ≈ f_skel(λ, t) := [1/(λ + t1), ..., 1/(λ + tℓ)] M^(−1) [1/(λ1 + t), ..., 1/(λℓ + t)]ᵀ,   Mij = 1/(λi + tj).
Cauchy-Stieltjes functions: Sketching the idea
f_skel(λ, t) := [1/(λ + t1), ..., 1/(λ + tℓ)] M^(−1) [1/(λ1 + t), ..., 1/(λℓ + t)]ᵀ

Skeleton approximations have 2 crucial properties:

1. Explicit expression of the residual error⁴:

1/(λ + t) − f_skel(λ, t) = 1/(λ + t) · r(λ)/r(−t),   r(z) := ∏_j (z − λj)/(z + tj).

2. f_skel(λ, t) is a rational function in λ with poles −t1, ..., −tℓ. If we set Σℓ = {−t1, ..., −tℓ}, then

f_skel(A, t)v ∈ RKℓ(A, v, Σℓ)   ∀t ∈ [0, ∞).
⁴ Oseledets. Lower bounds for separable approximations of the Hilbert kernel, Sbornik 2007.
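The residual identity can be verified numerically on a toy example (the interpolation nodes and the evaluation point below are arbitrary illustrative choices):

```python
import numpy as np

# interpolation nodes: lambda_i in the spectral interval, t_j on the positive axis
lam_nodes = np.array([0.5, 1.5, 3.0])
t_nodes = np.array([0.2, 1.0, 5.0])
M = 1.0 / (lam_nodes[:, None] + t_nodes[None, :])   # M_ij = 1/(lambda_i + t_j)

def f_skel(lam, t):
    row = 1.0 / (lam + t_nodes)          # [1/(lam+t_1), ..., 1/(lam+t_ell)]
    col = 1.0 / (lam_nodes + t)          # [1/(lam_1+t), ..., 1/(lam_ell+t)]
    return row @ np.linalg.solve(M, col)

def r(z):
    return np.prod((z - lam_nodes) / (z + t_nodes))

# check the explicit residual formula at an arbitrary point (lam, t)
lam, t = 2.2, 0.7
lhs = 1.0 / (lam + t) - f_skel(lam, t)
rhs = 1.0 / (lam + t) * r(lam) / r(-t)
print(lhs, rhs)  # the two expressions coincide
```

Note how r vanishes at the λ-nodes and r(−t) blows up at the t-nodes, so the residual is zero exactly on the interpolation cross, as a skeleton approximation requires.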
Cauchy-Stieltjes functions: Sketching the idea
• We have the following point-wise estimate of the error⁵:

‖(tI + A)^(−1)v − U(tI + Aℓ)^(−1)vℓ‖₂ ≤ (2‖v‖₂/(λmin + t)) · min_{r = p/qℓ, p∈Pℓ} ( max_{z∈[λmin,λmax]} |r(z)| / min_{z∈(−∞,0]} |r(z)| ).

• Minimizing the right-hand side over Σℓ means solving

min_{r∈Rℓ,ℓ} ( max_{z∈[λmin,λmax]} |r(z)| / min_{z∈(−∞,0]} |r(z)| ),

where Rℓ,ℓ is the set of (ℓ, ℓ) rational functions.

⁵ An estimate of the L²-norm in: Druskin, Knizhnerman, Zaslavsky. Solution of large scale evolutionary problems using rational Krylov subspaces with optimized shifts, SISC 2009.
Cauchy-Stieltjes functions: Sketching the idea
• This problem can be transformed via a Möbius map T(z) = (αz + β)/(γz + δ) into

min_{r∈Rℓ,ℓ} ( max_{z∈[a,b]} |r(z)| / min_{z∈[−b,−a]} |r(z)| ),   (1)

which Zolotarev solved ≈ 140 years ago.
• In particular, we know explicitly the optimal poles of the rational function that solves (1). So the optimal poles Σℓ* for our starting problem are obtained by mapping them back: Σℓ* := T^(−1)(poles of the solution of (1)).
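For reference, points of this Zolotarev/elliptic-function type can be generated with scipy's elliptic routines. This is a hedged sketch following the classical ADI shift formula as commonly stated in the literature; it produces ℓ elliptically distributed points in [a, b] and is not the full construction of Σℓ* (which additionally involves the Möbius map T):

```python
import numpy as np
from scipy.special import ellipk, ellipj

def elliptic_shifts(a, b, ell):
    """Classical elliptic-function (Wachspress-type) shifts for [a, b] vs. [-b, -a].

    Sketch based on the standard ADI shift formula; treat as illustrative."""
    m = 1.0 - (a / b) ** 2                 # elliptic parameter m = k^2
    K = ellipk(m)
    u = (2 * np.arange(1, ell + 1) - 1) * K / (2 * ell)
    _, _, dn, _ = ellipj(u, m)             # Jacobi dn decreases from 1 to a/b on [0, K]
    return b * dn                          # ell points distributed in [a, b]

shifts = elliptic_shifts(1e-3, 1.0, 8)
print(shifts)  # strictly decreasing, clustered toward both endpoints
```

Negating such points gives candidate poles on the negative real axis, mimicking the structure of the optimal Σℓ*.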
Theoretical result
Theorem. Let f(z) be a Cauchy–Stieltjes function, U be an orthogonal basis of RKℓ(A, v, Σℓ*), and xℓ = U f(Aℓ) vℓ. Then

‖f(A)v − xℓ‖₂ ≤ 8 f(λmin) ‖v‖₂ ρ^ℓ,   ρ := exp( −π² / log(16 λmax/λmin) ).
Similar results by Beckermann and Reichel⁶.

⁶ Beckermann, Reichel. Error estimate and evaluation of matrix functions via the Faber transform, SINUM 2009.
Numerical results
x = A^(−1/2) v,   A = trid(−1, 2, −1) ∈ R^(1000×1000)
[Plot: ‖x − xℓ‖₂ versus iterations ℓ, for ℓ up to 50, comparing the bound, extended Krylov, polynomial Krylov, rational Krylov, and markovfunmv⁷.]
⁷ Güttel, Knizhnerman. A black-box rational Arnoldi variant for Cauchy–Stieltjes matrix functions, BIT 2013.
Laplace-Stieltjes functions
f(λ) = ∫₀^∞ e^(−λt) dμ(t)

• The core idea is to exploit the relation

e^(−λt) = (1/2πi) ∫_{iR} e^(st)/(λ + s) ds

to link (parameter-dependent) resolvents with exponentials.
• Then, using the approximation

e^(−λt) ≈ (1/2πi) ∫_{iR} f_skel(λ, s) e^(st) ds

yields

‖e^(−tA)v − U e^(−tAℓ) vℓ‖₂ ≤ 4 γℓ max_{z∈[λmin,λmax]} |r(z)|,   r(z) := ∏_j (z − λj)/(z + λj),

with U an orthogonal basis of RKℓ(A, v, {−λ1, ..., −λℓ}).
Laplace-Stieltjes functions
Theorem. Let f(z) be a Laplace–Stieltjes function, U be an orthogonal basis of RKℓ(A, v, Σℓ), and xℓ = U f(Aℓ) vℓ. Then there exists a choice of Σℓ such that

‖f(A)v − xℓ‖₂ ≤ 8 γℓ f(0) ‖v‖₂ ρ^(ℓ/2),   ρ := exp( −π² / log(4 λmax/λmin) ),

where γℓ := 2.23 + (2/π) log(4ℓ λmax/λmin).
Conjecture: the result holds with γℓ = 1.
2D problems with tensor structure
• When considering a discretization (say, finite differences) of

∂²u/∂x² + ∂²u/∂y² = f(x, y)

on a rectangular domain, one gets a linear system of the form

(A ⊗ I + I ⊗ A) x = v,   M := A ⊗ I + I ⊗ A.

• Moreover, if f(x, y) is a regular function or has a small support (e.g. a point source), then

v ≈ vec(C),

where C is a low-rank matrix, i.e. C = WZᵀ for some tall and skinny matrices W, Z.
• Applying the matrix transform method to the fractional analogue of this problem requires computing f(M) vec(C).
Function evaluation and Kronecker structure
Problem 2. Given two symmetric positive definite matrices A, B ∈ R^(n×n) with eigenvalues in [λmin, λmax] ⊂ Ω and v = vec(C) = vec(low-rank matrix) ∈ R^(n²), compute

x := f(I ⊗ A + B ⊗ I)v,   or equivalently   X := vec^(−1)( f(I ⊗ A + B ⊗ I)v ).

• If f(z) = z^(−1), then this is equivalent to solving a Sylvester equation: we know the low-rank structure is transferred from C to X.
• Does this hold in more general cases? Yes, it does.
• Once again, we focus on the case where f(z) is a Stieltjes function:

f(z) = ∫₀^∞ g(t, z) dμ(t),   g(t, z) ∈ { 1/(z + t), e^(−zt) }.
Galerkin projection for evaluating f (M)vec(C)
Inspired by the Galerkin projection for Sylvester equations, we may consider the following algorithm⁸:

• Choose subspaces of R^n spanned by orthogonal matrices U, V.
• Evaluate the projected matrix function

vec(Y) = f( (V ⊗ U)* M (V ⊗ U) ) vec(U*CV) = f(M̃) vec(U*CV),

where M̃ := V*BV ⊗ I + I ⊗ U*AU.
• Use UYV* as an approximation of X.
⁸ Benzi, Simoncini. Approximation of functions of large matrices with Kronecker structure, Numerische Mathematik, 2017.
Evaluating the small scale matrix function
For f(z) = z^(−1), evaluating f(M)vec(C) can be done in O(n³) using Bartels–Stewart.
For a more general f(z) and M = I ⊗ A + B ⊗ I:

• The projected matrix retains the same structure.
• M can be diagonalized in O(n³) by diagonalizing A and B.

If QA* A QA = DA and QB* B QB = DB, then

vec(X) = f(M) vec(C) = (QA ⊗ QB) f(I ⊗ DA + DB ⊗ I) vec(QA* C QB),

which can be written as

X = QA (F ∘ (QA* C QB)) QB*,   Fij := f(λA,i + λB,j),

where ∘ denotes the entrywise (Hadamard) product.
Again, total cost is about O(n3).
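A small sketch of this evaluation, written with the column-major vec convention (so that M = B ⊗ I + I ⊗ A represents X ↦ AX + XB; the random SPD matrices below are illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def rand_spd(n):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q @ np.diag(rng.uniform(1.0, 3.0, n)) @ Q.T

A, B, C = rand_spd(n), rand_spd(n), rng.standard_normal((n, n))
f = lambda z: 1.0 / np.sqrt(z)

# X = vec^{-1}( f(M) vec(C) ) via eigendecompositions of A and B only
lamA, QA = np.linalg.eigh(A)
lamB, QB = np.linalg.eigh(B)
F = f(lamA[:, None] + lamB[None, :])        # F_ij = f(lambda_{A,i} + lambda_{B,j})
X = QA @ (F * (QA.T @ C @ QB)) @ QB.T       # Hadamard product with the projected C

# direct check on the Kronecker form (column-major vec: M = kron(B, I) + kron(I, A))
M = np.kron(B, np.eye(n)) + np.kron(np.eye(n), A)
lamM, QM = np.linalg.eigh(M)
x_direct = QM @ (f(lamM) * (QM.T @ C.flatten(order="F")))
print(np.linalg.norm(X.flatten(order="F") - x_direct))  # ~ machine precision
```

The Hadamard-product formula never forms the n² × n² matrix M, which is what keeps the cost at O(n³).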
Galerkin projection for evaluating f (M)vec(C)
• The algorithm reduces to Galerkin for Sylvester equations if f(z) = z^(−1).
• Proposed by Benzi & Simoncini for more general f(z) using polynomial Krylov subspaces.
• Using rational Krylov subspaces yields improved convergence in ill-conditioned cases; the major difficulty: how to choose the poles?
Determining the poles: Laplace-Stieltjes case
Here the situation is straightforward because e^(−tM) = e^(I⊗(−tA)) e^((−tB)⊗I), so that

vec(X) = f(M) vec(C)
       = ∫₀^∞ e^(−tM) vec(C) dμ(t)
       = vec( ∫₀^∞ e^(−tA) C e^(−tB) dμ(t) )
       = vec( ∫₀^∞ (e^(−tA)W)(e^(−tB)Z)ᵀ dμ(t) ).
Hence we can choose the same set of poles used in the 1D case, for both Krylov subspaces.
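The factorization of the matrix exponential over the Kronecker sum can be checked directly (column-major vec convention; the small random SPD matrices below are illustrative data):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) * 0.1
A = (A + A.T) / 2 + n * np.eye(n)          # symmetrize and shift: safely SPD
B = rng.standard_normal((n, n)) * 0.1
B = (B + B.T) / 2 + n * np.eye(n)
C = rng.standard_normal((n, n))
t = 0.3

# column-major vec: M = kron(B, I) + kron(I, A) represents X -> A X + X B
M = np.kron(B, np.eye(n)) + np.kron(np.eye(n), A)
lhs = expm(-t * M) @ C.flatten(order="F")
rhs = (expm(-t * A) @ C @ expm(-t * B)).flatten(order="F")
print(np.linalg.norm(lhs - rhs))  # ~ machine precision
```

The identity holds because kron(B, I) and kron(I, A) commute, so the exponential of their sum factors; this is exactly what lets the integrand split into the two tall factors e^(−tA)W and e^(−tB)Z.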
Convergence result: Laplace Stieltjes case
Theorem. There exists a choice of poles that, for any Laplace–Stieltjes function f applied to M = I ⊗ A + B ⊗ I, where A, B are symmetric positive definite with spectrum contained in [λmin, λmax], yields the convergence

‖X − Xℓ‖₂ ≤ 16 γℓ,κ f(0) ‖C‖₂ ρ^(ℓ/2),   ρ = exp( −π² / log(4 λmax/λmin) ).
Numerical results
[Plot: evaluation of vec(X) = φ₁(I ⊗ A + A ⊗ I) vec(C); ‖X − Xℓ‖₂ versus iterations ℓ, for ℓ up to 50, comparing the bound, extended Krylov, polynomial Krylov, and rational Krylov.]
Determining the poles: Cauchy Stieltjes case
If f(z) = ∫₀^∞ dμ(t)/(z + t), then f(M) = ∫₀^∞ (tI + M)^(−1) dμ(t).

Since tI + M = A ⊗ I + I ⊗ (B + tI) we have

vec(X) = f(M) vec(C) ⟺ X = ∫₀^∞ Xt dμ(t),   A Xt + Xt(B + tI) = C.

• Can we design a projection space that is good for all values of t?
• This can be related to the Zolotarev problem:

max_{z∈[λmin,λmax]} |r(z)| / min_{z∈(−∞,−λmin]} |r(z)|.

• Using a Möbius transform, this can be mapped (approximately) into a Zolotarev problem on [−2λmax, −λmin] ∪ [λmin, 2λmax].
• We can compute the optimal poles there and map them back.
Convergence result: Cauchy Stieltjes case
Theorem. There exists a choice of poles that, for any Cauchy–Stieltjes function f applied to M = I ⊗ A + B ⊗ I, where A, B are symmetric positive definite with spectrum contained in [λmin, λmax], yields the convergence

‖X − Xℓ‖₂ ≤ 4 f(2λmin) (1 + λmax/λmin) ‖C‖₂ ρ^(−ℓ),   ρ = exp( π² / log(8 λmax/λmin) ).

Note: for f(z) = z^(−1) the corresponding rate is exp( π² / log(4 λmax/λmin) ).
Decaying singular values
The theory predicts the decay of the singular values as well.
[Plot: evaluation of vec(X) = (I ⊗ A + A ⊗ I)^(−0.7) vec(C); the bound, ‖Xℓ − X‖₂, and σℓ(X) versus ℓ, for ℓ up to 50.]
Conclusions and outlook
• In practice, nested sequences of poles with the same asymptotic behavior can be used.

Possible extensions:

• Similar results for more general spectrum configurations (and normal matrices); this implicitly depends on the Zolotarev rational approximation problem.
• For non-normal cases, one can resort to using the field of values instead of the spectrum.
• Divide-and-conquer methods for right-hand sides obtained as vectorizations of banded or hierarchical matrices.
• Higher-dimensional Laplace-like operators.

Full story:

• Rational Krylov for Stieltjes matrix functions: convergence and pole selection. S. Massei, L. Robol, arXiv, 2019.