Structured preconditioners for nonsingular matrices of block two-by-two structures

By Zhong-Zhi Bai

Abstract

For large sparse real nonsingular matrices with block two-by-two structure, we establish a general framework of practical and efficient structured preconditioners through matrix transformation and matrix approximation. For specific versions such as the modified block Jacobi-type, modified block Gauss-Seidel-type, and modified block unsymmetric (symmetric) Gauss-Seidel-type preconditioners, we describe their concrete expressions precisely and analyze in detail the eigenvalue distributions and positive definiteness of the preconditioned matrices. We also show that when these structured preconditioners are employed to precondition Krylov subspace methods such as GMRES and restarted GMRES, fast and effective iterative solvers result for large sparse systems of linear equations with block two-by-two coefficient matrices. In particular, these structured preconditioners can lead to efficient and high-quality preconditioning matrices for some typical matrices arising in real-world applications.

1. Introduction

Let $\mathbb{R}^n$ denote the real $n$-dimensional vector space and $\mathbb{R}^{m \times n}$ the real $m \times n$ matrix space. Consider the iterative solution of the large sparse system of linear equations

$$A x = b, \qquad A \in \mathbb{R}^{n \times n} \ \text{nonsingular}, \quad x, b \in \mathbb{R}^n. \tag{1.1}$$

In this paper, we will study algorithmic constructions and theoretical properties of practical and efficient structured preconditioners for the matrix $A$, which is of the block two-by-two structure

$$A = \begin{bmatrix} B & E \\ F & C \end{bmatrix}, \tag{1.2}$$

where $B \in \mathbb{R}^{q \times q}$ is nonsingular, $E \in \mathbb{R}^{q \times p}$, $F \in \mathbb{R}^{p \times q}$, and $C \in \mathbb{R}^{p \times p}$, with $p + q = n$, such that $A$ is nonsingular. Evidently, when the matrix block $B$ is nonsingular, the matrix $A$ is nonsingular if and only if its Schur complement $S_A = C - F B^{-1} E$ is nonsingular.
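
For later reference, the identity behind this equivalence, written in the notation of Equation 1.2, is the standard block triangular factorization

$$A = \begin{bmatrix} B & E \\ F & C \end{bmatrix} = \begin{bmatrix} I & 0 \\ F B^{-1} & I \end{bmatrix} \begin{bmatrix} B & 0 \\ 0 & S_A \end{bmatrix} \begin{bmatrix} I & B^{-1} E \\ 0 & I \end{bmatrix}, \qquad S_A = C - F B^{-1} E,$$

which gives $\det(A) = \det(B)\det(S_A)$, so that $A$ is nonsingular exactly when $S_A$ is.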

Linear systems of the form Equations 1.1 and 1.2 arise in a variety of scientific and engineering applications, including computational fluid dynamics References 21, 23 and 26, mixed finite element approximation of elliptic partial differential equations References 16 and 38, optimization References 25, 30 and 34, optimal control Reference 13, weighted and equality constrained least squares estimation Reference 14, stationary semiconductor devices References 36, 42 and 43, structural analysis Reference 44, electrical networks Reference 44, inversion of geophysical data Reference 31, and so on.

As is well known, preconditioned Krylov subspace methods Reference 40 are efficient iterative solvers for the system of linear equations Equations 1.1 and 1.2, and effective, high-quality preconditioners play a crucial role in guaranteeing their fast convergence and economical costs. A number of structured preconditioners have been studied in the literature for some special cases of the block two-by-two matrix $A$ in Equation 1.2. Besides specialized incomplete factorization preconditioners References 17 and 18, we mention, among others, algebraic multilevel iteration preconditioners References 2, 3, 4, 5 and 12, block and approximate Schur complement preconditioners References 21 and 23, splitting iteration preconditioners References 15, 19, 22, 28, 29, 39 and 45, block definite and indefinite preconditioners References 24, 34, 38 and 10, and block triangular preconditioners References 35, 37 and 10. Theoretical analyses and experimental results have shown that these preconditioners may lead to nicely clustered eigenvalue distributions of the preconditioned matrices and, hence, result in fast convergence of the preconditioned Krylov subspace iteration methods for solving the large sparse system of linear equations Equations 1.1 and 1.2. However, most of these preconditioners demand exact inversions of the matrix block $B$ or $C$, as well as of the Schur complement, which makes them less practical and effective in actual applications.

In this paper, by fully utilizing the matrix structure and properties, we first establish a general framework for a class of practical and efficient structured preconditioners to the matrix $A$ in Equation 1.2 through matrix transformation and several steps of matrix approximation; these preconditioners avoid exact inversions of the matrix blocks $B$ and $C$, as well as of the Schur complement, and cover the known preconditioners mentioned previously as special cases. Then, within this framework, we further present a family of practical and efficient preconditioners by combining it with the modified block relaxation iterations References 6 and 7, which includes the modified block Jacobi-type, the modified block Gauss-Seidel-type and the modified block unsymmetric (symmetric) Gauss-Seidel-type preconditioners as typical examples. Moreover, we discuss the eigenvalue distributions and the positive definiteness of the matrices preconditioned by the modified block Jacobi-type, the modified block Gauss-Seidel-type, and the modified block unsymmetric (symmetric) Gauss-Seidel-type preconditioners, and address in detail the applications of these preconditioners to three classes of real-world matrices, i.e., the symmetric positive definite matrix, the saddle point matrix and the Hamiltonian matrix. Besides, we show that when these structured preconditioners are employed to precondition Krylov subspace methods such as GMRES or restarted GMRES, fast and effective iterative solvers are obtained for the large sparse system of linear equations Equations 1.1 and 1.2.

The organization of this paper is as follows. After establishing the general framework of the structured preconditioners in Section 2, we present the modified block splitting iteration preconditioners and study the eigenvalue distributions and the positive definiteness of the corresponding preconditioned matrices in Section 3; connections of these preconditioners to Krylov subspace iteration methods are also briefly discussed there. Specifications of these preconditioners for three classes of real-world matrices are investigated in Section 4. Finally, in Section 5, we end the paper with a brief conclusion and several remarks.

2. General framework of the structured preconditioners

The construction of our structured preconditioners basically consists of the following three steps. First, seek two nonsingular block two-by-two matrices, both easily invertible, that transform the original matrix into a block two-by-two matrix with certain good properties; second, approximate this matrix by another block two-by-two matrix obtained by dropping some higher-order small block quantities; third, approximate the latter further by another block two-by-two matrix that is also easily invertible. The resulting preconditioners are then of the form . See References 9 and 11.
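
Schematically, and in generic symbols chosen here only for illustration, the three steps read

$$Q_L \, A \, Q_R = K \approx \widetilde{K} \approx \widehat{K}, \qquad M = Q_L^{-1} \, \widehat{K} \, Q_R^{-1},$$

where $Q_L$ and $Q_R$ are the easily invertible transformation matrices, $\widetilde{K}$ is obtained from $K$ by dropping higher-order small block quantities, and $\widehat{K}$ is an easily invertible approximation to $\widetilde{K}$. Applying $M^{-1} = Q_R \, \widehat{K}^{-1} \, Q_L$ then requires only multiplications with $Q_L$ and $Q_R$ and one solve with $\widehat{K}$.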

Let and be nonsingular matrices such that

or equivalently,

where is a matrix approximating the identity matrix , and is a matrix approximating the identity matrix when it is positive definite and approximating when it is negative definite. For simplicity, in the sequel we will abbreviate the identity matrices and as , with their dimensions being inferred from the context.

Evidently, , and , can be considered as split preconditioners to the matrix blocks and , respectively, whose preconditioning properties can be measured by the degrees to which the matrices and approximate the identity matrix . There are many possible choices of the matrices , and , . For example, they may be the incomplete lower-upper triangular factors References 2 and 40, the incomplete orthogonal triangular factors Reference 8, the approximate inverse preconditioners Reference 40, the splitting iteration matrices References 2, 6, 7 and 27, the multigrid or algebraic multilevel approximations References 2, 3, 4, 5 and 12, or even technical combinations of the above-mentioned matrices, with respect to the matrix blocks and , respectively.
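
As a small illustration of the first of these choices, the following sketch (in Python with SciPy; the test block and the drop tolerance are arbitrary choices made here, not data from the paper) builds incomplete LU factors of a matrix block and measures how well they invert it:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
# a generic sparse nonsingular test block (1-D Laplacian stencil)
blk = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# incomplete LU factors of the block, usable as a split preconditioner pair
ilu = spla.spilu(blk, drop_tol=1e-3, fill_factor=10)

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
# with exact factors, ilu.solve(blk @ x) would reproduce x; the relative
# error below measures how closely the preconditioned block resembles I
err = np.linalg.norm(x - ilu.solve(blk @ x)) / np.linalg.norm(x)
print(f"relative preconditioning error: {err:.2e}")
```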

In particular, when is singular, besides the possible choices mentioned above, we may choose and according to the following cases:

(i)

If is a symmetric positive semidefinite matrix, we may let . Hence, is also symmetric positive semidefinite.

(ii)

If is a symmetric negative semidefinite matrix, we may let and (or and ). Hence, is symmetric positive semidefinite. Or we may let . Hence, is also symmetric negative semidefinite.

(iii)

If is a general singular matrix, we may let . Hence, is also singular.

To construct a high-quality structured preconditioner to the block two-by-two matrix , we introduce matrices

and

where denotes the zero matrix. Then from Equation 2.2 we have

where

Furthermore, we can find a unit lower triangular matrix and a unit upper triangular matrix of block two-by-two structures such that is block-diagonally dominant as far as possible and may also possess some other desired good properties.

In fact, if we let

then by concrete computations we obtain

with

and

with

and

We can now choose the matrices and such that either of the following two principles is satisfied as far as possible:

()

the matrix is block-diagonally dominant and symmetric;

()

the matrix is block-diagonally dominant and skew-symmetric.

This is because if the matrix satisfies either of the principles () and (), we can easily construct a good approximation to it, and hence, obtain a high-quality preconditioner to the original matrix .

According to both () and (), we can take and such that

Recalling that , we can let

Thus, for both cases, it follows from Equation 2.4 and Equation 2.5 that the matrices and have the following expressions:

Therefore, for these choices of the matrices and , we have

Because the nonsingularity of the matrix implies that the matrix and its Schur complement are nonsingular, and

and the Schur complement of is

we immediately know that when

both matrices and are nonsingular.

Now, if we let be a nonsingular “replacement” of the matrix , or in other words, a “replacement” to the matrix , then the matrix

is a natural preconditioner to the original matrix , and under the condition Equation 2.9 this preconditioner is well defined.

Note that here we use the term “replacement” rather than “approximation”. This is because we may sometimes choose the matrix not to be an approximation to in the usual sense, so that the resulting preconditioner and preconditioned matrix possess some desired properties such as positive definiteness and, hence, a specific Krylov subspace iteration method can fully exploit its efficiency.

If is used as a left preconditioner to , then

with

Therefore, the preconditioning property of to is determined by the properties of the matrices and . If is used as a right preconditioner to , then

with

Therefore, the preconditioning property of to is determined by the properties of the matrices and . In general, if the matrix admits a split form

then Equation 2.10 straightforwardly leads to a split preconditioner

to the original matrix . Because

we see that the preconditioning property of to is determined by the property of the matrix .

Evidently, the matrices , and are similar, and hence, they have exactly the same spectrum. However, the eigenvectors of these kinds of preconditioned matrices are usually quite different, which may lead to different performance results of the corresponding preconditioned Krylov subspace iteration methods.

In actual applications, when the matrix defined in Equation 2.10 is employed as a preconditioner to some Krylov subspace iteration method for solving the block two-by-two system of linear equations Equation 1.1, we need to solve a generalized residual equation of the form

at each iteration step, where is the current residual vector. By making use of the two-by-two block structure of , we can obtain the following practical procedure for computing the generalized residual vector .

Procedure for computing the generalized residual vector.

Let and , with and .

1.

Solve and to get and , and let .

2.

Solve to get , with .

3.

Solve and to get and .

When the approximation matrix to the matrix is specified, a concrete procedure for computing the generalized residual vector defined by Equation 2.18 can be straightforwardly obtained from this procedure.
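
As a concrete illustration, the following sketch (a minimal Python version; the block names B1, B2, G, H and the dense test data are placeholders chosen here, not the paper's notation) solves a generalized residual equation for a preconditioner stored in block LDU form:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def block_ldu_solve(B1_lu, B2_lu, G, H, r1, r2):
    """Solve M z = r for M = [[I,0],[G,I]] @ blkdiag(B1,B2) @ [[I,H],[0,I]]."""
    # forward substitution with the unit block lower triangular factor
    w1 = r1
    w2 = r2 - G @ w1
    # solves with the two diagonal blocks
    y1 = lu_solve(B1_lu, w1)
    y2 = lu_solve(B2_lu, w2)
    # back substitution with the unit block upper triangular factor
    z2 = y2
    z1 = y1 - H @ z2
    return z1, z2

rng = np.random.default_rng(1)
q, p = 8, 5
B1 = rng.standard_normal((q, q)) + q * np.eye(q)
B2 = rng.standard_normal((p, p)) + p * np.eye(p)
G, H = rng.standard_normal((p, q)), rng.standard_normal((q, p))
r1, r2 = rng.standard_normal(q), rng.standard_normal(p)
z1, z2 = block_ldu_solve(lu_factor(B1), lu_factor(B2), G, H, r1, r2)

# check against the explicitly assembled preconditioner
M = (np.block([[np.eye(q), np.zeros((q, p))], [G, np.eye(p)]])
     @ np.block([[B1, np.zeros((q, p))], [np.zeros((p, q)), B2]])
     @ np.block([[np.eye(q), H], [np.zeros((p, q)), np.eye(p)]]))
print(np.linalg.norm(M @ np.concatenate([z1, z2]) - np.concatenate([r1, r2])))
```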

Usually, the matrix may involve information about the matrices , , and . Therefore, to solve the linear system we may need to compute the vectors

These vectors can be economically computed by the following formulas:

1.

Solve .

2.

Solve , .

3.

Solve .

4.

Solve , .

3. Several practical structured preconditioners

In this section, we will construct three classes of structured approximations to the block two-by-two matrix , or in other words, to the block two-by-two matrix in Equation 2.7, by making use of the modified block Jacobi, the modified block Gauss-Seidel and the modified block unsymmetric Gauss-Seidel splittings of ; see References 6 and 7 for details. Correspondingly, three types of structured preconditioners to the original block two-by-two matrix , called the modified block Jacobi-type (MBJ-type) preconditioner, the modified block Gauss-Seidel-type (MBGS-type) preconditioner and the modified block unsymmetric Gauss-Seidel-type (MBUGS-type) preconditioner, are obtained.

To analyze the spectral property of the preconditioned matrices with respect to the above-mentioned preconditioners, we need the following two basic facts.

Lemma 3.1.

Let and be unit lower and upper triangular matrices of the block two-by-two forms

where and . Let

be a monotone increasing function with respect to in the interval . Then it follows that

Proof.

By direct computations we have

Without loss of generality, we assume . From Theorem 2.5.2 in Reference 27, page 70 we know that the matrix admits a singular value decomposition (SVD), i.e., there exist two orthogonal matrices and and a matrix , with being a nonnegative diagonal matrix having the maximum diagonal entry , such that holds. Define

Then is an orthogonal matrix, too. It follows from concrete computations that

Therefore, detailed analysis shows that the eigenvalues of the matrix are with multiplicity and

It then follows straightforwardly that the spectral radius of the matrix , say , is given by

and therefore,

The proof of the second equality can be demonstrated in a similar fashion.

We remark that for the real one-variable function defined by Equation 3.1, the estimate holds for all because of and .

Lemma 3.2.

Let be a diagonal matrix, and a given matrix, where represents the complex matrix space. If there exists a positive constant such that , then all eigenvalues of the matrix are located within , where denotes the circle having center and radius on the complex plane.

Proof.

Let be an eigenvalue of the matrix and be the corresponding normalized eigenvector. Then we have . Hence,

It then follows that . Therefore, (), or equivalently, .
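
A quick numerical illustration of this lemma (a sketch with arbitrary data; the diagonal entries are taken as ±1 here, matching how the lemma is used later): every eigenvalue of the perturbed matrix must lie within a distance equal to the spectral norm of the perturbation from some diagonal entry.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
d = np.where(rng.random(n) < 0.5, 1.0, -1.0)   # diagonal of D: entries +1 or -1
E = rng.standard_normal((n, n))
E *= 0.1 / np.linalg.norm(E, 2)                # scale so that ||E||_2 = 0.1
eps = np.linalg.norm(E, 2)

eigs = np.linalg.eigvals(np.diag(d) + E)
# distance from each eigenvalue to the nearest diagonal entry of D
dist = np.min(np.abs(eigs[:, None] - d[None, :]), axis=1)
print(f"max distance = {dist.max():.4f}, eps = {eps:.4f}, "
      f"lemma satisfied: {bool(np.all(dist <= eps + 1e-12))}")
```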

For the simplicity of our statements, in the sequel we always use to represent the function defined by Equation 3.1. For the matrices in Equation 2.1 and , in Equation 2.3, we write

and denote the -block entry of the matrix in Equation 2.7 by , i.e.,

Assume and are nonsingular, let be a nonsingular matrix that is a replacement to (e.g., or , etc.), and define the quantities

In addition, in the case that is an approximation to , we define the quantities

and in the case that is an approximation to , instead of and we use the quantities

For two positive constants and , to be specified later, we use to denote the circle having center and radius , and use to denote the union of the two circles having centers and and radius , on the complex plane, respectively.

By making use of the above notation, the nonsingularity of the matrices and can be precisely described by the following lemma.

Lemma 3.3.

The matrices and are nonsingular, provided either of the following conditions holds:

(1)

is an approximation to , and

(a)

, or

(b)

;

(2)

is an approximation to , and

(a)

, or

(b)

.

Proof.

We only prove (1a), as the other conclusions can be demonstrated analogously.

Because , it follows that

From Equation 2.8 we have

Hence,

It then follows that

Now, we easily see that Equation 2.9 holds when

or equivalently, . Therefore, when

the matrices and are nonsingular.

We first consider the case that . The case that will be discussed in Section 3.4.

3.1. The MBJ-type preconditioners

If the matrix in Equation 2.10 is taken to be the modified block Jacobi splitting matrix References 6 and 7 of the matrix in Equation 2.7, i.e.,

then we obtain the modified block Jacobi-type (MBJ-type) preconditioner to the original matrix . Note that when is symmetric positive definite, is a symmetric positive definite matrix, and when is symmetric negative definite, is a symmetric indefinite matrix.
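
In spirit, the MBJ-type preconditioner retains only the (approximate) diagonal blocks. A minimal sketch of the resulting preconditioning operator (generic names chosen here for illustration; not the paper's specific construction):

```python
import numpy as np
import scipy.sparse.linalg as spla

def mbj_operator(B1_solve, B2_solve, q, p):
    """Block Jacobi preconditioner M = blkdiag(B1, B2), given routines that
    apply (approximate) inverses of the two diagonal blocks."""
    def matvec(r):
        return np.concatenate([B1_solve(r[:q]), B2_solve(r[q:])])
    return spla.LinearOperator((q + p, q + p), matvec=matvec)
```

Such an operator can be passed directly as the M argument of scipy.sparse.linalg.gmres; see also the example in Section 3.5.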

The following theorem describes the eigenvalue distribution of the preconditioned matrix with respect to the MBJ-type preconditioner.

Theorem 3.1.

Let be the MBJ-type preconditioner to the block two-by-two matrix in Equation 1.2, where and are given by Equation 2.6, is given by Equation 2.7, and is defined by Equation 3.3. Let and . Then it follows that

(i)

, with ; and

(ii)

, with .

It follows from Lemma 3.2 as well as Equation 2.11 and Equation 2.13 that the eigenvalues of the matrices and are located within circles having center and radii and , respectively, and therefore, they are all within the circle .

Proof.

We only prove (i), as (ii) can be verified analogously.

From Equation 2.7 and Equation 3.3 we have

Hence,

By making use of Lemma 3.1 we can immediately obtain

Furthermore, when the matrix is positive definite, we can demonstrate the positive definiteness of the matrices and .

Theorem 3.2.

Let the matrix be positive definite. Then

(i)

the matrix is positive definite, provided , where

(ii)

the matrix is positive definite, provided , where

Proof.

We only prove the validity of (i), as (ii) can be demonstrated similarly.

Some straightforward computations immediately show that . Let

Then from the proof of Theorem 3.1 we easily obtain

Because is positive definite, we know that its symmetric part is symmetric positive definite. Therefore, the matrix is symmetric positive definite if and only if so is its Schur complement .

Since

we have

By direct computations we immediately obtain

and

It then follows that

Noticing that

holds if and only if

or equivalently,

we therefore know that holds true when . Hence, is a symmetric positive definite matrix, and is a positive definite matrix.

3.2. The MBGS-type preconditioners

If the matrix in Equation 2.10 is taken to be the modified block Gauss-Seidel splitting matrix References 6 and 7 of the matrix in Equation 2.7, i.e.,

then we obtain the modified block Gauss-Seidel-type (MBGS-type) preconditioner to the original matrix .
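
Analogously to the sketch in Section 3.1, a block lower triangular (Gauss-Seidel-like) preconditioning operator applies one forward block substitution per call (again with generic, illustrative names):

```python
import numpy as np
import scipy.sparse.linalg as spla

def mbgs_operator(B1_solve, B2_solve, F, q, p):
    """Block Gauss-Seidel preconditioner M = [[B1, 0], [F, B2]]:
    one forward block substitution solves M z = r."""
    def matvec(r):
        z1 = B1_solve(r[:q])
        z2 = B2_solve(r[q:] - F @ z1)
        return np.concatenate([z1, z2])
    return spla.LinearOperator((q + p, q + p), matvec=matvec)
```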

The following theorem describes the eigenvalue distribution of the preconditioned matrix with respect to the MBGS-type preconditioner.

Theorem 3.3.

Let be the MBGS-type preconditioner to the block two-by-two matrix in Equation 1.2, where and are given by Equation 2.6, is given by Equation 2.7, and is defined by Equation 3.4. Let and . Then it follows that

(i)

, with ; and

(ii)

, with .

It follows from Lemma 3.2 as well as Equation 2.11 and Equation 2.13 that the eigenvalues of the matrices and are located within circles having center and radii and , respectively, and therefore, they are all within the circle .

Proof.

We only prove (i), as (ii) can be verified analogously.

From Equation 2.7 and Equation 3.4 we have

Hence,

By making use of Lemma 3.1 we can immediately obtain

Furthermore, when the matrix is positive definite, we can demonstrate the positive definiteness of the matrices and .

Theorem 3.4.

Let the matrix be positive definite. Then

(i)

the matrix is positive definite, provided , where

(ii)

the matrix is positive definite, provided , where

Proof.

We only prove the validity of (i), as (ii) can be demonstrated similarly.

Some straightforward computations immediately show that . Let

Then from the proof of Theorem 3.3 we can easily obtain

Because is positive definite, we know that its symmetric part is symmetric positive definite. Therefore, the matrix is symmetric positive definite if and only if so is its Schur complement .

Since

we have

By direct computations we immediately get

and

It then follows that

Notice that

holds if and only if

and this inequality holds when

or equivalently,

Therefore, we know that holds true when . Hence, is a symmetric positive definite matrix, and is a positive definite matrix.

Alternatively, if the matrix in Equation 2.10 is taken to be the modified block Gauss-Seidel splitting matrix References 6 and 7 of the matrix in Equation 2.7, i.e.,

then we obtain another modified block Gauss-Seidel-type (MBGS-type) preconditioner to the original matrix . Exactly following the demonstrations of Theorems 3.3 and 3.4, we can obtain the following results for the eigenvalue distribution and the positive definiteness of the preconditioned matrix with respect to the MBGS-type preconditioner Equation 3.5.

Theorem 3.5.

Let be the MBGS-type preconditioner to the block two-by-two matrix in Equation 1.2, where and are given by Equation 2.6, is given by Equation 2.7, and is defined by Equation 3.5. Let and . Then it follows that

(i)

, with ; and

(ii)

, with .

It follows from Lemma 3.2 as well as Equation 2.11 and Equation 2.13 that the eigenvalues of the matrices and are located within circles having center and radii and , respectively, and therefore, they are all within the circle .

Theorem 3.6.

Let the matrix be positive definite. Then

(i)

the matrix is positive definite, provided , where

(ii)

the matrix is positive definite, provided , where

3.3. The MBUGS-type preconditioners

If the matrix in Equation 2.10 is taken to be the modified block unsymmetric Gauss-Seidel splitting matrix References 6 and 7 of the matrix in Equation 2.7, i.e.,

then we obtain the modified block unsymmetric Gauss-Seidel-type (MBUGS-type) preconditioner to the original matrix .
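
In the same illustrative notation as the sketches above, an unsymmetric block Gauss-Seidel preconditioner combines a forward and a backward block sweep:

```python
import numpy as np
import scipy.sparse.linalg as spla

def mbugs_operator(B1_solve, B2_solve, E, F, q, p):
    """Unsymmetric block Gauss-Seidel preconditioner
    M = [[B1, 0], [F, B2]] @ blkdiag(B1, B2)^{-1} @ [[B1, E], [0, B2]]."""
    def matvec(r):
        # forward sweep: solve [[B1, 0], [F, B2]] w = r
        w1 = B1_solve(r[:q])
        w2 = B2_solve(r[q:] - F @ w1)
        # backward sweep: solve [[B1, E], [0, B2]] z = blkdiag(B1, B2) w
        z2 = w2
        z1 = w1 - B1_solve(E @ z2)
        return np.concatenate([z1, z2])
    return spla.LinearOperator((q + p, q + p), matvec=matvec)
```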

The following theorem describes the eigenvalue distribution of the preconditioned matrix with respect to the MBUGS-type preconditioner.

Theorem 3.7.

Let be the MBUGS-type preconditioner to the block two-by-two matrix in Equation 1.2, where and are given by Equation 2.6, is given by Equation 2.7, and is defined by Equation 3.6. Let and . Then it follows that

(i)

, with

(ii)

, with

It follows from Lemma 3.2 as well as Equation 2.11 and Equation 2.13 that the eigenvalues of the matrices and are located within circles having center and radii and , respectively, and therefore, they are all within the circle .

Proof.

It is analogous to the proofs of Theorems 3.1 and 3.3 and hence is omitted.

Furthermore, when the matrix is positive definite, we can demonstrate the positive definiteness of the matrices and .

Theorem 3.8.

Let the matrix be positive definite. Then

(i)

the matrix is positive definite, provided , where

(ii)

the matrix is positive definite, provided , where

Proof.

It is analogous to the proofs of Theorems 3.2 and 3.4 and hence is omitted.

Alternatively, if the matrix defined by Equation 3.6 is considered to possess the split form , with

or

then we can obtain other modified block unsymmetric Gauss-Seidel-type preconditioners to the original matrix , where

and and are given by Equation 2.6. Exactly following the demonstrations of Theorems 3.7 and 3.8, we can obtain the corresponding results on the eigenvalue distributions and the positive definiteness of the preconditioned matrices with respect to the MBUGS-type preconditioners Equations 3.7 and 3.8.

We remark that when , the above-discussed modified block unsymmetric Gauss-Seidel-type preconditioners naturally reduce to the modified block symmetric Gauss-Seidel-type (MBSGS-type) preconditioners to the matrix in Equation 1.2, correspondingly.

3.4. The case

In the case that is negative definite, we may let be an approximation to in order to obtain a preconditioner that is positive definite in nature. Hence, a specific preconditioned Krylov subspace iteration method can fully exploit its efficiency.

When , for the MBJ-, the MBGS-, and the MBUGS-type preconditioners discussed above, we can show that the eigenvalues of the preconditioned matrices are, correspondingly, located within two circles having centers and in the complex plane. These results are summarized precisely in the following theorem. Since the proofs are essentially the same as those of Theorems 3.1, 3.3, 3.5 and 3.7, with only the identity matrix replaced by the matrix

we only state the theorem but omit its proof.

Theorem 3.9.

Let in Equation 2.10 be the preconditioner to the block two-by-two matrix in Equation 1.2, with and being given by Equation 2.6 and being given by Equation 2.7. Let and

(i)

If is defined by Equation 3.3, then

where and are the same as in Theorem 3.1.

(ii)

If is defined by Equation 3.4, then

where and are the same as in Theorem 3.3.

(iii)

If is defined by Equation 3.5, then

where and are the same as in Theorem 3.5.

(iv)

If is defined by Equation 3.6, then

where and are the same as in Theorem 3.7.

It follows from Lemma 3.2 as well as Equation 2.11 and Equation 2.13 that the eigenvalues of the preconditioned matrix are located within the union of two circles having centers and and radius , and those of the preconditioned matrix are located within the union of two circles having centers and and radius , respectively. Therefore, they are all within . Here, , and .

We observe from the demonstrations of Theorems 3.1-3.9 that when or , the results in these theorems can be considerably improved and made more accurate.

3.5. Connections to Krylov subspace methods

The preconditioning matrix defined in Equation 2.10 can be used to accelerate Krylov subspace methods such as GMRES or its restarted variant GMRES() References 40 and 41 for solving the large sparse system of linear equations Equations 1.1 and 1.2. This preconditioning matrix can be used as a left (see Equations 2.11 and 2.12), a right (see Equations 2.13 and 2.14), or a split (see Equations 2.15-2.17) preconditioner for the system of linear equations Equation 1.1. The resulting equivalent linear systems can then be solved by GMRES or GMRES().
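
A minimal end-to-end sketch with SciPy (the test matrix, block sizes, and the block-diagonal choice of preconditioner are arbitrary illustrations made here, not the paper's data):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(3)
q = p = 200
B1 = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(q, q), format="csc")
B2 = sp.diags(np.full(p, 2.0), 0, format="csc")
E = sp.random(q, p, density=0.01, random_state=3, format="csc")
A = sp.bmat([[B1, E], [-E.T, B2]], format="csc")   # block two-by-two test matrix
b = rng.standard_normal(q + p)

# block-diagonal preconditioner built from exact sparse LU solves of the blocks
lu1, lu2 = spla.splu(B1), spla.splu(B2)
M = spla.LinearOperator(
    (q + p, q + p),
    matvec=lambda r: np.concatenate([lu1.solve(r[:q]), lu2.solve(r[q:])]))

x, info = spla.gmres(A, b, M=M, restart=20)   # restarted, preconditioned GMRES
print(info, np.linalg.norm(A @ x - b))
```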

Assume that the coefficient matrices of the above preconditioned linear systems are diagonalizable; i.e., for any one of them, say $\widetilde{A}$, there exist a nonsingular matrix $X$ and a diagonal matrix $\Lambda$ such that $\widetilde{A} = X \Lambda X^{-1}$. Then it is well known from Reference 41, Theorem 4 that the residual norm at the $k$-th step of the preconditioned GMRES is bounded by

$$\| r_k \|_2 \le \kappa_2(X) \min_{p \in \pi_k} \max_{\lambda_j \in \sigma(\widetilde{A})} |p(\lambda_j)| \, \| r_0 \|_2,$$

where $\kappa_2(X)$ is the Euclidean condition number of $X$. Here, $\pi_k$ denotes the set of all polynomials $p$ of degree not greater than $k$ such that $p(0) = 1$, and $\sigma(\widetilde{A})$ denotes the spectrum of the matrix $\widetilde{A}$.

Consider defined by Equation 3.2; see also Equation 2.1 and Equation 2.3. When the matrix is an approximation to the matrix , from Theorems 3.1, 3.3, 3.5 and 3.7 we know that all eigenvalues of the matrix are contained in one of the circles , , and . Therefore, when , a special case of Theorem 5 in Reference 41 implies that , , and .

Alternatively, the preconditioning matrix can also be used as a left, a right, or a split preconditioner for the system of linear equations Equation 1.1, yielding a preconditioned linear system with coefficient matrix , , or , respectively. Because Theorems 3.2, 3.4, 3.6 and 3.8 guarantee the positive definiteness of the preconditioned matrix , it is known from Reference 20 and Reference 41, p. 866 that the preconditioned GMRES satisfies an error bound of the classical form

$$\| r_k \|_2 \le \left( 1 - \frac{\lambda_{\min}(H)^2}{\lambda_{\max}(\widetilde{A}^T \widetilde{A})} \right)^{k/2} \| r_0 \|_2, \qquad H = \tfrac{1}{2}\,(\widetilde{A} + \widetilde{A}^T),$$

where $\widetilde{A}$ stands for the preconditioned coefficient matrix, $H$ denotes its symmetric part, and $\lambda_{\min}$ and $\lambda_{\max}$ denote, respectively, the smallest and the largest eigenvalues of the corresponding matrix. Since $\lambda_{\min}(H) > 0$ for a positive definite $\widetilde{A}$, this guarantees the convergence of the restarted preconditioned GMRES iteration, say PGMRES($m$), for all restart values $m$, when the coefficient matrix is positive definite.

When the matrix is an approximation to the matrix , the preconditioned matrix or is usually not positive definite; hence, instead of GMRES and GMRES(), we may use other Krylov subspace methods such as BiCGSTAB, QMR and TFQMR to solve the preconditioned linear systems. In particular, when the original coefficient matrix is symmetric indefinite, MINRES is a possible candidate if a symmetric positive definite or indefinite preconditioner is available. See References 2, 27 and 40.

4. Applications to three typical matrices

In this section, we will investigate the concretizations of the structured preconditioners established in Sections 2 and 3 to three special classes of matrices arising from real-world applications.

4.1. The symmetric positive definite matrix

When the matrix blocks and are symmetric positive definite, and the Schur complement is symmetric positive definite, the matrix reduces to the block two-by-two symmetric positive definite matrix

These kinds of matrices may arise from the red/black ordering of a symmetric positive definite linear system, or from the discretization, incorporating a domain decomposition technique, of a boundary value problem for a self-adjoint elliptic partial differential equation, etc. See References 2, 3, 6, 7, 27 and 40.

Let and be nonsingular matrices such that either Equation 2.1 or Equation 2.2 holds with and . Then from Equation 2.10 and Equation 2.6 we know that is the structured preconditioner to the matrix , where

and is an approximation to the matrix

defined by Equation 2.7, with and .

Note that and are symmetric positive definite. Let be an approximation to the matrix . To guarantee the symmetric positive definiteness of the preconditioning matrix , we can choose to be the modified block Jacobi splitting matrix in Equation 3.3 or the modified block symmetric Gauss-Seidel splitting matrix in Equation 3.6, obtaining the modified block Jacobi-type preconditioner or the modified block symmetric Gauss-Seidel-type preconditioner to the matrix , respectively.

4.2. The saddle point matrix

When the matrix block is symmetric positive definite, and is of full row rank, the matrix reduces to the saddle point matrices

These kinds of matrices may arise in constrained optimization as well as in least-squares, saddle-point and Stokes problems without a regularizing/stabilizing term, etc. See References 14, 16, 24, 25, 28, 37 and 44.
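
For concreteness, a tiny assembly sketch of such a matrix (a random symmetric positive definite leading block and a random full-row-rank constraint block; purely illustrative data):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(4)
q, p = 40, 10                                   # leading block q x q, p constraints
G = rng.standard_normal((q, q))
B1 = sp.csc_matrix(G @ G.T + q * np.eye(q))     # symmetric positive definite block
F = sp.csc_matrix(rng.standard_normal((p, q)))  # full row rank with probability 1

A_saddle = sp.bmat([[B1, F.T], [F, None]], format="csc")  # zero (2,2) block
print(A_saddle.shape)
```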

Let be a nonsingular matrix such that either Equation 2.1 or Equation 2.2 holds with and . Then from Equation 2.10 and Equation 2.6 we know that are the preconditioners to the matrices , respectively, where

and are approximations to the matrices

defined by Equation 2.7, with and .

Let be approximations to the matrices . By choosing the matrices to be the modified block Jacobi splitting matrices in Equation 3.3, the modified block Gauss-Seidel splitting matrices in Equation 3.4 or Equation 3.5, or the modified block unsymmetric Gauss-Seidel splitting matrices in Equation 3.6, we can obtain the modified block Jacobi-type preconditioners, the modified block Gauss-Seidel-type preconditioners, or the modified block unsymmetric Gauss-Seidel-type preconditioners to the matrices , respectively.

4.3. The Hamiltonian matrix

When the matrix block is symmetric positive definite and is symmetric positive/negative definite (denoted by , respectively), and , the matrix reduces to the Hamiltonian matrices

These kinds of matrices may arise in stationary semiconductor devices References 36, 42 and 43, and in constrained optimization as well as in least-squares, saddle-point and Stokes problems with a regularizing/stabilizing term Reference 28.

Let and be nonsingular matrices such that either Equation 2.1 or Equation 2.2 holds with and . Then from Equation 2.10 and Equation 2.6 we know that are the preconditioners to the matrices , where

and are approximations to the matrices

defined by Equation 2.7, with and .

Let be approximations to the matrices . By choosing the matrices to be the modified block Jacobi splitting matrices in Equation 3.3, the modified block Gauss-Seidel splitting matrices in Equation 3.4 or Equation 3.5, or the modified block unsymmetric Gauss-Seidel splitting matrices in Equation 3.6, we can obtain the modified block Jacobi-type preconditioners, the modified block Gauss-Seidel-type preconditioners, or the modified block unsymmetric Gauss-Seidel-type preconditioners to the matrices , respectively.

4.4. An illustrative example

Let us consider the electromagnetic scattering problem for a large rectangular cavity on the -plane in which the medium is inhomogeneous in the -direction. In the transverse magnetic polarization case, when the model Helmholtz equation with positive wave number is discretized by the five-point finite difference scheme with uniform stepsize , we obtain a block two-by-two system of linear equations Equations 1.1 and 1.2, in which

and , where , , is a real constant, is the -th unit vector in , is the -by- identity matrix, is a tridiagonal matrix, is a nonnegative diagonal matrix, , and denotes the Kronecker product. See References 1 and 33.

Concretely, in our computations we take , , and .
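
A sketch of the type of matrix block such a discretization produces (a generic five-point Helmholtz stencil assembled with Kronecker products; the grid size, wave number and homogeneous medium below are placeholder assumptions, not the data used in the paper):

```python
import numpy as np
import scipy.sparse as sp

m = 32                       # interior grid points per direction
h = 1.0 / (m + 1)            # uniform stepsize
kappa = 8.0                  # placeholder wave number

# standard 1-D second-difference tridiagonal matrix
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
I = sp.identity(m)

# five-point discrete Laplacian minus the kappa^2 h^2 term (here the medium
# is taken homogeneous, so the diagonal matrix is a scalar multiple of I)
D = (kappa * h) ** 2 * sp.identity(m * m)
B = sp.kron(I, T) + sp.kron(T, I) - D
print(B.shape)
```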

Let be an incomplete triangular factorization of the matrix block , and . Then we have

Now, by choosing with being the band matrix of half-band width truncated from the matrix , after straightforward computations we can obtain the results listed in Tables 1, 2, 3, and 4 for the discretization stepsizes , , and , or equivalently, for the problem sizes , , and , respectively.

In Table 1 we list the half-band width , the quantities

with respect to the matrix norms, and

with respect to the matrix approximation accuracies. For , and , in Tables 2, 3, and 4 we list the radii and of the circles centered at within which all eigenvalues of the matrices and are located, the radii of the smallest circles that include all eigenvalues of the corresponding preconditioned matrices (see Theorems 3.1, 3.3 and 3.7), and the quantities and that guarantee the positive definiteness of the preconditioned matrices and whenever and (see Theorems 3.2, 3.4 and 3.8), respectively.

The results in Tables 2, 3, and 4 clearly show that

(i)

for , and , and . It follows that . As and are quite small, the eigenvalues of the preconditioned matrices, with respect to the MBJ-, the MBGS- and the MBUGS-type preconditioners, are tightly clustered around the point ; see Theorems 3.1, 3.3 and 3.7. Hence, a Krylov subspace method such as GMRES, when applied to the preconditioned systems of linear equations, will achieve fast convergence; see Section 3.5.

(ii)

for , and , and . It follows that the preconditioned matrices, with respect to the MBJ-, the MBGS- and the MBUGS-type preconditioners, are positive definite, and the convergence of the restarted GMRES methods preconditioned by these preconditioners is guaranteed; see Theorems 3.2, 3.4 and 3.8 as well as Section 3.5.

(iii)

for , and , . This shows that the eigenvalues of the preconditioned matrices, with respect to the MBJ-, the MBGS- and the MBUGS-type preconditioners, are indeed located within the theoretically estimated circles centered at with the radii given in Theorems 3.1, 3.3 and 3.7, respectively.

In summary, this example shows that the conditions of our theorems are reasonable and their conclusions correct.

5. Conclusion and remarks

We have established a general framework of practical and efficient structured preconditioners to the large sparse block two-by-two nonsingular matrices. For several special cases associated with the modified block relaxation iteration methods, we have studied the eigenvalue distributions and the positive definiteness of the preconditioned matrices. Theoretical analyses have shown that this preconditioning technique can afford effective and high-quality preconditioners to the Krylov subspace iteration methods for solving large sparse systems of linear equations with block two-by-two coefficient matrices.

We remark that our preconditioning technique and the corresponding theory extend straightforwardly to the following cases.

(a)

The approximation matrix in Equation 2.10 that is generated by a multi-step variant of the modified block Jacobi, the modified block Gauss-Seidel or the modified block unsymmetric Gauss-Seidel splitting matrix of the matrix in Equation 2.7; see References 6 and 7.

(b)

Alternatively, the approximation matrix in Equation 2.10 that is generated by a single- or multi-step variant of the modified block successive overrelaxation (SOR), the modified block unsymmetric SOR, the modified block accelerated overrelaxation (AOR) or the modified block unsymmetric AOR splitting matrix of the matrix in Equation 2.7; see References 6, 7 and 32.

(c)

More generally, the approximation matrix in Equation 2.10 that is generated by any suitable direct or iterative method induced by the matrix in Equation 2.7.

(d)

The matrix that is of a general -by- block structure. More concretely, , where , , and are positive integers satisfying .

For the structured preconditioners based on relaxation iteration methods involving parameters, we can further improve them through optimal choices of the parameters. In addition, we point out that, although all results in this paper are demonstrated in the -norm, they hold trivially for other consistent matrix norms such as the -norm and the -norm.

Acknowledgments

The author is very much indebted to the referees for their constructive and valuable comments and suggestions which greatly improved the original version of this paper.

Figures

Table 1. Quantities with respect to the preconditioned matrices

half-band width    2          4          6          30
                   11.9704    15.8432    25.3679    28.8844
                   6.05339    7.38068    8.95453    14.7829
                   5.77108    6.57523    11.0775    39.2410
                   2.82323    3.28938    3.93560    19.5354

Table 2. Bounds with respect to the MBJ-type preconditioner with being defined by Equation 3.3

0.111566    8.62e-02    6.78e-02    3.26e-02
0.152507    0.108497    7.98e-02    2.70e-02
0.111566    8.62e-02    6.78e-02    2.70e-02

Table 3. Bounds with respect to the MBGS-type preconditioner with being defined by Equation 3.4

Table 4. Bounds with respect to the MBUGS-type preconditioner with being defined by Equation 3.6

0.102518    0.166135
0.148479    0.106365    0.216704
0.102518    0.166135


References

Reference [1]
H. Ammari, G. Bao and A.W. Wood, An integral equation method for the electromagnetic scattering from cavities, Math. Methods Appl. Sci., 23(2000), 1057-1072. MR 1773922 (2001g:78013)
Reference [2]
O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, 1994. MR 1276069 (95f:65005)
Reference [3]
Z.-Z. Bai, Parallel Iterative Methods for Large-Scale Systems of Algebraic Equations, Ph.D. Thesis of Shanghai University of Science and Technology, Shanghai, June 1993. (In Chinese)
Reference [4]
Z.-Z. Bai, A class of hybrid algebraic multilevel preconditioning methods, Appl. Numer. Math., 19(1996), 389-399. MR 1377785 (96j:65116)
Reference [5]
Z.-Z. Bai, Parallel hybrid algebraic multilevel iterative methods, Linear Algebra Appl., 267(1997), 281-315.MR 1479124 (99c:65081)
Reference [6]
Z.-Z. Bai, A class of modified block SSOR preconditioners for symmetric positive definite systems of linear equations, Adv. Comput. Math., 10(1999), 169-186.MR 1680610 (2000c:65020)
Reference [7]
Z.-Z. Bai, Modified block SSOR preconditioners for symmetric positive definite linear systems, Ann. Operations Research, 103(2001), 263-282.MR 1868455 (2002k:65046)
Reference [8]
Z.-Z. Bai, I.S. Duff and A.J. Wathen, A class of incomplete orthogonal factorization methods. I: methods and theories, BIT, 41(2001), 53-70. MR 1829661 (2002b:65040)
Reference [9]
Z.-Z. Bai and G.-Q. Li, Restrictively preconditioned conjugate gradient methods for systems of linear equations, IMA J. Numer. Anal., 23(2003), 561-580. MR 2011340 (2004i:65025)
Reference [10]
Z.-Z. Bai and M.K. Ng, On inexact preconditioners for nonsymmetric matrices, SIAM J. Sci. Comput., 26(2005), 1710-1724.
Reference [11]
Z.-Z. Bai and Z.-Q. Wang, Restrictive preconditioners for conjugate gradient methods for symmetric positive definite linear systems, J. Comput. Appl. Math., 187(2006), 202-226.
Reference [12]
Z.-Z. Bai and D.-R. Wang, A class of new hybrid algebraic multilevel preconditioning methods, Linear Algebra Appl., 260(1997), 223-255. MR 1448358 (98i:65035)
Reference [13]
J.T. Betts, Practical Methods for Optimal Control Using Nonlinear Programming, SIAM, Philadelphia, PA, 2001. MR 1826768 (2002e:49001)
Reference [14]
A. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996. MR 1386889 (97g:65004)
Reference [15]
J.H. Bramble, J.E. Pasciak and A.T. Vassilev, Analysis of the inexact Uzawa algorithm for saddle point problems, SIAM J. Numer. Anal., 34(1997), 1072-1092. MR 1451114 (98c:65182)
Reference [16]
F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, New York, 1991.MR 1115205 (92d:65187)
Reference [17]
I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott and K. Turner, The factorization of sparse symmetric indefinite matrices, IMA J. Numer. Anal., 11(1991), 181-204. MR 1105227 (92a:65143)
Reference [18]
I.S. Duff and J.K. Reid, Exploiting zeros on the diagonal in the direct solution of indefinite sparse symmetric linear systems, ACM Trans. Math. Software, 22(1996), 227-257. MR 1408491 (97c:65085)
Reference [19]
N. Dyn and W.E. Ferguson, Jr., The numerical solution of equality constrained quadratic programming problems, Math. Comput., 41(1983), 165-170. MR 0701631 (85b:90051)
Reference [20]
S.C. Eisenstat, H.C. Elman and M.H. Schultz, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20(1983), 345-357.MR 0694523 (84h:65030)
Reference [21]
H.C. Elman, Preconditioners for saddle point problems arising in computational fluid dynamics, Appl. Numer. Math., 43(2002), 75-89. MR 1936103
Reference [22]
H.C. Elman and G.H. Golub, Inexact and preconditioned Uzawa algorithms for saddle point problems, SIAM J. Numer. Anal., 31(1994), 1645-1661. MR 1302679 (95f:65065)
Reference [23]
H.C. Elman, D.J. Silvester and A.J. Wathen, Performance and analysis of saddle point preconditioners for the discrete steady-state Navier-Stokes equations, Numer. Math., 90(2002), 665-688.MR 1888834 (2002m:76071)
Reference [24]
B. Fischer, R. Ramage, D.J. Silvester and A.J. Wathen, Minimum residual methods for augmented systems, BIT, 38(1998), 527-543.MR 1652781 (99i:65031)
Reference [25]
P.E. Gill, W. Murray and M.H. Wright, Practical Optimization, Academic Press, New York, NY, 1981.MR 0634376 (83d:65195)
Reference [26]
R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer-Verlag, New York, 1984.MR 0737005 (86c:65004)
Reference [27]
G.H. Golub and C.F. Van Loan, Matrix Computations, 3rd Edition, The Johns Hopkins University Press, Baltimore and London, 1996.MR 1417720 (97g:65006)
Reference [28]
G.H. Golub and A.J. Wathen, An iteration for indefinite systems and its application to the Navier-Stokes equations, SIAM J. Sci. Comput., 19(1998), 530-539.MR 1618828 (99d:65107)
Reference [29]
G.H. Golub, X. Wu and J.-Y. Yuan, SOR-like methods for augmented systems, BIT, 41(2001), 71-85.MR 1829662 (2002a:65055)
Reference [30]
N.I.M. Gould, M.E. Hribar and J. Nocedal, On the solution of equality constrained quadratic programming problems arising in optimization, SIAM J. Sci. Comput., 23(2001), 1375-1394.MR 1885606 (2002k:90072)
Reference [31]
E. Haber, U.M. Ascher and D. Oldenburg, On optimization techniques for solving nonlinear inverse problems, Inverse Problems, 16(2000), 1263-1280.MR 1798355 (2001j:78025)
Reference [32]
A. Hadjidimos, Accelerated overrelaxation method, Math. Comput., 32(1978), 149-157.MR 0483340 (58:3353)
Reference [33]
J. Jin, The Finite Element Method in Electromagnetics, John Wiley & Sons, New York, NY, 1993.MR 1903357 (2004b:78019)
Reference [34]
C. Keller, N.I.M. Gould and A.J. Wathen, Constrained preconditioning for indefinite linear systems, SIAM J. Matrix Anal. Appl., 21(2000), 1300-1317.MR 1780274 (2002b:65050)
Reference [35]
A. Klawonn, Block-triangular preconditioners for saddle point problems with a penalty term, SIAM J. Sci. Comput., 19(1998), 172-184.MR 1616885 (99f:65051)
Reference [36]
P.A. Markowich, The Stationary Semiconductor Device Equations, Springer-Verlag, New York, 1986.MR 0821965 (87b:78042)
Reference [37]
M.F. Murphy, G.H. Golub and A.J. Wathen, A note on preconditioning for indefinite linear systems, SIAM J. Sci. Comput., 21(2000), 1969-1972.MR 1762024 (2001a:65055)
Reference [38]
I. Perugia and V. Simoncini, Block-diagonal and indefinite symmetric preconditioners for mixed finite element formulations, Numer. Linear Algebra Appl., 7(2000), 585-616.MR 1802361 (2001j:65187)
Reference [39]
R.J. Plemmons, A parallel block iterative method applied to computations in structural analysis, SIAM J. Alg. Disc. Meth., 7(1986), 337-347.MR 0844035 (88h:49058)
Reference [40]
Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd Edition, SIAM, Philadelphia, PA, 2003. MR 1990645 (2004h:65002)
Reference [41]
Y. Saad and M.H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7(1986), 856-869.MR 0848568 (87g:65064)
Reference [42]
G.E. Sartoris, A 3D rectangular mixed finite element method to solve the stationary semiconductor equations, SIAM J. Sci. Stat. Comput., 19(1998), 387-403.MR 1618875 (99b:65147)
Reference [43]
S. Selberherr, Analysis and Simulation of Semiconductor Devices, Springer-Verlag, New York, 1984.
Reference [44]
G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, MA, 1986.MR 0870634 (88a:00006)
Reference [45]
W. Zulehner, Analysis of iterative methods for saddle point problems: a unified approach, Math. Comput., 71(2001), 479-505. MR 1885611 (2003f:65183)

Article Information

MSC 2000
Primary: 65F10 (Iterative methods for linear systems), 65F50 (Sparse matrices)
Keywords
  • Block two-by-two matrix
  • preconditioner
  • modified block relaxation iteration
  • eigenvalue distribution
  • positive definiteness
Author Information
Zhong-Zhi Bai
State Key Laboratory of Scientific/Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and System Sciences, Chinese Academy of Sciences, P.O. Box 2719, Beijing 100080, People’s Republic of China
bzz@lsec.cc.ac.cn
Additional Notes

Subsidized by The Special Funds For Major State Basic Research Projects (No. G1999032803), The National Basic Research Program (No. 2005CB321702), and The National Natural Science Foundation (No. 10471146), P.R. China.

Journal Information
Mathematics of Computation, Volume 75, Issue 254, ISSN 1088-6842, published by the American Mathematical Society, Providence, Rhode Island.
Publication History
This article was received on , revised on , and published on .
Copyright Information
Copyright 2005 American Mathematical Society
Article References
  • DOI 10.1090/S0025-5718-05-01801-6
  • MathSciNet Review: 2196992
