# Direct methods for sparse matrix solution

Post-publication activity

Curator: Iain Duff

Direct methods for sparse matrix solutions are characterized by using a matrix factorization to solve a set of equations of the form

$$Ax=b$$

where $$b$$ is a given vector, $$x$$ is the vector of unknowns and $$A$$ is a given sparse matrix representing the coefficients of unknowns in each equation. In contrast to iterative methods, direct methods obtain the solution to the above system in a finite and fixed number of steps.

## Definition

In direct methods, the matrix $$A$$ is factorized into a product of other sparse matrices. Systems of equations with the factor matrices as a coefficient matrix are very easy to solve.

## Factorization or decomposition

Depending on the properties of the matrix $$A$$, different factorizations are used:

• For an $$n\times n$$ symmetric positive definite matrix, the Cholesky factorization $$A=LL^T$$ is usually computed, where $$L$$ is a lower triangular (sparse) matrix. Often an $$LDL^T$$ decomposition is used to avoid taking square roots and to avoid problems with matrices that are nearly singular.
• For a $$n\times n$$ unsymmetric matrix $$A$$, its LU decomposition $$A = LU$$ is computed where $$L$$ is a unit lower triangular matrix, and $$U$$ is an upper triangular matrix.
• For an $$n\times n$$ symmetric indefinite matrix, its $$LDL^T$$ decomposition $$A = LDL^T$$ is computed where $$D$$ is a block diagonal matrix (with blocks of order 1 or 2), and $$L$$ is a unit lower triangular matrix.
• For an $$m\times n$$ rectangular matrix, with $$m\ge n$$, its QR decomposition $$A=QR$$ is computed. Here, $$Q$$ is an $$m\times m$$ orthogonal matrix, and $$R$$ is an $$m\times n$$ upper trapezoidal matrix.

The linear systems with $$L$$, $$U$$, $$D$$, and $$Q$$, as defined above, are easy to solve. Those with $$L$$ are solved by forward substitution, those with $$U$$ are solved by back substitution, those with $$D$$ are solved by multiplying by the explicit inverse of the $$1\times 1$$ and $$2\times 2$$ blocks, and those with $$Q$$ are solved by multiplying the right-hand side by $$Q^T$$.

The factorizations discussed above and their use in solving the systems are mathematically equivalent to their dense counterparts. The added difficulty in the sparse case is that the factorization should try to avoid operating on the zeros of the matrix and should keep the factors as sparse as possible. Meeting these requirements usually entails first performing a symbolic analysis on the matrix in order to predict and reduce the memory and run time requirements for the subsequent numerical factorization.

The symbolic analysis can usually be viewed using the graph models for sparse matrices. Below some tasks to be performed during the symbolic analysis are discussed.

### Cholesky factorization

Consider the sparse symmetric positive definite matrix

$$A = \left [ \begin{array}{rrrr} 4 & -1 & -1 & -1 \\ -1 & 2 & & \\ -1 & & 2 & \\ -1 & & & 2\end{array}\right]$$

and its Cholesky factor $$L_A$$ is given by (up to four units of accuracy)

$$L_A = \left [ \begin{array}{rrrr} 2.0000 & & &\\ -0.5000 & 1.3229 & & \\ -0.5000 & -0.1890 & 1.3093 & \\ -0.5000 & -0.1890 & -0.2182 & 1.2910\end{array}\right ].$$

As is seen, the lower triangular matrix $$L_A$$ is full. The entries of $$L_A$$ that were zero in $$A$$ are called fill-in. In the matrix $$L_A$$, the $$(3,2)$$, $$(4,2)$$, and $$(4,3)$$ entries are fill-ins.

Consider the permutation matrix

$$P = \left [ \begin{array}{rrrr} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0\end{array}\right ] .$$

If $$A$$ is multiplied by $$P$$ from the left and by $$P^T$$ from the right, the symmetric positive definite matrix $$B = PAP^T$$ is obtained, viz.

$$B = \left [\begin{array}{rrrr} 2 & & & -1 \\ & 2 & & -1 \\ & & 2 & -1\\ -1 & -1 & -1 & 4\end{array}\right ]$$

with its Cholesky factor $$L_B$$

$$L_B = \left [ \begin{array}{rrrr} 1.4142 & & &\\ & 1.4142 & & \\ & & 1.4142 &\\ -0.7071 & -0.7071 & -0.7071 & 1.5811\end{array}\right ]$$

having no fill-ins.

The factorization of $$B = L_BL_B^T$$ is clearly preferable to that of $$A = L_AL_A^T$$. The solution $$x$$ of $$Ax=b$$ can be computed by first finding the solution $$y$$ of the linear system $$By=c$$ with $$c = Pb$$, and then by setting $$x=P^Ty$$. Since there are fewer entries in $$L_B$$ than in $$L_A$$, the time and memory requirements will be less for the second alternative. Matrices of the form of $$A$$ are called arrowhead matrices and clearly if the dimension is very large there will be a very significant difference in costs between working with the form of $$A$$ (first row and column dense) and working with the form of $$B$$ (last row and column dense).

Methods used to find a permutation matrix $$P$$ to reduce fill-in so that a factorization of the matrix $$PAP^T$$ has much less fill-in and requires fewer operations during the Cholesky factorization are called ordering methods. Ordering methods normally employ graph theoretical tools. There is a correspondence between Cholesky factorization and an elimination process in the undirected graph associated with $$A$$. In this model, there is a vertex for each row/column of $$A$$, and there is an edge for each off-diagonal nonzero entry $$a_{ij}=a_{ji}\neq 0$$. One step of the factorization, where a pivot $$a'_{ii}$$, for $$i=1, ..., n-1$$ is used to reduce all entries below the $$i$$th diagonal entry to zero, corresponds to making the neighbours of vertex $$i$$ a clique, and then removing the vertex $$i$$ as described by Parter (1961) and Rose (1970). Using this equivalence, ordering methods can be viewed as ordering the vertices of the graph $$G$$ so that the same ordering determines the pivotal sequence for the matrix factorization. Minimizing the fill-in in Cholesky factorization has been shown to be NP-complete by Yannakakis (1981). Fortunately, there are many heuristics to get an ordering for reducing the fill-in significantly, including the minimum degree (Tinney and Walker, 1967) and the nested dissection (George, 1973) heuristics and their variants.

Other symbolic analysis operations include obtaining main characteristics of the pattern of the Cholesky factor (such as the number of nonzeros for each row and column), and determining the pattern of the Cholesky factor using the elimination tree  (Schreiber, 1982).

### LU decomposition

In the case of Cholesky factorization, the symbolic and numerical concerns are nicely separated. When performing an LU decomposition on the other hand, one may need to perform pivoting to maintain numerical stability. Although this means that the symbolic analysis cannot give as accurate a prediction as in the Cholesky case, there are effective symbolic analysis methods for the LU decomposition.

Consider the LU decomposition of $$A$$ with partial pivoting, i.e., $$PA=LU$$, where $$P$$ is a permutation matrix describing the numerical pivoting. Consider a symbolic analysis for the Cholesky factorization on the matrix $$B=A^TA$$, where $$B$$ is ordered as $$Q^TBQ$$ for fill reduction. The pattern of the Cholesky factor $$L_B$$ of $$Q^TBQ$$ will be a superset of the $$L$$ factor of $$AQ$$ for any permutation matrix $$P$$. Therefore, the symbolic analysis can be performed on $$A^TA$$ (either with or without forming the product). In general, the symbolic analysis on $$B$$ will greatly overestimate the requirement for $$A$$.

An alternative to the above approach is to perform a symbolic analysis on the symmetric matrix $$B= |A| + |A^T|$$. This time, row pivoting operations during numerical factorization may greatly perturb the ordering found and hence the symbolic analysis will not be a good forecast of the work and storage involved in the subsequent factorization. If it is possible to choose the pivots from the diagonal during the numerical factorization then it should be possible to keep closer to the forecast of the symbolic phase. In order to encourage such an outcome, the matrix $$A$$ can be permuted at the beginning of the symbolic analysis phase to have large diagonal entries, i.e., a matrix $$C =|AQ|+|AQ^T|$$ is found whose diagonal entries are large. A permutation $$Q$$ that puts large entries into the diagonal of a matrix can be found by using a variant of the maximum weighted bipartite matching algorithm on the bipartite graph model of $$A$$ as shown by Duff and Koster (2001).

### $$LDL^T$$ decomposition

In order to avoid taking square roots in Cholesky factorization, a symmetric positive definite matrix can be factored as $$A=LDL^T$$ with $$D$$ being diagonal and positive, and $$L$$ with unit diagonal entries.

The same principles can be carried over to symmetric indefinite matrices by permitting $$D$$ be a block diagonal matrix; in general with blocks of size $$1\times 1$$ and $$2\times 2$$. The blocks of $$D$$ corresponds to the pivots. A useful approach is to determine those $$1\times 1$$ and $$2\times 2$$ pivots during the analysis phase with the hope that the predetermined pivots will be numerically favorable during the actual numerical factorization. In current methods, weighted bipartite matching based algorithms are used to determine those potential pivots. Once such pivots are determined, the fill reducing ordering methods are constrained to keep the $$2\times 2$$ pivots together.

### QR decomposition

Consider an $$m\times n$$ sparse rectangular matrix $$A$$ with a full column rank, having a QR factorization $$A=QR$$. In a thin QR factorization $$R$$ is written down as an upper triangular matrix, instead of an upper trapezoidal one. If $$R$$ comes from such a factorization, then $$R$$ is also the Cholesky factor of $$A^TA=R^TR$$. Many of the sparsity oriented issues in the QR decomposition case uses this correspondence and harness the methods developed for sparse Cholesky factorization.

## Some other sparsity issues

A common sparsity oriented technique is to permute a sparse matrix into block triangular (BTF) form using a matching of maximum cardinality in the bipartite graph of $$A$$ as described by Pothen and Fan (1990). A matrix can be put into BTF form, if it is reducible. Given a matrix in this form viz.

$$A_{BTF} = \left [\begin{array}{rrrr} A_{11}& A_{12} & \cdots& A_{1K}\\ & A_{22} & \cdots& A_{2K}\\ & &\ddots & \vdots \\ & & &A_{KK}\end{array}\right],$$

it is possible to exploit this to solve the linear system $$Ax=b$$ without factorizing the whole matrix. One can factor the last block $$A_{KK}$$ to find the corresponding entries in the unknown vector $$x$$, and then substitute those in the equations involving $$A_{K-1,K-1}$$ and $$A_{K-1,K}$$ and so on to solve the linear system $$Ax=b$$ using a block back substitution.

## Computation schemes

The basic algorithms to compute $$LU$$, $$LDL^T$$ and Cholesky factorizations are called up-looking, left-looking, and right-looking methods. If $$A$$ is very sparse and would have only little fill-in, the up-looking algorithm is preferable. Otherwise, supernodal (an efficient formulation of the left-looking method) and multifrontal (an efficient formulation of the right-looking method) methods are preferred.

The two computational schemes to compute QR factorization are based on Householder reflections and Givens rotations. There are left-looking supernodal, and right-looking, multifrontal methods to compute the QR factorization using Householder reflections. The Givens rotations follow the up-looking method and are preferable when there is only little fill-in.