COMBINATORIAL PRECONDITIONERS

SIVAN TOLEDO AND HAIM AVRON
TEL-AVIV UNIVERSITY

1. Introduction

The Conjugate Gradient (CG) method is an iterative algorithm for solving linear systems of equations $Ax = b$, where $A$ is symmetric and positive definite. The convergence of the method depends on the spectrum of $A$: when its eigenvalues are clustered, the method converges rapidly. In particular, CG converges to within a fixed tolerance in $O(\sqrt{\kappa})$ iterations, where $\kappa = \kappa(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$ is the spectral condition number of $A$. When the spectrum of $A$ is not clustered, a preconditioner can accelerate convergence. The Preconditioned Conjugate Gradients (PCG) method applies the CG iteration to the linear system $(B^{-1/2} A B^{-1/2})(B^{1/2} x) = B^{-1/2} b$ using a clever transformation that only requires applications of $A$ and $B^{-1}$ in every iteration; $B$ also needs to be symmetric positive definite. The convergence of PCG is determined by the spectrum of $B^{-1/2} A B^{-1/2}$, which is the same as the spectrum of $B^{-1}A$. If a representation of $B^{-1}$ can be constructed quickly and applied quickly, and if $B^{-1}A$ has a clustered spectrum, the method is very effective. There are also variants of PCG that require only one of $A$ and $B$ to be positive definite, and variants that allow them to be singular, under some technical conditions on their null spaces.

Combinatorial preconditioning is a technique that relies on graph algorithms to construct effective preconditioners. The simplest applications of combinatorial preconditioning target a class of matrices that are isomorphic to weighted undirected graphs. The coefficient matrix $A$ is viewed as its isomorphic graph $G_A$. A specialized graph algorithm constructs another graph $G_B$ such that the isomorphic matrix $B$ is a good preconditioner for $A$. The graph algorithm aims to achieve two goals: the inverse of $B$ should be easy to apply, and the spectrum of $B^{-1}A$ should be clustered. It turns out that the spectrum of $B^{-1}A$ can be bounded in terms of properties of the graphs $G_A$ and $G_B$; in particular, the quality of embeddings of $G_A$ in $G_B$ (and sometimes vice versa) plays a fundamental role in these spectral bounds.

This chapter focuses on explaining the relationship between the spectrum of $B^{-1}A$ and quantitative properties of embeddings of the two graphs. The last section surveys algorithms that construct combinatorial preconditioners. We omit most proofs from this chapter; some are trivial, and the others appear in the paper cited in the statement of the theorem or lemma.

Date: January, 2010. This research was supported in part by an IBM Faculty Partnership Award and by grant 1045/09 from the Israel Science Foundation (founded by the Israel Academy of Sciences and Humanities).

2. Symmetric Diagonally-Dominant Matrices and Graphs

We begin by exploring the structure of diagonally-dominant matrices and their relation to graphs.

2.1. Incidence Factorizations of Diagonally-Dominant Matrices.

Definition 2.1. A square matrix $A \in \mathbb{R}^{n \times n}$ is called diagonally dominant if for every $i = 1, 2, \ldots, n$ we have $A_{ii} \ge \sum_{j \ne i} |A_{ij}|$.

Symmetric diagonally dominant matrices have symmetric factorizations $A = UU^T$ such that each column of $U$ has at most two nonzeros, and all nonzeros in each column have the same absolute value. We now establish a notation for such columns.

Definition 2.2. Let $1 \le i, j \le n$, $i \ne j$. The length-$n$ negative edge vector $\langle i, -j \rangle$ and the positive edge vector, denoted $\langle i, j \rangle$, are defined by
\[
\langle i, -j \rangle_k = \begin{cases} +1 & k = i \\ -1 & k = j \\ 0 & \text{otherwise,} \end{cases}
\qquad
\langle i, j \rangle_k = \begin{cases} +1 & k = i \\ +1 & k = j \\ 0 & \text{otherwise.} \end{cases}
\]
The reason for the assignment of signs to edge vectors will become apparent later. A vertex vector $\langle i \rangle$ is the unit vector
\[
\langle i \rangle_k = \begin{cases} +1 & k = i \\ 0 & \text{otherwise.} \end{cases}
\]

A symmetric diagonally dominant matrix can always be expressed as a sum of outer products of edge and vertex vectors, and therefore, as a symmetric product of a matrix whose columns are edge and vertex vectors.

Lemma 2.3. ([6]) Let $A \in \mathbb{R}^{n \times n}$ be a diagonally dominant symmetric matrix. We can decompose $A$ as follows:
\[
A = \sum_{\substack{i<j \\ A_{ij}>0}} |A_{ij}| \, \langle i,j \rangle \langle i,j \rangle^T
  + \sum_{\substack{i<j \\ A_{ij}<0}} |A_{ij}| \, \langle i,-j \rangle \langle i,-j \rangle^T
  + \sum_{i=1}^{n} \Bigl( A_{ii} - \sum_{\substack{j=1 \\ j \ne i}}^{n} |A_{ij}| \Bigr) \langle i \rangle \langle i \rangle^T .
\]

Matrix decompositions of this form play a prominent role in support theory, so we give them a name.

Definition 2.4. A matrix whose columns are scaled edge and vertex vectors (that is, vectors of the forms $c\,\langle i,-j \rangle$, $c\,\langle i,j \rangle$, and $c\,\langle i \rangle$) is called an incidence matrix. A factorization $A = UU^T$ where $U$ is an incidence matrix is called an incidence factorization. An incidence factorization with no zero columns, with at most one vertex vector for each index $i$, with at most one edge vector for each index pair $i, j$, and whose negative edge vectors are all of the form $c\,\langle \min(i,j), -\max(i,j) \rangle$ is called a canonical incidence factorization.

Lemma 2.5. Let $A \in \mathbb{R}^{n\times n}$ be a diagonally dominant symmetric matrix. Then $A$ has an incidence factorization $A = UU^T$, and a unique canonical incidence factorization.

2.2. Graphs and Their Laplacian Matrices. We now define the connection between undirected graphs and diagonally-dominant symmetric matrices.

Definition 2.6. Let $G = (\{1,2,\ldots,n\}, E, c, d)$ be a weighted undirected graph on the vertex set $\{1,2,\ldots,n\}$ with no self loops and no parallel edges, and with weight functions $c : E \to \mathbb{R}\setminus\{0\}$ and $d : \{1,\ldots,n\} \to \mathbb{R}_+ \cup \{0\}$. That is, the edge set consists of unordered pairs of unequal integers $(i,j)$ such that $1 \le i,j \le n$. The Laplacian of $G$ is the matrix $A \in \mathbb{R}^{n\times n}$ such that
\[
A_{ij} = \begin{cases} d(i) + \sum_{(i,k)\in E} |c(i,k)| & i = j \\ -c(i,j) & (i,j) \in E \\ 0 & \text{otherwise.} \end{cases}
\]
A vertex $i$ such that $d(i) > 0$ is called a strictly dominant vertex. If $c = 1$, the graph is not considered weighted. If $c > 0$ and is not always 1, the graph is weighted. If some weights are negative, the graph is signed.

Lemma 2.7. The Laplacians of the graphs defined in Definition 2.6 are symmetric and diagonally dominant. Furthermore, these graphs are isomorphic to symmetric diagonally-dominant matrices under this Laplacian mapping.

We prefer to work with vertex weights rather than allowing self loops because edge and vertex vectors are algebraically different.

In algorithms, given an explicit representation of a diagonally-dominant matrix $A$, we can easily compute an explicit representation of an incidence factor $U$ (including the canonical incidence factor if desired). Sparse matrices are often represented by a data structure that stores a compressed array of nonzero entries for each row or each column of the matrix. Each entry in a row (column) array stores the column index (row index) of the nonzero, and the value of the nonzero. From such a representation of $A$ we can easily construct a sparse representation of $U$ by columns. We traverse each row of $A$, creating a column of $U$ for each nonzero in the upper (or lower) part of $A$. During the traversal, we can also compute all the $d(i)$'s. The conversion works even if only the upper or lower part of $A$ is represented explicitly. We can also use the explicit representation of $A$ as an implicit representation of $U$, with each off-diagonal nonzero of $A$ representing an edge-vector column of $U$. If $A$ has no strictly-dominant rows, that is all. If $A$ has strictly dominant rows, we need to compute their weights $d(i)$ using a one-pass traversal of $A$.
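The construction just described is entirely mechanical. The following sketch (in Python with NumPy, which this chapter does not otherwise use; all names are ours) builds the factor $U$ of Lemma 2.3 column by column for a dense symmetric diagonally dominant matrix, and checks that $A = UU^T$:

```python
import numpy as np

def incidence_factor(A, tol=1e-12):
    """Canonical incidence factor U of a symmetric diagonally
    dominant matrix A, so that A = U @ U.T (Lemma 2.3)."""
    n = A.shape[0]
    cols = []
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] > tol:        # positive entry: positive edge vector <i,j>
                u = np.zeros(n); u[i] = u[j] = 1.0
                cols.append(np.sqrt(A[i, j]) * u)
            elif A[i, j] < -tol:     # negative entry: negative edge vector <i,-j>
                u = np.zeros(n); u[i] = 1.0; u[j] = -1.0
                cols.append(np.sqrt(-A[i, j]) * u)
    for i in range(n):               # vertex vectors for strictly dominant rows
        d = A[i, i] + np.abs(A[i, i]) - np.sum(np.abs(A[i, :]))  # A_ii - sum_{j!=i}|A_ij|
        if d > tol:
            u = np.zeros(n); u[i] = 1.0
            cols.append(np.sqrt(d) * u)
    return np.column_stack(cols)

A = np.array([[ 3.0, -2.0,  0.0],
              [-2.0,  4.0,  1.0],
              [ 0.0,  1.0,  2.0]])
U = incidence_factor(A)
assert np.allclose(U @ U.T, A)
```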
3. Support Theory

Support theory is a set of tools that aim to bound the generalized eigenvalues $\lambda$ that satisfy $Ax = \lambda Bx$ from above and below. If $B$ is nonsingular, these eigenvalues are also the eigenvalues of $B^{-1}A$, but the generalized representation allows us to derive bounds that also apply to singular matrices.

Definition 3.1. Let $A$ and $B$ be $n$-by-$n$ complex matrices. We say that a scalar $\lambda$ is a finite generalized eigenvalue of the matrix pencil (pair) $(A,B)$ if there is a vector $v \ne 0$ such that $Av = \lambda Bv$ and $Bv \ne 0$. We say that $\infty$ is an infinite generalized eigenvalue of $(A,B)$ if there exists a vector $v \ne 0$ such that $Bv = 0$ but $Av \ne 0$. Note that $\infty$ is an eigenvalue of $(A,B)$ if and only if $0$ is an eigenvalue of $(B,A)$. The finite and infinite eigenvalues of a pencil are determined eigenvalues (the eigenvector uniquely determines the eigenvalue). If $Av = Bv = 0$ for a vector $v \ne 0$, we say that $v$ is an indeterminate eigenvector, because $Av = \lambda Bv$ for any scalar $\lambda$.

The tools of support theory rely on symmetric factorizations $A = UU^T$ and $B = VV^T$; this is why incidence factorizations are useful. In fact, the algebraic tools of support theory are particularly easy to apply when $U$ and $V$ are incidence matrices.

3.1. From Generalized Eigenvalues to Singular Values. If $A = UU^T$, then $\Lambda(A) = \Sigma^2(U^T)$, where $\Lambda(A)$ is the set of eigenvalues of $A$, $\Sigma(U^T)$ is the set of singular values of $U^T$, and $\Sigma^2$ is the set of the squares of the singular values. The following lemma extends this trivial result to generalized eigenvalues.

Lemma 3.2. ([2]) Let $A = UU^T$ and $B = VV^T$ with $\mathrm{null}(B) = \mathrm{null}(A) = S$. We have
\[
\Lambda(A,B) = \Sigma^2\bigl(V^+ U\bigr) \quad \text{and} \quad \Lambda(A,B) = \Sigma^{-2}\bigl(U^+ V\bigr).
\]
In these expressions, $\Sigma(\cdot)$ is the set of nonzero singular values of the matrix within the parentheses, $\Sigma^\ell$ denotes the same singular values to the $\ell$th power, and $V^+$ denotes the Moore-Penrose pseudoinverse of $V$.

The lemma characterizes all the generalized eigenvalues of the pair $(A,B)$, but for large matrices, it is not particularly useful. Even if $U$ and $V$ are highly structured (e.g., they are incidence matrices), $U^+$ and $V^+$ are usually not structured and are expensive to compute. The next section shows that if we lower our expectations a bit and only try to bound $\Lambda$ from above and below, then we do not need the pseudoinverses.
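For small matrices, Lemma 3.2 is easy to check numerically. The sketch below (our illustration, not part of the chapter) builds two connected weighted Laplacians on the same vertex set, so that their common null space is the span of the constant vector; it removes that null space by adding the rank-one projector onto it to $B$, which leaves the finite generalized eigenvalues intact, and compares them with the squared nonzero singular values of $V^+U$:

```python
import numpy as np
from scipy.linalg import eigh, pinv, svdvals

def edge(n, i, j, w):
    """Scaled negative edge vector sqrt(w) <i,-j> of length n."""
    u = np.zeros(n); u[i] = np.sqrt(w); u[j] = -np.sqrt(w)
    return u

n = 3
U = np.column_stack([edge(n, 0, 1, 2.0), edge(n, 1, 2, 1.0), edge(n, 0, 2, 1.0)])
V = np.column_stack([edge(n, 0, 1, 1.0), edge(n, 1, 2, 3.0), edge(n, 0, 2, 2.0)])
A, B = U @ U.T, V @ V.T

# Both null spaces are span{1}; shifting B by the projector 11^T/n makes it
# definite without disturbing eigenvectors orthogonal to the null space.
lam = eigh(A, B + np.ones((n, n)) / n)[0]
finite = np.sort(lam[lam > 1e-12])

sigma = svdvals(pinv(V) @ U)
squared = np.sort(sigma[sigma > 1e-12] ** 2)
assert np.allclose(finite, squared)   # Lemma 3.2
```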
3.2. The Symmetric Support Lemma. In the previous section we have seen that the singular values of $V^+U$ provide complete information on the generalized eigenvalues of $(A,B)$. If we denote $W_{\mathrm{opt}} = V^+U$, we have
\[ V W_{\mathrm{opt}} = V V^+ U = U. \]
It turns out that any $W$ such that $VW = U$ provides some information on the generalized eigenvalues of $(A,B)$.

Lemma 3.3. ([9]) Let $A = UU^T$ and let $B = VV^T$, and assume that $\mathrm{null}(B) \subseteq \mathrm{null}(A)$. Then
\[ \max\{\lambda \mid Ax = \lambda Bx,\ Bx \ne 0\} = \min\bigl\{ \|W\|_2^2 \bigm| U = VW \bigr\}. \]

This lemma is fundamental to support theory and preconditioning, because it is often possible to prove that a $W$ such that $U = VW$ exists and to give a priori bounds on its norm.

3.3. Norm bounds. The Symmetric Support Lemma bounds generalized eigenvalues in terms of the 2-norm of some matrix $W$ such that $U = VW$. Even if we have a simple way to construct such a $W$, we still cannot easily derive a corresponding bound on the spectrum from the Symmetric Support Lemma. The difficulty is that there is no simple closed-form expression for the 2-norm of a matrix, since it is not related to the entries of $W$ in a simple way. It is equivalent to the largest singular value, but this must usually be computed numerically. Fortunately, there are simple (and also some not-so-simple) functions of the elements of the matrix that yield useful bounds on its 2-norm. The following bounds are standard, well known, and widely used (see [5, Fact 9.8.10.ix] and [5, Fact 9.8.15]).

Lemma 3.4. The 2-norm of $W \in \mathbb{C}^{k \times m}$ is bounded by
\[
\|W\|_2^2 \le \|W\|_F^2 = \sum_{i=1}^{k}\sum_{j=1}^{m} |W_{ij}|^2, \tag{3.1}
\]
\[
\|W\|_2^2 \le \|W\|_1 \|W\|_\infty = \Bigl( \max_{j=1}^{m} \sum_{i=1}^{k} |W_{ij}| \Bigr) \Bigl( \max_{i=1}^{k} \sum_{j=1}^{m} |W_{ij}| \Bigr). \tag{3.2}
\]

The next two bounds are standard and well known; they follow directly from $\|WW^T\|_2 = \|W\|_2^2$ and from the fact that $\|S\|_1 = \|S\|_\infty$ for a symmetric $S$.

Lemma 3.5. The 2-norm of $W \in \mathbb{C}^{k\times m}$ is bounded by
\[
\|W\|_2^2 \le \|WW^T\|_1 = \|WW^T\|_\infty, \tag{3.3}
\]
\[
\|W\|_2^2 \le \|W^TW\|_1 = \|W^TW\|_\infty. \tag{3.4}
\]

The following bounds are more specialized. They all exploit the sparsity of $W$ to obtain bounds that are usually tighter than the bounds given so far.

Lemma 3.6. ([11]) The 2-norm of $W \in \mathbb{C}^{k\times m}$ is bounded by
\[
\|W\|_2^2 \le \max_{j} \sum_{i : W_{i,j}\ne 0} \|W_{i,:}\|_2^2 = \max_j \sum_{i : W_{i,j}\ne 0} \sum_{c=1}^{m} W_{i,c}^2, \tag{3.5}
\]
\[
\|W\|_2^2 \le \max_{i} \sum_{j : W_{i,j}\ne 0} \|W_{:,j}\|_2^2 = \max_i \sum_{j : W_{i,j}\ne 0} \sum_{r=1}^{k} W_{r,j}^2. \tag{3.6}
\]

The bounds in this lemma are a refinement of the bound $\|W\|_2^2 \le \|W\|_F^2$. The Frobenius norm, which bounds the 2-norm, sums the squares of all the elements of $W$. The bounds (3.5) and (3.6) sum only the squares in some of the rows or some of the columns, unless the matrix has a row or a column with no zeros. There are similar refinements of the bound $\|W\|_2^2 \le \|W\|_1 \|W\|_\infty$.

Lemma 3.7. ([11]) The 2-norm of $W \in \mathbb{C}^{k\times m}$ is bounded by
\[
\|W\|_2^2 \le \max_j \sum_{i : W_{i,j}\ne 0} \Bigl( |W_{i,j}| \cdot \sum_{c=1}^{m} |W_{i,c}| \Bigr), \tag{3.7}
\]
\[
\|W\|_2^2 \le \max_i \sum_{j : W_{i,j}\ne 0} \Bigl( |W_{i,j}| \cdot \sum_{r=1}^{k} |W_{r,j}| \Bigr). \tag{3.8}
\]

3.4. Support numbers. Support numbers generalize the notion of the maximal eigenvalue of a matrix pencil.

Definition 3.8. A matrix $B$ dominates a matrix $A$ if for any vector $x$ we have $x^T(B - A)x \ge 0$. We denote domination by $B \succeq A$.

Definition 3.9. The support number for a matrix pencil $(A,B)$ is
\[ \sigma(A,B) = \min\{ t \mid \tau B \succeq A \ \text{for all}\ \tau \ge t \}. \]

If $B$ is symmetric positive definite and $A$ is symmetric, then the support number is always finite, because $x^TBx/x^Tx$ is bounded from below by $\min \Lambda(B) > 0$ and $x^TAx/x^Tx$ is bounded from above by $\max \Lambda(A)$, which is finite. In other cases, there may not be any $\tau$ satisfying the formula; in such cases, we say that $\sigma(A,B) = \infty$.

Example 3.10. Suppose that $x \in \mathrm{null}(B)$ and that $A$ is positive definite. Then for any $\tau > 0$ we have $x^T(\tau B - A)x = -x^TAx < 0$. Therefore, $\sigma(A,B) = \infty$.

Example 3.11. If $B$ is not positive semidefinite, then there is some $x$ for which $x^TBx < 0$. This implies that for any $A$ and for any large enough $\tau$, $x^T(\tau B - A)x < 0$. Therefore, $\sigma(A,B) = \infty$.

The next result, like the Symmetric Support Lemma, bounds generalized eigenvalues.

Theorem 3.12. ([9]) Let $A$ and $B$ be symmetric, and let $B$ also be positive semidefinite. If $\mathrm{null}(B) \subseteq \mathrm{null}(A)$, then
\[ \sigma(A,B) = \max\{\lambda \mid Ax = \lambda Bx,\ Bx \ne 0\}. \]

A primary motivation for support numbers is to bound (spectral) condition numbers. For symmetric matrices, the relation $\kappa(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$ holds. Let $\kappa(A,B)$ denote the condition number of the matrix pencil $(A,B)$, that is, $\kappa(B^{-1}A)$ when $B$ is nonsingular.

Theorem 3.13. When $A$ and $B$ are symmetric positive definite, then $\kappa(A,B) = \sigma(A,B)\,\sigma(B,A)$.
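To see how loose or tight these bounds are in practice, the following sketch (ours, for illustration only) evaluates the Frobenius bound (3.1), the product bound (3.2), and the sparsity-aware bound (3.5) on a random sparse matrix and compares them with the exact squared 2-norm; on sparse matrices the bounds of Lemma 3.6 are typically much tighter than the Frobenius bound:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(40, 60)) * (rng.random((40, 60)) < 0.05)  # sparse-ish

two_norm_sq = np.linalg.norm(W, 2) ** 2
frob = np.sum(W ** 2)                                                # (3.1)
one_inf = np.abs(W).sum(axis=0).max() * np.abs(W).sum(axis=1).max()  # (3.2)

row_sq = np.sum(W ** 2, axis=1)            # ||W_{i,:}||_2^2
nz = W != 0
bound_35 = max(row_sq[nz[:, j]].sum() for j in range(W.shape[1]))    # (3.5)

print(two_norm_sq, frob, one_inf, bound_35)
assert two_norm_sq <= min(frob, one_inf, bound_35) + 1e-9
```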
A common strategy in support theory is to bound the condition number by bounding both $\sigma(A,B)$ and $\sigma(B,A)$. Typically, one direction is easy and the other is hard.

Applications usually do not solve singular systems. Nonetheless, it is often convenient to analyze preconditioners in the singular context. For example, finite-element systems are often singular until boundary conditions are imposed, so we can build $B$ from the singular part of $A$ and then impose the same boundary constraints on both matrices.

3.5. Splitting. Support numbers are convenient for algebraic manipulation. One of their most powerful properties is that they allow us to split complicated matrices into simpler pieces (matrices) and analyze these separately. Let $A = A_1 + A_2 + \cdots + A_q$ and, similarly, $B = B_1 + B_2 + \cdots + B_q$. We can then match up pairs $(A_i, B_i)$ and consider the support number for each such pencil separately.

Lemma 3.14. ([17, Lemma 4.7]) Let $A = A_1 + A_2 + \cdots + A_q$ and, similarly, $B = B_1 + B_2 + \cdots + B_q$, where all the $A_i$ and $B_i$ are symmetric and positive semidefinite. Then
\[ \sigma(A,B) \le \max_i \sigma(A_i, B_i). \]

Proof. Let $\sigma = \sigma(A,B)$, let $\sigma_i = \sigma(A_i,B_i)$, and let $\sigma_{\max} = \max_i \sigma_i$. Then for any $x$,
\[
x^T(\sigma_{\max} B - A)x = x^T\Bigl(\sigma_{\max}\sum_i B_i - \sum_i A_i\Bigr)x
= \sum_i x^T(\sigma_{\max} B_i - A_i)x
\ge \sum_i x^T(\sigma_i B_i - A_i)x
\ge 0.
\]
Therefore, $\sigma \le \sigma_{\max}$.

The splitting lemma is quite general, and can be used in many ways. In practice we want to break both $A$ and $B$ into simpler matrices that we know how to analyze. The term "simpler" can mean sparser, or lower rank, and so on. In order to get a good upper bound on the support number, the splitting must be chosen carefully; poor splittings give poor bounds. Here is an example.

Example 3.15. Let
\[
A = \begin{pmatrix} 3 & -2 \\ -2 & 2 \end{pmatrix}
  = \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
  = A_1 + A_2,
\]
and
\[
B = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}
  = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  = B_1 + B_2.
\]
Then the Splitting Lemma says $\sigma(A,B) \le \max\{\sigma(A_1,B_1), \sigma(A_2,B_2)\}$. It is easy to verify that $\sigma(A_1,B_1) = 2$ and that $\sigma(A_2,B_2) = 1$; hence $\sigma(A,B) \le 2$. Note that $B_1$ cannot support $A_2$, so a correct pairing of the terms in $A$ and $B$ is essential. The exact support number is $\sigma(A,B) = \lambda_{\max}(A,B) \approx 1.577$.
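A quick numerical check of Example 3.15 (our sketch; the helper name is ours). By Theorem 3.12, the support number of a pencil with positive semidefinite $B$ and $\mathrm{null}(B) \subseteq \mathrm{null}(A)$ is its largest finite generalized eigenvalue, which we compute after restricting the pencil to $\mathrm{range}(B)$:

```python
import numpy as np
from scipy.linalg import eigh

def support_number(A, B, tol=1e-12):
    """sigma(A, B) for symmetric A and symmetric PSD B with
    null(B) contained in null(A) (Theorem 3.12)."""
    w, Q = np.linalg.eigh(B)
    R = Q[:, w > tol]                    # orthonormal basis of range(B)
    return eigh(R.T @ A @ R, R.T @ B @ R)[0].max()

A1, A2 = np.array([[2., -2], [-2, 2]]), np.array([[1., 0], [0, 0]])
B1, B2 = np.array([[1., -1], [-1, 1]]), np.eye(2)
print(support_number(A1, B1))            # 2.0
print(support_number(A2, B2))            # 1.0
print(support_number(A1 + A2, B1 + B2))  # ~1.577 <= max(2, 1)
```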
4. Embeddings and Combinatorial Support Bounds

To bound $\sigma(A,B)$ using the Symmetric Support Lemma, we need to factor $A$ and $B$ into $A = UU^T$ and $B = VV^T$, and we need to find a $W$ such that $U = VW$. We have seen that if $A$ and $B$ are diagonally dominant, then there is an almost trivial way to factor $A$ and $B$ such that $U$ and $V$ are about as sparse as $A$ and $B$. But how do we find a $W$ such that $U = VW$? In this chapter, we show that when $A$ and $B$ are weighted (but not signed) Laplacians, we can construct such a $W$ using an embedding of the edges of $G_A$ into paths in $G_B$. Furthermore, when $W$ is constructed from an embedding, the bounds on $\|W\|_2$ can be interpreted as combinatorial bounds on the quality of the embedding.

4.1. Defining $W$ Using Path Embeddings. We start with the construction of a matrix $W$ such that $U = VW$.

Lemma 4.1. Let $(i_1, i_2, \ldots, i_\ell)$ be a sequence of integers between 1 and $n$, such that $i_j \ne i_{j+1}$ for $j = 1, \ldots, \ell-1$. Then
\[ \langle i_1, -i_\ell \rangle = \sum_{j=1}^{\ell-1} \langle i_j, -i_{j+1} \rangle, \]
where all the edge vectors are length $n$.

To see why this lemma is important, consider the role of a column of $W$. Suppose that the columns of $U$ and $V$ are all negative edge vectors (the only kind of edge vector that appears in the canonical incidence factor of a weighted, unsigned Laplacian). Denote column $c$ of $U$ by
\[ U_{:,c} = \langle \min(i_1,i_\ell), -\max(i_1,i_\ell) \rangle = (-1)^{i_1 > i_\ell} \langle i_1, -i_\ell \rangle, \]
where $(-1)^{i_1 > i_\ell}$ evaluates to $-1$ if $i_1 > i_\ell$ and to $1$ otherwise. This column corresponds to the edge $(i_1, i_\ell)$ in $G_{UU^T}$. Now let $(i_1, i_2, \ldots, i_\ell)$ be a simple path in $G_{VV^T}$ (a simple path is a sequence of vertices $(i_1, i_2, \ldots, i_\ell)$ such that $(i_j, i_{j+1})$ is an edge in the graph for $1 \le j < \ell$ and such that any vertex appears at most once on the path). If $U = VW$, then
\[ U_{:,c} = V W_{:,c} = \sum_{r=1}^{k} V_{:,r} W_{r,c}. \]
Let $r_1, r_2, \ldots, r_{\ell-1}$ be the columns of $V$ that correspond to the edges of the path $(i_1, i_2, \ldots, i_\ell)$, in order. That is, $V_{:,r_1} = \langle \min(i_1,i_2), -\max(i_1,i_2) \rangle$, $V_{:,r_2} = \langle \min(i_2,i_3), -\max(i_2,i_3) \rangle$, and so on. By the lemma,
\[
U_{:,c} = (-1)^{i_1 > i_\ell} \langle i_1, -i_\ell \rangle
= (-1)^{i_1 > i_\ell} \sum_{j=1}^{\ell-1} \langle i_j, -i_{j+1} \rangle
= (-1)^{i_1 > i_\ell} \sum_{j=1}^{\ell-1} (-1)^{i_j > i_{j+1}} V_{:,r_j}.
\]
It follows that if we define
\[
W_{r,c} = \begin{cases} (-1)^{i_1>i_\ell} (-1)^{i_j>i_{j+1}} & r = r_j \ \text{for some}\ 1 \le j < \ell \\ 0 & \text{otherwise,} \end{cases}
\]
then $W_{:,c}$ satisfies $U_{:,c} = V W_{:,c}$. We can construct all the columns of $W$ in this way, so that $W$ satisfies $U = VW$.

A path of edge vectors that ends in a vertex vector supports the vertex vector associated with the first vertex of the path.

Lemma 4.2. Let $(i_1, i_2, \ldots, i_\ell)$ be a sequence of integers between 1 and $n$, such that $i_j \ne i_{j+1}$ for $j = 1, \ldots, \ell-1$. Then
\[ \langle i_1 \rangle = \langle i_\ell \rangle + \sum_{j=1}^{\ell-1} \langle i_j, -i_{j+1} \rangle, \]
where all the edge and vertex vectors are length $n$.

The following theorem generalizes these ideas to scaled edge vectors and to scaled vertex vectors. The theorem also states how to construct all the columns of $W$. The theorem summarizes results in [9, 4].

Theorem 4.3. Let $A$ and $B$ be weighted (unsigned) Laplacians and let $U$ and $V$ be their canonical incidence factors. Let $\pi$ be a path embedding of the edges and strictly-dominant vertices of $G_A$ into $G_B$, such that for an edge $(i_1, i_\ell)$ in $G_A$, $i_1 < i_\ell$, we have $\pi(i_1, i_\ell) = (i_1, i_2, \ldots, i_\ell)$ for some simple path $(i_1, i_2, \ldots, i_\ell)$ in $G_B$, and such that for a strictly-dominant vertex $i_1$ in $G_A$, $\pi(i_1) = (i_1, i_2, \ldots, i_\ell)$ for some simple path $(i_1, i_2, \ldots, i_\ell)$ in $G_B$ that ends in a strictly-dominant vertex $i_\ell$ in $G_B$. Denote by $c_V(i_j, i_{j+1})$ the index of the column of $V$ that is a scaling of $\langle i_j, -i_{j+1} \rangle$. That is,
\[ V_{:, c_V(i_j,i_{j+1})} = \sqrt{-B_{i_j,i_{j+1}}}\ \bigl\langle \min(i_j,i_{j+1}), -\max(i_j,i_{j+1}) \bigr\rangle. \]
Similarly, denote by $c_V(i_j)$ the index of the column of $V$ that is a scaling of $\langle i_j \rangle$,
\[ V_{:,c_V(i_j)} = \sqrt{ B_{i_j,i_j} - \sum_{\substack{k=1 \\ k \ne i_j}}^{n} |B_{i_j,k}| }\ \langle i_j \rangle, \]
and similarly for $U$. We define a matrix $W$ as follows. For a column index $c_U(i_1,i_\ell)$ with $i_1 < i_\ell$, we define
\[
W_{r, c_U(i_1,i_\ell)} = \begin{cases}
(-1)^{i_j > i_{j+1}} \sqrt{A_{i_1,i_\ell} / B_{i_j,i_{j+1}}} & r = c_V(i_j,i_{j+1}) \ \text{for some edge}\ (i_j,i_{j+1}) \ \text{in}\ \pi(i_1,i_\ell) \\
0 & \text{otherwise.}
\end{cases}
\]
For a column index $c_U(i_1)$, we define
\[
W_{r, c_U(i_1)} = \begin{cases}
\sqrt{ \dfrac{A_{i_1,i_1} - \sum_{j \ne i_1} |A_{i_1,j}|}{B_{i_\ell,i_\ell} - \sum_{j \ne i_\ell} |B_{i_\ell,j}|} } & r = c_V(i_\ell) \\[2ex]
(-1)^{i_j > i_{j+1}} \sqrt{ \dfrac{A_{i_1,i_1} - \sum_{k \ne i_1} |A_{i_1,k}|}{|B_{i_j,i_{j+1}}|} } & r = c_V(i_j,i_{j+1}) \ \text{for some edge}\ (i_j,i_{j+1}) \ \text{in}\ \pi(i_1) \\[1ex]
0 & \text{otherwise.}
\end{cases}
\]
Then $U = VW$.

Proof. For scaled edge-vector columns in $U$ we have
\[
\begin{aligned}
V W_{:, c_U(i_1,i_\ell)} &= \sum_r V_{:,r} W_{r,c_U(i_1,i_\ell)} \\
&= \sum_{\substack{r = c_V(i_j,i_{j+1}) \text{ for some} \\ \text{edge}\ (i_j,i_{j+1})\ \text{in}\ \pi(i_1,i_\ell)}} V_{:,r} W_{r,c_U(i_1,i_\ell)} \\
&= \sum_{j=1}^{\ell-1} \sqrt{-B_{i_j,i_{j+1}}}\ \bigl\langle \min(i_j,i_{j+1}), -\max(i_j,i_{j+1}) \bigr\rangle \,(-1)^{i_j>i_{j+1}} \sqrt{\frac{A_{i_1,i_\ell}}{B_{i_j,i_{j+1}}}} \\
&= \sum_{j=1}^{\ell-1} \sqrt{|A_{i_1,i_\ell}|}\ \langle i_j, -i_{j+1} \rangle \\
&= \sqrt{|A_{i_1,i_\ell}|}\ \langle i_1, -i_\ell \rangle \qquad \text{(by Lemma 4.1)} \\
&= U_{:, c_U(i_1,i_\ell)}.
\end{aligned}
\]
For scaled vertex-vector columns in $U$ we have
\[
\begin{aligned}
V W_{:,c_U(i_1)} &= \sum_r V_{:,r} W_{r,c_U(i_1)} \\
&= V_{:,c_V(i_\ell)} W_{c_V(i_\ell), c_U(i_1)} + \sum_{\substack{r = c_V(i_j,i_{j+1}) \text{ for some} \\ \text{edge}\ (i_j,i_{j+1})\ \text{in}\ \pi(i_1)}} V_{:,r} W_{r, c_U(i_1)} \\
&= \sqrt{B_{i_\ell,i_\ell} - \sum_{j \ne i_\ell} |B_{i_\ell,j}|}\ \langle i_\ell \rangle \sqrt{\frac{A_{i_1,i_1} - \sum_{j\ne i_1}|A_{i_1,j}|}{B_{i_\ell,i_\ell} - \sum_{j\ne i_\ell}|B_{i_\ell,j}|}} \\
&\qquad + \sum_{j=1}^{\ell-1} \sqrt{-B_{i_j,i_{j+1}}}\ \bigl\langle \min(i_j,i_{j+1}),-\max(i_j,i_{j+1})\bigr\rangle \,(-1)^{i_j>i_{j+1}} \sqrt{\frac{A_{i_1,i_1}-\sum_{k\ne i_1}|A_{i_1,k}|}{|B_{i_j,i_{j+1}}|}} \\
&= \sqrt{A_{i_1,i_1} - \sum_{j\ne i_1}|A_{i_1,j}|}\ \langle i_\ell \rangle + \sum_{j=1}^{\ell-1} \sqrt{A_{i_1,i_1} - \sum_{j\ne i_1}|A_{i_1,j}|}\ \langle i_j, -i_{j+1} \rangle \\
&= \sqrt{A_{i_1,i_1} - \sum_{j\ne i_1}|A_{i_1,j}|}\ \langle i_1 \rangle \qquad \text{(by Lemma 4.2)} \\
&= U_{:,c_U(i_1)}.
\end{aligned}
\]
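The construction of Theorem 4.3 is mechanical. The sketch below (our illustration; all names are ours) implements the zero-row-sum case, in which $U$ and $V$ contain only scaled negative edge vectors, and verifies $U = VW$ on a small example in which $G_B$ is a path and $G_A$ adds one edge that must be embedded along the whole path:

```python
import numpy as np

def laplacian_edges(A, tol=1e-12):
    """Edges (i, j), i < j, of the graph of a weighted Laplacian A."""
    n = A.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] < -tol]

def incidence(A):
    """Canonical incidence factor of a weighted Laplacian with zero
    row sums: one column sqrt(-A_ij) <i,-j> per edge, i < j."""
    n = A.shape[0]
    cols = []
    for (i, j) in laplacian_edges(A):
        u = np.zeros(n); u[i], u[j] = 1.0, -1.0
        cols.append(np.sqrt(-A[i, j]) * u)
    return np.column_stack(cols)

def embedding_W(A, B, paths):
    """W with U = V W per Theorem 4.3; paths maps each edge (i, j) of
    G_A, i < j, to a simple path (i, ..., j) in G_B."""
    eA, eB = laplacian_edges(A), laplacian_edges(B)
    col = {e: r for r, e in enumerate(eB)}
    W = np.zeros((len(eB), len(eA)))
    for c, (i1, il) in enumerate(eA):
        p = paths[(i1, il)]
        for ij, ij1 in zip(p[:-1], p[1:]):
            r = col[(min(ij, ij1), max(ij, ij1))]
            sign = -1.0 if ij > ij1 else 1.0
            W[r, c] = sign * np.sqrt(A[i1, il] / B[ij, ij1])  # ratio of negatives
    return W

# G_B: the path 0-1-2; G_A: the same path plus the edge (0, 2).
B = np.array([[ 2., -2.,  0.], [-2.,  3., -1.], [ 0., -1.,  1.]])
A = np.array([[ 3., -2., -1.], [-2.,  3., -1.], [-1., -1.,  2.]])
U, V = incidence(A), incidence(B)
W = embedding_W(A, B, {(0, 1): (0, 1), (1, 2): (1, 2), (0, 2): (0, 1, 2)})
assert np.allclose(V @ W, U)
print(np.linalg.norm(W, 2) ** 2)   # an upper bound on sigma(A, B) by Lemma 3.3
```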
The generalization of this theorem to signed Laplacians is more complex, because a path from $i_1$ to $i_\ell$ supports an edge $(i_1, i_\ell)$ only if the parity of positive edges in the path and in the edge is the same. In addition, a cycle with an odd number of positive edges spans all the vertex vectors of the path. For details, see [6].

Theorem 4.3 plays a fundamental role in many applications of support theory. A path embedding $\pi$ that can be used to construct $W$ exists if and only if the graphs of $A$ and $B$ are related in a specific way, which the next lemma specifies.

Lemma 4.4. Let $A = UU^T$ and $B = VV^T$ be weighted (but not signed) Laplacians with arbitrary symmetric-product factorizations. The following conditions are necessary for the equation $U = VW$ to hold for some matrix $W$ (by Theorem 4.3, these conditions are also sufficient).
(1) For each edge $(i,j)$ in $G_A$, either $i$ and $j$ are in the same connected component in $G_B$, or the two components of $G_B$ that contain $i$ and $j$ both include a strictly-dominant vertex.
(2) For each strictly-dominant vertex $i$ in $G_A$, the component of $G_B$ that contains $i$ includes a strictly-dominant vertex.

Proof. Suppose for contradiction that one of the conditions is not satisfied, but that there is a $W$ that satisfies $U = VW$. Without loss of generality, we assume that the vertices are ordered such that vertices that belong to the same connected component in $G_B$ are consecutive. Under that assumption,
\[
V = \begin{pmatrix} V_1 \\ & V_2 \\ & & \ddots \\ & & & V_k \end{pmatrix},
\]
and
\[
B = \begin{pmatrix} B_1 \\ & B_2 \\ & & \ddots \\ & & & B_k \end{pmatrix}
  = \begin{pmatrix} V_1 V_1^T \\ & V_2 V_2^T \\ & & \ddots \\ & & & V_k V_k^T \end{pmatrix}.
\]
The blocks of $V$ are possibly rectangular, whereas the nonzero blocks of $B$ are all square and lie on the diagonal.

We now prove the necessity of the first condition. Suppose that for some edge $(i,j)$ in $G_A$, $i$ and $j$ belong to different connected components of $G_B$ (without loss of generality, to the first two components), and that one of the components (w.l.o.g. the first) does not have a strictly-dominant vertex. Because this component does not have a strictly-dominant vertex, the row sums in $V_1 V_1^T$ are exactly zero. Therefore, $V_1 V_1^T \vec{1} = \vec{0}$, so $V_1$ must be rank deficient. Since $(i,j)$ is in $G_A$, the vector $\langle i, -j \rangle$ is in the column space of the canonical incidence factor of $A$, and therefore in the column space of any $U$ such that $A = UU^T$. If $U = VW$, then the vector $\langle i, -j \rangle$ must also be in the column space of $V$, so for some $x$,
\[
\langle i, -j \rangle = Vx
= \begin{pmatrix} V_1 \\ & V_2 \\ & & \ddots \\ & & & V_k \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}
= \begin{pmatrix} V_1 x_1 \\ V_2 x_2 \\ \vdots \\ V_k x_k \end{pmatrix}.
\]
Therefore, $V_1 x_1$ is a vertex vector. By Theorem 4.3, if the column space of $V_1$ spans a vertex vector, it spans all the vertex vectors associated with the vertices of the connected component. This implies that $V_1$ is full rank, a contradiction.

The necessity of the second condition follows from a similar argument. Suppose that vertex $i$ is strictly dominant in $G_A$ and that it belongs to a connected component of $G_B$ (w.l.o.g. the first) that does not have a vertex that is strictly dominant in $G_B$. This implies that for some $y$,
\[
\langle i \rangle = Vy = \begin{pmatrix} V_1 y_1 \\ V_2 y_2 \\ \vdots \\ V_k y_k \end{pmatrix}.
\]
Again $V_1 y_1$ is a vertex vector, so $V_1$ must be full rank, but it cannot be full rank because $V_1 V_1^T$ has zero row sums.
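The conditions of Lemma 4.4 are easy to test algorithmically. A sketch (ours; SciPy's connected-components routine does the graph work, and the tolerance handling is an assumption of the illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def can_support(A, B, tol=1e-12):
    """Check the conditions of Lemma 4.4: can some W satisfy U = V W
    for incidence factors of the Laplacians A and B?"""
    n = A.shape[0]
    _, comp = connected_components(csr_matrix(B != 0), directed=False)
    # Components of G_B that contain a strictly-dominant vertex:
    # B_ii > sum_{j != i} |B_ij|  <=>  sum_j |B_ij| - 2 B_ii < 0.
    dom_B = np.abs(B).sum(axis=1) - 2 * B.diagonal() < -tol
    dominant_comp = set(comp[dom_B])
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] != 0 and comp[i] != comp[j]:          # condition (1)
                if not {comp[i], comp[j]} <= dominant_comp:
                    return False
        if A[i, i] > np.abs(A[i, :]).sum() - A[i, i] + tol:  # condition (2)
            if comp[i] not in dominant_comp:
                return False
    return True
```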
Not every $W$ such that $U = VW$ corresponds to a path embedding, even if $U$ and $V$ are the canonical incidence factors of $A$ and $B$. In particular, a column of $W$ can correspond to a linear combination of multiple paths. Also, even if $W$ does correspond to a path embedding, the paths are not necessarily simple. A linear combination of scaled negative edge vectors that corresponds to a simple cycle can be identically zero, so the coefficients of such linear combinations can be added to $W$ without affecting the product $VW$. However, it seems that adding cycles to a path embedding cannot reduce the 2-norm of $W$, so cycles are unlikely to improve support bounds.

4.2. Combinatorial Support Bounds. To bound $\sigma(A,B)$ using the Symmetric Support Lemma, we factor $A$ into $A = UU^T$ and $B$ into $B = VV^T$, find a matrix $W$ such that $U = VW$, and bound the 2-norm of $W$ from above. We have seen how to factor $A$ and $B$ (if they are weighted Laplacians) and how to construct an appropriate $W$ from an embedding of $G_A$ in $G_B$. We now show how to use combinatorial metrics of the path embedding to bound $\|W\|_2$.

Bounding the 2-norm directly is hard, because the 2-norm is not related in a simple way to the entries of $W$. But the 2-norm can be bounded using simpler norms, such as the Frobenius norm, the infinity norm, and the 1-norm (see Section 3.3). These simpler norms have natural and useful combinatorial interpretations when $W$ represents a path embedding.

To keep the notation and the definitions simple, we now assume that $A$ and $B$ are weighted Laplacians with zero row sums. We will show later how to deal with positive row sums. We also assume that $W$ corresponds to a path embedding $\pi$. The following definitions provide a combinatorial interpretation of these bounds.

Definition 4.5. The weighted dilation of an edge $(i_1,i_2)$ of $G_A$ in a path embedding $\pi$ of $G_A$ into $G_B$ is
\[ \mathrm{dilation}_\pi(i_1,i_2) = \sum_{\substack{(j_1,j_2) \\ (j_1,j_2) \in \pi(i_1,i_2)}} \sqrt{\frac{A_{i_1,i_2}}{B_{j_1,j_2}}}\,. \]
The weighted congestion of an edge $(j_1,j_2)$ of $G_B$ is
\[ \mathrm{congestion}_\pi(j_1,j_2) = \sum_{\substack{(i_1,i_2) \\ (j_1,j_2) \in \pi(i_1,i_2)}} \sqrt{\frac{A_{i_1,i_2}}{B_{j_1,j_2}}}\,. \]
The weighted stretch of an edge of $G_A$ is
\[ \mathrm{stretch}_\pi(i_1,i_2) = \sum_{\substack{(j_1,j_2) \\ (j_1,j_2) \in \pi(i_1,i_2)}} \frac{A_{i_1,i_2}}{B_{j_1,j_2}}\,. \]
The weighted crowding of an edge in $G_B$ is
\[ \mathrm{crowding}_\pi(j_1,j_2) = \sum_{\substack{(i_1,i_2) \\ (j_1,j_2) \in \pi(i_1,i_2)}} \frac{A_{i_1,i_2}}{B_{j_1,j_2}}\,. \]

Note that stretch is a summation of the squares of the quantities that constitute dilation, and similarly for crowding and congestion. Papers in the support-preconditioning literature are not consistent about these terms, so check the definitions carefully when you consult a paper that deals with congestion, dilation, and similar terms.

Lemma 4.6. ([9, 28]) Let $A$ and $B$ be weighted Laplacians with zero row sums, and let $\pi$ be a path embedding of $G_A$ into $G_B$. Then
\[ \sigma(A,B) \le \sum_{(i_1,i_2) \in G_A} \mathrm{stretch}_\pi(i_1,i_2), \]
\[ \sigma(A,B) \le \sum_{(j_1,j_2) \in G_B} \mathrm{crowding}_\pi(j_1,j_2), \]
\[ \sigma(A,B) \le \max_{(i_1,i_2)\in G_A} \mathrm{dilation}_\pi(i_1,i_2) \cdot \max_{(j_1,j_2)\in G_B} \mathrm{congestion}_\pi(j_1,j_2). \]
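These quantities are straightforward to compute from an explicit embedding. A sketch (ours), reusing the conventions of the earlier $W$-construction sketch: `paths` maps each edge of $G_A$ to a path in $G_B$, and the ratios $A_{i_1,i_2}/B_{j_1,j_2}$ are positive because both entries are negative off-diagonals:

```python
import numpy as np
from collections import defaultdict

def embedding_metrics(A, B, paths):
    """Weighted stretch of each G_A edge and weighted crowding of each
    G_B edge for a path embedding (Definition 4.5)."""
    stretch = {}
    crowding = defaultdict(float)
    for (i1, i2), p in paths.items():
        ratios = [A[i1, i2] / B[j1, j2] for j1, j2 in zip(p[:-1], p[1:])]
        stretch[(i1, i2)] = sum(ratios)
        for (j1, j2), rho in zip(zip(p[:-1], p[1:]), ratios):
            crowding[(min(j1, j2), max(j1, j2))] += rho
    return stretch, crowding

# Same example as before: G_B is the path 0-1-2, G_A adds the edge (0, 2).
B = np.array([[ 2., -2.,  0.], [-2.,  3., -1.], [ 0., -1.,  1.]])
A = np.array([[ 3., -2., -1.], [-2.,  3., -1.], [-1., -1.,  2.]])
paths = {(0, 1): (0, 1), (1, 2): (1, 2), (0, 2): (0, 1, 2)}
stretch, crowding = embedding_metrics(A, B, paths)
print(sum(stretch.values()))    # upper bound on sigma(A, B), Lemma 4.6
print(sum(crowding.values()))   # the same bound, summed over G_B instead
```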
We now describe one way to deal with matrices with some positive row sums. The matrix $B$ is a preconditioner; in many applications, it is not given, but rather constructed. One simple way to deal with positive row sums in $A$ is to define $\pi(i_1) = (i_1)$. That is, vertex vectors in the canonical incidence factor of $A$ are mapped into the same vertex vectors in the incidence factor of $B$. In other words, we construct $B$ to have exactly the same row sums as $A$. With such a construction, the rows of $W$ that correspond to vertex vectors in $U$ are columns of the identity.

Lemma 4.7. Let $A$ and $B$ be weighted Laplacians with the same row sums, let $\pi$ be a path embedding of $G_A$ into $G_B$, and let $\ell$ be the number of rows with positive row sums in $A$ and $B$. Then
\[ \sigma(A,B) \le \ell + \sum_{(i_1,i_2)\in G_A} \mathrm{stretch}_\pi(i_1,i_2), \]
\[ \sigma(A,B) \le \max\Bigl\{1,\ \max_{(i_1,i_2)\in G_A} \mathrm{dilation}_\pi(i_1,i_2)\Bigr\} \cdot \max\Bigl\{1,\ \max_{(j_1,j_2)\in G_B} \mathrm{congestion}_\pi(j_1,j_2)\Bigr\}. \]

Proof. Under the hypothesis of the lemma, the rows and columns of $W$ can be permuted into a block matrix
\[ W = \begin{pmatrix} W_Z & 0 \\ 0 & I_{\ell\times\ell} \end{pmatrix}, \]
where $W_Z$ represents the path embedding of the edges of $G_A$ into paths in $G_B$. The bounds follow from the structure of $W$ and from the proof of the previous lemma.

The sparse bounds on the 2-norm of a matrix lead to tighter combinatorial bounds.

Lemma 4.8. Let $A$ and $B$ be weighted Laplacians with zero row sums, and let $\pi$ be a path embedding of $G_A$ into $G_B$. Then
\[ \sigma(A,B) \le \max_{(j_1,j_2)\in G_B} \sum_{\substack{(i_1,i_2)\in G_A \\ (j_1,j_2)\in\pi(i_1,i_2)}} \mathrm{stretch}_\pi(i_1,i_2), \]
\[ \sigma(A,B) \le \max_{(i_1,i_2)\in G_A} \sum_{\substack{(j_1,j_2)\in G_B \\ (j_1,j_2)\in\pi(i_1,i_2)}} \mathrm{crowding}_\pi(j_1,j_2). \]

We can derive similar bounds for the other sparse 2-norm bounds.

4.3. Subset Preconditioners. To obtain a bound on $\kappa(A,B)$, we need a bound on both $\sigma(A,B)$ and $\sigma(B,A)$. But in one common case, bounding $\sigma(B,A)$ is trivial. Many support preconditioners construct $G_B$ to be a subgraph of $G_A$, with the same weights. That is, $V$ is constructed to have a subset of the columns of $U$. If we denote by $\bar{V}$ the matrix whose columns are the columns of $U$ that are not in $V$, we have $B = VV^T$ and
\[ A = UU^T = VV^T + \bar{V}\bar{V}^T = B + \bar{V}\bar{V}^T. \]
This immediately implies $x^TAx \ge x^TBx$ for any $x$, so $\lambda_{\min}(A,B) \ge 1$.
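To illustrate (our sketch): dropping an edge from $G_A$ yields a subset preconditioner $B$ with $\lambda_{\min}(A,B) \ge 1$, so only $\sigma(A,B)$ needs a combinatorial bound. The null-space deflation trick is the same one used in the earlier pencil sketch:

```python
import numpy as np
from scipy.linalg import eigh

def laplacian(n, edges):
    """Weighted Laplacian (zero row sums) from {(i, j): weight}."""
    L = np.zeros((n, n))
    for (i, j), w in edges.items():
        L[i, i] += w; L[j, j] += w; L[i, j] -= w; L[j, i] -= w
    return L

edges_A = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 1.0}
edges_B = {e: w for e, w in edges_A.items() if e != (0, 2)}   # drop one edge
A, B = laplacian(3, edges_A), laplacian(3, edges_B)

# Finite generalized eigenvalues: deflate the common null space (constants).
lam = eigh(A, B + np.ones((3, 3)) / 3.0)[0]
print(lam[lam > 1e-12].min())   # >= 1, as for any subset preconditioner
```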
4.4. Combinatorial Trace Bounds. The Preconditioned Conjugate Gradients (PCG) algorithm requires $\Theta(\sqrt{\kappa(A,B)})$ iterations only when the generalized eigenvalues are distributed poorly between $\lambda_{\min}(A,B)$ and $\lambda_{\max}(A,B)$. For example, if there are only two distinct eigenvalues, PCG converges in two iterations. It turns out that a bound on $\mathrm{trace}(A,B) = \sum_i \lambda_i(A,B)$ also yields a bound on the number of iterations, and in some cases this bound is sharper than the $O(\sqrt{\kappa(A,B)})$ bound.

Lemma 4.9. ([29]) The Preconditioned Conjugate Gradients algorithm converges to within a fixed tolerance in
\[ O\left( \sqrt[3]{\frac{\mathrm{trace}(A,B)}{\lambda_{\min}(A,B)}} \right) \]
iterations.

A variation of the Symmetric Support Lemma bounds the trace, and this leads to a combinatorial support bound.

Lemma 4.10. Let $A = UU^T \in \mathbb{R}^{n\times n}$ and let $B = VV^T \in \mathbb{R}^{n\times n}$, and assume that $\mathrm{null}(B) = \mathrm{null}(A)$. Then
\[ \mathrm{trace}(A,B) = \min\bigl\{ \|W\|_F^2 \bigm| U = VW \bigr\}. \]

Proof. Let $W$ be a matrix such that $U = VW$. For every $x \notin \mathrm{null}(B)$ we can write
\[ \frac{x^TAx}{x^TBx} = \frac{x^T V W W^T V^T x}{x^T V V^T x} = \frac{y^T W W^T y}{y^T y}, \]
where $y = V^T x$. Let $S \subseteq \mathbb{R}^n$ be a subspace orthogonal to $\mathrm{null}(B)$ of dimension $k$, and define $T_S = \{ V^T x \mid x \in S \}$. Because $S$ is orthogonal to $\mathrm{null}(B) = \mathrm{null}(V^T)$, we have $\dim(T_S) = k$. The sets $\{ x^TAx/x^TBx \mid x \in S \}$ and $\{ y^TWW^Ty/y^Ty \mid y \in T_S \}$ are identical, so their minima are equal as well. The set of subspaces $\{ T_S \mid \dim(S) = k,\ S \perp \mathrm{null}(B) \}$ is a subset of the set of all subspaces of dimension $k$; therefore
\[
\max_{\substack{\dim(S)=k \\ S \perp \mathrm{null}(B)}} \ \min_{\substack{x\in S \\ x \ne 0}} \frac{x^TUU^Tx}{x^TVV^Tx}
= \max_{\substack{\dim(S)=k \\ S\perp \mathrm{null}(B)}} \ \min_{\substack{y \in T_S \\ y \ne 0}} \frac{y^TWW^Ty}{y^Ty}
\le \max_{\dim(T)=k} \ \min_{\substack{y\in T \\ y\ne 0}} \frac{y^TWW^Ty}{y^Ty}.
\]
According to the Courant-Fischer Minimax Theorem,
\[ \lambda_{n-k+1}(WW^T) = \max_{\dim(T)=k} \ \min_{\substack{y\in T \\ y\ne 0}} \frac{y^TWW^Ty}{y^Ty}, \]
and by the generalization of Courant-Fischer in [3],
\[ \lambda_{n-k+1}(UU^T, VV^T) = \max_{\substack{\dim(S)=k \\ S \perp \mathrm{null}(B)}} \ \min_{\substack{x \in S \\ x\ne 0}} \frac{x^TUU^Tx}{x^TVV^Tx}. \]
Therefore, for $k = 1, \ldots, \mathrm{rank}(B)$ we have $\lambda_{n-k+1}(A,B) \le \lambda_{n-k+1}(WW^T)$, so $\mathrm{trace}(A,B) \le \mathrm{trace}(WW^T) = \|W\|_F^2$. This shows that $\mathrm{trace}(A,B) \le \min\{ \|W\|_F^2 \mid U = VW \}$. According to Lemma 3.2, the minimum is attained at $W = V^+U$.

To the best of our knowledge, Lemma 4.10 is new. It was inspired by a specialized bound on the trace from [29]. The next theorem generalizes the result from [29]. The proof is trivial given Definition 4.5 and Lemma 4.10.

Theorem 4.11. Let $A$ and $B$ be weighted Laplacians with the same row sums, and let $\pi$ be a path embedding of $G_A$ into $G_B$. Then
\[ \mathrm{trace}(A,B) \le \sum_{(i_1,i_2)\in G_A} \mathrm{stretch}_\pi(i_1,i_2), \]
\[ \mathrm{trace}(A,B) \le \sum_{(j_1,j_2)\in G_B} \mathrm{crowding}_\pi(j_1,j_2). \]
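To tie the bounds to practice, the following sketch (ours, anticipating the constructions surveyed in the next section) builds a maximum spanning tree of $G_A$ as a subset preconditioner, in the spirit of Vaidya's construction, and uses it in PCG. We obtain a maximum spanning tree by negating the weights before calling SciPy's minimum_spanning_tree; the random graph, grounding of vertex 0 (to make both matrices definite), and iteration counting are all assumptions of the illustration:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
n = 200
adj = np.triu(rng.random((n, n)) * (rng.random((n, n)) < 0.05), 1)
for i in range(n - 1):
    adj[i, i + 1] += 0.5          # a path through all vertices: connectivity
adj = adj + adj.T

def laplacian(adjw):
    return np.diag(adjw.sum(axis=1)) - adjw

# Maximum spanning tree of G_A = minimum spanning tree of negated weights.
T = -minimum_spanning_tree(csr_matrix(-adj)).toarray()
A = laplacian(adj);      A[0, 0] += 1.0   # ground vertex 0: both SPD
B = laplacian(T + T.T);  B[0, 0] += 1.0

c = cho_factor(B)                 # apply B^{-1} via a Cholesky factorization
M = LinearOperator((n, n), matvec=lambda r: cho_solve(c, r))
b = rng.normal(size=n)

counts = []
for prec in (None, M):
    k = [0]
    cg(csr_matrix(A), b, M=prec, callback=lambda xk: k.__setitem__(0, k[0] + 1))
    counts.append(k[0])
print(counts)   # [plain CG, tree-preconditioned CG] iteration counts
```

On examples like this the tree preconditioner usually reduces the iteration count noticeably; the chapter's stretch and trace bounds, not this anecdote, are what guarantee the behavior in general.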
5. Combinatorial Preconditioners

The earliest graph algorithm to construct a preconditioner was proposed by Vaidya [31] (see also [4]). He proposed to use a so-called augmented maximum spanning tree of a weighted Laplacian as a preconditioner. This is a subset preconditioner that drops some of the edges in $G_A$ while maintaining the weights of the remaining edges. When $B^{-1}$ is applied using a sparse Cholesky factorization, this construction leads to a total solution time of $O(n^{7/4})$ for Laplacians with a bounded degree and $O(n^{6/5})$ when the graph is planar. For regular unweighted meshes in 2 and 3 dimensions, special constructions are even more effective [20]. Vaidya also proposed to use recursion to apply $B^{-1}$, but without a rigorous analysis; in this scheme, sparse Gaussian elimination steps are performed on $B$ as long as the reduced matrix has a row with only two nonzeros. At that point, the reduced system is solved by constructing a graph preconditioner, and so on. Vaidya's preconditioners are quite effective in practice [12]. A generalization to complex matrices proved effective in handling a certain kind of ill conditioning [19].

Vaidya's bounds used a congestion-dilation product. The research on subset graph preconditioners continued with an observation that the sum of the stretch can also yield a spectral bound, and that so-called low-stretch trees would give better worst-case bounds than the maximum spanning trees that Vaidya used [8]. Constructing low-stretch trees is more complicated than constructing maximum spanning trees. When the utility of low-stretch trees was discovered, one algorithm for constructing them was known [1]; better algorithms were discovered later [15]. Low-stretch trees can also be augmented, and this leads to preconditioners that can solve any Laplacian system with $m$ nonzeros in $O(m^{4/3})$ time, up to some additional polylogarithmic factors [26, 29]. By employing recursion, the theoretical running time can be reduced to close to linear in $m$ [27]. An experimental comparison of simplified versions of these sophisticated algorithms to Vaidya's algorithm did not yield a conclusive result [30]. Heuristic subset graph preconditioners have also been proposed [16].

Gremban and Miller proposed a class of combinatorial preconditioners called support-tree preconditioners [18, 17]. Their algorithms construct a weighted tree whose leaves are the vertices of $G_A$. Therefore, the graph $G_T$ of the preconditioner $T$ has additional vertices. They show that applying $T^{-1}$ to extensions of residuals is equivalent to using a preconditioner $B$ that is the Schur complement of $T$ with respect to the original vertices. Bounding the condition number $\kappa(A,B)$ is more difficult than bounding the condition number of subset preconditioners, because the Schur complement is dense. More effective versions of this strategy have been proposed later [23, 21, 22].

Efforts to generalize these constructions to matrices that are not weighted Laplacians followed several paths. Gremban showed how to transform a linear system whose coefficient matrix is a signed Laplacian to a linear system of twice the size whose matrix is a weighted Laplacian. The coefficient matrix is a 2-by-2 block matrix with diagonal blocks with the same sparsity pattern as the original matrix $A$ and with identity off-diagonal blocks. A different approach is to extend Vaidya's construction to signed graphs [6]. The class of symmetric matrices with a symmetric factorization $A = UU^T$ where the columns of $U$ have at most 2 nonzeros contains not only signed graphs but also gain graphs, which are not diagonally dominant [7]; it turns out that these matrices can be scaled to diagonal dominance, which allows graph preconditioners to be applied to them [14].

The matrices that arise in finite-element discretizations of elliptic partial differential equations (PDEs) are positive semidefinite, but in general they are not diagonally dominant. However, when the PDE is scalar (e.g., describes a problem in electrostatics), the matrices can sometimes be approximated by diagonally dominant matrices. In this scheme, the coefficient matrix $A$ is first approximated by a diagonally-dominant matrix $D$, and then $G_D$ is used to construct the graph $G_B$ of the preconditioner $B$. For large matrices of this class, the first step is expensive, but because finite-element matrices have a natural representation as a sum of very sparse matrices, the diagonally-dominant approximation can be constructed for each term in the sum separately. There are at least three ways to construct these approximations: during the finite-element discretization process [10], algebraically [2], and geometrically [32]. A slightly modified construction that can accommodate terms that do not have a close diagonally-dominant approximation works well in practice [2].

Another approach for constructing combinatorial preconditioners for finite-element problems is to rely on a graph that describes the relations between neighboring elements. This graph is the dual of the finite-element mesh; elements in the mesh are the vertices of the graph. Once the graph is constructed, it can be sparsified much like subset preconditioners. This approach, which is applicable to vector problems like linear elasticity, was proposed in [24]; this paper also showed how to construct the dual graph algebraically and how to construct the finite-element problem that corresponds to the sparsified dual graph. The first effective preconditioner of this class was proposed in [13]. It is not yet known how to weigh the edges of the dual graph effectively, which limits the applicability of this method. However, in applications where there is no need to weigh the edges, the method is effective [25].

References

[1] Noga Alon, Richard M. Karp, David Peleg, and Douglas West. A graph-theoretic game and its application to the k-server problem. SIAM Journal on Computing, 24:78-100, 1995.
[2] Haim Avron, Doron Chen, Gil Shklarski, and Sivan Toledo. Combinatorial preconditioners for scalar elliptic finite-element problems. SIAM Journal on Matrix Analysis and Applications, 31(2):694-720, 2009.
[3] Haim Avron, Esmond Ng, and Sivan Toledo. Using perturbed QR factorizations to solve linear least-squares problems. SIAM Journal on Matrix Analysis and Applications, 31(2):674-693, 2009.
[4] Marshall Bern, John R. Gilbert, Bruce Hendrickson, Nhat Nguyen, and Sivan Toledo. Support-graph preconditioners. SIAM Journal on Matrix Analysis and Applications, 27:930-951, 2006.
[5] Dennis S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas with Applications to Linear Systems Theory. Princeton University Press, 2005.
[6] Erik G. Boman, Doron Chen, Bruce Hendrickson, and Sivan Toledo. Maximum-weight-basis preconditioners. Numerical Linear Algebra with Applications, 11:695-721, 2004.
[7] Erik G. Boman, Doron Chen, Ojas Parekh, and Sivan Toledo. On the factor-width and symmetric H-matrices. Linear Algebra and its Applications, 405:239-248, 2005.
[8] Erik G. Boman and Bruce Hendrickson. On spanning tree preconditioners. Unpublished manuscript, Sandia National Laboratories, 2001.
[9] Erik G. Boman and Bruce Hendrickson. Support theory for preconditioning. SIAM Journal on Matrix Analysis and Applications, 25(3):694-717, 2003.
[10] Erik G. Boman, Bruce Hendrickson, and Stephen Vavasis. Solving elliptic finite element systems in near-linear time with support preconditioners. SIAM Journal on Numerical Analysis, 46(6):3264-3284, 2008.
[11] Doron Chen, John R. Gilbert, and Sivan Toledo. Obtaining bounds on the two norm of a matrix from the splitting lemma. Electronic Transactions on Numerical Analysis, 21:28-46, 2005.
[12] Doron Chen and Sivan Toledo. Vaidya's preconditioners: Implementation and experimental study. Electronic Transactions on Numerical Analysis, 16:30-49, 2003.
[13] Samuel I. Daitch and Daniel A. Spielman. Support-graph preconditioners for 2-dimensional trusses. March 2007.
[14] Samuel I. Daitch and Daniel A. Spielman. Faster approximate lossy generalized flow via interior point algorithms. In STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 451-460, New York, NY, USA, 2008. ACM.
[15] Michael Elkin, Yuval Emek, Daniel A. Spielman, and Shang-Hua Teng. Lower-stretch spanning trees. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pages 494-503, Baltimore, MD, 2005. ACM Press.
[16] A. Frangioni and C. Gentile. New preconditioners for KKT systems of network flow problems. SIAM Journal on Optimization, 14:894-913, 2004.
[17] Keith D. Gremban. Combinatorial Preconditioners for Sparse, Symmetric, Diagonally Dominant Linear Systems. PhD thesis, School of Computer Science, Carnegie Mellon University, October 1996. Available as Technical Report CMU-CS-96-123.
[18] Keith D. Gremban, Gary L. Miller, and Marco Zagha. Performance evaluation of a new parallel preconditioner. In Proceedings of the 9th International Parallel Processing Symposium, pages 65-69. IEEE Computer Society, 1995. A longer version is available as Technical Report CMU-CS-94-205, Carnegie-Mellon University.
[19] Victoria E. Howle and Stephen A. Vavasis. An iterative method for solving complex-symmetric systems arising in electrical power modeling. SIAM Journal on Matrix Analysis and Applications, 26(4):1150-1178, 2005.
[20] Anil Joshi. Topics in Optimization and Sparse Linear Systems. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1997.
[21] Ioannis Koutis and Gary L. Miller. A linear work, $O(n^{1/6})$ time, parallel algorithm for solving planar Laplacians. In Nikhil Bansal, Kirk Pruhs, and Clifford Stein, editors, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2007), New Orleans, Louisiana, USA, January 7-9, 2007, pages 1002-1011. SIAM, 2007.
[22] Ioannis Koutis and Gary L. Miller. Graph partitioning into isolated, high conductance clusters: theory, computation and applications to preconditioning. In SPAA '08: Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, pages 137-145, New York, NY, USA, 2008. ACM.
[23] Bruce M. Maggs, Gary L. Miller, Ojas Parekh, R. Ravi, and Shan Leung Maverick Woo. Finding effective support-tree preconditioners. In SPAA '05: Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pages 176-185. ACM Press, 2005.
[24] Gil Shklarski and Sivan Toledo. Rigidity in finite-element matrices: Sufficient conditions for the rigidity of structures and substructures. SIAM Journal on Matrix Analysis and Applications, 30(1):7-40, 2008.
[25] Gil Shklarski and Sivan Toledo. Computing the null space of finite element problems. Computer Methods in Applied Mechanics and Engineering, 198(37-40):3084-3095, August 2009.
[26] Daniel A. Spielman and Shang-Hua Teng. Solving sparse, symmetric, diagonally-dominant linear systems in time $O(m^{1.31})$. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 416-427, October 2003.
[27] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 81-90, New York, NY, USA, 2004. ACM Press.
[28] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. Unpublished manuscript available online at http://arxiv.org/abs/cs/0607105, 2009.
[29] Daniel A. Spielman and Jaeoh Woo. A note on preconditioning by low-stretch spanning trees. March 2009.
[30] Uri Unger. An experimental evaluation of combinatorial preconditioners. Master's thesis, Tel-Aviv University, July 2007.
[31] Pravin M. Vaidya. Solving linear equations with symmetric diagonally dominant matrices by constructing good preconditioners. Unpublished manuscript. A talk based on this manuscript was presented at the IMA Workshop on Graph Theory and Sparse Matrix Computations, Minneapolis, October 1991.
[32] Meiqiu Wang and Vivek Sarin. Parallel support graph preconditioners. In Yves Robert, Manish Parashar, Ramamurthy Badrinath, and Viktor K. Prasanna, editors, High Performance Computing - HiPC 2006, volume 4297 of Lecture Notes in Computer Science, pages 387-398. Springer, Berlin, Heidelberg, 2006.