Fast Poisson Solvers for Thermal Analysis Haifeng Qian Sachin S. Sapatnekar IBM T. J. Watson Research Center Yorktown Heights, NY [email protected] Department of Electrical and Computer Engineering University of Minnesota Minneapolis, MN [email protected] Abstract— Accurate and efficient thermal analysis for a VLSI chip is crucial, both for sign-off reliability verification and for design-time circuit optimization. To determine an accurate temperature profile, it is important to simulate a die together with its thermal mounts: this requires solving Poisson's equation on a non-rectangular 3D domain. This paper presents a class of eigendecomposition-based fast Poisson solvers (FPS) for chiplevel thermal analysis. We start with a solver that solves a rectangular 3D domain with mixed boundary conditions in O(NlogN) time, where N is the dimension of the finite-difference matrix. Then we reveal, for the first time in the literature, a strong relation between fast Poisson solvers and Green-functionbased methods. Finally, we propose an FPS method that leverages the preconditioned conjugate gradient method to solve non-rectangular 3D domains efficiently. We demonstrate that this approach solves a system of dimension 5.33e6 in only 11 Conjugate Gradient iterations, with a runtime of 171 seconds, a 6X speedup over the popular ICCG solver. Keywords-thermal analysis; fast Poisson solver; Green function I. INTRODUCTION With technology scaling and on-chip power density increasing, the ability to obtain an accurate full-chip temperature profile under various workload scenarios is a necessity during sign-off performance and reliability verification. Furthermore, accurate and efficient thermal analysis is an indispensable engine for many design-time circuit optimizations. Thermally-driven optimizations require a large number of thermal solves with different power dissipation distributions, and demand extremely high efficiency from the thermal analysis engine. Fast chip-level thermal analysis has been the subject of many prior works [5][7][12][14]. Most focus on solving Poisson's equation on a die, which is a 3D rectangular domain with layered materials. In a typical model, the boundary conditions on five surfaces of this domain are assumed to be adiabatic [7][14] ─ this is in the form of the Neumann boundary condition, which specifies temperature gradient (zero for adiabatic surfaces) along the direction that is perpendicular to a surface. The bottom surface of this domain is assumed to be convective, which is the Dirichlet boundary condition specifying temperature values on the bottom surface of the model [7][14], often assumed to be the ambient. A multigrid approach was proposed in [7], which uses finite-difference discretization and a geometric multigrid solver. Several Green-function-based algorithms were proposed in [12][14], which use boundary element method and leverage the efficiency of Fast Fourier Transform (FFT). These Green-functionbased algorithms have a strong relation to the works of [3][4][8] on the problems of substrate electrical analysis, which also involves Poisson's equation on a 3D rectangular domain with layered materials. One limitation of solving a rectangular domain, the die only, is the simplistic assumption about bottom surface, where the temperatures are far from uniform and vary greatly depending on packaging structures underneath. A typical configuration uses a copper heat spreader, a copper heat sink and thermal interface materials layers. It is possible to approximate these thermal mounts with an effective heat This work was supported in part by the NSF under award CCF-0634802. transfer coefficient [14], but such an approximation may incur substantial error ─ this was shown in [5] to be tens of degrees differences and distorted shapes of the temperature profiles. Hence it is important to simulate a die with its thermal mounts, which requires solving Poisson's equation on a non-rectangular 3D domain [2][5]. Fig. 1 illustrates the pyramid-shaped model in [5]. This model will be used in Section 4 and Section 5; however, our solver in Section 4 is not limited to solving pyramid-shaped domains, and can handle other non-rectangular models, e.g., those that include fins under the sink. Figure 1. The pyramid thermal model from [5]. The difficulty of a non-rectangular domain is that an analytical form is no longer available for the Green function. Therefore many existing techniques are no longer applicable, and an accurate solution of the finite difference matrix would require generic methods, for example, the finite difference method with the popular Incomplete Cholesky preconditioned Conjugate Gradient (ICCG) solver. An imaging technique was proposed in [5] as an approximate alternative. This paper studies eigendecomposition-based fast Poisson solvers for chip-level thermal analysis, and throughout this paper we will use FPS as a shorthand notation for these solvers. The idea of FPS has been known for a long time [9][11], but it has found limited usage in practical applications. In the literature, e.g. [9][10], FPS is often presented in the context of a 2D-grid Laplacian matrix with uniform diagonal entries, which corresponds to a finite-difference discretization of a 2D rectangular homogeneous domain with Dirichlet conditions on all four boundaries. It is rarely discussed with respect to 3D domains and/or more general boundary conditions [11]. Section 2 demonstrates a FPS solver for 3D rectangular thermal models with mixed boundary conditions as in [7][14]. The conventional wisdom has been that FPS solvers were an entirely different class of solvers from other known solvers. Section 3 proves, for the first time, a strong relation ─ in fact, an equivalence under certain conditions ─ between FPS and Green-function-based methods. Section 4 presents FPS-Preconditioned Conjugate Gradient method that solves non-rectangular domains efficiently. II. 3D FPS In this section, we study the use of FPS to solve a 3D finite difference matrix, Ax=b, of a rectangular domain with layered materials and mixed boundary conditions. To do so, we expand the applicability of FPS from the well-known 2D domains with Dirichlet conditions [9]. The following is a relaxed sufficient condition for FPS. ⎛ B1,1 ⎜ ⎜ B 2,1 A=⎜ # ⎜ ⎜B ⎝ K ,1 B1, 2 B 2, 2 # B K ,2 " B1, K " B2, K % # " BK ,K ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ (1) such that Bi , j , ∀i, j are square matrices with the same dimension M ; ∀i , Bi , j , ∀i, j have the same eigenvectors. Let a and b be the dimensions of the 3D domain in x and y directions. We assume even-spaced discretization along the x-axis into Dx segments, even-spaced discretization along the y-axis into Dy segments, and arbitrary discretization along the z-axis into Dz segments. The result is Dx·Dy·Dz rectangular cells. For 0≤m≤ Dx-1, 0≤n≤ Dy-1, 0≤l≤ Dz-1, we use the notion cell (m,n,l) to refer to the cell formed by the mth segment of the x axis, the nth segment of the y axis, and the lth segment of the z axis. Let Tm,n,l be the average temperature of cell (m,n,l), and let Pm,n,l be the total power inside cell (m,n,l). We assume a natural ordering in Ax=b, in other words: (2) x = T −T m+ n⋅Dx +l ⋅Dx ⋅Dy m,n ,l (3) where Ta is the ambient temperature at the bottom of the heat sink. As in the thermal models in [7][14], we have Neumann conditions on five surfaces, and Dirichlet boundary condition on the z=0 surface. Fitting matrix A to (1) with M=Dx, K=Dy·Dz, we have special forms for Bi,j: Bi ,i = α i ⋅ I + β i ⋅ G ⎛ −1 − 1 0 " 0 0 0 ⎞ ⎟ ⎜ ⎜ −1 0 −1 " 0 0 0 ⎟ ⎜ 0 −1 0 % 0 0 0 ⎟ ⎟ ⎜ where α i and β i are scalars, G = ⎜ # # % % % # # ⎟ ⎜ 0 0 0 % 0 −1 0 ⎟ ⎟ ⎜ ⎜ 0 0 0 " − 1 0 − 1⎟ ⎟ ⎜ ⎝ 0 0 0 " 0 − 1 − 1⎠ ∀i ≠ j, Bi , j = γ i , j ⋅ I (4) where γ i , j is a scalar. Note that the above matrices have the same eigenvectors, which are eigenvectors of G; therefore condition (1) is satisfied. Also note that the Bi,i formulation in the above equation is different from the well known ones in [9][10]; this is due to the fact that we now have Neumann conditions on the x=0 and x=a surfaces, as opposed to Dirichlet conditions in [9][10]. It can be verified that the eigenvalues and eigenvectors of G are: ∀1 ≤ i ≤ M , ⎛1⎞ ⎜ ⎟ 1 ⎜1⎟ ⋅⎜ ⎟ M # ⎜⎜ ⎟⎟ ⎝1⎠ q1 = ⎛ (i − 1) ⋅ π ⎞ ⎟ ⎝ M ⎠ ⎛ ⎛ (i − 1) ⋅ π ⎞ cos⎜ ⎜ ⎟ ⎝ 2M ⎠ ⎜ ⎜ ⎛ 3 ⋅ (i − 1) ⋅ π ⎞ 2 ⎜ cos⎜ ⎟ ∀i > 1, q i = ⋅ 2M ⎠ ⎝ M ⎜ # ⎜ ⎜ cos⎛⎜ (2 M − 1) ⋅ (i − 1) ⋅ π ⎜ 2M ⎝ ⎝ λi = −2 cos⎜ (5) q2 " qM ) ⎛ Λ1,1 ⎜ ⎜ Λ 2,1 ⎜ # ⎜ ⎜Λ ⎝ K ,1 0 ⎞ ⎛ Q 0 0 0 ⎞⎛ Q T ⎟ ⎜ ⎟⎜ 0 ⎟ ⎜ 0 Q 0 0 ⎟⎜ 0 ⎟ A⎜ 0 0 % 0 ⎟⎜ 0 0 % 0 ⎟ ⎜ ⎟⎜ 0 0 Q T ⎟⎠ ⎜⎝ 0 0 0 Q ⎟⎠⎜⎝ 0 Λ1,2 " Λ1, K ⎞ ⎟ Λ 2 , 2 " Λ 2, K ⎟ ~ ~ ⋅x = b # % # ⎟ ⎟ Λ K , 2 " Λ K , K ⎟⎠ 0 0 0 QT 0 QT (6) ⎛ QT ⎜ ⎜ 0 where ~ x =⎜ ⎜ 0 ⎜ 0 ⎝ 0 QT 0 0 ⎛ QT 0 ⎞ ⎜ ⎟ 0 ⎟ ~ ⎜ 0 ⎟x , b = ⎜ % 0 ⎟ ⎜ 0 ⎜ 0 0 Q T ⎟⎠ ⎝ 0 0 ⎛ QT 0 ⎞ ⎟ ⎜ 0 ⎟ ⎜ 0 ⎟x = ⎜ % 0 ⎟ ⎜ 0 ⎜ 0 0 Q T ⎟⎠ ⎝ 0 0 0 QT 0 0 0 ⎞ ⎟ 0 ⎟ ⎟b % 0 ⎟ T⎟ 0 Q ⎠ 0 0 (7) 0 0 0 QT 0 0 0 ⎞ ⎟ 0 ⎟ ⎟b % 0 ⎟ 0 Q T ⎟⎠ 0 0 where the Λi,j matrices take the following diagonal form: (9) The advantage of FPS lies in the fact that Q has a special structure ~ such that a fast way exists to evaluate b from b by (7), and x from ~x by (9). In fact, for any vector v, the computation of QTv and Qv can be performed by FFT and with O(M·logM) complexity. Let us investigate the complexity of FPS on this thermal analysis. ~ The cost of computing b from b by equation (7), using FFT, is O(K·M·logM) = O(Dx·Dy·Dz·logDx). The cost of computing the final solution x by equation (9), using FFT, is also O(Dx·Dy·Dz·logDx). The remaining question is the cost of the middle step, solving equation (7) as Dx separate systems, each with dimension Dy·Dz. Studying (4)(7)(8) in more details, it can be shown that each of these Dy·Dz matrices has a 2D y-z grid structure and also satisfies conditions (1)(4). Therefore, we can apply the FPS procedure on each of them on the y direction (note the Neumann conditions on y=0 and y=b surfaces). It again is a three step approach: the first and third steps use the FFT, with complexity O(Dy·Dz·logDy), and the middle step now solves Dy separate systems, each with dimension Dz. Each of them is now a 1D structure in z direction, which is a tridiagonal matrix and can be solved in linear time. Therefore, the cost of solving a system with dimension Dy·Dz is O(Dy·Dz·logDy) + O(Dy·Dz) + O(Dy·Dz·logDy), which is bounded by O(Dy·Dz·logDy). Therefore, the overall cost of solving the original system Ax=b is O(Dx·Dy·Dz·logDx) + Dx·O(Dy·Dz·logDy), which is bounded by O(N·logN), where N=Dx·Dy·Dz is the overall dimension. RELATION BETWEEN FPS AND GREEN-FUNCTION-BASED METHODS To solve Ax=b, let us do the following transformation. ⎛ QT ⎜ ⎜ 0 ⎜ ⎜ 0 ⎜ 0 ⎝ (8) " % ⎛Q 0 0 ⎞ ⎟ ⎜ x = ⎜ 0 % 0 ⎟~ x ⎜ 0 0 Q⎟ ⎠ ⎝ III. ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎞⎟ ⎟⎟ ⎠⎠ Let matrix Q (which is orthonormal, i.e., QTQ = QQT = I) be: Q = (q 1 ∀i ≠ j , 0 ⎞ ⎟ 0 ⎟ # ⎟ ⎟ " λM ⎟⎠ " Now that the new left-hand-side matrix in (7) has a block form where each block is a diagonal submatrix, such an overall system can be decoupled into M separate sets of linear equations, each of which has dimension K and can be solved independently. Note that, if these sets of equations also satisfy condition (1), each of them can again be transformed and further decoupled into smaller sets of equations. After ~x is obtained, solution x to the original system is simply: a bm+ n⋅Dx +l⋅Dx ⋅Dy = Pm,n,l ∀i, ⎛ λ1 0 ⎜ ⎜ 0 λ2 Λ i , i = Q Bi , iQ = α i ⋅ I + β i ⋅ ⎜ # # ⎜ ⎜0 0 ⎝ Λ i , j = Q T Bi , j Q = γ i , j ⋅ I T This section demonstrates a strong relation ─ in fact equivalence under certain conditions ─ between the 3D FPS from Section 2 and Green-function-based solvers [4][8][14]. For clarity, we only compare with the full-chip thermal analysis method, Algorithm II, in [14]. The relation to other variations of Green-function-based methods follows similar derivations. For the rest of the paper, we will use GFS as a short-hand notation for Green-function-based solvers. For clarity and without loss of generality, we assume, as in [14], that heat sources are located on discrete horizontal planes. Also, like [14], we assume that each plane is divided into Dx×Dy rectangular grid cells of equal size, and the power density inside each cell is uniform. Therefore, the power density distribution function is in the following piecewise constant form, nonzero only at a finite set of z’ values. P ( x ′, y ′, z ′) = ⎛ Dx −1D y −1 a ∑ ∑ P (z ′) ⋅ Θ⎜⎜ x′ − (m + 0.5) D m ,n ⎝ a b ⎧ ′ 1 , , y′ < x < ⎪ where Θ( x ′, y ′) = ⎨ 2Dx 2D y ⎪⎩ 0, otherwise m=0 n =0 x , y ′ − (n + 0.5) b Dy ⎞ ⎟ ⎟ ⎠ (10) where again a and b are the dimensions of the domain in x and y directions. Note that the discreteness assumption about z axis is not essential, and, on removing it, the derivation still stands, only with constants Pm,n(z’) replaced by more complex forms. A. FPS formulations This section gives detailed FPS formulations based on the principles of Section 2, to be compared with GFS ones in Section 3.2. Let us use the same formation Ax=b as (2)(3), and the same x and y discretization as in (10). Let the z coordinates of cell centers be z0, z1, …, zDz-1. For clarity of presentation, we further assume that heat source planes in (10) coincide these z coordinates. In other words: Pm,n ( z ′) = 0, P ( x′, y ′, z ′) = 0, for z '∉ {z1 , z 2 ," z D −1 } (11) And we have the following relation between (3) and (10). z bm+n⋅Dx +l⋅Dx ⋅Dy = Pm,n,l = Pm,n (zl ) ⋅ a⋅b Dx ⋅ Dy (12) The first step of FPS is to apply the transformation of (7) with M=Dx and K=Dy·Dz, and compute the transformed right hand side: ⎛ Q DT 0 0 ⎞ (13) ⎟ ⎜ x ~ b=⎜ 0 ⎜ ⎝ 0 % 0 ⎟b ⎟ 0 Q DT x ⎠ where Q denotes the eigenvector matrix from (5)(6) with M=Dx. The new system (7) is then decoupled into Dx independent systems of linear equations. The ith system is, for 0≤i≤Dx-1: ~ ~ (14) Ai ~xi = b i ~ where A is a submatrix of the transformed left hand side in (7), ~xi is i ~ ~ a part of ~x , and b i is a part of b , such that % 0 ∀0 ≤ r ≤ D y ⋅ D z − 1 0 QDT y ~ ⎟b i ⎟⎟ ⎠ where Q DT denotes the matrix from (5)(6) with M=Dy. (14) becomes y ⎛ QDT ⎜ y ⎜ 0 ⎜⎜ 0 ⎝ 0 ⎞⎟ ⎛⎜ QDy ~ % 0 ⎟ Ai ⎜ 0 ⎜ T ⎟ ⎟ 0 QDy ⎠ ⎜⎝ 0 0 ~ 0 ⎞⎟ ~ ~ ~ xi = b i % 0 ⎟~ ⎟ ⎟ 0 QDy ⎠ 0 ⎛ QDT ⎜ y ~ ~ , where x i = ⎜ 0 ⎜⎜ 0 ⎝ 0 ⎞⎟ xi % 0 ⎟~ ⎟ 0 QDTy ⎟⎠ 0 (17) Vector ~xi can be solved from (17) in O(Dy·Dz) time because it can be decoupled into Dy independent systems, each of which is a ~ tridiagonal matrix of dimension Dz. After obtaining ~xi , we compute ⎛ QD (18) 0 0 ⎞⎟ ⎜ ~ xi = ⎜ 0 ⎜⎜ 0 ⎝ y % 0 1 =χj ⋅ ⋅ Dy D y −1 = ~ 0 ⎟~ xi ⎟ ⎟ QD y ⎠ Dx 1.5 ( = χj ⋅ ) ( D −1 1 y jπ (n + 0.5) ~ ⋅ ∑ cos ⋅ bi , Dy ⋅l + n D y n =0 Dy ) ( ⎛ bDx ⋅n + Dx ⋅ D y ⋅l ⎞ ⎜ ⎟ ⎜ b Dx ⋅n + Dx ⋅D y ⋅l +1 ⎟ ⋅⎜ ⎟ # ⎜ ⎟ ⎜ bD ⋅n+ D D ⋅l + D −1 ⎟ ⎝ x x⋅ y x ⎠ ) T (20) D −1 ⋅ Dy 1.5 Dx −1D y −1 ⋅ ∑ ∑ Pm , n (z l ) ⋅ cos m=0 n =0 iπ (m + 0.5) jπ (n + 0.5) ⋅ cos Dx Dy j=0 ⎧1, where χ j = ⎨ 2 , j>0 ⎩ Similarly, we can expand the last two steps of FPS, and by (2)(5)(15)(18)(19), the average temperature of cell (m,n,l) is: T m , n , l = T a + x m + n⋅ D x + l ⋅ D x ⋅ D y iπ (m + 0.5) ~ 1 Dx −1 ⋅ ∑ χ i ⋅ cos ⋅ x i + n⋅ D x + l ⋅ D x ⋅ D y D x i =0 Dx = Ta + (21) iπ (m + 0.5) ~ 1 Dx −1 = Ta + ⋅ ∑ χ i ⋅ cos ⋅ x i,n +l⋅Dy D x i =0 Dx D −1 D x −1 y ~ iπ ( m + 0.5) jπ (n + 0.5) 1 ⋅ ∑ ∑ χi ⋅ χ j ⋅ ~ ⋅ cos x i , j + l ⋅D y ⋅ cos Dx Dy D x D y i =0 j =0 where χj is as in (20). B. GFS formulations This section is largely based on Algorithm II in [14]. The GFS approach evaluates the following temperature calculation formula: a b z′ 0 0 T (x, y, z ) = Ta + ∑ ∫ dx ′∫ dy ′G (x, y, z , x′, y ′, z ′)P( x′, y ′, z ′) (22) where G(.) is the Green function. In the case of Neumann conditions on 4 side surfaces (x=0, y=0, x=a, y=b), by [8][14], it is: ∞ ∞ iπx jπy iπx′ jπy ′ (23) G ( x, y, z , x′, y ′, z ′) = ∑∑ Ci , j ( z, z ′) ⋅ cos i =0 j =0 a ⋅ cos b ⋅ cos a ⋅ cos b and [8] provided ways to calculate function Ci,j(.) analytically. Suppose we write (10) in the following frequency-domain form. ∞ ∞ iπx′ jπy ′ (24) P ( x′, y ′, z ′) = ∑∑ pi , j (z′) ⋅ cos i =0 j = 0 a ⋅ cos b By (10) and (24), we can derive pi , j ( z ′) = a ⋅ b ⋅ μ i , j ⋅ ∑ ∑ Pm ,n ( z ′) ⋅ cos m= 0 n =0 (19) Finally, the temperature values at each cell are mapped from (2). Overall, the FPS procedure can be summarized as follows. ~ 1. Calculate vector b from Pm,n(z’) by (12)(13), using FFT. ~ ~ 2. Calculate vectors b~ i from b by (15)(16), 0≤i≤Dx-1, using FFT. ~ ~ 3. Calculate vectors xi by solving (17) as Dy independent linear systems, each with dimension Dz, for 0≤i≤Dx-1. ~ 4. Calculate vector ~x from ~xi , 0≤i≤Dx-1, by (15)(18), using FFT. 5. Calculate vector x from ~x by (19), using FFT. To study the relation to GFS, let us expand the equations of the first two steps. Symbol qM,i will be used to denote the ith eigenvector in equation (5) with dimension M. By (5)(12)(13)(15)(16), we have: for 0 ≤ i ≤ Dx − 1, 0 ≤ j ≤ D y − 1, 0 ≤ l ≤ Dz − 1 ~ T ~ ~ ~ ~ T bi , j + Dy ⋅l = q Dy , j +1 ⋅ bi , Dy ⋅l bi , Dy ⋅l +1 " bi , D y ⋅l + Dy −1 jπ ( n + 0.5) ~ ⋅ bi + Dx ⋅n + Dx ⋅ D y ⋅l Dy jπ ( n + 0.5) ⋅ q D x ,i +1 cos ∑ Dy n =0 χi ⋅ χ j ⋅ a ⋅ b equation (15). Then the solution to the original system is 0 0 ⎞ ⎟ % 0 ⎟~ x ⎟ 0 QDx ⎠ n =0 Dx −1D y −1 After having ~xi for 0≤i≤Dx-1, we can now recover ~x based on ⎛ QDx ⎜ x=⎜ 0 ⎜ ⎝ 0 ∑ cos D x −1 y iπ (m + 0.5) jπ ( n + 0.5) 1 ⋅ ∑ ∑ Pm ,n ,l ⋅ cos ⋅ cos D x D y m =0 n =0 Dx Dy = χi ⋅ χ j ⋅ = Ta + (15) Then we apply the transformation of (7) on (14) with M=Dy and K=Dz, and compute another transformed right hand side: ⎛ Q DT 0 0 ⎞⎟ (16) ⎜ y ~ ~ bi = ⎜ 0 ⎜⎜ ⎝ 0 D y −1 for 0 ≤ m ≤ D x − 1, 0 ≤ n ≤ D y − 1, 0 ≤ l ≤ D z − 1 T Dx ~ ~ ~ xi , r = ~ xi + Dx ⋅r , bi ,r = bi + Dx ⋅r , 1 ⋅ Dy =χj ⋅ where μ i , j 1 ⎧ ⎪ D D , x y ⎪ ⎪ 2 sin jπ , 2Dy ⎪⎪ jπDx =⎨ 2 iπ ⎪ sin , 2Dx ⎪ iπD y ⎪ 4 iπ jπ sin ⎪ 2 sin 2Dx 2D y ⎪⎩ ijπ iπ (m + 0.5) jπ (n + 0.5) ⋅ cos Dx Dy (25) i= j=0 i = 0, j ≠ 0 i ≠ 0, j = 0 , i ≠ 0, j ≠ 0 Substituting (23) and (24) into (22), we get an alternative to (22): ∞ ∞ T ( x, y , z ) = Ta + a ⋅ b ⋅ ∑∑∑ ω i , j ⋅ pi , j ( z ′) ⋅ C i , j (z , z ′) ⋅ cos z ′ i =0 j =0 ∞ ∞ = Ta + a ⋅ b ⋅ ∑∑ Z i , j (z ) ⋅ cos i =0 j =0 where ωi , j ⎧ 1, i = j = 0 ⎪0.5, i = 0, j ≠ 0 ⎪ =⎨ ⎪0.5, i ≠ 0, j = 0 ⎪⎩0.25, i ≠ 0, j ≠ 0 , iπx jπy ⋅ cos a b iπx jπy ⋅ cos a b (26) Z i , j ( z ) = ∑ ω i , j ⋅ pi , j ( z ′) ⋅ C i , j ( z , z ′ ) z′ Practically, we need to truncate the infinite summations in (26). By truncating the two infinite sums at D′x and D′y respectively: D x′ −1 D′y −1 T ( x, y, z ) = Ta + a ⋅ b ⋅ ∑ ∑ Z i , j ( z ) ⋅ cos i = 0 j =0 iπx jπy ⋅ cos a b (27) In practice, as is done in [14], often we are interested in not the direct outcome of (27), but, by dividing the x-y plane into evenly D ′x′ by D ′y′ rectangular cells, average temperatures in each cell. They are: IV. for 0 ≤ m ≤ D x′′ − 1, 0 ≤ n ≤ D ′y′ − 1 Tm , n ( z ) = D ′x′D ′y′ ab b ( n +1) D ′y′ a (m +1) D ′x′ ∫ x= am D′x′ y= bn D ′y′ D x′ −1 D ′y −1 = T a + a ⋅ b ⋅ D x′′ ⋅ D ′y′ ⋅ ∑ ∑θ ⋅ Z i , j ( z ) ⋅ cos i, j i =0 j =0 where θ i , j (28) ∫ dyT (x, y, z ) dx Green-function based approach with frequency-domain truncation is equivalent to even-spaced finite-difference discretization. This is not surprising, because it is well known in signal processing theory that frequency-domain truncation is equivalent to time-domain (in this case space-domain) sampling. 1 ⎧ ⎪ D ′′ D ′′ , x y ⎪ ⎪ 2 sin jπ , 2 D ′y′ ⎪⎪ jπD x′′ =⎨ 2 iπ ⎪ sin , 2 D x′′ ⎪ iπD ′y′ ⎪ 4 jπ iπ sin ⎪ 2 sin 2 D x′′ 2 D ′y′ ⎩⎪ ijπ iπ ( m + 0.5) jπ (n + 0.5) ⋅ cos D x′′ D ′y′ i= j=0 i = 0, j ≠ 0 i ≠ 0, j = 0 , i ≠ 0, j ≠ 0 Overall, the GFS procedure can be summarized as follows. 1. Calculate function Ci,j(z,z’) for 0 ≤ i ≤ D x′ − 1, 0 ≤ j ≤ D ′y − 1 , analytically as in [8]. 2. Calculate function pi,j(z’) by (25), for 0 ≤ i ≤ D′x − 1 , 0 ≤ j ≤ D′y − 1 , using FFT. 3. Calculate function Zi,j(z) by (26), for 0 ≤ i ≤ D′x − 1 , 0 ≤ j ≤ D′y − 1 . 4. Calculate function Tm,n(z) by (28), for 0 ≤ m ≤ Dx′′ − 1 , 0 ≤ n ≤ D′y′ − 1 , using FFT. FPS-PCG FOR IRREGULAR GEOMETRIES A limitation in both GFS and FPS is that they can only solve a rectangular 3D domain with layered materials. For GFS, this comes from the need for an analytical Green function like (23). For FPS, this comes from condition (1). Recall that an accurate thermal model of a die with its spreader and heat sink should be a non-rectangular domain, and that significant errors can be introduced if the spreader and sink are ignored or simplistically modeled as a convective surface. To solve non-rectangular models, and in general thermal models with material or geometry irregularities, we propose a thermal analysis method called FPS-PCG: FPS-preconditioned Conjugate Gradient. This method is built upon prior works: the “two-problem approach” from [6][13], and the “boundary iteration process” from [10]. The common idea is that, to solve Ax=b where A does not satisfy the conditions of FPS/GFS, we can separate A into two parts: A = A’ + B (31) where A’ can be handled efficiently with FPS/GFS, and B represents the abnormities and is much sparser and/or smaller in value than A’. The proposed FPS-PCG works by using FPS as a preconditioner for Preconditioned Conjugate Gradient (PCG). Given any right hand side vector v, FPS can provide in an exact solution to A’x=v in O(N·logN) time, which would be an approximate solution to Ax=v. Since any approximate solving process can serve as a preconditioner [9], it is natural to put FPS and PCG together. C. Relating FPS to GFS Let us consider the special case of GFS where Dx = D′x = Dx′′ and D y = D′y = D ′y′ . It is true that GFS is more flexible than FPS in the sense that it has three resolution controls: (Dx, Dy) of the heat source distribution, (Dx′ , D′y ) of frequency domain truncation, and (Dx′′, D′y′ ) of temperature output. They are not completely independent choices, for example, it was shown in [14] that D′x should be a multiple of Dx and D′y should be a multiple of Dy, to enable efficient implementation. It is easy to see the relation between equation (20), which is steps 1 and 2 of FPS, and equation (25), which is step 2 of GFS. Specifically, both equations represent the same computation and produce values merely differing by a scaling factor: for 0 ≤ i ≤ Dx − 1, 0 ≤ j ≤ Dy − 1, 0 ≤ l ≤ Dz − 1 (29) pi , j ( zl ) ≡ μi , j ⋅ Dx1.5 ⋅ Dy1.5 ~ ~ ⋅ bi , j + D χi ⋅ χ j y ⋅l Similarly, there is a correspondence between (21), which is steps 4 and 5 of FPS, and (28), which is step 4 of GFS. Both equations represent the same computation and produce the same temperature ~ outcome (subject to numerical errors), based on input values ~xi and Zi,j(z) respectively, which merely differ by a scaling factor: for 0 ≤ i ≤ D x − 1, 0 ≤ j ≤ D y − 1, 0 ≤ l ≤ D z − 1 Tm , n ( z l ) ≈ Tm , n , l Z i , j (z l ) ≈ χi ⋅ χ j a ⋅ b ⋅ Dx 1.5 ⋅ Dy 1.5 ⋅ θ i, j ~ (30) ~ ⋅~ x i, j +l⋅Dy The above is not an exact equality, for ~xi and Zi,j(z) are obtained ~ differently: step 3 of FPS computes ~xi by solving Dx·Dy independent linear systems, while steps 1 and 3 of GFS obtain Zi,j(z) by solving the z-direction analytically. However, step 3 of FPS and steps 1 and 3 of GFS do the same task, ─ they take the same input, as shown in (29), and produce approximately the same output, as shown in (30). In conclusion, FPS and GFS are mathematically equivalent for the special case of Dx = Dx′ = Dx′′ and D y = D′y = D ′y′ . In other words, Figure 2. Applying FPS-PCG on the pyramid thermal model. (a) Original system A. (b) Approximate system A'. (c) Abnormity system B. Fig. 2 illustrates FPS-PCG on the pyramid thermal model from [5]. Note that FPS-PCG is not limited to solving pyramid-shaped domains, and can handle other non-rectangular models, e.g., those that include fins under the sink. The model in Fig. 2(a) is expanded to form the rectangular approximation in Fig. 2(b). Like in any PCG scheme, the better approximation FPS provides, the less PCG iterations are needed to converge. It can be argued that FPS provides a good approximation, ─ assuming that the spreader and sink design provides reasonably good thermal performance, they should be such that heat flows would remain relatively constant even if the spreader or the die was widened, as in Fig. 2(b). The performance of FPS-PCG will be verified in the next section. Finally, as is done similarly in [14], it is possible to solve part of the domain in finer levels of granularity. We obtain, from a FPS-PCG solution, the temperature profile at the bottom of the die, i.e., the interface between the top and middle sections of the pyramid. Then we can solve the die alone and with this temperature profile as the boundary temperatures. Since the die is a rectangular domain, this can be done with one pass of FPS or GFS efficiently, and therefore we can afford to solve with much finer resolution. V. RESULTS For testcases, we use the same geometries of die, spreader and sink, as well as material properties, as in [5]. Nine testcases are generated by discretization of this structure with different resolutions. Consequently, the nine finite difference matrices, m1-m9 in Table 1, have dimensions ranging from 1.33e4 in the coarsest resolution to 5.33e6 in the finest. A heat map is used to model one operation scenario of a four-core chip, with total power 175W. TABLE I. RUNTIME COMPARISON OF FPS-PCG AGAINST DIRECT SOLVER AND ICCG. N IS THE DIMENSION; T IS THE RUNTIME; I IS THE NUMBER OF CONJUGATE-GRADIENT ITERATIONS. Matrix m1 m2 m3 m4 m5 m6 m7 m8 m9 N 1.33e4 5.33e4 2.13e5 4.80e5 8.53e5 1.33e6 1.92e6 3.41e6 5.33e6 Direct Solver T (sec) 0.70 5.54 46.24 164.44 388.98 N/A N/A N/A N/A ICCG I T (sec) 62 0.34 76 2.52 114 10.56 151 28.17 187 62.09 221 132.48 258 189.06 342 439.75 425 1007.73 FPS-PCG I T (sec) 7 0.47 8 1.57 9 5.65 10 13.31 10 22.37 10 34.84 11 53.13 11 103.34 11 171.11 Table 1 compares FPS-PCG against a direct solver and an ICCG solver. The direct solver uses approximate minimum degree (AMD) ordering [1]. The ICCG solver is based on incomplete Cholesky factorization with zero fill-in [9], and with AMD ordering. All three solvers are coded in Matlab, and all runtimes are measured on a Linux workstation with 2.4GHz CPU frequency and 6GB memory. The ICCG runtimes do not include preconditioning. For both ICCG and FPS-PCG, the convergence criterion is 1e-6 error tolerance. acute in the hottest regions. This verifies the need for thermal analysis engine, such as FPS-PCG, for non-rectangular domains. Finally, as discussed in Section 4, after solving a full model, we have the temperature distribution at the bottom of the die, and can use that as an ambient temperature distribution to solve the die again with a rectangular model and with even finer resolution, which can be done efficiently using GFS or FPS of Section 2. Fig. 6(a) plots a 1200 by 1200 temperature profile, which is computed by the FPS method in 17.89 seconds. As expected, given the relation between GFS and FPS, this runtime is consistent with those reported in [14]. Comparing Fig. 4(a) and Fig. 5(a), Fig. 5(b) and Fig. 5(c), a general agreement is observed, and the high-resolution solution provides more details, which are useful at hot-spot regions. Figure 5. (a) A 1200-by-1200 resolution temperature profile by solving a die-only rectangular model with ambient distribution provided by solving a full model. (b) The hot-spot region in (a). (c) The hot-spot region in Fig. 4(a). REFERENCES [1] [2] [3] [4] Figure 3. Runtime T as a function of matrix dimension N in Table 1. As can be observed in Table 1, the direct solver runtime increases at approximately O(N1.5) rate for these matrices, and in fact its memory consumption becomes impractical for the last four testcases. FPS-PCG scales better than ICCG as the matrix size increases, and its advantage widens, eventually with approximately 6X speedup on m9. A closer look at Table 1 reveals the reason for FPS-PCG’s efficiency and scalability ─ it needs consistently less PCG iterations to converge. The complexity per iteration, which can be measured by T/I in Table 1, is O(N) for ICCG and O(NlogN) for FPS-PCG. However, as dimension increases from m1 to m9, the number of iterations, I in Table 1, increases from 62 to 425 for ICCG, while increase only from 7 to 11 for FPS-PCG. This verifies the claim in Section 4 that the FPS process is a good approximation to the original linear system, and therefore serves as an effective and scalable preconditioner for PCG. [5] [6] [7] [8] [9] [10] [11] Figure 4. (a) Chip temperature profile by non-rectangular model. (b) Chip temperature profile by rectangular model. (c) Temperature curves across chip diagonal by both models. Ta is ambient temperature at bottom of heat sink. Fig. 4 demonstrates the difference between rectangular and nonrectangular models. For fairness, convection coefficient at the bottom of the rectangular model is adjusted such that Fig. 4(a) and 4(b) have the same average temperature. Note that there is significant difference between the shape of distribution between Fig. 4(a) and 4(b), and between the two curves in Fig. 4(c), and this difference is especially [12] [13] [14] P. Amestoy, T. A. Davis, and I. Duff, “An approximate minimum degree ordering algorithm,” SIAM J. Matrix Anal. Appl., pp. 886-905, 1996. P. E. Bagnoli, C. Casarosa, and F. Stefani, ”DJOSER: Analytical thermal simulator for multilayer electronic structures. Theory and numerical implementation,” Proc. Int’l Conf. Thermal Issues in Emerging Technologies, Theory and Applications, pp. 111-118, 2007. J. Costa, M. Chou, and L. M. Silveira, “Efficient techniques for accurate modeling and simulation of substrate coupling in mixed-signal IC's,” IEEE Trans. Computer-Aided Design, vol. 18, no. 5, pp. 597-607, 1999. R. Gharpurey and R. G. Meyer, “Modeling and analysis of substrate coupling in integrated circuits,” IEEE Journal of Solid-State Circuits, vol. 31, no. 3, pp. 344-353, 1996. V. M. Heriz, J. Park, T. Kemper, S. Kang, and A. Shakouri, “Method of images for the fast calculation of temperature distributions in packaged VLSI chips,” Proc. 13th Int’l Workshop on Thermal Investigation of ICs and Systems, pp. 18-25, 2007. T. A. Johnson, R. W. Knepper, V. Marcello, and W. Wang, “Chip substrate resistance modeling technique for integrated circuit design,” IEEE Trans. Computer-Aided Design, vol. 3, pp. 126-134, 1984. P. Li, L. T. Pileggi, M. Asheghi, and R. Chandra, “Efficient full-chip thermal modeling and analysis,” Proc. ICCAD, pp. 319-326, 2004. A. M. Niknejad, R. Gharpurey, and R. G. Meyer, “Numerically stable Green function for modeling and analysis of substrate coupling in integrated circuits,” IEEE Trans. Computer-Aided Design, vol. 17, no. 4, pp. 305-315, 1998. Y. Saad. Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, PA, 2003. J. Shi, Y. Cai, W. Hou, L. Ma, S. X.-D. Tan, P. Ho, and X. Wang, “GPU friendly fast Poisson solver for structured power grid network analysis,” Proc. ACM/IEEE Design Automation Conference, pp. 178-183, 2009. P. N. Swarztrauber, “The methods of cyclic reduction, Fourier analysis and the FACR algorithm for the discrete solution of Poisson's equation on a rectangle,” SIAM Review, vol. 19, no. 3, pp. 490-501, 1977. B. Wang and P. Mazumder, “Accelerated chip-level thermal analysis using multilayer Green’s function,” IEEE Trans. Computer-Aided Design, vol. 26, no. 2, pp. 325-344, 2007. C. Xu, R. Gharpurey, T. S. Fiez, and K. Mayaram, “A Green functionbased parasitic extraction method for inhomogeneous substrate layers,” Proc. ACM/IEEE Design Automation Conference, pp. 141-146, 2005. Y. Zhan and S. S. Sapatnekar, “High efficiency Green function-based thermal simulation algorithms,” IEEE Trans. Computer-Aided Design, vol. 26, no. 9, pp. 1661-1675, 2007.