Fast Poisson solvers for thermal analysis

Fast Poisson Solvers for Thermal Analysis
Haifeng Qian
Sachin S. Sapatnekar
IBM T. J. Watson Research Center
Yorktown Heights, NY
[email protected]
Department of Electrical and Computer Engineering
University of Minnesota
Minneapolis, MN
[email protected]
Abstract— Accurate and efficient thermal analysis for a VLSI
chip is crucial, both for sign-off reliability verification and for
design-time circuit optimization. To determine an accurate
temperature profile, it is important to simulate a die together
with its thermal mounts: this requires solving Poisson's equation
on a non-rectangular 3D domain. This paper presents a class of
eigendecomposition-based fast Poisson solvers (FPS) for chiplevel thermal analysis. We start with a solver that solves a
rectangular 3D domain with mixed boundary conditions in
O(NlogN) time, where N is the dimension of the finite-difference
matrix. Then we reveal, for the first time in the literature, a
strong relation between fast Poisson solvers and Green-functionbased methods. Finally, we propose an FPS method that
leverages the preconditioned conjugate gradient method to solve
non-rectangular 3D domains efficiently. We demonstrate that
this approach solves a system of dimension 5.33e6 in only 11
Conjugate Gradient iterations, with a runtime of 171 seconds, a
6X speedup over the popular ICCG solver.
Keywords-thermal analysis; fast Poisson solver; Green function
I.
INTRODUCTION
With technology scaling and on-chip power density increasing, the
ability to obtain an accurate full-chip temperature profile under
various workload scenarios is a necessity during sign-off performance
and reliability verification. Furthermore, accurate and efficient thermal
analysis is an indispensable engine for many design-time circuit
optimizations. Thermally-driven optimizations require a large number
of thermal solves with different power dissipation distributions, and
demand extremely high efficiency from the thermal analysis engine.
Fast chip-level thermal analysis has been the subject of many prior
works [5][7][12][14]. Most focus on solving Poisson's equation on a
die, which is a 3D rectangular domain with layered materials. In a
typical model, the boundary conditions on five surfaces of this domain
are assumed to be adiabatic [7][14] ─ this is in the form of the
Neumann boundary condition, which specifies temperature gradient
(zero for adiabatic surfaces) along the direction that is perpendicular to
a surface. The bottom surface of this domain is assumed to be
convective, which is the Dirichlet boundary condition specifying
temperature values on the bottom surface of the model [7][14], often
assumed to be the ambient. A multigrid approach was proposed in [7],
which uses finite-difference discretization and a geometric multigrid
solver. Several Green-function-based algorithms were proposed in
[12][14], which use boundary element method and leverage the
efficiency of Fast Fourier Transform (FFT). These Green-functionbased algorithms have a strong relation to the works of [3][4][8] on the
problems of substrate electrical analysis, which also involves Poisson's
equation on a 3D rectangular domain with layered materials.
One limitation of solving a rectangular domain, the die only, is the
simplistic assumption about bottom surface, where the temperatures
are far from uniform and vary greatly depending on packaging
structures underneath. A typical configuration uses a copper heat
spreader, a copper heat sink and thermal interface materials layers. It
is possible to approximate these thermal mounts with an effective heat
This work was supported in part by the NSF under award CCF-0634802.
transfer coefficient [14], but such an approximation may incur
substantial error ─ this was shown in [5] to be tens of degrees
differences and distorted shapes of the temperature profiles. Hence it
is important to simulate a die with its thermal mounts, which requires
solving Poisson's equation on a non-rectangular 3D domain [2][5].
Fig. 1 illustrates the pyramid-shaped model in [5]. This model will
be used in Section 4 and Section 5; however, our solver in Section 4 is
not limited to solving pyramid-shaped domains, and can handle other
non-rectangular models, e.g., those that include fins under the sink.
Figure 1. The pyramid thermal model from [5].
The difficulty of a non-rectangular domain is that an analytical
form is no longer available for the Green function. Therefore many
existing techniques are no longer applicable, and an accurate solution
of the finite difference matrix would require generic methods, for
example, the finite difference method with the popular Incomplete
Cholesky preconditioned Conjugate Gradient (ICCG) solver. An
imaging technique was proposed in [5] as an approximate alternative.
This paper studies eigendecomposition-based fast Poisson solvers
for chip-level thermal analysis, and throughout this paper we will use
FPS as a shorthand notation for these solvers. The idea of FPS has
been known for a long time [9][11], but it has found limited usage in
practical applications. In the literature, e.g. [9][10], FPS is often
presented in the context of a 2D-grid Laplacian matrix with uniform
diagonal entries, which corresponds to a finite-difference
discretization of a 2D rectangular homogeneous domain with Dirichlet
conditions on all four boundaries. It is rarely discussed with respect to
3D domains and/or more general boundary conditions [11].
Section 2 demonstrates a FPS solver for 3D rectangular thermal
models with mixed boundary conditions as in [7][14]. The
conventional wisdom has been that FPS solvers were an entirely
different class of solvers from other known solvers. Section 3 proves,
for the first time, a strong relation ─ in fact, an equivalence under
certain conditions ─ between FPS and Green-function-based methods.
Section 4 presents FPS-Preconditioned Conjugate Gradient method
that solves non-rectangular domains efficiently.
II.
3D FPS
In this section, we study the use of FPS to solve a 3D finite
difference matrix, Ax=b, of a rectangular domain with layered
materials and mixed boundary conditions. To do so, we expand the
applicability of FPS from the well-known 2D domains with Dirichlet
conditions [9]. The following is a relaxed sufficient condition for FPS.
⎛ B1,1
⎜
⎜ B 2,1
A=⎜
#
⎜
⎜B
⎝ K ,1
B1, 2
B 2, 2
#
B K ,2
" B1, K
" B2, K
%
#
" BK ,K
⎞
⎟
⎟
⎟
⎟
⎟
⎠
(1)
such that
Bi , j , ∀i, j are square matrices with the same dimension M ;
∀i ,
Bi , j , ∀i, j have the same eigenvectors.
Let a and b be the dimensions of the 3D domain in x and y
directions. We assume even-spaced discretization along the x-axis into
Dx segments, even-spaced discretization along the y-axis into Dy
segments, and arbitrary discretization along the z-axis into Dz
segments. The result is Dx·Dy·Dz rectangular cells. For 0≤m≤ Dx-1,
0≤n≤ Dy-1, 0≤l≤ Dz-1, we use the notion cell (m,n,l) to refer to the cell
formed by the mth segment of the x axis, the nth segment of the y axis,
and the lth segment of the z axis. Let Tm,n,l be the average temperature
of cell (m,n,l), and let Pm,n,l be the total power inside cell (m,n,l). We
assume a natural ordering in Ax=b, in other words:
(2)
x
= T −T
m+ n⋅Dx +l ⋅Dx ⋅Dy
m,n ,l
(3)
where Ta is the ambient temperature at the bottom of the heat sink. As
in the thermal models in [7][14], we have Neumann conditions on five
surfaces, and Dirichlet boundary condition on the z=0 surface. Fitting
matrix A to (1) with M=Dx, K=Dy·Dz, we have special forms for Bi,j:
Bi ,i = α i ⋅ I + β i ⋅ G
⎛ −1 − 1 0 " 0 0 0 ⎞
⎟
⎜
⎜ −1 0 −1 " 0 0 0 ⎟
⎜ 0 −1 0 % 0 0 0 ⎟
⎟
⎜
where α i and β i are scalars, G = ⎜ #
# % % % #
# ⎟
⎜ 0 0 0 % 0 −1 0 ⎟
⎟
⎜
⎜ 0 0 0 " − 1 0 − 1⎟
⎟
⎜
⎝ 0 0 0 " 0 − 1 − 1⎠
∀i ≠ j, Bi , j = γ i , j ⋅ I
(4)
where γ i , j is a scalar.
Note that the above matrices have the same eigenvectors, which
are eigenvectors of G; therefore condition (1) is satisfied. Also note
that the Bi,i formulation in the above equation is different from the well
known ones in [9][10]; this is due to the fact that we now have
Neumann conditions on the x=0 and x=a surfaces, as opposed to
Dirichlet conditions in [9][10]. It can be verified that the eigenvalues
and eigenvectors of G are:
∀1 ≤ i ≤ M ,
⎛1⎞
⎜ ⎟
1 ⎜1⎟
⋅⎜ ⎟
M #
⎜⎜ ⎟⎟
⎝1⎠
q1 =
⎛ (i − 1) ⋅ π ⎞
⎟
⎝ M ⎠
⎛
⎛ (i − 1) ⋅ π ⎞
cos⎜
⎜
⎟
⎝ 2M ⎠
⎜
⎜
⎛ 3 ⋅ (i − 1) ⋅ π ⎞
2 ⎜
cos⎜
⎟
∀i > 1, q i =
⋅
2M
⎠
⎝
M ⎜
#
⎜
⎜ cos⎛⎜ (2 M − 1) ⋅ (i − 1) ⋅ π
⎜
2M
⎝ ⎝
λi = −2 cos⎜
(5)
q2 " qM )
⎛ Λ1,1
⎜
⎜ Λ 2,1
⎜ #
⎜
⎜Λ
⎝ K ,1
0 ⎞ ⎛ Q 0 0 0 ⎞⎛ Q T
⎟ ⎜
⎟⎜
0 ⎟ ⎜ 0 Q 0 0 ⎟⎜ 0
⎟ A⎜ 0 0 % 0 ⎟⎜
0
0 % 0 ⎟
⎜
⎟⎜
0 0 Q T ⎟⎠ ⎜⎝ 0 0 0 Q ⎟⎠⎜⎝ 0
Λ1,2 " Λ1, K ⎞
⎟
Λ 2 , 2 " Λ 2, K ⎟ ~ ~
⋅x = b
#
%
# ⎟
⎟
Λ K , 2 " Λ K , K ⎟⎠
0
0
0
QT
0
QT
(6)
⎛ QT
⎜
⎜ 0
where ~
x =⎜
⎜ 0
⎜ 0
⎝
0
QT
0
0
⎛ QT
0 ⎞
⎜
⎟
0 ⎟
~ ⎜ 0
⎟x , b = ⎜
% 0 ⎟
⎜ 0
⎜ 0
0 Q T ⎟⎠
⎝
0
0
⎛ QT
0 ⎞
⎟
⎜
0 ⎟
⎜ 0
⎟x = ⎜
% 0 ⎟
⎜ 0
⎜ 0
0 Q T ⎟⎠
⎝
0
0
0
QT
0
0
0 ⎞
⎟
0 ⎟
⎟b
% 0 ⎟
T⎟
0 Q ⎠
0
0
(7)
0
0
0
QT
0
0
0 ⎞
⎟
0 ⎟
⎟b
% 0 ⎟
0 Q T ⎟⎠
0
0
where the Λi,j matrices take the following diagonal form:
(9)
The advantage of FPS lies in the fact that Q has a special structure
~
such that a fast way exists to evaluate b from b by (7), and x from ~x
by (9). In fact, for any vector v, the computation of QTv and Qv can be
performed by FFT and with O(M·logM) complexity.
Let us investigate the complexity of FPS on this thermal analysis.
~
The cost of computing b from b by equation (7), using FFT, is
O(K·M·logM) = O(Dx·Dy·Dz·logDx). The cost of computing the final
solution x by equation (9), using FFT, is also O(Dx·Dy·Dz·logDx). The
remaining question is the cost of the middle step, solving equation (7)
as Dx separate systems, each with dimension Dy·Dz.
Studying (4)(7)(8) in more details, it can be shown that each of
these Dy·Dz matrices has a 2D y-z grid structure and also satisfies
conditions (1)(4). Therefore, we can apply the FPS procedure on each
of them on the y direction (note the Neumann conditions on y=0 and
y=b surfaces). It again is a three step approach: the first and third steps
use the FFT, with complexity O(Dy·Dz·logDy), and the middle step
now solves Dy separate systems, each with dimension Dz. Each of
them is now a 1D structure in z direction, which is a tridiagonal matrix
and can be solved in linear time. Therefore, the cost of solving a
system with dimension Dy·Dz is O(Dy·Dz·logDy) + O(Dy·Dz) +
O(Dy·Dz·logDy), which is bounded by O(Dy·Dz·logDy).
Therefore, the overall cost of solving the original system Ax=b is
O(Dx·Dy·Dz·logDx) + Dx·O(Dy·Dz·logDy), which is bounded by
O(N·logN), where N=Dx·Dy·Dz is the overall dimension.
RELATION BETWEEN FPS AND GREEN-FUNCTION-BASED
METHODS
To solve Ax=b, let us do the following transformation.
⎛ QT
⎜
⎜ 0
⎜
⎜ 0
⎜ 0
⎝
(8)
"
%
⎛Q 0 0 ⎞
⎟
⎜
x = ⎜ 0 % 0 ⎟~
x
⎜ 0 0 Q⎟
⎠
⎝
III.
⎞
⎟
⎟
⎟
⎟
⎟
⎟
⎞⎟
⎟⎟
⎠⎠
Let matrix Q (which is orthonormal, i.e., QTQ = QQT = I) be:
Q = (q 1
∀i ≠ j ,
0 ⎞
⎟
0 ⎟
# ⎟
⎟
" λM ⎟⎠
"
Now that the new left-hand-side matrix in (7) has a block form
where each block is a diagonal submatrix, such an overall system can
be decoupled into M separate sets of linear equations, each of which
has dimension K and can be solved independently. Note that, if these
sets of equations also satisfy condition (1), each of them can again be
transformed and further decoupled into smaller sets of equations.
After ~x is obtained, solution x to the original system is simply:
a
bm+ n⋅Dx +l⋅Dx ⋅Dy = Pm,n,l
∀i,
⎛ λ1 0
⎜
⎜ 0 λ2
Λ i , i = Q Bi , iQ = α i ⋅ I + β i ⋅ ⎜
# #
⎜
⎜0 0
⎝
Λ i , j = Q T Bi , j Q = γ i , j ⋅ I
T
This section demonstrates a strong relation ─ in fact equivalence
under certain conditions ─ between the 3D FPS from Section 2 and
Green-function-based solvers [4][8][14]. For clarity, we only compare
with the full-chip thermal analysis method, Algorithm II, in [14]. The
relation to other variations of Green-function-based methods follows
similar derivations. For the rest of the paper, we will use GFS as a
short-hand notation for Green-function-based solvers.
For clarity and without loss of generality, we assume, as in [14],
that heat sources are located on discrete horizontal planes. Also, like
[14], we assume that each plane is divided into Dx×Dy rectangular grid
cells of equal size, and the power density inside each cell is uniform.
Therefore, the power density distribution function is in the following
piecewise constant form, nonzero only at a finite set of z’ values.
P ( x ′, y ′, z ′) =
⎛
Dx −1D y −1
a
∑ ∑ P (z ′) ⋅ Θ⎜⎜ x′ − (m + 0.5) D
m ,n
⎝
a
b
⎧
′
1
,
, y′ <
x
<
⎪
where Θ( x ′, y ′) = ⎨
2Dx
2D y
⎪⎩ 0,
otherwise
m=0 n =0
x
, y ′ − (n + 0.5)
b
Dy
⎞
⎟
⎟
⎠
(10)
where again a and b are the dimensions of the domain in x and y
directions. Note that the discreteness assumption about z axis is not
essential, and, on removing it, the derivation still stands, only with
constants Pm,n(z’) replaced by more complex forms.
A. FPS formulations
This section gives detailed FPS formulations based on the
principles of Section 2, to be compared with GFS ones in Section 3.2.
Let us use the same formation Ax=b as (2)(3), and the same x and
y discretization as in (10). Let the z coordinates of cell centers be z0, z1,
…, zDz-1. For clarity of presentation, we further assume that heat
source planes in (10) coincide these z coordinates. In other words:
Pm,n ( z ′) = 0,
P ( x′, y ′, z ′) = 0,
for z '∉ {z1 , z 2 ," z D −1 } (11)
And we have the following relation between (3) and (10).
z
bm+n⋅Dx +l⋅Dx ⋅Dy = Pm,n,l = Pm,n (zl ) ⋅
a⋅b
Dx ⋅ Dy
(12)
The first step of FPS is to apply the transformation of (7) with
M=Dx and K=Dy·Dz, and compute the transformed right hand side:
⎛ Q DT
0
0 ⎞
(13)
⎟
⎜
x
~
b=⎜ 0
⎜
⎝ 0
% 0 ⎟b
⎟
0 Q DT x ⎠
where Q denotes the eigenvector matrix from (5)(6) with M=Dx.
The new system (7) is then decoupled into Dx independent systems
of linear equations. The ith system is, for 0≤i≤Dx-1:
~
~
(14)
Ai ~xi = b i
~
where A
is a submatrix of the transformed left hand side in (7), ~xi is
i
~
~
a part of ~x , and b i is a part of b , such that
%
0
∀0 ≤ r ≤ D y ⋅ D z − 1
0
QDT y
~
⎟b i
⎟⎟
⎠
where Q DT denotes the matrix from (5)(6) with M=Dy. (14) becomes
y
⎛ QDT
⎜ y
⎜ 0
⎜⎜ 0
⎝
0 ⎞⎟ ⎛⎜ QDy
~
% 0 ⎟ Ai ⎜ 0
⎜
T ⎟
⎟
0 QDy ⎠ ⎜⎝ 0
0
~
0 ⎞⎟
~ ~
~
xi = b i
% 0 ⎟~
⎟
⎟
0 QDy ⎠
0
⎛ QDT
⎜ y
~
~
, where x i = ⎜ 0
⎜⎜ 0
⎝
0 ⎞⎟
xi
% 0 ⎟~
⎟
0 QDTy ⎟⎠
0
(17)
Vector ~xi can be solved from (17) in O(Dy·Dz) time because it can
be decoupled into Dy independent systems, each of which is a
~
tridiagonal matrix of dimension Dz. After obtaining ~xi , we compute
⎛ QD
(18)
0
0 ⎞⎟
⎜
~
xi = ⎜ 0
⎜⎜ 0
⎝
y
%
0
1
=χj ⋅
⋅
Dy
D y −1
=
~
0 ⎟~
xi
⎟
⎟
QD y
⎠
Dx
1.5
(
= χj ⋅
) (
D −1
1 y
jπ (n + 0.5) ~
⋅ ∑ cos
⋅ bi , Dy ⋅l + n
D y n =0
Dy
)
(
⎛ bDx ⋅n + Dx ⋅ D y ⋅l ⎞
⎜
⎟
⎜ b Dx ⋅n + Dx ⋅D y ⋅l +1 ⎟
⋅⎜
⎟
#
⎜
⎟
⎜ bD ⋅n+ D D ⋅l + D −1 ⎟
⎝ x x⋅ y x ⎠
)
T
(20)
D −1
⋅ Dy
1.5
Dx −1D y −1
⋅ ∑ ∑ Pm , n (z l ) ⋅ cos
m=0 n =0
iπ (m + 0.5)
jπ (n + 0.5)
⋅ cos
Dx
Dy
j=0
⎧1,
where χ j = ⎨
2
,
j>0
⎩
Similarly, we can expand the last two steps of FPS, and by
(2)(5)(15)(18)(19), the average temperature of cell (m,n,l) is:
T m , n , l = T a + x m + n⋅ D x + l ⋅ D x ⋅ D y
iπ (m + 0.5) ~
1 Dx −1
⋅ ∑ χ i ⋅ cos
⋅ x i + n⋅ D x + l ⋅ D x ⋅ D y
D x i =0
Dx
= Ta +
(21)
iπ (m + 0.5) ~
1 Dx −1
= Ta +
⋅ ∑ χ i ⋅ cos
⋅ x i,n +l⋅Dy
D x i =0
Dx
D −1
D x −1 y
~
iπ ( m + 0.5)
jπ (n + 0.5)
1
⋅ ∑ ∑ χi ⋅ χ j ⋅ ~
⋅ cos
x i , j + l ⋅D y ⋅ cos
Dx
Dy
D x D y i =0 j =0
where χj is as in (20).
B. GFS formulations
This section is largely based on Algorithm II in [14]. The GFS
approach evaluates the following temperature calculation formula:
a
b
z′ 0
0
T (x, y, z ) = Ta + ∑ ∫ dx ′∫ dy ′G (x, y, z , x′, y ′, z ′)P( x′, y ′, z ′)
(22)
where G(.) is the Green function. In the case of Neumann conditions
on 4 side surfaces (x=0, y=0, x=a, y=b), by [8][14], it is:
∞ ∞
iπx
jπy
iπx′
jπy ′ (23)
G ( x, y, z , x′, y ′, z ′) = ∑∑ Ci , j ( z, z ′) ⋅ cos
i =0 j =0
a
⋅ cos
b
⋅ cos
a
⋅ cos
b
and [8] provided ways to calculate function Ci,j(.) analytically.
Suppose we write (10) in the following frequency-domain form.
∞ ∞
iπx′
jπy ′
(24)
P ( x′, y ′, z ′) = ∑∑ pi , j (z′) ⋅ cos
i =0 j = 0
a
⋅ cos
b
By (10) and (24), we can derive
pi , j ( z ′) = a ⋅ b ⋅ μ i , j ⋅ ∑ ∑ Pm ,n ( z ′) ⋅ cos
m= 0 n =0
(19)
Finally, the temperature values at each cell are mapped from (2).
Overall, the FPS procedure can be summarized as follows.
~
1. Calculate vector b from Pm,n(z’) by (12)(13), using FFT.
~
~
2. Calculate vectors b~ i from b by (15)(16), 0≤i≤Dx-1, using FFT.
~
~
3. Calculate vectors xi by solving (17) as Dy independent linear
systems, each with dimension Dz, for 0≤i≤Dx-1.
~
4. Calculate vector ~x from ~xi , 0≤i≤Dx-1, by (15)(18), using FFT.
5. Calculate vector x from ~x by (19), using FFT.
To study the relation to GFS, let us expand the equations of the
first two steps. Symbol qM,i will be used to denote the ith eigenvector in
equation (5) with dimension M. By (5)(12)(13)(15)(16), we have:
for 0 ≤ i ≤ Dx − 1, 0 ≤ j ≤ D y − 1, 0 ≤ l ≤ Dz − 1
~
T
~
~
~
~
T
bi , j + Dy ⋅l = q Dy , j +1 ⋅ bi , Dy ⋅l bi , Dy ⋅l +1 " bi , D y ⋅l + Dy −1
jπ ( n + 0.5) ~
⋅ bi + Dx ⋅n + Dx ⋅ D y ⋅l
Dy
jπ ( n + 0.5)
⋅ q D x ,i +1
cos
∑
Dy
n =0
χi ⋅ χ j ⋅ a ⋅ b
equation (15). Then the solution to the original system is
0
0 ⎞
⎟
% 0 ⎟~
x
⎟
0 QDx ⎠
n =0
Dx −1D y −1
After having ~xi for 0≤i≤Dx-1, we can now recover ~x based on
⎛ QDx
⎜
x=⎜ 0
⎜
⎝ 0
∑ cos
D x −1 y
iπ (m + 0.5)
jπ ( n + 0.5)
1
⋅ ∑ ∑ Pm ,n ,l ⋅ cos
⋅ cos
D x D y m =0 n =0
Dx
Dy
= χi ⋅ χ j ⋅
= Ta +
(15)
Then we apply the transformation of (7) on (14) with M=Dy and
K=Dz, and compute another transformed right hand side:
⎛ Q DT
0
0 ⎞⎟
(16)
⎜
y
~
~
bi = ⎜ 0
⎜⎜
⎝ 0
D y −1
for 0 ≤ m ≤ D x − 1, 0 ≤ n ≤ D y − 1, 0 ≤ l ≤ D z − 1
T
Dx
~
~
~
xi , r = ~
xi + Dx ⋅r , bi ,r = bi + Dx ⋅r ,
1
⋅
Dy
=χj ⋅
where μ i , j
1
⎧
⎪ D D ,
x
y
⎪
⎪ 2 sin jπ ,
2Dy
⎪⎪ jπDx
=⎨
2
iπ
⎪
sin
,
2Dx
⎪ iπD y
⎪ 4
iπ
jπ
sin
⎪ 2 sin
2Dx
2D y
⎪⎩ ijπ
iπ (m + 0.5)
jπ (n + 0.5)
⋅ cos
Dx
Dy
(25)
i= j=0
i = 0, j ≠ 0
i ≠ 0, j = 0
, i ≠ 0, j ≠ 0
Substituting (23) and (24) into (22), we get an alternative to (22):
∞
∞
T ( x, y , z ) = Ta + a ⋅ b ⋅ ∑∑∑ ω i , j ⋅ pi , j ( z ′) ⋅ C i , j (z , z ′) ⋅ cos
z ′ i =0 j =0
∞
∞
= Ta + a ⋅ b ⋅ ∑∑ Z i , j (z ) ⋅ cos
i =0 j =0
where ωi , j
⎧ 1, i = j = 0
⎪0.5, i = 0, j ≠ 0
⎪
=⎨
⎪0.5, i ≠ 0, j = 0
⎪⎩0.25, i ≠ 0, j ≠ 0
,
iπx
jπy
⋅ cos
a
b
iπx
jπy
⋅ cos
a
b
(26)
Z i , j ( z ) = ∑ ω i , j ⋅ pi , j ( z ′) ⋅ C i , j ( z , z ′ )
z′
Practically, we need to truncate the infinite summations in (26).
By truncating the two infinite sums at D′x and D′y respectively:
D x′ −1 D′y −1
T ( x, y, z ) = Ta + a ⋅ b ⋅ ∑ ∑ Z i , j ( z ) ⋅ cos
i = 0 j =0
iπx
jπy
⋅ cos
a
b
(27)
In practice, as is done in [14], often we are interested in not the
direct outcome of (27), but, by dividing the x-y plane into evenly D ′x′
by D ′y′ rectangular cells, average temperatures in each cell. They are:
IV.
for 0 ≤ m ≤ D x′′ − 1, 0 ≤ n ≤ D ′y′ − 1
Tm , n ( z ) =
D ′x′D ′y′
ab
b ( n +1)
D ′y′
a (m +1)
D ′x′
∫
x=
am
D′x′
y=
bn
D ′y′
D x′ −1 D ′y −1
= T a + a ⋅ b ⋅ D x′′ ⋅ D ′y′ ⋅ ∑
∑θ
⋅ Z i , j ( z ) ⋅ cos
i, j
i =0 j =0
where θ i , j
(28)
∫ dyT (x, y, z )
dx
Green-function based approach with frequency-domain truncation is
equivalent to even-spaced finite-difference discretization. This is not
surprising, because it is well known in signal processing theory that
frequency-domain truncation is equivalent to time-domain (in this case
space-domain) sampling.
1
⎧
⎪ D ′′ D ′′ ,
x
y
⎪
⎪ 2 sin jπ ,
2 D ′y′
⎪⎪ jπD x′′
=⎨
2
iπ
⎪
sin
,
2 D x′′
⎪ iπD ′y′
⎪ 4
jπ
iπ
sin
⎪ 2 sin
2 D x′′
2 D ′y′
⎩⎪ ijπ
iπ ( m + 0.5)
jπ (n + 0.5)
⋅ cos
D x′′
D ′y′
i= j=0
i = 0, j ≠ 0
i ≠ 0, j = 0
, i ≠ 0, j ≠ 0
Overall, the GFS procedure can be summarized as follows.
1. Calculate function Ci,j(z,z’) for 0 ≤ i ≤ D x′ − 1, 0 ≤ j ≤ D ′y − 1 ,
analytically as in [8].
2. Calculate function pi,j(z’) by (25), for 0 ≤ i ≤ D′x − 1 , 0 ≤ j ≤ D′y − 1 ,
using FFT.
3. Calculate function Zi,j(z) by (26), for 0 ≤ i ≤ D′x − 1 , 0 ≤ j ≤ D′y − 1 .
4. Calculate function Tm,n(z) by (28), for 0 ≤ m ≤ Dx′′ − 1 , 0 ≤ n ≤ D′y′ − 1 ,
using FFT.
FPS-PCG FOR IRREGULAR GEOMETRIES
A limitation in both GFS and FPS is that they can only solve a
rectangular 3D domain with layered materials. For GFS, this comes
from the need for an analytical Green function like (23). For FPS, this
comes from condition (1). Recall that an accurate thermal model of a
die with its spreader and heat sink should be a non-rectangular
domain, and that significant errors can be introduced if the spreader
and sink are ignored or simplistically modeled as a convective surface.
To solve non-rectangular models, and in general thermal models
with material or geometry irregularities, we propose a thermal analysis
method called FPS-PCG: FPS-preconditioned Conjugate Gradient.
This method is built upon prior works: the “two-problem
approach” from [6][13], and the “boundary iteration process” from
[10]. The common idea is that, to solve Ax=b where A does not satisfy
the conditions of FPS/GFS, we can separate A into two parts:
A = A’ + B
(31)
where A’ can be handled efficiently with FPS/GFS, and B represents
the abnormities and is much sparser and/or smaller in value than A’.
The proposed FPS-PCG works by using FPS as a preconditioner
for Preconditioned Conjugate Gradient (PCG). Given any right hand
side vector v, FPS can provide in an exact solution to A’x=v in
O(N·logN) time, which would be an approximate solution to Ax=v.
Since any approximate solving process can serve as a preconditioner
[9], it is natural to put FPS and PCG together.
C. Relating FPS to GFS
Let us consider the special case of GFS where Dx = D′x = Dx′′
and D y = D′y = D ′y′ . It is true that GFS is more flexible than FPS in the
sense that it has three resolution controls: (Dx, Dy) of the heat source
distribution, (Dx′ , D′y ) of frequency domain truncation, and (Dx′′, D′y′ ) of
temperature output. They are not completely independent choices, for
example, it was shown in [14] that D′x should be a multiple of Dx and
D′y should be a multiple of Dy, to enable efficient implementation.
It is easy to see the relation between equation (20), which is steps
1 and 2 of FPS, and equation (25), which is step 2 of GFS.
Specifically, both equations represent the same computation and
produce values merely differing by a scaling factor:
for 0 ≤ i ≤ Dx − 1, 0 ≤ j ≤ Dy − 1, 0 ≤ l ≤ Dz − 1
(29)
pi , j ( zl ) ≡
μi , j ⋅ Dx1.5 ⋅ Dy1.5 ~
~
⋅ bi , j + D
χi ⋅ χ j
y
⋅l
Similarly, there is a correspondence between (21), which is steps 4
and 5 of FPS, and (28), which is step 4 of GFS. Both equations
represent the same computation and produce the same temperature
~
outcome (subject to numerical errors), based on input values ~xi and
Zi,j(z) respectively, which merely differ by a scaling factor:
for 0 ≤ i ≤ D x − 1, 0 ≤ j ≤ D y − 1, 0 ≤ l ≤ D z − 1
Tm , n ( z l ) ≈ Tm , n , l
Z i , j (z l ) ≈
χi ⋅ χ j
a ⋅ b ⋅ Dx
1.5
⋅ Dy
1.5
⋅ θ i, j
~
(30)
~
⋅~
x i, j +l⋅Dy
The above is not an exact equality, for ~xi and Zi,j(z) are obtained
~
differently: step 3 of FPS computes ~xi by solving Dx·Dy independent
linear systems, while steps 1 and 3 of GFS obtain Zi,j(z) by solving the
z-direction analytically. However, step 3 of FPS and steps 1 and 3 of
GFS do the same task, ─ they take the same input, as shown in (29),
and produce approximately the same output, as shown in (30).
In conclusion, FPS and GFS are mathematically equivalent for the
special case of Dx = Dx′ = Dx′′ and D y = D′y = D ′y′ . In other words,
Figure 2. Applying FPS-PCG on the pyramid thermal model. (a) Original
system A. (b) Approximate system A'. (c) Abnormity system B.
Fig. 2 illustrates FPS-PCG on the pyramid thermal model from
[5]. Note that FPS-PCG is not limited to solving pyramid-shaped
domains, and can handle other non-rectangular models, e.g., those that
include fins under the sink. The model in Fig. 2(a) is expanded to form
the rectangular approximation in Fig. 2(b). Like in any PCG scheme,
the better approximation FPS provides, the less PCG iterations are
needed to converge. It can be argued that FPS provides a good
approximation, ─ assuming that the spreader and sink design provides
reasonably good thermal performance, they should be such that heat
flows would remain relatively constant even if the spreader or the die
was widened, as in Fig. 2(b). The performance of FPS-PCG will be
verified in the next section.
Finally, as is done similarly in [14], it is possible to solve part of
the domain in finer levels of granularity. We obtain, from a FPS-PCG
solution, the temperature profile at the bottom of the die, i.e., the
interface between the top and middle sections of the pyramid. Then we
can solve the die alone and with this temperature profile as the
boundary temperatures. Since the die is a rectangular domain, this can
be done with one pass of FPS or GFS efficiently, and therefore we can
afford to solve with much finer resolution.
V.
RESULTS
For testcases, we use the same geometries of die, spreader and
sink, as well as material properties, as in [5]. Nine testcases are
generated by discretization of this structure with different resolutions.
Consequently, the nine finite difference matrices, m1-m9 in Table 1,
have dimensions ranging from 1.33e4 in the coarsest resolution to
5.33e6 in the finest. A heat map is used to model one operation
scenario of a four-core chip, with total power 175W.
TABLE I.
RUNTIME COMPARISON OF FPS-PCG AGAINST DIRECT
SOLVER AND ICCG. N IS THE DIMENSION; T IS THE RUNTIME; I IS THE NUMBER
OF CONJUGATE-GRADIENT ITERATIONS.
Matrix
m1
m2
m3
m4
m5
m6
m7
m8
m9
N
1.33e4
5.33e4
2.13e5
4.80e5
8.53e5
1.33e6
1.92e6
3.41e6
5.33e6
Direct Solver
T (sec)
0.70
5.54
46.24
164.44
388.98
N/A
N/A
N/A
N/A
ICCG
I
T (sec)
62
0.34
76
2.52
114 10.56
151 28.17
187 62.09
221 132.48
258 189.06
342 439.75
425 1007.73
FPS-PCG
I
T (sec)
7
0.47
8
1.57
9
5.65
10 13.31
10 22.37
10 34.84
11 53.13
11 103.34
11 171.11
Table 1 compares FPS-PCG against a direct solver and an ICCG
solver. The direct solver uses approximate minimum degree (AMD)
ordering [1]. The ICCG solver is based on incomplete Cholesky
factorization with zero fill-in [9], and with AMD ordering. All three
solvers are coded in Matlab, and all runtimes are measured on a Linux
workstation with 2.4GHz CPU frequency and 6GB memory. The
ICCG runtimes do not include preconditioning. For both ICCG and
FPS-PCG, the convergence criterion is 1e-6 error tolerance.
acute in the hottest regions. This verifies the need for thermal analysis
engine, such as FPS-PCG, for non-rectangular domains.
Finally, as discussed in Section 4, after solving a full model, we
have the temperature distribution at the bottom of the die, and can use
that as an ambient temperature distribution to solve the die again with
a rectangular model and with even finer resolution, which can be done
efficiently using GFS or FPS of Section 2. Fig. 6(a) plots a 1200 by
1200 temperature profile, which is computed by the FPS method in
17.89 seconds. As expected, given the relation between GFS and FPS,
this runtime is consistent with those reported in [14].
Comparing Fig. 4(a) and Fig. 5(a), Fig. 5(b) and Fig. 5(c), a
general agreement is observed, and the high-resolution solution
provides more details, which are useful at hot-spot regions.
Figure 5. (a) A 1200-by-1200 resolution temperature profile by solving a
die-only rectangular model with ambient distribution provided by solving a
full model. (b) The hot-spot region in (a). (c) The hot-spot region in Fig. 4(a).
REFERENCES
[1]
[2]
[3]
[4]
Figure 3. Runtime T as a function of matrix dimension N in Table 1.
As can be observed in Table 1, the direct solver runtime increases
at approximately O(N1.5) rate for these matrices, and in fact its
memory consumption becomes impractical for the last four testcases.
FPS-PCG scales better than ICCG as the matrix size increases, and its
advantage widens, eventually with approximately 6X speedup on m9.
A closer look at Table 1 reveals the reason for FPS-PCG’s
efficiency and scalability ─ it needs consistently less PCG iterations to
converge. The complexity per iteration, which can be measured by T/I
in Table 1, is O(N) for ICCG and O(NlogN) for FPS-PCG. However,
as dimension increases from m1 to m9, the number of iterations, I in
Table 1, increases from 62 to 425 for ICCG, while increase only from
7 to 11 for FPS-PCG. This verifies the claim in Section 4 that the FPS
process is a good approximation to the original linear system, and
therefore serves as an effective and scalable preconditioner for PCG.
[5]
[6]
[7]
[8]
[9]
[10]
[11]
Figure 4. (a) Chip temperature profile by non-rectangular model. (b) Chip
temperature profile by rectangular model. (c) Temperature curves across chip
diagonal by both models. Ta is ambient temperature at bottom of heat sink.
Fig. 4 demonstrates the difference between rectangular and nonrectangular models. For fairness, convection coefficient at the bottom
of the rectangular model is adjusted such that Fig. 4(a) and 4(b) have
the same average temperature. Note that there is significant difference
between the shape of distribution between Fig. 4(a) and 4(b), and
between the two curves in Fig. 4(c), and this difference is especially
[12]
[13]
[14]
P. Amestoy, T. A. Davis, and I. Duff, “An approximate minimum degree
ordering algorithm,” SIAM J. Matrix Anal. Appl., pp. 886-905, 1996.
P. E. Bagnoli, C. Casarosa, and F. Stefani, ”DJOSER: Analytical
thermal simulator for multilayer electronic structures. Theory and
numerical implementation,” Proc. Int’l Conf. Thermal Issues in
Emerging Technologies, Theory and Applications, pp. 111-118, 2007.
J. Costa, M. Chou, and L. M. Silveira, “Efficient techniques for accurate
modeling and simulation of substrate coupling in mixed-signal IC's,”
IEEE Trans. Computer-Aided Design, vol. 18, no. 5, pp. 597-607, 1999.
R. Gharpurey and R. G. Meyer, “Modeling and analysis of substrate
coupling in integrated circuits,” IEEE Journal of Solid-State Circuits,
vol. 31, no. 3, pp. 344-353, 1996.
V. M. Heriz, J. Park, T. Kemper, S. Kang, and A. Shakouri, “Method of
images for the fast calculation of temperature distributions in packaged
VLSI chips,” Proc. 13th Int’l Workshop on Thermal Investigation of ICs
and Systems, pp. 18-25, 2007.
T. A. Johnson, R. W. Knepper, V. Marcello, and W. Wang, “Chip
substrate resistance modeling technique for integrated circuit design,”
IEEE Trans. Computer-Aided Design, vol. 3, pp. 126-134, 1984.
P. Li, L. T. Pileggi, M. Asheghi, and R. Chandra, “Efficient full-chip
thermal modeling and analysis,” Proc. ICCAD, pp. 319-326, 2004.
A. M. Niknejad, R. Gharpurey, and R. G. Meyer, “Numerically stable
Green function for modeling and analysis of substrate coupling in
integrated circuits,” IEEE Trans. Computer-Aided Design, vol. 17, no. 4,
pp. 305-315, 1998.
Y. Saad. Iterative Methods for Sparse Linear Systems, SIAM,
Philadelphia, PA, 2003.
J. Shi, Y. Cai, W. Hou, L. Ma, S. X.-D. Tan, P. Ho, and X. Wang, “GPU
friendly fast Poisson solver for structured power grid network analysis,”
Proc. ACM/IEEE Design Automation Conference, pp. 178-183, 2009.
P. N. Swarztrauber, “The methods of cyclic reduction, Fourier analysis
and the FACR algorithm for the discrete solution of Poisson's equation
on a rectangle,” SIAM Review, vol. 19, no. 3, pp. 490-501, 1977.
B. Wang and P. Mazumder, “Accelerated chip-level thermal analysis
using multilayer Green’s function,” IEEE Trans. Computer-Aided
Design, vol. 26, no. 2, pp. 325-344, 2007.
C. Xu, R. Gharpurey, T. S. Fiez, and K. Mayaram, “A Green functionbased parasitic extraction method for inhomogeneous substrate layers,”
Proc. ACM/IEEE Design Automation Conference, pp. 141-146, 2005.
Y. Zhan and S. S. Sapatnekar, “High efficiency Green function-based
thermal simulation algorithms,” IEEE Trans. Computer-Aided Design,
vol. 26, no. 9, pp. 1661-1675, 2007.