Circuit tuning for high-performance custom digital design.

Chandu Visweswariah
IBM Thomas J. Watson Research Center
Yorktown Heights, NY 10598
chandu@watson.ibm.com
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:1
Dynamic Tuning for High–Performance
Custom Digital Design
Chandu Visweswariah
Acknowledgements
Dynamic Tuning for High–Performance Custom Digital Design
Andrew R. Conn, Applied Mathematics, IBM Watson
Ruud A. Haring, Integrated System Design, IBM Watson
Chandu Visweswariah
Of 77:2
1.
Outline
Introduction and motivation
2. Circuit simulation
3. Time–domain gradient computation
4. Nonlinear optimization
5. Case study
– JiffyTune: algorithms and implementation
– human interface
– application to PowerPC microprocessor design
– methodology impact and implications
– historical perspective
Dynamic Tuning for High–Performance Custom Digital Design
6. Conclusions, future work and pointers to the reading list
Chandu Visweswariah
Of 77:3
Introduction and Motivation
Update
design
custom design poses challenges in performance (delay and
slew), area, power, noise, etc.,.
competitive environment drives aggressive circuit tuning
formerly a manual, tedious, error–prone procedure
Simulate
Good
enough?
Dynamic Tuning for High–Performance Custom Digital Design
Design re–use not easy!
Chandu Visweswariah
Of 77:4
Dynamic Tuning
dynamic tuning implies automatic tuning based on dynamic or
‘‘pattern–driven” simulation
hence dynamic tuning requires ‘‘input patterns” or ‘‘vectors”
Static
Tuning
25%
Dynamic
Tuning
Piecewise
approximate
5%
Dynamic Tuning for High–Performance Custom Digital Design
SPICE–
based
1%
Accuracy
Sensitivity
computation
tuning only as good as patterns capture switching situations
Chandu Visweswariah
10
30
1K
10K
Size
Of 77:5
Dynamic vs. Static Tuning
Advantages of static tuning
can handle larger problems (or problems of the same size
faster)
does not require sensitization patterns
tunes only the ‘‘worst–case” or ‘‘most pessimistic” condition
more applicable to control–flow than data–flow circuits
requires either pre–characterized cells or simulation on the fly
Advantages of dynamic tuning
more accurate than static tuning
does not suffer from the false–path problem
more applicable to data–flow than control–flow circuits
has no problem with custom circuits
Dynamic Tuning for High–Performance Custom Digital Design
The two types of tuning complement each other at
different stages of the design process and on different
kinds of circuits.
Chandu Visweswariah
Of 77:6
Statement of the Dynamic Tuning Problem
Given
a working circuit schematic with initial transistor sizes
input signals or patterns or stimuli
a precise statement of circuit performance requirements
Determine
optimal transistor size assignments
While supporting...
flexible objective functions and constraints
simple bounds
ratio–ing of transistors and grouping of similar structures
i
delay, rise/fall time, area and power measurements
minimax optimization: min [ max { f i(x) } ]
x
easy–to–use graphical user interface in schematic environment
Dynamic Tuning for High–Performance Custom Digital Design
Must be able to tune a circuit with a few 100s of tunable
transistors in a few minutes of CPU time.
Chandu Visweswariah
Of 77:7
Sample Dynamic Tuning Problem
Tri–state driver
D_in
En
D_in
fast path D_in → D_out when enabled (minimize delay)
Dynamic Tuning for High–Performance Custom Digital Design
– small margin if D_in switches, larger margin if En switches
break–before–make on P_drv and N_drv (constraints)
Chandu Visweswariah
D_out
Of 77:8
Patterns
Device
models
Structure of a Dynamic Tuner
Circuit schematic
Netlist
Optimization
problem
description
Simulator and gradient calculator
Function and gradient values
Dynamic Tuning for High–Performance Custom Digital Design
Y Done? N
Nonlinear optimizer
Back–annotate
Chandu Visweswariah
Clobber transistor
widths with new
widths from non–
linear optimizer
Of 77:9
Circuit simulation
Mini–outline
1. Introduction and motivation
2.
– SPICE–like simulation
– piecewise–approximate simulation
3. Time–domain gradient computation
4. Nonlinear optimization
5. Case study
– JiffyTune: algorithms and implementation
– human interface
– application to PowerPC microprocessor design
– methodology impact and implications
– historical perspective
Dynamic Tuning for High–Performance Custom Digital Design
6. Conclusions, future work and pointers to the reading list
Chandu Visweswariah
Of 77:10
Simulation
incremental in time
analytic device models (gradients and continuity required)
1% numerical accuracy
limited to about 10K transistors
(n1.5) in size complexity, (n) in simulation interval
SPICE–like simulators
Piecewise approximate simulators
Of 77:11
examples: SPECS, ACES, PowerMill
modeling approximations, typically piecewise approximate i–v
characteristics, constant (grounded) capacitances
event–driven
special step a priori to create device models from SPICE models
5% accuracy, 2 orders of magnitude speedup over SPICE
100K transistors easily handled
(n) in size complexity, (n) in simulation interval
Dynamic Tuning for High–Performance Custom Digital Design
Tuning accuracy is only as good as underlying
simulator and device models!
Chandu Visweswariah
SPICE–like Simulation
Equation formulation
MNA, STA, TLA
Integration for all time
TR, BE, BD2, Gear
Linearization until convergence
Newton iterations
Device model evaluation
and solution
Sparse LU factorization
device model evaluation time is significant
Dynamic Tuning for High–Performance Custom Digital Design
Algebraic, non–
linear ODEs
Algebraic, non–
linear equations
Of 77:12
Algebraic, linear
equations
Both device modeling and analysis must be attacked
to get reasonable speedups!
Chandu Visweswariah
v
PIECEWISE
APPROXIMATE
SPECS, ACES,
PowerMill
i
Run time
CIRCUIT
SPICE
i = f(v)
Piecewise Approximate Circuit Simulation
SWITCH
COSMOS, PARCH
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:13
discretization in i–v characteristics allows event–driven
simulation and variable accuracy
event–driven simulation leads to latency/multi–rate exploitation
creative integration methods employed to avoid matrix equations
Chandu Visweswariah
Accuracy
Mini–outline
1. Introduction and motivation
Time–domain gradient computation
2. Circuit simulation
3.
– introduction to gradient computation
– direct and adjoint methods
– extension to nonlinear circuits and time–domain
4. Nonlinear optimization
5. Case study
– JiffyTune: algorithms and implementation
– human interface
– application to PowerPC microprocessor design
– methodology impact and implications
– historical perspective
Dynamic Tuning for High–Performance Custom Digital Design
6. Conclusions, future work and pointers to the reading list
Chandu Visweswariah
Of 77:14
signal
Time–domain Gradient Computation
delay
width
sensitivity lim width0
Dynamic Tuning for High–Performance Custom Digital Design
delay
width
Of 77:15
time
delay
we would like to determine sensitivities not by finite differences
or by perturbation, but by incremental methods; perturbation
methods are inefficient and numerically ill–behaved
performance measures like delay, slew, power, noise, area are
sensitivity functions
tunable parameters like transistor widths, interconnect sizes are
sensitivity parameters
in the direct method, the sensitivity of any number of functions
with respect to one parameter can be computed efficiently
in the adjoint method, the sensitivity of one function with respect
to any number of parameters can be computed efficiently
both methods are typically implemented as part of circuit simulator
Chandu Visweswariah
t 1 C i W eff
t I ds W eff
2
C i W eff PW
I ds W eff PW
i
t 2 C i W eff
C i W eff PW
Incremental Sensitivity and Chain Ruling
i
t 1
t 1 C i area W eff
C i
perimeter W eff
C i area W eff PW
C i perimeter
PW
W eff
delay t 1 t 2
t 1
t
delay
2
PW
PW PW
t I ds W eff
1
I ds W eff PW
Of 77:16
number of ‘‘electrical” gradient computations can be very large
example: problem with 10 delays, 10 slews and 1 power
function, with 100 tunable transistors
number of gradient evaluations (conservatively) = 40 iterations x
41 functions x 100 tunable transistors x 6 electrical parameters
per transistor 1 Million!
dependent (ratio–ed) transistor widths can be accommodated
similarly by chain ruling
Dynamic Tuning for High–Performance Custom Digital Design
Incremental sensitivity computation has to be
extremely efficient!
Chandu Visweswariah
Chandu Visweswariah
v
–
.
C
^
i I
p
v^ V
p
+–
v^
+
+–
^
R
i
i R
p
^
v^ i R Ri
p
+
^
i
C v
^
^ p
.
i C dv C v
p
dt
–
–
Sensitivity Analysis by the Direct Method
+
i
R
v iR
v
R
i
i
R
p
p
p
vV
+–
V
v
p
p
iI
i
I
p
p
+ i v
–
C
i C dv
dt
dv
dv C
i
C
(
)
p dt
p
dt p
Key: directly differentiate BCRs
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:17
V5
+
–
Direct Method: Procedure
R1
+–
i
R1
R2
replace elements by directly differentiating BCRs to get
sensitivity circuit
reuse LU factors of original solution
R3
R4
v R
4
,
so
p
R
1
any number of functions, only one parameter
Example:
R1
R2
problem is to find
R 1
Dynamic Tuning for High–Performance Custom Digital Design
4
solve for v^ R ; that is the required answer!
R3
replace V 5 by a short; add a voltage source in series with R 1;
Chandu Visweswariah
+
v^ R
–
4
R4
Of 77:18
^
^
i i (v, p)
di
i dv i
v dp p
dp
^
or i G v^ I x
G eq
Linear model : i G eq v I eq
v
Direct Method: Extension to Nonlinear Circuits
i
i solution
I eq
v solution
in the sensitivity circuit,
^
^
i G solution v^ I x
Dynamic Tuning for High–Performance Custom Digital Design
extra source
LU factors can be re–used
think of it as the sensitivity of the linearized equivalent circuit at
the DC solution
Chandu Visweswariah
Of 77:19
sensitivity circuit
time
nominal circuit
Direct Method: Extension to Time–domain
series of linear(ized) equivalents
Problems
LU factors can be re–used only if the Jacobians are identical
Jacobians are identical provided time points are identical
time points selected, however, only with consideration to
solution of nominal circuit, hence may not be suitable to the
sensitivity circuit
must either use conservative time–steps or interpolate
Jacobians
event–driven simulators may have an advantage if event times
of original and sensitivity circuit are inherently identical
^
Dynamic Tuning for High–Performance Custom Digital Design
can find ‘‘full” transient sensitivity, like v^(t) or i(t)
Chandu Visweswariah
Of 77:20
Sensitivity Analysis by the Adjoint Method
Matrix formulation of adjoint method
Ax b
A x A x b
p
p
p
f
f
Given a scalar function f (x),
[ ] T[x]
p
x p
f
Postulate A T [ ]
x
f
TA[x] T[b – A x]
p
p
p
p
Then
re–use LU factors of original solution; LU factors of A T are
U T and L T
any number of parameters, only one function
.
^.
Of 77:21
the adjoint of a dynamical system x Ax u is x A Tx^ u^
Dynamic Tuning for High–Performance Custom Digital Design
electrical formulation based on Tellegen’s theorem
Chandu Visweswariah
Tellegen’s Theorem
branches
v bi v n ifrom– v nito i
current of branches incident at each node 0
1, 0
Ai b 0
A topological incidence matrix v b A Tv n
i
^
^
i
i
^
^
same
topology
^
^
^
^
^
Dynamic Tuning for High–Performance Custom Digital Design
i
vb ib vbT ib [ATvn]T ib vnT A ib 0
i
ib vb ibT vb ibT AT vn [A ib]T vn 0
i
Tellegen’s theorem
Chandu Visweswariah
KCL
KVL
Of 77:22
instantaneous power of branch 0 Conservation of energy
v b Ti b 0
nodes
Adjoint Sensitivity via Tellegen’s Theorem
i
^
^
bi
i
i
i
^
i
^
v^b,i b
i
^
Key relation!
i
ib vb
0
i
(ib ib ) vb 0
i
i b i b v^ b ) 0
i
i
Application of Tellegen’s theorem to a perturbed circuit
v b, i b
^
ib
i
v ) i bi
v b v b, i b i b
i
bi
vb
i
(v
i
(vb
i
Motivation
^
Vs
^
others
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:23
and then into
f k p k
then we can pick up (in the limit) all the required sensitivities fp k
I
vI iI iV vV typical term
if we can manipulate the above equation into
Chandu Visweswariah
Adjoint Sensitivity Computation
i I I
^
^
v R i R R R i R R i R
Typical term (v I i I I v^ I)
i I I;
Consider a circuit with only resistors and current sources:
for any current source:
^
for any resistor: v R i R R;
Is
^
Is
^
Rs
^
.
f
v I.
v I
vI iI I vI iR iR R
By choosing v^ R R i R, we get typical term i R i R R.
total circuit:
^
f
v I
and
^
f
iR iR
R
given a scalar performance function f (v I), f choose
iI
f
v^ I
I
pick off the required sensitivities:
Thus
Dynamic Tuning for High–Performance Custom Digital Design
any number of parameters can be accommodated at once!
Chandu Visweswariah
Of 77:24
I1
R1
R3
R4
^
+
v^ I1
–
^
R1
R2
Adjoint Method: An Example
R2
^
^
choose I 2 1
^
v
R4
0,
R 2
v
^
R4
i R3i R3,
R 3
R3
R4
v
^
R4
i R4i R4
R 4
v
v
R4
R4
0 and
0
R 3
R 4
v R4
^
v
I1
I 1
v
^
R4
i R2i R2,
R 2
i R1i R1R 1 i R2i R2R 2 i R3i R3R 3 i R4i R4R 4 I 1v^ I1 v I2 v R4
v
^
R4
i R1i R1,
R 1
v
note that R4 0,
R 1
can re–use LU factors of nominal solution
Dynamic Tuning for High–Performance Custom Digital Design
All parameters handled at once!
Chandu Visweswariah
^
I2
Of 77:25
+
v2
v^ 2
i2 i1
Adjoint Sensitivity: Controlled Sources
i1
–
^
^
v 1 v 2 i 2 i 2
^
Current–controlled current source
+
v1 0
–
^
^
v 1 i 1 i 1
v 1 i 1 i 1 v^ 1 v 2 i 2 v^ 2 ( i 1 i 1)
Typical term ^
^
i1
+
v^ 2
–
i2 0
^
Choose i 2 0 (current source)
KEY TERM!
v^ 1 – v^ 2 (voltage–controlled voltage source)
v^ 1 v^ 2
+
–
final typical term is v^ 2 i 1 Dynamic Tuning for High–Performance Custom Digital Design
control is reversed!
Chandu Visweswariah
Of 77:26
^
Vs
^
Rs
^
Cs
+
dv
dv
^
C
C
v
C
v^
dt C
dt C KEY TERM!
iC
vC
vI iI iV vV iR iR R ??
Adjoint Sensitivity: Extension to Time–domain
Recall
I
^
Typical term for Cs v C i C C –
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:27
these have go although a
v C(t 0) term would be handy
tf
tf
tf
tf
^
^.
.
t
C
Typical term dt v i dt – C v^ v f C v v dt– v^ v dt C
C
C
C
C
C
C
C
t0
t0
t0
t0
t0
tf
tf
^
^.
.
C
(i C v ) dt v C v^ (t ) v (t ) C v^ (t ) v (t ) v^ v dt
C
C
C
C
C
C
0
C
0
C
C
f
f
t0
t0
Chandu Visweswariah
^.
Of 77:28
Adjoint Sensitivity: Extension to Time–domain, Cont’d.
^
to get rid of the unwanted terms, i C Cv C and v^ C(t f) 0
Solution: run time backwards!!
k t, usually t 0 t f t
make use of the fact that Tellegen’s theorem holds for any
choice of time point in the two circuits
dv^
dv^ (t)
^
^.
then i C() C C C C Cv C(t) and choose
dt
d
v^ C(t f) v^ C( t 0) as an initial condition with value 0
Dynamic Tuning for High–Performance Custom Digital Design
tf
.
C
so finally, typical term C v^ (t ) v (t ) v^ v dt
C
0
C
0
C
C
t0
required sensitivity becomes a convolution integral; in fact, all
sensitivities become convolutions
Chandu Visweswariah
tf
t0
(i V v^ V) dt ^
k p k
Adjoint Sensitivity: Function Tricks
^
(–v I i I) dt Vs
the left hand side of the general relation
tf
Is
t0
^
must be chosen to be f by good choices of i I and v^ V
V
k pk
t start
1
dt t end
sampling of v^ I (t t 1): choose i I (t t 1) so that
t end
i
v I| tt1 k p k
power: choose
v^ V
0
so that
t start
Dynamic Tuning for High–Performance Custom Digital Design
remember to map ts to s
Elmore delay is easily expressed as an integral, too
Chandu Visweswariah
Of 77:29
cross
Adjoint Sensitivity: Crossing Delays
Crossing delay sensitivity
v cross v N tt
v^ N| ttcross
dt cross
dp
.
v N| tt
cross
v^ N dt cross
.
–
vN dp
tt cross
t cross
Dynamic Tuning for High–Performance Custom Digital Design
t
v
v dv
dt
cross
cross
0 d v (t, p) tt N N
cross
t
p
dp N
dp
dp
tt cross
tt cross
vN
v cross
Chandu Visweswariah
Of 77:30
Sensitivity Computation: Wrap–up
Direct method
directly differentiate BCRs
any number of functions, one parameter
can re–use LU factors (modulo time–point mismatch)
original sources removed; a new source is introduced in
conjunction with the single sensitivity parameter
‘‘full” transient sensitivity, e.g., v N(t)W eff can be computed
Adjoint method
Dynamic Tuning for High–Performance Custom Digital Design
based on the ‘‘adjoint” or transposed set of circuit equations;
most intuitively approached via Tellegen’s theorem
one function, any number of parameters
control reversed; time run backwards
can re–use LU factors (modulo time–point mismatch)
sources removed; a new source is introduced and manipulated
to express the single sensitivity function as a convolution
sensitivity results are convolutions, hence an overhead
‘‘full” transient sensitivity harder to compute
Chandu Visweswariah
Of 77:31
Mini–outline
1. Introduction and motivation
2. Circuit simulation
3. Time–domain gradient computation
Dynamic Tuning for High–Performance Custom Digital Design
4. Nonlinear optimization
– terminology
– basics of linear programming
– basics of nonlinear programming
– LANCELOT and MINOS
– minimax optimization
– key messages
5. Case study
– JiffyTune: algorithms and implementation
– human interface
– application to PowerPC microprocessor design
– methodology impact and implications
– historical perspective
6. Conclusions, future work and pointers to the reading list
Chandu Visweswariah
Of 77:32
Terminology
i 1, 2, , n
i l 1, l 2, , l m
i 1, 2,..., l
min f (x)
s.t. c i(x) 0
c i(x) 0
li xi u i
objective function (maximize/minimize)
equality constraints
inequality constraints ( and )
c i(x) 0
c i(x) s i 0
s.t. s i 0
c i(x) s i 0
s.t. s i 0
– often handled by introducing a slack variable s i
c i(x) 0
– note that s i is a new variable of the problem
simple bounds (either l i or u i can be unbounded)
Dynamic Tuning for High–Performance Custom Digital Design
constrained vs. unconstrained optimization
Chandu Visweswariah
Of 77:33
Terminology, Cont’d.
i 1, 2, , n
i l 1, l 2,..., l m
min f (x)
s.t. c i(x) 0
i 1, 2,..., l
c i(x) 0
li xi u i
solution vectors x 1, x 2, , x k, , x *
feasibility (feasible points, feasible regions)
activities are constraints that are ‘‘tight”
if f (x) or any of c i(x) is a nonlinear function of x, we have a
nonlinear problem
linear programming: 1M+ variables is large but tractable
nonlinear programming: 1K+ variables is large but tractable
Key message: use nonlinear programming to solve
nonlinear problems (all circuit tuning problems are
nonlinear)!
Dynamic Tuning for High–Performance Custom Digital Design
Consider good solutions to bad models vs. less good
solutions to better (possibly nonlinear) models!
Chandu Visweswariah
Of 77:34
Classical Theory
Unconstrained optima
to minimize f (x), f 2
f
– Jacobian = x f 0 is a necessary condition
x i*
2f
2 f
– Hessian = xx
0 is a sufficient condition
x i* x j*
all eigenvalues of the Hessian must be non–negative
think of the second condition as ensuring ‘‘positive curvature”
the second condition avoids saddle points and maxima
Constrained optima: Lagrangian formulation
i1
ici(x)
l
the minimizer of f (x) subject to equality constrains
0, i 1, 2,..., l is a stationary point of the Lagrangian
c i(x)
(x, ) f (x) c i(x
l
ixci(x*) 0
i1
Dynamic Tuning for High–Performance Custom Digital Design
– dual feasibility: x f (x *) applying the above criteria, we get
*) 0 i
– primal feasibility:
Chandu Visweswariah
Of 77:35
Kuhn–Tucker Conditions for Equality Constraints
min f (x)
s.t. c(x) 0, f, c 2
consider the set of all feasible directions, i.e., d such that
c(x d) – c(x) d Tc 0
optimality implies that there does not exist even a single
feasible descent direction, i.e.,
d Tf 0
we know that if d Tc 0 then d is in the null space of c
d Z t for any t is therefore the set of all feasible directions,
where Z is a basis for the null space of c (or the columns of Z are
a basis for that null space or the columns of Z span the null space)
hence t TZ Tf 0 t or Z Tf 0 is the first–order condition
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:36
just as f (x d) f (x) d f(x) 12 d 2 f(x) is a scalar
quadratic model, f (x d) f (x) d Tf 12 d T 2f d is a
quadratic model for f (x)
using a quadratic model for f (x) and c(x) and restricting ourselves
to feasible d, we derive the second–order condition Z T 2f Z 0
Chandu Visweswariah
Of 77:37
Kuhn–Tucker Conditions for Inequality Constraints
min f (x)
s.t. c(x) 0
optimality now implies that there is no further descent direction
d for which
c(x d) – c(x) d T c 0
d Tf 0
f is a (negative) linear combination of the gradient of the
(active) constraints
the Lagrange multipliers corresponding to inactive constraints
are zero
first–order condition:
Z Tf 0
where Z is a basis for the null space of the Jacobian of the
active constraints (which have positive Lagrange multipliers)
Dynamic Tuning for High–Performance Custom Digital Design
Z T 2f Z 0
second order condition:
Chandu Visweswariah
2
3
Whirlwind Tour of Linear Programming
Active set methods (simplex)
4
min f (x)
s.t. c i(x) 0
descent
direction
1
Dynamic Tuning for High–Performance Custom Digital Design
there is always a solution at a point where at least n constraints
intersect in n–space (2 in the figure)
strategy: move from one vertex to an adjacent vertex while
seeking a decrease in the objective function
theoretical worst–case complexity is exponential, but works well
in practice
Chandu Visweswariah
Of 77:38
Whirlwind Tour of Linear Programming, Cont’d.
Penalty function methods: interior point methods
log(xi) for the simple bounds
start feasible
always stay feasible
use ‘‘barrier functions” in merit function to stay within simple bounds
1
example: (x) f (x) i
x i 0 ( is a penalty parameter)
polynomial complexity
less likely to suffer from degeneracy problems
Penalty function methods: exterior point methods
feasible is guaranteed only at the solution
Aside for nonlinear programming
log ci(x) for ci(x) 0
Of 77:39
can use ‘‘barrier” function for inequalities in interior method merit
1
functions such as (x) f (x) i
Dynamic Tuning for High–Performance Custom Digital Design
i
can use penalty term for equalities in exterior method merit
1
function such as (x) f (x) c i2(x)
Chandu Visweswariah
Nonlinear Programming
Derivative (D) vs. Derivative Free (DF) Methods
never use DF if D is possible
– DF popular since easy to understand
– DF popular since easy to implement
most implementations of DF:
– useful only if # variables < 20
– not robust
– not efficient (especially for circuit applications)
– no convergence theory
– however, sometimes the only usable method
recent developments
– DF algorithms with convergence theory and better
robustness have been developed...
– ... but they are applicable only to small problems
– ... and are still inefficient (but parallelizable)
Dynamic Tuning for High–Performance Custom Digital Design
Key message: never use DF unless you have no choice!
Chandu Visweswariah
Of 77:40
Nonlinear Programming
Some general comments
Key message: tremendous progress in nonlinear optimization in the last 15 years!
global optimality is almost impossible to achieve or detect on
large–scale problems
the best we can hope for is
– problem is convex (in which case we can achieve
and detect a global optimum)
– we converge to a local minimum
– improvement is obtained!
exploit structure where possible
– example: sparsity or group partial separability
line–search vs. trust–region approaches
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:41
Key message: requires sophisticated user or optimization
specialist to best exploit nonlinear optimization... using
the software as a black box is not ideal!
Chandu Visweswariah
Unconstrained Nonlinear Optimization
Steepest descent (SD)
‘‘low storage” solution, simple and efficient per iteration
unfortunately, slow linear– or even non–convergence
used early in the optimization to capture global behavior
Safeguarded Newton’s method and variations (NM)
(safeguarded) step in direction that minimizes quadratic model
second–order convergence
more expensive
used late in the optimization for fast asymptotic convergence
Alternatives
modified Newton’s method to guarantee positive definiteness of
Hessian of quadratic model and hence descent
Of 77:42
quasi–Newton (low rank updates to the Hessian) (converts SD to NM)
coordinate descent (BAD) and its generalization, conjugate
directions (GOOD)
Dynamic Tuning for High–Performance Custom Digital Design
Levenberg–Marquardt
Chandu Visweswariah
Unconstrained Nonlinear Optimization
Conjugate
gradients
Steepest
descent Newton’s
method
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:43
Method of choice for large problems: conjugate gradient
(CG) iterations with appropriate choice of pre–conditioner
Chandu Visweswariah
MINOS
Stanford, ca. 1980
min F(x) c Tx d Ty
Dynamic Tuning for High–Performance Custom Digital Design
augmentation
objective function Lagrangian of
nonlinear part
of constraints
~
s.t. f(x) A 1y b 1
linearized constraints
A 2x A 3y b 2
linear constraints
lx x ux
simple bounds
ly y uy
where
~
where f(x) f (x) J k(x–x k)
(linearization)
nonlinear linear
s.t. f (x) A 1y b 1
nonlinear constraints
A 2x A 3y b 2
linear constraints
lx x ux
simple bounds
ly y uy
is remapped to
~
~
~
min F(x) c Tx d Ty kT[f (x) – f(x)] 1 [f (x) – f(x)] T[f (x) – f(x)]
2
Chandu Visweswariah
Of 77:44
MINOS, Cont’d.
Dynamic Tuning for High–Performance Custom Digital Design
large–scale, general–purpose, nonlinear optimization package
intuition:
– separate linear and nonlinear parts
– use active set methods for the linear part (LP–like)
– use augmented Lagrangian for the nonlinear part
– minimize quadratic model of the merit function
subject to linear model of constraints (QP)
– line–search methods for minimizing the quadratic
model
in practice, very effective, especially if
– most constraints are linear or near linear
– real number of degrees of freedom at the solution is
not large
Chandu Visweswariah
Of 77:45
LANCELOT
large–scale, general–purpose, nonlinear optimization package
efficient, especially for nonlinear constraints
exploits structure via group partial separability
Major Iteration
update l or m value(s)
formulate augmented Lagrangian and iteratively:
Minor Iteration
trust–region approach
simple bounds handled explicitly and well
minimize merit function
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:46
test for convergence
many options:
– l l 2 trust regions (box and sphere)
– various preconditioners and initializations available
– approximate/accurate bounded quadratic problem (BQP) solver
has successfully handled 20,000 variable problems with 20,000
nonlinear constraints; has been tested on 15,000 test cases
representing 1,000 unique problems
Chandu Visweswariah
i
Lagrangian
i
min f (x 1, x 2)
0 x1 a
0 x2 b
x2
Of 77:47
Quadratic model
penalty function
to weight feasibility
i ci(x) 21 ci(x)2
How LANCELOT Works
(x, , ) f (x) composite function objective
for unconstrained function
–f
minimization
x1
v
w
b
Dynamic Tuning for High–Performance Custom Digital Design
xk
u
Trust region
Simple bounds
a
Chandu Visweswariah
Minimax Optimization
x
i
x*
x
z
the original problem is min [ max { f i(x) } ], i 1, 2, , m
x, z
f 2(x)
xk
not directly differentiable
remap to min (z) s.t. z f i(x) i
f 1(x)
Special considerations
i* 1, so initialize i 1m i
initialize z to max f i(x) directly after the first function evaluation
optimality condition:
i
Dynamic Tuning for High–Performance Custom Digital Design
can handle minimax problems in other ways
Chandu Visweswariah
Of 77:48
Andy and Chandu’s Top–10 Key Messages!
Time to
wake up!
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:49
1. Replace good solutions to bad models by potentially less good
solutions to better models!
2. Use nonlinear optimization on nonlinear problems!
3. Swapping objective function and constraints does not a new ICCAD
paper make!
4. Tremendous progress has been made in nonlinear optimization in
the last 15 years; do not use outdated methods!
5. Never use derivative–free methods unless you have no choice!
6. In circuit tuning, CPU time is dominated by simulation and gradient
computation; must leverage optimizer to reduce number of iterations!
7. Efficient sensitivity computation is essential to any high–performance
circuit tuner!
8. Carefully specify all aspects of the problem you really want to solve
(forces careful understanding of the circuit and problem being posed)!
9. ‘‘Black box” optimization under–utilizes capabilities of a package
(and often leads to greatly sub–optimal solutions)!
10.Any simulation–based optimization is inherently noisy!
Chandu Visweswariah
Mini–outline
1. Introduction and motivation
2. Circuit simulation
3. Time–domain gradient computation
Case study
4. Nonlinear optimization
5.
– JiffyTune: algorithms and implementation
simulation
gradient computation
nonlinear optimization
– human interface
– application to PowerPC microprocessor design
– methodology impact and implications
– historical perspective
Dynamic Tuning for High–Performance Custom Digital Design
6. Conclusions, future work and pointers to the reading list
Chandu Visweswariah
Of 77:50
Case Study: JiffyTune
dynamic circuit optimization tool called JiffyTune has been
developed at IBM over the last 16 months
– circuit simulated by SPECS (piecewise approximate,
event–driven, fast circuit simulation program)
– takes advantage of both direct and adjoint transient
sensitivity computation
– uses LANCELOT (augmented Lagrangian,
trust–region–based, large–scale, general–purpose
nonlinear optimization package)
has successfully tuned circuits with up to 500 tunable
transistors
has been used by about 50 designers in about 2,700 tuning
runs in many different technologies and sites of IBM
human interface and methodology issues carefully taken into
account during development
Dynamic Tuning for High–Performance Custom Digital Design
was extensively used on domino circuits of a custom,
high–performance, PowerPC microprocessor
Chandu Visweswariah
Of 77:51
JiffyTune: Features at a Glance
Dynamic Tuning for High–Performance Custom Digital Design
accommodates delay, slew, area and power functions
supports minimax optimization
any combination of weighted constraints and objective
function(s) permitted
all three types of constraints (equality, less–than and
greater–than) can be specified
ratio–ing of transistors allowed
similar instances can be grouped for regular layout
automatic failure–recovery allows optimizer to take aggressive
steps
two manufacturability modes implemented
– post–tuning information mode at various process
corners
– simultaneous tuning at multiple process corners
easy–to–use human interface for problem specification as well
as back–annotation of results
specialized stopping criteria and considerations for noise
Chandu Visweswariah
Of 77:52
Circuit, models,
optimization criteria
Chandu Visweswariah
User interface
Problem specification
LANCELOT
JiffyTune
administration
chain ruling ad nauseam
X new
X old
Of 77:53
User interface
Back annotation
JiffyTune: Software Overview
g, J
f, f
SPECS
JiffyTune ‘‘Engine”
Dynamic Tuning for High–Performance Custom Digital Design
Optimized circuit
iL
vL
iL
vL
Of 77:54
SPECS: Piecewise Approximate Circuit Simulation
Modeling
piecewise constant i–v characteristics ( 1% error)
linearized junction and diffusion capacitances ( 1% error)
gate capacitances linearized and grounded ( 5% error)
Dynamic Tuning for High–Performance Custom Digital Design
tree/link formulation of circuit equations
event–driven simulation to exploit latency and multi–rate
behavior
incremental processing of KCL and KVL equations whenever a
device moves from one segment of its i–v table to the next
relative timing error inversely proportional to square of signal
slope
Analysis
Chandu Visweswariah
Upside
SPECS Summary
speed: 70x (6,000 events per second on a workstation)
capacity: 10x (0.25M transistors)
accuracy: greater than 1s of delays with 5% relative timing
accuracy; worst case is 10%
used in 12 IBM sites for
– timing verification
– functional verification
– tuning
– power estimation
of digital and memory circuits
Downside
Of 77:55
constant–grounded–capacitance–assumption causes inaccuracy
stiff circuits a problem
Dynamic Tuning for High–Performance Custom Digital Design
no inductors
Chandu Visweswariah
i
Sensitivity Analysis in SPECS
^
G
v
i i (v, p)
^
^
^
di
i dv i
or
i Gv^ I x
v dp p
dp
sensitivity or adjoint circuit consists of mostly disconnected,
grounded capacitors
little ‘‘spurts” of charge sharing occur instantaneously at the
times of the events of the original nominal circuit
event information from nominal simulation is saved
results of sensitivity or adjoint circuit yield required sensitivities;
convolution is required in the case of adjoint sensitivity
both direct and adjoint methods implemented
Dynamic Tuning for High–Performance Custom Digital Design
sensitivity or adjoint circuit repeatedly solved with appropriate
excitation
Chandu Visweswariah
v^
Of 77:56
Sensitivity Analysis in SPECS, Cont’d.
Functions
any crossing time (delay and slew are expressed as the
difference of two crossing times)
power in any voltage source over any sub–interval of time
transient node voltage (direct method only)
Parameters
transistor widths (chain ruling done automatically)
resistor and capacitance values
Implementation notes
Dynamic Tuning for High–Performance Custom Digital Design
convolution in the adjoint method is carried out in an
event–driven fashion
automatic heuristic to choose between direct and adjoint
method (presently checks if # parameters > 3 * # functions)
can force direct or adjoint method
Chandu Visweswariah
Of 77:57
Efficiency of Sensitivity Analysis in SPECS
144 MOS transistors, 27 ns of simulation
36 functions, 104 parameters (more internally)
nominal simulation: 2 s on Risc/System 6000 mod 590
sensitivity run time: 10.9 s
overhead per adjoint sensitivity circuit simulation = 15%
overhead per gradient computation = 0.14%
Example 1: PowerPC Branch Scan and Select Circuit
Dynamic Tuning for High–Performance Custom Digital Design
6,166 MOS transistors, 770 resistors, 66 ns of simulation
99 functions, 6,166 parameters (more internally)
nominal simulation: 3 mins 5 s
sensitivity run time: 20 mins 5 s
overhead per adjoint sensitivity circuit simulation = 6.5%
overhead per gradient computation = 0.001% !!!!
Logic panel module IOmux
Chandu Visweswariah
Of 77:58
v N
R
–0.98795
–1.98795
0
Chandu Visweswariah
4
6
+
–
R
8
SPECS
Analytic
10
Of 77:59
Time(ns)
C
vN
Sensitivity Results on Simple Circuit
2
Dynamic Tuning for High–Performance Custom Digital Design
delay
6.20
6.00
5.80
5.60
5.40
5.20
5.00
4.80
4.60
4.40
4.20
4.00
3.80
5.00
15.00
Delay Sensitivity
SENS DATA Results
10.00
Dynamic Tuning for High–Performance Custom Digital Design
delay sensitivity plotted as tangents to delay–width curve
Chandu Visweswariah
20.00
width
Of 77:60
LANCELOT Optimization: Special Issues
Formulation
minimax optimization (and initialization of Lagrange multipliers)
implemented as described earlier
CUTE testing environment has allowed us to integrate MINOS
from Stanford as an alternative optimizer
Problems due to noise
any simulation–based optimization is inherently noisy
unable to recover from poor initializations; optimizer often
wastes time going nowhere
does not stop gracefully or predictably
Special considerations for noisy optimization
tolerance for feasibility and line search break points
Of 77:61
consideration of small step beneath which further progress unlikely
initial trust region radius
initial Hessian approximation for secant methods
Dynamic Tuning for High–Performance Custom Digital Design
stopping criteria
Chandu Visweswariah
Manufacturability Modes in JiffyTune
general goal: make parametric manufacturability a push–button,
transparent, easy–to–use part of tuning
‘‘Information” mode
perform nominal tuning
as a postprocess, simulate the tuned circuit at each process
corner
summarize the distribution of circuit performance across
process corners
Simultaneous tuning at multiple process corners
simultaneously tune at all process corners; no surprises after
tuning is completed
replicate and modify tuning criteria for all process corners i
– constraint: convert d(x) t to d i(x) t i
x
x
x
i, j
i
– objective function: convert min d(x) to min[max d i(x)] i
x
j
– minimax: convert min[max d j(x)] j to min[max d ij(x)] i, j
Dynamic Tuning for High–Performance Custom Digital Design
run time overhead is roughly n where n number of corners
Chandu Visweswariah
Of 77:62
u
Chandu Visweswariah
Two–step Updates
min f (x)
s.t. c(x) u 0,
u0
[c(x) u] 2
2
f (x) [c(x) u] x k1, u k1
minimize along this line
Trust region
x k, u k
Dynamic Tuning for High–Performance Custom Digital Design
x
Of 77:63
Intuition
Two–step Updates, Cont’d.
if a variable appears in the merit function in a known functional
form, a second update can be applied to it without incurring the
cost of a new function/gradient evaluation
under mild assumptions, the second step can be outside the
trust–region; convergence is still guaranteed
two–step updates lead to faster convergence, better
optimization efficiency
Minimax problems
i
i[z f i(x) u i] 1
2
i
[z fi(x) ui]2
can be applied to the z variable in minimax problems;
min [ max f i(x) ] leads to
x
z
i
Dynamic Tuning for High–Performance Custom Digital Design
is a quadratic in z and is therefore amenable to two–step
updates
Chandu Visweswariah
Of 77:64
JiffyTune: The Human Interface
good tools are useless without a good human interface
JiffyTune engine is file–driven and hence is not amenable to
quick and productive design
Features of Cadence interface
Dynamic Tuning for High–Performance Custom Digital Design
developed by actual circuit designer
all functionality provided from schematic diagram
– familiar environment ⇒ no steep learning curve
– visual and interactive
– complexities of tool hidden (but accessible)
enforces clear expression of otherwise tacit circuit requirements
– JiffyTune is greedy!
adds significant additional functionality like grouping & ratio–ing
overcomes the drudgery of a file–driven interface
configurable to different site/project environments
Chandu Visweswariah
Of 77:65
JiffyTune Interface: Main Form
480ps
Dynamic Tuning for High–Performance Custom Digital Design
rise time <= 400ps 1400ps
CLKG
CLKM
Chandu Visweswariah
Of 77:66
JiffyTune Interface: Parameters Form
Dynamic Tuning for High–Performance Custom Digital Design
tunable/untunable transistors
upper/lower bounds
grouping (tracking) of instances
individual settings
support for hierarchy
Parameters
Chandu Visweswariah
Of 77:67
JiffyTune Interface: Measurements Form
rise time <= 400ps 1400ps
CLKG
CLKM
480ps
delay
rise/fall time
area
minimax function
Dynamic Tuning for High–Performance Custom Digital Design
Measurements (optimization targets)
Chandu Visweswariah
Of 77:68
JiffyTune Interface: Back Annotation
Back annotation of results
Dynamic Tuning for High–Performance Custom Digital Design
same constraints, 2.5x load
(0.6 pF → 1.5 pF)
new (suggested) transistor
widths and waveform data
back–annotated on schematic
Chandu Visweswariah
Of 77:69
JiffyTune: Methodology Impact
JiffyTune was used to tune120 circuits of a custom
dynamic–logic microprocessor and was found to be a key
productivity tool
measurements, functions, weights, tunability, simple bounds,
etc., are stored as attributes of schematic
they are indeed part of the intellectual exercise of designing a
good circuit
thus design re–use is encouraged
– when device models change
– for different loading and arrival time situations
– when design specifications change
– during design remap
forces careful and explicit specification of circuit requirements:
the issue becomes how to correctly specify the optimization
problem rather than solving the optimization problem itself!
can create a power–delay or area–delay tradeoff curve by
rerunning optimizer
Dynamic Tuning for High–Performance Custom Digital Design
Emphasis shifts to: (a) starting with a good topology
and a working circuit, (b) posing the problem well
Chandu Visweswariah
Of 77:70
Historical Perspective: DELIGHT.SPICE
W. Nye, D. C. Riley, A. S. Vincentelli, A. L. Tits, 1988
accommodates semi–infinite constraints
hard and soft constraints; the latter have ‘‘good” and ‘‘bad”
values from which the quality of the solution is judged
minimax PHASE–I–II–III optimization by method of feasible
directions (always produces feasible design, monotonically
decreases merit function while handling semi–infinite
constraints)
implements sensitivity in SPICE
Weaknesses
Dynamic Tuning for High–Performance Custom Digital Design
sloooooooooooow, tailored to small analog circuits
hard to use
Chandu Visweswariah
Of 77:71
Historical Perspective: TAILOR
D. Marple, 1989
operates on the layout constraint graph of a cell
no separate post–layout closure needed
uses constraint graph with corner–stitching data structure;
variable weight for each transistor arc
constructs a delay graph based on a single time–constant
approximation (static)
then works to minimize cell’s pitch in one dimension
uses augmented Lagrangian with penalty term
alternates steepest descent and steepest ascent to find saddle
point of Lagrangian
Weaknesses
Dynamic Tuning for High–Performance Custom Digital Design
used on small cells only (50 transistors)
no hierarchy
crude delay model
Chandu Visweswariah
Of 77:72
Historical Perspective: Interconnect Sizing
J. Cong, K.–S. Leung, 1994
N. Menezes, S. Pullela, L. T. Pileggi, 1995
concurrent gate and interconnect sizing; flurry of interest in this
topic since
Elmore delay approximation for gates
moment matching models for wires
formulated into a convex optimization problem
delay models differentiated analytically to provide gradients
sequential quadratic programming (SQP) applied
– quadratic fit about last solution point
– minimize quadratic to get new solution vector
routability enhanced by simple congestion weight factors
Weaknesses
Dynamic Tuning for High–Performance Custom Digital Design
inaccurate gate delay model
Chandu Visweswariah
Of 77:73
Mini–outline
1. Introduction and motivation
2. Circuit simulation
3. Time–domain gradient computation
4. Nonlinear optimization
Dynamic Tuning for High–Performance Custom Digital Design
Conclusions, future work and pointers to the reading list
– JiffyTune: algorithms and implementation
– human interface
– application to PowerPC microprocessor design
– methodology impact and implications
– historical perspective
5. Case study
6.
Chandu Visweswariah
Of 77:74
Chandu Visweswariah
Conclusions
dynamic tuning is possible and meaningful
1,000 tunable transistors well within reach; perhaps 10,000
tunable transistors is not far out of reach
reasonable compromises on simulation accuracy (piecewise
approximate circuit simulation) are both necessary and practical
gradients necessary to carry out tuning
adjoint method can be used to compute large number of
gradients efficiently
sophisticated nonlinear optimization algorithms essential
extensions to manufacturability based on a
design–of–experiments paradigm straightforward
having a tuning capability not only improves design time and
circuit quality, it has methodology impact too
Dynamic Tuning for High–Performance Custom Digital Design
Of 77:75
Future Work
Dynamic Tuning for High–Performance Custom Digital Design
accommodation of semi–infinite constraints, e.g., delay should
be less than a target for a range of power supply voltages
parallel processing for sensitivity computation
Hessian computation
automatic generation of problems from a static tuner; automatic
generation of stimulus patterns and problem constraints
exploitation of structure, e.g., group partial separability
formulation as a mixed integer–continuous problem for discrete
transforms
formulation improvements to speedup large optimization runs
Chandu Visweswariah
Of 77:76
Reading List
Dynamic Tuning for High–Performance Custom Digital Design
classified into
– circuit optimization in general
– SPICE–like simulation
– piecewise approximate simulation
– gradient computation
– nonlinear optimization
bibliography has been annotated
NEOS home page (take the state–of–the–art optimization
software for a test run):
http://www.mcs.anl.gov/home/otc/Server/neos.html
HAPPY READING!
Chandu Visweswariah
Of 77:77