Efficient time-domain simulation and optimization of digital FET circuits.

Efficient Time–Domain Simulation and
Of 29:1
Optimization of Digital FET Circuits
Andrew R. Conn
Ruud A. Haring
Chandu Visweswariah
Thomas J. Watson Research Center
Yorktown Heights, NY 10598
Conn, Haring and Visweswariah, MTNS ’96
1. Motivation
Outline
2. Circuit analysis (SPECS)
3. Time–domain gradient computation (SPECS)
4. Circuit optimization (JiffyTune with LANCELOT)
5. Conclusions and future work
Conn, Haring and Visweswariah, MTNS ’96
Of 29:2
Part 1: Motivation
Simulate
Good
enough?
optimization of custom, high–performance digital circuits is
usually a tedious, manual process
the time spent in closing the loop on delay, area and power
requirements is too long
Update
design
Conn, Haring and Visweswariah, MTNS ’96
Of 29:3
Motivation
circuit analysis is computationally expensive
– evaluation of device modeling equations
– equation solution
Solution
Device model
evaluation
must attack both segments!
Conn, Haring and Visweswariah, MTNS ’96
Of 29:4
Motivation
time
time
activity factor of a digital circuit is usually low; try to exploit this
multi–rate or latency behavior
Conn, Haring and Visweswariah, MTNS ’96
Of 29:5
Motivation
avoid the accuracy overkill
low accuracy
variable accuracy analysis is useful for critical path analysis
high accuracy
Conn, Haring and Visweswariah, MTNS ’96
Of 29:6
Part 2: Circuit Analysis
Classical circuit analysis
Consider a circuit with nonlinear capacitors (C’s), nonlinear
resistive elements and current sources (L’s).
.
C C v C A Li L 0
where C C = capacitance matrix
i C , v C = currents, voltages of the C’s
i L , v L = currents, voltages of the L’s
A L = incidence matrix of the L’s
Now model i Yv L J (admittance formulation).
L
.
C Cv C A L(Yv L J) 0
.
–1A Yv – C –1A J
vC – CC
L L
C L
But since v L A LTv C,
.
^
^
–1A YA Tv – C –1A J
vC – CC
L
L C
C L
.
of the general form x Ax Bu
coupled, nonlinear DAEs
Conn, Haring and Visweswariah, MTNS ’96
Of 29:7
Classical Circuit Analysis
Equation formulation
MNA, STA, TLA
Integration for all time
TR, BE, BD2, Gear
Linearization until convergence
Newton iterations
Device model evaluation
and solution
Sparse LU factorization
1% relative timing accuracy
O(n 1.4) growth of run time with circuit size
Algebraic, non–
linear ODEs
Algabraic, non–
linear equations
Of 29:8
Algebraic, linear
equations
size of circuit analyzed is limited to about 10,000 transistors
Conn, Haring and Visweswariah, MTNS ’96
.
SPECS: Key Idea
iL
iL y vL j
vf
v0
f (v) dv
–1A YA Tv – C –1A J
vC – CC
L
L C
C L
J pc
vf
1
i L J pc v v0
f
v0
vL
.
–1
vC – CC
A L J pc
Implications of this approximation
Of 29:9
nonlinearity of equations eliminated
if C C is independent of v C implicit nature of equations
eliminated (i.e., assume that all capacitances are constant)
if C C is diagonal, equations are scalar (i.e., all capacitors are
grounded)
all currents are piecewise constant (pwc) in time, voltage slopes
are pwc in time, voltages are piecewise linear (pwl) in time
integration becomes trivial; event–driven analysis possible
Conn, Haring and Visweswariah, MTNS ’96
iL
SPECS: Modeling and Analysis
iL
v
L
.
.
–1
–1
vC – CC
A LJ pc and v L – A TC C
A LJ pc
1. Every time an element of J ‘‘jumps,” an event occurs; only
pc
.
.
one element of i L changes; update v C and v L.
– v
v
target
present
.
; update all affected event times.
2.
vL
t target
3. Update the event queue and continue by picking most
imminent event on the queue.
which is very advantageous for digital circuits.
Variable accuracy analysis achieved
Relative timing error due to approximation is proportional to
2
. i
v L
Conn, Haring and Visweswariah, MTNS ’96
vL
Of 29:10
i L
.
iL
.
D
i
– L 0
C eq
.
C
B
A
vL
E
Pseudo–segment
SPECS: The Steady–State Problem
^
i L
.
v LA
if v LA 0, we have an ‘‘increasing event”
then if
.
v LC
^
find i L v LA C eq so that v LB 0
Of 29:11
interpolate later if necessary to create the pseudo–segment
DBE; schedule the element to traverse the segment and reach
either D or E
extend this scheme to multiple dimensions
Conn, Haring and Visweswariah, MTNS ’96
CLEAR
TOGGLE
5.536928
V
4.536928
3.536928
2.536928
1.536928
0.536928
–0.463072
0
20
40
SPECS: Sample Waveforms
REFRESH
OUTPUT
REFRESH
Conn, Haring and Visweswariah, MTNS ’96
60
80
SPECS
ASX
100
time (ns)
Of 29:12
Upside
SPECS Summary
speed: 70x (6,000 events per second on a workstation)
capacity: 10x (0.25M transistors)
accuracy: greater than 1σ of delays with 5% relative timing
accuracy; worst case is 10%
used in 12 IBM sites for
– timing verification
– functional verification
– tuning
– power estimation
of digital and memory circuits
Of 29:13
constant–grounded–capacitance–assumption causes inaccuracy
stiff circuits a problem
no inductors
steady–state level errors incurred
Downside
Conn, Haring and Visweswariah, MTNS ’96
signal
Part 3: Time–Domain Gradients
time
change in delay
Sensitivity of delay w.r.t. width = ∂delay/∂width
time–domain sensitivity computation is a unique feature of
SPECS
sensitivity computation is incremental; not by finite difference or
perturbation methods
both the adjoint and direct methods have been implemented
Of 29:14
sensitivity computation is a small overhead on nominal transient
simulation run
Conn, Haring and Visweswariah, MTNS ’96
v
–
^
i
C
C v
^
^ p
.
i C dv C v
p
dt
.
^
i I
p
v^ V
p
+–
v^
+
+–
^
R
i
i R
p
^
v^ i R Ri
p
–
Sensitivity Analysis by the Direct Method
+
i
R
v iR
v
R
i
i
R
p
p
p
vV
+–
V
v
p
p
iI
i
I
p
p
+ i v
–
C
i C dv
dt
dv
dv C
i
C
(
)
p dt
p
dt p
Key: directly differentiate BCRs
Conn, Haring and Visweswariah, MTNS ’96
Of 29:15
V5
+
–
Direct Method: Procedure
R1
+–
i
R1
R2
replace elements by directly differentiating BCRs to get
sensitivity circuit
reuse LU factors of original solution
R3
R4
v
R4
, so p R 1
any number of functions, only one parameter
Example:
R1
R2
problem is to find
R 1
R3
replace V 5 by a short; add a voltage source in series with R 1;
4
solve for v^ R ; that is the required answer!
Conn, Haring and Visweswariah, MTNS ’96
+
v^ R
–
4
R4
Of 29:16
Sensitivity Analysis by the Adjoint Method
Matrix formulation of adjoint method
Ax b
A x A x b
p
p
p
f
f
Given a scalar function f (x),
[ ] T[x]
p
x p
f
Postulate A T [ ]
x
f
Then TA [ ] T
x
f
TA[x] T[b – A x]
p
p
p
p
So
re–use LU factors of original solution; LU factors of A T are
U T and L T
any number of parameters, only one function
electrical formulation based on Tellegen’s theorem
Conn, Haring and Visweswariah, MTNS ’96
Of 29:17
i
^
i
^
same
topology
^
^
^
^
^
^
i
^
v b A Tv n
Ai b 0
Adjoint Sensitivity via Tellegen’s Theorem
i
vb ib vbT ib [ATvn]T ib vnT Aib 0
i
ib vb ibT vb ibT AT vn [A ib]T vn 0
i
i
^
i
i
^
i
i
^
i
i
i
^
i
^
v^b,i b
i
^
i
Of 29:18
Application of Tellegen’s theorem to a perturbed circuit
v b, i b
^
v b v b, i b i b
i
i
i
vb ib
ib vb
0
i
i
(vb vb ) ib (ib ib ) vb 0
i
i
(vb ib ib vb ) 0
Key relation!
i
Conn, Haring and Visweswariah, MTNS ’96
Adjoint Sensitivity Computation
i J J;
^
i J J
Consider a circuit with only resistors and current sources:
Current source:
^
^
v R i R R R i R R i R
Typical term (v J i J J v^ J)
Resistor: v R i R R;
Js
^
Js
^
^
f Rs
f (v Js),
.
f
v J.
v J
vJ iJ J vJ iR iR R
By choosing v^ R R i R, we get typical term i R i R R.
Total circuit:
^
f
v J
and
^
f
iR iR
R
Given a scalar performance function
Choose
iJ
f
v^ J
J
Pick off the required sensitivities:
Thus
Any number of parameters can be accommodated at once!
Conn, Haring and Visweswariah, MTNS ’96
Of 29:19
i
G
^
Sensitivity Analysis in SPECS
vN
v^
Of 29:20
v
i i (v, p)
^
^
^
di
i dv i
or
i Gv^ I x
v dp p
dp
sensitivity or adjoint circuit consists of mostly disconnected
capacitors
little ‘‘spurts” of charge sharing occur at the times of the events
of the original nominal circuit
results of sensitivity or adjoint circuit yield required sensitivities;
convolution is required in the case of adjoint sensitivity
compute delay sensitivities as follows:
v cross v N tt
cross
v
v dv cross
dt
cross
0 d v (t, p) N N
t p
dp N
dt
dp
ttcross
tt cross
v^ N dt cross
.
–
dp
tt cross
Conn, Haring and Visweswariah, MTNS ’96
v N
p
–0.98795
–1.98795
0
4
6
8
Sensitivity Results on Simple Circuit
2
Conn, Haring and Visweswariah, MTNS ’96
SPECS
Analytic
Time(ns)
10
Of 29:21
Part 4: Circuit Optimization
designers spend OODLES of time manually tuning their
(custom) circuits by adjusting transistor widths
slow, tedious, error–prone process with circuit analysis in the
inner loop
Given
a circuit schematic with initial transistor sizes
input signals or stimuli
precise statement of circuit performance requirements
Determine
optimal transistor size assignments
While supporting...
x
i
flexible objective functions and constraints
simple bounds
ratio–ing of transistors and grouping of similar structures
delay, rise/fall time, area and power measurements
minimax optimization: Min [ Max { f i(x) } ]
Of 29:22
easy–to–use graphical user interface in schematic environment
Conn, Haring and Visweswariah, MTNS ’96
Circuit, models,
optimization criteria
g, J
f, f
Optimized circuit
JiffyTune Overview
X new
X old
Of 29:23
User interface
Back annotation
Lancelot
JiffyTune
administration
chain ruling ad nauseam
SPECS
JiffyTune ‘‘Engine”
Conn, Haring and Visweswariah, MTNS ’96
User interface
Problem specification
What is Lancelot?
general–purpose nonlinear optimization package
designed for large–scale problems for which it is remarkably
robust
efficient, especially for nonlinear constraints
competitive for small and medium sized problems
Problem size and history
has successfully handled 20,000 variable problems with 20,000
nonlinear constraints
has been tested on 15,000 test cases representing 1,000
unique problems
Technical features
Of 29:24
augmented Lagrangian to handle general constraints
simple bounds handled explicitly and well
trust region approach
many options available
– l l 2 trust–region
– approximate / accurate bounded quadratic problem (BQP) solver
– various preconditioners, different initializations
Conn, Haring and Visweswariah, MTNS ’96
m
Lagrangian
b
m
min f (x 1, x 2)
0 x1 a
0 x2 b
x2
Of 29:25
Quadratic model
penalty function
to weight feasibility
i ci(x) 21 sii ci(x)2
How Lancelot Works
(x, , s, ) f (x) v
w
composite function objective
for unconstrained function
minimization
–f
x1
Trust region
Simple bounds
a
u
xk
Conn, Haring and Visweswariah, MTNS ’96
Minimax Optimization
Original problem
i
for all i
Min [ Max { f i(x) } ]
x
Remapped problem
Min (k z)
Subject to k z f i(x)
Special considerations
the scaling constant k helps create a well–scaled problem and
helps with stopping criteria
Of 29:26
kz is initialized to the largest of the f i(x)s after the first function
evaluation
Conn, Haring and Visweswariah, MTNS ’96
Lancelot: Special Issues
Problems due to noise
unable to recover from poor initializations
optimizer wastes time going nowhere
does not stop gracefully or predictably
Special considerations for noisy optimization
tolerance for feasibility
tolerance for line search break points
consideration of small step beneath which further progress
unlikely
initial trust region radius
initial Hessian approximation for secant methods
stopping criteria
Miscellaneous
slack variables updated at each major iteration
testing environment has allowed us to integrate MINOS from
Stanford as an alternative optimizer
Conn, Haring and Visweswariah, MTNS ’96
Of 29:27
Results
Of 29:28
JiffyTune has been used to tune about 120 circuits of a custom
dynamic–logic microprocessor; it is a key productivity tool
forces careful and explicit specification of circuit requirements:
the issue becomes how to correctly specify the optimization
problem rather than solving the optimization problem itself!
produces ‘‘sensible” results, like progressively sizing up buffers,
and tapering stacks of transistors
design re–use a bonanza
Conn, Haring and Visweswariah, MTNS ’96
Conclusions
Conclusions
1. By simplifying device models, we are able to perform fast,
event–driven circuit analysis.
2. The simplified analysis lends itself to efficient time–domain
gradient computation by both the direct and adjoint methods.
Of 29:29
3. The simulation and gradient computation has been placed in the
inner loop of a nonlinear optimization package to tune custom
circuits.
Future work
1. Improve adjoint sensitivity performance by event–driven
convolution; compute gradients in parallel.
2. Accommodate semi–infinite constraints; perhaps extend
JiffyTune to solve manufacturability problems.
3. Automatic recovery from non–working circuits.
4. Investigate efficient computation of the Hessian.
Conn, Haring and Visweswariah, MTNS ’96
SWITCH
PARCH, COSMOS
SPECS
v
PIECEWISE
APPROXIMATE
SPECS
i
Conn, Haring and Visweswariah, MTNS ’96
Accuracy
CIRCUIT
AS/X, SPICE
i = f(v)
Run time
Of 29:30
10K
1K
JiffyTune
SPECS–
based
5%
Of 29:31
SPICE–
based
1%
Accuracy
Sensitivity
computation
Static vs. Dynamic Tuning
Static
Tuning
30%
need for pre–characterization
applicability to custom circuits
false–path problem vs. need for input vectors
Static and dynamic tuning complement each other!
Conn, Haring and Visweswariah, MTNS ’96
Size