Efficient Time–Domain Simulation and Of 29:1 Optimization of Digital FET Circuits Andrew R. Conn Ruud A. Haring Chandu Visweswariah Thomas J. Watson Research Center Yorktown Heights, NY 10598 Conn, Haring and Visweswariah, MTNS ’96 1. Motivation Outline 2. Circuit analysis (SPECS) 3. Time–domain gradient computation (SPECS) 4. Circuit optimization (JiffyTune with LANCELOT) 5. Conclusions and future work Conn, Haring and Visweswariah, MTNS ’96 Of 29:2 Part 1: Motivation Simulate Good enough? optimization of custom, high–performance digital circuits is usually a tedious, manual process the time spent in closing the loop on delay, area and power requirements is too long Update design Conn, Haring and Visweswariah, MTNS ’96 Of 29:3 Motivation circuit analysis is computationally expensive – evaluation of device modeling equations – equation solution Solution Device model evaluation must attack both segments! Conn, Haring and Visweswariah, MTNS ’96 Of 29:4 Motivation time time activity factor of a digital circuit is usually low; try to exploit this multi–rate or latency behavior Conn, Haring and Visweswariah, MTNS ’96 Of 29:5 Motivation avoid the accuracy overkill low accuracy variable accuracy analysis is useful for critical path analysis high accuracy Conn, Haring and Visweswariah, MTNS ’96 Of 29:6 Part 2: Circuit Analysis Classical circuit analysis Consider a circuit with nonlinear capacitors (C’s), nonlinear resistive elements and current sources (L’s). . C C v C A Li L 0 where C C = capacitance matrix i C , v C = currents, voltages of the C’s i L , v L = currents, voltages of the L’s A L = incidence matrix of the L’s Now model i Yv L J (admittance formulation). L . C Cv C A L(Yv L J) 0 . –1A Yv – C –1A J vC – CC L L C L But since v L A LTv C, . ^ ^ –1A YA Tv – C –1A J vC – CC L L C C L . of the general form x Ax Bu coupled, nonlinear DAEs Conn, Haring and Visweswariah, MTNS ’96 Of 29:7 Classical Circuit Analysis Equation formulation MNA, STA, TLA Integration for all time TR, BE, BD2, Gear Linearization until convergence Newton iterations Device model evaluation and solution Sparse LU factorization 1% relative timing accuracy O(n 1.4) growth of run time with circuit size Algebraic, non– linear ODEs Algabraic, non– linear equations Of 29:8 Algebraic, linear equations size of circuit analyzed is limited to about 10,000 transistors Conn, Haring and Visweswariah, MTNS ’96 . SPECS: Key Idea iL iL y vL j vf v0 f (v) dv –1A YA Tv – C –1A J vC – CC L L C C L J pc vf 1 i L J pc v v0 f v0 vL . –1 vC – CC A L J pc Implications of this approximation Of 29:9 nonlinearity of equations eliminated if C C is independent of v C implicit nature of equations eliminated (i.e., assume that all capacitances are constant) if C C is diagonal, equations are scalar (i.e., all capacitors are grounded) all currents are piecewise constant (pwc) in time, voltage slopes are pwc in time, voltages are piecewise linear (pwl) in time integration becomes trivial; event–driven analysis possible Conn, Haring and Visweswariah, MTNS ’96 iL SPECS: Modeling and Analysis iL v L . . –1 –1 vC – CC A LJ pc and v L – A TC C A LJ pc 1. Every time an element of J ‘‘jumps,” an event occurs; only pc . . one element of i L changes; update v C and v L. – v v target present . ; update all affected event times. 2. vL t target 3. Update the event queue and continue by picking most imminent event on the queue. which is very advantageous for digital circuits. Variable accuracy analysis achieved Relative timing error due to approximation is proportional to 2 . i v L Conn, Haring and Visweswariah, MTNS ’96 vL Of 29:10 i L . iL . D i – L 0 C eq . C B A vL E Pseudo–segment SPECS: The Steady–State Problem ^ i L . v LA if v LA 0, we have an ‘‘increasing event” then if . v LC ^ find i L v LA C eq so that v LB 0 Of 29:11 interpolate later if necessary to create the pseudo–segment DBE; schedule the element to traverse the segment and reach either D or E extend this scheme to multiple dimensions Conn, Haring and Visweswariah, MTNS ’96 CLEAR TOGGLE 5.536928 V 4.536928 3.536928 2.536928 1.536928 0.536928 –0.463072 0 20 40 SPECS: Sample Waveforms REFRESH OUTPUT REFRESH Conn, Haring and Visweswariah, MTNS ’96 60 80 SPECS ASX 100 time (ns) Of 29:12 Upside SPECS Summary speed: 70x (6,000 events per second on a workstation) capacity: 10x (0.25M transistors) accuracy: greater than 1σ of delays with 5% relative timing accuracy; worst case is 10% used in 12 IBM sites for – timing verification – functional verification – tuning – power estimation of digital and memory circuits Of 29:13 constant–grounded–capacitance–assumption causes inaccuracy stiff circuits a problem no inductors steady–state level errors incurred Downside Conn, Haring and Visweswariah, MTNS ’96 signal Part 3: Time–Domain Gradients time change in delay Sensitivity of delay w.r.t. width = ∂delay/∂width time–domain sensitivity computation is a unique feature of SPECS sensitivity computation is incremental; not by finite difference or perturbation methods both the adjoint and direct methods have been implemented Of 29:14 sensitivity computation is a small overhead on nominal transient simulation run Conn, Haring and Visweswariah, MTNS ’96 v – ^ i C C v ^ ^ p . i C dv C v p dt . ^ i I p v^ V p +– v^ + +– ^ R i i R p ^ v^ i R Ri p – Sensitivity Analysis by the Direct Method + i R v iR v R i i R p p p vV +– V v p p iI i I p p + i v – C i C dv dt dv dv C i C ( ) p dt p dt p Key: directly differentiate BCRs Conn, Haring and Visweswariah, MTNS ’96 Of 29:15 V5 + – Direct Method: Procedure R1 +– i R1 R2 replace elements by directly differentiating BCRs to get sensitivity circuit reuse LU factors of original solution R3 R4 v R4 , so p R 1 any number of functions, only one parameter Example: R1 R2 problem is to find R 1 R3 replace V 5 by a short; add a voltage source in series with R 1; 4 solve for v^ R ; that is the required answer! Conn, Haring and Visweswariah, MTNS ’96 + v^ R – 4 R4 Of 29:16 Sensitivity Analysis by the Adjoint Method Matrix formulation of adjoint method Ax b A x A x b p p p f f Given a scalar function f (x), [ ] T[x] p x p f Postulate A T [ ] x f Then TA [ ] T x f TA[x] T[b – A x] p p p p So re–use LU factors of original solution; LU factors of A T are U T and L T any number of parameters, only one function electrical formulation based on Tellegen’s theorem Conn, Haring and Visweswariah, MTNS ’96 Of 29:17 i ^ i ^ same topology ^ ^ ^ ^ ^ ^ i ^ v b A Tv n Ai b 0 Adjoint Sensitivity via Tellegen’s Theorem i vb ib vbT ib [ATvn]T ib vnT Aib 0 i ib vb ibT vb ibT AT vn [A ib]T vn 0 i i ^ i i ^ i i ^ i i i ^ i ^ v^b,i b i ^ i Of 29:18 Application of Tellegen’s theorem to a perturbed circuit v b, i b ^ v b v b, i b i b i i i vb ib ib vb 0 i i (vb vb ) ib (ib ib ) vb 0 i i (vb ib ib vb ) 0 Key relation! i Conn, Haring and Visweswariah, MTNS ’96 Adjoint Sensitivity Computation i J J; ^ i J J Consider a circuit with only resistors and current sources: Current source: ^ ^ v R i R R R i R R i R Typical term (v J i J J v^ J) Resistor: v R i R R; Js ^ Js ^ ^ f Rs f (v Js), . f v J. v J vJ iJ J vJ iR iR R By choosing v^ R R i R, we get typical term i R i R R. Total circuit: ^ f v J and ^ f iR iR R Given a scalar performance function Choose iJ f v^ J J Pick off the required sensitivities: Thus Any number of parameters can be accommodated at once! Conn, Haring and Visweswariah, MTNS ’96 Of 29:19 i G ^ Sensitivity Analysis in SPECS vN v^ Of 29:20 v i i (v, p) ^ ^ ^ di i dv i or i Gv^ I x v dp p dp sensitivity or adjoint circuit consists of mostly disconnected capacitors little ‘‘spurts” of charge sharing occur at the times of the events of the original nominal circuit results of sensitivity or adjoint circuit yield required sensitivities; convolution is required in the case of adjoint sensitivity compute delay sensitivities as follows: v cross v N tt cross v v dv cross dt cross 0 d v (t, p) N N t p dp N dt dp ttcross tt cross v^ N dt cross . – dp tt cross Conn, Haring and Visweswariah, MTNS ’96 v N p –0.98795 –1.98795 0 4 6 8 Sensitivity Results on Simple Circuit 2 Conn, Haring and Visweswariah, MTNS ’96 SPECS Analytic Time(ns) 10 Of 29:21 Part 4: Circuit Optimization designers spend OODLES of time manually tuning their (custom) circuits by adjusting transistor widths slow, tedious, error–prone process with circuit analysis in the inner loop Given a circuit schematic with initial transistor sizes input signals or stimuli precise statement of circuit performance requirements Determine optimal transistor size assignments While supporting... x i flexible objective functions and constraints simple bounds ratio–ing of transistors and grouping of similar structures delay, rise/fall time, area and power measurements minimax optimization: Min [ Max { f i(x) } ] Of 29:22 easy–to–use graphical user interface in schematic environment Conn, Haring and Visweswariah, MTNS ’96 Circuit, models, optimization criteria g, J f, f Optimized circuit JiffyTune Overview X new X old Of 29:23 User interface Back annotation Lancelot JiffyTune administration chain ruling ad nauseam SPECS JiffyTune ‘‘Engine” Conn, Haring and Visweswariah, MTNS ’96 User interface Problem specification What is Lancelot? general–purpose nonlinear optimization package designed for large–scale problems for which it is remarkably robust efficient, especially for nonlinear constraints competitive for small and medium sized problems Problem size and history has successfully handled 20,000 variable problems with 20,000 nonlinear constraints has been tested on 15,000 test cases representing 1,000 unique problems Technical features Of 29:24 augmented Lagrangian to handle general constraints simple bounds handled explicitly and well trust region approach many options available – l l 2 trust–region – approximate / accurate bounded quadratic problem (BQP) solver – various preconditioners, different initializations Conn, Haring and Visweswariah, MTNS ’96 m Lagrangian b m min f (x 1, x 2) 0 x1 a 0 x2 b x2 Of 29:25 Quadratic model penalty function to weight feasibility i ci(x) 21 sii ci(x)2 How Lancelot Works (x, , s, ) f (x) v w composite function objective for unconstrained function minimization –f x1 Trust region Simple bounds a u xk Conn, Haring and Visweswariah, MTNS ’96 Minimax Optimization Original problem i for all i Min [ Max { f i(x) } ] x Remapped problem Min (k z) Subject to k z f i(x) Special considerations the scaling constant k helps create a well–scaled problem and helps with stopping criteria Of 29:26 kz is initialized to the largest of the f i(x)s after the first function evaluation Conn, Haring and Visweswariah, MTNS ’96 Lancelot: Special Issues Problems due to noise unable to recover from poor initializations optimizer wastes time going nowhere does not stop gracefully or predictably Special considerations for noisy optimization tolerance for feasibility tolerance for line search break points consideration of small step beneath which further progress unlikely initial trust region radius initial Hessian approximation for secant methods stopping criteria Miscellaneous slack variables updated at each major iteration testing environment has allowed us to integrate MINOS from Stanford as an alternative optimizer Conn, Haring and Visweswariah, MTNS ’96 Of 29:27 Results Of 29:28 JiffyTune has been used to tune about 120 circuits of a custom dynamic–logic microprocessor; it is a key productivity tool forces careful and explicit specification of circuit requirements: the issue becomes how to correctly specify the optimization problem rather than solving the optimization problem itself! produces ‘‘sensible” results, like progressively sizing up buffers, and tapering stacks of transistors design re–use a bonanza Conn, Haring and Visweswariah, MTNS ’96 Conclusions Conclusions 1. By simplifying device models, we are able to perform fast, event–driven circuit analysis. 2. The simplified analysis lends itself to efficient time–domain gradient computation by both the direct and adjoint methods. Of 29:29 3. The simulation and gradient computation has been placed in the inner loop of a nonlinear optimization package to tune custom circuits. Future work 1. Improve adjoint sensitivity performance by event–driven convolution; compute gradients in parallel. 2. Accommodate semi–infinite constraints; perhaps extend JiffyTune to solve manufacturability problems. 3. Automatic recovery from non–working circuits. 4. Investigate efficient computation of the Hessian. Conn, Haring and Visweswariah, MTNS ’96 SWITCH PARCH, COSMOS SPECS v PIECEWISE APPROXIMATE SPECS i Conn, Haring and Visweswariah, MTNS ’96 Accuracy CIRCUIT AS/X, SPICE i = f(v) Run time Of 29:30 10K 1K JiffyTune SPECS– based 5% Of 29:31 SPICE– based 1% Accuracy Sensitivity computation Static vs. Dynamic Tuning Static Tuning 30% need for pre–characterization applicability to custom circuits false–path problem vs. need for input vectors Static and dynamic tuning complement each other! Conn, Haring and Visweswariah, MTNS ’96 Size