Of 67:1 Optimization Techniques for High–Performance Digital Circuits Chandu Visweswariah IBM Thomas J. Watson Research Center Yorktown Heights, NY (914)945–3124 [email protected] C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Embedded Tutorial Session 4B Optimization Techniques for High–Performance Digital Circuits Acknowledgements Nonlinear optimization Andy Conn and Chai Wah Wu Dynamic tuning Ruud Haring Static tuning Sachin Sapatnekar, University of Minnesota Statistical tuning Sani Nassif, IBM Austin Research Lab. Special topics and suggestions on the paper Phillip Restle (clock/interconnect tuning), Ken Shepard (noise–aware tuning), Prabhakar Kudva Project partners C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Phil Strenski, Bill Wright, Nam Nguyen, Abe Elfadel, David Ling, Paula Coulman, Greg Morrill, Katya Scheinberg, Peter O’Brien, Jeff Soreff Optimization Techniques for High–Performance Digital Circuits Of 67:2 Warning C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Opinionated presentation ahead! Optimization Techniques for High–Performance Digital Circuits Of 67:3 Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:4 Trends Introduction and Motivation increased custom design competition drives aggressive tuning with focus on performance quicker turnaround (design re–use is key) focus on manufacturability C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B for the highest possible performance for rapid, repeatable and robust tuning (productivity) for design re–use for objective comparisons of circuit alternatives to ensure manufacturability Automatic circuit optimization is essential Optimization Techniques for High–Performance Digital Circuits Of 67:5 Introduction and Motivation Update design manual tuning is a slow, iterative and error–prone procedure it relies on human intuition and experience Simulate Good enough? C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Design re–use is not easy! Optimization Techniques for High–Performance Digital Circuits Of 67:6 Some Definitions Statement of the problem given a logically correct schematic, optimally assign sizes to transistors and/or wires metrics: delay, slew, area, signal integrity, parametric yield i Optimization problem min f (x) Objective function s.t. c e(x) 0 Equality constraint Inequality constraints s.t. c l(x) 0 { s.t. c g(x) 0 s.t. x l x x u Simple bounds Minimax optimization min [ max f i(x) ], i 1, 2, , m x C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Semi–infinite problem min f (x) s.t. g(x, p) 0 for all p P x Optimization Techniques for High–Performance Digital Circuits Of 67:7 Desired Features ratio–ing of transistor sizes grouping of similar structures ease of use and back–annotation of results accommodation of complex timing constraints layout constraints simultaneous transistor/wire sizing Design requirements Algorithmic requirements C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B convergence (local? global?) minimax optimization flexible constraints and objective functions manufacturability (ability to tune at multiple process corners) failure recovery Optimization Techniques for High–Performance Digital Circuits Of 67:8 Taxonomy based on time–domain simulation of the circuit accurate but slow; limited to small circuits requires input patterns; tunes only identified paths best suited to small data–flow circuits or ‘‘cross sections” Dynamic tuning Static tuning based on static timing analysis of the circuit all paths considered at once no input patterns required; works well on control circuits C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Monte Carlo analysis across process corners extreme case analysis yield prediction design centering Statistical tuning Optimization Techniques for High–Performance Digital Circuits Of 67:9 Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:10 Variables & simple bounds Y Dynamic Tuning Objective fn. & constraints Nonlinear optimizer Done? Of 67:11 Simulation info. (netlist, patterns, device models) Tuner Back annotate results on the schematic C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B x new N Circuit simulator Optimization Techniques for High–Performance Digital Circuits Function & gradient values Advantages Dynamic Tuning accurate and realistic detects failing circuits false paths are avoided Disadvantages inherits all the weaknesses of transient simulation – slow, can handle only small circuits – requires input patterns paths to be tuned must be identified and sensitized requires a working circuit to begin vulnerable to omission of tacit requirements C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B – loading on previous stage – rise/fall time – shrinking of non–switching transistors Optimization Techniques for High–Performance Digital Circuits Of 67:12 Dynamic Tuning – Methodology Impact dynamic tuning is best applied to pre–layout schematics after extraction, correspondence between transistor sizes and diffusion capacitances is lost => impossible to tune (poses a new extraction challenge)! Dynamic tuner Of 67:13 since specification of a dynamic tuning run is quite involved, save tuning criteria as attributes of the schematic encourages design re–use; changes the paradigm! 1. Determine paths 2. Prune circuit 3. Sensitize paths can use a static timer automatically to generate dynamic tuning problems Static timer C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Static and dynamic tuning complement each other! Optimization Techniques for High–Performance Digital Circuits Dynamic Tuners DELIGHT.SPICE (Nye et al, 1988) Of 67:14 SPICE–based minimax PHASE I–II–III optimization by method of feasible directions handled semi–infinite constraints tailored to small analog circuits JiffyTune (Conn et al, 1996) SPECS–based (device modeling simplifications, fast event–driven simulation at reduced accuracy) relies on fast adjoint gradient computation uses the general purpose nonlinear optimization package LANCELOT – trust–region approach – augmented Lagrangian merit function C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B tailored to large circuits (5K+ transistors) Optimization Techniques for High–Performance Digital Circuits Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:15 Gradient Computation C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Circuit tuning is best approached by means of gradient–based nonlinear optimization Optimization Techniques for High–Performance Digital Circuits Of 67:16 Gradient Computation Never use Derivative Free (DF) if Derivative (D) is possible DF popular since easy to understand DF popular since easy to implement useful only if # variables < 20 not robust not efficient (especially for circuit applications) no convergence theory Most implementations of DF: C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:17 That’s why gradient computation is so important! Optimization Techniques for High–Performance Digital Circuits Gradient Computation Nonlinear optimization Administration and overhead Gradient computation is often the bottleneck Gradient evaluation Function evaluation example: 1,000 transistor circuit, all transistors tunable Of 67:18 50 measurements in the objective function and constraints 50 iterations to convergence C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B number of gradients required = 1,000 transistors x 6 parameters per transistor x 50 measurements x 50 iterations = 15 million! Optimization Techniques for High–Performance Digital Circuits Direct and Adjoint Sensitivity Computation Direct method v – C v. p v^ ^ C v^ + +– ^ R i i R p ^ v^ i R Ri p + ^ i ^ – – C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B i C dv C v p dt . directly differentiate electrical characteristics of devices: + i R v iR R i v i R p p p + – i v C i C dv dt dv dv C i C ( ) p dt p dt p Optimization Techniques for High–Performance Digital Circuits Of 67:19 Procedure Direct Method C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B add a zero–valued source at each parameter run a nominal transient simulation turn off the original sources; activate the zero–valued sources with the results of the nominal simulation solve the ‘‘sensitivity circuit” re–use nominal LU factors at each time point any number of measurements, one parameter can run the nominal and sensitivity simulations simultaneously in our previous example: number of gradient analyses = 1000 transistors x 6 parameters per transistor x 50 iterations = 300,000 Optimization Techniques for High–Performance Digital Circuits Of 67:20 KVL KCL Adjoint Method via Tellegen’s Theorem branches current of branches incident at each node 0 1, 0 A topological incidence matrix Ai b 0 v bi v nifrom – v nito i ^ ^ same topology ^ ^ ^ i bi v bT i b [A Tv n] T i b v nT A i b 0 v^bi i bT v^b i bT A T v^n [A i b] T v^n 0 C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:21 instantaneous power of branch 0 Conservation of energy bi Tellegen’s theorem i bi v i i Optimization Techniques for High–Performance Digital Circuits v b Ti b 0 v b A Tv n nodes Adjoint Sensitivity via Tellegen’s Theorem, Contd. i ^ i v^b,i b ^ Tellegen’s theorem applied to a perturbed circuit v b, i b v b v b, i b i b ^ i i i bi i ^ ^ i i i bi – i bi v^bi ) 0 i i ^ Key relation! i vb ib ib vb 0 i i (vb vb ) ib (ib ib ) vb 0 i (v i Motivation ^ Vs ^ others –vI iI iV vV typical term I C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:22 k pk , we can pick up all the gradients fpk if we can manipulate into and then into f Optimization Techniques for High–Performance Digital Circuits , Adjoint Sensitivity Computation Consider a circuit with only resistors and current sources: ^ for any current source: i I; i I I I ^ Typical term (v I i I–I v^I) v R i R R R i R R i R for any resistor: v R i R R; ^ ^ ^ Rs ^ –vI iI –I vI iR iR R Is By choosing v^R R i R, we get typical term i R i R R. total circuit: Is given a performance v Ik, f v Ik. ^ choose i Ik –1 and ^ f i Rj i Rj . R j pick off the required sensitivities: Thus f –v^Ij I j C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B any number of parameters can be accommodated at once! Optimization Techniques for High–Performance Digital Circuits Of 67:23 Procedure Adjoint Method C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B add a zero–valued source at each measurement point run a nominal transient simulation turn off the original sources; activate the zero–valued sources appropriately reverse time and control solve the ‘‘adjoint circuit” re–use the LU factors at each time point (modulo time–point mismatch problems) obtain gradients by convolving nominal and adjoint waveforms any number of parameters, one measurement in our previous example: number of gradient analyses = 50 measurements x 50 iterations = 2,500 Optimization Techniques for High–Performance Digital Circuits Of 67:24 Adjoint Lagrangian Sensitivity Computation Motivation the gradient computation is the bottleneck; it is the stumbling block in being able to tune an entire circuit macro Of 67:25 if the problem has m measurements and p tunable parameters, we need to compute m × p gradients at each iteration if m >> p, the direct method is used, which involves solving an associated sensitivity circuit p times if p >> m the adjoint method is used, which involves solving an associated adjoint circuit m times Is there a way of solving an associated circuit just once to obtain all the gradients of interest? Adjoint Lagrangian formulation for the purposes of optimization, yes, it is possible to get all the required gradients at once! C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B key observation: optimizer minimizes scalar merit function Optimization Techniques for High–Performance Digital Circuits p 1, p 2, , p n i d1 . slope v 1 An Example to Demonstrate the Theory + – p 1, p 2, , p n Nominal simulation Sensitivity run #1 d 1 1 – . conv(p i) i p i v1 p 1, p 2, , p n min d 1 s.t. d 2 0 d 1 d 2 1 d 22 2 Sensitivity run #2 d 2 – 1. conv(p i) i p i v2 Combine the gradients C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B d 1 ( d 2) d 2 p i p i p i Optimization Techniques for High–Performance Digital Circuits . slope v 2 d2 Of 67:26 + – h2 d1 h1 slope v 1 d – 1. ( 2) v2 p 1, p 2, , p n min d 1 s.t. d 2 0 d 1 d 2 1 d 22 2 p 1, p 2, , p n h2 d2 . slope v 2 An Example to Demonstrate the Theory,. Contd. Nominal simulation h – 1. 1 v1 Single sensitivity run Gradients conv(p ) i i p i C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:27 any number of measurements or constraints handled at once! any number of parameters handled at once (inherent in adjoint)! speedup #measurements Optimization Techniques for High–Performance Digital Circuits Example Run time measurements on an example Direct method Adjoint method 635.4 (10.6 minutes) 827.1 5.8% (57 solutions) 0.001% 1.82 ms Adjoint Lagrangian 21.9 213.6 11.4% (1 solution) Of 67:28 0.00003% 0.063 ms (63 ms) C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B 39,041 (10.8 hours) 39,233 3.3% (6,114 solutions) 0.06% 112 ms IOmux circuit from an actual LPM chip with 6,900 MOSFETs, 770 RC wire models, 57 functions, 6,114 parameters, 348,498 gradients Gradient computation CPU time (s) Total run time (s) Run time overhead per sensitivity circuit solution Overhead per gradient Run time per gradient Optimization Techniques for High–Performance Digital Circuits Run time of adjoint method Complexity Analysis Run time of direct method Of 67:29 Adjoint Lagrangian run time The 3–D plots on this page have been deleted so that the document can be viewed with Ghostview Run time with heuristic choice of method C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B The 3–D plots on this page have been deleted so that the document can be viewed with Ghostview Optimization Techniques for High–Performance Digital Circuits Application to Piecewise Approximate Simulation C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B i–v characteristics are approximated by piecewise approximate functions when the i–v characteristics are differentiated, very simple elements are obtained event times of the nominal circuit correspond one–to–one with the event times of the sensitivity or adjoint circuits => time–point mismatch problems are solved! in SPECS, the sensitivity or adjoint circuit consists of disconnected grounded capacitors with impulses of charge being exchanged at event times convolution of piecewise approximate waveforms is efficient and can be computed in an event–driven fashion particularly in this context, adjoint computation of sensitivities is extremely attractive! Optimization Techniques for High–Performance Digital Circuits Of 67:30 Gradient Computation: Summary i t C W t I W eff eff i 1 ds 2 I ds W eff PW C i W eff PW i t 2 C i W eff C i W eff PW method of choice for large circuits is the adjoint method the adjoint method is even more attractive if the optimizer builds a scalar merit function adjoint Lagrangian formulation requires close cooperation between the simulator and the nonlinear optimizer sensitivity computation involves plenty of chain ruling and combining of gradients (beware of errors!); diffusion capacitances must be taken into account delay t 1 t 2 t t delay 1 – 2 PW PW PW t I W eff ds 1 I ds W eff PW t 1 t 1 C i area W eff C i perimeter W eff W eff PW C i area W eff PW C i perimeter C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:31 Never use gradient–free approaches if you have a choice! Optimization Techniques for High–Performance Digital Circuits Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:32 Static Tuning AT out max(AT i d ij) C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:33 all paths considered at once arrival times (ATs) and required arrival times (RATs) computed at each node of the graph; the difference is the slack no input vectors required analysis is conservative in both early and late modes various timing checks can be performed as part of the analysis Optimization Techniques for High–Performance Digital Circuits Convex Functions and Convex Sets xb f xb Non–convex xa f(x) is convex if the line joining any two points f(xa ) and f(xb ) lies on or above the function f xa Convex formally, f(x) is convex if Of 67:34 f (x a x b) f (x a) f (x b) 1, 0, 0 C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B any local minimum of a convex function is a global minimum Optimization Techniques for High–Performance Digital Circuits Convex Problems min f (x) s.t. c(x) 0 is a convex problem if f(x) is convex and the feasible region is a convex set; i.e., f(x) and c(x) are both convex functions i ij j xi , j 0, ij any local solution of a convex problem is a global solution Posynomials g(x) j C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B a posynomial is like a polynomial but with positive coefficients and real exponents by substituting xi =e zi , we can map a posynomial into a convex function Optimization Techniques for High–Performance Digital Circuits Of 67:35 Transistor Sizing min Area s.t. Delay Target C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B area S xi is a posynomial delay ∝ RC, R ∝ x –1, C ∝ x with positive coefficients path delay = S gate delays is a posynomial hence both the objective function and the constraint are posynomials they can be mapped into convex functions any local solution is therefore a global solution! note that Elmore delay is a posynomial function of transistor and wire sizes Optimization Techniques for High–Performance Digital Circuits Of 67:36 Procedure TILOS (Fishburn 1985) C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:37 set all transistors to the minimum size (minimum area solution) conduct a static timing analysis using Elmore (or Penfield– Rubinstein) delay models find the sensitivity of delay to transistor size for all transistors on the critical path bump us the size of the transistor with the largest negative sensitivity by a fixed increment repeat until delay cannot be improved further entire delay–area trade–off curve can be generated Optimization Techniques for High–Performance Digital Circuits F Inaccuracies in TILOS A E D C B Path at a time model is inadequate C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B may be better to size A than B, C and D if F–E is near–critical and A–D is critical, size A, not D CONTRAST (Vaidya 1989) solves the convex problem exactly there have been subsequent efforts to incorporate power optimization and simultaneous gate/wire tuning in the same general framework Optimization Techniques for High–Performance Digital Circuits Of 67:38 Pros and Cons of Static Tuning Advantages fast (linear time complexity); can handle large circuits pattern–independent; all paths considered at once interconnect easily accommodated into basic algorithm Disadvantages poor delay model; tuning gains are suspect – Elmore and other RC–based delays inadequate – lumping into ‘‘equivalent inverters” inadequate C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B improving the delay model often destroys the posynomial form, sacrificing the guarantee of a global solution conservative and prone to false–path problems not applicable to full–custom circuitry or circuits without delay rules non–working circuits are not detectable Very interesting and challenging area of research! Optimization Techniques for High–Performance Digital Circuits Of 67:39 Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:40 Statistical Tuning Yield loss on a fabrication line catastrophic (e.g., dust particles) parametric (or circuit–limited) Impact of parametric yield loss in non–sorted designs, bad parts have to be discarded in sorted designs like microprocessors, poor yield means fewer parts in the ‘‘high profit” bins variations dominated by ACLV effects general goal is to characterize the variability in parameter space and then map the variability into performance space C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Aggressive tuning can drive a design into a precipitous corner of the design space! Optimization Techniques for High–Performance Digital Circuits Of 67:41 P1 2 4 Uncertainty 3 5 6 7 Variability Variability and Uncertainty 1 10 9 8 P2 C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:42 variability arises from known simulatable phenomena like Vdd or processing variations uncertainty arises in the face of unknown or unsimulatable phenomena like substrate noise or modeling errors uncertainty forces conservatism in the design in order to guarantee sufficient yield Optimization Techniques for High–Performance Digital Circuits p1 99% p1 2 3 4 Of 67:43 p2 Values of Performance contours 1 Parameter Space and Performance Histograms 90% 60% p2 Distribution of parameters # C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B 12 3 4 Performance histogram Optimization Techniques for High–Performance Digital Circuits Given Problem Statement a design specification a target performance a model of the variability and uncertainty Determine Of 67:44 a minimal representative sampling of the parameter space – Monte Carlo methods the expected spread of the performance – extreme case analysis the distribution of the performance – yield estimation C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B how to change the design to improve the performance – design centering – yield optimization Optimization Techniques for High–Performance Digital Circuits Monte Carlo Analysis sample the parameter space use correlation and principal component analysis to reduce the dimensionality of the sampled space the sampled points are like process corners run a simulation or tuning at each sample point given sufficient sample points, can predict worst– and best–case behavior, distribution of performance, yield Example convert a nominal tuning problem to a problem across the process corners j: x j min[max f j(x)] x ij min [ max f ij(x) ] x C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B min f (x) ] c j(x) 0 max f i(x) i c(x) 0 min [ x Optimization Techniques for High–Performance Digital Circuits Of 67:45 Extreme Case Analysis attempts to predict the worst–case behavior of the circuit also need to predict the best–case behavior for short paths statistical analysis too expensive for large circuits Shortcut perform analysis on typical circuit for typical performances compute the sensitivity of performances to parameters to determine the direction in which the parameters must be moved to mimic worst–case behavior identify extreme–case parameter sets use these parameter sets on all other circuits C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Unfortunately, sensitivities can be different for different performances and for different circuits/circuit families. Optimization Techniques for High–Performance Digital Circuits Of 67:46 p1 Yield Estimation 4 Yield loss 3 Values of 2 Of 67:47 12 3 4 Performance histogram # yield is determined in parameter space many academic approaches such as yield integrals and geometric methods 1 Performance contours p 2 C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B instead, can use intelligent sampling to build performance model as a function of the important parameters (response surface model RSM) and derive complete PDF characterizing the performance Optimization Techniques for High–Performance Digital Circuits Yield Optimization and Design Centering Yield optimization very expensive, instead: – optimize the performance of the worst circuit – optimize the performance of the extreme circuits – optimize yield as estimated by intelligent sampling once a good RSM is built, yield optimization can be RSM–based Design centering C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:48 attempts to create robust designs by ‘‘centering” the circuit in the feasible region the technique used may be geometric or ‘‘method of moments” Optimization Techniques for High–Performance Digital Circuits Statistical Tuning: The Bottom Line statistical design and tuning is obviously crucial to obtaining reasonable yields and hence profits design for manufacturability (design for profit) must be an integral part of the design methodology obtaining meaningful statistical models is a challenge manufacturability analysis is an art practised by a few experts! Monte Carlo and extreme–case analyses are the most popular techniques since they are easy to understand and the most accessible among available tools C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Much work needs to be done in bringing theoretical approaches proposed in the literature to mainstream practice. Optimization Techniques for High–Performance Digital Circuits Of 67:49 Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:50 Nonlinear Optimization: Terminology 0 c i(x) s i 0 s.t. s i 0 ici(x) constrained vs. unconstrained optimization linear vs. nonlinear programming feasibility and feasible region constraints that are ‘‘tight” are called activities c i(x) inequalities typically converted to equalities using slack variables: Lagrangian: (x, ) f (x) i optimality conditions: c i(x *) 0 (primal feasibility) and C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B x(x *, *) 0 (dual feasibility) Optimization Techniques for High–Performance Digital Circuits Of 67:51 Some Key Messages C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B circuit tuning is a nonlinear problem; no amount of familiarity with linear programming will change this fact we are interested in solving large problems we are interested in general–purpose optimization with general nonlinear constraints the good news: there has been tremendous progress in large–scale nonlinear optimization in the last 15 years MINOS (active set and line–search methods): very effective on constraints that are linear or near linear LANCELOT (trust–region methods): very effective with highly nonlinear constraints CUTE testing environment allows the testing of several optimization packages with relatively low investment Optimization Techniques for High–Performance Digital Circuits Of 67:52 MINOS (Stanford, ca. 1980) min F(x) c Tx d Ty nonlinear linear s.t. f (x) A 1y b 1 nonlinear constraints A 2x A 3y b 2 linear constraints lx x ux simple bounds ly y uy is remapped to ~ ~ ~ min F(x) c Tx d Ty kT[f (x) – f(x)] 1 [f (x) – f(x)] T[f (x) – f(x)] 2 C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B augmentation objective function Lagrangian of nonlinear part of constraints ~ s.t. f(x) A 1y b 1 linearized constraints A 2x A 3y b 2 linear constraints lx x ux simple bounds ly y uy where ~ where f(x) f (x) J k(x–x k) (linearization) Optimization Techniques for High–Performance Digital Circuits Of 67:53 MINOS, Cont’d. large–scale, general–purpose, nonlinear optimization package intuition: – separate linear and nonlinear parts – use active set methods for the linear part (LP–like) – use augmented Lagrangian for the nonlinear part – minimize quadratic model of the merit function subject to linear model of constraints (QP) – line–search methods for minimizing the quadratic model in practice, very effective, especially if C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B – most constraints are linear or near linear – real number of degrees of freedom at the solution is not large Optimization Techniques for High–Performance Digital Circuits Of 67:54 LANCELOT large–scale, general–purpose, nonlinear optimization package efficient, especially for nonlinear constraints exploits structure via group partial separability Major Iteration update l or m value(s) formulate augmented Lagrangian and iteratively: Minor Iteration trust–region approach simple bounds handled explicitly and well minimize merit function test for convergence many options: Of 67:55 – l l 2 trust regions (box and sphere) – various preconditioners and initializations available – approximate/accurate bounded quadratic problem (BQP) solver C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B has successfully handled 20,000 variable problems with 20,000 nonlinear constraints Optimization Techniques for High–Performance Digital Circuits c (x) 1 i i 2 b ci(x)2 i penalty function to weight feasibility x2 Of 67:56 Quadratic model min f (x 1, x 2) 0 x1 a 0 x2 b C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Lagrangian i How LANCELOT Works (x, , ) f (x) v w composite function objective for unconstrained function –f minimization x1 u Trust region Simple bounds a xk Optimization Techniques for High–Performance Digital Circuits Minimax Optimization x i the original problem is min [ max { f i(x) } ], i 1, 2, , m f 2(x) x* x, z x z not directly differentiable, so remap to min (z) s.t. z f i(x) i f 1(x) xk Special considerations i* 1 Of 67:57 , so initialize i 1m i initialize z to max fi (x) after the first function evaluation optimality condition: C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B can handle minimax problems in other ways Optimization Techniques for High–Performance Digital Circuits Andy and Chandu’s Top–10 Key Messages! 10.Circuit tuning is a nonlinear problem. The only guarantee is convergence to a local stationary point (hopefully the problem is convex and hopefully we get improvement). 9. We need general–purpose, large–scale, robust optimizers; we require flexible objective functions and constraints (exchanging objective function and constraints does not a new ICCAD paper make). Of 67:58 8. Tremendous progress has been made in nonlinear optimization in the last 15 years; do not use outdated methods! 7. Never use derivative–free methods unless you have no choice! Efficient sensitivity computation is essential in any high–performance circuit tuner. C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B 6. In circuit tuning, CPU time is dominated by simulation and gradient computation; must leverage optimizer to reduce number of iterations! Optimization Techniques for High–Performance Digital Circuits Andy and Chandu’s Top–10 Key Messages! Of 67:59 5. ‘‘Black box” optimization under–utilizes capabilities of a package (and often leads to greatly sub–optimal solutions)! Enhancements like the ‘‘adjoint Lagrangian” formulation require tinkering with the optimizer source. 4. Follow Seidel’s advice, ‘‘If you know it, use it!” 3. Carefully specify all aspects of the problem you really want to solve (forces careful understanding of the circuit and problem being posed)! 2. Any simulation–based optimization is inherently noisy! C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B 1. Replace good solutions to bad models by potentially less good solutions to better models! Optimization Techniques for High–Performance Digital Circuits Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:60 Observations and Guidelines N Y If you are evaluating a proposal, research paper or vendor software: 1. Does the tool have a proof of convergence? 2. Does it use a nonlinear optimizer that the nonlinear optimization community laughed off the face of the earth several years ago? 3. Is it gradient–based? Does it stab blindly at n–dimensional space? Are the gradients computed by finite–differences? C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:61 4. Is the simulation/delay model sufficiently accurate? Are the tuner gains credible? Optimization Techniques for High–Performance Digital Circuits Handy–Dandy Questionnaire, Cont’d. 5. Are there heuristics involved? Y N 6. Will the tool have manageable methodology impact? 7. Is the tool flexible for application to different types of circuits? 8. Is it easy to use? Does it allow ratio–ing? Grouping? Do the resulting circuits lend themselves to your layout style and tools? C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:62 If you checked even a single red box, render swift judgment! Optimization Techniques for High–Performance Digital Circuits Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:63 Special Topics Interconnect tuning topic of a separate embedded tutorial by J. Cong, session 8B on Tuesday afternoon at 4:00 p.m. simultaneous tuning of gates and wires has been shown to yield superior results reduced–order modeling has allowed analysis of large interconnect networks in an efficient manner; these models are amenable to sensitivity computations interconnect tuners work best when tightly coupled to the floorplanner and layout data several approaches: – wire sizing – buffer placement – topology changes C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B need a holistic solution special considerations for clock and power distribution Optimization Techniques for High–Performance Digital Circuits Of 67:64 Special Topics Noise–aware tuning C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Of 67:65 noise checking usually carried out on a static timing basis noise considerations pose additional constraints to the tuner these constraints may take the form of semi–infinite constraints; or they may be derived from rules of thumb stability criteria can be additional metrics in the tuning problem often a direct trade–off between noise immunity and performance channel lengths may be tuned for additional noise immunity Optimization Techniques for High–Performance Digital Circuits Outline 1. Introduction and motivation 2. Dynamic tuning 3. Gradient computation direct and adjoint methods 4. Static tuning 5. Statistical tuning 6. Nonlinear optimization as applied to circuit tuning 7. Observations and guidelines 8. Special topics C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B interconnect tuning power, clock and ground networks signal integrity 9. Research opportunities Optimization Techniques for High–Performance Digital Circuits Of 67:66 Research Opportunities tuning based on static timing analysis (while maintaining accuracy, reasonable speed, convergence and the ability to accommodate custom circuitry) tremendous opportunity in applying statistical design techniques to improve yields; at 1 GHz and above, manufacturability is key! noise–aware tuning (on the heels of formal noise analysis tools) incremental circuit Hessian computation mixed integer/continuous problems semi–infinite problems multi–criteria optimization C. Visweswariah, ICCAD ’97 Embedded Tutorial, Session 4B Thank you for your attention! Optimization Techniques for High–Performance Digital Circuits Of 67:67