MICRO-44, 5 December 2011

Active Management of Timing Guardband to Save Energy in POWER7
Charles Lefurgy, Alan Drake, Michael Floyd, Malcolm Allen-Ware, Bishop Brock, Jose Tierno, and John Carter
© 2011 IBM Corporation

Excess guardband
The voltage used on a microprocessor is conservative to provide a safe timing margin under worst-case conditions
– workload-induced voltage droops (dI/dt or load line)
– high temperature
Concern: Energy efficiency is reduced to guarantee reliability.
Opportunity: Worst-case conditions rarely occur. Can actual timing margin be controlled?
[Figure: Microprocessor operating points. Maximum frequency versus voltage (min to max), showing the fail region, the nominal guardband that is reliable for worst-case load, and the reduced guardband with active management.]

Our solution
New capability to keep timing margin nearly constant
– Convert excess timing margin into a voltage reduction
– Reduce traditional voltage margin when conditions are not worst-case
(Some voltage margin is retained for aging, calibration inaccuracy, etc.)
1. Measure excess operational margin with timing margin sensor
– Difference from a calibrated reference point
2. Protect timing margin against voltage droop by adjusting frequency
– Hardware-based timing margin controller
3. Save energy by converting excess timing margin into voltage reduction
– Software-based performance controller
[Figure: Block diagram of the two control loops. The critical path monitor supplies a timing margin differential to the timing margin controller, which adjusts the clock; the performance controller compares measured frequency with the frequency target and adjusts voltage.]

Measure timing margin
Use Critical Path Monitor (CPM) circuit. Mimics behavior of real critical path.
Each cycle: generate pulse, traverse synthesized critical path and calibrated delay, capture in edge detector
Edge detector 12-bit output: bit 0 = less margin, bit 11 = more margin
[Figure: Critical Path Monitor circuit with edge detector.]

Critical path monitor
5 Critical Path Monitors per core in POWER7 (8-core chip)
Middle bits of edge detector are forwarded to DPLL
5-bit output per CPM
“11111” = large margin
“11110” = some margin
“11100” = ideal margin
“11000” = margin too small
“10000” = not enough margin
[Figure: POWER7 core chiplet (ISU, FXU, DFU, VSU & FPU, IFU, LSU, L2, L3, NCU) with the CPM outputs routed to the DPLL.]

Example of critical path monitor output
Inject 60 mV droop into Power 755 Express Server (with no load-line)
– Instruction fetch throttling
Critical path sensor follows on-chip voltage reduction
[Figure: Injected instantaneous voltage droop. Voltage difference from nominal (mV) versus time (ns).]
[Figure: Measured CPM response (controllers disabled). Edge detector position versus time (ns).]

Protect timing margin
Timing margin controller responds to changing operating conditions by adjusting frequency to maintain timing margin target.
– Implemented in hardware of POWER7.
– Can reduce frequency by 7% in about 5 ns to handle fast voltage droop.
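
The 5-bit CPM codes and the frequency adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the step sizes, and the policy of acting on the CPM with the least margin are hypothetical and are not taken from the POWER7 hardware design.

```python
# Hypothetical sketch: function names, step sizes, and the worst-case-CPM
# policy are illustrative assumptions, not the POWER7 hardware design.

IDEAL_LEVEL = 3  # "11100" is labeled "ideal margin" on the slide


def margin_level(cpm_bits: str) -> int:
    """Return the number of leading ones in a 5-bit thermometer code."""
    if len(cpm_bits) != 5 or set(cpm_bits) - {"0", "1"}:
        raise ValueError(f"expected 5 binary digits, got {cpm_bits!r}")
    ones = cpm_bits.rstrip("0")
    if "0" in ones:
        raise ValueError(f"not a thermometer code: {cpm_bits!r}")
    return len(ones)


def frequency_step_mhz(cpm_outputs, up_mhz=25.0, down_mhz=50.0):
    """Pick a frequency change from the CPM with the least margin (assumed)."""
    worst = min(margin_level(bits) for bits in cpm_outputs)
    if worst < IDEAL_LEVEL:
        # Less margin than ideal: slow the clock, more for a deeper shortfall.
        return -down_mhz * (IDEAL_LEVEL - worst)
    if worst > IDEAL_LEVEL:
        # Excess margin: speed the clock up by one step.
        return up_mhz
    return 0.0


if __name__ == "__main__":
    # Five CPMs in one core; one of them reads "margin too small".
    core = ["11111", "11110", "11000", "11100", "11111"]
    print(frequency_step_mhz(core))
```

Stepping down faster than stepping up is a choice made here to reflect that the controller's role is safety; the actual step sizes and asymmetry in the hardware are not given in the slides.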
[Figure: POWER7 core chiplet. Workload, temperature, voltage, and frequency influence CPM output; the 5-bit output per CPM goes to the DPLL, which adjusts frequency up or down.]

Calibration of critical path
Teach the chip the desired timing margin to use during field operation
Done once during manufacturing of chip
Run chip at desired timing margin
– Set voltage, frequency, and temperature
– Run stressful workload
Find delay setting that places timing edge on position 6 in edge detector
– Position 6 is the setpoint for the timing margin controller
Calibrations used in our study
“100% guardband” – original product
“50% guardband” – removes roughly 50% of the guardband
“0% guardband” – removes nearly all guardband

Timing margin controller response time
Quick enough to follow voltage droops
[Figure: Frequency response to droop event (timing margin control enabled). Frequency (MHz) and deviation from nominal (mV) versus time (ns).]
[Figure: CPM response to droop event (timing margin control enabled). Edge detector position versus time (ns).]

Save energy
Performance controller adjusts voltage to meet desired clock frequency target.
– Implemented in firmware of on-board microcontroller
– Frequency is capped at target + 28 MHz (clock resolution)
• Prevent energy waste
• Allow for detection of excess timing margin for voltage reduction
[Figure: Control loop. Workload, temperature, voltage, and frequency influence CPM output; the CPM drives the DPLL frequency up or down; the microcontroller compares real frequency with the core target frequency and adjusts the voltage regulator.]

Demonstration

Results
Implement in Power 750 Express Server (4 POWER7 processors, 64 GB)
– Run SPEC CPU2006 workloads
– Frequency target = 3864 MHz (Turbo)
Guardband reduced to 50%
– 20% chip power reduction
• Voltage reduced by 113 mV to 140 mV
– 18% system power reduction
– 50% fan power reduction
– No change in performance
Upper bound on power reduction (0% guardband)
– 24% chip power reduction
– 21% system power reduction
[Figure: Normalized system DC power (0% to 100%), broken down into fan, processor + memory buffer, DIMM, and other, for traditional voltage (CPM off), 50% guardband, and 0% guardband.]

Related work
Razor [Ernst et al., MICRO-36, 2003]
– Instrument actual critical paths
• Compute both speculative and known-good values
– Rollback and replay instructions when timing error detected
• Allows safe removal of all timing margin

Comparison | Razor | Active Guardband Management in POWER7
Area | Less than 3% of chip [Das, 2009] | 0.2% of core
Power reduction | 33-50% [Das, 2009] | Up to 24%
Performance | 1-3% overhead from roll-back recovery [Ernst, MICRO 2003] | No change
Design and Verification Effort | More (pervasive) | Less (localized)

Conclusions
Demonstration of a new capability to keep timing margin nearly constant
Architecture combines two feedback controllers
– Hardware-based timing margin controller (safety)
– Software-based performance controller (undervolting)
Reduced average chip power by 20% for SPEC CPU2006
Working prototype in POWER7 server
Commercially viable

Acknowledgements
Philip Restle, Alexander
Rylyakov, Daniel Friedman, Daniel Beece (IBM Watson Research Lab)
Groundwork for the CPM-clock feedback control loop and extensive modeling to validate the CPM-DPLL feedback implemented in the POWER7 chip.
Richard Willaman (IBM Austin POWER Systems Lab)
Collected the instantaneous droop scope data.
This work was supported in part by the Defense Advanced Research Projects Agency under contract #HR0011-07-9-0002.
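
As a supplementary illustration of the software-based performance controller from the Save energy slide, the sketch below shows one way a firmware loop might turn the measured frequency into a voltage adjustment. Only the 28 MHz clock resolution and the cap at target + 28 MHz come from the slides; the function name, the 5 mV step, and the voltage limits are hypothetical values chosen for the example.

```python
# Hypothetical sketch of one performance-controller iteration. The step size
# and voltage limits are placeholders, not values from the POWER7 firmware.

CLOCK_RESOLUTION_MHZ = 28.0  # from the slide: frequency capped at target + 28 MHz


def next_voltage_mv(voltage_mv, measured_mhz, target_mhz,
                    step_mv=5.0, v_min_mv=900.0, v_max_mv=1300.0):
    """Return the voltage to request for the next control interval."""
    if measured_mhz < target_mhz:
        # The timing margin controller had to slow the clock below target:
        # there is too little margin at this voltage, so raise it.
        voltage_mv += step_mv
    elif measured_mhz >= target_mhz + CLOCK_RESOLUTION_MHZ:
        # The clock sits at the cap: excess timing margin is available,
        # so convert it into a voltage reduction.
        voltage_mv -= step_mv
    # Otherwise the frequency is on target and the voltage is held.
    return min(max(voltage_mv, v_min_mv), v_max_mv)


if __name__ == "__main__":
    v = 1100.0
    # Made-up frequency readings around the 3864 MHz Turbo target.
    for f in (3892.0, 3892.0, 3870.0, 3850.0):
        v = next_voltage_mv(v, f, 3864.0)
        print(f, v)
```

The cap matters in this sketch because a frequency reading pinned at target + 28 MHz is the only signal the loop has that margin is going unused; without the cap the extra margin would be spent on clock speed instead of being detected and traded for lower voltage.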