
Accurate Fine-Grained Processor Power Proxies
Wei Huang†, Charles Lefurgy∗, William Kuk∗∗,‡, Alper Buyuktosunoglu∗,
Michael Floyd∗∗, Karthick Rajamani∗, Malcolm Allen-Ware∗, Bishop Brock∗∗
†AMD, ∗IBM Research, ‡Purdue University, ∗∗IBM System and Technology Group
The 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Situation
Practical ways to directly measure power consumption of a
core within a microprocessor do not exist.
Servers commonly measure power consumption of chips at the voltage
regulator. Power management could be improved by providing accurate, fine-grained power measurements for individual processor cores.
Core power measurement would enable virtual machine energy billing and
forecasting the effects of power management actuators.
[Figure: power measurement block diagram. Labels: bulk power supply, (12 V), voltage regulator, sense circuitry, Vdd rail, A/D, POWER7+ chip, EnergyScale microcontroller. Real power measurement for system management.]
Opportunity
POWER7 includes on-chip hardware to compute per-chiplet
activity proxies used to estimate active power.
Activity counters and the calculation of activity proxies are implemented in
hardware logic of each core. Instead of implementing a weighted sum, some
weights are applied to groups of activity counters to reduce circuit area.
Solution
Determine idle power model.
• On idle chip, sweep voltage and frequency (253 measurement points).
• Data set: <Power, Voltage, Frequency, Temp> x 253 x 4 chips.
• Genetic algorithm-based optimization: find fitting parameters to minimize
Pmeasure - Pidle for all measurement points.
• Result: fitting parameters of the idle power model.
Determine active power model.
• Run training kernel workload (V=Vnom, F=Fnom).
• Find weights to minimize Pactive - ActivityProxy/R.
• Result: 51 weights of the active power model.
• Trained with 762 kernels, spanning a range of memory
sizes and threading modes.
• Idle power model has accuracy of 3% across
voltage and frequency range.
• Fit using 4 chips from distinct process corners.
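The idle power fit can be illustrated with a small sketch. It is not the poster's implementation: the model form in `idle_power` (a V²F switching term plus a leakage term that grows with voltage and temperature), the parameter bounds, the synthetic measurement points, and the specific genetic operators are all assumptions chosen only to show how a genetic algorithm can search for fitting parameters that minimize the summed |Pmeasure - Pidle|.

```python
import math
import random


def idle_power(params, v, f, t):
    # Hypothetical model form (an assumption, not taken from the poster).
    a, b, c = params
    return a * v * v * f + b * v * math.exp(c * t)


def total_abs_error(params, points):
    # Sum of |P_measure - P_idle| over all measurement points.
    return sum(abs(p - idle_power(params, v, f, t)) for p, v, f, t in points)


def genetic_fit(points, bounds, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: total_abs_error(ind, points))
        history.append(total_abs_error(pop[0], points))
        elite = pop[: pop_size // 4]  # the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mom, dad = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(mom, dad)]  # uniform crossover
            i = rng.randrange(len(child))  # mutate one parameter, clamped to bounds
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0.0, 0.05 * (hi - lo))))
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda ind: total_abs_error(ind, points))
    history.append(total_abs_error(best, points))
    return best, history


# Synthetic <Power, Voltage, Frequency, Temp> points generated from made-up parameters.
TRUE_PARAMS = (20.0, 5.0, 0.02)
points = []
for v in (0.9, 1.0, 1.1, 1.2):
    for f in (2.0, 3.0, 4.0):
        t = 40.0 + 10.0 * v * f
        points.append((idle_power(TRUE_PARAMS, v, f, t), v, f, t))

bounds = [(0.0, 50.0), (0.0, 20.0), (0.0, 0.05)]
best, history = genetic_fit(points, bounds)
```

Because the best individuals are carried over unchanged, the best error recorded in `history` never increases from one generation to the next. The same loop could search for the active power weights by swapping in a different error function.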
Results
• Unsigned error is 1.8% (2.0% std dev) across all
tested workloads (only SPEC CPU2006 shown).
Experiment
• Calibrate POWER7+ power proxy hardware.
• Run workloads (SPEC CPU2006, SPECpower_ssj2008, etc.).
• Measure power of Vdd voltage rail.
Chip Vdd power proxy has a mean error of 0.2% (2.6% std dev).
Power proxy tracks changes in voltage, frequency, temperature,
and workload activity.
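The poster reports both an unsigned error and a mean error. A minimal sketch of how the two kinds of statistic differ is given below, assuming that "mean error" is a signed mean (the poster does not say so) and using the population standard deviation. The sample values are invented for illustration.

```python
import statistics


def percent_errors(measured, estimated):
    # Signed percent error of each estimate relative to the measurement.
    return [100.0 * (e - m) / m for m, e in zip(measured, estimated)]


def summarize(measured, estimated):
    signed = percent_errors(measured, estimated)
    unsigned = [abs(x) for x in signed]
    return {
        "mean_signed": statistics.mean(signed),
        "std_signed": statistics.pstdev(signed),
        "mean_unsigned": statistics.mean(unsigned),
        "std_unsigned": statistics.pstdev(unsigned),
    }


# Invented sample data: over- and under-estimates cancel in the signed mean.
measured = [100.0, 100.0, 100.0, 100.0]
estimated = [102.0, 98.0, 101.0, 99.0]
stats = summarize(measured, estimated)
```

In this example the signed errors cancel while the unsigned errors do not, so a small signed mean alone does not imply a small per-sample error.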
[Figure: POWER7 chiplet (core + L2 + L3) with activity sense points. Labeled blocks: DFU, FXU, VSU, FPU, ISU, IFU, LSU, L2 (5 events), L3 (4 events), NCU.]
[Figure: normalized chip power (power measurement and power proxy, left axis) and Vdd voltage in mV (right axis) versus time (s).]
Observations
• Tracks changes in voltage, frequency, temperature, and workload activity
• Power estimation made every 32 ms (30x faster than prior work)
• Power proxy is accurate even when voltage and frequency do not have fixed
pairings. Useful for undervolting (with fixed frequency) and overclocking (with
fixed voltage) scenarios.
• Power proxy implementation on service processor and system management
network does not impact workload performance.
• Fixed frequency run of dealII workload.
• Power proxy continues to track actual power while undervolting up to 112.5 mV.
Applications
Enable billing of energy consumption for virtual machines
on a per-core basis.
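Per-core energy billing can be sketched as an accumulation of core power estimates over time. The sketch assumes one power estimate per core every 32 ms (the interval the poster reports) and a hypothetical record format of `(vm_id, core_power_watts)`. The VM names and power values are invented.

```python
SAMPLE_PERIOD_S = 0.032  # one power estimate every 32 ms


def bill_energy(samples):
    """Accumulate energy in joules per VM.

    samples: iterable of (vm_id, core_power_watts), one entry per core per
    32 ms interval, tagged with the VM that occupied the core (hypothetical format).
    """
    energy = {}
    for vm_id, watts in samples:
        energy[vm_id] = energy.get(vm_id, 0.0) + watts * SAMPLE_PERIOD_S
    return energy


samples = [("vm-a", 10.0), ("vm-b", 5.0), ("vm-a", 12.0), ("vm-b", 5.0)]
energy = bill_energy(samples)
```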
The weights for the different activity events are programmable by writing to special
on-chip registers. The EnergyScale microcontroller receives the activity
proxies and adjusts them to account for the effects of leakage, temperature,
process variations, and voltage to form chip and core power proxies.
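A minimal sketch of an activity proxy with grouped weights follows. The event names, weight values, and the final adjustment step are assumptions for illustration. The poster only states that some weights are shared by groups of counters, and that active power is fitted against ActivityProxy/R (the meaning of R is not given on the poster).

```python
def activity_proxy(counters, groups, own_weights):
    """Weighted activity sum.

    counters: event name -> count for one interval.
    groups: list of (shared_weight, [event names]); one weight per group.
    own_weights: event name -> weight for events weighted individually.
    """
    total = sum(own_weights[name] * counters[name] for name in own_weights)
    for shared_weight, names in groups:
        # Counters in a group are summed first, then scaled once.
        total += shared_weight * sum(counters[name] for name in names)
    return total


def power_proxy(activity, idle_watts, r):
    # Hypothetical adjustment: idle power plus the scaled activity proxy.
    return idle_watts + activity / r


# Hypothetical event names and weights.
counters = {"ifu_fetch": 100, "lsu_load": 40, "lsu_store": 10, "fxu_op": 50}
own_weights = {"ifu_fetch": 2, "fxu_op": 1}
groups = [(3, ["lsu_load", "lsu_store"])]

proxy = activity_proxy(counters, groups, own_weights)
watts = power_proxy(proxy, idle_watts=20.0, r=100.0)
```

Sharing one weight across a group replaces several multiplications with one, which is one way to read the poster's remark about reducing circuit area.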
Estimation of core power
[Figure: block diagram of per-core power estimation. Labels: bulk power supply, (12 V), voltage regulator, sense circuitry, Vdd rail, A/D, POWER7+ chip, EnergyScale microcontroller. Listed quantities: activity proxy per core, clock frequency per core, temperature per core, voltage. Result: per-core power estimation.]
[Figure: normalized chip Vdd power versus time (32 ms intervals). Series: C2 power proxy (mload), C3 power proxy (sqroot), C4 power proxy (mcopy), C5 power proxy (fma), C6 power proxy (daxpy), C7 power proxy (mlr), chip measured Vdd power.]
• Run 6 workloads on 6-core chip and compare to real chip Vdd power (3% err).
• Power proxy tracks thermal rise.
Improve power management controllers by forecasting power
due to change in voltage, frequency, temperature, and workload.
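Forecasting the effect of a voltage or frequency change can be sketched as follows. This is not the poster's model: it assumes the textbook approximation that active (dynamic) power scales with V²·F, and it takes the idle power model as a caller-supplied function. `forecast_power` and `toy_idle_model` are hypothetical names, and the numbers are invented.

```python
def forecast_power(p_active_now, v_now, f_now, v_new, f_new, idle_model, temp):
    """Estimate chip power after moving from (v_now, f_now) to (v_new, f_new).

    Assumes active power scales with V^2 * F and workload activity per cycle
    is unchanged; idle_model(v, f, temp) returns idle power in watts.
    """
    scale = (v_new / v_now) ** 2 * (f_new / f_now)
    return idle_model(v_new, f_new, temp) + p_active_now * scale


def toy_idle_model(v, f, temp):
    # Placeholder idle model for the example only.
    return 10.0 * v


# Undervolt from 1.0 V to 0.9 V at a fixed 3.0 GHz.
forecast = forecast_power(40.0, 1.0, 3.0, 0.9, 3.0, toy_idle_model, temp=60.0)
```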
[Figure: absolute percent err (%) per SPEC CPU2006 workload: astar, bwaves, bzip2, cactusADM, calculix, dealII, gamess, gcc, GemsFDTD, gobmk, gromacs, h264ref, hmmer, lbm, leslie3d, libquantum, mcf, milc, namd, omnetpp, perlbench, povray, sjeng, soplex, sphinx3, tonto, wrf, xalanbmk, zeusmp, and Average.]
[Flowchart labels, active power model: Measure activity counters. Measure power on Vdd rail. Pactive = Pmeasured - Pidle(V,F,T). Counters and active power. Genetic Algorithm-based Optimization.]
[Flowchart label, idle power model: Measure power on Vdd rail and measure chip temperature.]
[Figure caption: Conventional power measurement of a chip voltage rail.]
Validation: Pchip measurement
matches Pidle + Pactive models.
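The relation Pactive = Pmeasured - Pidle(V,F,T) and the validation step can be sketched as below. The sample values are invented, the flat per-counter weighting is a simplification, and `objective` only evaluates the fitting error for one candidate weight vector; it is not the optimizer itself.

```python
def active_power(p_measured, p_idle):
    # Active power is what remains after subtracting the idle power model.
    return p_measured - p_idle


def objective(samples, weights, r):
    """Sum of |P_active - ActivityProxy / R| over training samples.

    samples: list of (p_active, [counter values]); weights: one per counter
    (a simplification of the grouped weights used in hardware).
    """
    total = 0.0
    for p_active, counters in samples:
        proxy = sum(w * c for w, c in zip(weights, counters))
        total += abs(p_active - proxy / r)
    return total


def validates(p_chip, p_idle, p_active_model, tolerance=0.03):
    # Check that measured chip power matches idle + active models within a tolerance.
    return abs(p_chip - (p_idle + p_active_model)) <= tolerance * p_chip


# Invented example: 60 W measured, 20 W idle, two activity counters.
p_active = active_power(60.0, 20.0)
samples = [(p_active, [10, 5])]
error = objective(samples, weights=[2, 4], r=1.0)
ok = validates(60.0, 20.0, 40.0)
```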
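[Figure: normalized chip Vdd power versus time (s) at 100%, 40%, and 20% load. Series: power sensor and power proxy for Nominal, DPS, and DPS,UV.]
• SPECpower_ssj2008 run under different conditions.
UV = undervolting, DPS = Dynamic Power Saver (voltage and frequency scaling).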