
Power Management for Computer Systems and Datacenters
Karthick Rajamani, Charles Lefurgy, Soraya Ghiasi, Juan C Rubio,
Heather Hanson, Tom Keller
{karthick, lefurgy, sghiasi, rubioj, hlhanson, tkeller}@us.ibm.com
IBM Austin Research Labs
ISLPED 2008
© 2008 IBM Corporation
Overview of Tutorial
1. Introduction and Background
2. Power Management Concepts
   • New focus on power management – why, who, what?
   • Understanding the problem
     – Diverse requirements
     – WHAT is the problem?
     – Understanding variability
   • Basic solutions
   • Advanced solutions
     – Building blocks – sensors and actuators
     – Feedback-driven and model-assisted solution design
3. Industry solutions
4. Datacenter
   • Sample solutions
   • In-depth solutions
   • Facilities Management
     – Anatomy of a datacenter
     – Improving efficiency
Scope of this tutorial
Power Management Solutions for Servers and the Data Center
• Problems
• Solution concepts, characteristics and context
• Tools and approaches for developing solutions
• Brief overview of some industrial solutions
Outside Scope
• Power-efficient microprocessor design, circuit, process technologies
• Power-efficient middleware, software, compiler technologies
• Embedded systems
Servers and Storage Heat Density Trends
Cost of Power and Cooling
(Chart: worldwide server market – spending in US$B on new servers vs. power and cooling, plotted against the installed base in M units.)
Source: IDC, The Impact of Power and Cooling on Data Center Infrastructure, May 2006
Government and Organizations in Action
(Timeline 2005-2008: Japan Energy Savings Law, EPA EnergyStar for Computers, ECMA TC38-TG2, EEP, US Congress HR5646, SPECpower.)
Regulation and standardization of computer energy efficiency (e.g. Energy Star for computers), driven by
• Spiraling energy and cooling costs at data centers
• Environmental impact of high energy consumption
Japan, Australia, the EU and the US EPA are working out joint/global regulation efforts.
Benchmarks for power and performance
• Metrics: GHz, BW, FLOPS → SPECint, SPECfp, Transactions/sec → Performance/Power
Benchmarking Power-Performance: SPECpower_ssj2008
Based on SPECjbb, a Java performance benchmark.
– Generalization to other workloads and blade/cluster environments is underway.
– Self-calibration phases determine peak throughput on the system under test.
– The benchmark consists of 11 load levels: 100% of peak throughput down to idle, in 10% steps.
– Fixed time interval per load level.
– Random arrival times for transactions mimic realistic variations within each load level.
Exercises a range of load levels – not just peak.
Primary benchmark metric:
  ∑ Throughput per level / ∑ Power per level
(Chart: normalized power and average utilization over time, across the calibration phases and the 100% → idle load levels.)
Source: Heather Hanson, IBM
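As a concrete illustration, here is a minimal sketch (not part of the benchmark kit; all numbers invented) of the "sum of throughput over sum of power" aggregation:

```python
# Sketch of the SPECpower_ssj2008-style primary metric: overall throughput
# per watt, summed across all load levels including active idle. The
# (throughput, average power) pairs below are made up for illustration.
levels = [
    (430_000, 285), (388_000, 270), (345_000, 255), (300_000, 238),
    (258_000, 222), (215_000, 205), (172_000, 190), (129_000, 176),
    (86_000, 162), (43_000, 150),  # 100% down to 10% of peak
    (0, 138),                      # active idle
]

total_ops = sum(ops for ops, _ in levels)
total_power = sum(watts for _, watts in levels)

# Primary metric: overall operations per watt across all load levels.
print(f"overall ops/watt = {total_ops / total_power:.1f}")
```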
User Requirements
Users desire different goals
• High performance
• Safe and reliable operation
• Low operating costs
But a solution for one goal can contradict another
• Increasing frequency for higher performance can lead to unsafe operating temperatures
• Lowering power consumption by operating at lower active states can help safe operation, while increasing execution time and potentially total energy costs
Is a single sub-system the main problem? No
Server power budget breakdown for different classes/types.
(Chart: normalized power breakdown – processor/cache, memory subsystem, IO and disk storage, cooling, power subsystem – for HPC node, blade, high-end and mainframe servers; budget estimates for specific machine configurations, for illustration purposes.)
Biggest power consumer varies with server class.
Important to understand the composition of the target class for delivering targeted solutions.
Important to address power consumption/efficiency in all main subsystems.
Workloads and Configuration Dictate Power Consumption
Estimated power breakdown for two Petaflop supercomputer HPC node designs, each configuration tailored to its application class.
(Pie charts: Big Science system, HPL, and Big Simulation system, Stream – share of node power among processors, memory DIMMs, optics, package distribution losses, DC-DC losses and AC-DC losses.)
– Depending on configuration and usage, the processor or the memory sub-system turns out to be dominant.
– Power distribution/conversion losses are a significant fraction.
Note 1: Room A/C energy costs are not captured – significant at the HPC data center level.
Note 2: I/O and storage can also be significant power consumers for commercial computing installations.
Source: Karthick Rajamani, IBM
Power Variability across Real Systems
Large variation between different applications
• 2x between LINPACK and idle; LINPACK about 20% higher than the memory-bound SPEC CPU2000 swim benchmark.
Smaller power variation between individual blades (8%).
(Chart: power in watts vs. processor effective frequency, 250-3000 MHz, measured on 33 IBM HS21 blades for LINPACK, swim and idle; processor clock modulation and voltage/frequency scaling regions marked.)
Source: Charles Lefurgy, IBM
Impact of Process Variation
(Chart: normalized power of five parts, ranging from 1.00 to 1.10.)
10% power variation in a random sample of five ‘identical’ Intel Pentium M processor chips running LINPACK.
Source: Juan Rubio, Karthick Rajamani, IBM
Variations from Design
(Chart: normalized current/power for max active (Idd7) and max idle (Idd3N) across DDR2 parts from five vendors.)
Memory power specifications for DDR2 parts with identical performance specifications:
• 2X difference in active power and 1.5X difference in idle power
Source: Karthick Rajamani, IBM
Environmental Conditions: Ambient Temperature
(Charts: CPU frequency (MHz) and temperature traces over time – a ~5 °C drop in ambient temperature produces a ~5 °C drop in CPU temperature.)
Source: Heather Hanson, Coordinated Power, Energy, and Temperature Management, Dissertation, 2007.
Take Away
Power management solutions need to target the computer system as a whole, with mechanisms to address all major sub-systems. Further, function-oriented design and a workload’s usage of a system can create dramatic differences in the distribution of power by sub-system.
User requirements impose a diverse set of constraints which can sometimes be contradictory.
There is increasing variability and even unpredictability of power consumption due to manufacturing technologies, incorporation of circuit-level power reduction techniques, workload behavior and environmental effects.
Power management solutions need to be flexible and adaptive to accommodate all of the above.
Basic Solutions – Save Energy
Function definition: Reduce power consumption of the computer system when not in active use.
Conventional implementation:
Exploiting architected idle modes of the processor, entered by executing special code/instructions, with interrupts causing exit.
• Multiple modes with increasing reduction in power consumption, usually with increased entry/exit latency
• Exploit circuit-level clock gating of significant portions of the chip and, more recently, voltage-frequency reductions for idle states
Exploiting device idle modes through logic in the controller or device driver, e.g.
• standby state for disks
• self-refresh for DRAM in embedded systems
Basic Solutions – Avoid System Failure
Function definition: Maintain continued computer operation in the face of power or cooling emergencies.
Conventional implementation:
Using redundant components in power distribution and cooling sub-systems
• N+M solutions
Throttling of components under thermal overload
• Employing thermal sensors, components are throttled when temperatures exceed pre-set thresholds (see the sketch below).
Fan speed is adjusted based on the heat load detected with thermal sensors – to balance acoustics, power and cooling considerations.
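A minimal sketch of such threshold-based throttling; the temperatures, the hysteresis band and the sensor/actuator hooks are illustrative assumptions, not from any product:

```python
# Threshold-based thermal throttling with hysteresis: engage throttling
# above T_THROTTLE, release it only once the part cools below T_RESTORE,
# so the control does not oscillate right at the threshold.
T_THROTTLE = 85.0  # deg C, start throttling above this (assumed)
T_RESTORE = 78.0   # deg C, stop throttling below this (assumed)

def thermal_step(temp_c: float, throttled: bool) -> bool:
    """Return the new throttle state for one control interval."""
    if not throttled and temp_c > T_THROTTLE:
        return True    # engage e.g. clock or pipeline throttling
    if throttled and temp_c < T_RESTORE:
        return False   # thermal headroom recovered, run at full speed
    return throttled

print(thermal_step(90.0, False))  # True: over threshold, throttle
print(thermal_step(80.0, True))   # True: still inside hysteresis band
```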
Adaptive Solutions: Key to Addressing Variability and Diverse Conditions
Feedback-driven control provides
1. the capability to adapt to environment, workload and varying user requirements
2. regulation to desired constraints even with imperfect information
Models provide the ability to
1. estimate unmeasured quantities
2. predict the impact of a change
Together they form a feedback-driven and model-assisted control framework.
Sensors
– Provide real-time feedback
– Power, temperature, performance (activity), stability
– The weapon against variability and unpredictability
Actuators
– Regulate component states (e.g. voltage/frequency, DRAM power-down) and activity (instruction/request throughput)
– Manage CPU, memory, fans, disk
– Tune power-performance levels to environment, workload and constraints
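A toy example of this framework, assuming nothing about any particular platform: a feedback loop that reads a power sensor and steps a performance-state actuator to hold a user-set cap. The power "sensor" here is a stand-in model.

```python
import random

P_CAP = 200.0                        # watts, user-specified cap
PSTATES = [1.0, 0.9, 0.8, 0.7, 0.6]  # relative frequency settings
state = 0                            # index into PSTATES, 0 = fastest

def read_power(pstate_idx: int) -> float:
    """Stand-in power sensor: a toy linear model plus workload noise."""
    return 240.0 * PSTATES[pstate_idx] + random.uniform(-10.0, 10.0)

for _ in range(20):                  # one iteration per control interval
    power = read_power(state)
    if power > P_CAP and state < len(PSTATES) - 1:
        state += 1                   # over cap: shed power
    elif power < P_CAP - 15.0 and state > 0:
        state -= 1                   # headroom available: speed up
    # a real loop would program the actuator here, e.g. set_pstate(state)

print("settled at p-state", state)
```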
Thermal Sensors
Key characteristics
• Accuracy and precision – lower values require higher tolerance margins in thermal control solutions.
• Accessibility and speed – impact the placement of control and the rate of response.
Ambient measurement sensors
• Located on-board, at the inlet, at the outlet or at the fan, e.g. National Semiconductor LM73 on-board sensor with ±1 °C accuracy.
• Relatively slower response time – observing larger thermal-constant effects.
• Standard access interfaces include PECI, I2C, SMBus and 1-Wire.
On-chip/on-component sensors
• Measure temperatures at specific locations on the processor or in specific units.
• Need more rapid response times, feeding faster actuation, e.g. clock throttling.
• Proprietary interfaces for on-chip control; standard interfaces for off-chip control.
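As a hedged example of accessibility, Linux exposes many board and chip temperature sensors through the thermal sysfs interface; which zones exist is platform dependent:

```python
# Read every Linux thermal zone; values are reported in millidegrees C.
# /sys/class/thermal is a standard kernel interface, but the set of zones
# (and whether it exists at all) depends on the platform and drivers.
from pathlib import Path

for zone in sorted(Path("/sys/class/thermal").glob("thermal_zone*")):
    kind = (zone / "type").read_text().strip()
    temp_mc = int((zone / "temp").read_text())
    print(f"{zone.name} ({kind}): {temp_mc / 1000:.1f} C")
```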
Power Measurement Sensors
AC power
• External components – intelligent PDU, SmartWatt
• Intelligent power supplies – PSMI standard
• Instrumented power supplies
DC power
• Most laptops – battery discharge rate
• IBM Active Energy Manager – system power
• Intel Foxton technology – processor power (1)(2)
Sensor must suit the application:
• Access rate (second, ms, µs)
• Accuracy
• Precision
• Accessibility (I2C, Ethernet, open-source driver)
(1) C. Poirier, R. McGowen, C. Bostak, S. Naffziger, “Power and Temperature Control on a 90nm Itanium-Family Processor”, ISSCC 2007
(2) HotChips – http://www.hotchips.org/archives/hc17/3_Tue/HC17.S8/HC17.S8T3.pdf
Activity Monitors
‘Performance’ counters
• Traditionally part of the processor performance monitoring unit.
• Can track microarchitecture and system activity of all kinds.
• Fast feedback on activity; also shown to serve as potential proxies for power and even thermals (see the sketch below).
Resource utilization metrics in the operating system
• Useful input to resource-state scheduling solutions for power reduction.
Application performance metrics
• Best feedback for assessing power-performance trade-offs.
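A minimal sketch of the activity-as-power-proxy idea: a linear model from a utilization-style metric to power, with coefficients that would have to be calibrated per platform (the numbers here are assumed):

```python
IDLE_W = 140.0  # measured power at 0% utilization (assumed calibration)
PEAK_W = 260.0  # measured power at 100% utilization (assumed calibration)

def estimated_power(utilization: float) -> float:
    """utilization in [0, 1]; returns estimated system power in watts."""
    return IDLE_W + (PEAK_W - IDLE_W) * utilization

print(estimated_power(0.35))  # 35% busy -> 182.0 W estimated
```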
Actuators for Processor Power Control
Processor pipeline throttling in IBM POWER4 and follow-ons.
Clock throttling in x86 architectures.
Dynamic voltage and frequency scaling (DVFS) in modern Intel, AMD and IBM POWER6 processors.
(Chart: power (W) vs. performance, showing DFS and DVFS power-performance trade-offs on an IBM LS-20 blade that uses AMD Opteron processors – LS-20 power under DVFS and under DFS at 1.4 V.)
DVFS is significantly more efficient than DFS.
DVFS trade-offs are also near-linear at system level.
Source: Karthick Rajamani, IBM
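As a hedged illustration of a DVFS actuator interface, Linux exposes frequency scaling through cpufreq sysfs files (file availability depends on the cpufreq driver, and writing requires root):

```python
# List available frequencies for cpu0 and cap the core at the lowest one.
# scaling_available_frequencies and scaling_max_freq are standard cpufreq
# sysfs files, but not every driver provides the former.
from pathlib import Path

cpu0 = Path("/sys/devices/system/cpu/cpu0/cpufreq")
freqs = (cpu0 / "scaling_available_frequencies").read_text().split()
print("available (kHz):", freqs)

(cpu0 / "scaling_max_freq").write_text(min(freqs, key=int))
```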
Memory Systems Power Management
DRAM power is linearly related to memory bandwidth.
• Request throttling is an effective means to limit memory power, with linear power-performance trade-offs.
Incorporating DRAM idle power-down modes can significantly lower memory power.
• Can be implemented in the memory controller.
• Can increase performance in power-constrained systems by reducing throttling on active ranks, as idle ranks consume less power.
(Charts: normalized MP Stream-Copy power and performance for 16-, 8- and 4-rank configurations, basic vs. with power management, with 1 or 2 channel groups; and normalized bus bandwidth, normal vs. with power management, for commercial and technical best/worst cases.)
The combination of 2-channel grouping and power-down management attacks both active and idle power consumption, yielding the best results.
Source: Karthick Rajamani, IBM
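An illustrative model of this behavior – power roughly linear in bandwidth, with power-down shrinking the per-rank standby term. All coefficients are invented for illustration:

```python
def dram_power(bw_gbs: float, ranks: int, powerdown: bool) -> float:
    """Toy DRAM power model: linear in bandwidth plus per-rank standby."""
    active_w_per_gbs = 1.8                            # assumed
    standby_w_per_rank = 0.5 if powerdown else 1.5    # assumed
    return bw_gbs * active_w_per_gbs + ranks * standby_w_per_rank

print(dram_power(10.0, 16, powerdown=False))  # 42.0 W
print(dram_power(10.0, 16, powerdown=True))   # 26.0 W: idle ranks cheap
```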
Low-power Enterprise Storage
Better components
• Use disks with fewer, higher-capacity spindles
• Flash solid-state drives
• Variable-speed disks – with disk power roughly proportional to rotational speed squared, tailoring speed to required performance could improve efficiency
Massive Array of Idle Disks (MAID) – turn on only 25% of the array at one time
• Targets a “persistent data” layer between highly available, low-latency disk storage and tape
• May need to tolerate some latency when a disk spins up; keep some disks always on
• Improve performance by tuning applications to target only the powered 25% of the array
‘Virtualization’
• “Thin provisioning” or over-subscription of storage can drive utilization toward 100%, delaying purchase of additional storage
• Physical storage is dedicated only when data is actually written
Adaptive Power Management Demo
POWER6 blade prototype power capping demo
• Demonstrates the ability to adapt at runtime to workload changes and user input (power cap limit) while maintaining blade power below the specified value.
POWER6 blade power savings demo
• Demonstrates the ability to adapt to load characteristics to provide increased power savings by matching the power-performance level to load demand.
  – Adapting to load level
  – Adapting to load type
Advanced Solutions – Performance Maximization
Usage: When cooling or power resources are shared by multiple entities, dynamic partitioning of the resource among the sharing entities can increase utilization of the shared resource, enabling higher performance. We term this technique power shifting.
(Figure: an example of CPU-memory power shifting, CPU power vs. memory power in watts.)
– Points show execution intervals of many workloads with no limit on the power budget.
– A ‘static’ dotted rectangle encloses the intervals that run unthrottled under a 40 W budget partitioned statically – 27 W CPU, 13 W memory.
– A ‘dynamic’ dashed triangle encloses the intervals that run unthrottled under power shifting with the same 40 W budget.
– Better performance, as a much larger set of intervals runs unthrottled.
Power Shifting between Processor and Memory
Models:
  P_cpu = DPC · C1 + P_stby_cpu
  P_mem = BW · M1 + P_stby_mem
Re-adjust budgets based on feedback (DPC0 and BW0 are the CPU activity and memory bandwidth observed in the last interval):
  P_dynamic  = P_budget − P_stby_cpu − P_stby_mem
  P_est      = DPC0 · C1 + BW0 · M1
  Budget_cpu = (DPC0 · C1 / P_est) · P_dynamic + P_stby_cpu
  Budget_mem = (BW0 · M1 / P_est) · P_dynamic + P_stby_mem
The new budgets determine new throughput and bandwidth limits.
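A direct transcription of these equations into a sketch; C1, M1 and the standby terms are placeholder values that would be fitted per platform:

```python
C1, M1 = 2.0, 0.9                   # W per unit of activity/bandwidth (assumed)
P_CPU_STBY, P_MEM_STBY = 12.0, 8.0  # standby power terms (assumed)

def shift_budgets(p_budget: float, dpc0: float, bw0: float):
    """Split a total budget between CPU and memory in proportion to the
    dynamic power each consumed in the last interval."""
    p_dynamic = p_budget - P_CPU_STBY - P_MEM_STBY
    p_est = dpc0 * C1 + bw0 * M1            # estimated dynamic demand
    budget_cpu = dpc0 * C1 / p_est * p_dynamic + P_CPU_STBY
    budget_mem = bw0 * M1 / p_est * p_dynamic + P_MEM_STBY
    return budget_cpu, budget_mem            # -> new throttle settings

print(shift_budgets(40.0, 8.0, 6.0))  # e.g. (~27.0 W CPU, ~13.0 W memory)
```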
(Graph: increase in execution time under a constrained power budget for SPEC CPU2000 workloads.)
– Proportional-Last-Interval – the power shifting algorithm evaluated.
– Static – common budget allocation, using average consumption across workloads.
Reference: A Performance-Conserving Approach for Reducing Peak Power in Server Systems – W. Felter, K. Rajamani, T. Keller, C. Rusu, ICS 2005
Take Away
Advanced functions extend basic power management capabilities
• Providing adaptive solutions for tackling variability in systems,
environment and requirements, and
• Enabling dynamic power-performance trade-offs.
They can be implemented by methodologies incorporating
• Targeted sensors and actuators.
• Feedback-control systems.
• Model-assisted frameworks.
Virtualization – Opportunities and Challenges for Power Reduction
Different studies have shown significant under-utilization of compute resources in large data centers.
Virtualization enables multiple low-utilization OS images to occupy a single physical server.
• Multi-core processors with virtualization support and large SMP systems provide a growing infrastructure that facilitates virtualization-based consolidation.
The common expectation is
• A net reduction in energy costs.
• Lower infrastructure costs for power delivery and cooling.
However:
• Processor capacity is not the only resource workloads need – memory, I/O, ...
• Workloads on partitions sharing a system might have different power-performance needs.
Isolating and understanding the characteristics and consequent power management needs of each partition is non-trivial, requiring
• Additional instrumentation and monitoring capabilities.
• Augmenting VM managers to coordinate, facilitate or manage power management functions.
Managing Server Power in a Data Center
http://dilbert.com/strips/comic/2008-02-12
http://dilbert.com/strips/comic/2008-02-14
Computer Industry Response
(Diagram: efficiency improvements spanning component, server, rack and data center levels – thermal sensors, idle modes, DVFS, efficient power supplies, power measurement, power capping, adaptive power savings, virtualization, in-rack cooling, liquid cooling, cluster power trending, hot-aisle containment, free cooling, DC-powered data center, Dynamic Smart Cooling, mobile data center.)
In-depth Examples
1. Active Energy Manager – for cluster-wide system monitoring and policy
management.
2. EnergyScale – for power management of POWER6 systems.
IBM Systems Director Active Energy Manager
Provides a single view of actual power usage across multiple platforms.
Measures, trends, and controls energy usage of all managed systems.
(Diagram: a management server running IBM Systems Director / Active Energy Manager, connected over the network to rack servers with service processors, blade enclosures with management modules holding x86, Cell and Power blades, and an iPDU for legacy systems or those without compatible instrumentation.)
Rack-mount Server Management
Measure and Trend: power, thermals
Control: power cap, energy savings
Intelligent Power Distribution Unit (iPDU)
(Screenshot: web interface showing user-defined receptacle names, current power and cumulative use per receptacle.)
EnergyScale Elements and Functionality
Functions: thermal/power measurement; system health monitoring/maintenance; power/thermal capping; power saving; performance-aware power management.
(Diagram: IBM Director Active Energy Manager and a Hardware Management Module talk to system-level power management firmware running on the Flexible Service Processor (FSP) and the EnergyScale Controller, which manage the IBM POWER6™ processor, the power modules and power measurement.)
EnergyScale Sensors
Sensor | Feedback on | Description
On-chip digital temperature sensors (internal use) | POWER6 chip temperature | 24 digital temperature-sensitive ring oscillators – 8 per core, 8 in nest
On-chip analog temperature sensors | POWER6 chip temperature | 3 metal thermistors – 1 per core, 1 in nest – analog
Critical path monitor (internal use) | POWER6 operational stability | 24 sensors providing real-time timing-margin feedback
Voltage Regulation Module (VRM) | Component/voltage-rail power | Voltage and current provided for each voltage rail in the system; temperature of VRM components
On-board power measurement sensors | System and component power | Calibrated sensors; accurate, real-time feedback
Discrete temperature sensor | System-level ambient temperature | Reports temperature of air seen by components
Dedicated processor activity counters | Core activity and performance | Per-core activity information
Dedicated memory controller activity counters | DRAM usage | Per-controller activity and power management information
Temperature Sensors on POWER6
Digital thermal sensors
• Quick response time
• Placed in hot-spots identified during simulation and early part characterization
Metal thermistors
• Very accurate
• Used in previous designs
• Large area for a 65nm chip
(Die photo: Core 0, Core 1, MC 0, MC 1 and the SMP coherency fabric, with sensor locations marked.)
Source: M. Floyd et al., IBM J. R&D, November 2007
EnergyScale Actuators
Actuator/Mode | Controlled by | Description
Dynamic Voltage and Frequency Scaling (DVFS) | Service processors | Variable-frequency oscillator control; voltage control for array and logic domains, system-wide
Processor pipeline throttling | On-chip thermal protection circuitry, service processors | 6 different modes of pipeline throttling, manageable on a per-core basis
Processor standby modes | OS and hypervisor | Nap/Active
Memory throttling | Service processors | 4 different modes of activity/request throttling
Memory standby modes | Memory controller | Per-rank power-down mode enable – FSP/dedicated microcontroller
Fan speeds | Service processors | Based on ambient temperature
Service processors: FSP and EnergyScale Controller on slide 70
Water Cooling
    | Thermal conductivity [W/(m·K)] | Volumetric heat capacity [kJ/(m³·K)]
Air | 0.0245                         | 1.27
H2O | 0.6                            | 4176
(Photos: water-cooled chips; NCAR Bluefire supercomputer using the IBM p575 hydro-cluster. Images courtesy of the UCAR-maintained Bluefire web gallery.)
Typical Industry Solutions in 2008: Other Related
Function | Description | Example
On demand | Purchase cycles on demand (avoid owning idle resources) | Amazon: Elastic Compute Cloud (EC2)
Data center assessment | Measure power/thermal/airflow trends; use computational fluid dynamics to model the data center; recommend changes to air flow, equipment placement, etc. | IBM, HP, Sun, and many others
Certification for carbon offsets | 3rd party verifies energy reduction of facilities; trade certificates for money on the certificate trading market | Neuwing Energy Ventures
Utility rebates | Encourage data centers to use less power (e.g. by using virtualization) | PG&E
Solutions shown in the example column are representative ones incorporating the specific function/technique. Many of these solutions also provide other functions. No claim is made regarding superiority of any example shown over any alternatives.
Take Away
There is significant effort from industry on power and temperature
management of computer systems
The current intent is not only to make individual components more energy efficient, but also to:
• Enhance functionality in certain components
• Integrate them into system-level power management solutions
The Data Center Raised Floor
No two are the same
A Typical Data Center Raised Floor
(Photos: racks with computers, storage and tape; networking equipment (switches); secured vault; network operating center; fiber connectivity terminating on a frame relay switch.)
Power Delivery Infrastructure for a Typical Large Data Center
(30K sq ft of raised floor and above)
(Photos: power feed substation, transfer panel switch, several pounds of copper, power distribution units (PDU), uninterruptible power supply (UPS) modules and batteries, diesel generators and diesel tanks.)
Cooling Infrastructure for a Typical Large Data Center
(30K sq ft of raised floor and above)
(Photos: computer room air conditioning (CRAC) units, water pumps, water chillers, cooling towers.)
Sample Data Center Energy Consumption Breakdown
(Chart: data center energy consumption by end use.)
Fans in the servers alone already consume 5-20% of the computer load.
Reference: Tschudi, et al., “Data Centers and Energy Use – Let’s Look at the Data”, ACEEE 2003
Data Center Efficiency Metrics
Need metrics to indicate the energy efficiency of the entire facility
• Metrics do not capture the quality of the IT equipment
Most commonly used metrics:
  Power Usage Effectiveness (PUE) = Total facility power / IT equipment power
  Data Center Efficiency (DCE) = IT equipment power / Total facility power
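A worked example of the two metrics (numbers invented):

```python
it_power_kw = 800.0            # power delivered to IT equipment
total_facility_kw = 1680.0     # IT load plus cooling, UPS and other losses

pue = total_facility_kw / it_power_kw
dce = it_power_kw / total_facility_kw   # DCE is simply 1 / PUE
print(f"PUE = {pue:.2f}, DCE = {dce:.2f}")  # PUE = 2.10, DCE = 0.48
```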
(Chart: PUE measured in a study of 22 sample data centers; the minimum PUE of each is marked.)
Fallacy: cooling power = IT power. Reality: data center efficiency varies.
Reference: Tschudi, et al., “Measuring and Managing Data Center Energy Use”, 2006
Data Center Power Distribution
Maintainability and Availability vs. Energy Efficiency
Distributing power across a large area requires a decentralized/hierarchical architecture
• Power transformers (13.2 kV to 480 V or 600 V)
• Power distribution units (PDU)
• And a few miles of copper
Maintaining the uptime of a data center requires redundant components
• Uninterruptible Power Supplies (UPS)
• Emergency Power Supply (EPS) – e.g. diesel power generators
• Redundant configurations (N+1, 2N, 2(N+1), …) to guarantee power and cooling for IT equipment
Both introduce energy losses.
Data Center Power Delivery
(One-line diagram: 230 kV upstream and downstream grid taps through circuit breakers into a 13.2 kV substation; redundant ‘A’ and ‘B’ feeds to the data center; EPS generators; UPS system with batteries, static bypass and maintenance bypass; 480 V transfer switch gear A and B with tie breaker, feeding the PDUs and the critical load.)
The substation usually receives power from 2 or more points and provides 2 redundant feeds to the data center.
If utility power is lost:
1. The UPS can support the critical load for at least 15 minutes.
2. EPS generators can be online in 10-20 seconds.
3. The transfer switch will switch to the EPS generators for power.
4. EPS generators are diesel fueled and can run for an extended period of time.
Data Center Power Conversion Efficiencies
UPS (1): 88-92% → power distribution (2): 98-99% → power supply (3,4): 55-90% → DC/DC (5): 78-93%
The heat generated by the losses at each step of power conversion requires additional cooling power.
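End-to-end delivery efficiency is the product of the stage efficiencies; a quick calculation with the best- and worst-case numbers above:

```python
# UPS, distribution, power supply, DC/DC stage efficiencies from the table.
stages_best = [0.92, 0.99, 0.90, 0.93]
stages_worst = [0.88, 0.98, 0.55, 0.78]

def chain(stage_effs):
    """Multiply per-stage efficiencies into an end-to-end efficiency."""
    eff = 1.0
    for s in stage_effs:
        eff *= s
    return eff

print(f"best case:  {chain(stages_best):.0%} of input power reaches the load")
print(f"worst case: {chain(stages_worst):.0%}")  # the rest becomes heat
```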
(1) http://hightech.lbl.gov/DCTraining/graphics/ups-efficiency.html
(2) N. Rasmussen. “Electrical Efficiency Modeling for Data Centers”, APC White Paper, 2007
(3) http://hightech.lbl.gov/documents/PS/Sample_Server_PSTest.pdf
(4) “ENERGY STAR® Server Specification Discussion Document”, October 31, 2007.
(5) IBM internal sources
Stranded Power in the Data Center
The data center must wire to nameplate power.
However, real workloads do not use that much power.
Result: the available power is stranded and cannot be used.
Example: IBM HS20 blade server – nameplate power is 56 W above real workloads.
(Chart: server power (W) for idle, single and dual copies of the SPEC CPU2000 benchmarks, SPECjbb and LINPACK; the real workload maximum sits 56 W below the 308 W nameplate.)
Source: Lefurgy, IBM
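The stranded-power arithmetic from this example:

```python
nameplate_w = 308.0      # provisioned per blade
max_measured_w = 252.0   # worst observed real workload (308 W - 56 W)

stranded_w = nameplate_w - max_measured_w
print(f"stranded per blade: {stranded_w:.0f} W "
      f"({stranded_w / nameplate_w:.0%} of provisioned power)")
```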
Data Center Direct Current Power Distribution
Goal:
• Reduce unnecessary conversion losses
Approach:
• Distribute power from the substation to the rack as DC
• Distribute at a higher voltage than with AC to address voltage drops in transmission lines
Challenges:
• Requires conductors with very low resistance to reduce losses
• Potential changes to server equipment
Prototype:
• Sun, Berkeley Labs and other partners (1)
(1) http://hightech.lbl.gov/dc-powering/
AC System Losses Compared to DC
(Diagrams: a conventional 480 VAC path – bulk power supply, AC/DC-DC/AC double conversion in the UPS, PDU, then AC/DC and DC/DC stages in the server PSU and VRMs feeding legacy voltages (12 V, 5 V, 3.3 V) and silicon voltages (1.8 V, 1.2 V, 0.8 V) – compared with 380 VDC distribution, where a DC UPS or rectifier feeds the server PSU directly. Measured improvements of 9% and 2-5% are annotated at the respective conversion stages.)
Typical Industry Solutions in 2008: Power Consumption
Function | Description | Example
Configurator | Estimate power/thermal load of a system before purchase | Sun: Sim Datacenter
Measurement | Servers with built-in sensors measure power, inlet temperature, outlet temperature | HP: server power supplies that monitor power
Power capping | Set power consumption limits for individual servers to meet rack/enclosure constraints | IBM: Active Energy Manager
Energy savings | Performance-aware modeling to enable energy-savings modes with minimal impact on application performance | IBM: POWER6 EnergyScale
Power off | Turn off servers when idle, based on user-defined policies (load, time of day, server interrelationships) | Cassatt: Active Response
Virtualization | Consolidate computing resources for increased efficiency, freeing idle resources to be shut down or kept in low-power modes | VMware: ESX Server
DC-powered data center | Use DC power for equipment and eliminate AC-DC conversion | Validus DC Systems
Component-level control | Enable control of power-performance trade-offs for individual components in the system | AMD: PowerNow, Intel: Enhanced SpeedStep
Solutions shown in the example column are representative ones incorporating the specific function/technique. Many of these solutions also provide other functions. No claim is made regarding superiority of any example shown over any alternatives.
Data Center Cooling
Raised Floor Cooling
Thermodynamic part of cooling:
• Hot spots (high inlet temperatures) impact CRAC efficiency (~1.7% per °F)
Transport part of cooling:
• Low CRAC utilization impacts CRAC blower efficiency (~3 kW/CRAC)
Source: Hendrik Hamann, IBM
Impact of Raised Floor Air Flow on Server Power
When there is not enough cold air coming from the perforated tiles:
• Air pressure drops in front of the machine.
• Server fans need to work harder to move cold air across the components.
• Additionally, hot air can create a high-pressure area and overflow into the cold aisle.
(Chart: system power of ~305-340 W vs. air flow at the perforated tile, 0-300 CFM.)
Basic experiment:
• Create an enclosed micro-system – rack, 2 perforated tiles and path to CRAC.
• LINPACK running on a single server in the bottom half of the rack, other servers idle.
• Adjust air flow from the perforated tiles.
Source: J. Rubio, IBM
Air Flow Management
Equipment
• Laid out to create hot and cold aisles
Tiles
• Standard tiles are 2’ x 2’
• Perforated tiles are placed according to the amount of air needed by the servers
• Cold aisles are usually 2-3 tiles wide
• Hot aisles are usually 2 tiles wide
Under-floor
• Floor cavity height sets total cooling capability
• 3’ height in new data centers
Reference: R. Schmidt, “Data Center Airflow: A Predictive Model”, 2000
Modeling the Data Center
Computational Fluid Dynamics (CFD)
• Useful for the initial planning phase and what-if scenarios
• Input parameters such as rack flows are very difficult/expensive to come by (garbage in – garbage out problem)
• Coupled partial differential equations require long-running CFD calculations
Measurement-based
• Find problems in existing data centers
• Measure the temperature and air flow throughout the data center
• Highlights differences between the actual data center and the ideal data center modeled by CFD
Analysis for Improving Efficiencies: MMT
(Figures: (a) CFD model results at 5.5 feet, (b) experimental data at 5.5 feet, (c) difference between model and data, with temperature contour legends for each.)
Source: Hendrik Hamann, IBM
Typical Data Center Raised Floor: Problems!
(Annotated thermal map highlighting common problems: #1 flow control; #2 rack layout; #3 leak; #4 flow control (overprovisioned); #5 hot air; #6 hot/cold aisle problem; #7 layout; #8 intermixing; #9 recirculation; #10 CRAC layout.)
Source: Hendrik Hamann, IBM
HP Dynamic Smart Cooling
Deploy air temperature sensor(s) on each rack.
Collect temperature readings at a centralized location.
Apply a model to determine the CRAC fan settings that maximize cooling of the IT equipment.
Challenges:
• Difficult to determine the impact of particular CRAC(s) on the temperature of a given rack – offline principal component analysis (PCA) and online neural networks assist the logic engine.
• Requires CRACs with variable frequency drives (VFD) – not standard in most data centers, but becoming available with time.
Category | Small (air cooling) | Medium (air and chilled water cooling) | Large (air and chilled water cooling)
Typical size | 10K sq ft | 30K sq ft | >35K sq ft
Energy savings (% of cooling costs) | 40% | 30% | 15%
Estimated MWh saved | 5,300 | 9,100 | 10,500
(1) C. Bash, C. Patel, R. Sharma, “Dynamic Thermal Management of Air Cooled Data Centers”, HPL-2006-11, 2006
(2) L. Bautista and R. Sharma, “Analysis of Environmental Data in Data Centers”, HPL-2007-98, 2007
(3) http://www.hp.com/hpinfo/globalcitizenship/gcreport/energy/casestudies.html
Commercial Liquid Cooling Solutions for Racks
Purpose:
• Localized heat removal – attack the heat before it reaches the rest of the computer room
• Allows equipment with high power densities to be installed in a room not designed for it
Implementation:
• Self-contained air cooling solution (water or glycol takes heat from the air)
• Air movement
Types:
• Enclosures – create a cool microclimate for selected ‘problem’ equipment, e.g. Liebert XDF™ enclosure (1)
• Sidecar heat exchangers – address rack-level hotspots without increasing HVAC load, e.g. APC InfraStruXure InRow RP (2)
(1) “Liebert XDF™ High Heat-Density Enclosure with Integrated Cooling”, http://www.liebert.com/product_pages/ProductDocumentation.aspx?id=40
(2) “APC InfraStruXure InRow RP Chilled Water”, http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=ACRP501
Chilled Water System
Two separate water loops (sample circuit: chiller, CW pump and CRAC; condensation water pump and cooling tower).
Chilled water (CW) loop
• Chiller(s) cool water, which is used by the CRAC(s) to cool down the air
• Chilled water usually arrives at the CRACs near 45°F-55°F
Condensation water loop
• Usually ends in a cooling tower
• Needed to remove heat from the facilities
Air-Side and Water-Side Economizers (a.k.a. Free Cooling)
Air-side economizer (1)
• A control algorithm brings in outside air when it is cooler than the raised-floor return air (see the sketch below).
• Needs to consider air humidity and particle count.
• One data center showed a reduction of ~30% in cooling power.
Water-side economizer (2)
• Circulate chilled water (CW) through an external cooling tower (bypassing the chiller) when the outside air is significantly cold.
• Usually suited to climates with wet-bulb temperatures lower than 55°F for 3,000 or more hours per year, and chilled water loops designed for 50°F and above chilled water.
Thermal energy storage (TES) (3)
• Create chilled water (or even ice) at night.
• Use it to assist in the generation of CW during the day, reducing the overall electricity cost of cooling.
• The reservoir can behave as another chiller, or be part of the CW loop.
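A minimal sketch of the air-side economizer decision; the deadband and humidity limits are illustrative assumptions:

```python
def use_outside_air(t_outside_c: float, t_return_c: float,
                    rh_outside_pct: float) -> bool:
    """Admit outside air only when it is usefully cooler than the return
    air and within an acceptable humidity band."""
    humidity_ok = 20.0 <= rh_outside_pct <= 80.0  # assumed band
    return t_outside_c < t_return_c - 2.0 and humidity_ok  # 2 C deadband

print(use_outside_air(15.0, 30.0, 45.0))  # True: economizer hours
```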
(1) A. Shehabi, et al. “Data Center Economizer Contamination and Humidity Study”, March 13, 2007.
http://hightech.lbl.gov/documents/DATA_CENTERS/EconomizerDemoReportMarch13.pdf
(2) Pacific Gas & Electric, “High Performance Data Centers: A Design Guidelines Sourcebook”, January
2006. http://hightech.lbl.gov/documents/DATA_CENTERS/06_DataCenters-PGE.pdf
(3) “Cool Thermal Energy Storage”, ASHRAE Journal, September 2006
Typical Industry Solutions in 2008: Cooling
Function | Description | Example
Hot aisle containment | Close hot aisles to prevent mixing of warm and cool air; add doors to the ends of the aisle and ceiling tiles spanning the aisle | American Power Conversion Corp.
Sidecar heat exchange | Sidecar heat exchanger uses water/refrigerant to optimize hot/cold aisle air flow; closed systems re-circulate cooled air in the cabinet, preventing mixing with room air | Emerson Network Power: Liebert XD products
Air flow regulation | Control inlet/outlet temperature of racks by regulating CRAC airflow; model the relationship between individual CRAC airflow and rack temperature | HP: Dynamic Smart Cooling
Cooling economizers | Use a cooling tower to produce chilled water when the outside air temperature is favorable; turn off the chiller's compressors | Wells Fargo & Co. data center in Minneapolis
Cooling storage | Generate ice or cool fluid with the help of the external environment, or while energy rates are reduced | IBM ice storage
Modular data center | Design the data center for high-density physical requirements; data center in a shipping container; airflow goes rack-to-rack, with heat exchangers in between | Sun: Project Blackbox
Solutions shown in the example column are representative ones incorporating the specific function/technique. Many of these solutions also provide other functions. No claim is made regarding superiority of any example shown over any alternatives.
Next Generation Solutions – Component to Datacenter
(Diagram: techniques spanning component, server, rack and data center levels – integrated fan control for thermal management; partition-level power management; datacenter modeling and optimization; dynamic and deterministic performance boost; on-chip power measurement; integrated IT and facilities management; power-aware workload management; process technologies; enhanced processor idle states; hardware acceleration; fine-grained, faster clock scaling; power-aware microarchitecture; power shifting; power-aware virtualization for resource and server consolidation; enhanced memory power modes; combined firmware-OS power management.)
Conclusion
There is a lot of work going on in industry and academia to address power and cooling issues – it’s a very hot topic!
Much of it has been done in the last few years, and we’re nowhere near solving all the issues.
The scope of the problem is vast – from thermal failures of individual components to the efficiency of data centers and beyond.
There is no silver bullet – the problem has to be attacked all the way from better manufacturing technologies to coordinated facilities and IT management.
The key lies in adaptive solutions that are driven by real-time information and incorporate an adequate understanding of the interplay between diverse requirements, workloads and system characteristics.