
Brand-New Vector Supercomputer
NEC Corporation
IT Platform Division
Shintaro MOMOSE
SC13
New Product
NEC has just released a brand-new vector supercomputer, the SX-ACE.
A vector supercomputer for memory-bandwidth-intensive applications
Sustained performance
Usability
Productivity
© NEC Corporation, 2013
Concept
SX History and Technical Evolutions
NEC has always delivered high sustained performance with its SX series of vector supercomputers.
[Figure: performance of the SX series from 1990 to 2010 (SX-2, SX-3, SX-4, SX-5, SX-6, SX-7, SX-8/8R, SX-9, ES, ES2) with technical milestones: bipolar technology and water cooling; CMOS and air cooling; 1-chip vector processor; multi-node; auto-vectorizing compiler; automatic parallelization; SUPER-UX; distributed parallelization (MPI-SX); 3D node; function & module; 100GF processor; MPI support for multilane IXS; clusters of over 100 nodes; multi-core, all-in-one chip; ECO; support for more than 1000 nodes.]
Trend of TOP500 (1st to 10th systems)
Growth in LINPACK performance has come from enlarging systems.
Users must spend their time extracting massive parallelism.
Fewer, bigger cores can reduce this difficulty.
[Figure: TOP500 1st to 10th systems, 2000 to 2020, log scale: Linpack [TF], # of cores, # of nodes, core performance [GF] and frequency [GHz], each with its average. Linpack performance keeps increasing toward exaflops while frequency stays nearly constant; arrows point toward big cores and fewer cores.]
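As a rough illustration of the "fewer, bigger cores" point, the sketch below counts how many cores are needed to reach a given peak performance. It assumes perfect scaling, which real applications never achieve; only the 64GF core value comes from the SX-ACE specifications, and the 16GF scalar core is a hypothetical comparison point, not a figure from this deck.

```python
import math


def cores_needed(target_tflops: float, core_gflops: float) -> int:
    """Cores required to reach target_tflops of peak, assuming perfect scaling."""
    return math.ceil(target_tflops * 1000 / core_gflops)


if __name__ == "__main__":
    target_tflops = 1000.0  # 1 PFlops of peak performance
    # 64 GF: SX-ACE core peak; 16 GF: hypothetical scalar core for comparison.
    for label, core_gflops in [("64GF core", 64.0), ("16GF core (hypothetical)", 16.0)]:
        print(f"{label}: {cores_needed(target_tflops, core_gflops)} cores")
```

A core with 4x the performance needs 4x fewer cores (15,625 versus 62,500 for 1PF), so 4x less parallelism has to be extracted from the application.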
Required Byte/Flop in Real Applications
According to a Japanese government (MEXT) working group report on a wide
variety of applications in strategic fields, application characteristics are highly diverse.
MEXT: Ministry of Education, Culture, Sports, Science & Technology
B/F requirements differ greatly from one application to another.
No single architecture can cover all application areas.
[Figure: required memory bandwidth [Byte/Flop] versus required memory capacity [PB] for application fields including MD, weather, cosmophysics, particle physics, structural analysis, fluid dynamics, quantum chemistry and nuclear physics. Applications with high B/F are memory-bandwidth intensive; those with low B/F are calculation intensive and suit scalar CPUs.]
Reference: “Report on Strategic Direction/Development of HPC in Japan”, March 2012
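One common way to see why the required B/F matters is a roofline-style bound (a standard performance model, not one taken from the MEXT report): sustained performance can exceed neither the peak nor the memory bandwidth divided by the application's required B/F. The sketch below uses the 64GF / 64GB/s per-core figures quoted for SX-ACE in this deck; the 20GF / 5GB/s scalar core and the required B/F of 2.0 are hypothetical values chosen for illustration.

```python
def attainable_gflops(peak_gflops: float, mem_bw_gbs: float, required_bf: float) -> float:
    """Roofline-style upper bound on sustained GFlops for a kernel that moves
    `required_bf` bytes between memory and the core per floating-point operation."""
    if required_bf <= 0:
        return peak_gflops  # no memory traffic: limited only by peak
    return min(peak_gflops, mem_bw_gbs / required_bf)


def is_memory_bound(peak_gflops: float, mem_bw_gbs: float, required_bf: float) -> bool:
    """True when the application needs more B/F than the machine provides."""
    return required_bf > mem_bw_gbs / peak_gflops


if __name__ == "__main__":
    required_bf = 2.0  # hypothetical memory-bandwidth-intensive application
    cores = {
        "SX-ACE core (all 4 cores active)": (64.0, 64.0),
        "scalar core (hypothetical)": (20.0, 5.0),
    }
    for name, (peak, bw) in cores.items():
        sustained = attainable_gflops(peak, bw, required_bf)
        print(f"{name}: {sustained:.1f} GF ({100 * sustained / peak:.1f}% of peak), "
              f"memory bound: {is_memory_bound(peak, bw, required_bf)}")
```

Under this model, a machine whose B/F falls below an application's requirement runs at a fraction of peak no matter how fast its arithmetic units are.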
Concepts of SX-ACE
Big Core
Inherits the SX DNA, providing higher sustained performance
The highest single-core performance: 64GF
The largest single-core memory bandwidth: 64~256GB/s
Low Power Consumption
Higher power efficiency: 1/10 the power of SX-9 at the same performance
Small Installation Space
Lower floor space cost: 1/5 the floor space of SX-9 at the same performance
Providing Big Core
The SX-ACE inherits the SX DNA and outperforms other CPUs with the
world’s No.1 CPU core performance and the world’s No.1 memory
bandwidth.
Single core comparison
[Figure: single-core peak performance and memory bandwidth of SX-ACE versus three scalar CPUs (Scalar A, B and C).]
Peak performance: SX-ACE 64GF, 2~4x that of competitors
Memory bandwidth: SX-ACE 64GB/s, 4~13x that of competitors
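The stated ratios can be turned back into the range of competitor values they imply. This is only arithmetic on the slide's 64GF, 64GB/s, 2~4x and 4~13x figures; it does not identify Scalar A, B or C, nor does it reproduce their measured values.

```python
def implied_range(sx_ace_value: float, min_ratio: float, max_ratio: float) -> tuple[float, float]:
    """Competitor values implied by an SX-ACE advantage of min_ratio~max_ratio."""
    return sx_ace_value / max_ratio, sx_ace_value / min_ratio


if __name__ == "__main__":
    low, high = implied_range(64.0, 2, 4)   # peak performance per core, GF
    print(f"implied competitor core peak: {low:.1f}~{high:.1f} GF")
    low, high = implied_range(64.0, 4, 13)  # memory bandwidth per core, GB/s
    print(f"implied competitor core bandwidth: {low:.1f}~{high:.1f} GB/s")
```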
Architecture
CPU Architecture (Big Core, Large memory bandwidth)
[Figure: block diagram of the SX-ACE CPU. Each core consists of a Scalar Processing Unit (SPU), a Vector Processing Unit (VPU) and a 1MB Assignable Data Buffer (ADB). The four cores connect through a crossbar to 16 memory controllers (MC) and DDR3 memory at 256GB/s, and to a Remote access Control Unit (RCU) linked to the interconnect (8GB/s x2).]
Per core
Performance: 64GFlops
ADB size: 1MB
ADB bandwidth: 256GB/s
Memory bandwidth: 64GB/s ~ 256GB/s
Memory Byte/Flop: 1.0 ~ 4.0
Per CPU
Cores: 4
Performance: 256GFlops
Memory bandwidth: 256GB/s
Byte/Flop: 1.0
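The per-core range of 64~256GB/s (1.0~4.0 B/F) follows from four 64GF cores sharing one 256GB/s memory system. The sketch below assumes bandwidth is split evenly among active cores and that a single active core can draw the full 256GB/s, which is what the quoted upper bound suggests; real contention behavior may differ.

```python
CPU_MEM_BW_GBS = 256.0   # per-CPU memory bandwidth
CORE_PEAK_GFLOPS = 64.0  # per-core peak performance
CORES_PER_CPU = 4


def per_core_bandwidth(active_cores: int) -> float:
    """Memory bandwidth available to each core, assuming an even split."""
    if not 1 <= active_cores <= CORES_PER_CPU:
        raise ValueError(f"active_cores must be in 1..{CORES_PER_CPU}")
    return CPU_MEM_BW_GBS / active_cores


def per_core_bf(active_cores: int) -> float:
    """Byte/Flop ratio seen by each active core."""
    return per_core_bandwidth(active_cores) / CORE_PEAK_GFLOPS


if __name__ == "__main__":
    for n in range(1, CORES_PER_CPU + 1):
        print(f"{n} active core(s): {per_core_bandwidth(n):6.1f} GB/s, {per_core_bf(n):.2f} B/F")
    # Whole-CPU ratio: 256GB/s / (4 x 64GF)
    print(CPU_MEM_BW_GBS / (CORES_PER_CPU * CORE_PEAK_GFLOPS))
```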
All-in-one Processor
• Four powerful cores and the memory, network and I/O controllers are integrated in one CPU = power saving
• Compact card design = space saving
SX-ACE CPU (on an 11cm x 37cm node card: 256GF, 256GB/s memory bandwidth)
Powerful cores: the world’s fastest CPU core, 64GF x 4 cores, 1MB ADB/core
Memory controller: 256GB/s bandwidth control
Memory: very large memory bandwidth, the world’s largest at 256GB/s
Network controller: 8GB/s/direction, fat-tree
I/O controller: connection to storage devices and Ethernet
Node Card (11cm x 37cm)
CPU: 4 cores, 256GF, 256GB/s
Memory: 4GB x 16 DIMMs, DDR3 2000MHz
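The 256GB/s node bandwidth is consistent with the memory configuration on this card. The check below assumes that "2000MHz" means the DDR3-2000 data rate of 2000 million transfers per second, a 64-bit (8-byte) data path per DIMM, and one DIMM per memory controller (the CPU diagram shows 16 MCs); none of these assumptions is stated on the slide.

```python
DIMMS = 16
DIMM_CAPACITY_GB = 4
DATA_RATE_MT_S = 2000   # assumed: DDR3-2000, 2000 mega-transfers per second
BYTES_PER_TRANSFER = 8  # assumed: 64-bit data bus per DIMM (ECC bits excluded)
NODE_PEAK_GFLOPS = 256


def per_dimm_bandwidth_gbs() -> float:
    """Peak bandwidth of one DIMM in GB/s (10^9 bytes per second)."""
    return DATA_RATE_MT_S * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    capacity_gb = DIMMS * DIMM_CAPACITY_GB
    node_bw = DIMMS * per_dimm_bandwidth_gbs()
    print(f"capacity: {capacity_gb} GB")
    print(f"bandwidth: {node_bw} GB/s, B/F = {node_bw / NODE_PEAK_GFLOPS}")
```

Under these assumptions each DIMM supplies 16GB/s, and 16 DIMMs give the quoted 256GB/s and a node B/F of 1.0.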
System Configuration
Each node is connected to a two-stage full fat-tree network
Global communication functions are implemented in hardware
• Fat-tree
• HW function for global communication
[Figure: two-stage fat-tree. 16 second-level switches (SW2 #00 to #15), each with 32 links, connect to 32 first-level switches (SW1 #00 to #31); each SW1 has 16 links up to the SW2 layer and 16 links down to nodes, covering nodes #000 to #511. Links run at 8GB/s/direction.]
512 nodes, 2048 cores, 131TFlops, 1B/F
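The numbers in the diagram fit together as follows. This sketch models the two-stage fat-tree and counts the switches crossed between two nodes. The node numbering (nodes #000 to #015 on SW1 #00, and so on) is read off the diagram; the assumption that each SW1 has exactly one up-link to every SW2 is ours and is labeled in the code. The hardware global-communication functions are not modeled.

```python
NUM_SW1 = 32        # first-level switches, SW1 #00 to #31
NUM_SW2 = 16        # second-level switches, SW2 #00 to #15
NODES_PER_SW1 = 16  # down-links from each SW1 to nodes
SW2_LINKS = 32      # links on each SW2
CORES_PER_NODE = 4
NODE_GFLOPS = 256

TOTAL_NODES = NUM_SW1 * NODES_PER_SW1
TOTAL_CORES = TOTAL_NODES * CORES_PER_NODE
PEAK_TFLOPS = TOTAL_NODES * NODE_GFLOPS / 1000

# Assumed: each SW1 has one up-link to every SW2, so its 16 up-links match its
# 16 down-links (non-blocking), and each SW2 sees exactly one link per SW1.
assert NUM_SW2 == NODES_PER_SW1
assert NUM_SW1 == SW2_LINKS


def sw1_of(node: int) -> int:
    """First-level switch serving a node (nodes numbered consecutively per SW1)."""
    if not 0 <= node < TOTAL_NODES:
        raise ValueError(f"node must be in 0..{TOTAL_NODES - 1}")
    return node // NODES_PER_SW1


def switches_crossed(src: int, dst: int) -> int:
    """Switches on the path: 1 within one SW1, otherwise SW1 -> any SW2 -> SW1."""
    src_sw, dst_sw = sw1_of(src), sw1_of(dst)  # also validates both node IDs
    if src == dst:
        return 0
    return 1 if src_sw == dst_sw else 3


if __name__ == "__main__":
    print(TOTAL_NODES, TOTAL_CORES, PEAK_TFLOPS)
    print(switches_crossed(0, 15), switches_crossed(0, 16), switches_crossed(0, 511))
```

Any two nodes are at most three switches apart, and 512 nodes x 256GF gives 131,072GF, the "131TFlops" in the caption.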
Configuration
System > Rack > 16-Node Cage > 2-Node Module > Node Card
Rack: 16-Node Cage x4 = 4 cages = 32 modules = 64 nodes = 64 CPUs; 16TF, 16TB/s
16-Node Cage: 8 modules = 16 nodes = 16 CPUs
2-Node Module: 2 nodes = 2 CPUs
Node Card: 1 CPU, 256GF, 256GB/s
Rack Specifications
16TF, 16TB/s, 64 CPUs
0.75m x 1.5m x 2.0m
30kW
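Multiplying out the hierarchy reproduces the rack figures and sizes the 131TF system. The sketch assumes that system power scales with the number of racks at the 30kW rack figure and ignores any equipment outside the racks, so it is an estimate rather than a published specification.

```python
import math

NODE_GFLOPS = 256
NODE_BW_GBS = 256
NODES_PER_MODULE = 2
MODULES_PER_CAGE = 8
CAGES_PER_RACK = 4
RACK_POWER_KW = 30

NODES_PER_RACK = NODES_PER_MODULE * MODULES_PER_CAGE * CAGES_PER_RACK
RACK_TFLOPS = NODES_PER_RACK * NODE_GFLOPS / 1000
RACK_TB_S = NODES_PER_RACK * NODE_BW_GBS / 1000


def racks_for(nodes: int) -> int:
    """Racks needed to house `nodes` nodes."""
    return math.ceil(nodes / NODES_PER_RACK)


def estimated_power_kw(nodes: int) -> int:
    """Assumed: every rack draws the 30kW rack figure; nothing outside racks counted."""
    return racks_for(nodes) * RACK_POWER_KW


if __name__ == "__main__":
    print(f"{NODES_PER_RACK} nodes/rack, {RACK_TFLOPS:.3f} TF, {RACK_TB_S:.3f} TB/s per rack")
    print(f"512 nodes: {racks_for(512)} racks, about {estimated_power_kw(512)} kW")
```

For 512 nodes this gives 8 racks and about 240kW, in line with the 0.24MW quoted for the 131TF SX-ACE system.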
Detail of Rack Implementation
[Figure: 42U rack holding four 16-node cages of 2-node modules (32 modules, 64 nodes), each cage with a node manager, plus a network switch, rack manager, OS disk, power junction and coolant pipes (inlet and outlet).]
Downsizing and Power Saving
Thanks to its power-saving design and compact implementation, SX-ACE needs 1/5 the floor space and 1/10 the power of SX-9.
Comparison at the same performance (131TF)
                SX-9                        SX-ACE
Nodes           80                          512
Performance     131TF                       131TF
Floor space     24m x 12m = 288m²           7m x 8m = 56m²
                (25m swimming pool size)    (meeting room size)
Power           2.4MW                       0.24MW
SX-ACE vs SX-9: 1/5 the space, 1/10 the power
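The ratios can be checked directly from the comparison figures. All numbers below are taken from the slide; because both systems deliver the same 131TF, the power ratio is also the ratio of performance per watt.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    nodes: int
    floor_m: tuple[float, float]  # floor dimensions in metres
    power_kw: float

    @property
    def area_m2(self) -> float:
        return self.floor_m[0] * self.floor_m[1]


SX9 = System("SX-9", 80, (24, 12), 2400)
SX_ACE = System("SX-ACE", 512, (7, 8), 240)

if __name__ == "__main__":
    space_ratio = SX9.area_m2 / SX_ACE.area_m2
    power_ratio = SX9.power_kw / SX_ACE.power_kw
    print(f"floor space: {SX9.area_m2} m2 vs {SX_ACE.area_m2} m2, {space_ratio:.2f}x smaller")
    print(f"power: {SX9.power_kw} kW vs {SX_ACE.power_kw} kW, {power_ratio:.1f}x lower")
```

The space ratio comes out at about 5.1, which the slide rounds to 1/5; the power ratio is exactly 10.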
NEC’s Exhibitor Forum Is Today!
Exhibitor Forum
Nov. 19th (Tue), 15:30 – 16:00
Room 501/502
NEC’s Brand-New Vector Supercomputer and HPC Roadmap