Brand-New Vector Supercomputer
NEC Corporation, IT Platform Division
Shintaro MOMOSE
SC13

New Product
NEC has just released a brand-new vector supercomputer, SX-ACE: a vector supercomputer for memory-bandwidth-intensive applications.
• Sustained performance
• Usability
• Productivity
© NEC Corporation, 2013

Concept

SX History and Technical Evolutions
NEC has always delivered high sustained performance with its SX series of vector supercomputers.
[Figure: Performance of the SX series from 1990 to 2010 (SX-2, SX-3, SX-4, SX-5, SX-6, SX-7, SX-8/8R, SX-9, ES, ES2), annotated with technical milestones: auto-vectorizing compiler; bipolar technology with water cooling, then CMOS with air cooling; multi-node systems; SUPER-UX; single-chip vector processor; automatic parallelization; distributed parallelization (MPI-SX); 3D node/module; 100 GF processor; multi-core; all-in-one chip; cluster support for over 100 nodes; MPI support for multilane IXS; support for more than 1000 nodes; ECO.]

Trend of TOP500 (1st–10th Systems)
• Growth in LINPACK performance has come from enlarging systems.
• Users must spend their time extracting massive parallelism.
• Fewer, bigger cores can reduce this difficulty.
[Figure: Linpack performance [TF], number of cores, number of nodes, core performance [GF], and frequency [GHz] of the TOP500 1st–10th systems, with averages, 2000–2020, log scale. Annotations: "Exa Flops Increasing", "Nearly constant Frequency", "115%/year", "big core", "fewer cores".]

Required Byte/Flop in Real Applications
According to a working-group report by the Japanese government (MEXT) covering a wide variety of applications in strategic fields, application characteristics are diverse.
MEXT: Ministry of Education, Culture, Sports, Science and Technology
• The B/F requirement differs greatly from application to application.
• No single architecture can cover all application areas.
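As a rough illustration of why the B/F requirement matters, here is a minimal Python sketch of the standard roofline bound (a general performance model, not a method taken from this deck): an application that needs B bytes of memory traffic per flop cannot sustain more than min(peak, memory bandwidth / B). The per-core SX-ACE figures (64 GF peak, 64–256 GB/s) come from the CPU Architecture slide; the application B/F values are hypothetical inputs chosen to span the chart's range.

```python
def attainable_gflops(peak_gflops: float, mem_bw_gbs: float, required_bf: float) -> float:
    """Roofline upper bound on sustained performance in GFlop/s.

    required_bf is the application's memory traffic per flop (bytes/flop);
    memory can feed at most mem_bw_gbs / required_bf GFlop/s.
    """
    if required_bf <= 0:
        return peak_gflops  # no memory traffic: purely compute bound
    return min(peak_gflops, mem_bw_gbs / required_bf)


def machine_bf(peak_gflops: float, mem_bw_gbs: float) -> float:
    """Bytes of memory bandwidth available per flop of peak performance."""
    return mem_bw_gbs / peak_gflops


# SX-ACE single core, as listed on the CPU Architecture slide.
CORE_PEAK_GF = 64.0
CORE_BW_RANGE_GBS = (64.0, 256.0)

if __name__ == "__main__":
    for bw in CORE_BW_RANGE_GBS:
        print(f"core B/F at {bw:.0f} GB/s: {machine_bf(CORE_PEAK_GF, bw):.1f}")
    # Hypothetical application requirements, low B/F to high B/F.
    for app_bf in (0.01, 0.1, 1.0, 4.0, 10.0):
        perf = attainable_gflops(CORE_PEAK_GF, CORE_BW_RANGE_GBS[0], app_bf)
        print(f"required B/F {app_bf:>5}: at most {perf:.1f} GF ({perf / CORE_PEAK_GF:.0%} of peak)")
```

Under this model, an application needing 4 B/F is held to 16 GF on a core fed at 64 GB/s but can reach the full 64 GF at 256 GB/s, whereas a 0.1 B/F application is compute-bound either way.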
[Figure: Required memory bandwidth [Byte/Flop] (0.001–10) versus required memory capacity [PB] (0.0001–1000) for MD, weather, cosmophysics, particle physics, structural analysis, fluid dynamics, quantum chemistry, and nuclear physics, divided into a memory-bandwidth-intensive region (high B/F) and a calculation-intensive region suited to scalar CPUs (low B/F).]
Reference: "Report on Strategic Direction/Development of HPC in Japan", March 2012

Concepts of SX-ACE
• Big core: inherits the SX DNA to provide higher sustained performance
  - Highest single-core performance: 64 GF
  - Largest single-core memory bandwidth: 64–256 GB/s
• Low power consumption: higher power efficiency than SX-9 (1/10 the power at the same performance)
• Small installation space: lower floor-space cost than SX-9 (1/5 the floor space)

Providing Big Core
SX-ACE inherits the SX DNA and outperforms other CPUs with the world's No. 1 CPU core performance and the world's No. 1 memory bandwidth.
[Figure: Single-core comparison of SX-ACE with scalar CPUs A, B, and C. SX-ACE offers 64 GF peak performance (2–4x that of the competitors) and 64 GB/s memory bandwidth (4–13x that of the competitors).]

Architecture

CPU Architecture (Big Core, Large Memory Bandwidth)
[Figure: SX-ACE CPU block diagram. Four cores, each with a scalar processing unit (SPU), a vector processing unit (VPU), and an ADB (Assignable Data Buffer), are connected through a crossbar to a remote access control unit (RCU), which links to the interconnect (8 GB/s x2), and to 16 memory controllers (MC) attached to DDR3 memory; the core, crossbar, and memory paths are labeled 256 GB/s.]
Core:
• Performance: 64 GFlops
• ADB size: 1 MB
• ADB bandwidth: 256 GB/s
• Memory bandwidth: 64–256 GB/s
• Memory Byte/Flop: 1.0–4.0
CPU:
• Cores: 4
• Performance: 256 GFlops
• Memory bandwidth: 256 GB/s
• Byte/Flop: 1.0

All-in-one Processor
Four powerful cores and all controllers (memory, network, and I/O) are integrated on one CPU, which saves power, and the compact card design saves space.
SX-ACE CPU:
• I/O controller: connection to storage devices and Ethernet
• Node card: 256 GF performance, 256 GB/s memory bandwidth
• Network controller:
  8 GB/s per direction, fat-tree
• Powerful core: the world's fastest CPU core, 64 GF x 4 cores, 1 MB ADB per core
• Memory controller: 256 GB/s to memory, the world's largest memory bandwidth

Node Card
[Figure: Node card, 37 cm x 11 cm, carrying the CPU and memory.]
• CPU: 4 cores, 256 GF, 256 GB/s
• Memory: 4 GB x 16 DIMMs, DDR3 2000 MHz

System Configuration
Nodes are connected by a two-stage full fat-tree network, and global communication functions are implemented in hardware.
• Fat-tree
• Hardware global communication functions
[Figure: 512 nodes, 2048 cores, 131 TFlops, 1 B/F. Sixteen second-stage switches (SW2 #00–#15, 32 links each) connect to 32 first-stage switches (SW1 #00–#31), each with 16 links up and 16 links down to nodes (SW1 #00: nodes #000–#015, SW1 #01: nodes #016–#031, ..., SW1 #31: nodes #496–#511). Link bandwidth: 8 GB/s per direction.]

Configuration
Packaging hierarchy, from system down to node card:
• Rack: 64 nodes = 16 TF, 16 TB/s (4 16-node cages = 32 modules = 64 nodes = 64 CPUs)
• 16-node cage: 8 modules = 16 nodes = 16 CPUs
• 2-node module: 2 nodes = 2 CPUs
• Node card: 1 CPU, 256 GF, 256 GB/s

Rack Specifications
• 16 TF, 16 TB/s, 64 CPUs
• 0.75 m x 1.5 m x 2.0 m
• 30 kW
Detail of Rack Implementation
[Figure: 42U rack with a network switch, a rack manager, an OS disk, four 16-node cages of eight 2-node modules each, node managers, a power junction, and coolant pipes (inlet and outlet).]

Downsizing and Power Saving
Thanks to its power-saving design and compact implementation, SX-ACE occupies 1/5 the floor space and consumes 1/10 the power of SX-9 at the same performance (131 TF).
• SX-9: 80 nodes, 131 TF, 24 m x 12 m = 288 m² (the size of a 25 m swimming pool), 2.4 MW
• SX-ACE: 512 nodes, 131 TF, 7 m x 8 m = 56 m² (the size of a meeting room), 0.24 MW

NEC's Exhibitor Forum Is Today!
Exhibitor Forum: Nov. 19th (Tue), 15:30–16:00, Room 501/502
"NEC's Brand-New Vector Supercomputer and HPC Roadmap"
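The node, rack, and system figures on the configuration slides are consistent with simple multiplication. The following Python sketch rechecks them and maps node numbers onto the two-stage fat-tree. It assumes, beyond what the slides state, that each of the 16 DIMMs sits on its own 64-bit DDR3 channel at 2000 MT/s, that nodes are numbered consecutively in groups of 16 per first-stage switch (as the figure labels suggest), and that every SW1 links to every SW2; the helper names are hypothetical.

```python
# Back-of-the-envelope check of the SX-ACE configuration slides.
# ASSUMPTIONS (not stated on the slides): one DIMM per 64-bit DDR3 channel
# at 2000 MT/s; nodes numbered consecutively, 16 per first-stage switch (SW1);
# every SW1 has one link to every second-stage switch (SW2).

CORES_PER_NODE = 4
NODE_PEAK_GF = 256           # 4 cores x 64 GF
DIMMS_PER_NODE = 16
DIMM_GB = 4
DDR3_MT_PER_S = 2000         # assumed transfer rate (MT/s)
BYTES_PER_TRANSFER = 8       # assumed 64-bit channel width

NODES_PER_SW1 = 16
NUM_SW1 = 32
NUM_SW2 = 16
NODES_PER_RACK = 64
RACK_KW = 30


def node_mem_bw_gbs() -> float:
    """Aggregate DDR3 bandwidth of one node in GB/s (decimal units)."""
    per_channel_gbs = DDR3_MT_PER_S * BYTES_PER_TRANSFER / 1000  # 16 GB/s
    return DIMMS_PER_NODE * per_channel_gbs


def sw1_of(node: int) -> int:
    """Index of the first-stage switch serving `node` (hypothetical numbering)."""
    if not 0 <= node < NODES_PER_SW1 * NUM_SW1:
        raise ValueError(f"node {node} is outside 0..{NODES_PER_SW1 * NUM_SW1 - 1}")
    return node // NODES_PER_SW1


def switches_traversed(src: int, dst: int) -> int:
    """Switches crossed on a shortest route through the two-stage fat-tree."""
    if src == dst:
        return 0
    if sw1_of(src) == sw1_of(dst):
        return 1  # node -> SW1 -> node
    return 3      # node -> SW1 -> SW2 -> SW1 -> node


def shortest_paths(src: int, dst: int) -> int:
    """Distinct shortest routes: one per SW2 when the nodes sit under different SW1s."""
    if sw1_of(src) == sw1_of(dst):
        return 1
    return NUM_SW2


def system_summary(nodes: int) -> dict:
    """Totals for a system built from whole 64-node racks."""
    racks = nodes // NODES_PER_RACK
    return {
        "cores": nodes * CORES_PER_NODE,
        "peak_tf": nodes * NODE_PEAK_GF / 1000,
        "memory_gb": nodes * DIMMS_PER_NODE * DIMM_GB,
        "racks": racks,
        "power_mw": racks * RACK_KW / 1000,
    }


if __name__ == "__main__":
    print("node memory bandwidth:", node_mem_bw_gbs(), "GB/s")
    print("nodes reachable:", NODES_PER_SW1 * NUM_SW1)
    print("512-node system:", system_summary(512))
    print("switches 0->15:", switches_traversed(0, 15), "| 0->511:", switches_traversed(0, 511))
    print("shortest paths 0->511:", shortest_paths(0, 511))
    # SX-9 vs SX-ACE at 131 TF, figures taken from the Downsizing slide.
    print(f"floor space ratio {288 / 56:.1f}x, power ratio {2.4 / 0.24:.0f}x")
```

Eight 30 kW racks give the 0.24 MW quoted for the 131 TF SX-ACE configuration, and 288 m² / 56 m² ≈ 5.1 matches the stated 1/5 floor-space reduction.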