Exar Hadoop Acceleration

Exar
Optimizing Hadoop – Is Bigger Better??
Exar Corporation
48720 Kato Road
Fremont, CA
510-668-7000
March 2013
www.exar.com
• Section I: Exar Introduction
  – Exar Corporate Overview
• Section II: Big Data Pain-Points
  – Debunking the Top 5 Hadoop Myths
  – 3 Main System Constraints
• Section III: Hadoop Optimization Solution
  – Exar Hadoop Acceleration Solutions
• Section IV: Benchmarking Results
  – OEM 1 Results
  – OEM 2 Results
  – OEM 3 Results
• Section V: Summary
Exar At-A-Glance
Global Leader in Data Management Solutions and Mixed Signal Components
• Well-Established Fabless IC Company
  – 42 years of history in Silicon Valley
  – ~300 employees worldwide
  – Healthy balance sheet: $229M in assets
• Broad-Based Component and Solution Supplier
  – Specialty SoCs, FPGA/ASIC boards and software
    • DCS (Data Compression & Security)
  – Analog mixed-signal components
    • Interface
    • Power
Is Bigger Always Better??
It is not about the size of the Big Data deployment.
Return on investment is defined by optimal utilization of resources.
Debunking the Top 5 Hadoop Myths
1. More CPUs or More Storage Does Not Mean Better Analytics
Increasing the number of jobs per node, or improving job processing time, requires more powerful nodes...
No!!!
Maximizing rack density and using resources (CPU, storage and memory) effectively is the solution
Debunking the Top 5 Hadoop Myths
2. Operational Expenditure Is a Significant Component of 3-5 Year Total Cost of Ownership (TCO)
Capital expenditure is the primary contributor to the 3- or 5-year TCO
No!!!
Operational expenditure is a significant contributor to the TCO
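To see why operating costs weigh more as the horizon grows, here is a minimal Python sketch of an undiscounted TCO model, TCO = capex + annual opex × years. The dollar figures are hypothetical placeholders, not data from this deck, and the model ignores discounting, hardware refresh and cluster growth.

```python
def tco(capex: float, annual_opex: float, years: int) -> float:
    """Undiscounted total cost of ownership: up-front capex plus yearly opex."""
    return capex + annual_opex * years


def opex_share(capex: float, annual_opex: float, years: int) -> float:
    """Fraction of the TCO that comes from operational expenditure."""
    return annual_opex * years / tco(capex, annual_opex, years)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; they are not Exar or OEM data.
    capex = 1_000_000.0       # servers, disks, switches, racks
    annual_opex = 250_000.0   # power, cooling, floor space, administration
    for years in (3, 5):
        total = tco(capex, annual_opex, years)
        share = opex_share(capex, annual_opex, years)
        print(f"{years}-year TCO ${total:,.0f}, opex share {share:.0%}")
```

With these made-up inputs, opex is about 43% of a 3-year TCO and about 56% of a 5-year TCO: the longer the horizon, the larger the opex share.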
Debunking the Top 5 Hadoop Myths
3. Storage Scaling Is Significantly Constrained by Size and Space
Storage can scale easily
No!!!
Size, space and connectivity constrain scaling capacity
Debunking the Top 5 Hadoop Myths
4. Data Node Costs Are Driven by Storage Rather than CPUs
Compute defines the data node cost
No!!!
Storage defines the node cost, and the storage-to-CPU cost ratio is often as high as 10:1
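The 10:1 storage-to-CPU ratio can be turned into a rough cost model. The sketch below is a simplification that assumes CPU cost stays fixed and ignores the cost of any accelerator card; the 2x storage reduction in the example is hypothetical.

```python
def storage_share(storage_to_cpu: float) -> float:
    """Fraction of data-node cost spent on storage, given the storage:CPU cost ratio."""
    return storage_to_cpu / (storage_to_cpu + 1.0)


def relative_node_cost(storage_to_cpu: float, storage_reduction: float) -> float:
    """Node cost relative to baseline when storage spend is divided by storage_reduction.

    Simplification: CPU cost is unchanged and accelerator hardware cost is ignored.
    """
    cpu, storage = 1.0, storage_to_cpu
    return (cpu + storage / storage_reduction) / (cpu + storage)


if __name__ == "__main__":
    ratio = 10.0  # the 10:1 storage-to-CPU ratio quoted above
    print(f"storage share of node cost: {storage_share(ratio):.0%}")
    # Hypothetical 2x reduction in storage spend (e.g. half as many disks).
    print(f"node cost after 2x storage reduction: {relative_node_cost(ratio, 2.0):.0%} of baseline")
```

At 10:1, storage is about 91% of node cost, so halving storage spend brings the node to roughly 55% of its original cost under these simplifications, while halving CPU spend would save less than 5%.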
Debunking the Top 5 Hadoop Myths
5. For Larger Hadoop Clusters, Network (Shuffle) Traffic Reduction Is Key
Network traffic reduction is not relevant to Hadoop TCO
No!!!
10G WAN links are expensive. It is preferable to optimize traffic on 1G WAN links and avoid or minimize 10G links
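A back-of-the-envelope model shows why reducing traffic can partly substitute for faster links. This minimal sketch assumes transfer time is simply bytes on the wire divided by line rate; the 1 TB volume and 3x compression ratio are hypothetical.

```python
def transfer_seconds(data_bytes: float, link_gbps: float, compression_ratio: float = 1.0) -> float:
    """Seconds to move data_bytes over a link after compressing it by compression_ratio.

    Ignores protocol overhead, link sharing, and compression/decompression time.
    """
    wire_bytes = data_bytes / compression_ratio
    link_bytes_per_s = link_gbps * 1e9 / 8  # Gbit/s -> bytes/s
    return wire_bytes / link_bytes_per_s


if __name__ == "__main__":
    shuffle_bytes = 1e12  # hypothetical 1 TB of shuffle/replication traffic
    for label, gbps, ratio in [("1G, uncompressed", 1, 1.0),
                               ("1G, 3x compressed", 1, 3.0),
                               ("10G, uncompressed", 10, 1.0)]:
        print(f"{label}: {transfer_seconds(shuffle_bytes, gbps, ratio):,.0f} s")
```

Under these assumptions a 3x ratio cuts the 1G transfer from about 8,000 s to about 2,667 s. That is not the 800 s of a 10G link, but it removes two thirds of the traffic that would otherwise push a site toward the costlier link.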
Summary of Hadoop Cluster Constraints
Hadoop Clusters Can Be Optimized for Storage, Network Bandwidth & Compute Resources
• Storage Capacity: Server OEMs are struggling to provide enough capacity to keep up with ever-growing data needs.
  E.g., a leading server OEM's latest configuration supports 30 disks/server!!!
• Disk IOPS Bottleneck: The biggest bottleneck for data analytics is the disk IOPS limitation.
  E.g., even the most optimally configured Hadoop system struggles to exceed 80% CPU utilization, because disk I/O bandwidth cannot keep up, especially at high CPU-core-to-HDD ratios.
• Network Bandwidth: Data is often replicated 3 times, and large clusters are distributed globally. Minimizing bandwidth (across the WAN) and switch/hardware cost (across the LAN) is key.
  E.g., a leading eCommerce company has 6 clusters distributed globally, each with 2,000-3,000 data nodes.
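The replication point translates directly into raw capacity. HDFS keeps 3 replicas of each block by default (`dfs.replication`), so raw disk is about three times the logical data. The sketch below estimates data-node count under that rule. The 3 TB drive size and the 75% usable-space margin are hypothetical planning assumptions, and the 30-disk server mirrors the OEM example above.

```python
import math


def raw_capacity_tb(logical_tb: float, replication: int = 3) -> float:
    """Raw disk capacity consumed when HDFS stores `replication` copies of every block."""
    return logical_tb * replication


def data_nodes_needed(logical_tb: float, disks_per_node: int, disk_tb: float,
                      replication: int = 3, usable_fraction: float = 0.75) -> int:
    """Data nodes needed to store logical_tb of data.

    usable_fraction is a hypothetical planning margin that reserves disk space
    for temporary and shuffle data; it is not a Hadoop configuration setting.
    """
    usable_per_node_tb = disks_per_node * disk_tb * usable_fraction
    return math.ceil(raw_capacity_tb(logical_tb, replication) / usable_per_node_tb)


if __name__ == "__main__":
    # Hypothetical: 1 PB of logical data on 30-disk servers with 3 TB drives.
    print(data_nodes_needed(1000, disks_per_node=30, disk_tb=3.0))
```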
Exar Hadoop Optimization Solutions
By optimizing CPU, storage, memory & network bandwidth, TCO can be reduced by up to 40%
Can Hadoop cluster TCO be reduced without impacting job execution time??
Exar Hadoop Acceleration Solutions can lower cluster TCO by 20-40%!!
Exar Hadoop Acceleration Solution Overview
The Exar solution optimizes all the Hadoop cluster constraints mentioned earlier
Exar Hadoop Acceleration Solution Highlights:
• Storage Optimization – The Exar solution uses advanced data compression to compress input and output data, which drastically reduces the storage requirement in each data node
• CPU Optimization – Data compression/decompression is offloaded from the CPU, which frees additional CPU cycles for enhanced data analytics
• Memory Management – The Exar solution uses advanced memory management, which optimizes system memory usage
• Network Bandwidth Optimization – The Exar solution compresses intermediate (shuffle) traffic, which optimizes network bandwidth
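The idea behind compressing input, output and shuffle data can be illustrated in software. This sketch uses Python's standard `zlib` on the CPU purely as a stand-in: the deck does not say which algorithm the Exar card implements, and running compression on the CPU is exactly the cost the offload card is meant to remove. The record format is made up.

```python
import zlib


def compress_intermediate(records: list[bytes], level: int = 1) -> bytes:
    """Compress a batch of map-output records before it is spilled or shuffled."""
    return zlib.compress(b"".join(records), level)


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(original) / len(compressed)


if __name__ == "__main__":
    # Hypothetical map output: repetitive tab-separated key/value lines.
    records = [f"user{i % 100}\tpage{i % 7}\t1\n".encode() for i in range(100_000)]
    raw = b"".join(records)
    packed = compress_intermediate(records)
    assert zlib.decompress(packed) == raw  # compression is lossless
    print(f"{len(raw):,} -> {len(packed):,} bytes ({compression_ratio(raw, packed):.1f}x)")
```

In stock Hadoop, map-output compression is switched on with `mapred.compress.map.output` (MR1) or `mapreduce.map.output.compress` (MR2), with the codec chosen by a companion property; the ratio achieved depends entirely on how repetitive the data is.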
Exar Hadoop Acceleration Solution Overview
Exar offers a certified plug-and-play Hadoop acceleration solution
Plug-and-Play Solution:
• No Code Change – Filter-layer software sits below HDFS. No APIs required. Software installs in minutes!
• Standard HW – Offload card supports PCIe Gen 1 and Gen 2
• Linux OS Compatible – Solution supports Linux 6.X and works across RHEL, Ubuntu and SUSE
Certified by Cloudera:
• Solution certified on both CDH3 and CDH4
OEM Tested:
• Solutions evaluated and benchmarked on leading OEM hardware, including IBM, HP, Dell, SuperMicro, etc.
Big Data (Hadoop) Optimization Solution
Exar Solutions Reduce Storage Requirements & Optimize System Resource Utilization
A Hadoop cluster accelerated with AltraSTAR consists of:
• CeDeFS filter-layer software
• Hadoop Map/Reduce
• Hadoop FS
• Exar hardware accelerator
CeDeFS is transparent filter-layer software that sits below HDFS. No code changes are required, and the workflow remains the same.
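A "transparent filter layer" means callers keep the same read/write interface while data is transformed underneath. The toy sketch below models that idea with an in-memory block store. `BlockStore` and `CompressingFilter` are hypothetical names, not CeDeFS internals, and `zlib` stands in for whatever compression the Exar hardware performs.

```python
import zlib


class BlockStore:
    """Hypothetical stand-in for the storage volume beneath the file system."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        self.blocks[name] = data

    def read(self, name: str) -> bytes:
        return self.blocks[name]


class CompressingFilter:
    """Hypothetical transparent filter with the same read/write interface as BlockStore.

    Data is compressed on the way down and decompressed on the way up, so the
    layer above never sees compressed bytes.
    """

    def __init__(self, lower: BlockStore, level: int = 6) -> None:
        self.lower = lower
        self.level = level

    def write(self, name: str, data: bytes) -> None:
        self.lower.write(name, zlib.compress(data, self.level))

    def read(self, name: str) -> bytes:
        return zlib.decompress(self.lower.read(name))

    def stored_bytes(self) -> int:
        """Physical bytes actually held by the lower store."""
        return sum(len(b) for b in self.lower.blocks.values())


if __name__ == "__main__":
    store = CompressingFilter(BlockStore())
    payload = b"2013-03-01 GET /index.html 200\n" * 10_000
    store.write("blk_0001", payload)
    assert store.read("blk_0001") == payload  # caller gets back the original bytes
    print(len(payload), "logical bytes held in", store.stored_bytes(), "physical bytes")
```

Because `CompressingFilter` exposes the same methods as `BlockStore`, code written against one works unchanged against the other, which is the property the "no code change" claim relies on.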
The Exar accelerator is an FPGA-based PCIe hardware accelerator
• 3x-6x increase in storage capacity in each node
• Enhanced CPU utilization and reduced runtime through I/O reduction and optimization
• Significant benefits for I/O-bound tasks
• Increased data density, which reduces shuffle traffic
• Reduced power, per node and per cluster
[Figure: Linux system stack showing CeDeFS + CeDeFN, the Exar driver, the storage volume and the Exar offload card]
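Two of the benefits above, added capacity and reduced runtime for I/O-bound tasks, follow from a simple model: compressed data occupies less disk, and each byte read from disk expands into several logical bytes. The sketch assumes the data compresses uniformly. The disk and decompression throughputs are hypothetical, and the model ignores replication and the CPU time spent in the map and reduce functions themselves.

```python
def effective_capacity_tb(raw_tb: float, compression_ratio: float) -> float:
    """Logical data a node can hold when everything is stored compressed."""
    return raw_tb * compression_ratio


def effective_read_mb_s(disk_mb_s: float, compression_ratio: float, decompress_mb_s: float) -> float:
    """Logical MB/s delivered to tasks that read compressed data.

    Each physical MB read from disk expands to compression_ratio logical MB, but
    delivery is capped by how fast the data can be decompressed.
    """
    return min(disk_mb_s * compression_ratio, decompress_mb_s)


if __name__ == "__main__":
    # Hypothetical node: 12 disks at ~100 MB/s each, data that compresses 3x.
    disks_mb_s = 12 * 100.0
    print(effective_capacity_tb(36.0, 3.0))              # 36 TB raw holds 108 TB logical
    print(effective_read_mb_s(disks_mb_s, 3.0, 1500.0))  # capped by a hypothetical CPU decompressor
    print(effective_read_mb_s(disks_mb_s, 3.0, 4000.0))  # capped by the disks instead
```

The last two calls show why offload matters in this model: if decompression runs slower than the disks can feed it, decompression becomes the cap, and a faster decompression engine moves the cap back to the disks.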
Test Procedure
Validate Exar Acceleration Solutions on Typical Hadoop Clusters
1. Configure the system to default Hadoop settings
2. Establish a benchmark for the native configuration (with LZO)
3. Rerun the tests with the Exar acceleration solution:
   – Disk reduction
   – Network link optimization
   – Large-file optimization
4. Quantify results; calculate ROI
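When quantifying results, one common way to express a performance gain is speedup minus one, i.e. native time divided by accelerated time, minus 1. This is an assumption: the deck does not define how its gain figures were calculated. The hypothetical helper `to_seconds` parses durations in the `Xm Ys` form used in the results.

```python
import re


def to_seconds(duration: str) -> int:
    """Parse a duration written like '21m 34s' or '8m 9s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+)s\s*", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(g) for g in match.groups())
    return minutes * 60 + seconds


def performance_gain(native: str, accelerated: str) -> float:
    """Assumed definition: speedup minus one (0.5 means 1.5x as fast)."""
    return to_seconds(native) / to_seconds(accelerated) - 1.0


if __name__ == "__main__":
    print(f"{performance_gain('21m 34s', '12m 07s'):.1%}")
```

For the 21m 34s and 12m 07s times reported in the OEM 3 results, this gives about 78.0%, close to the 77% on the slide; the difference may come from a different definition or rounding.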
Exar Hadoop Acceleration – OEM 1 Results
Exar's GX1745-Based Acceleration Test Results
[Figure: Cluster configuration and job execution & resource requirements for a 300 TB Exar Hadoop Accelerated Solution]
End users could reduce their capital expenditure by up to 40%!!!
Exar Hadoop Acceleration – OEM 2 Results
OEM sorted 1 TB in an industry-leading time; Exar reduced the cost by 30%
• Native configuration: Servers = 10, Expansion Units = 10
• Exar Solution: Servers = 10, Expansion Units = 5
Exar Hadoop Acceleration – OEM 3 Results
The solution gave the flexibility to increase storage/CPU density per rack
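[Table: Terasort test on AppSystem cluster, cluster configuration and job execution & resource requirements, comparing Native LZO with AltraSTAR + LZO (job time, performance gain, capacity gain)]
12 disks: 1. Single job (512GB) 2. Single job (1TB) 3. Multiple jobs (Job 1, Job 2)
Reported times: 14m 15s, 8m 9s, 16m 0s, 33m 32s, 19m 3s, 33m 36s; reported performance gains: 70%, 101%, 76%
6 disks: Single job (512GB): Native LZO 21m 34s, AltraSTAR + LZO 12m 07s, performance gain 77%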
Reduce cost and improve performance through three options:
• Improve performance
• Remove disks or use lower-capacity disks
• Increase capacity
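The three options above trade the same compression gain in different directions. This sketch sizes the disks under a hypothetical 2x compression ratio and a hypothetical 36 TB per-node footprint; it deliberately ignores that fewer spindles also mean less aggregate disk bandwidth.

```python
import math


def disks_required(data_tb: float, disk_tb: float, compression_ratio: float = 1.0) -> int:
    """Disks needed to hold data_tb once it is stored compressed."""
    return math.ceil(data_tb / compression_ratio / disk_tb)


def disk_size_required_tb(data_tb: float, disks: int, compression_ratio: float = 1.0) -> float:
    """Per-disk capacity needed to hold data_tb on a fixed number of disks."""
    return data_tb / compression_ratio / disks


if __name__ == "__main__":
    data_tb = 36.0  # hypothetical per-node data footprint
    print(disks_required(data_tb, 3.0))             # uncompressed, 3 TB disks -> 12 disks
    print(disks_required(data_tb, 3.0, 2.0))        # 2x compression -> 6 disks
    print(disk_size_required_tb(data_tb, 12, 2.0))  # or keep 12 disks of 1.5 TB each
```

Keeping all 12 disks at full size instead corresponds to the "increase capacity" option: the node then holds roughly twice the logical data.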
Exar Hadoop Acceleration – OEM 3 Results
The Exar solution improved analytics performance by up to 70%, or reduced storage cost by up to 50%
[Figure: Performance-Maximized Configuration vs. Cost-Minimized Configuration]
Exar Hadoop Accelerated Solutions Outperformed CPU-Based Solutions
Implied or calculated results shed light on 4 of the 5 Hadoop implementation myths
[Table: Acceleration benchmarks, efficiency parameters with no acceleration vs. with Exar (AltraSTAR) acceleration, and the resulting acceleration gain]
• Storage Density – effective storage per 40U rack
• Cap-Ex Efficiency – $$ capital investment per 1 GB sort
• Op-Ex Efficiency – kWh consumed per 1 GB sort
• System Resource Optimization – ratio of CPU cores to hard disks
Reported values: 1:2, 1:1, 261, 430, N/A (four entries), 100%, 61%, 27%, 20%
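The table defines Op-Ex efficiency as kWh consumed per 1 GB sort. One way to compute that metric from a measured run is average cluster power times runtime, divided by the data sorted; the deck does not say how it measured power, and the power and runtime figures below are hypothetical.

```python
def kwh_per_gb_sorted(avg_cluster_kw: float, runtime_s: float, gb_sorted: float) -> float:
    """Energy per GB sorted: average cluster power (kW) x runtime (h) / data sorted (GB)."""
    return avg_cluster_kw * (runtime_s / 3600.0) / gb_sorted


def reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a lower-is-better metric such as kWh per GB."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical runs: the same 1,000 GB sort with different power draw and runtime.
    native = kwh_per_gb_sorted(avg_cluster_kw=10.0, runtime_s=1800.0, gb_sorted=1000.0)
    accelerated = kwh_per_gb_sorted(avg_cluster_kw=9.0, runtime_s=1200.0, gb_sorted=1000.0)
    print(f"{native:.4f} vs {accelerated:.4f} kWh/GB, {reduction(native, accelerated):.0%} lower")
```

A Cap-Ex efficiency metric can be built the same way by dividing capital investment by the sort result, but the deck does not specify its exact normalization, so this sketch stays with energy.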
Exar Hadoop Acceleration Solution
Exar Acceleration Solution optimizes all of the Hadoop constraints
• Significant ROI:
  – Highest rack density
  – Lowest $$/GB sort
  – Most power efficient
  – Optimized network bandwidth
• Flexibility: Offers flexibility to cater to both disk-I/O-bound and CPU-bound workloads
• Certified: Certified on all Cloudera releases, and tested on most of the major OEM hardware
Conclusion
• Hardware accelerated compression provides
meaningful acceleration as well as added capacity
• Acceleration plus added capacity means bigger jobs
executed in less time
• Very significant savings in both CAPEX and OPEX
Ramana Jampala
Vice-President – Business Development
(732) 440-1280 x238
www.exar.com