Project Boltzmann HIP Datasheet

It’s HIP to be Open
Convert your CUDA Code to C++ Using AMD’s New HIP Tool
Many of today’s GPU-accelerated HPC applications are written in the proprietary CUDA language and
depend on its infrastructure. Proprietary software is almost always tightly controlled, its source
code is usually kept secret, and with CUDA the hardware options are limited to a single vendor.
AMD, a strong proponent of open source and open standards, has created a new tool that allows
developers to convert CUDA code to common C++. The resulting C++ code can be compiled with
either the CUDA NVCC compiler or the AMD HCC compiler. This new Heterogeneous-compute Interface for
Portability, or HIP, gives customers more choice in hardware and development tools.
How does HIP’s portability compare to OpenCL™?
Both AMD and our competitors support OpenCL™ 1.2 on their devices, and this can be used to
write portable code.
HIP offers several benefits over OpenCL:
- Developers can code in C++ and mix host and device C++ code in their source files. HIP C++ code can use templates, lambdas, classes, etc.
- The HIP API is less verbose than OpenCL, and C++ is familiar to CUDA developers.
- Because both CUDA and HIP are C++ languages, porting from CUDA to HIP is significantly easier than porting from CUDA to OpenCL.
- HIP uses the state-of-the-art development tools on each platform: on competitor GPUs, HIP code is compiled with NVCC and can use the Nsight profiler and debugger.
- HIP provides pointers and host-side pointer arithmetic.
- HIP provides device-level control over memory allocation and placement.
- HIP offers an offline compilation model.
Implementation
- Strong subset of CUDA RT functionality, focused on the most commonly used functions such as memory management, events, and streams
- Basic HCC support to make porting easier
- Can utilize #ifdef for complicated cases and/or performance tuning

Supported APIs
HIP provides:
- Devices (hipSetDevice(), hipGetDeviceProperties(), …)
- Memory management (hipMalloc(), hipMemcpy(), hipFree(), …)
- Streams (hipStreamCreate(), hipStreamWaitEvent(), hipStreamSynchronize(), hipStreamDestroy(), …)
- Events (hipEventRecord(), hipEventElapsedTime(), …)
- Kernel launching (hipLaunchKernel is a standard C/C++ function that replaces <<< >>>)
- CUDA-style kernel indexing (hipBlockIdx, hipThreadIdx, …)
- Device-side math builtins (200+ functions covering the entire CUDA math library)
- Error reporting (hipGetLastError(), hipGetErrorString(), …)

Code Conversion Workflow Diagram
[Figure: workflow diagram. Box labels: Application; Accelerated App Region; HIP Code; hip_runtime.hpp; HCC RT + Compiler; CUDA RT + NVCC; AMD HCC C++ Compiler; NVIDIA NVCC Compiler; ISA; AMD GPU; NVIDIA GPU. Annotations: "Header maps hip* calls to CUDA RT or HSA RT"; "New HIP layer provides portability".]
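The diagram's note that a header maps hip* calls to CUDA RT or HSA RT, and the remark that #ifdef can be used for complicated cases, can be illustrated with a small host-only sketch. This is not the HIP header: the namespaces backend_a and backend_b, the macro PORT_USE_BACKEND_B, and the port* functions are hypothetical names invented for this illustration, and both "runtimes" simply use host memory so the code builds with any C++11 compiler.

```cpp
// Sketch only: backend_a, backend_b, PORT_USE_BACKEND_B and port* are
// hypothetical names. They imitate a thin header that forwards one portable
// API to one of two vendor runtimes, chosen at compile time.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for the first vendor runtime (plain host memory here).
namespace backend_a {
inline int allocate(void** p, std::size_t n) { *p = std::malloc(n); return *p ? 0 : 1; }
inline int copy(void* dst, const void* src, std::size_t n) { std::memcpy(dst, src, n); return 0; }
inline int release(void* p) { std::free(p); return 0; }
inline const char* name() { return "backend_a"; }
}  // namespace backend_a

// Stand-in for the second vendor runtime (also plain host memory here).
namespace backend_b {
inline int allocate(void** p, std::size_t n) { *p = std::malloc(n); return *p ? 0 : 1; }
inline int copy(void* dst, const void* src, std::size_t n) { std::memcpy(dst, src, n); return 0; }
inline int release(void* p) { std::free(p); return 0; }
inline const char* name() { return "backend_b"; }
}  // namespace backend_b

// The "portability header": the preprocessor picks the runtime once.
#if defined(PORT_USE_BACKEND_B)
namespace rt = backend_b;
#else
namespace rt = backend_a;
#endif

// Portable entry points. Application code calls only these.
inline int portMalloc(void** p, std::size_t n) { return rt::allocate(p, n); }
inline int portMemcpy(void* dst, const void* src, std::size_t n) { return rt::copy(dst, src, n); }
inline int portFree(void* p) { return rt::release(p); }

// Application-side helper: sends one float through the portable API and back.
inline float roundTrip(float value) {
  void* raw = nullptr;
  float out = 0.0f;
  if (portMalloc(&raw, sizeof(float)) != 0) return out;  // allocation failed
  float* buf = static_cast<float*>(raw);
  portMemcpy(buf, &value, sizeof(float));  // "host to device"
  portMemcpy(&out, buf, sizeof(float));    // "device to host"
  portFree(buf);
  return out;
}
```

Building the same file with -DPORT_USE_BACKEND_B switches every port* call to the second backend without touching roundTrip(), which is the property the workflow diagram attributes to the HIP layer.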
Example Code Conversion
CUDA Code
// Host side: allocate device memory, copy input, launch, wait, copy back, free.
cudaMalloc((void **) &m_cuda, Size * Size * sizeof(float));
cudaMemcpy(m_cuda, m, Size * Size * sizeof(float), cudaMemcpyHostToDevice);
gpu_kernel<<<dimGridXY, dimBlockXY>>>(m_cuda, a_cuda, b_cuda, Size, Size - t, t);
cudaThreadSynchronize();  // deprecated in later CUDA releases in favor of cudaDeviceSynchronize()
cudaMemcpy(m, m_cuda, Size * Size * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(m_cuda);

// Device side: kernel definition.
__global__ void gpu_kernel(float *m_cuda, float *a_cuda, float *b_cuda, int Size,
int j1, int t)
{
…
}
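HIP Code (after HIP-ify)
// Host side: the same sequence, with each cuda* call replaced by its hip* counterpart.
hipMalloc((void **) &m_cuda, Size * Size * sizeof(float));
hipMemcpy(m_cuda, m, Size * Size * sizeof(float), hipMemcpyHostToDevice);
// The <<< >>> launch becomes an ordinary function call: kernel, grid, block, two further
// launch arguments (0, 0 here), then the kernel arguments.
hipLaunchKernel(gpu_kernel, dim3(dimGridXY), dim3(dimBlockXY), 0, 0,
m_cuda, a_cuda, b_cuda, Size, Size - t, t);
hipDeviceSynchronize();
hipMemcpy(m, m_cuda, Size * Size * sizeof(float), hipMemcpyDeviceToHost);
hipFree(m_cuda);

// Device side: kernel definition wrapped in the HIP_FUNCTION / HIP_FUNCTION_END macros.
__global__ void HIP_FUNCTION(gpu_kernel, float *m_cuda, float *a_cuda, float *b_cuda,
int Size, int j1, int t)
{
…
}
HIP_FUNCTION_END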
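To make concrete what it means for a kernel launch to be a standard C/C++ function rather than the <<< >>> syntax, here is a CPU-only sketch. It is not HIP's implementation and does not use the HIP API: ThreadCtx, launchKernel, scaleKernel, and scaleOnHost are hypothetical names, the launch is one-dimensional, and the "threads" run sequentially on the host. It only shows how a launch function can take the kernel, grid size, block size, and kernel arguments as ordinary parameters, and how the kernel derives a global index from CUDA-style block and thread coordinates.

```cpp
// CPU-only sketch: ThreadCtx, launchKernel, scaleKernel and scaleOnHost are
// hypothetical names, not part of HIP or CUDA.
#include <cassert>
#include <cstddef>
#include <vector>

// Per-thread coordinates, standing in for the built-in block and thread indices.
struct ThreadCtx {
  unsigned blockIdx;
  unsigned blockDim;
  unsigned threadIdx;
};

// An ordinary function template that plays the role of a kernel launch:
// it invokes `kernel` once for every (block, thread) pair, sequentially.
template <typename Kernel, typename... Args>
void launchKernel(Kernel kernel, unsigned gridDim, unsigned blockDim, Args... args) {
  for (unsigned b = 0; b < gridDim; ++b) {
    for (unsigned t = 0; t < blockDim; ++t) {
      kernel(ThreadCtx{b, blockDim, t}, args...);
    }
  }
}

// Kernel body using the usual global-index computation and bounds check.
inline void scaleKernel(ThreadCtx ctx, float* data, float factor, std::size_t n) {
  std::size_t i = static_cast<std::size_t>(ctx.blockIdx) * ctx.blockDim + ctx.threadIdx;
  if (i < n) data[i] *= factor;
}

// Host wrapper: picks a grid large enough to cover the data, then launches.
inline std::vector<float> scaleOnHost(std::vector<float> v, float factor) {
  const unsigned block = 4;
  const unsigned grid = static_cast<unsigned>((v.size() + block - 1) / block);
  launchKernel(scaleKernel, grid, block, v.data(), factor, v.size());
  return v;
}
```

Because the launch is a plain function call, it needs no compiler extension on the host side, which is one reason such a form can pass through more than one compiler.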
Learn more about AMD and heterogeneous computing at http://developer.amd.com/heterogeneous-computing/
OpenCL is a trademark of Apple Inc. used by permission by Khronos.