Programming models for next generation
of GPGPU architectures
Benedict R. Gaster
February, 2011
Motivation
2 | Programming models for next generation GPGPU | February, 2011 | Public
OPENCL™ PROGRAM STRUCTURE
Host C/C++ Code: runs on the CPU (platform and runtime APIs)
OpenCL™ C Device Code: runs on the DEVICE (OpenCL C)
HELLO WORLD OPENCL™ C SOURCE
__constant char hw[] = "Hello World\n";

__kernel void hello(__global char * out) {
    size_t tid = get_global_id(0);
    out[tid] = hw[tid];
}
• This is a separate source file (or string)
• Cannot directly access host data
• Compiled at runtime
HELLO WORLD - HOST PROGRAM
// create the OpenCL context on a GPU device
context = clCreateContextFromType(NULL,
    CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

// get the list of GPU devices associated with context
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0,
    NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb,
    devices, NULL);

// create a command-queue
cmd_queue = clCreateCommandQueue(context, devices[0],
    0, NULL);

// allocate the buffer memory object
memobjs[0] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
    sizeof(cl_char)*strlen("Hello World"), NULL, NULL);

// create the program
program = clCreateProgramWithSource(context, 1,
    &program_source, NULL, NULL);

// build the program
err = clBuildProgram(program, 0, NULL, NULL, NULL,
    NULL);

// create the kernel
kernel = clCreateKernel(program, "hello", NULL);

// set the args values
err = clSetKernelArg(kernel, 0, sizeof(cl_mem),
    (void *) &memobjs[0]);

// set work-item dimensions
global_work_size[0] = strlen("Hello World");

// execute kernel
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1,
    NULL, global_work_size, NULL, 0, NULL, NULL);

// read output array
err = clEnqueueReadBuffer(cmd_queue, memobjs[0],
    CL_TRUE, 0, strlen("Hello World")*sizeof(cl_char),
    dst, 0, NULL, NULL);
HELLO WORLD - HOST PROGRAM

Define platform and queues
// create the OpenCL context on a GPU device
context = clCreateContextFromType(NULL,
    CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
// get the list of GPU devices associated with context
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0,
    NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb,
    devices, NULL);
// create a command-queue
cmd_queue = clCreateCommandQueue(context, devices[0],
    0, NULL);

Define memory objects
// allocate the buffer memory object
memobjs[0] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
    sizeof(cl_char)*strlen("Hello World"), NULL, NULL);

Create the program
program = clCreateProgramWithSource(context, 1,
    &program_source, NULL, NULL);

Build the program
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

Create and setup kernel
kernel = clCreateKernel(program, "hello", NULL);
err = clSetKernelArg(kernel, 0, sizeof(cl_mem),
    (void *) &memobjs[0]);

Execute the kernel
global_work_size[0] = strlen("Hello World");
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1,
    NULL, global_work_size, NULL, 0, NULL, NULL);

Read results on the host
err = clEnqueueReadBuffer(cmd_queue, memobjs[0], CL_TRUE,
    0, strlen("Hello World")*sizeof(cl_char),
    dst, 0, NULL, NULL);
What can we learn
LEARN FROM CURRENT GENERATION ARCHITECTURE
COMMON USE CASES
 In OpenCL™ we generally see:
– Pick a single device (often GPU, or CL_DEVICE_TYPE_DEFAULT)
– All “kernels” in a cl_program object are used in the application
 In CUDA the default for runtime mode is:
– Pick a single device (always a GPU)
– All “kernels” in scope are exported to the host application for a given
translation unit, i.e. calling a kernel is ordinary syntax and behaves
like static linkage.
A look into the future
NEXT GENERATION GPGPU PROGRAM STRUCTURE
C++0x Code: a single source runs on both the CPU (C++0x) and the DEVICE (C++0x)
HELLO WORLD C++0X SOURCE
const char hw[] = "Hello World\n";

void __attribute__((gpu)) hello(Index<1> index,
                                char * out)
{
    size_t id = index.getX();
    out[id] = hw[id];
}

int main(void)
{
    char output[100];
    parallelFor(Range<1>(length(hw)),
        [&output] (Index<1> index) {
            hello(index, output);
        });
}
HELLO WORLD C++0X SOURCE

• A single C++0x program
• Can directly access data on host and device
• Compiled offline

const char hw[] = "Hello World\n";

void __attribute__((gpu)) hello(Index<1> index,
                                char * out)
{
    size_t id = index.getX();
    out[id] = hw[id];
}

int main(void)
{
    char output[100];
    parallelFor(Range<1>(length(hw)),
        [&output] (Index<1> index) {
            hello(index, output);
        });
}
What questions still need to be answered
WHAT QUESTIONS NEED TO BE ANSWERED
 How close can the CPU and GPU really be?
– How does it affect the models of today:
 GPU implies throughput computing!
 CPU implies latency hiding in branchy code!
 What effect does it have on the kinds of applications one can run on these
Fusion systems?
 How does this all fit with managed languages?
 Is C++0x enough on its own?
 What about languages like Haskell or other high-level models?
QUESTIONS
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States
and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used with permission by Khronos. Other names used in this
presentation are for identification purposes only and may be trademarks of their respective owners.
©2011 Advanced Micro Devices, Inc. All rights reserved.