Programming models for next generation
of GPGPU architectures
Benedict R. Gaster
February, 2011
Motivation
2 | Programming models for next generation GPGPU | February, 2011 | Public
OPENCL™ PROGRAM STRUCTURE
[Diagram] Host C/C++ code, using the platform and runtime APIs, runs on the CPU; OpenCL™ C device code runs on the DEVICE.
HELLO WORLD OPENCL™ C SOURCE
__constant char hw[] = "Hello World\n";

__kernel void hello(__global char * out) {
    size_t tid = get_global_id(0);
    out[tid] = hw[tid];
}
HELLO WORLD OPENCL™ C SOURCE
__constant char hw[] = "Hello World\n";

__kernel void hello(__global char * out) {
    size_t tid = get_global_id(0);
    out[tid] = hw[tid];
}

• This is a separate source file (or string)
• Cannot directly access host data
• Compiled at runtime
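Because the device code is a separate source compiled at runtime, the host must first get that source into a string before calling clCreateProgramWithSource. A minimal sketch of that step; the helper name and file name are illustrative, not from the slides:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Load OpenCL C kernel source from a file at runtime, ready to hand
// to clCreateProgramWithSource. Returns an empty string on failure.
std::string loadKernelSource(const std::string& path) {
    std::ifstream file(path);
    if (!file) {
        return "";  // caller should treat an empty string as a load failure
    }
    std::ostringstream contents;
    contents << file.rdbuf();  // slurp the whole file into one string
    return contents.str();
}
```

Embedding the source as a string literal in the host program, as the slides do, is the other common option.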
HELLO WORLD - HOST PROGRAM
// create the OpenCL context on a GPU device
context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU,
                                  NULL, NULL, NULL);

// get the list of GPU devices associated with context
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);

// create a command-queue
cmd_queue = clCreateCommandQueue(context, devices[0], 0, NULL);

// allocate the buffer memory object
memobjs[0] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                            sizeof(cl_char) * strlen("Hello World"),
                            NULL, NULL);

// create the program
program = clCreateProgramWithSource(context, 1,
                                    &program_source, NULL, NULL);

// build the program
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

// create the kernel
kernel = clCreateKernel(program, "hello", NULL);

// set the argument values
err = clSetKernelArg(kernel, 0, sizeof(cl_mem),
                     (void *) &memobjs[0]);

// set work-item dimensions
global_work_size[0] = strlen("Hello World");

// execute the kernel
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
                             global_work_size, NULL, 0, NULL, NULL);

// read the output array
err = clEnqueueReadBuffer(cmd_queue, memobjs[0], CL_TRUE, 0,
                          strlen("Hello World") * sizeof(cl_char),
                          dst, 0, NULL, NULL);
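The slide code ignores the returned error codes for brevity; real host code should check every call. A minimal sketch of such a checker, assuming only that cl_int error codes are 0 (CL_SUCCESS) on success as the OpenCL specification defines; the helper name clCheck and the local cl_int stand-in are our own, not OpenCL API:

```cpp
#include <cstdio>
#include <cstdlib>

typedef int cl_int;               // stand-in so the sketch compiles without CL/cl.h
static const cl_int CL_SUCCESS = 0;

// Abort with a diagnostic if an OpenCL call returned an error code.
cl_int clCheck(cl_int err, const char* what) {
    if (err != CL_SUCCESS) {
        std::fprintf(stderr, "OpenCL error %d in %s\n", err, what);
        std::exit(EXIT_FAILURE);  // a real application might recover instead
    }
    return err;
}
```

Usage would look like `clCheck(clBuildProgram(program, 0, NULL, NULL, NULL, NULL), "clBuildProgram");`.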
HELLO WORLD - HOST PROGRAM
Define platform and queues:

// create the OpenCL context on a GPU device
context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU,
                                  NULL, NULL, NULL);

// get the list of GPU devices associated with context
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);

// create a command-queue
cmd_queue = clCreateCommandQueue(context, devices[0], 0, NULL);

Define memory objects:

// allocate the buffer memory object
memobjs[0] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                            sizeof(cl_char) * strlen("Hello World"),
                            NULL, NULL);

Create the program:

program = clCreateProgramWithSource(context, 1,
                                    &program_source, NULL, NULL);

Build the program:

err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

Create and set up the kernel:

kernel = clCreateKernel(program, "hello", NULL);
err = clSetKernelArg(kernel, 0, sizeof(cl_mem),
                     (void *) &memobjs[0]);

Execute the kernel:

global_work_size[0] = strlen("Hello World");
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
                             global_work_size, NULL, 0, NULL, NULL);

Read results on the host:

err = clEnqueueReadBuffer(cmd_queue, memobjs[0], CL_TRUE, 0,
                          strlen("Hello World") * sizeof(cl_char),
                          dst, 0, NULL, NULL);
What can we learn
LEARN FROM CURRENT GENERATION ARCHITECTURE
COMMON USE CASES
In OpenCL™ we generally see:
– Pick a single device (often a GPU, or CL_DEVICE_TYPE_DEFAULT)
– All “kernels” in a cl_program object are used by the application
In CUDA the default for runtime mode is:
– Pick a single device (always a GPU)
– All “kernels” in scope are exported to the host application for a specific
translation unit, i.e. calling a kernel is syntactic and behaves similarly to
static linkage.
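The contrast above is between runtime lookup by name (OpenCL's clCreateKernel(program, "hello", ...)) and link-time resolution (CUDA runtime mode). A toy sketch of the OpenCL-style side, modelling a built program as a name-to-function map; the types and names here are ours, for illustration only:

```cpp
#include <map>
#include <string>

using Kernel = void (*)(char*);  // stand-in for a compiled kernel entry point

void hello(char* out) { out[0] = 'H'; }  // a "kernel" registered under a name

// Runtime lookup by string name, as clCreateKernel does for a cl_program;
// a misspelled name fails at runtime, not at link time as it would in CUDA.
Kernel findKernel(const std::map<std::string, Kernel>& program,
                  const std::string& name) {
    auto it = program.find(name);
    return it == program.end() ? nullptr : it->second;
}
```

In the CUDA model the equivalent of `findKernel` disappears: the kernel symbol is visible in the translation unit and the call site binds to it directly.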
A look into the future
NEXT GENERATION GPGPU PROGRAM STRUCTURE
[Diagram] A single body of C++0x code runs on both the CPU (C++0x) and the DEVICE (C++0x).
HELLO WORLD C++0X SOURCE
char hw[] = "Hello World\n";

void __attribute__(gpu) hello(Index<1> index, char * out)
{
    size_t id = index.getX();
    out[id] = hw[id];
}

int main(void)
{
    char output[100];
    parallelFor(Range<1>(length(hw)),
                [output] (Index<1> index) {
                    hello(index, output);
                });
}
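The parallelFor, Range, and Index names above belong to the proposed model, not to a shipping library. A minimal CPU-only sketch of the same idiom, assuming a 1-D range and using std::thread; a real implementation would dispatch the body to the GPU:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the proposed Index<1>: a one-dimensional work-item id.
struct Index1 {
    std::size_t x;
    std::size_t getX() const { return x; }
};

// Run body(Index1{i}) for every i in [0, range), one thread per work-item.
// This is only a sketch of the semantics; it makes no attempt at efficiency.
template <typename Body>
void parallelFor(std::size_t range, Body body) {
    std::vector<std::thread> workers;
    workers.reserve(range);
    for (std::size_t i = 0; i < range; ++i)
        workers.emplace_back([i, &body] { body(Index1{i}); });
    for (auto& w : workers)
        w.join();  // all work-items complete before parallelFor returns
}
```

With this sketch the hello-world body from the slide, `out[id] = hw[id]`, can be passed as an ordinary C++0x lambda.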
HELLO WORLD C++0X SOURCE
char hw[] = "Hello World\n";

void __attribute__(gpu) hello(Index<1> index, char * out)
{
    size_t id = index.getX();
    out[id] = hw[id];
}

int main(void)
{
    char output[100];
    parallelFor(Range<1>(length(hw)),
                [output] (Index<1> index) {
                    hello(index, output);
                });
}

• A single C++0x program
• Can directly access data on host and device
• Compiled offline
What questions still need to be answered
WHAT QUESTIONS NEED TO BE ANSWERED
How close can the CPU and GPU really be?
– How does it affect the models of today?
  GPU implies throughput computing!
  CPU implies latency hiding in branchy code!
What effect does it have on the kinds of applications one can run on these
Fusion systems?
How does this all fit with managed languages?
Is C++0x enough on its own?
What about languages like Haskell or other high-level models?
QUESTIONS
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States
and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used with permission by Khronos. Other names used in this
presentation are for identification purposes only and may be trademarks of their respective owners.
©2011 Advanced Micro Devices, Inc. All rights reserved.