Making OpenCL™ Simple with Haskell

Making OpenCL™ Simple with Haskell
Benedict R. Gaster
January, 2011
Attribution and WARNING
 The ideas and work presented here are in collaboration with:
– Garrett Morris (AMD intern 2010 & PhD student Portland State)
 Garrett is the Haskell expert and knows a lot more than I do about
transforming Monads!
2 | Making OpenCL™ Simple | January, 2011 | Public
AGENDA
Motivation
Whistle stop introduction to OpenCL
Bringing OpenCL to Haskell
Lifting to something more in the spirit of Haskell
Quasiquotation
Issues we face using Haskell at AMD
Questions
3 | Making OpenCL™ Simple | January, 2011 | Public
Motivation
4 | Making OpenCL™ Simple | January, 2011 | Public
OPENCL™ PROGRAM STRUCTURE
Host C/C++ Code
OpenCL C Device Code
CPU
DEVICE
(Platform and
Runtime APIs)
(OpenCL C)
5 | Making OpenCL™ Simple | January, 2011 | Public
HELLO WORLD OPENCL™ C SOURCE
__constant char hw[] = "Hello World\n";
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
6 | Making OpenCL™ Simple | January, 2011 | Public
HELLO WORLD OPENCL™ C SOURCE
__constant char hw[] = "Hello World\n";
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
•
•
•
This is a separate source file (or string)
Cannot directly access host data
Compiled at runtime
7 | Making OpenCL™ Simple | January, 2011 | Public
HELLO WORLD - HOST PROGRAM
// create the OpenCL context on a GPU device
cl_context = clCreateContextFromType(0,
CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
// get the list of GPU devices associated with context
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0,
NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb,
devices, NULL);
// create a command-queue
cmd_queue = clCreateCommandQueue(context, devices[0],
0, NULL);
memobjs[0] = clCreateBuffer(context,CL_MEM_WRITE_ONLY,
sizeof(cl_char)*strlen(“Hello World”, NULL,
NULL);
// create the program
program = clCreateProgramWithSource(context, 1,
&program_source, NULL, NULL);
8 | Making OpenCL™ Simple | January, 2011 | Public
// build the program
err = clBuildProgram(program, 0, NULL, NULL, NULL,
NULL);
// create the kernel
kernel = clCreateKernel(program, “vec_add”, NULL);
// set the args values
err = clSetKernelArg(kernel, 0, (void *) &memobjs[0],
sizeof(cl_mem));
// set work-item dimensions
global_work_size[0] = strlen(“Hello World”);;
// execute kernel
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1,
NULL, global_work_size, NULL, 0, NULL, NULL);
// read output array
err = clEnqueueReadBuffer(cmd_queue, memobjs[0],
CL_TRUE, 0, strlen(“Hello World”) *sizeof(cl_char),
dst, 0, NULL, NULL);
HELLO WORLD - HOST PROGRAM
// create the OpenCL context on a GPU device
cl_context = clCreateContextFromType(0,
CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
Define platform and queues
// get the list of GPU devices associated with context
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0,
NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb,
devices, NULL);
// create a command-queue
Define Memory objects
cmd_queue = clCreateCommandQueue(context, devices[0],
0, NULL);
Build the program
// build the program
err = clBuildProgram(program, 0, NULL, NULL, NULL,
NULL);
// create the kernel
kernel = clCreateKernel(program, “vec_add”, NULL);
Create and setup kernel
// set the args values
err = clSetKernelArg(kernel, 0, (void *) &memobjs[0],
sizeof(cl_mem));
// set work-item dimensions
global_work_size[0] = n;
// execute kernel
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1,
NULL, global_work_size, NULL, 0, NULL, NULL);
// read output array
err = clEnqueueReadBuffer(context, memobjs[2], CL_TRUE,
0, n*sizeof(cl_float), dst, 0, NULL, NULL);
Execute the kernel
// allocate the buffer memory objects
memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |
CL_MEM_COPY_HOST_PTR, sizeof(cl_char)*strlen(“Hello
World”), srcA, NULL);}
// create the program
program = clCreateProgramWithSource(context, 1,
&program_source, NULL, NULL);
Create the program
9 | Making OpenCL™ Simple | January, 2011 | Public
Read results on the host
USING HASKELL OUR GOAL IS TO WRITE THIS
import Language.OpenCL.Module
hstr = "Hello world\n“
hlen = length hstr + 1
prog = initCL [$cl|
__constant char hw[] = $hstr;
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
|]
main :: IO ()
main = withNew prog $
using (bufferWithFlags hwlen [WriteOnly]) $ \b ->
do [k] <- theKernels
invoke k b `overRange` ([0], [hwlen], [1])
liftIO . putStr . map castCCharToChar . fst =<< readBuffer b 0 (hlen - 1)
10 | Making OpenCL™ Simple | January, 2011 | Public
USING HASKELL OUR GOAL IS TO WRITE THIS
import Language.OpenCL.Module
hstr = "Hello world\n“
hlen = length hstr + 1
Single source
prog = initCL [$cl| Quasiquoting
__constant char hw[] = $hstr;
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
|]
OpenCL C code
statically checked
at compile time.
No OpenCL or
Haskell compiler
modifications
main :: IO ()
main = withNew Monad
prog $ transformers
using (bufferWithFlags hwlen [WriteOnly]) $ \b ->
do [k] <- theKernels
invoke k b `overRange` ([0], [hwlen], [1])
liftIO . putStr . map castCCharToChar . fst =<< readBuffer b 0 (hlen - 1)
11 | Making OpenCL™ Simple | January, 2011 | Public
USING HASKELL OUR GOAL IS TO WRITE THIS
import Language.OpenCL.Module
hstr = "Hello world\n“
hlen = length hstr + 1
Single source
prog = initCL [$cl| Quasiquoting
__constant char hw[] = $hstr;
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
|]
OpenCL C code
statically checked
at compile time.
No OpenCL or
Haskell compiler
modifications
main :: IO ()
main = withNew Monad
prog $ transformers
using (bufferWithFlags hwlen [WriteOnly]) $ \b ->
do [k] <- theKernels
invoke k b `overRange` ([0], [hwlen], [1])
liftIO . putStr . map castCCharToChar . fst =<< readBuffer b 0 (hlen - 1)
12 | Making OpenCL™ Simple | January, 2011 | Public
LEARN FROM COMMON USES
 In OpenCL we generally see:
– Pick single device (often GPU or CL_DEVICE_TYPE_DEFAULT)
– All “kernels” in cl_program object are used in application
 In CUDA the default for runtime mode is:
– Pick single device (always GPU)
– All “kernels” in scope are exported to the host application for specific
translation unit, i.e. calling kernels is syntactic and behave similar to
static linkage.
13 | Making OpenCL™ Simple | January, 2011 | Public
Whistle stop introduction
to OpenCL™
14 | Making OpenCL™ Simple | January, 2011 | Public
IT’S A HETEROGENEOUS WORLD
 A modern platform Includes:
– One or more CPUs
– One or more GPUs
– And more
CPU
CPU
North Bridge
System Memory
Fusion
GPU
PCIE
Discrete GPU
15 | Making OpenCL™ Simple | January, 2011 | Public
Graphics
Memory
OPENCL™ PLATFORM MODEL
One Host + one or more Compute Devices
– Each Compute Device is composed of one or more
Compute Units
 Each Compute Unit is further divided into one or more
Processing Elements
16 | Making OpenCL™ Simple | January, 2011 | Public
OPENCL™ EXECUTION MODEL
 An OpenCL application runs on a host which
submits work to the compute devices
– Work item: the basic unit of work on an
OpenCL device
GPU
CPU
– Kernel: the code for a work item.
Basically a C function
– Program: Collection of kernels and other
functions (Analogous to a dynamic library)
– Context: The environment within which workitems executes … includes devices and their
memories and command queues
 Applications queue kernel execution instances
– Queued in-order … one queue to a device
– Executed in-order or out-of-order
17 | Making OpenCL™ Simple | January, 2011 | Public
Context
Queue
Queue
THE BIG IDEA BEHIND OPENCL™
OpenCL execution model …
– Define N-dimensional computation domain
– Execute a kernel at each point in computation domain
Traditional loops
void
trad_mul(int n,
const float *a,
const float *b,
float *c)
{
int i;
for (i=0; i<n; i++)
c[i] = a[i] * b[i];
}
18 | Making OpenCL™ Simple | January, 2011 | Public
THE BIG IDEA BEHIND OPENCL™
OpenCL execution model …
– Define N-dimensional computation domain
– Execute a kernel at each point in computation domain
Traditional loops
void
trad_mul(int n,
const float *a,
const float *b,
float *c)
{
int i;
for (i=0; i<n; i++)
c[i] = a[i] * b[i];
}
19 | Making OpenCL™ Simple | January, 2011 | Public
Data Parallel OpenCL
kernel void
dp_mul(global const float *a,
global const float *b,
global float *c)
{
int id = get_global_id(0);
c[id] = a[id] * b[id];
} // execute over “n” work-items
AN N-DIMENSION DOMAIN OF WORK-ITEMS
 Global Dimensions: 1024 x 1024 (whole problem space)
 Local Dimensions:
128 x 128
1024
1024
(work group … executes together)
Synchronization between work-items
possible only within workgroups:
barriers and memory fences
Cannot synchronize outside
of a workgroup
• Choose the dimensions that are “best” for your algorithm
20 | Making OpenCL™ Simple | January, 2011 | Public
OPENCL™ MEMORY MODEL
•Private Memory
–Per work-item
•Local Memory
–Shared within a workgroup
•Global/Constant Memory
–Visible to all workgroups
•Host Memory
–On the CPU
Private Private
Memory Memory
Private Private
Memory Memory
WorkItem
WorkItem
WorkItem
Local Memory
Workgroup
WorkItem
Local Memory
Workgroup
Global/Constant Memory
Compute Device
Host Memory
Host
Memory management is Explicit
You must move data from host -> global -> local … and back
21 | Making OpenCL™ Simple | January, 2011 | Public
OPENCL™ SUMMARY
CPU
GPU
Context
Programs
Programs
Kernels
Kernels
dp_mul
__kernel void
dp_mul(global const float *a,
global const float *b,
global float *c)
{
int id = get_global_id(0);
c[id] = a[id] * b[id];
}
dp_mul
CPU program binary
dp_mul
GPU program binary
22 | Making OpenCL™ Simple | January, 2011 | Public
arg [0]
[0]
arg
value
value
arg[0]
value
arg [1]
[1]
arg
value
value
arg[1]
value
arg [2]
[2]
arg
value
value
arg[2] value
Memory Objects
Images
Command Queues
Buffers
InIn
Order
Order
Queue
Queue
Outofof
Out
Order
Order
Queue
Queue
GPU Device
Compute
OPENCL™ SUMMARY
CPU
GPU
Context
Programs
Programs
All objects
are referenced
counted Command Queues
Kernels
Memory Objects
Kernels
(clRetainXXX(…), clReleaseXXX(…))
dp_mul
__kernel void
dp_mul(global const float *a,
global const float *b,
global float *c)
{
int id = get_global_id(0);
c[id] = a[id] * b[id];
}
dp_mul
CPU program binary
dp_mul
GPU program binary
23 | Making OpenCL™ Simple | January, 2011 | Public
arg [0]
[0]
arg
value
value
arg[0]
value
arg [1]
[1]
arg
value
value
arg[1]
value
arg [2]
[2]
arg
value
value
arg[2] value
Images
Buffers
InIn
Order
Order
Queue
Queue
Outofof
Out
Order
Order
Queue
Queue
GPU Device
Compute
SYNCHRONIZATION: QUEUES & EVENTS
Each individual queue can execute in-order or out-of-order
– For in-order queue, all commands execute in order
You must explicitly synchronize between queues
– Multiple devices each have their own queue
– Multiple queues per device
– Use events to synchronize
 Events
– Commands return events and obey waitlists
– clEnqueue*(. . . , num_events_in_waitlist, *event_waitlist, *event);
24 | Making OpenCL™ Simple | January, 2011 | Public
PROGRAMMING KERNELS: OPENCL™ C
 Derived from ISO C99
– But without some C99 features such as standard C99 headers,
function pointers, recursion, variable length arrays, and bit fields
 Language Features Added
– Work-items and workgroups
– Vector types
– Synchronization
– Address space qualifiers
 Also includes a large set of built-in functions
– Image manipulation
– Work-item manipulation,
– Math functions, etc.
25 | Making OpenCL™ Simple | January, 2011 | Public
PROGRAMMING KERNELS: WHAT IS A KERNEL
 A data-parallel function executed by each work-item
kernel void square(global float* input,
global float* output)
{
int i = get_global_id(0);
output[i] = input[i] * input[i];
}
get_global_id(0) = 7
Input 6 1 1 0 9 2 4 1 1 9 7 6 8 2 2 5
Output 36 1 1 0 81 4 16 1 1
26 | Making OpenCL™ Simple | January, 2011 | Public
81 49 36 64 4
4 25
PROGRAMMING KERNELS: DATA TYPES
 Scalar data types
– char , uchar, short, ushort, int, uint, long, ulong, float
– bool, intptr_t, ptrdiff_t, size_t, uintptr_t, void, half (storage)
 Image types
– image2d_t, image3d_t, sampler_t
 Vector data types
Double is an
optional type in
OpenCL 1.0
– Vector lengths 2, 4, 8, & 16 (char2, ushort4, int8, float16, double2, …)
– Endian safe
– Aligned at vector length
– Vector operations
27 | Making OpenCL™ Simple | January, 2011 | Public
Bringing OpenCL™ to
Haskell
28 | Making OpenCL™ Simple | January, 2011 | Public
THE BASICS - HELLO WORLD
main = do (p:_) <- getPlatforms
putStrLn . ("Platform is by: " ++) =<< getPlatformInfo p PlatformVendor
c <- createContextFromType
(pushContextProperty ContextPlatform p noProperties) (bitSet [GPU])
p <- createProgramWithSource c . (:[]) =<< readFile “hello_world.cl"
ds <- getContextInfo c ContextDevices
buildProgram p ds ""
k <- createKernel p "hello"
b :: Buffer CChar <- createBuffer c (bitSet [WriteOnly]) hwlen
setKernelArg k 0 b
q <- createCommandQueue c (head ds) (bitSet [])
enqueueNDRangeKernel q k [0] [hwlen] [1] []
putStr . map castCCharToChar =<<
enqueueBlockingReadBuffer q b 0 (hwlen - 1) []
29 | Making OpenCL™ Simple | January, 2011 | Public
MAPPING OPENCL™ STATIC VALUES
class Const t u | t -> u
where value :: t -> u
data Platform_
type Platform = Ptr Platform_
type CLPlatformInfo = Word32
data PlatformInfo t
where PlatformProfile :: PlatformInfo String
PlatformVersion
:: PlatformInfo String
PlatformName
:: PlatformInfo String
PlatformVendor
:: PlatformInfo String
PlatformExtensions :: PlatformInfo String
instance Const (PlatformInfo t) CLPlatformInfo
where value PlatformProfile
= 0x0900
value PlatformVersion
= 0x0901
value PlatformName
= 0x0902
value PlatformVendor
= 0x0903
value PlatformExtensions = 0x0904
30 | Making OpenCL™ Simple | January, 2011 | Public
MAPPING OPENCL™ API
getPlatforms :: IO [Platform]
getPlatforms = appendingLocation "getPlatforms" $
getCountedArray clGetPlatformIDs
getPlatformInfo :: Platform -> PlatformInfo u -> IO u
getPlatformInfo platform plInf =
case plInf of
PlatformProfile
-> get
PlatformVersion
-> get
PlatformName
-> get
PlatformVendor
-> get
PlatformExtensions -> get
where get = appendingLocation "getPlatformInfo" $
getString (clGetPlatformInfo platform (value plInf))
31 | Making OpenCL™ Simple | January, 2011 | Public
AND SO ON FOR THE REST OF OPENCL™ API
 Standard use of Haskell FFI to implement the calls in an out of Haskell
into the OpenCL C world:
foreign import stdcall "cl.h clGetPlatformIDs" clGetPlatformIDs
:: CLCountedArrayGetter(Platform)
foreign import stdcall "cl.h clGetPlatformInfo" clGetPlatformInfo
:: Platform -> CLPlatformInfo -> CLGetter
 Simple extensions to handle OpenGL interop, with the HOpenGL (GLUT)
packages. Allows performance, directly from Haskell, close to original C
version.
32 | Making OpenCL™ Simple | January, 2011 | Public
Lifting to something
more in the spirit of
Haskell
33 | Making OpenCL™ Simple | January, 2011 | Public
SO FAR NOTHING THAT INTERESTING
 HOpenCL Native API
– Standard native function interface within the IO monad
– Little advantage over programming in C++ or PyOpenCL
 HOpenCL Contextual API
– Monad transformers to lift HOpenCL Native API beyond the IO monad
– Quasiquotation to lift OpenCL source into Haskell as a first class
citizen.
34 | Making OpenCL™ Simple | January, 2011 | Public
MOVE TO AN IMPLICIT MODEL
CPU
GPU
Context
Programs
Programs
Kernels
Kernels
Memory Objects
Command Queues
InIn
Order
Order
Queue
Queue
Outofof
Out
Order
Order
Queue
Queue
GPU Device
Compute
35 | Making OpenCL™ Simple | January, 2011 | Public
MOVE TO AN IMPLICIT MODEL
CPU
GPU
Context
Programs
Programs
Kernels
Kernels
Memory Objects
• Context is an environment
• CommandQueue is shared for
read/write device memory and
executing kernels, easily stored
implicitly with an environment
36 | Making OpenCL™ Simple | January, 2011 | Public
Command Queues
InIn
Order
Order
Queue
Queue
Outofof
Out
Order
Order
Queue
Queue
GPU Device
Compute
CONTEXTUAL/QUEUED MONADS
 Function like reader monads for OpenCL Context and
CommandQueue objects, avoiding having to pass Contexts
and/or CommandQueues as the first parameter to many
OpenCL operations.
 Introducing new classes gets around the dependency on the
MonadReader class that would prevent asserting both
MonadReader Context m and MonadReader
CommandQueue m for the same type m.
37 | Making OpenCL™ Simple | January, 2011 | Public
EMBEDDING MONADS
 Wraps class provides a uniform way to embed one
computation into another, for example:
with :: MonadIO m => Context -> ContextM t -> m t
embeds a computation in ContextM (a contextual monad) into
any IO monad.
38 | Making OpenCL™ Simple | January, 2011 | Public
EMBEDDING MONADS
 In fact, the only way we expect to use a context/command queue/etc. is to
populate a contextual/queued computation, so we provide a function that
combines construction in the outer monad with computation in the inner
monad:
withNew :: MonadIO m => m Context -> ContextM t -> m t
 This mechanism is extensible: the quasiquotation support uses a single
data structure to contain context, queue, and device information; however,
by adding it to the 'Contextual„, 'Queued', and 'Wraps' classes we can use
the same code we would have with the more conventional initialization.
39 | Making OpenCL™ Simple | January, 2011 | Public
LIFESPAN
 Many OpenCL objects are reference counted, with clRetainXXX and
clReleaseXXX functions. We overload retain and release operators for all
reference counted CL objects.
 We can use those to build C#-like control structures to automatically
release objects: using for newly constructed objects and retaining for
parameters:
using :: (Lifespan t, MonadIO m) => m t -> (t -> m u) -> m u
retaining :: (Lifespan t, MonadIO m) => t -> (t -> m u) -> m u
 In fact, the withNew function also uses the Lifespan class to automatically
release the objects used to run the sub-computation
40 | Making OpenCL™ Simple | January, 2011 | Public
Quasiquotation
41 | Making OpenCL™ Simple | January, 2011 | Public
THE ROAD TO SINGLE SOURCE
 Embedded OpenCL C concrete syntax into Haskell with quasiquation
 Set of default qausiquoters:
– [$cl| … |]
OpenCL defult device, command queue and so on
– [$clCPU| … |] OpenCL CPU device, command queue, and so on
– [$clALL| … |] all OpenCL devices, with N command queues, and so on
– and on and on … (it would be nice to automatically compute these)
 Statically check OpenCL C source
 Antiquoting can be used to reference Haskell values
 Save OpenCL binaries for reloading at runtime (guarantees correctness)
42 | Making OpenCL™ Simple | January, 2011 | Public
HELLO WORLD REVISITED
import Language.OpenCL.Module
hstr = "Hello world\n“
hlen = length hstr + 1
prog = initCL [$cl|
__constant char hw[] = $hstr;
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
|]
main :: IO ()
main = withNew prog $
using (bufferWithFlags hwlen [WriteOnly]) $ \b ->
do [k] <- theKernels
invoke k b `overRange` ([0], [hwlen], [1])
liftIO . putStr . map castCCharToChar . fst =<< readBuffer b 0 (hlen - 1)
43 | Making OpenCL™ Simple | January, 2011 | Public
SIMPLIFIED VERSION
data CLMod = CLModule String DeviceType
deriving(Show, Typeable, Data)
parseCLProg :: Monad m => Src -> String -> m CLModule
initCL :: CatchIO m => CLModule -> m Module
initCL (CLModule src dtype) =
do (p:_) <- CL.platforms
withNew (CL.contextFromType p [dtype]) $
using (CL.programFromSource src) $ \prog ->
do c <- CL.theContext
ds <- CL.queryContext ContextDevices
CL.buildProgram prog ds ""
ks <- kernels prog
q <- queue (head ds)
return (Mod c ks q)
44 | Making OpenCL™ Simple | January, 2011 | Public
REALLY WE WOULD LIKE TO WRITE
import Language.OpenCL.Module
hstr = "Hello world\n“
hlen = length hstr + 1
prog = [$cl|
__constant char hw[] = $hstr;
__kernel void hello(__global char * out) {
size_t tid = get_global_id(0);
out[tid] = hw[tid];
}
|]
main :: IO ()
main = withNew prog $
using (bufferWithFlags hwlen [WriteOnly]) $ \b ->
do [k] <- theKernels
invoke k b `overRange` ([0], [hwlen], [1])
liftIO . putStr . map castCCharToChar . fst =<< readBuffer b 0 (hlen - 1)
45 | Making OpenCL™ Simple | January, 2011 | Public
BUT THERE IS A PROBLEM (I THINK)
 Quasiquotation requires instances for Typeable and Data
 Remember that HOpenCL has definitions of the form:
data Platform_
type Platform = Ptr Platform_
 No way to derive Typeable and Data for Platform_
 This makes sense in the case that we want to build and retain values of
type Platform at compile time, what does it mean to preserve a pointer?
46 | Making OpenCL™ Simple | January, 2011 | Public
WHAT WE REALLY WANT
 Build a delayed instance of CatchIO m that when evaluated at runtime
performs OpenCL initialization
 Quasiquoting does not seem to allow this due to the requirement of
Typeable and Data instances.
 Remember we do not want to build data values containing Platform_,
rather we want to construct a function that will perform the act at runtime.
 Question: is there a known solution to this problem?
47 | Making OpenCL™ Simple | January, 2011 | Public
WE COULD THEN GENERATE KERNEL WRAPPERS
hello :: KernelInvocation r => Buffer Char -> r
hello = invoke …
or
hello :: Wraps t m n => [Char] -> n [Char]
hello = …
48 | Making OpenCL™ Simple | January, 2011 | Public
Issues we face using
Haskell at AMD
49 | Making OpenCL™ Simple | January, 2011 | Public
USING HASKELL AT AMD
 Monads are a simple idea but they are not easy to explain and use in
practice. At least for many people.
– Originally looking at Garrett‟s Monad transformers I said it seemed
complicated. His response was “ you need to look at them sideways”!
(He was right )
 Haskell is not the Miranda that I learned as an undergraduate, it is big and
a lot of very complicated type theory stuff. Really scary for many people!
– Benjamin C. Pierce has a great talk discussing just this phenomena;
 Types Considered Harmful, May 2008. Invited talk at Mathematical
Foundations of Programming Semantics (MFPS).”
 “I have long advocated type systems…but I‟ve changed my mind”
50 | Making OpenCL™ Simple | January, 2011 | Public
USING HASKELL AT AMD
 Many Universities in the US do not teach Haskell!
 Perception of Haskell is often low:
– “You probably know every Haskell programmer”
– “Why not use Python or …”
 Finally, it is not easy to find interns with the required Haskell experience to
work on these kinds of projects.
To be fair no one has said don’t use Haskell!
51 | Making OpenCL™ Simple | January, 2011 | Public
QUESTIONS
52 | Making OpenCL™ Simple | January, 2011 | Public
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD
assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this
information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND
ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL
OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States
and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used with permission by Khronos. Other names used in this
presentation are for identification purposes only and may be trademarks of their respective owners.
©2011 Advanced Micro Devices, Inc. All rights reserved.
53 | Making OpenCL™ Simple | January, 2011 | Public