Download from Wow! eBook <www.wowebook.com>
OpenCL in Action
HOW TO ACCELERATE GRAPHICS AND COMPUTATION
MATTHEW SCARPINO
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email:
©2012 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning
Publications was aware of a trademark claim, the designations have been printed in initial caps
or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are
printed on paper that is at least 15 percent recycled and processed without the use of elemental
chlorine.
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Maria Townsley
Copyeditor: Andy Carroll
Proofreader: Maureen Spencer
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor
ISBN 9781617290176
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11
brief contents
PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
1 ■ Introducing OpenCL
2 ■ Host programming: fundamental data structures
3 ■ Host programming: data transfer and partitioning
4 ■ Kernel programming: data types and device memory
5 ■ Kernel programming: operators and functions
6 ■ Image processing
7 ■ Events, profiling, and synchronization
8 ■ Development with C++
9 ■ Development with Java and Python
10 ■ General coding principles
PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
11 ■ Reduction and sorting
12 ■ Matrices and QR decomposition
13 ■ Sparse matrices
14 ■ Signal processing and the fast Fourier transform
PART 3 ACCELERATING OPENGL WITH OPENCL
15 ■ Combining OpenCL and OpenGL
16 ■ Textures and renderbuffers
contents
preface
acknowledgments
about this book
PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
1 Introducing OpenCL
1.1 The dawn of OpenCL
1.2 Why OpenCL?
Portability
Standardized vector processing
Parallel programming
1.3 Analogy: OpenCL processing and a game of cards
1.4 A first look at an OpenCL application
1.5 The OpenCL standard and extensions
1.6 Frameworks and software development kits (SDKs)
1.7 Summary
2 Host programming: fundamental data structures
2.1 Primitive data types
2.2 Accessing platforms
Creating platform structures
Obtaining platform information
Code example: testing platform extensions
2.3 Accessing installed devices
Creating device structures
Obtaining device information
Code example: testing device extensions
2.4 Managing devices with contexts
Creating contexts
Obtaining context information
Contexts and the reference count
Code example: checking a context’s reference count
2.5 Storing device code in programs
Creating programs
Building programs
Obtaining program information
Code example: building a program from multiple source files
2.6 Packaging functions in kernels
Creating kernels
Obtaining kernel information
Code example: obtaining kernel information
2.7 Collecting kernels in a command queue
Creating command queues
Enqueuing kernel execution commands
2.8 Summary
3 Host programming: data transfer and partitioning
3.1 Setting kernel arguments
3.2 Buffer objects
Allocating buffer objects
Creating subbuffer objects
3.3 Image objects
Creating image objects
Obtaining information about image objects
3.4 Obtaining information about buffer objects
3.5 Memory object transfer commands
Read/write data transfer
Mapping memory objects
Copying data between memory objects
3.6 Data partitioning
Loops and work-items
Work sizes and offsets
A simple one-dimensional example
Work-groups and compute units
3.7 Summary
4 Kernel programming: data types and device memory
4.1 Introducing kernel coding
4.2 Scalar data types
Accessing the double data type
Byte order
4.3 Floating-point computing
The float data type
The double data type
The half data type
Checking IEEE-754 compliance
4.4 Vector data types
Preferred vector widths
Initializing vectors
Reading and modifying vector components
Endianness and memory access
4.5 The OpenCL device model
Device model analogy part 1: math students in school
Device model analogy part 2: work-items in a device
Address spaces in code
Memory alignment
4.6 Local and private kernel arguments
Local arguments
Private arguments
4.7 Summary
5 Kernel programming: operators and functions
5.1 Operators
5.2 Work-item and work-group functions
Dimensions and work-items
Work-groups
An example application
5.3 Data transfer operations
Loading and storing data of the same type
Loading vectors from a scalar array
Storing vectors to a scalar array
5.4 Floating-point functions
Arithmetic and rounding functions
Comparison functions
Exponential and logarithmic functions
Trigonometric functions
Miscellaneous floating-point functions
5.5 Integer functions
Adding and subtracting integers
Multiplication
Miscellaneous integer functions
5.6 Shuffle and select functions
Shuffle functions
Select functions
5.7 Vector test functions
5.8 Geometric functions
5.9 Summary
6 Image processing
6.1 Image objects and samplers
Image objects on the host: cl_mem
Samplers on the host: cl_sampler
Image objects on the device: image2d_t and image3d_t
Samplers on the device: sampler_t
6.2 Image processing functions
Image read functions
Image write functions
Image information functions
A simple example
6.3 Image scaling and interpolation
Nearest-neighbor interpolation
Bilinear interpolation
Image enlargement in OpenCL
6.4 Summary
7 Events, profiling, and synchronization
7.1 Host notification events
Associating an event with a command
Associating an event with a callback function
A host notification example
7.2 Command synchronization events
Wait lists and command events
Wait lists and user events
Additional command synchronization functions
Obtaining data associated with events
7.3 Profiling events
Configuring command profiling
Profiling data transfer
Profiling data partitioning
7.4 Work-item synchronization
Barriers and fences
Atomic operations
Atomic commands and mutexes
Asynchronous data transfer
7.5 Summary
8 Development with C++
8.1 Preliminary concerns
Vectors and strings
Exceptions
8.2 Creating kernels
Platforms, devices, and contexts
Programs and kernels
8.3 Kernel arguments and memory objects
Memory objects
General data arguments
Local space arguments
8.4 Command queues
Creating CommandQueue objects
Enqueuing kernel-execution commands
Read/write commands
Memory mapping and copy commands
8.5 Event processing
Host notification
Command synchronization
Profiling events
Additional event functions
8.6 Summary
9 Development with Java and Python
9.1 Aparapi
Aparapi installation
The Kernel class
Work-items and work-groups
9.2 JavaCL
JavaCL installation
Overview of JavaCL development
Creating kernels with JavaCL
Setting arguments and enqueuing commands
9.3 PyOpenCL
PyOpenCL installation and licensing
Overview of PyOpenCL development
Creating kernels with PyOpenCL
Setting arguments and executing kernels
9.4 Summary
10 General coding principles
10.1 Global size and local size
Finding the maximum work-group size
Testing kernels and devices
10.2 Numerical reduction
OpenCL reduction
Improving reduction speed with vectors
10.3 Synchronizing work-groups
10.4 Ten tips for high-performance kernels
10.5 Summary
PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
11 Reduction and sorting
11.1 MapReduce
Introduction to MapReduce
MapReduce and OpenCL
MapReduce example: searching for text
11.2 The bitonic sort
Understanding the bitonic sort
Implementing the bitonic sort in OpenCL
11.3 The radix sort
Understanding the radix sort
Implementing the radix sort with vectors
11.4 Summary
12 Matrices and QR decomposition
12.1 Matrix transposition
Introduction to matrices
Theory and implementation of matrix transposition
12.2 Matrix multiplication
The theory of matrix multiplication
Implementing matrix multiplication in OpenCL
12.3 The Householder transformation
Vector projection
Vector reflection
Outer products and Householder matrices
Vector reflection in OpenCL
12.4 The QR decomposition
Finding the Householder vectors and R
Finding the Householder matrices and Q
Implementing QR decomposition in OpenCL
12.5 Summary
13 Sparse matrices
13.1 Differential equations and sparse matrices
13.2 Sparse matrix storage and the Harwell-Boeing collection
Introducing the Harwell-Boeing collection
Accessing data in Matrix Market files
13.3 The method of steepest descent
Positive-definite matrices
Theory of the method of steepest descent
Implementing SD in OpenCL
13.4 The conjugate gradient method
Orthogonalization and conjugacy
The conjugate gradient method
13.5 Summary
14 Signal processing and the fast Fourier transform
14.1 Introducing frequency analysis
14.2 The discrete Fourier transform
Theory behind the DFT
OpenCL and the DFT
14.3 The fast Fourier transform
Three properties of the DFT
Constructing the fast Fourier transform
Implementing the FFT with OpenCL
14.4 Summary
PART 3 ACCELERATING OPENGL WITH OPENCL
15 Combining OpenCL and OpenGL
15.1 Sharing data between OpenGL and OpenCL
Creating the OpenCL context
Sharing data between OpenGL and OpenCL
Synchronizing access to shared data
15.2 Obtaining information
Obtaining OpenGL object and texture information
Obtaining information about the OpenGL context
15.3 Basic interoperability example
Initializing OpenGL operation
Initializing OpenCL operation
Creating data objects
Executing the kernel
Rendering graphics
15.4 Interoperability and animation
Specifying vertex data
Animation and display
Executing the kernel
15.5 Summary
16 Textures and renderbuffers
16.1 Image filtering
The Gaussian blur
Image sharpening
Image embossing
16.2 Filtering textures with OpenCL
The init_gl function
The init_cl function
The configure_shared_data function
The execute_kernel function
The display function
16.3 Summary
appendix A Installing and using a software development kit
appendix B Real-time rendering with OpenGL
appendix C The minimalist GNU for Windows and OpenCL
appendix D OpenCL on mobile devices
index
preface
In the summer of 1997, I was terrified. Instead of working as an intern in my major
(microelectronic engineering), the best job I could find was at a research laboratory
devoted to high-speed signal processing. My job was to program the two-dimensional
fast Fourier transform (FFT) using C and the Message Passing Interface (MPI), and get
it running as quickly as possible. The good news was that the lab had sixteen brand new
SPARCstations. The bad news was that I knew absolutely nothing about MPI or the FFT.
Thanks to books purchased from a strange new site called Amazon.com, I managed to understand the basics of MPI: the application deploys one set of instructions to multiple computers, and each processor accesses data according to its ID. As each processor finishes its task, it sends its output to the processor whose ID equals 0.
It took me time to grasp the finer details of MPI (blocking versus nonblocking data transfer, synchronous versus asynchronous communication), but as I worked with it more, I fell in love with distributed computing. I loved the fact that I could get sixteen monstrous computers to process data in lockstep, working together like athletes on a playing field. I felt like a choreographer arranging a dance or a composer writing a symphony for an orchestra. By the end of the internship, I had coded multiple versions of the 2-D FFT in MPI, but the lab’s researchers decided that network latency made the computation impractical.
Since that summer, I’ve always gravitated toward high-performance computing, and I’ve had the pleasure of working with digital signal processors, field-programmable gate arrays, and the Cell processor, which serves as the brain of Sony’s PlayStation 3. But nothing beats programming graphics processing units (GPUs) with OpenCL. As today’s supercomputers have shown, no CPU provides the same number-crunching power per watt as a GPU. And no language can target as wide a range of devices as OpenCL.
When AMD released its OpenCL development tools in 2009, I fell in love again. Not only does OpenCL provide new vector types and a wealth of math functions, but it also resembles MPI in many respects. Both toolsets are freely available and their routines can be called in C or C++. In both cases, applications deliver instructions to multiple devices whose processing units rely on IDs to determine which data they should access. MPI and OpenCL also make it possible to send data using similar types of blocking/non-blocking transfers and synchronous/asynchronous communication.
OpenCL is still new in the world of high-performance computing, and many programmers don’t know it exists. To help spread the word about this incredible language, I decided to write OpenCL in Action. I’ve enjoyed working on this book a great deal, and I hope it helps newcomers take advantage of the power of OpenCL and distributed computing in general.

As I write this in the summer of 2011, I feel as though I’ve come full circle. Last
night, I put the finishing touches on the FFT application presented in chapter 14. It
brought back many pleasant memories of my work with MPI, but I’m amazed by how
much the technology has changed. In 1997, the sixteen SPARCstations in my lab took
nearly a minute to perform a 32k FFT. In 2011, my $300 graphics card can perform an
FFT on millions of data points in seconds.
The technology changes, but the enjoyment remains the same. The learning curve
can be steep in the world of distributed computing, but the rewards more than make
up for the effort expended.
acknowledgments
I started writing my first book for Manning Publications in 2003, and though much
has changed, they are still as devoted to publishing high-quality books now as they
were then. I’d like to thank all of Manning’s professionals for their hard work and
dedication, but I’d like to acknowledge the following folks in particular:
First, I’d like to thank Maria Townsley, who worked as developmental editor. Maria
is one of the most hands-on editors I’ve worked with, and she went beyond the call of
duty in recommending ways to improve the book’s organization and clarity. I bristled
and whined, but in the end, she turned out to be absolutely right. In addition, despite
my frequent rewriting of the table of contents, her pleasant disposition never flagged
for a moment.
I’d like to extend my deep gratitude to the entire Manning production team. In particular, I’d like to thank Andy Carroll for going above and beyond the call of duty in copyediting this book. His comments and insight have dramatically improved the polish of the text, and his technical expertise has made the content more accessible. Similarly, I’d like to thank Maureen Spencer and Katie Tennant for their eagle-eyed proofreading of the final copy and Gordan Salinovic for his painstaking labor in dealing with the book’s images and layout. I’d also like to thank Mary Piergies for masterminding the production process and making sure the final product lives up to Manning’s high standards.
Jörn Dinkla is, simply put, the best technical editor I’ve ever worked with. I tested the book’s example code on Linux and Mac OS, but he went further and tested the code with software development kits from Linux, AMD, and Nvidia. Not only did he catch quite a few errors I missed, but in many cases, he took the time to find out why the error had occurred. I shudder to think what would have happened without his assistance, and I’m beyond grateful for the work he put into improving the quality of this book’s code.
I’d like to thank Candace Gilhooley for spreading the word about the book’s publication. Given OpenCL’s youth, the audience isn’t as easy to reach as the audience for Manning’s many Java books. But between setting up web articles, presentations, and conference attendance, Candace has done an exemplary job in marketing OpenCL in Action.
One of Manning’s greatest strengths is its reliance on constant feedback. During development and production, Karen Tegtmeyer and Ozren Harlovic sought out reviewers for this book and organized a number of review cycles. Thanks to the feedback from the following reviewers, this book includes a number of important subjects that I wouldn’t otherwise have considered: Olivier Chafik, Martin Beckett, Benjamin Ducke, Alan Commike, Nathan Levesque, David Strong, Seth Price, John J. Ryan III, and John Griffin.
Last but not least, I’d like to thank Jan Bednarczuk of Jandex Indexing for her meticulous work in indexing the content of this book. She not only created a thorough, professional index in a short amount of time, but she also caught quite a few typos in the process. Thanks again.
about this book
OpenCL is a complex subject. To code even the simplest of applications, a developer
needs to understand host programming, device programming, and the mechanisms
that transfer data between the host and device. The goal of this book is to show how
these tasks are accomplished and how to put them to use in practical applications.
The format of this book is tutorial-based. That is, each new concept is followed by example code that demonstrates how the theory is used in an application. Many of the early applications are trivially basic, and some do nothing more than obtain information about devices and data structures. But as the book progresses, the code becomes more involved and makes fuller use of both the host and the target device. In the later chapters, the focus shifts from learning how OpenCL works to putting OpenCL to use in processing vast amounts of data at high speed.
Audience
In writing this book, I’ve assumed that readers have never heard of OpenCL and know
nothing about distributed computing or high-performance computing. I’ve done my
best to present concepts like task-parallelism and
SIMD (single instruction, multiple
data) development as simply and as straightforwardly as possible.
But because the OpenCL API is based on C, this book presumes that the reader has a solid understanding of C fundamentals. Readers should be intimately familiar with pointers, arrays, and memory allocation functions like malloc and free. It also helps to be familiar with the C functions declared in the common math library, as most of the kernel functions have similar names and usages.
OpenCL applications can run on many different types of devices, but one of OpenCL’s chief advantages is that it can be used to program graphics processing units (GPUs). Therefore, to get the most out of this book, it helps to have a graphics card attached to your computer or a hybrid CPU-GPU device such as AMD’s Fusion.
Roadmap
This book is divided into three parts. The first part, which consists of chapters 1–10, focuses on exploring the OpenCL language and its capabilities. The second part, which consists of chapters 11–14, shows how OpenCL can be used to perform large-scale tasks commonly encountered in the field of high-performance computing. The last part, which consists of chapters 15 and 16, shows how OpenCL can be used to accelerate OpenGL applications.
The chapters of part 1 have been structured to serve the needs of a programmer
who has never coded a line of OpenCL. Chapter 1 introduces the topic of OpenCL,
explaining what it is, where it came from, and the basics of its operation. Chapters 2
and 3 explain how to code applications that run on the host, and chapters 4 and 5
show how to code kernels that run on compliant devices. Chapters 6 and 7 explore
advanced topics that involve both host programming and kernel coding. Specifically,
chapter 6 presents image processing and chapter 7 discusses the important topics of
event processing and synchronization.
Chapters 8 and 9 discuss the concepts first presented in chapters 2 through 5, but
using languages other than C. Chapter 8 discusses host/kernel coding in C++, and
chapter 9 explains how to build OpenCL applications in Java and Python. If you aren’t
obligated to program in C, I recommend that you use one of the toolsets discussed in
these chapters.
Chapter 10 serves as a bridge between parts 1 and 2. It demonstrates how to take full advantage of OpenCL’s parallelism by implementing a simple reduction algorithm that adds together one million data points. It also presents helpful guidelines for coding practical OpenCL applications.
Chapters 11–14 get into the heavy-duty usage of OpenCL, where applications commonly operate on millions of data points. Chapter 11 discusses the implementation of MapReduce and two sorting algorithms: the bitonic sort and the radix sort. Chapter 12 covers operations on dense matrices, and chapter 13 explores operations on sparse matrices. Chapter 14 explains how OpenCL can be used to implement the fast Fourier transform (FFT).
Chapters 15 and 16 are my personal favorites. One of OpenCL’s great strengths is that it can be used to accelerate three-dimensional rendering, a topic of central interest in game development and scientific visualization. Chapter 15 introduces the topic of OpenCL-OpenGL interoperability and shows how the two toolsets can share data corresponding to vertex attributes. Chapter 16 expands on this and shows how OpenCL can accelerate OpenGL texture processing. These chapters require an understanding of OpenGL 3.3 and shader development, and both of these topics are explored in appendix B.
At the end of the book, the appendixes provide helpful information related to OpenCL, but the material isn’t directly used in common OpenCL development. Appendix A discusses the all-important topic of software development kits (SDKs) and explains how to install the SDKs provided by AMD and Nvidia. Appendix B discusses the basics of OpenGL and shader development. Appendix C explains how to install and use the Minimalist GNU for Windows (MinGW), which provides a GNU-like environment for building executables on the Windows operating system. Lastly, appendix D discusses the specification for embedded OpenCL.
Obtaining and compiling the example code
In the end, it’s the code that matters. This book contains working code for over 60
OpenCL applications, and you can download the source code from the publisher’s
website at www.manning.com/OpenCLinAction or www.manning.com/scarpino2/.
The download site provides a link pointing to an archive that contains code
intended to be compiled with
GNU-based build tools. This archive contains one folder
for each chapter/appendix of the book, and each top-level folder has subfolders for
example projects. For example, if you look in the Ch5/shuffle_test directory, you’ll
find the source code for chapter 5’s shuffle_test project.
As far as dependencies go, every project requires that the OpenCL library (OpenCL.lib on Windows, libOpenCL.so on *nix systems) be available on the development system. Appendix A discusses how to obtain this library by installing an appropriate software development kit (SDK).
In addition, chapters 6 and 16 discuss images, and the source code in these chapters makes use of the open-source
PNG library. Chapter 6 explains how to obtain this
library for different systems. Appendix B and chapters 15 and 16 all require access to
OpenGL, and appendix B explains how to obtain and install this toolset.

Code conventions
As lazy as this may sound, I prefer to copy and paste working code into my applications rather than write code from scratch. This not only saves time, but also reduces the likelihood of producing bugs through typographical errors. All the code in this book is public domain, so you’re free to download and copy and paste portions of it into your applications. But before you do, it’s a good idea to understand the conventions I’ve used:

■ Host data structures are named after their data type. That is, each cl_platform_id structure is called platform, each cl_device_id structure is called device, each cl_context structure is called context, and so on.
■ In the host applications, the main function calls two helper functions: create_device returns a cl_device_id, and build_program creates and compiles a cl_program. Note that create_device searches for a GPU associated with the first available platform. If it can’t find a GPU, it searches for the first compliant CPU.
■ Host applications identify the program file and the kernel function using macros declared at the start of the source file. Specifically, the PROGRAM_FILE macro identifies the program file and KERNEL_FUNC identifies the kernel function.
■ All my program files end with the .cl suffix. If the program file only contains one kernel function, that function has the same name as the file.
■ For GNU code, every makefile assumes that libraries and header files can be found at locations identified by environment variables. Specifically, the makefile searches for AMDAPPSDKROOT on AMD platforms and CUDA on Nvidia platforms.
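The PROGRAM_FILE convention means build_program has to load the .cl file's text into memory before OpenCL can compile it. The following is a minimal sketch of that loading step only. It assumes a plain C host and leaves out the OpenCL calls so that it compiles without an SDK. The file name add_numbers.cl, the kernel name add_numbers, and the helper read_program_source are hypothetical and do not come from the book's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names chosen to follow the conventions above: the program
   file ends in .cl and, because it holds a single kernel, the kernel
   function shares the file's base name. */
#define PROGRAM_FILE "add_numbers.cl"
#define KERNEL_FUNC "add_numbers"

/* Hypothetical helper: reads an entire program file into a heap-allocated,
   null-terminated string. Returns NULL on any failure. If size_out is not
   NULL, it receives the number of bytes read. The caller must free() the
   returned buffer. */
char *read_program_source(const char *path, size_t *size_out) {
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* Allocate one extra byte for the terminating '\0'. */
    char *buffer = malloc((size_t)size + 1);
    if (buffer == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t bytes_read = fread(buffer, 1, (size_t)size, fp);
    fclose(fp);
    if (bytes_read != (size_t)size) {
        free(buffer);
        return NULL;
    }

    buffer[size] = '\0';
    if (size_out != NULL) {
        *size_out = bytes_read;
    }
    return buffer;
}
```

A complete build_program would then take three more steps. First, it would pass this string to clCreateProgramWithSource with the lengths argument set to NULL, which tells OpenCL the string is null-terminated. Next, it would compile the program with clBuildProgram. Finally, it would pass KERNEL_FUNC to clCreateKernel. Keeping the file I/O in a separate helper keeps those OpenCL calls short.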

Author Online
Nobody’s perfect. If I failed to convey my subject material clearly or (gasp) made a mistake, feel free to add a comment through Manning’s Author Online system. You can find the Author Online forum for this book by going to www.manning.com/OpenCLinAction and clicking the Author Online link.
Simple questions and concerns get rapid responses. In contrast, if you’re unhappy with line 402 of my bitonic sort implementation, it may take me some time to get back to you. I’m always happy to discuss general issues related to OpenCL, but if you’re looking for something complex and specific, such as help debugging a custom FFT, I will have to recommend that you find a professional consultant.
About the cover illustration
The figure on the cover of OpenCL in Action is captioned a “Kranjac,” or an inhabitant
of the Carniola region in the Slovenian Alps. This illustration is taken from a recent
reprint of Balthasar Hacquet’s Images and Descriptions of Southwestern and Eastern Wenda,
Illyrians, and Slavs published by the Ethnographic Museum in Split, Croatia, in 2008.
Hacquet (1739–1815) was an Austrian physician and scientist who spent many years
studying the botany, geology, and ethnography of the Julian Alps, the mountain range
that stretches from northeastern Italy to Slovenia and that is named after Julius Caesar. Hand-drawn illustrations accompany the many scientific papers and books that Hacquet published.
The rich diversity of the drawings in Hacquet's publications speaks vividly of the
uniqueness and individuality of the eastern Alpine regions just 200 years ago. This was
a time when the dress codes of two villages separated by a few miles identified people
uniquely as belonging to one or the other, and when members of a social class or
trade could be easily distinguished by what they were wearing. Dress codes have
changed since then and the diversity by region, so rich at the time, has faded away. It is
now often hard to tell the inhabitant of one continent from another and today the
inhabitants of the picturesque towns and villages in the Slovenian Alps are not readily distinguishable from the residents of other parts of Slovenia or the rest of Europe.
We at Manning celebrate the inventiveness, the initiative, and the fun of the com-
puter business with book covers based on costumes from two centuries ago brought
back to life by illustrations such as this one.
Part 1
Foundations of OpenCL programming
Part 1 presents the OpenCL language. We’ll explore OpenCL’s data structures
and functions in detail and look at example applications that demonstrate their
usage in code.
Chapter 1 introduces OpenCL, explaining what it’s used for and how it
works. Chapters 2 and 3 explain how host applications are coded, and chapters 4
and 5 discuss kernel coding. Chapters 6 and 7 explore the advanced topics of
image processing and event handling.
Chapters 8 and 9 discuss how OpenCL is coded in languages other than C,
such as C++, Java, and Python. Chapter 10 explains how OpenCL’s capabilities
can be used to develop large-scale applications.