OpenCL in Action
HOW TO ACCELERATE GRAPHICS AND COMPUTATION

MATTHEW SCARPINO

MANNING
SHELTER ISLAND

For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact


Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email:
©2012 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning
Publications was aware of a trademark claim, the designations have been printed in initial caps
or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are
printed on paper that is at least 15 percent recycled and processed without the use of elemental
chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Maria Townsley
Copyeditor: Andy Carroll
Proofreader: Maureen Spencer
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617290176
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11

brief contents

PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
    1 Introducing OpenCL
    2 Host programming: fundamental data structures
    3 Host programming: data transfer and partitioning
    4 Kernel programming: data types and device memory
    5 Kernel programming: operators and functions
    6 Image processing
    7 Events, profiling, and synchronization
    8 Development with C++
    9 Development with Java and Python
    10 General coding principles

PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
    11 Reduction and sorting
    12 Matrices and QR decomposition
    13 Sparse matrices
    14 Signal processing and the fast Fourier transform

PART 3 ACCELERATING OPENGL WITH OPENCL
    15 Combining OpenCL and OpenGL
    16 Textures and renderbuffers

contents
preface
acknowledgments
about this book

PART 1 FOUNDATIONS OF OPENCL PROGRAMMING

1 Introducing OpenCL
    1.1 The dawn of OpenCL
    1.2 Why OpenCL?
        Portability
        Standardized vector processing
        Parallel programming
    1.3 Analogy: OpenCL processing and a game of cards
    1.4 A first look at an OpenCL application
    1.5 The OpenCL standard and extensions
    1.6 Frameworks and software development kits (SDKs)
    1.7 Summary

2 Host programming: fundamental data structures
    2.1 Primitive data types
    2.2 Accessing platforms
        Creating platform structures
        Obtaining platform information
        Code example: testing platform extensions
    2.3 Accessing installed devices
        Creating device structures
        Obtaining device information
        Code example: testing device extensions
    2.4 Managing devices with contexts
        Creating contexts
        Obtaining context information
        Contexts and the reference count
        Code example: checking a context’s reference count
    2.5 Storing device code in programs
        Creating programs
        Building programs
        Obtaining program information
        Code example: building a program from multiple source files
    2.6 Packaging functions in kernels
        Creating kernels
        Obtaining kernel information
        Code example: obtaining kernel information
    2.7 Collecting kernels in a command queue
        Creating command queues
        Enqueuing kernel execution commands
    2.8 Summary

3 Host programming: data transfer and partitioning
    3.1 Setting kernel arguments
    3.2 Buffer objects
        Allocating buffer objects
        Creating subbuffer objects
    3.3 Image objects
        Creating image objects
        Obtaining information about image objects
    3.4 Obtaining information about buffer objects
    3.5 Memory object transfer commands
        Read/write data transfer
        Mapping memory objects
        Copying data between memory objects
    3.6 Data partitioning
        Loops and work-items
        Work sizes and offsets
        A simple one-dimensional example
        Work-groups and compute units
    3.7 Summary

4 Kernel programming: data types and device memory
    4.1 Introducing kernel coding
    4.2 Scalar data types
        Accessing the double data type
        Byte order
    4.3 Floating-point computing
        The float data type
        The double data type
        The half data type
        Checking IEEE-754 compliance
    4.4 Vector data types
        Preferred vector widths
        Initializing vectors
        Reading and modifying vector components
        Endianness and memory access
    4.5 The OpenCL device model
        Device model analogy part 1: math students in school
        Device model analogy part 2: work-items in a device
        Address spaces in code
        Memory alignment
    4.6 Local and private kernel arguments
        Local arguments
        Private arguments
    4.7 Summary

5 Kernel programming: operators and functions
    5.1 Operators
    5.2 Work-item and work-group functions
        Dimensions and work-items
        Work-groups
        An example application
    5.3 Data transfer operations
        Loading and storing data of the same type
        Loading vectors from a scalar array
        Storing vectors to a scalar array
    5.4 Floating-point functions
        Arithmetic and rounding functions
        Comparison functions
        Exponential and logarithmic functions
        Trigonometric functions
        Miscellaneous floating-point functions
    5.5 Integer functions
        Adding and subtracting integers
        Multiplication
        Miscellaneous integer functions
    5.6 Shuffle and select functions
        Shuffle functions
        Select functions
    5.7 Vector test functions
    5.8 Geometric functions
    5.9 Summary

6 Image processing
    6.1 Image objects and samplers
        Image objects on the host: cl_mem
        Samplers on the host: cl_sampler
        Image objects on the device: image2d_t and image3d_t
        Samplers on the device: sampler_t
    6.2 Image processing functions
        Image read functions
        Image write functions
        Image information functions
        A simple example
    6.3 Image scaling and interpolation
        Nearest-neighbor interpolation
        Bilinear interpolation
        Image enlargement in OpenCL
    6.4 Summary

7 Events, profiling, and synchronization
    7.1 Host notification events
        Associating an event with a command
        Associating an event with a callback function
        A host notification example
    7.2 Command synchronization events
        Wait lists and command events
        Wait lists and user events
        Additional command synchronization functions
        Obtaining data associated with events
    7.3 Profiling events
        Configuring command profiling
        Profiling data transfer
        Profiling data partitioning
    7.4 Work-item synchronization
        Barriers and fences
        Atomic operations
        Atomic commands and mutexes
        Asynchronous data transfer
    7.5 Summary

8 Development with C++
    8.1 Preliminary concerns
        Vectors and strings
        Exceptions
    8.2 Creating kernels
        Platforms, devices, and contexts
        Programs and kernels
    8.3 Kernel arguments and memory objects
        Memory objects
        General data arguments
        Local space arguments
    8.4 Command queues
        Creating CommandQueue objects
        Enqueuing kernel-execution commands
        Read/write commands
        Memory mapping and copy commands
    8.5 Event processing
        Host notification
        Command synchronization
        Profiling events
        Additional event functions
    8.6 Summary

9 Development with Java and Python
    9.1 Aparapi
        Aparapi installation
        The Kernel class
        Work-items and work-groups
    9.2 JavaCL
        JavaCL installation
        Overview of JavaCL development
        Creating kernels with JavaCL
        Setting arguments and enqueuing commands
    9.3 PyOpenCL
        PyOpenCL installation and licensing
        Overview of PyOpenCL development
        Creating kernels with PyOpenCL
        Setting arguments and executing kernels
    9.4 Summary

10 General coding principles
    10.1 Global size and local size
        Finding the maximum work-group size
        Testing kernels and devices
    10.2 Numerical reduction
        OpenCL reduction
        Improving reduction speed with vectors
    10.3 Synchronizing work-groups
    10.4 Ten tips for high-performance kernels
    10.5 Summary

PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL

11 Reduction and sorting
    11.1 MapReduce
        Introduction to MapReduce
        MapReduce and OpenCL
        MapReduce example: searching for text
    11.2 The bitonic sort
        Understanding the bitonic sort
        Implementing the bitonic sort in OpenCL
    11.3 The radix sort
        Understanding the radix sort
        Implementing the radix sort with vectors
    11.4 Summary

12 Matrices and QR decomposition
    12.1 Matrix transposition
        Introduction to matrices
        Theory and implementation of matrix transposition
    12.2 Matrix multiplication
        The theory of matrix multiplication
        Implementing matrix multiplication in OpenCL
    12.3 The Householder transformation
        Vector projection
        Vector reflection
        Outer products and Householder matrices
        Vector reflection in OpenCL
    12.4 The QR decomposition
        Finding the Householder vectors and R
        Finding the Householder matrices and Q
        Implementing QR decomposition in OpenCL
    12.5 Summary

13 Sparse matrices
    13.1 Differential equations and sparse matrices
    13.2 Sparse matrix storage and the Harwell-Boeing collection
        Introducing the Harwell-Boeing collection
        Accessing data in Matrix Market files
    13.3 The method of steepest descent
        Positive-definite matrices
        Theory of the method of steepest descent
        Implementing SD in OpenCL
    13.4 The conjugate gradient method
        Orthogonalization and conjugacy
        The conjugate gradient method
    13.5 Summary

14 Signal processing and the fast Fourier transform
    14.1 Introducing frequency analysis
    14.2 The discrete Fourier transform
        Theory behind the DFT
        OpenCL and the DFT
    14.3 The fast Fourier transform
        Three properties of the DFT
        Constructing the fast Fourier transform
        Implementing the FFT with OpenCL
    14.4 Summary

PART 3 ACCELERATING OPENGL WITH OPENCL

15 Combining OpenCL and OpenGL
    15.1 Sharing data between OpenGL and OpenCL
        Creating the OpenCL context
        Sharing data between OpenGL and OpenCL
        Synchronizing access to shared data
    15.2 Obtaining information
        Obtaining OpenGL object and texture information
        Obtaining information about the OpenGL context
    15.3 Basic interoperability example
        Initializing OpenGL operation
        Initializing OpenCL operation
        Creating data objects
        Executing the kernel
        Rendering graphics
    15.4 Interoperability and animation
        Specifying vertex data
        Animation and display
        Executing the kernel
    15.5 Summary

16 Textures and renderbuffers
    16.1 Image filtering
        The Gaussian blur
        Image sharpening
        Image embossing
    16.2 Filtering textures with OpenCL
        The init_gl function
        The init_cl function
        The configure_shared_data function
        The execute_kernel function
        The display function
    16.3 Summary

appendix A Installing and using a software development kit
appendix B Real-time rendering with OpenGL
appendix C The minimalist GNU for Windows and OpenCL
appendix D OpenCL on mobile devices
index


preface
In the summer of 1997, I was terrified. Instead of working as an intern in my major
(microelectronic engineering), the best job I could find was at a research laboratory
devoted to high-speed signal processing. My job was to program the two-dimensional
fast Fourier transform (FFT) using C and the Message Passing Interface (MPI), and get
it running as quickly as possible. The good news was that the lab had sixteen brand new
SPARCstations. The bad news was that I knew absolutely nothing about MPI or the FFT.
Thanks to books purchased from a strange new site called Amazon.com, I managed to understand the basics of MPI: the application deploys one set of instructions
to multiple computers, and each processor accesses data according to its ID. As each
processor finishes its task, it sends its output to the processor whose ID equals 0.
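As a rough illustration of the ID-based scheme just described, here is a plain-C sketch. It is not MPI code and is not taken from the book: the processors are simulated by an ordinary loop, and the names process_chunk, gather_at_root, and NUM_PROCS are hypothetical. Every simulated processor runs the same function, its ID decides which slice of the data it reads, and the partial results are combined by the processor whose ID is 0.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PROCS 4   /* hypothetical number of simulated processors */

/* Every processor runs this same function; only `id` differs.
   Processor `id` sums the elements in its own contiguous slice. */
long process_chunk(const int *data, size_t n, int id, int num_procs) {
    assert(num_procs > 0 && id >= 0 && id < num_procs);
    size_t begin = n * (size_t)id / (size_t)num_procs;
    size_t end   = n * (size_t)(id + 1) / (size_t)num_procs;
    long sum = 0;
    for (size_t i = begin; i < end; i++) {
        sum += data[i];
    }
    return sum;
}

/* Stand-in for the final step: processor 0 collects every partial
   result (including its own) and adds them together. */
long gather_at_root(const int *data, size_t n) {
    long partial[NUM_PROCS];
    for (int id = 0; id < NUM_PROCS; id++) {
        partial[id] = process_chunk(data, n, id, NUM_PROCS);
    }
    long total = 0;
    for (int id = 0; id < NUM_PROCS; id++) {
        total += partial[id];
    }
    return total;
}
```

The slice boundaries are computed from the ID alone, so no processor needs to know what the others are doing until the results are gathered.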
It took me time to grasp the finer details of MPI (blocking versus nonblocking data
transfer, synchronous versus asynchronous communication), but as I worked more
with the language, I fell in love with distributed computing. I loved the fact that I
could get sixteen monstrous computers to process data in lockstep, working together
like athletes on a playing field. I felt like a choreographer arranging a dance or a composer writing a symphony for an orchestra. By the end of the internship, I coded multiple versions of the 2-D FFT in MPI, but the lab’s researchers decided that network
latency made the computation impractical.
Since that summer, I’ve always gravitated toward high-performance computing, and
I’ve had the pleasure of working with digital signal processors, field-programmable gate
arrays, and the Cell processor, which serves as the brain of Sony’s PlayStation 3. But nothing beats programming graphics processing units (GPUs) with OpenCL. As today’s
supercomputers have shown, no CPU provides the same number-crunching power per
watt as a GPU. And no language can target as wide a range of devices as OpenCL.
When AMD released its OpenCL development tools in 2009, I fell in love again.
Not only does OpenCL provide new vector types and a wealth of math functions, but it
also resembles MPI in many respects. Both toolsets are freely available and their routines can be called in C or C++. In both cases, applications deliver instructions to multiple devices whose processing units rely on IDs to determine which data they should
access. MPI and OpenCL also make it possible to send data using similar types of blocking/non-blocking transfers and synchronous/asynchronous communication.
OpenCL is still new in the world of high-performance computing, and many programmers don’t know it exists. To help spread the word about this incredible language, I decided to write OpenCL in Action. I’ve enjoyed working on this book a great
deal, and I hope it helps newcomers take advantage of the power of OpenCL and distributed computing in general.
As I write this in the summer of 2011, I feel as though I’ve come full circle. Last
night, I put the finishing touches on the FFT application presented in chapter 14. It
brought back many pleasant memories of my work with MPI, but I’m amazed by how
much the technology has changed. In 1997, the sixteen SPARCstations in my lab took
nearly a minute to perform a 32k FFT. In 2011, my $300 graphics card can perform an
FFT on millions of data points in seconds.
The technology changes, but the enjoyment remains the same. The learning curve
can be steep in the world of distributed computing, but the rewards more than make
up for the effort expended.


acknowledgments
I started writing my first book for Manning Publications in 2003, and though much
has changed, they are still as devoted to publishing high-quality books now as they
were then. I’d like to thank all of Manning’s professionals for their hard work and
dedication, but I’d like to acknowledge the following folks in particular:
First, I’d like to thank Maria Townsley, who worked as developmental editor. Maria
is one of the most hands-on editors I’ve worked with, and she went beyond the call of
duty in recommending ways to improve the book’s organization and clarity. I bristled
and whined, but in the end, she turned out to be absolutely right. In addition, despite
my frequent rewriting of the table of contents, her pleasant disposition never flagged
for a moment.
I’d like to extend my deep gratitude to the entire Manning production team. In
particular, I’d like to thank Andy Carroll for going above and beyond the call of duty
in copyediting this book. His comments and insight have not only dramatically
improved the polish of the text, but his technical expertise has made the content
more accessible. Similarly, I’d like to thank Maureen Spencer and Katie Tennant for
their eagle-eyed proofreading of the final copy and Gordan Salinovic for his painstaking labor in dealing with the book’s images and layout. I’d also like to thank Mary
Piergies for masterminding the production process and making sure the final product
lives up to Manning’s high standards.
Jörn Dinkla is, simply put, the best technical editor I’ve ever worked with. I tested
the book’s example code on Linux and Mac OS, but he went further and tested the
code with software development kits from Linux, AMD, and Nvidia. Not only did he
catch quite a few errors I missed, but in many cases, he took the time to find out why
the error had occurred. I shudder to think what would have happened without his
assistance, and I’m beyond grateful for the work he put into improving the quality of
this book’s code.
I’d like to thank Candace Gilhooley for spreading the word about the book’s publication. Given OpenCL’s youth, the audience isn’t as easy to reach as the audience
for Manning’s many Java books. But between setting up web articles, presentations,
and conference attendance, Candace has done an exemplary job in marketing
OpenCL in Action.
One of Manning’s greatest strengths is its reliance on constant feedback. During
development and production, Karen Tegtmeyer and Ozren Harlovic sought out
reviewers for this book and organized a number of review cycles. Thanks to the feedback from the following reviewers, this book includes a number of important subjects
that I wouldn’t otherwise have considered: Olivier Chafik, Martin Beckett, Benjamin
Ducke, Alan Commike, Nathan Levesque, David Strong, Seth Price, John J. Ryan III,
and John Griffin.
Last but not least, I’d like to thank Jan Bednarczuk of Jandex Indexing for her
meticulous work in indexing the content of this book. She not only created a thorough, professional index in a short amount of time, but she also caught quite a few
typos in the process. Thanks again.

about this book
OpenCL is a complex subject. To code even the simplest of applications, a developer
needs to understand host programming, device programming, and the mechanisms
that transfer data between the host and device. The goal of this book is to show how
these tasks are accomplished and how to put them to use in practical applications.
The format of this book is tutorial-based. That is, each new concept is followed by
example code that demonstrates how the theory is used in an application. Many of the
early applications are trivially basic, and some do nothing more than obtain information about devices and data structures. But as the book progresses, the code becomes
more involved and makes fuller use of both the host and the target device. In the later
chapters, the focus shifts from learning how OpenCL works to putting OpenCL to use
in processing vast amounts of data at high speed.

Audience
In writing this book, I’ve assumed that readers have never heard of OpenCL and know
nothing about distributed computing or high-performance computing. I’ve done my
best to present concepts like task-parallelism and SIMD (single instruction, multiple
data) development as simply and as straightforwardly as possible.
But because the OpenCL API is based on C, this book presumes that the reader has
a solid understanding of C fundamentals. Readers should be intimately familiar with
pointers, arrays, and memory access functions like malloc and free. It also helps to be
cognizant of the C functions declared in the common math library, as most of the kernel functions have similar names and usages.
OpenCL applications can run on many different types of devices, but one of OpenCL’s
chief advantages is that it can be used to program graphics processing units (GPUs).
Therefore, to get the most out of this book, it helps to have a graphics card attached
to your computer or a hybrid CPU-GPU device such as AMD’s Fusion.

Roadmap
This book is divided into three parts. The first part, which consists of chapters 1–10,
focuses on exploring the OpenCL language and its capabilities. The second part,
which consists of chapters 11–14, shows how OpenCL can be used to perform large-scale tasks commonly encountered in the field of high-performance computing. The
last part, which consists of chapters 15 and 16, shows how OpenCL can be used to
accelerate OpenGL applications.
The chapters of part 1 have been structured to serve the needs of a programmer
who has never coded a line of OpenCL. Chapter 1 introduces the topic of OpenCL,
explaining what it is, where it came from, and the basics of its operation. Chapters 2
and 3 explain how to code applications that run on the host, and chapters 4 and 5
show how to code kernels that run on compliant devices. Chapters 6 and 7 explore
advanced topics that involve both host programming and kernel coding. Specifically,
chapter 6 presents image processing and chapter 7 discusses the important topics of
event processing and synchronization.
Chapters 8 and 9 discuss the concepts first presented in chapters 2 through 5, but
using languages other than C. Chapter 8 discusses host/kernel coding in C++, and
chapter 9 explains how to build OpenCL applications in Java and Python. If you aren’t
obligated to program in C, I recommend that you use one of the toolsets discussed in
these chapters.
Chapter 10 serves as a bridge between parts 1 and 2. It demonstrates how to take
full advantage of OpenCL’s parallelism by implementing a simple reduction algorithm
that adds together one million data points. It also presents helpful guidelines for coding practical OpenCL applications.
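To give a feel for what a reduction does, the sketch below sums an array in a tree-shaped pattern. It is only a serial plain-C illustration under stated assumptions, not the kernel developed in chapter 10: the function name reduce_sum is hypothetical, the passes run one after another on the host, and the input array is overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Sums `n` values in place using a tree-shaped pattern: on every pass,
   element i absorbs element i + stride, and then the stride doubles.
   The additions within one pass are independent of each other, which is
   what would let a parallel device perform them at the same time. Here
   they run serially. The total ends up in data[0]. */
float reduce_sum(float *data, size_t n) {
    assert(data != NULL || n == 0);
    if (n == 0) {
        return 0.0f;
    }
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

For n elements the outer loop runs about log2(n) times, so one million values need roughly twenty passes rather than one million dependent additions.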
Chapters 11–14 get into the heavy-duty usage of OpenCL, where applications commonly operate on millions of data points. Chapter 11 discusses the implementation of
MapReduce and two sorting algorithms: the bitonic sort and the radix sort. Chapter 12
covers operations on dense matrices, and chapter 13 explores operations on sparse
matrices. Chapter 14 explains how OpenCL can be used to implement the fast Fourier
transform (FFT).
Chapters 15 and 16 are my personal favorites. One of OpenCL’s great strengths is
that it can be used to accelerate three-dimensional rendering, a topic of central interest in game development and scientific visualization. Chapter 15 introduces the topic
of OpenCL-OpenGL interoperability and shows how the two toolsets can share data
corresponding to vertex attributes. Chapter 16 expands on this and shows how
OpenCL can accelerate OpenGL texture processing. These chapters require an
understanding of OpenGL 3.3 and shader development, and both of these topics are
explored in appendix B.

At the end of the book, the appendixes provide helpful information related to
OpenCL, but the material isn’t directly used in common OpenCL development.
Appendix A discusses the all-important topic of software development kits (SDKs), and
explains how to install the SDKs provided by AMD and Nvidia. Appendix B discusses
the basics of OpenGL and shader development. Appendix C explains how to install
and use the Minimalist GNU for Windows (MinGW), which provides a GNU-like environment for building executables on the Windows operating system. Lastly, appendix
D discusses the specification for embedded OpenCL.

Obtaining and compiling the example code
In the end, it’s the code that matters. This book contains working code for over 60
OpenCL applications, and you can download the source code from the publisher’s
website at www.manning.com/OpenCLinAction or www.manning.com/scarpino2/.
The download site provides a link pointing to an archive that contains code
intended to be compiled with GNU-based build tools. This archive contains one folder
for each chapter/appendix of the book, and each top-level folder has subfolders for
example projects. For example, if you look in the Ch5/shuffle_test directory, you’ll
find the source code for Chapter 5’s shuffle_test project.
As far as dependencies go, every project requires that the OpenCL library
(OpenCL.lib on Windows, libOpenCL.so on *nix systems) be available on the development system. Appendix A discusses how to obtain this library by installing an appropriate software development kit (SDK).
In addition, chapters 6 and 16 discuss images, and the source code in these chapters makes use of the open-source PNG library. Chapter 6 explains how to obtain this
library for different systems. Appendix B and chapters 15 and 16 all require access to
OpenGL, and appendix B explains how to obtain and install this toolset.

Code conventions
As lazy as this may sound, I prefer to copy and paste working code into my applications rather than write code from scratch. This not only saves time, but also reduces
the likelihood of producing bugs through typographical errors. All the code in this
book is public domain, so you’re free to download and copy and paste portions of it
into your applications. But before you do, it’s a good idea to understand the conventions I’ve used:
• Host data structures are named after their data type. That is, each
  cl_platform_id structure is called platform, each cl_device_id structure is
  called device, each cl_context structure is called context, and so on.
• In the host applications, the main function calls on two functions: create_device
  returns a cl_device_id, and build_program creates and compiles a cl_program.
  Note that create_device searches for a GPU associated with the first available
  platform. If it can’t find a GPU, it searches for the first compliant CPU.
• Host applications identify the program file and the kernel function using macros
  declared at the start of the source file. Specifically, the PROGRAM_FILE macro
  identifies the program file and KERNEL_FUNC identifies the kernel function.
• All my program files end with the .cl suffix. If the program file only contains one
  kernel function, that function has the same name as the file.
• For GNU code, every makefile assumes that libraries and header files can be found
  at locations identified by environment variables. Specifically, the makefile
  searches for AMDAPPSDKROOT on AMD platforms and CUDA on Nvidia platforms.
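The GPU-first, CPU-second search order and the two macros described above can be sketched without an OpenCL installation. The code below is not the book’s create_device. It models devices with a hypothetical mock_device array so the fallback logic can be compiled anywhere, and the macro values are placeholders chosen to follow the naming conventions. In a real host program the lookup would presumably go through the OpenCL function clGetDeviceIDs, first with CL_DEVICE_TYPE_GPU and then with CL_DEVICE_TYPE_CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; by the conventions above, a program file holding a
   single kernel function shares its name with that function. */
#define PROGRAM_FILE "example.cl"
#define KERNEL_FUNC  "example"

/* Stand-ins for OpenCL device types and handles (assumptions, not cl.h). */
typedef enum { MOCK_TYPE_CPU, MOCK_TYPE_GPU } mock_device_type;

typedef struct {
    mock_device_type type;
    const char *name;
} mock_device;

/* Returns the first device of the requested type, or NULL if none exists. */
static const mock_device *find_first(const mock_device *devices, size_t count,
                                     mock_device_type wanted) {
    for (size_t i = 0; i < count; i++) {
        if (devices[i].type == wanted) {
            return &devices[i];
        }
    }
    return NULL;
}

/* Mirrors the described search order: prefer a GPU, fall back to a CPU. */
const mock_device *select_device(const mock_device *devices, size_t count) {
    assert(devices != NULL || count == 0);
    const mock_device *device = find_first(devices, count, MOCK_TYPE_GPU);
    if (device == NULL) {
        device = find_first(devices, count, MOCK_TYPE_CPU);
    }
    return device;
}
```

Keeping the file and kernel names in macros at the top of the source file means a project can be retargeted to a different kernel by editing two lines.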

Author Online
Nobody’s perfect. If I failed to convey my subject material clearly or (gasp) made a
mistake, feel free to add a comment through Manning’s Author Online system. You
can find the Author Online forum for this book by going to www.manning.com/OpenCLinAction
and clicking the Author Online link.
Simple questions and concerns get rapid responses. In contrast, if you’re unhappy
with line 402 of my bitonic sort implementation, it may take me some time to get back
to you. I’m always happy to discuss general issues related to OpenCL, but if you’re
looking for something complex and specific, such as help debugging a custom FFT, I
will have to recommend that you find a professional consultant.

About the cover illustration
The figure on the cover of OpenCL in Action is captioned a “Kranjac,” or an inhabitant
of the Carniola region in the Slovenian Alps. This illustration is taken from a recent
reprint of Balthasar Hacquet’s Images and Descriptions of Southwestern and Eastern Wenda,
Illyrians, and Slavs published by the Ethnographic Museum in Split, Croatia, in 2008.
Hacquet (1739–1815) was an Austrian physician and scientist who spent many years
studying the botany, geology, and ethnography of the Julian Alps, the mountain range
that stretches from northeastern Italy to Slovenia and that is named after Julius Caesar. Hand-drawn illustrations accompany the many scientific papers and books that
Hacquet published.
The rich diversity of the drawings in Hacquet’s publications speaks vividly of the
uniqueness and individuality of the eastern Alpine regions just 200 years ago. This was
a time when the dress codes of two villages separated by a few miles identified people
uniquely as belonging to one or the other, and when members of a social class or
trade could be easily distinguished by what they were wearing. Dress codes have
changed since then and the diversity by region, so rich at the time, has faded away. It is
now often hard to tell the inhabitant of one continent from that of another, and today the
inhabitants of the picturesque towns and villages in the Slovenian Alps are not readily
distinguishable from the residents of other parts of Slovenia or the rest of Europe.
We at Manning celebrate the inventiveness, the initiative, and the fun of the computer business with book covers based on costumes from two centuries ago brought
back to life by illustrations such as this one.

Part 1
Foundations of OpenCL programming

Part 1 presents the OpenCL language. We’ll explore OpenCL’s data structures
and functions in detail and look at example applications that demonstrate their
usage in code.
Chapter 1 introduces OpenCL, explaining what it’s used for and how it
works. Chapters 2 and 3 explain how host applications are coded, and chapters 4
and 5 discuss kernel coding. Chapters 6 and 7 explore the advanced topics of
image processing and event handling.
Chapters 8 and 9 discuss how OpenCL is coded in languages other than C,
such as C++, Java, and Python. Chapter 10 explains how OpenCL’s capabilities
can be used to develop large-scale applications.
