Learning Concurrency in Python

Speed up your Python code with clean, readable, and advanced concurrency techniques

Elliot Forbes

BIRMINGHAM - MUMBAI


Learning Concurrency in Python
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its
dealers and distributors will be held liable for any damages caused or alleged to be caused
directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2017
Production reference: 1140817
Published by Packt Publishing Ltd.
Livery Place
Livery Street
Birmingham
UK

ISBN 978-1-78728-537-8
www.packtpub.com


Credits
Author
Elliot Forbes

Copy Editor
Sonia Mathur

Reviewer
Nikolaus Gradwohl

Project Coordinator
Vaidehi Sawant

Commissioning Editor
Merint Mathew

Proofreader
Safis Editing

Acquisition Editor
Chaitanya Nair

Indexer
Francy Puthiry


Content Development Editor
Rohit Kumar Singh

Graphics
Abhinash Sahu

Technical Editors
Ketan Kamble

Production Coordinator
Nilesh Mohite


About the Author
Elliot Forbes has worked as a full-time software engineer at JPMorgan Chase for the last two
years. He graduated from the University of Strathclyde in Scotland in the spring of 2015
and worked as a freelancer developing web solutions while studying there.
He has worked with numerous technologies, such as GoLang, NodeJS, and plain old Java,
and he has spent years working on concurrent enterprise systems. It is with this
experience that he was able to write this book.
Elliot also completed a summer internship at Barclays Investment Bank in London and
has maintained a couple of software development websites for the last three years.


About the Reviewer
Nikolaus Gradwohl was born in 1976 in Vienna, Austria, and always wanted to become an
inventor like Gyro Gearloose. When he got his first Atari, he figured out that being a
computer programmer was the closest he could get to that dream. For a living, he has written
programs for nearly anything that can be programmed, ranging from 8-bit
microcontrollers to mainframes. In his free time, he likes to master programming
languages and operating systems.
Nikolaus authored the Processing 2: Creative Coding Hotshot book, and you can see some of
his work on his blog at http://www.local-guru.net.

www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a
print book customer, you are entitled to a discount on the eBook copy. Get in touch with us
at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a
range of free newsletters and receive exclusive discounts and offers on Packt books and
eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt
books and video courses, as well as industry-leading tools to help you plan your personal
development and advance your career.

Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser


Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial
process. To help us improve, please leave us an honest review on this book's Amazon page
at https://www.amazon.com/dp/.

If you'd like to join our team of regular reviewers, you can e-mail us at
customerreviews@packtpub.com. We award our regular reviewers with free eBooks and
videos in exchange for their valuable feedback. Help us be relentless in improving our
products!


Table of Contents
Preface
Chapter 1: Speed It Up!
History of concurrency
Threads and multithreading
What is a thread?
Types of threads

What is multithreading?

Processes
Properties of processes
Multiprocessing
Event-driven programming
Turtle
Breaking it down

Reactive programming
ReactiveX - RxPy
Breaking it down

GPU programming
PyCUDA
OpenCL
Theano
The limitations of Python
Jython
IronPython
Why should we use Python?
Concurrent image download
Sequential download
Breaking it down

Concurrent download
Breaking it down

Improving number crunching with multiprocessing
Sequential prime factorization
Breaking it down


Concurrent prime factorization
Breaking it down

Summary

Chapter 2: Parallelize It

Understanding concurrency
Properties of concurrent systems
I/O bottlenecks
Understanding parallelism
CPU-bound bottlenecks
How do they work on a CPU?
Single-core CPUs
Clock rate
Martelli model of scalability
Time-sharing - the task scheduler
Multi-core processors
System architecture styles
SISD
SIMD

MISD
MIMD
Computer memory architecture styles
UMA
NUMA
Summary

Chapter 3: Life of a Thread

Threads in Python
Thread state
State flow chart

Python example of thread state
Breaking it down

Different types of threads
POSIX threads
Windows threads


The ways to start a thread
Starting a thread
Inheriting from the thread class
Breaking it down

Forking
Example
Breaking it down

Daemonizing a thread
Example
Breaking it down

Handling threads in Python
Starting loads of threads
Example
Breaking it down

Slowing down programs using threads
Example
Breaking it down

Getting the total number of active threads
Example
Breaking it down


Getting the current thread
Example
Breaking it down

Main thread
Example
Breaking it down

Enumerating all threads
Example
Breaking it down

Identifying threads
Example
Breakdown

Ending a thread
Best practice in stopping threads
Example
Output

Orphan processes
How does the operating system handle threads
Creating processes versus threads
Example
Breaking it down

Multithreading models
One-to-one thread mapping
Many-to-one

Many-to-many
Summary

Chapter 4: Synchronization between Threads
Synchronization between threads
The Dining Philosophers
Example
Output

Race conditions
Process execution sequence

The solution


Critical sections
Filesystem
Life-critical systems

Shared resources and data races
The join method
Breaking it down
Putting it together

Locks
Example
Breaking it down

RLocks
Example
Breaking it down
Output

RLocks versus regular locks
Condition
Definition
Example
Our publisher
Our subscriber
Kicking it off
The results

Semaphores
Class definition

Example
The TicketSeller class
Output
Thread race

Bounded semaphores
Events
Example
Breaking it down

Barriers
Example
Breaking it down
Output

Summary

Chapter 5: Communication between Threads
Standard data structures
Sets
Extending the class
Exercise - extending other primitives

Decorator

Class decorator
Lists
Queues
FIFO queues
Example
Breaking it down
Output
LIFO queues
Example
Breaking it down
Output
PriorityQueue
Example
Breakdown
Output

Queue objects

Full/empty queues
Example
Output
The join() function
Example
Breakdown
Output

Deque objects
Example
Breakdown
Output

Appending elements
Example
Breaking it down
Output

Popping elements
Example
Breaking it down
Output

Inserting elements
Example
Breaking it down
Output

Rotation
Example

Breaking it down
Output

Defining your own thread-safe communication structures
A web Crawler example
Requirements

Design
Our Crawler class
Our starting point

Extending the queue object
Breaking it down
Output
Future enhancements
Conclusion
Exercise - testing your skills

Summary

Chapter 6: Debug and Benchmark

Testing strategies
Why do we test?
Testing concurrent software systems
What should we test?
Unit tests
PyUnit

Example
Output
Expanding our test suite

Unit testing concurrent code
Integration tests
Debugging
Make it work as a single thread
Pdb
An interactive example

Catching exceptions in child threads
Benchmarking
The timeit module
Timeit versus time
Command-line example
Importing timeit into your code

Utilizing decorators
Timing context manager
Output

Profiling
cProfile
Simple profile example

The line_profiler tool
Kernprof

Memory profiling

Memory profile graphs

Summary

Chapter 7: Executors and Pools

Concurrent futures
Executor objects
Creating a ThreadPoolExecutor
Example
Output
Context manager
Example
Output
Maps
Example
Output
Shutdown of executor objects
Example
Output

Future objects

Methods in future objects
The result() method
The add_done_callback() method
The .running() method
The cancel() method
The .exception() method
The .done() method

Unit testing future objects
The set_running_or_notify_cancel() method
The set_result() method
The set_exception() method

Cancelling callable
Example
Output

Getting the result
Example
Output

Using as_completed
Example
Output

Setting callbacks
Example
Output
Chaining callbacks


Exception classes
Example
Output

ProcessPoolExecutor
Creating a ProcessPoolExecutor

Example
Output

Context Manager
Example

Output

Exercise
Getting started

Improving the speed of computationally bound problems
Full code sample
Output

Improving our crawler
The plan
New improvements
Refactoring our code
Storing the results in a CSV file

Exercise - capture more info from each page crawl
concurrent.futures in Python 2.7
Summary

Chapter 8: Multiprocessing

Working around the GIL
Utilizing sub-processes
Example
Output

The life of a process
Starting a process using fork
Spawning a process
Forkserver
Daemon processes
Example
Breaking it down
Output

Identifying processes using PIDs
Example
Output

Terminating a process

Example

Getting the current process
Subclassing processes
Example
Output

Multiprocessing pools

The difference between concurrent.futures.ProcessPoolExecutor and Pool
Context manager
Example
Output

Submitting tasks to a process pool
Apply
Apply_async
Map
Map_async
Imap
Imap_unordered
Starmap
Starmap_async
Maxtasksperchild

Communication between processes
Pipes
Anonymous pipes
Named pipes

Working with pipes
Example


Handling Exceptions
Using pipes

Multiprocessing managers
Namespaces
Example

Queues
Example
Output

Listeners and clients
Example
The Listener class
The Client class
Output

Logging
Example

Communicating sequential processes
PyCSP
Processes in PyCSP
Output

Summary

Chapter 9: Event-Driven Programming
Event-driven programming


The event loop
Asyncio
Getting started
Event loops
The run_forever() method
The run_until_complete() method
The stop() method
The is_closed() method
The close() function

Tasks
Example
The all_tasks(loop=None) method
The current_tasks() function

The cancel() function

Task functions
The as_completed(fs, *, loop=None, timeout=None) function
The ensure_future(coro_or_future, *, loop=None) function
The wrap_future(future, *, loop=None) function
The gather(*coroes_or_futures, loop=None, return_exceptions=False) function
The wait() function

Futures
Example
Output

Coroutines
Chaining coroutines
Output

Transports
Protocols
Synchronization between coroutines
Locks
Queues
Events and conditions

Semaphores and BoundedSemaphores
Sub-processes
Debugging asyncio programs
Debug mode
Twisted
A simple web server example

Gevent
Event loops
Greenlets
Simple example-hostnames
Output

Monkey patching
Summary

Chapter 10: Reactive Programming


Basic reactive programming
Maintaining purity
ReactiveX, or RX
Installing RxPY
Observables
Creating observers
Example
Example 2
Breaking it down
Output

Lambda functions
Example
Breaking it down
On_next, on_completed, and on_error in lambda form
Output

Operators and chaining
Filter example
Breaking it down
Chained operators

The different operators
Creating observables
Transforming observables
Filtering observables
Error-handling observables


Hot and cold observables
Emitting events
Example
Breaking it down
Output

Multicasting
Example
Output

Combining observables
Zip() example
Output
The merge_all() operator
Output

Concurrency
Example
Output

PyFunctional

Installation and official docs
Simple example
Output

Streams, transformations, and actions
Filtering lists
Output

Reading/writing SQLite3
Compressed files
Parallel execution
Summary

Chapter 11: Using the GPU

Introduction to GPUs
Why use the GPU?
Data science
Branches of data science
Machine learning
Classification
Cluster analysis
Data mining

CUDA
Working with CUDA without a NVIDIA graphics card
PyCUDA
Features
Simple example
Kernels
GPU arrays
Numba
Overview
Features of Numba
LLVM

Cross-hardware compatibility
Python compilation space
Just-in-Time (JiT) versus Ahead-of-Time (Aot) compilation
The Numba process

Anaconda
Writing basic Numba Python programs
Compilation options
nopython
nogil
The cache option
The parallel option
Issues with Numba

Numba on the CUDA-based GPUs
Numba on AMD APUs
Accelerate
Theano
Requirements
Getting started
Very simple example
Adding two matrices
Fully-typed constructors

Using Theano on the GPU
Example

Leveraging multiple GPUs
Defining the context map
Simple graph example


PyOpenCL
Example
Output

Summary

Chapter 12: Choosing a Solution

Libraries not covered in this book

GPU
PyGPU

Event-driven and reactive libraries
Tornado
Flask
Celery

Data science
Pandas
Matplotlib
TensorFlow

Designing your systems
Requirements
Functional requirements
Non-functional requirements

Design
Computationally expensive
Event-heavy applications
I/O-heavy applications

Recommended design books
Software Architecture with Python
Python: Master the Art of Design Patterns

Research
Summary


Index


Preface
Python is a very high-level, general-purpose language that features a large number of
powerful high-level and low-level libraries and frameworks that complement its delightful
syntax. This easy-to-follow guide teaches you new practices and techniques to optimize
your code and then moves on to more advanced ways to effectively write efficient Python
code. Small and simple practical examples will help you test the concepts introduced, and
you will be able to easily adapt them to any application.
Throughout this book, you will learn to build highly efficient, robust, and concurrent
applications. You will work through practical examples that will help you address the
challenges of writing concurrent code, and you will also learn to improve the overall speed
of execution in multiprocessor and multicore systems and keep them highly available.

What this book covers
Chapter 1, Speed It Up!, helps you get to grips with threads and processes, and you'll also
learn about some of the limitations and challenges of Python when it comes to
implementing your own concurrent applications.
Chapter 2, Parallelize It, covers a multitude of topics including the differences between
concurrency and parallelism. We will look at how they both leverage the CPU in different
ways, and we also branch off into the topic of computer system design and how it relates to
concurrent and parallel programming.
Chapter 3, Life of a Thread, delves deeply into the workings of Python's native threading
library. We'll look at the numerous different thread types. We'll also look in detail at
various concepts such as the multithreading model and the numerous ways in which we
can map user threads to their lower-level siblings, the kernel threads.

Chapter 4, Synchronization between Threads, covers the various key issues that can impact
our concurrent Python applications. We will delve into the topic of deadlocks and the
famous "dining philosophers" problem and see how this can impact our own software.
Chapter 5, Communication between Threads, discusses quite a number of different
mechanisms that we can employ to implement communication in our multithreaded
systems. We delve into the thread-safe queue primitives that Python features natively.


Chapter 6, Debug and Benchmark, takes a comprehensive look at some of the techniques that
you can utilize in order to ensure your concurrent Python systems are as free as practically
possible from bugs before they plague your production environment. We will also cover
testing strategies that help to ensure the soundness of your code's logic.
Chapter 7, Executors and Pools, covers everything that you need to get started with thread
pools, process pools, and future objects. We will look at the various ways in which you can
instantiate your own thread and process pools as well as the advantages of using thread and
process pool executors over traditional methods.
Chapter 8, Multiprocessing, discusses multiprocessing and how it can be utilized within our
systems. We will follow the life of a process from its creation all the way through to its
timely termination.
Chapter 9, Event-Driven Programming, covers the paradigm of event-driven programming
before covering how asyncio works and how we can use it for our own event-driven Python
systems.
Chapter 10, Reactive Programming, covers some of the key principles of reactive
programming. We will look at the key differences between reactive programming and
typical event-driven programming and delve more deeply into the specifics of the very
popular RxPY Python library.
Chapter 11, Using the GPU, covers some of the more realistic scenarios that data scientists
typically encounter and why these are ideal scenarios for us to leverage the GPU wrapper
libraries.
Chapter 12, Choosing a Solution, briefly discusses some libraries that are not covered in this
book. We'll also take a look at the process that you should follow in order to effectively
choose which libraries and programming paradigms you leverage for your Python software
projects.
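
As a taste of the thread pools, future objects, and locks that the chapters listed above describe, here is a minimal sketch using only the standard library. The names run_pool and square are hypothetical and chosen for illustration; they are not taken from the book's own examples.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_pool(values, max_workers=4):
    """Square each value on a thread pool; return (results, tasks completed)."""
    completed = 0
    lock = threading.Lock()  # guards the shared counter below

    def square(n):
        # Hypothetical task: record completion under the lock, then compute.
        nonlocal completed
        with lock:
            completed += 1
        return n * n

    results = {}
    # The context manager shuts the executor down once all tasks finish.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the input value that produced it.
        futures = {executor.submit(square, v): v for v in values}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results, completed


squares, done = run_pool(range(5))
print(sorted(squares.items()))  # prints [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
print("tasks completed:", done)  # prints tasks completed: 5
```

The futures may finish in any order, which is why the results are keyed by input value and sorted before printing. For CPU-bound work, ProcessPoolExecutor offers a similar interface, although tasks submitted to it must be picklable, so a nested function like square would need to move to module level.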

What you need for this book
For this book, you will need the following software installed on your systems:
Beautiful Soup
RxPy
Anaconda
Theano
PyOpenCL


Who this book is for
This book is for Python developers who would like to get started with concurrent
programming. You are expected to have a working knowledge of the Python language, as
this book will build on its fundamental concepts.

Conventions
In this book, you will find a number of text styles that distinguish between different kinds
of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions,
pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can
include other contexts through the use of the include directive."
A block of code is set as follows:
import urllib.request
import time

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
pageHtml = req.read()
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

When we wish to draw your attention to a particular part of a code block, the relevant lines
or items are set in bold:
import urllib.request
import time

t0 = time.time()
req = urllib.request.urlopen('')
pageHtml = req.read()
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

Any command-line input or output is written as follows:
pip install rx
