Reinforcement Learning with TensorFlow

A beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym

Sayon Dutta

BIRMINGHAM - MUMBAI

Reinforcement Learning with TensorFlow
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means,
without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the
information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its
dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the
appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Viraj Madhav
Content Development Editor: Aaryaman Singh, Varun Sony
Technical Editor: Dharmendra Yadav
Copy Editors: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Shantanu Zagade

First published: April 2018
Production reference: 1200418
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78883-572-5
www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as
industry-leading tools to help you plan your personal development and advance your career. For more
information, please visit our website.

Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content

PacktPub.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files
available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you
are entitled to a discount on the eBook copy. Get in touch with us at for more
details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free
newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author
Sayon Dutta is an artificial intelligence researcher and developer. A graduate of IIT Kharagpur,
he owns the software copyright for Mobile Irrigation Scheduler. At present, he is an AI engineer at
Wissen Technology. He co-founded an AI startup, Marax AI Inc., focused on AI-powered customer
churn prediction. With over 2.5 years of experience in AI, he invests most of his time implementing
AI research papers for industrial use cases, and weightlifting.
I would like to extend my gratitude to Maa and Baba for everything, especially for teaching me that life is all about hustle and that the key
to enjoyment is getting used to it; and to my brothers Arnav, Kedia, Rawat, Abhishek Singh, and Garg for helping me through my lowest
times. Thanks to the Packt team, especially Viraj for reaching out, and Aaryaman and Varun for guiding me throughout.
Thanks to the AI community and my readers.

About the reviewer
Narotam Singh has been with the Indian Meteorological Department, Ministry of Earth Sciences, India,
since 1996. He has been actively involved with various technical programs and with training Government of India (GoI)
officers in IT and communication. He completed his postgraduate (PG) degree in electronics in 1996, and a diploma and a PG diploma in
computer engineering in 1994 and 1997, respectively. He works in the enigmatic field of neural
networks, deep learning, and machine learning app development for iOS with Core ML.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today.
We have worked with thousands of developers and tech professionals, just like you, to help them
share their insight with the global tech community. You can make a general application, apply for a
specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents
Title Page
Copyright and Credits
Reinforcement Learning with TensorFlow
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews

1. Deep Learning – Architectures and Frameworks
Deep learning
Activation functions for deep learning
The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function
Logistic regression as a neural network
Notation
Objective
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent
What is Xavier initialization?
Why do we use Xavier initialization?
The neural network model
Recurrent neural networks
Long Short-Term Memory Networks
Convolutional neural networks
The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model
Limitations of deep learning
The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning
Reinforcement learning
Basic terminologies and conventions
Optimality criteria
The value function for optimality
The policy model for optimality
The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic
Introduction to TensorFlow and OpenAI Gym
Basic computations in TensorFlow
An introduction to OpenAI Gym
The pioneers and breakthroughs in reinforcement learning
David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Libratus
Summary

2. Training Reinforcement Learning Agents Using OpenAI Gym
The OpenAI Gym
Understanding an OpenAI Gym environment
Programming an agent using an OpenAI Gym environment
Q-Learning
The Epsilon-Greedy approach
Using the Q-Network for real-world applications
Summary

3. Markov Decision Process
Markov decision processes
The Markov property
The S state set
Actions
Transition model
Rewards
Policy
The sequence of rewards - assumptions
The infinite horizons
Utility of sequences
The Bellman equations
Solving the Bellman equation to find policies
An example of value iteration using the Bellman equation
Policy iteration
Partially observable Markov decision processes
State estimation
Value iteration in POMDPs
Training the FrozenLake-v0 environment using MDP
Summary

4. Policy Gradients
The policy optimization method
Why policy optimization methods?
Why stochastic policy?
Example 1 - rock, paper, scissors
Example 2 - state aliased grid-world
Policy objective functions
Policy Gradient Theorem
Temporal difference rule
TD(1) rule
TD(0) rule
TD(λ) rule
Policy gradients
The Monte Carlo policy gradient
Actor-critic algorithms
Using a baseline to reduce variance
Vanilla policy gradient
Agent learning Pong using policy gradients
Summary

5. Q-Learning and Deep Q-Networks
Why reinforcement learning?
Model-based learning and model-free learning
Monte Carlo learning
Temporal difference learning
On-policy and off-policy learning
Q-learning
The exploration-exploitation dilemma
Q-learning for the mountain car problem in OpenAI Gym
Deep Q-networks
Using a convolutional neural network instead of a single-layer neural network
Use of experience replay
Separate target network to compute the target Q-values
Advancements in deep Q-networks and beyond
Double DQN
Dueling DQN
Deep Q-network for mountain car problem in OpenAI Gym
Deep Q-network for Cartpole problem in OpenAI Gym
Deep Q-network for Atari Breakout in OpenAI Gym
The Monte Carlo tree search algorithm
Minimax and game trees
The Monte Carlo Tree Search
The SARSA algorithm
SARSA algorithm for mountain car problem in OpenAI Gym
Summary

6. Asynchronous Methods
Why asynchronous methods?
Asynchronous one-step Q-learning
Asynchronous one-step SARSA
Asynchronous n-step Q-learning
Asynchronous advantage actor-critic
A3C for Pong-v0 in OpenAI Gym
Summary

7. Robo Everything – Real Strategy Gaming
Real-time strategy games
Reinforcement learning and other approaches
Online case-based planning
Drawbacks to real-time strategy games
Why reinforcement learning?
Reinforcement learning in RTS gaming
Deep autoencoder
How is reinforcement learning better?
Summary

8. AlphaGo – Reinforcement Learning at Its Best
What is Go?
Go versus chess
How did Deep Blue defeat Garry Kasparov?
Why is the game tree approach no good for Go?
AlphaGo – mastering Go
Monte Carlo Tree Search
Architecture and properties of AlphaGo
Energy consumption analysis – Lee Sedol versus AlphaGo
AlphaGo Zero
Architecture and properties of AlphaGo Zero
Training process in AlphaGo Zero
Summary

9. Reinforcement Learning in Autonomous Driving
Machine learning for autonomous driving
Reinforcement learning for autonomous driving
Creating autonomous driving agents
Why reinforcement learning?
Proposed frameworks for autonomous driving
Spatial aggregation
Sensor fusion
Spatial features
Recurrent temporal aggregation
Planning
DeepTraffic – MIT simulator for autonomous driving
Summary

10. Financial Portfolio Management
Introduction
Problem definition
Data preparation
Reinforcement learning
Further improvements
Summary

11. Reinforcement Learning in Robotics
Reinforcement learning in robotics
Evolution of reinforcement learning
Challenges in robot reinforcement learning
High dimensionality problem
Real-world challenges
Issues due to model uncertainty
What's the final objective a robot wants to achieve?
Open questions and practical challenges
Open questions
Practical challenges for robotic reinforcement learning
Key takeaways
Summary

12. Deep Reinforcement Learning in Ad Tech
Computational advertising challenges and bidding strategies
Business models used in advertising
Sponsored-search advertisements
Search-advertisement management
AdWords
Bidding strategies of advertisers
Real-time bidding by reinforcement learning in display advertising
Summary
