Algorithmic Trading: Game-theoretic and Simulation Approach to Reinforcement Learning bot

Bui Ngoc Duc

Abstract:
Keywords: data mining, game theory, policy-making process, reinforcement learning

1. Chapter 1: Introduction
1.1. Problem statement
Trading stocks on the stock market is one of the major investment activities. In the past, investors developed a number of stock analysis methods to help them predict the direction of stock price movements. Modelling and predicting a stock's future price, based on current financial information and news, is of enormous use to investors. Investors want to know whether a stock will rise or fall over a certain period of time. To predict how a company in which they want to invest would perform in the future, they developed a number of analysis methods based on current and past financial data and other information about the company. Financial balance sheets and the various ratios that describe the health of a company are the basis of the fundamental analysis that investors undertake to analyze and predict a company's future stock price. Predicting the direction of stock prices is particularly important for value investing.
Experienced analysts can apply mathematical models, validated on past data, to evaluate a company's intrinsic value. However, markets do not remain stable, and indicators that have strong predictive value over one period may cease to generate excess returns as soon as market conditions change. New investment strategies and new technology have been introduced, making some of the old models obsolete. As financial literacy has risen, there are more market participants than ever. Two measures have been proposed to counter this evolving market behavior. First, some trading systems are based on genetic algorithms that transform the indicators used as attributes over time [6] [28]. Second, and more commonly, the data set is fit to nonlinear models using machine learning algorithms such as Artificial Neural Networks [10].

The introduction of algorithms to trading has definitely changed the stock market. Algorithms make it easy to react quickly to events on the stock market. Machine learning algorithms also enable analysts to create models for predicting stock prices much more easily, since new models can be developed from past data. One piece of evidence is that AI funds have outperformed their peers while providing downside protection, according to a Eurekahedge report.


[Table omitted] The table compares AI funds to the average hedge fund and to systematic CTA/managed futures strategies, which can be considered a rough approximation of the average quant fund. Source: Eurekahedge.

Motivated by the successful performance of AI funds, this thesis introduces a method for creating an artificial agent that trades on the stock market using stock prices and several machine learning algorithms.

1.2. Objective of research
The monetary motivation behind buying and selling stocks at profitable positions is a key driver of this research. Our main hypothesis is that, by applying machine learning trained on past data, it is possible to predict stock price movements from market patterns, and then to apply algorithms to create a profitable trading agent. We use the agent's Profit and Loss (PnL) over the test period to judge its profitability. We conduct simulations to examine whether the agent is profitable on different data sets (seen and unseen), then calculate the agent's average PnL.
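As a minimal illustration of this evaluation metric (not the thesis's implementation), the sketch below computes the PnL of a position series and averages it over several simulated runs; the function names and the random stand-in policy are our own assumptions.

```python
import numpy as np

def pnl(prices, positions):
    """Profit and loss of a position series.

    prices: array of prices p_0 .. p_T
    positions: positions held over each interval (+1 long, 0 flat, -1 short), length T
    """
    returns = np.diff(prices)              # price change per step
    return float(np.sum(positions * returns))

# Average PnL over several simulated price paths (hypothetical data).
rng = np.random.default_rng(0)
pnls = []
for _ in range(100):
    prices = 100 + np.cumsum(rng.normal(0, 1, size=251))
    positions = rng.choice([-1, 0, 1], size=250)   # stand-in for the agent's policy
    pnls.append(pnl(prices, positions))
print("average PnL:", np.mean(pnls))
```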

1.3. Scope of the research
This thesis provides only an elementary introduction to algorithmic trading, with game theory as the framework for the market environment. The game environment is kept simple: we assume that the responses of other participants to our agent's strategies are reflected in the stock price movement. Moreover, the algorithms used to create and train the agent draw on the machine learning libraries Scikit-learn and Keras. The algorithms and functions used are explained in the Appendix of this thesis.

1.4. Overview
The thesis is organized in the following manner:
• Chapter 1 states the motivation for writing this thesis and the objectives and scope of the research.
• Chapter 2 provides the background of the Efficient Market Hypothesis (EMH) and its contradictions, as well as work relevant to this topic.
• Chapter 3 establishes the game-theoretic framework for describing the market, the simulation approach, and the algorithms.
• Chapter 4 describes the methods of data collection and data processing, and the implementation and simulation of different variants of the model.
• Chapter 5 is the last chapter, where we discuss the final results of our agent, explain the limitations of our research, and state future improvements.


2. Chapter 2: Literature review
This section begins with a background to efficient markets and then gives a brief review of
previous empirical studies that use machine learning algorithms to construct trading strategies.

2.1. Efficient Markets
One of the strongest objections to the existence of profitable trading strategies is founded on the ideas of the Efficient Market Hypothesis (EMH). Since EMH implies that our search for consistently profitable trading strategies is futile, we first give an overview of EMH and then show the empirical results that contradict this theory.
EMH states that the current market price reflects the assimilation of all the information available [13]. That is, its proponents argue that since stocks always trade at their fair value on stock exchanges, it is impossible to outperform the overall market through expert stock selection or market timing. Any new information is quickly integrated into the market price. Fama formalized the concept of efficient markets in 1970 by expressing the non-predictability of market prices:

E(p_j,t+1 | Φ_t) = [1 + E(r_j,t+1 | Φ_t)] p_j,t

Where:
• p_j,t is the price of security j at time t;
• r_j,t+1 is the one-period percentage return (p_j,t+1 − p_j,t) / p_j,t; and
• Φ_t is the information set reflected at time t.


Based on this expectation expression, Fama argues that there is no possibility of finding excess market returns via market timing based solely on the information in Φ_t, hence dispelling the possibility of trading strategies based on technical indicators.
On the other hand, despite the theoretically sound nature of EMH, research over the last 30
years has shown that several assumptions made in EMH may be unrealistic. First, a fundamental
assumption is that investors behave rationally, or that the deviations of the many irrational
investors cancel out. However, some research has shown that investors are not strictly rational
[41], or devoid of biases [20]. Indeed, people with a conservatism bias tend to underweight new
information. Moreover, experiments have shown that these biases tend to be systematic and that
deviations do not cancel each other out [21]. This leads to over- and under-reaction to news
events.
Since the 1990s, the literature has seen the gradual decline of the EMH and the emergence of behavioral finance. Behavioral finance views the market as an aggregate of human actions filled with imperfect and inefficient decisions. Under this theory, the financial markets are a reflection of human desires, goals, motivations, errors and overconfidence [40]. An alternative to EMH that has gained traction is the Adaptive Market Hypothesis, which posits that profit opportunities from inefficiencies exist in financial markets but are eroded away as knowledge of the inefficiency spreads through the public and the public capitalizes on the opportunities. On this view of financial markets, many have built evolutionary and/or non-linear models and demonstrated that excess returns can be attained on out-of-sample data.


2.2. Previous Research
Because of their ability to model nonlinear relationships without pre-specification during the
modeling process, neural networks (NNs) have become a popular method in financial time-series
forecasting. NNs also offer huge flexibility in the type of architecture of the model, in terms of
number of hidden nodes and layers. Indeed, Pekkaya and Hamzacebi compare the results from
using a linear regression versus a NN model to forecast macro variables and show that the NN
gives much better results [35].
Many studies have used NNs and shown promising results in the financial markets. Grudnitski and Osburn implemented NNs to forecast S&P 500 and gold futures price directions and found they were able to correctly predict the direction of monthly price changes 75% and 61% of the time, respectively [15]. Another study showed that a NN-based model leads to higher arbitrage profits compared to cost-of-carry models. Phua, Ming and Lin implement a NN using Singapore's stock market index and show a forecasting accuracy of 81% [36]. Similarly, NN models applied to weekly forecasting of Germany's FAZ index find favorable predictive results compared to conventional statistical approaches [14].
More recently, NNs have been augmented or adapted to improve performance on financial time series forecasting. Shaoo et al. show that cascaded functional link artificial neural networks (CFLANN) perform best in FX markets [39]. Egrioglu et al. introduce a new method based on feed-forward artificial neural networks to analyze multivariate high-order fuzzy time series forecasting models [12]. Liao and Wang used a stochastic time-effective neural network model to show predictive results on global stock indices. Bildirici and Ersin combined NNs with ARCH/GARCH and other volatility-based models to produce a model that outperformed ANNs or GARCH-based models alone. Moreover, Yudong and Lenan used back-trial chemotaxis
optimization (BCO) and a back-propagation NN on the S&P 500 index and conclude that their hybrid model (IBCO-BP) offers less computational complexity, better prediction accuracy and less training time.
Another popular machine learning classification technique that does not require any domain knowledge or parameter setting is the decision tree. It also often offers a more visually interpretable model compared to a NN, as the nodes in the tree can be easily understood. The simplest type of decision tree model is the classification and regression tree (CART). Sorensen et al. show that CART decision trees perform better than single-factor models based on the same variables in picking stock portfolios [42]. Wang and Chan use a two-layer bias decision tree to predict the daily stock prices of Microsoft, Intel and IBM, finding excess returns compared to a buy-and-hold method [43]. Another study found that a boosted alternating decision tree with expert weighting generated abnormal returns for the S&P 500 index during the test period [11]. To improve accuracy, some studies used the random forest algorithm for classification, which will be further discussed in chapter 4. Namely, Booth et al. show that a recency-weighted ensemble of random forests produced superior results when analyzed on a large sample of stocks from the DAX, in terms of both profitability and prediction accuracy, compared with other ensemble techniques [7]. Similarly, a gradient boosted random forest model applied to Singapore's stock market was able to generate excess returns compared with a buy-and-hold strategy [37]. Some recent research combines decision tree analysis with evolutionary algorithms to allow the model to adapt to changing market conditions. Hsu et al. present constraint-based evolutionary classification trees (CECT) and show strong predictability of a company's financial performance [16].


Support Vector Machines (SVMs) are also often used to predict market behavior. Huang et al. compare SVM with other classification methods (random walk, linear discriminant analysis, quadratic discriminant analysis and Elman backpropagation neural networks) and find that SVM performs best in forecasting weekly movements of the Nikkei 225 index [17]. Similarly, Kim compares SVM with NN and case-based reasoning (CBR) and finds that SVM outperforms both in forecasting the daily direction of change in the Korea Composite Stock Price Index (KOSPI) [23]. Likewise, Yang et al. use a margin-varying Support Vector Regression model and show empirical results with good predictive value for the Hang Seng Index [46]. Nair et al. propose a genetic-algorithm-optimized decision tree–support vector machine hybrid, validate its performance on the BSE-Sensex, and find that its predictive accuracy is better than that of both a NN and a Naive Bayes based model [31].
While some studies have tried to compare various machine learning algorithms against each other, the results have been inconsistent. Patel et al. compare four prediction models (NN, SVM, random forest and naive Bayes) and find that, over a ten-year period on various indices, the random forest model performed best. However, Ou and Wang examine the performance of ten machine learning classification techniques on the Hang Seng Index and found that the SVM outperformed the other models [33]. Kara et al. compared the performance of NN versus SVM on the daily Istanbul Stock Exchange National 100 Index and found that the average performance of the NN model (75.74%) was significantly better than that of the SVM model (71.52%) [22].
Machine learning research has focused on predictive modeling. However, creating an agent in a dynamic environment that is able to learn and improve its policy during training requires another branch of machine learning, reinforcement learning, in which an
agent is trained to find optimal policies and maximize its reward. But that is an isolated way to think about the trading environment: what if there are other agents in the world? Indeed, evidence suggests that other agents do exist in the world alongside ours. Thus game theory, the mathematics of conflict between participants, is the missing piece needed to complete the model of the market. Eric Engle et al. [note] provided the theoretical ideas of combining game theory and machine learning in an agent-based approach to stocks, but lacked implementation results.

3. Chapter 3: Theoretical review
In the first part of this chapter, we lay out the foundations of game theory. It begins by formalizing the basic definitions necessary to speak correctly about games and game-plays, and then presents the standard representations of games. This background in game theory is essential for finding rational responses and for general reasoning about games; the mathematical formalization in this chapter is inspired by [16]. In the later part of the chapter, we describe how game theory is applied to create a decision-making agent in the stock market environment, along with the difficulties of the traditional game theory approach and the need for a simulation approach and algorithms.

Game theory framework
Game theory is a part of applied mathematics that studies strategic decision making. It uses mathematical models to formulate interactions between intelligent, rational decision-makers. These interactions are called games.


11


[Type here]

[Type here]

Bui Ngoc Duc

Game
Games are played within a game environment (also called a world) and are composed of a system of rules, which defines the players and the actions and postulates the dynamics of the game. (The difference between games and game environments is sometimes omitted; it is, however, useful to distinguish them, especially in the context of general game playing. This is further explained in chapter 4.) A game is called a puzzle if there is no more than one agent involved; otherwise it is a conflict [18].

Definition 2.1. Player
A player (or an agent) is an entity able to act. His activities alter the world in which he exists.
The concept of a game consists of active and passive elements. Passive elements represent the information, i.e. which actions are feasible for a particular agent in a given state, or how the game will evolve under certain conditions and actions taken. The active elements of the game are the players. Without the players, the game remains static; only their actions can manipulate the game.
Definition 2.2. Action
An action (or a move) is a change in the game caused by a player in a particular situation.
A valid game environment enables all agents to act and to be immediately aware of their actions. Their activity can change the current situation as a consequence of their decision making. The different situations which can occur before the game terminates are called states of the game.
12


[Type here]

[Type here]

Bui Ngoc Duc

Every game begins in a root state and then progresses according to the game dynamics, as participating agents make their decisions. All rational players select their actions to achieve their goals. The theory of utility was established to recognize the effects of their behavior and to evaluate the situations in which the agents find themselves. Utility is a value which measures the usefulness of the current state of the game for each player.
Definition 2.3. Utility
Let S be a set with a weak preference ordering ≤. A utility (or outcome) is a cardinal element e ∈ S representing the motivation of players. The function u is said to be a utility function iff ∀x, y ∈ S: u(x) ≤ u(y) ⇔ x ≤ y.
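As a small illustration of this order-preservation property, the hedged sketch below checks whether a candidate function is a utility function for a finite preference ordering; the outcomes and ranks are hypothetical, chosen only for illustration.

```python
from itertools import product

# Preferences over outcomes encoded as ranks (higher = more preferred).
preference_rank = {"lose": 0, "draw": 1, "win": 2}

def is_utility_function(u, prefs):
    """Check that u(x) <= u(y) <=> x <= y for every ordered pair of outcomes."""
    return all(
        (u[x] <= u[y]) == (prefs[x] <= prefs[y])
        for x, y in product(prefs, repeat=2)
    )

u = {"lose": -10.0, "draw": 0.0, "win": 5.0}     # candidate utility function
print(is_utility_function(u, preference_rank))    # True: the ordering is preserved
```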
Altogether, a mathematical game is a structure which conclusively defines the whole game and its development.
Definition 2.4. Game
A game is a tuple (P, A, u), where:
• P = {p_1, ..., p_n} is a set of players;
• A = A_1 × ... × A_n is a set of sets of available actions for each player; and
• u: A → R^n is a utility function.
This general definition of a game expects all players to act simultaneously in just one round, after which the game ends. Nevertheless, the end of a game in finite time is guaranteed only in so-called finite games, meaning that at some point they terminate and the utilities are assigned. All finite games have starting and terminal states. In these games the number of players is finite, as well as the number of permitted actions for each player. An agent can face only finitely many situations in a finite game, and the game-play cannot go on indefinitely [19].
13


[Type here]

[Type here]

Bui Ngoc Duc

Agents’ strategies
When there is more than a single agent in the environment, the whole game changes in accordance with the activity of all players. In this setting the outcome depends not only on the actions of one particular agent, but on the behavior of all of them. Strategies can be seen as contingency plans, or policies, for playing the game. In every situation, an agent's reaction is defined by his strategy.
This approach is certainly rational enough in puzzles, where there is only one agent to set the course of the world. In contrast, in environments with a greater number of other players it is preferable to randomize over the set of pure strategies, following a selected probability distribution. Rather than a strategy, the randomization of decisions can sometimes be seen as a belief of an agent that he can profit from playing a given action. This kind of strategy is called mixed. Playing a mixed strategy ensures that every other agent can only guess what will happen; compared to pure strategies, the outcome is now less predictable.
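To make the notion of a mixed strategy concrete, here is a minimal sketch that plays a mixed strategy by sampling pure strategies from a chosen probability distribution; the action names and probabilities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

pure_strategies = ["buy", "hold", "sell"]   # hypothetical pure strategies
mixed_strategy = [0.25, 0.50, 0.25]         # probability assigned to each

# Each round the agent commits to a pure strategy drawn from the mix,
# so opponents can only guess which action will actually be played.
plays = rng.choice(pure_strategies, size=10, p=mixed_strategy)
print(plays)
```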
Optimal strategy
Game theory was originally established to solve a simple question: what is an optimal reaction? How should an agent react to be the most likely to win the game? The answer is that the fundamental advantage for a player is information about the strategies of his opponents. In other words, once an agent is able to guess the next action of any other agent, he can deliberately follow a strategy which maximizes his terminal utility. In conclusion, the set of all optimal strategies (meaning the strategies with the highest equal expected utility) of a rational, well-informed agent p_i is absolutely decided by the strategies of the others.
14


[Type here]

[Type here]

Bui Ngoc Duc

Definition (Best response)
A strategy s_i of agent p_i in game G is a best response to the strategies s_−i of the other players iff u_i(s_i, s_−i) ≥ u_i(s'_i, s_−i) for every other available strategy s'_i.
Unfortunately, in most cases the information about the opponents' strategies is out of reach, or obtaining it is impossible in the sense of computational complexity. Another possibility is to estimate the strategies, e.g. from the previous actions of other players, and consecutively adjust one's own.
Definition 2.5. Nash equilibrium (NE)
Given a game G and strategies s = (s_1, ..., s_n), players P are in Nash equilibrium iff every s_i is a best response to s_−i.
If the state of the world allows no one to benefit from changing his strategy, the situation remains stable. It has been proved that in every game with finitely many players and a finite set of pure strategies, there is at least one Nash equilibrium profile, although it might consist of mixed strategies [22].
Game representations
There are a number of different representations of games. The simplest one was presented at the beginning of this part. Although the general definition is sufficient for the mathematical apparatus, for concrete game examples it is more convenient to establish standard forms and structures for working with the game data. Different representations extend the general definition, allowing various games to express their specific aspects in a more suitable form. Algorithms for finding Nash equilibria can be adapted to a particular representation to reduce computational complexity. Several representations of games exist, taking into account
stochasticity, number of players and decision points, possibility of cooperation and other
important characteristics of the game.
Normal form
Normal (or strategic) form is the basic type of game representation. Each player moves once and actions are chosen simultaneously. This makes the model simpler than other forms and easier to solve for Nash equilibria, but it lacks any temporal structure.
The most famous normal-form game is the Prisoner's Dilemma, which is described as follows:
Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they hope to get both sentenced to a year in prison on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain: each is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The offer is:
• If X and Y each betray the other, each of them serves 5 years in prison.
• If X betrays Y but Y remains silent, X will be set free and Y will serve 20 years in prison (and vice versa).
• If X and Y both remain silent, both of them will serve only 1 year in prison (on the lesser charge).



An example of a Prisoner's Dilemma game (years in prison for X, Y):

                    Y betrays       Y remains silent
X betrays           5, 5            0, 20
X remains silent    20, 0           1, 1

From this example we can observe that mutual betrayal (both confess) is the Nash equilibrium of this game, because neither player has an incentive to change his option unilaterally.
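The sketch below encodes this payoff matrix (as negative years in prison, so higher is better) and finds the pure-strategy Nash equilibria by checking best responses; it is a minimal illustration, not part of the thesis's implementation.

```python
from itertools import product

# Payoffs as (utility_X, utility_Y); utilities are negative years in prison.
actions = ["betray", "silent"]
payoff = {
    ("betray", "betray"): (-5, -5),
    ("betray", "silent"): (0, -20),
    ("silent", "betray"): (-20, 0),
    ("silent", "silent"): (-1, -1),
}

def is_nash(ax, ay):
    """A profile is a Nash equilibrium if neither player gains by deviating."""
    ux, uy = payoff[(ax, ay)]
    best_x = all(payoff[(dx, ay)][0] <= ux for dx in actions)
    best_y = all(payoff[(ax, dy)][1] <= uy for dy in actions)
    return best_x and best_y

equilibria = [p for p in product(actions, repeat=2) if is_nash(*p)]
print(equilibria)  # [('betray', 'betray')]
```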
Extensive form
The extensive form models multi-agent sequential decision making. A convenient representation of an extensive-form game is a game tree. Such a structure can express even complicated branching of the game, restricting the actions in different game states to the feasible ones only.
Definition 2.6. Game tree
Every game tree is a tuple (S, Z, A, e, f, r) where:
• S is a set of game states;
• Z ⊆ S is a set of terminal states;
• A is a set of game actions;
• e is an expander function, e: s ∈ S → {a ∈ A | a is executable in s};
• f is a successor function, f: (s ∈ S × a ∈ e(s)) → t ∈ S; and
• r ∈ S is the root state.
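A minimal way to hold this structure in code is sketched below; the class and field names are our own, chosen to mirror the tuple (S, Z, A, e, f, r) in Definition 2.6, and the tiny example tree is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

State = str
Action = str

@dataclass
class GameTree:
    states: Set[State]                            # S
    terminal: Set[State]                          # Z, a subset of S
    actions: Set[Action]                          # A
    expander: Callable[[State], List[Action]]     # e(s): actions executable in s
    successor: Callable[[State, Action], State]   # f(s, a): resulting state
    root: State                                   # r

# A one-move toy tree: the root branches on "up"/"down" into terminal states.
edges: Dict[Tuple[State, Action], State] = {
    ("r", "up"): "t1",
    ("r", "down"): "t2",
}
tree = GameTree(
    states={"r", "t1", "t2"},
    terminal={"t1", "t2"},
    actions={"up", "down"},
    expander=lambda s: [a for (t, a) in edges if t == s],
    successor=lambda s, a: edges[(s, a)],
    root="r",
)
print(tree.expander("r"), tree.successor("r", "up"))  # ['up', 'down'] t1
```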
Using the notion of a game tree, it is now possible to define an extensive-form game. This representation consists of a game tree; a set of players, who are assigned to the states of the tree; and a utility function, which determines the utility in every terminal state, i.e. in every leaf of the game tree.
Definition 2.7. Extensive-form games
A game in extensive form is a tuple (P, T, b, u), where:
• P is a set of players;
• T is a game tree;
• b is a player belonging function, assigning a player to each non-terminal state of the tree; and
• u is a utility function, defined on the terminal states.
The extensive form was originally designed for sequential games, where players take their actions one by one. Game trees in these games provide a suitable way to visualize the game-play. This representation is also more complex than the normal form.
In the example of Matching Pennies in extensive form, the second player can always make her choice dependent on the first player's choice: if the first player selects Heads, she will select Tails, and if the first player selects Tails, she will select Heads. Paired with either of the two pure strategies of the first player, this gives a Nash equilibrium in pure strategies.

[Figure: An example of an extensive-form game – Matching Pennies.]

Stochastic games (Markov games)
Arguably most, if not all, real-world systems are influenced by events of a probabilistic nature. Shapley (1953) was the first to define a game model that incorporates probabilistic choices.
Definition 2.8. Stochastic games
According to Shapley, a stochastic game is a tuple (S, A, T, R, γ, N), where:
• S is the set of states of the game;
• A_i is the set of available actions for player i, and A = A_1 × ... × A_N is the set of joint actions;
• T is the transition function: at state s, if player i chooses action a_i and the others choose their actions simultaneously, T gives the probability of reaching each next state s';
• R is the reward function for the players for taking the chosen actions;
• γ is the discount factor; and
• N is the number of players.
Shapley games are played by a finite number of players on a finite state space; in each state, each player chooses one of finitely many actions, and the resulting profile of actions determines a reward for each player and a probability distribution on successor states.
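A hedged sketch of one step of such a game is given below: a joint action indexes into a transition table, the next state is sampled, and each player collects a reward. The two-player layout and the table contents are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(7)

states = ["s0", "s1"]
actions = ["a", "b"]  # same action set for both players, for brevity

# T[(state, a1, a2)] -> probability distribution over next states.
T = {
    ("s0", "a", "a"): [0.9, 0.1], ("s0", "a", "b"): [0.5, 0.5],
    ("s0", "b", "a"): [0.5, 0.5], ("s0", "b", "b"): [0.1, 0.9],
    ("s1", "a", "a"): [0.2, 0.8], ("s1", "a", "b"): [0.5, 0.5],
    ("s1", "b", "a"): [0.5, 0.5], ("s1", "b", "b"): [0.8, 0.2],
}
# R[(state, a1, a2)] -> (reward_player1, reward_player2); a toy zero-sum rule.
R = {key: (1.0, -1.0) if key[1] == key[2] else (-1.0, 1.0) for key in T}

def step(state, a1, a2):
    """Play one round: collect rewards, then sample the successor state."""
    rewards = R[(state, a1, a2)]
    next_state = rng.choice(states, p=T[(state, a1, a2)])
    return next_state, rewards

print(step("s0", "a", "b"))
```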
In principle, a stochastic game proceeds ad infinitum. The payoff that each player receives is given by a function of the infinite stream of rewards for that player: Shapley considered games where payoffs are the discounted sum of rewards; other popular payoff functions are the limit average of the rewards or the total sum of rewards, as discussed by Filar & Vrieze (1997).
A pure strategy in a stochastic game assigns an action to each possible sequence of states visited so far, whereas a randomized strategy assigns a probability distribution on actions to each such sequence. Hence every player has infinitely many strategies at his command, and Nash's theorem of equilibrium is not
applicable. Nevertheless, in the case of discounted payoffs, there always exists a Nash equilibrium in randomized strategies. There is even a Nash equilibrium in which the strategies depend only on the current state and not on the full history of visited states; we call such strategies stationary. In the general-sum case, however, Nash equilibria need not exist.
How, then, can the stochastic game be applied in our research to create an agent able to make decisions without human supervision? In principle, the stock market is a stochastic game between our agent and other self-interested agents, which can cooperate or compete with one another to optimize their rewards. The practical problem, however, is that it is impossible to know all the information about the other agents' decisions and states. In the context of this thesis, we therefore describe the stock market game as a two-player stochastic game: all the interactions of other agents with our agent's actions are reflected through the market movement (nature). It might seem easy to apply the stochastic game directly to the stock market, where our agent chooses an action based on the current state, estimates the next available states and rewards, and then chooses the best response in the current state. However, it is impossible to predetermine all states, the available next states, and the rewards from taking actions, because of the complex nature of the market. Fortunately, another research field holds the key to solving our problem: simulation and the computer science approach, in the form of machine learning.
Simulation
In the following parts, we mention some key concepts of simulation and machine learning to provide more insight into how they can solve the problems of the traditional stochastic game.
Simulation



Simulation methods are ways to imitate the operation of real-world systems. They first require that a model be developed representing the characteristics, behaviors and functions of the selected system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.
These methods are widely used in economics, biology, engineering and almost all sciences. Simulation is usually done using computers, making changes to variables and predicting the behavior of the system. Good examples of the usefulness of computer simulation are automobile traffic simulation, grocery store checkout lines, inventory management, stock price prediction, environmental consequences of policies, and so on.
Key issues in simulation include the acquisition of valid source information about the relevant selection of key characteristics and behaviors, the use of simplifying approximations and assumptions within the simulation, and the fidelity and validity of the simulation outcomes. Procedures and protocols for model verification and validation are an ongoing field of academic study, refinement, research and development in simulation technology and practice, particularly in the field of computer simulation.


[Figure: The simulation procedure.]
Algorithms
Machine learning
Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed [Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development.]

Analysts like to talk about the models they build in terms of the problems they solve. A model is a process that takes in observations and provides predictions. Many models have been built on the basis of simulation, for example the famous Black-Scholes model that predicts option prices. Such models are developed using mathematical formulas.
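For contrast with the data-driven approach described next, here is the closed-form Black-Scholes call price as a short Python sketch; the sample inputs are arbitrary.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def black_scholes_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call option.

    s: spot price, k: strike, t: time to expiry in years,
    r: risk-free rate, sigma: annualized volatility.
    """
    n = NormalDist().cdf
    d1 = (log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s * n(d1) - k * exp(-r * t) * n(d2)

print(black_scholes_call(s=100, k=105, t=0.5, r=0.02, sigma=0.25))
```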

However, to deal with the problem of building an agent that can learn and adapt to its environment, we need a simulation approach in the form of machine learning. With machine learning, we do not use direct observations as in modeling; we use data. The machine learning process takes historical data and runs it through a machine learning algorithm to generate the model. The model is built not by a human but by the machine itself. Then, when we need to use the model, we just provide some input and the output comes out automatically.
Application to stock data
The application of the machine learning approach to stock data is quite straightforward; the following figure describes how it works with historical stock data. The historical data represent the values of the features for a particular stock over the time horizon; we represent these features by stacking them one behind the other. We use machine learning algorithms to train our agent based on those features and the historical price.

[Figure: historical stock data over a time horizon, with features (x) such as P/E, Bollinger bands and moving average stacked alongside the historical price (y). An example of a machine learning algorithm applied to stock data.]
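A hedged sketch of this feature-stacking step is shown below, using Scikit-learn (one of the libraries named in the scope). The indicator windows, the random-forest choice, and the synthetic price series are our own illustrative assumptions, not the thesis's final pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic daily closes stand in for real historical data.
rng = np.random.default_rng(1)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 500)), name="close")

# Stack features (x) side by side: moving average and Bollinger bands.
window = 20
ma = close.rolling(window).mean()
sd = close.rolling(window).std()
features = pd.DataFrame({
    "ma": ma,
    "boll_upper": ma + 2 * sd,
    "boll_lower": ma - 2 * sd,
})

# Label (y): whether the next day's price goes up.
labels = (close.shift(-1) > close).astype(int).rename("up")

# Drop warm-up rows and the last row, whose label is undefined.
data = features.join(labels).iloc[:-1].dropna()
X, y = data[features.columns], data["up"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X.iloc[:-100], y.iloc[:-100])
print("held-out accuracy:", model.score(X.iloc[-100:], y.iloc[-100:]))
```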
Reinforcement learning
A simple machine learning model is good at predicting results by recognizing market patterns in the input data; however, in order to create an agent that is able to determine the best response under a specific pattern, we use another branch of machine learning: Reinforcement Learning (RL).
The trading agent can be conveniently modeled in the framework of reinforcement learning, as mentioned above. This framework adjusts the parameters of an agent to maximize the expected payoff or reward generated by its actions. The agent thereby learns a policy that tells it the actions it must perform to achieve its best performance. This optimal policy is exactly what we hope to find when we are building an automated trading strategy.
To solve the stochastic game for our agent, Markov decision processes (MDPs) are the most common model for implementing reinforcement learning; an MDP can be considered a narrowed-down stochastic game. The MDP model of the environment consists, among other things, of a discrete set of states S and a discrete set of actions taken from A. In this project, we only specify the action set of our agent, because we assume that the other agents' actions are reflected in the price movement of the stock. Depending on the position of the learner (long or short), at each time step t it is allowed to choose an action a_t from different subsets of the action space A, which consists of three possible actions:

A = {None, Long, Short}

Where:
• None indicates that the agent shouldn't have any order in the market.
• Long and Short mean that the agent should execute a market order to buy or sell 100 shares (the size of an order will always be one hundred shares).
So, at each discrete time step t, the agent senses the current state s_t and chooses to take an action a_t. The environment responds by providing the agent a reward r(s_t, a_t) and by producing the succeeding state s_{t+1} = δ(s_t, a_t). The functions r and δ depend only on the current state and action (they are memoryless), are part of the environment, and are not necessarily known to the agent.
The task of the agent is to learn a policy π that maps each state to an action, selecting its next action a_t based solely on the current observed state s_t, that is, a_t = π(s_t). The optimal policy, or control