
MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY

GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
-------------------------------

DAO XUAN KY

APPLY MARKOV CHAINS MODEL AND
FUZZY TIME SERIES FOR FORECASTING

Major: Math Fundamentals for Informatics
Code: 62.46.01.10

SUMMARY OF MATHEMATICS DOCTORAL
DISSERTATION

Ha Noi, 2017


This work was completed at:
Graduate University of Science and Technology
Vietnam Academy of Science and Technology

Supervisor 1: Assoc. Prof. Dr. Doan Van Ban
Supervisor 2: Dr. Nguyen Van Hung

Reviewer 1: ……………………………………………………………………


…………………………………………………………………………………

Reviewer 2: ……………………………………………………………………
…………………………………………………………………………………

Reviewer 3: ……………………………………………………………………
…………………………………………………………………………………
This Dissertation will be officially presented in front of the Doctoral Dissertation
Grading Committee, meeting at:
Graduate University of Science and Technology
Vietnam Academy of Science and Technology
At …………. hrs ……. day ……. month……. year …….

This Dissertation is available at:
1. Library of Graduate University of Science and Technology
2. National Library of Vietnam


LIST OF PUBLISHED WORKS
[1] Dao Xuan Ky and Luc Tri Tuyen. A Markov-fuzzy combination model for stock market forecasting. International Journal of Applied Mathematics and Statistics, 55(3):109-121, 2016.
[2] Đào Xuân Kỳ, Lục Trí Tuyen, và Phạm Quốc Vương. A combination of higher order Markov model and fuzzy time series for stock market forecasting. In Hội thảo lần thứ 19: Một số vấn đề chọn lọc của Công nghệ thông tin và truyền thông, Hà Nội, pages 1-6, 2016.
[3] Đào Xuân Kỳ, Lục Trí Tuyen, Phạm Quốc Vương, và Thạch Thị Ninh. Mô hình Markov-chuỗi thời gian mờ trong dự báo chứng khoán. In Hội thảo lần thứ 18: Một số vấn đề chọn lọc của Công nghệ thông tin và truyền thông, TP HCM, pages 119-124, 2015.
[4] Lục Trí Tuyen, Nguyễn Văn Hung, Thạch Thị Ninh, Phạm Quốc Vương, Nguyễn Minh Đức, và Đào Xuân Kỳ. A normal-hidden Markov model in forecasting stock index. Journal of Computer Science and Cybernetics, 28(3):206-216, 2012.
[5] Dao Xuan Ky and Luc Tri Tuyen. A higher order Markov model for time series forecasting. International Journal of Applied Mathematics and Statistics, 57(3):1-18, 2018.


Introduction
Time series forecasting, in which the predicted variable X changes over time, has always been a challenge for scientists, not only in Vietnam but worldwide, because it is not easy to find a suitable probability distribution for the predicted variable at the time point t it is generated. Historical data must be collected and analyzed in order to find a good fit. However, in time series analysis a distribution can only fit the data at a particular point of time and varies at other points of time. Therefore, using one fixed distribution for the predicted variable is not appropriate for this kind of analysis.
For the above-mentioned reason, building a time series forecasting model requires linking historical and future data, in order to set up a model of the dependence between the data obtained at the present time $t$ and the past times $t-1, t-2, \ldots$. If the relationship
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$
is set up, we obtain the autoregressive integrated moving average (ARIMA) model [15]. This model is widely applied thanks to its well-developed theory and is integrated into most current statistical software such as Eviews, SPSS, Matlab, R, etc.
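To make this workflow concrete, here is a minimal R sketch of fitting such a model and forecasting with the built-in stats functions; the series below is simulated for illustration, not one of the dissertation's data sets:

set.seed(1)
x   <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)  # simulated ARMA(1,1) data
fit <- arima(x, order = c(1, 0, 1))                          # fit an ARIMA(1,0,1) model
fc  <- predict(fit, n.ahead = 5)                             # 5-step-ahead forecast
fc$pred                                                      # predicted values
fc$se                                                        # forecast standard errors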
However, many real time series do not change linearly, so models such as ARIMA do not fit them. R. Parrelli pointed out in [28] that there are nonlinear relationships in the variance of economic and financial time series. The generalized autoregressive conditional heteroskedasticity (GARCH) model [25,28] is the most popular nonlinear time series forecasting model to mention. The limitation of this model lies in the assumption that the data follow a fixed distribution (usually the normal distribution), while actual data often show significant skewness [39] (whereas the normal distribution is symmetric). Another time series forecasting approach is the Artificial Neural Network (ANN), which was developed recently. ANN models are not based on a predetermined distribution of the data; instead, they function like a human brain, trying to find rules and paths from training data, experimental testing, and result summarizing. ANN models are usually used for classification purposes [23]. More recently, a new theory of statistical machine learning called the Support Vector Machine (SVM), serving forecasting and classification, has caught the attention of scientists [36,11,31]. SVM is applied widely in many areas such as function approximation, regression analysis and forecasting [11,31]. The biggest limitation of SVM is that, with huge training sets, it requires enormous computation as well as the complexity of the underlying linear regression problem.
To address the limitations and promote the strengths of existing models, a new research trend called combined analysis (CA), i.e. a combination of different methods to increase forecast accuracy, was introduced. Numerous studies have been conducted along this line, and many combined models have been published [43,5,6]. Some methods use the Markov chain (MC) as well as the hidden Markov model (HMM). Rafiul Hassan [19] developed a united model by matching an HMM with an ANN and a GA to generate one-day-ahead stock price forecasts. This model aims to identify patterns in the historical data similar to the current one; the ANN and GA are then used to interpolate the neighboring values of the identified pattern. Yang [41] combined the HMM with a synchronous clustering technique to increase the accuracy of the forecasting model. The weighted Markov model was used by Peng [27] in predicting and analyzing the disease transmission rate in Jiangsu, China. These combined models proved to bring practical and meaningful results, as well as higher prediction accuracy than traditional ones [27,41,19]. Despite having improved significantly in terms of prediction accuracy, the above-mentioned models still face difficulties with fuzzy data (data containing uncertain elements).
To deal with fuzzy data, a new research direction called Fuzzy Time Series (FTS) was introduced recently. The first results of this theory worth mentioning are those of Song and Chissom [34]. Subsequent studies focused on improving the fuzzy time series model and finding better ways to perform the forecasting. Jilani and Nan combined a heuristic model with the fuzzy time series model to improve its accuracy [24]. Chen and Hwang expanded the fuzzy time series model into a binary model [14], and then Hwang and Yu developed it into an n-scale model to forecast stock indicators [21]. In a recent paper [35], BaiQuing Sun expanded the fuzzy time series model into a multi-order one to forecast future stock prices. Qisen Cai [10] combined the fuzzy time series model with ant colony optimization and regression to obtain a better outcome. In Vietnam, the fuzzy time series model has recently been applied in a number of specific areas, for example the study of Nguyen Duy Hieu and colleagues [2] on semantic analysis. Additionally, the studies of Nguyen Cong Dieu [3,4] combined the fuzzy time series model with techniques adjusting certain mathematical parameters or specific characteristics of the data to improve forecast accuracy. The study of Nguyen Cat Ho [1] used hedge algebras in the fuzzy time series model and showed higher forecast accuracy compared to several existing models.

Up to now, in spite of the many new models that combine existing ones with the aim of improving forecast accuracy, these models tend to be complex while their accuracy does not improve much. Therefore, another direction arises: simplifying the model while preserving forecast accuracy.
The objective of this dissertation focuses on two key issues. Firstly, to model time series by states, each of which is a deterministic probability distribution (the normal distribution), and to evaluate the suitability of the model based on experimental results. Secondly, to combine Markov chains and new fuzzy time series models to improve forecast accuracy and, in addition, to extend the higher-order Markov chain model to accommodate seasonal data.
The dissertation consists of 3 chapters. Chapter I presents an overview of Markov chains, hidden Markov models and fuzzy time series models. Chapter II presents the modelling of time series into states in which 1) each state is a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$, $i = 1, 2, \ldots, m$, where $m$ is the number of states; and 2) the states over time follow a Markov chain. The model was then tested on the VN-Index indicator to evaluate its forecasting efficiency, and the chapter ends with an analysis of the limitations of, and mismatches between, forecasting models and deterministic probability distributions, as a motivation for the combined model proposed in Chapter III. Chapter III presents the combined Markov chain and fuzzy time series models in time series forecasting. This chapter also presents the extended higher-order Markov chain with two chain concepts, the conventional higher-order Markov chain (CMC) and the improved higher-order Markov chain (IMC). These models were then programmed in the R language and tested on data sets corresponding exactly to those of the models compared against.

Chapter 1 - Overview & Proposal
1.1. Markov chain
1.1.1. Definitions
Consider an economic or physical system $S$ with $m$ possible states, denoted by $I = \{1, 2, \ldots, m\}$. The system $S$ evolves randomly in discrete time ($t = 0, 1, 2, \ldots, n, \ldots$); let $C_n$ be the random variable corresponding to the state of the system $S$ at time $n$ ($C_n \in I$).
Definition 1.1.1. A sequence of random variables $(C_n, n \in \mathbb{N})$ is a Markov chain if and only if, for all $c_0, c_1, \ldots, c_n \in I$:
$$\Pr(C_n = c_n \mid C_0 = c_0, C_1 = c_1, \ldots, C_{n-1} = c_{n-1}) = \Pr(C_n = c_n \mid C_{n-1} = c_{n-1}) \quad (1.1.1)$$
(whenever this conditional probability is well defined).
Definition 1.1.2. A Markov chain is called homogeneous if and only if the probability in (1.1.1) does not depend on $n$, and non-homogeneous otherwise.
For the time being we consider the homogeneous case, in which
$$\Pr(C_n = j \mid C_{n-1} = i) = \gamma_{ij},$$
and define the matrix $\Gamma = [\gamma_{ij}]$.
To describe fully the evolution of a Markov chain, it is necessary to fix an initial distribution for the state $C_0$, for example a vector $p = (p_1, p_2, \ldots, p_m)$.
In this chapter we restrict attention to homogeneous Markov chains, each characterized by the pair $(p, \Gamma)$.
Definition 1.2.3. A Markov matrix $\Gamma$ is called regular if there exists a positive integer $k$ such that all elements of $\Gamma^k$ are strictly positive.
1.1.2. Markov chain classification
Take $i \in I$ and let $d(i)$ be the greatest common divisor of the set of integers $n$ such that $\gamma_{ii}^{(n)} > 0$.
Definition 1.2.4. If $d(i) > 1$, state $i$ is called periodic with period $d(i)$. If $d(i) = 1$, then state $i$ is aperiodic.
It is easy to see that if $\gamma_{ii} > 0$ then $i$ is aperiodic; however, the converse is not true in general.
Definition 1.2.5. A Markov chain all of whose states are aperiodic is called an aperiodic Markov chain.


Definition 1.2.6. State $i$ is said to reach state $j$ (written $i \to j$) if there exists an integer $n$ such that $\gamma_{ij}^{(n)} > 0$; $i \nrightarrow j$ means that $i$ cannot reach $j$.
Definition 1.2.7. States $i$ and $j$ are said to communicate (written $i \leftrightarrow j$) if $i \to j$ and $j \to i$, or if $i = j$.

Definition 1.2.8. State $i$ is called essential if it communicates with every state that it reaches; otherwise it is called inessential.
The relation $\leftrightarrow$ is an equivalence relation on the state space $I$ and induces a partition of $I$ into classes. The equivalence class containing $i$ is denoted $Cl(i)$.
Definition 1.2.9. A Markov chain is called irreducible if there is only one equivalence class on it.
Definition 1.2.10. A subset $E$ of the state space $I$ is called closed if
$$\sum_{j \in E} \gamma_{ij} = 1, \quad \text{for all } i \in E.$$
Definition 1.2.11. A state $i \in I$ of the Markov chain $(C_t)$ is called recurrent if, starting from $i$, the chain returns to $i$ with probability 1; otherwise $i$ is called transient.
1.1.3. Markov matrix estimation
Consider a Markov chain $(C_t)$, $t = 1, 2, \ldots$ and suppose we observe $n$ states $c_1, c_2, \ldots, c_n$. Writing $c^{(n)} = (c_1, c_2, \ldots, c_n)$ for the realization of the random variables $C^{(n)}$, the likelihood of the transition probability matrix is given by
$$\Pr(C^{(n)} = c^{(n)}) = \Pr(C_1 = c_1) \prod_{t=2}^{n} \Pr(C_t = c_t \mid C_{t-1} = c_{t-1}) = \Pr(C_1 = c_1) \prod_{t=2}^{n} \gamma_{c_{t-1} c_t}.$$
Define the transition counts $n_{ij}$ as the number of times that state $i$ is immediately followed by state $j$ in the chain $c^{(n)}$; then the likelihood reads
$$L(p) = \Pr(C_1 = c_1) \prod_{i=1}^{m} \prod_{j=1}^{m} \gamma_{ij}^{n_{ij}}.$$
We need to maximize the likelihood $L(p)$ with respect to the unknowns $\gamma_{ij}$. To solve this problem, we first take the logarithm of $L(p)$, turning the product into a sum whose derivative is easy to compute:
$$\ell(p) = \log L(p) = \log \Pr(C_1 = c_1) + \sum_{i,j} n_{ij} \log \gamma_{ij}.$$
Since $\sum_{j=1}^{m} \gamma_{ij} = 1$, for each $i$ we substitute $\gamma_{i1} = 1 - \sum_{j \geq 2} \gamma_{ij}$ and take the derivative with respect to each parameter:
$$\frac{\partial \ell}{\partial \gamma_{ij}} = \frac{n_{ij}}{\gamma_{ij}} - \frac{n_{i1}}{\gamma_{i1}}, \quad j \geq 2.$$
Setting this derivative to zero at $\hat{\gamma}_{ij}$ gives
$$\frac{n_{ij}}{\hat{\gamma}_{ij}} = \frac{n_{i1}}{\hat{\gamma}_{i1}}, \quad \text{hence} \quad \frac{n_{ij}}{n_{i1}} = \frac{\hat{\gamma}_{ij}}{\hat{\gamma}_{i1}},$$
which holds for all $j$; therefore
$$\hat{\gamma}_{ij} = \frac{n_{ij}}{\sum_{j=1}^{m} n_{ij}}.$$
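This estimator is simple to compute in practice. Below is a minimal R sketch (the function name estimate.gamma is illustrative, not code from the dissertation):

estimate.gamma <- function(states, m) {
  counts <- matrix(0, m, m)                  # counts[i, j] = n_ij, transitions i -> j
  for (t in 2:length(states)) {
    counts[states[t - 1], states[t]] <- counts[states[t - 1], states[t]] + 1
  }
  counts / rowSums(counts)                   # gamma_ij = n_ij / sum_j n_ij (rows never visited give NaN)
}

s <- c(1, 2, 2, 3, 1, 1, 2, 3, 3, 2)         # hypothetical encoded series with m = 3 states
estimate.gamma(s, 3)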

1.2. Hidden Markov Model
An HMM consists of two basic components: a chain of observations $X_t$, $t = 1, \ldots, T$, and a chain of hidden states $C_t \in \{1, 2, \ldots, m\}$, $t = 1, \ldots, T$, which generate those observations. Indeed, the HMM is a special case of the dependent mixture model [16], with the $C_t$ playing the role of the mixture components.


1.2.1. Definition and Symbols

Symbols X(t ) và C(t ) displayhistorical statistics from point of time 1 to point of time t ,
which can be summarized as the simpliest HMM model as follows:
Pr (Ct | C(t 1) )  Pr (Ct | Ct 1 ), t  2,3,..., T .
Pr ( X t | X(t 1) , C (t ) )  Pr ( X t | Ct ), t 

Now we introduce some symbols which are used in the study. In case of discrete
observation, by definition:
pi  x   Pr  X t  x | Ct  i  .

In the case of continuity, pi ( x) is X t „s probability function range, if Markov chain
receives state i at point of time t .
We symbolize a comparative Markov chain‟s forwarding matrix as Γ with its
components  ij defined by:
 ij  Pr (Ct  j | Ct 1  i).
From now on, m distributes pi ( x) is called dependent dependencies of the model.
1.2.2. Likelihood and maximum likelihood estimation
For a discrete observation $X_t$, define $u_i(t) = \Pr(C_t = i)$ for $i = 1, 2, \ldots, m$. We have:
$$\Pr(X_t = x) = \sum_{i=1}^{m} \Pr(C_t = i) \Pr(X_t = x \mid C_t = i) = \sum_{i=1}^{m} u_i(t) p_i(x). \quad (1.2.1)$$
For convenience of calculation, formula (1.2.1) can be rewritten in the following matrix form:
$$\Pr(X_t = x) = (u_1(t), \ldots, u_m(t)) \, \mathrm{diag}(p_1(x), \ldots, p_m(x)) \, \mathbf{1}' = u(t) P(x) \mathbf{1}',$$
in which $P(x)$ is the diagonal matrix whose $i$-th diagonal element is $p_i(x)$. On the other hand, by the homogeneity of the Markov chain, $u(t) = u(1) \Gamma^{t-1}$, where $u(1)$ is the initial distribution of the chain, usually taken to be the stationary distribution $\delta$. Thus we have
$$\Pr(X_t = x) = u(1) \Gamma^{t-1} P(x) \mathbf{1}'. \quad (1.2.2)$$
Now let $L_T$ be the likelihood function of the model given $T$ observations $x_1, x_2, \ldots, x_T$, so $L_T = \Pr(X^{(T)} = x^{(T)})$. From the joint probability formula,
$$\Pr(X^{(T)}, C^{(T)}) = \Pr(C_1) \prod_{k=2}^{T} \Pr(C_k \mid C_{k-1}) \prod_{k=1}^{T} \Pr(X_k \mid C_k). \quad (1.2.3)$$
Summing over all possible states of the $C_k$, then using the same method as for formula (1.2.2), we obtain
$$L_T = \delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_T) \mathbf{1}'.$$
If the initial distribution $\delta$ is the stationary distribution of the Markov chain, then
$$L_T = \delta \Gamma P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_T) \mathbf{1}'.$$
To calculate the likelihood efficiently by an algorithm, reducing the number of operations the computer needs to perform, we define vectors $\alpha_t$, $t = 1, \ldots, T$, by
$$\alpha_t = \delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_t) = \delta P(x_1) \prod_{s=2}^{t} \Gamma P(x_s). \quad (1.2.4)$$
Then we have
$$L_T = \alpha_T \mathbf{1}', \quad \text{and} \quad \alpha_t = \alpha_{t-1} \Gamma P(x_t), \; t \geq 2. \quad (1.2.5)$$
It is thus easy to calculate $L_T$ by this recursion. To find the parameter set that maximizes $L_T$, we can use two methods:


Direct maximization of $L_T$ (MLE): First, from equation (1.2.5) we compute the logarithm of $L_T$ in a numerically effective way based on the forward probabilities $\alpha_t$. For $t = 0, 1, \ldots, T$, define the vector $\phi_t = \alpha_t / w_t$, where $w_t = \sum_i \alpha_t(i) = \alpha_t \mathbf{1}'$, and let $B_t = \Gamma P(x_t)$. We have
$$w_0 = \alpha_0 \mathbf{1}' = \delta \mathbf{1}' = 1; \quad \phi_0 = \delta; \quad w_t \phi_t = w_{t-1} \phi_{t-1} B_t; \quad L_T = \alpha_T \mathbf{1}' = w_T (\phi_T \mathbf{1}') = w_T.$$
Then $L_T = w_T = \prod_{t=1}^{T} (w_t / w_{t-1})$. From the recursion above we have $w_t = w_{t-1} \phi_{t-1} B_t \mathbf{1}'$, so
$$\log L_T = \sum_{t=1}^{T} \log(w_t / w_{t-1}) = \sum_{t=1}^{T} \log(\phi_{t-1} B_t \mathbf{1}').$$
EM algorithm: This algorithm, called the Baum-Welch algorithm [9], applies to a homogeneous Markov chain (not necessarily a stationary one). The algorithm uses forward probabilities (FWP) and backward probabilities (BWP) to calculate $L_T$.
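As an illustration of the scaled recursion above, here is a minimal R sketch computing $\log L_T$ for an HMM with normal state-dependent distributions, assuming the initial distribution delta is stationary (so that $B_t = \Gamma P(x_t)$ for every $t$); the function name is illustrative:

norm.HMM.llk <- function(x, mu, sigma, gamma, delta) {
  phi <- delta                                       # phi_0 = delta, w_0 = 1
  llk <- 0
  for (t in seq_along(x)) {
    v   <- phi %*% gamma * dnorm(x[t], mu, sigma)    # phi_{t-1} B_t with B_t = Gamma P(x_t)
    u   <- sum(v)                                    # w_t / w_{t-1}
    llk <- llk + log(u)                              # accumulate log(w_t / w_{t-1})
    phi <- v / u                                     # phi_t, rescaled to sum to one
  }
  llk                                                # log L_T
}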
1.2.3. Forecast distribution
For discrete observations, the forecast distribution $\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)})$ is a ratio of likelihoods, based on the conditional probability:
$$\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)}) = \frac{\Pr(X^{(T)} = x^{(T)}, X_{T+h} = x)}{\Pr(X^{(T)} = x^{(T)})} = \frac{\delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_T) \Gamma^h P(x) \mathbf{1}'}{\delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_T) \mathbf{1}'} = \frac{\alpha_T \Gamma^h P(x) \mathbf{1}'}{\alpha_T \mathbf{1}'}.$$
Setting $\phi_T = \alpha_T / (\alpha_T \mathbf{1}')$, we have $\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)}) = \phi_T \Gamma^h P(x) \mathbf{1}'$.
The forecast distribution can be written as a mixture of the state-dependent distributions:
$$\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)}) = \sum_{i=1}^{m} \xi_i(h) p_i(x),$$
where the weight $\xi_i(h)$ is the $i$-th component of the vector $\phi_T \Gamma^h$.
1.2.4. Viterbi algorithm
The objective of the Viterbi algorithm is to find the best state sequence $i_1, i_2, \ldots, i_T$ corresponding to the observation sequence $x_1, x_2, \ldots, x_T$, i.e. the one maximizing the joint probability.
Set $\xi_{1i} = \Pr(C_1 = i, X_1 = x_1) = \delta_i p_i(x_1)$ and, for $t = 2, 3, \ldots, T$,
$$\xi_{ti} = \max_{c_1, c_2, \ldots, c_{t-1}} \Pr(C^{(t-1)} = c^{(t-1)}, C_t = i, X^{(t)} = x^{(t)}).$$
Then the probabilities $\xi_{tj}$ satisfy the following recursion for $t = 2, 3, \ldots, T$ and $j = 1, 2, \ldots, m$:
$$\xi_{tj} = \left( \max_i (\xi_{t-1,i} \, \gamma_{ij}) \right) p_j(x_t).$$
The best state sequence $i_1, i_2, \ldots, i_T$ is determined by backtracking from
$$i_T = \operatorname{argmax}_{i=1,\ldots,m} \xi_{Ti}$$
and, for $t = T-1, T-2, \ldots, 1$,
$$i_t = \operatorname{argmax}_{i=1,\ldots,m} (\xi_{ti} \, \gamma_{i, i_{t+1}}).$$
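The recursion translates directly into code. Below is a minimal R sketch in log space to avoid numerical underflow (names are illustrative; p(x, i) is assumed to return the state-dependent density $p_i(x)$, vectorized over i):

viterbi <- function(x, delta, gamma, p) {
  n <- length(x); m <- length(delta)
  xi <- matrix(0, n, m)
  xi[1, ] <- log(delta) + log(p(x[1], 1:m))          # log xi_1i = log(delta_i p_i(x_1))
  for (t in 2:n)
    for (j in 1:m)
      xi[t, j] <- max(xi[t - 1, ] + log(gamma[, j])) + log(p(x[t], j))
  path <- integer(n)
  path[n] <- which.max(xi[n, ])                      # i_T = argmax_i xi_Ti
  for (t in (n - 1):1)
    path[t] <- which.max(xi[t, ] + log(gamma[, path[t + 1]]))
  path                                               # most probable state sequence
}

# Hypothetical usage with normal state-dependent densities:
# p <- function(x, i) dnorm(x, mu[i], sigma[i])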

1.2.5. State forecasting
For state forecasting, we simply use the classical Bayes formula. For $i = 1, 2, \ldots, m$,
$$\Pr(C_{T+h} = i \mid X^{(T)} = x^{(T)}) = \frac{\alpha_T \Gamma^h(\cdot, i)}{L_T} = \phi_T \Gamma^h(\cdot, i),$$
where $\phi_T = \alpha_T / (\alpha_T \mathbf{1}')$. Note that, as $h \to \infty$, $\Gamma^h$ moves towards the stationary distribution of the Markov chain.
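In code this amounts to a single matrix power; a minimal R sketch follows (names and numbers are hypothetical):

state.forecast <- function(phiT, gamma, h) {
  Gh <- diag(nrow(gamma))
  for (s in seq_len(h)) Gh <- Gh %*% gamma   # Gamma^h by repeated multiplication
  as.vector(phiT %*% Gh)                     # Pr(C_{T+h} = i | X^(T)), i = 1..m
}

phiT  <- c(0.1, 0.6, 0.2, 0.1)               # hypothetical normalized forward vector
gamma <- matrix(0.25, 4, 4)                  # hypothetical 4-state transition matrix
state.forecast(phiT, gamma, h = 6)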


1.3. Fuzzy time series
1.3.1. Some concepts
Let $U$ be the universe of discourse. This space determines a set of objects. If $A$ is a crisp subset of $U$, then we can determine exactly its characteristic function:
$$\chi_A(u) = \begin{cases} 1 & \text{if } u \in A, \\ 0 & \text{if } u \notin A. \end{cases}$$

Definition 1.3.1 [34]. Let $U$ be the universe of discourse with $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set $A$ in $U$ is defined as
$$A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + \ldots + f_A(u_n)/u_n,$$
where $f_A$ is the membership function of the fuzzy set $A$, $f_A : U \to [0, 1]$, and $f_A(u_i)$ is the degree of membership (the grade) of $u_i$ in $A$.
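For concreteness, a small numeric instance of this definition (the membership degrees are hypothetical): with $U = \{u_1, u_2, u_3\}$ and
$$A = 1/u_1 + 0.5/u_2 + 0/u_3,$$
$u_1$ belongs to $A$ completely ($f_A(u_1) = 1$), $u_2$ belongs with degree $0.5$, and $u_3$ does not belong to $A$ at all.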
Definition 1.3.2 [34]. Let $Y(t)$ $(t = 0, 1, 2, \ldots)$ be a time series whose values lie in the universe of discourse, a subset of the real numbers. Suppose the fuzzy sets $f_i(t)$ $(i = 1, 2, \ldots)$ are determined on $Y(t)$, and $F(t)$ is the collection of the sets $f_1(t), f_2(t), \ldots$; then $F(t)$ is called a fuzzy time series on $Y(t)$.
Definition 1.3.3 [34]. Suppose that $F(t)$ is inferred only from $F(t-1)$, denoted $F(t-1) \to F(t)$. This relationship can be expressed as $F(t) = F(t-1) \circ R(t, t-1)$, which is called the first-order model of $F(t)$; $R(t, t-1)$ is the fuzzy relation between $F(t-1)$ and $F(t)$, and "$\circ$" is the Max-Min composition operator.
Definition 1.3.4 [34]. Let $R(t, t-1)$ be the first-order model of $F(t)$. If, for any $t$, $R(t, t-1) = R(t-1, t-2)$, then $F(t)$ is said to be a time-invariant fuzzy time series. Otherwise, $F(t)$ is a time-variant fuzzy time series.


Chapter 2. HIDDEN MARKOV MODEL IN TIME SERIES FORECASTING
2.1. Hidden Markov model in time series forecasting

As presented in Chapter 1, the HMM consists of two basic components: the chain of observations $X_t$, $t = 1, \ldots, T$, and the mixture components $C_t \in \{1, 2, \ldots, m\}$, $t = 1, \ldots, T$.
To illustrate the HMM in time series forecasting, consider the above time series, denoted $X_t$, $t = 1, \ldots, T$. A real problem for investors is to predict how long the stock index will take to go from a bottom to a peak. We observe that a stock index at a new peak will not stay at that value (or fluctuate slightly around it) forever but will go down after some time, and similarly for oscillations from a bottom to a peak. So we can specify $X_{max}$ as the longest time the stock's value takes to go from a bottom to a peak; then $0 \leq X_t \leq X_{max}$ (see Figure 2.1.1). Investors want to label the situation of $X_t$ with states such as "short wait", "fairly short wait", "long wait", "very long wait", but do not know how to define them. To solve this problem, we consider each of these states a Poisson distribution with mean (which is also the variance) $\lambda_i$, $i = 1, 2, 3, 4$, "hidden" in the chain $X_t$. Assuming that these states follow a Markov chain, we have a hidden Markov model for the time series forecasting problem.


Figure 2.1.1. Definition of the time series forecasting problem
2.1.1. HMM with Poisson distribution
To apply the HMM to time series forecasting, the dissertation illustrates both parameter estimation methods described in Section 1.2.2 of Chapter 1. For the MLE method, the dissertation implements, in R, the HMM whose states are Poisson distributions. The Poisson distribution has a single parameter $\lambda > 0$, which is both its mean and its variance. Parameter estimation by the MLE method is as follows:
Algorithm 2.1. Maximum likelihood estimation for the Poisson-HMM
Input: x, m, lambda0, gamma0
Output: m, lambda, gamma, BIC, AIC, mllk
1: procedure pois.HMM.mle(x, m, lambda0, gamma0, ...)
2:   parvect0 <- pois.HMM.pn2pw(m, lambda0, gamma0)  { transform the model parameters to free (working) parameters }
3:   mod <- nlm(pois.HMM.mllk, parvect0, x = x, m = m)  { estimate the parameters by minimizing the negative log-likelihood }
4:   pn <- pois.HMM.pw2pn(m, mod$estimate)  { transform the free parameters back to model parameters }
5:   mllk <- mod$minimum  { minimized negative log-likelihood }
6:   np <- length(parvect0)  { number of model parameters }
7:   AIC <- 2 * (mllk + np)  { calculate the AIC criterion }
8:   n <- sum(!is.na(x))  { calculate the number of observations }
9:   BIC <- 2 * mllk + np * log(n)  { calculate the BIC criterion }
10:  return (lambda, gamma, mllk, AIC, BIC)
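The helpers pois.HMM.pn2pw and pois.HMM.pw2pn reparameterize the model so that nlm() can search over unconstrained values. Here is a sketch of how such transforms are typically written, following the conventions of Zucchini and MacDonald's R code which this algorithm mirrors; the dissertation's actual implementation may differ:

pois.HMM.pn2pw <- function(m, lambda, gamma) {
  tlambda <- log(lambda)                          # lambda > 0 mapped to the real line
  tgamma  <- log(gamma / diag(gamma))             # row-wise multinomial logit
  tgamma  <- tgamma[!diag(m)]                     # keep off-diagonal entries only
  c(tlambda, tgamma)
}
pois.HMM.pw2pn <- function(m, parvect) {
  lambda <- exp(parvect[1:m])
  gamma  <- diag(m)
  gamma[!gamma] <- exp(parvect[(m + 1):(m * m)])  # fill the off-diagonal entries
  gamma  <- gamma / rowSums(gamma)                # renormalize rows to sum to one
  list(lambda = lambda, gamma = gamma)
}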
2.1.2. HMM with normal distribution
In the model with normal distribution, the parameters of the Markov chain stay the same, but the parameters of the mixture distributions are the means and the variances, while the initial distribution of the model is taken to be the stationary distribution of the Markov chain.
The FWP and BWP are computed by the function norm.HMM.lalphabeta, which returns the logarithms of the FWP and BWP: lalpha and lbeta are the logs of the FWP and BWP, respectively.
Algorithm 2.3. Calculate the forward and backward probabilities of L_T
Input: x, m, mu, sigma, gamma, delta
Output: la = lalpha, lb = lbeta
1: procedure norm.HMM.lalphabeta(x, m, mu, sigma, gamma, delta)
2:   if is.null(delta) then delta <- solve(t(diag(m) - gamma + 1), rep(1, m))  { in the absence of an initial distribution, use the stationary distribution of the Markov chain }
3:   calculate the forward probabilities (FWP) in (1.2.6) for lalpha
4:   calculate the backward probabilities (BWP) in (1.2.7) for lbeta
5:   return list(la = lalpha, lb = lbeta)
Following the EM algorithm in Section 1.2.2 of Chapter 1, we can then perform the parameter estimation with norm.HMM.EM.
Algorithm 2.4. EM algorithm for the normal-HMM
Input: x, m, mu0, sigma0, gamma0, delta0, maxiter, tol
Output: mu, sigma, gamma, delta, mllk, AIC, BIC
1: procedure norm.HMM.EM(x, m, mu, sigma, gamma, delta, maxiter, tol)
2:   mu <- mu0; sigma <- sigma0; gamma <- gamma0; delta <- delta0  { assign the parameters their starting values }
3:   for iter in 1 : maxiter do
4:     fb <- norm.HMM.lalphabeta(x, m, mu, sigma, gamma, delta = delta)  { calculate FWP and BWP }
5:     llk <- log-likelihood value
6:     for j in 1 : m do
7:       for k in 1 : m do
8:         calculate gamma[j, k]
9:       calculate mu[j]
10:      calculate sigma[j]
11:    calculate delta
12:    crit <- sum(abs(mu - mu0)) + sum(abs(gamma - gamma0)) + sum(abs(delta - delta0)) + sum(abs(sigma - sigma0))  { the convergence criterion }
13:    if crit < tol then
14:      AIC <- -2 * (llk - np)  { the AIC criterion }
15:      BIC <- -2 * llk + np * log(n)  { the BIC criterion }
16:      return (mu, sigma, gamma, delta, mllk, AIC, BIC)
17:    else  { if not converged }
18:      mu0 <- mu; sigma0 <- sigma; gamma0 <- gamma; delta0 <- delta  { reassign the starting values }
19: if still not converged after maxiter iterations, stop and report non-convergence
2.2. Experimental results for the HMM with Poisson distribution
2.2.1. Parameter estimation
Table 2.2.1. Estimated parameters of the Poisson-HMM for time.b.to.t with m = 2, 3, 4, 5 states

m = 2: lambda = (11.46267, 40.90969), delta = (0.6914086, 0.3085914),
Gamma =
| 0.80  0.20 |
| 0.51  0.49 |

m = 3: lambda = (5.78732, 21.75877, 57.17104), delta = (0.3587816, 0.5121152, 0.1291032),
Gamma =
| 0.46  0.47  0.07 |
| 0.33  0.47  0.02 |
| 0.20  0.80  0.00 |

m = 4: lambda = (5.339722, 16.943339, 27.711948, 58.394102), delta = (0.3189824, 0.3159413, 0.2301279, 0.1349484),
Gamma =
| 0.40  0.46  0.07  0.07 |
| 0.53  0.29  0.18  0.00 |
| 0.00  0.00  0.51  0.49 |
| 0.19  0.56  0.25  0.00 |

m = 5: lambda = (5.226109, 15.679316, 25.435562, 38.459987, 67.708874), delta = (0.31513881, 0.28158191, 0.22224329, 0.10376304, 0.07727294),
Gamma =
| 0.38  0.40  0.15  0.07  0.00 |
| 0.50  0.36  0.00  0.14  0.00 |
| 0.13  0.00  0.33  0.19  0.35 |
| 0.00  0.53  0.47  0.00  0.00 |
| 0.33  0.00  0.67  0.00  0.00 |

Table 2.2.2. Mean and variance compared with the sample
m        Mean       Variance   -logL
1        20.45238   20.45238
2        20.45238   205.5624   216.8401
3        20.45238   272.6776   171.1243
4        20.45238   303.7112   159.898
5        20.45238   303.4568   154.6275
Sample   20.45238   307.083

The results show that the Poisson-HMM with 4 states has a variance close to the sample variance. However, this is not enough evidence to confirm that the 4-state model is the best. To select better, we need more detailed criteria for model selection.
2.2.2. Model selection
Suppose the observations $x_1, \ldots, x_T$ were generated by an unknown "true" model $f$, and we model them by two different approximating families $\{g_1 \in G_1\}$ and $\{g_2 \in G_2\}$. The purpose of model selection is to identify the model that is best in some respect.
Applying the two criteria AIC and BIC to the Poisson-HMM on the data set time.b.to.t gives the results listed in Table 2.2.3.
Table 2.2.3. AIC and BIC criteria
m      2          3          4          5
AIC    441.6803   360.2486   351.7961   359.2551
BIC    448.6309   375.8876   379.5988   402.6968


2.2.3. Forecast distribution
As mentioned above, the training data for the HMM were taken from 3 January 2006 to 19 June 2013. We then take the data from 14/06/2013 to 22/08/2013 to compare with the forecasts of the model. Figure 2.2.1 shows the fluctuation of the closing VN-Index during this period. As we can see, the number of sessions in which the VN-Index went from the bottom (26/06/2013) to the peak (19/08/2013) is 35 days; this value corresponds to state 3 of the model (the Poisson distribution with mean 27.711948). We will wait to see the results of the forecast model.

Figure 2.2.1. VN-Index fluctuation from 14/06/2013 to 22/08/2013 and waiting time from bottom to peak
Now we need the formula for the forecast distribution $\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)})$. With the matrix formulas of the previous sections, this distribution can be computed as follows:
$$\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)}) = \frac{\Pr(X^{(T)} = x^{(T)}, X_{T+h} = x)}{\Pr(X^{(T)} = x^{(T)})} = \frac{\delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_T) \Gamma^h P(x) \mathbf{1}'}{\delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_T) \mathbf{1}'}.$$
Given $\phi_T = \alpha_T / (\alpha_T \mathbf{1}')$, we have
$$\Pr(X_{T+h} = x \mid X^{(T)} = x^{(T)}) = \phi_T \Gamma^h P(x) \mathbf{1}'.$$

These distributions are summarized in Table 2.2.4.
Table 2.2.4. Forecast distribution information and intervals
h                   1           2           3           4           5           6
Forecast mode       27          26          5           5           5           5
Forecast mean       42.30338    30.16801    25.53973    23.68432    22.48149    21.91300
Forecast interval   [ ]         [ ]         [ ]         [ ]         [ ]         [ ]
Probability         0.9371394   0.9116366   0.9342868   0.9279009   0.9237957   0.9215904
Reality             35          -           -           -           -           -
(The forecast intervals are the estimated ranges with probability over 90%.)

2.2.4. Forecast states
In the previous section we found the conditional distribution of the state $C_t$ given the observations $X^{(T)}$; in this way we only considered present and past states. However, it is also possible to calculate the conditional distribution of a future state $C_{T+h}$; this is called state forecasting:
$$\Pr(C_{T+h} = i \mid X^{(T)} = x^{(T)}) = \frac{\alpha_T \Gamma^h(\cdot, i)}{L_T} = \phi_T \Gamma^h(\cdot, i),$$
with $\phi_T = \alpha_T / (\alpha_T \mathbf{1}')$.

We perform state forecasting with the 4-state Poisson-HMM on the data time.b.to.t at 6 horizons, with the results shown in Table 2.2.5.
Table 2.2.5. Six state forecasts for time.b.to.t
h          1             2            3           4           5           6
State 1    0.006577011   0.09686901   0.2316797   0.2688642   0.2934243   0.3060393
State 2    0.003744827   0.27624774   0.2658957   0.2931431   0.3048425   0.3098824
State 3    0.506712945   0.37858412   0.3104563   0.2698832   0.2508581   0.2407846
State 4    0.482965217   0.24829913   0.1919683   0.1681095   0.1508750   0.1432937

2.3. Experimental results for the HMM with normal distribution
2.3.1. Parameter estimation
With an arbitrary initial distribution (e.g. (1/4, 1/4, 1/4, 1/4)), the EM estimates are:
Gamma =
| 0.9717  0.0283  0.0000  0.0000 |
| 0.0927  0.8106  0.0804  0.0163 |
| 0.0000  0.0748  0.8624  0.0628 |
| 0.0000  0.0000  0.0818  0.9182 |
mu = (453.9839, 484.6801, 505.9007, 530.8300)
sigma = (10.6857, 7.1523, 6.4218, 13.0746)

Figure 2.3.1 shows the VN-Index values with the best state sequence obtained by the Viterbi algorithm. The dashed lines represent the four states, while the dark dots represent the best state for the value at each time.

Figure 2.3.1. VN-Index data: best state sequence
2.3.2. Model selection
Applying the BIC and AIC criteria of the HMM theory to the VN-Index data, both AIC and BIC select 4 states. The values of the criteria are given in Table 2.3.1.
Table 2.3.1. VN-Index data: selecting the number of states
Model        -logL            AIC        BIC
2-state HM   1597.832         3205.664   3225.312
3-state HM   1510.989         3043.978   3087.204
4-state HM   1439.179         2916.358   2991.02
5-state HM   no convergence

2.3.3. Forecast distribution
As described in Section 1.2.3 of Chapter 1, Figure 2.3.2 represents 10 forecast distributions for the VN-Index value. We see that the forecast distributions move toward the stationary distribution very fast.

Figure 2.3.2. VN-Index data: forecast distributions of the 10 following days.
Thus, the HMM is certainly suitable for prediction in some cases, especially when the data actually fit the distributions selected in the model. However, whether the time series is generated by a random variable that fits the normal distribution (or a mixture of normal distributions) or some other distribution is a question that determines the appropriateness and accuracy of the forecasts.
2.3.4. Forecast states
Table 2.3.2. State forecast probabilities for the 30 days following the last date 13/05/2011
Day        1        2        3        4        5        6        7        8        9        10
State 1    0.0975   0.1695   0.2261   0.2709   0.3065   0.3350   0.3579   0.3764   0.3915   0.4039
State 2    0.8062   0.6622   0.5517   0.4665   0.4005   0.3492   0.3092   0.2778   0.2530   0.2334
State 3    0.0799   0.1351   0.1724   0.1971   0.2128   0.2223   0.2274   0.2296   0.2298   0.2288
State 4    0.0162   0.0330   0.0496   0.0653   0.0800   0.0933   0.1053   0.1160   0.1255   0.1338

Day        11       12       13       14       15       16       17       18       19       20
State 1    0.4141   0.4225   0.4296   0.4355   0.4405   0.4448   0.4484   0.4515   0.4542   0.4565
State 2    0.2177   0.2052   0.1951   0.1870   0.1803   0.1749   0.1705   0.1669   0.1639   0.1614
State 3    0.2270   0.2248   0.2224   0.2200   0.2176   0.2154   0.2133   0.2113   0.2096   0.2080
State 4    0.1410   0.1473   0.1527   0.1573   0.1613   0.1647   0.1676   0.1701   0.1722   0.1739

Day        21       22       23       24       25       26       27       28       29       30
State 1    0.4586   0.4604   0.4619   0.4633   0.4646   0.4657   0.4667   0.4676   0.4684   0.4692
State 2    0.1593   0.1576   0.1561   0.1549   0.1539   0.1530   0.1523   0.1517   0.1512   0.1507
State 3    0.2066   0.2053   0.2041   0.2031   0.2022   0.2014   0.2007   0.2000   0.1995   0.1990
State 4    0.1754   0.1766   0.1776   0.1784   0.1791   0.1797   0.1801   0.1805   0.1807   0.1809

We see that the highest probability in the first 7 days falls on state 2 and from day 8 onward on state 1. Therefore, the model is not effective in the long term but good in the short term. However, we can forecast by continuously updating the data. The dissertation updates the data from 14/5/2011 to 23/6/2011 with 30 closing prices of the stock to compare the forecast states with the actual values. Figure 2.3.3 shows that the values of these 30 days are mostly in state 1, which confirms the forecast.


Figure 2.3.3. VN-Index data: comparison of forecast states and actual states.
2.4. Result comparison
This dissertation compares the forecast results of the HMM with a number of models from [19] on several stock index data sets. Because these time series take real values, the HMM with normal distribution is chosen. The proposed model and the comparative models are trained on the same training set and evaluated on the same test set to ensure a fair comparison. The accuracy measure used is the mean absolute percentage error (MAPE), calculated by
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|a_i - p_i|}{a_i} \times 100\%.$$

Table 2.4.1. MAPE over multiple runs of the HMM on the Apple data
1.812   1.778   1.790   1.784   1.815   1.777   1.812   1.794
1.779   1.788   1.802   1.816   1.778   1.800   1.790   1.789
Mean: 1.795.
The mean accuracy of 1.795 and the average forecast values are illustrated in Figure 2.4.1.
Figure 2.4.1. HMM forecast for the Apple share price: actual - real price; predict - forecast price
The same procedure is applied to the Ryanair Airlines stock data from January 6, 2003 to January 17, 2005; IBM Corporation from January 10, 2003 to January 21, 2005; and Dell Inc. from 10/01/2003 to 21/01/2005. A comparison of the MAPE accuracy with 400 training observations is shown in Table 2.4.2.
Table 2.4.2. Comparison of the MAPE accuracy with other models
Data      ARIMA model   ANN model   HMM model
Apple     1.801         1.801       1.795
Ryanair   1.504         1.504       1.306
IBM       0.660         0.660       0.660
Dell      0.972         0.972       0.863

Given the results in Table 2.4.2, we see that the HMM with normal distribution provides higher forecast accuracy than the classic ARIMA and ANN models.



Chapter 3. EXTENSION OF THE HIGHER-ORDER MARKOV CHAIN MODEL AND FUZZY TIME SERIES IN FORECASTING
3.1. Higher-order Markov chain
Assume that each data point $C_t$ in a given categorical data series takes values in the set $I = \{1, 2, \ldots, m\}$, where $m$ is finite, i.e. the value set has $m$ types or states. A Markov chain of order $k$ is a sequence of random variables such that
$$\Pr(C_n = c_n \mid C_{n-1} = c_{n-1}, \ldots, C_1 = c_1) = \Pr(C_n = c_n \mid C_{n-1} = c_{n-1}, \ldots, C_{n-k} = c_{n-k}).$$

In [30], Raftery proposed a higher-order Markov chain model (the conventional model, CMC). This model can be written as follows:
$$\Pr(C_n = c_n \mid C_{n-1} = c_{n-1}, \ldots, C_{n-k} = c_{n-k}) = \sum_{i=1}^{k} \lambda_i q_{c_n c_{n-i}}, \quad (3.1.1)$$
where $\sum_{i=1}^{k} \lambda_i = 1$ and $Q = [q_{ij}]$ is a transition matrix with column sums equal to one, such that
$$0 \leq \sum_{i=1}^{k} \lambda_i q_{c_n c_{n-i}} \leq 1, \quad \forall c_n, c_i \in I. \quad (3.1.2)$$

3.1.1. Improved higher-order Markov model (IMC)
In this subsection, the dissertation extends the Raftery model [30] to a more general higher-order Markov chain model by allowing $Q$ to vary with the lag. Here we assume non-negative weights $\lambda_i$ satisfying
$$\sum_{i=1}^{k} \lambda_i = 1. \quad (3.1.3)$$
Then (3.1.1) can be rewritten as
$$C_{n+k+1} = \sum_{i=1}^{k} \lambda_i Q \, C_{n+k+1-i}, \quad (3.1.4)$$
where $C_{n+k+1-i}$ is the probability distribution of the states at time $(n+k+1-i)$. Using (3.1.3) and the fact that $Q$ is a transition probability matrix, each element of $C_{n+k+1}$ lies between 0 and 1, and the sum of all its elements is 1. In the Raftery model $\lambda_i$ is not required to be non-negative, so conditions (3.1.2) must be added to ensure that $C_{n+k+1}$ is a probability distribution over the states.
The Raftery model (3.1.4) can be generalized as follows:
$$C_{n+k+1} = \sum_{i=1}^{k} \lambda_i Q_i \, C_{n+k+1-i}. \quad (3.1.5)$$
The total number of independent parameters in the new model is $(k + km^2)$.

3.1.2. Parameter estimation
In this section, the author presents effective methods for estimating the parameters $Q_i$ and $\lambda_i$, $i = 1, 2, \ldots, k$. To estimate $Q_i$, we can regard $Q_i$ as the $i$-step transition matrix of the categorical data sequence $C_n$. Given the sequence $C_n$, we can count the transition frequency $f_{jl}^{(i)}$, the number of transitions from state $l$ to state $j$ in $i$ steps, and build the $i$-step transition frequency matrix $F^{(i)} = [f_{jl}^{(i)}]$ for the sequence $C_n$. From $F^{(i)}$ we obtain the estimate $\hat{Q}_i = [\hat{q}_{jl}^{(i)}]$, where
$$\hat{q}_{jl}^{(i)} = \begin{cases} f_{jl}^{(i)} \big/ \sum_{j=1}^{m} f_{jl}^{(i)} & \text{if } \sum_{j=1}^{m} f_{jl}^{(i)} \neq 0, \\ 0 & \text{otherwise.} \end{cases}$$
We note that the complexity of constructing $F^{(i)}$ is $O(L^2)$, where $L$ is the length of the data sequence, so the total complexity of constructing $F^{(i)}$, $i = 1, \ldots, k$, is $O(kL^2)$, where $k$ is the number of lags.
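A minimal R sketch of this counting estimator, using the column-stochastic convention above (the function name is illustrative, not the dissertation's code):

estimate.Q <- function(encoded, m, k) {
  lapply(1:k, function(i) {
    f <- matrix(0, m, m)                     # f[j, l]: transitions l -> j in i steps
    for (t in seq_len(length(encoded) - i))
      f[encoded[t + i], encoded[t]] <- f[encoded[t + i], encoded[t]] + 1
    s <- colSums(f)                          # total transitions out of each state l
    sweep(f, 2, ifelse(s == 0, 1, s), "/")   # normalize each column to sum to one
  })
}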
We now present the steps for estimating the parameters $\lambda_i$, following [15], which the dissertation uses in the proposed combined model.
Since $C_n \to C$ as $n$ tends to infinity, $C$ can be estimated from the sequence $C_n$ by computing the frequency of occurrence of each state in the sequence; call this estimate $\hat{C}$. Stationarity then requires
$$\sum_{i=1}^{k} \lambda_i Q_i \hat{C} = \hat{C}.$$
This gives us an estimate of the parameters $\lambda = (\lambda_1, \ldots, \lambda_k)$ via the following minimization problem:
$$\min_{\lambda} \Big\| \sum_{i=1}^{k} \lambda_i Q_i \hat{C} - \hat{C} \Big\|
\quad \text{subject to} \quad \sum_{i=1}^{k} \lambda_i = 1, \; \lambda_i \geq 0 \; \forall i.$$
Here $\| \cdot \|$ is a vector norm. In particular, choosing $\| \cdot \|_{\infty}$ we obtain the min-max problem:
$$\min_{\lambda} \max_{l} \Big| \Big[ \sum_{i=1}^{k} \lambda_i Q_i \hat{C} - \hat{C} \Big]_l \Big|
\quad \text{subject to} \quad \sum_{i=1}^{k} \lambda_i = 1, \; \lambda_i \geq 0 \; \forall i,$$
where $[\cdot]_l$ denotes the $l$-th element of a vector. The difficulty here is that the optimization must ensure the existence of the stationary distribution $C$. The min-max problem above can be formulated as a linear program:
$$\min_{\lambda, w} \; w$$
subject to
$$w \mathbf{1} \geq \hat{C} - [\hat{Q}_1\hat{C} \mid \hat{Q}_2\hat{C} \mid \cdots \mid \hat{Q}_k\hat{C}] \, \lambda,$$
$$w \mathbf{1} \geq -\hat{C} + [\hat{Q}_1\hat{C} \mid \hat{Q}_2\hat{C} \mid \cdots \mid \hat{Q}_k\hat{C}] \, \lambda,$$
$$w \geq 0, \quad \sum_{i=1}^{k} \lambda_i = 1, \quad \lambda_i \geq 0 \; \forall i,$$
where $\mathbf{1}$ is the vector of ones and $\lambda = (\lambda_1, \ldots, \lambda_k)'$. Solving this linear program yields the parameters $\lambda_i$. Instead of solving a min-max problem, we can also choose $\| \cdot \|_1$ and build the minimization problem:
$$\min_{\lambda} \sum_{l=1}^{m} \Big| \Big[ \sum_{i=1}^{k} \lambda_i \hat{Q}_i \hat{C} - \hat{C} \Big]_l \Big|
\quad \text{subject to} \quad \sum_{i=1}^{k} \lambda_i = 1, \; \lambda_i \geq 0 \; \forall i.$$
The corresponding linear program is:
$$\min_{\lambda, w} \; \sum_{l=1}^{m} w_l$$
subject to
$$w \geq \hat{C} - [\hat{Q}_1\hat{C} \mid \hat{Q}_2\hat{C} \mid \cdots \mid \hat{Q}_k\hat{C}] \, \lambda,$$
$$w \geq -\hat{C} + [\hat{Q}_1\hat{C} \mid \hat{Q}_2\hat{C} \mid \cdots \mid \hat{Q}_k\hat{C}] \, \lambda,$$
$$w_l \geq 0 \; \forall l, \quad \sum_{i=1}^{k} \lambda_i = 1, \quad \lambda_i \geq 0 \; \forall i.$$
In constructing the above linear programs, the number of variables is of order $k$ and the number of constraints is of order $(2m + 1)$. The complexity of solving such linear problems is $O(k^3 L)$, where $k$ is the number of variables and $L$ is the number of binary bits needed to store all the data (constraints and objective functions) [18].
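A minimal R sketch of the min-max linear program, using the lpSolve package; the decision variables are (lambda_1, ..., lambda_k, w), all non-negative by lpSolve's default, and the function name is illustrative:

library(lpSolve)
estimate.lambda <- function(Q, Chat) {
  k <- length(Q); m <- length(Chat)
  B <- sapply(Q, function(Qi) as.vector(Qi %*% Chat))   # column i holds Q_i %*% Chat
  const.mat <- rbind(cbind(B, -1),                      # B lambda - w <= Chat
                     cbind(B, +1),                      # B lambda + w >= Chat
                     c(rep(1, k), 0))                   # sum(lambda) = 1
  const.dir <- c(rep("<=", m), rep(">=", m), "=")
  const.rhs <- c(Chat, Chat, 1)
  sol <- lp("min", c(rep(0, k), 1), const.mat, const.dir, const.rhs)
  sol$solution[1:k]                                     # the estimated lambda_i
}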
3.2. Selecting the fuzzy time series in the combined model
Consider a time series with observations $X_1, X_2, \ldots, X_T$ and growth (return) chain $x_1, x_2, \ldots, x_T$ (defined immediately below). We want to classify growth into states such as "slow", "normal", "fast", or even more levels. However, each $x_t$ at a time $t$ is unclear about its level even when the levels themselves are defined clearly; $x_t$ can belong to several levels with different degrees of membership. Therefore, the fuzzy time series theory of Section 1.3 of Chapter 1 can be used to classify the $x_t$ into the states of which they are members. Assuming that these states follow a Markov chain, the Markov model gives us a predicted future state, and from the future state the predicted value of $x_t$ is computed by inverting the fuzzification.
3.2.1 Defining and segmenting the universe of discourse
Given the training set $\{y_t\}$, $t = 1, \ldots, N$, we define the universe of discourse for growth as
$$U = \big[ \min_{t \in \{1, \ldots, N\}} y_t - \varepsilon, \; \max_{t \in \{1, \ldots, N\}} y_t + \varepsilon \big],$$
where $\varepsilon > 0$ is chosen so that future growth does not exceed $\max_t y_t + \varepsilon$. For each data set a different $\varepsilon$ can be selected; however, $\varepsilon = 1$ is large enough for all stock growth series.
To fuzzify the universe $U$ into growth labels such as "fast increase", "slow increase", "steady increase", up to $k$ levels, the universe $U$ is divided into $k$ intervals (in the simplest case, consecutive equal intervals) $u_1, u_2, \ldots, u_k$. For example, a partition for the VN-Index (Vietnam stock index) is:
$$U = [-0.0449, -0.0150] \cup [-0.0150, 0.0149] \cup [0.0149, 0.0448].$$

The VN-Index results are then coded as in Table 3.2.1.
Table 3.2.1. Fuzzified growth chain
Date         x_i     Index       growth (y_i)   Code
04/11/2009   537.5   -0.015997   NA             NA
05/11/2009   555.5   -0.031866   0.0334883      3
06/11/2009   554.9   -0.026580   -0.0010801     2
09/11/2009   534.1   0.054237    -0.0374842     1
10/11/2009   524.4   0.020036    -0.0181613     1
11/11/2009   537.6   0.002917    0.0251716      3
...          ...     ...         ...            ...

3.2.2 Fuzzification rule for the time series
We define fuzzy sets $A_i$, each assigned a growth label and determined on the intervals $u_1, u_2, \ldots, u_k$. Each fuzzy set $A_i$ can be described as
$$A_i = \mu_{A_i}(u_1)/u_1 + \mu_{A_i}(u_2)/u_2 + \ldots + \mu_{A_i}(u_k)/u_k,$$
where $\mu_{A_i}$ is the membership function giving the degree of membership of each $u_j$, $j = 1, \ldots, k$, in $A_i$, $i = 1, \ldots, k$.
Each fuzzy value of the time series $y_t$ is converted back to a crisp value based on the fuzzification rule $\mu_{A_i}$. For example, with
$$A_1 = 1/u_1 + 0.5/u_2 + 0/u_3 + \ldots + 0/u_k$$
$$A_2 = 0.5/u_1 + 1/u_2 + 0.5/u_3 + \ldots + 0/u_k$$
$$\ldots$$
$$A_k = 0/u_1 + 0/u_2 + 0/u_3 + \ldots + 1/u_k,$$
if $y_t \in A_2$ is a fuzzy value, then the crisp value is recovered by
$$y_t = \frac{1}{2}(0.5 m_1 + m_2 + 0.5 m_3),$$
where $m_1, m_2, m_3$ are the midpoints of $u_1, u_2, u_3$ respectively.
For different fuzzification rules, the defuzzification rule differs accordingly.
3.3. Combined model of Markov chain and fuzzy time series
3.3.1 Combined model with a first-order Markov chain
In this section, we describe in detail the combination of the Markov model and the fuzzy time series. This combination is illustrated in Figure 3.3.1; the details of each step are given below.

Figure 3.3.1. Model structure. Step 1: compute the returns of the training set and define the discourse; Step 2: partition the discourse; Step 3: fuzzify the returns data; Step 4: train the higher-order Markov model for the fuzzy series; Step 5: forecast the encoded series and defuzzify into the forecast price.
Step 1: Given the observed time series $\{x_1, x_2, \ldots, x_T\}$, the growth chain of the training data is calculated as follows:
$$y_t = \frac{x_{t+1} - x_t}{x_t},$$
so that we have $x_{t+1} = (1 + y_t) \, x_t$.
Let $D_{max}$ and $D_{min}$ be the maximum and minimum values of the growth chain after removing outliers; then the universe of discourse is $U = [D_{min} - \varepsilon, D_{max} + \varepsilon]$, where $\varepsilon > 0$ can be set as a threshold for the increase of future changes.
Step 2: A partition of the universe is generated, in the simplest way, by dividing $[D_{min}, D_{max}]$ into $k - 2$ equal intervals. Then the universe is $U = u_1 \cup u_2 \cup \ldots \cup u_k$, where $u_1 = [D_{min} - \varepsilon, D_{min}]$ and $u_k = [D_{max}, D_{max} + \varepsilon]$.
Step 3: In this research, the linguistic terms $A_1, A_2, A_3, \ldots, A_k$ of the time series, represented by fuzzy sets, are also defined in the simplest way as follows:
$$A_1 = 1/u_1 + 0.5/u_2 + 0/u_3 + \ldots + 0/u_k$$
$$A_2 = 0.5/u_1 + 1/u_2 + 0.5/u_3 + \ldots + 0/u_k$$
$$\ldots$$
$$A_k = 0/u_1 + 0/u_2 + 0/u_3 + \ldots + 1/u_k.$$
Each $A_i$ is encoded by $i$, for $i \in \{1, 2, \ldots, k\}$. Therefore, if a datum of the time series belongs to $u_i$, it is encoded by $i$. We obtain an encoded time series $\{c_t\}$, $t = 1, \ldots, T$, with $c_t \in \{1, 2, \ldots, k\}$. For an instance of such a partition and coding of the VN-Index discourse, see Section 3.2.1.
Step 4: This step explains how Markov chains are applied to the encoded time series. According to Section 3.2, we assume that the encoded time series is a Markov chain as defined in Definition 1.1.1. Following the estimation method of Section 1.1.3, it is easy to estimate the transition probability matrix $\Gamma = [\gamma_{ij}]$, $i, j \in \{1, 2, \ldots, k\}$, where
$$\gamma_{ij} = \Pr(c_{t+1} = j \mid c_t = i).$$
If some state $c_t = i$ is an absorbing state, then to ensure the regularity of $\Gamma$ we define $\Pr(c_{t+1} = j \mid c_t = i) = 1/k$ for all $j = 1, 2, \ldots, k$; that is, the probability of moving from state $i$ to any other state is the same.

Step 5: Next we generate the one-step-ahead forecast for the encoded time series and defuzzify the forecast fuzzy set into a forecast value of the returns. Given $c_t$, the column $\Gamma[\cdot, c_t]$ is the probability distribution of $c_{t+1} = j$, $j = 1, 2, \ldots, k$. Let
$$M = \Big( \tfrac{2}{3}(m_1 + 0.5 m_2), \; \tfrac{1}{2}(0.5 m_1 + m_2 + 0.5 m_3), \; \ldots, \; \tfrac{2}{3}(0.5 m_{k-1} + m_k) \Big),$$
where $m_i$ is the midpoint of the interval $u_i$. Then the forecast return at time $t + 1$ is calculated as
$$\hat{y}_{t+1} = \Gamma[\cdot, c_t] * M = \sum_{j=1}^{k} \gamma_{j c_t} M_j.$$
In this step, the vector $M$ can be chosen differently according to the fuzzification method of Step 3. Finally, the price forecast is calculated as follows:
$$\hat{x}_{t+1} = (\hat{y}_{t+1} + 1) \, x_t.$$

3.3.2 Extension to higher-order Markov chains
The higher-order Markov chain model combined with fuzzy time series differs from the first-order model in Step 4 and Step 5.
Step 4: For the conventional higher-order Markov model associated with fuzzy time series (called CMC-Fuz), maximizing the likelihood as in the first-order model, it is easy to estimate the $(l+1)$-dimensional transition array $\Gamma = [\gamma_{i_1 i_2 \ldots i_{l+1}}]$, $i_j \in \{1, 2, \ldots, k\}$. As in the definition of the higher-order Markov chain, $\gamma_{i_1 i_2 \ldots i_{l+1}}$ is the probability of observing $c_{t+1}$ given the known $c_t, \ldots, c_{t-l+1}$:
$$\gamma_{i_1 i_2 \ldots i_{l+1}} = \Pr(c_{t+1} = i_{l+1} \mid c_t = i_l, \ldots, c_{t-l+1} = i_1).$$
For the new combined higher-order Markov model (called IMC-Fuz), the transition operator is $\sum_{i=1}^{l} \lambda_i Q_i$, as in Section 3.1.
Step 5: Next, we generate the one-step forecast for the encoded time series using the transition probabilities and defuzzify it into the prediction of the time series.
With the CMC-Fuz model, given $c_t, \ldots, c_{t-l+1}$, the column $\Gamma[\cdot, c_t, \ldots, c_{t-l+1}]$ is the probability distribution of $c_{t+1} = j$ over all $k$ encoded values $j = 1, 2, \ldots, k$. The forecast growth value at time $t + 1$ is computed by
$$\hat{y}_{t+1} = \Gamma[\cdot, c_t, \ldots, c_{t-l+1}] * M = \sum_{j=1}^{k} \gamma_{j c_t \ldots c_{t-l+1}} M_j.$$
With the IMC-Fuz model, the forecast growth value at time $t + 1$ is calculated by
$$\hat{y}_{t+1} = \Big( \sum_{i=1}^{l} \lambda_i Q_i[\cdot, c_{t-i+1}] \Big) * M.$$
Finally, the forecast value of $X_{t+1}$ is computed by
$$\hat{X}_{t+1} = (\hat{y}_{t+1} + 1) \, X_t.$$
Algorithm 3.1. Combined Markov - Fuzzy algorithm
Input: Data, eps = 1, nTrain, nOrder, nStates
Output: predict, RMSE, MAPE, MAE
1: y_t <- (Data_{t+1} - Data_t) / Data_t, t = 2, ..., nTrain
2: Train <- remove outlier elements from the y_t
3: divide the interval [min(Train) - eps, max(Train) + eps] into nStates equal intervals A_k
4: if y_t in A_k then encoded_t <- k
5: if Model = CMC-Fuz then estimate the transition matrix of the CMC-Fuz model
6: for i in 1 : nOrder do estimate the matrix Q_i
7: if Model = IMC-Fuz then
8:   C <- counts(encoded) / sum(counts)
9:   solve the min-max problem min_lambda max_k |sum_{i=1}^{nOrder} lambda_i Q_i C - C| for the lambda_i
10:  IMC.Fuz.Mat <- sum_{i=1}^{nOrder} lambda_i Q_i  { estimate the transition matrix of the IMC-Fuz model based on the stationary distribution }
11: M <- vector(2/3 (mid(A_1) + 0.5 mid(A_2)), 1/2 (0.5 mid(A_1) + mid(A_2) + 0.5 mid(A_3)), ..., 2/3 (0.5 mid(A_{k-1}) + mid(A_k)))  { defuzzification rule, with mid(A_i) the midpoint of interval A_i }
12: for close_t in the test set do
13:   if the return at time t falls in A_k then encoded_t <- k  { encode the new observations, t > nTrain }
14:   predict_{t+1} <- (transition.Mats[, encoded_t, encoded_{t-1}, ..., encoded_{t-nOrder+1}] %*% M + 1) * Data_t
15:   errors (RMSE, MAPE, MAE) <- f(predict - actual)  { calculate the accuracy }

Here, nTrain is the number of observations in the training set; nOrder is the order of the higher-order Markov chain; and nStates is the number of states (the A_k) in the model.
Thus, the CMC-Fuz and IMC-Fuz models with nOrder = 1 coincide with the first-order combined model of Section 3.3.1. As a result, the experimental results for the first-order Markov chain model are obtained simultaneously with those of the higher-order models.
3.3.3 Experimental results
Data collection
In order to compare results with [19, 20, 17, 26, 38, 33], we use data similar to those used in [40, 29, 7, 37]. Moreover, other data sets are also used to check the accuracy of the model. Details are given in Table 3.3.2.
Table 3.3.2. Comparative data sets
Data Name                                    From         To           Frequency
Apple Computer Inc.                          10/01/2003   21/01/2005   Daily
IBM Corporation                              10/01/2003   21/01/2005   Daily
Dell Inc.                                    10/01/2003   21/01/2005   Daily
Ryanair Airlines                             06/01/2003   17/01/2005   Daily
TAIEX (Taiwan exchange index)                01/01/2001   31/12/2009   Daily
SSE (Shanghai Stock Exchange)                21/06/2006   31/12/2012   Daily
DJIA (Dow Jones Industrial Average Index)    04/08/2006   31/08/2012   Daily
S&P500                                       04/08/2006   31/08/2012   Daily
Unemployment rate                            01/01/1948   01/12/2013   Monthly
Australian electricity                       01/01/1956   01/08/1995   Monthly
Poland Electricity Load                      from the 1990s, 1500 values   Daily

This study does not fix the training and test sets, so readers can make appropriate changes when applying the method to specific data. In many cases, the experimental results show that using between 75% and 85% of the data for training gives the best predicted results.
Results compared with other models
The first model compared against is the model in [19]. The training and test sets on Apple Inc., Dell Inc., IBM Corp. and Ryanair Airlines are used in the same way (nTrain = 400). British Airlines and Delta Airlines are not compared because the available data are not complete and do not correspond to [19].
Table 3.3.3. Comparison of MAPEs against other models
Stock          HMM-based          Fusion HMM-   Combination of      CMC-Fuz model   IMC-Fuz model
               forecasting model  ANN-GA model  HMM-fuzzy with      nStates = 6     nStates = 6
               (MAPE)             (MAPE)        weighted average    nOrder = 1      nOrder = 2
                                                (MAPE)
Ryanair Air.   1.928              1.377         1.356               1.275           1.271
Apple          2.837              1.925         1.796               1.783           1.783
IBM            1.219              0.849         0.779               0.660           0.656
Dell Inc.      1.012              0.699         0.405               0.837           0.823

In Table 3.3.3, with nStates = 6, we can see that the IMC-Fuz model is better than the CMC-Fuz model, and both models are better than the models compared against on the 4 data sets of [19].
Further comparative results are shown in Table 3.3.4. The IMC-Fuz and CMC-Fuz models are slightly better than the other models on the SSE data and much better on the DJIA and S&P500 data.
Table 3.3.4. Comparison of models on the SSE, DJIA and S&P500 data sets
Data      Measurement   IMC-Fuz    CMC-Fuz    BPNN       STNN       SVM        PCA-BPNN   PCA-STNN
SSE       MAE           20.5491    20.4779    24.4385    22.8295    27.8603    22.4485    22.0844
          RMSE          27.4959    27.4319    30.8244    29.0678    34.5075    28.6826    28.2975
          MAPE          0.8750     0.8717     1.0579     0.9865     1.2190     0.9691     0.9540
DJIA      MAE           90.1385    90.4159    258.4801   230.7871   278.2667   220.9163   192.1769
          RMSE          123.2051   123.2051   286.6511   258.3063   302.793    250.4738   220.4365
          MAPE          0.7304     0.7304     2.0348     1.8193     2.2677     1.7404     1.5183
S&P500    MAE           10.4387    10.4387    24.7591    22.1833    22.9334    16.8138    15.5181
          RMSE          14.2092    14.2092    28.1231    25.5039    25.9961    20.5378    19.2467
          MAPE          0.8074     0.8074     1.8607     1.6725     1.7722     1.282      1.1872