
Chapter 1 Introduction
This first chapter offers a general description of the short term load forecasting (STLF) problem and its significance for the power industry. The two main approaches to STLF – the statistical approach and the artificial neural networks approach – are then introduced, followed by the motivation for this thesis and its contribution. Finally, a bibliographic review of STLF methods from these two disciplines is given, and the structure of this thesis is explained.

1.1 Load Forecasting
Load forecasting has always been an issue of major interest for the electricity
industry. During the operation of a power system, the system response closely follows the
load requirements. So when there is an increase or decrease in the load demand, then the
power generation has to be increased or decreased accordingly. To be able to provide this on-demand power generation, the electric utility operator needs to have a sufficient quantity of generation resources available. Thus, if the operator has some a priori knowledge of the future load requirements, he can optimally allocate the generation resources.
There are three kinds of load forecasting: short term, medium term, and long term forecasts. Utility operators need to perform all three forecasts, as they influence different aspects of the power supply chain. Short term load forecasts typically cover one hour to one week ahead, and are needed for the daily operation of the power system. Medium term forecasts typically cover one week to one year ahead, and are
needed for fuel supply planning and maintenance. Long term load forecasts usually cover
a period longer than a year, and are needed for power system planning.

1.2 Importance of Short Term Load Forecasting
Short term load forecasting (STLF) is the keystone of the operation of today’s
power systems. Without access to good short term forecasts, it would be impossible for
any electric utility to be able to operate in an economical, reliable and secure manner.
The input data for load flow studies and contingency analysis is provided by STLF.
Utilities need to perform these studies to calculate the generating requirements of each
generator in the system, to determine the line flows, to determine the bus voltages, and to
ensure that the system continues to operate reliably even in the case of contingencies such
as loss of a generator or of a line. STLF is also used by the utility engineers in other offline network studies, such as preparing a list of corrective actions for different types of
expected faults. Such corrective actions may include load shedding, switching off
interconnections and forming islands, starting up of peaking units or increasing the
spinning and standby reserves of the system [1]. Thus, the STLF is used by the system
operators and regulatory agencies to ensure the safe and reliable operation of the system,
and by the producers to ensure the optimal utilization of generators and power stations.
With the advent of deregulation and the rise of competitive electricity markets,
STLF has also become important for market operators, transmission owners and other
market participants [2]. As an accurate electricity price forecast is not possible without an accurate load forecast, the operational plans and bidding strategies of the market players require STLF as well. Forecast errors will have negative implications for the
company profits, and eventually for shareholder value.

1.3 Approaches to Short Term Load Forecasting
STLF methods, and more generally, time series prediction (TSP) methods can be
broadly divided into two categories: statistical methods and computational intelligence
(CI) methods.
1.3.1 Statistical Methods

1.3.1.1 Time Series Models
Modern statistical methods for time series prediction can be said to have begun in 1927
when Yule came up with an autoregressive technique to predict the annual number of
sunspots. According to this model, the next-step value was a weighted average of previous
observations of the series. To model more interesting behavior from this linear system,
outside intervention in the form of noise was introduced. For the next half-century, the
reigning paradigm for predicting any time series remained that of a linear model added

with noise. The popular models developed during this period include moving average, exponential smoothing methods, and the Box-Jenkins approach to modeling autoregressive moving average (ARMA) models and autoregressive integrated moving average (ARIMA) models. These models, referred to together as time series models, assume
that the data is following a stationary pattern, i.e. the series is normally distributed with a
constant mean and variance over a long time period. They also assume that the series has
uncorrelated random error, and no outliers are present.
Applied for load forecasting, time series methods provide satisfactory results as long
as the variables affecting the load demand, such as environmental variables, do not change
suddenly. Whenever there is an abrupt change in such variables, the accuracy of the time
series models suffers. Also, the assumption of stationarity of the load series is rather
restricting, and whenever the historical load data deviates significantly from this assumption, the forecasting accuracy decreases.
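As a concrete illustration of the autoregressive family underlying these models, the sketch below fits an AR(p) model by ordinary least squares. The function names and data are my own, and a production forecaster would use a dedicated library; this is only meant to show the mechanics.

```python
import numpy as np

def fit_ar(series, p):
    """Estimate AR(p) coefficients by ordinary least squares.

    Models x[t] = c + a1*x[t-1] + ... + ap*x[t-p] + noise.
    Returns (intercept, coefficient array).
    """
    x = np.asarray(series, dtype=float)
    # Lagged design matrix: column k holds the values at lag k+1.
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:]

def predict_next(series, intercept, coefs):
    """One-step-ahead forecast from the last p observations."""
    p = len(coefs)
    recent = np.asarray(series, dtype=float)[-p:][::-1]  # most recent first
    return intercept + coefs @ recent

# Example: recover the coefficients of a known AR(2) process.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(scale=0.1)
c, a = fit_ar(x, 2)
```

With enough data, the estimated coefficients land close to the true values 0.6 and -0.2, illustrating why such models work well precisely when the series really is stationary and linear.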

1.3.1.2 Regression Models
Regression methods are another popular tool for load forecasting. Here the load is
modeled as a linear combination of relevant variables such as weather conditions and day
type. Temperature is usually the most important factor for load forecasting among weather
variables, though its importance depends upon the kind of forecast and the type of climate.
For example, for STLF, temperature effects might be more critical for tropical regions
than temperate ones. Typically temperature is modeled in a nonlinear fashion. Other
weather variables such as wind velocity, humidity and cloud cover can be included in the
regression model to obtain higher accuracy. Clearly, no two utilities are the same, and a
detailed case study analysis of the different geographical, meteorological, and social
factors affecting the load demand needs to be carried out before proceeding with the
regression methods. Once the variables have been determined, the coefficients of these
variables can be estimated using least squares or other regression methods.
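A minimal sketch of such a regression model follows; the variables, coefficients and data below are invented purely for illustration of the least squares step.

```python
import numpy as np

# Synthetic daily loads driven by temperature and a weekend indicator
# (all names and coefficients are invented for illustration).
rng = np.random.default_rng(1)
n = 365
temp = 20 + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 2, n)
weekend = (np.arange(n) % 7 >= 5).astype(float)
load = 500 + 8.0 * temp - 60.0 * weekend + rng.normal(0, 5, n)

# Design matrix: intercept, temperature, day-type dummy.
X = np.column_stack([np.ones(n), temp, weekend])

# Least squares estimation of the regression coefficients.
coefs, *_ = np.linalg.lstsq(X, load, rcond=None)
forecast = X @ coefs
```

In practice the temperature term would often enter nonlinearly (e.g. as a piecewise or polynomial term), which is exactly the modeling choice the case study analysis mentioned above is meant to settle.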
Though regression methods are popular tools for STLF among electric utilities, they
have their share of drawbacks. The relationship between the load demand and the
influencing factors is a nonlinear and complex one, and developing an accurate model is a
challenge. From on-site tests, it has been seen that the performance of regression methods
deteriorates when the weather changes abruptly, leading to load deviation [3]. This
drawback occurs in particular because the model is linearized so as to obtain its
coefficients. But the load patterns are nonlinear; hence a linearized model fails to
represent the load demand accurately during certain distinct time periods.

1.3.1.3 Kalman Filtering Based Models
Towards the end of the 1980s, as computers became more powerful, it became possible to record longer time series and apply more complex algorithms to them. Drawing on ideas from differential topology and dynamical systems, it was possible to represent a time series as being generated by deterministic governing equations. This approach of Kalman
filtering techniques characterizes dynamical systems by a state-space representation. The
theory of Kalman filtering provides an efficient computational (recursive) means to
estimate the state of a process, in a way that minimizes the mean of the squared error. The
filter supports estimation of the past, present and even future states, and it can do so even
when the precise nature of the modeled system is unknown [4]. A significant challenge in
the use of Kalman filtering based methods is the estimation of the state-space model
parameters.
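The predict-update recursion at the heart of the filter can be sketched for the simplest scalar case; this is the textbook filter for a random-walk state, with all numbers illustrative.

```python
import numpy as np

def kalman_filter_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed in noise.

    State model:   x[t] = x[t-1] + w,  w ~ N(0, q)
    Measurement:   z[t] = x[t]   + v,  v ~ N(0, r)
    Returns the filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by process noise.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Example: noisy observations of a slowly drifting level.
rng = np.random.default_rng(2)
truth = np.cumsum(rng.normal(0, 0.05, 200)) + 10.0
obs = truth + rng.normal(0, 1.0, 200)
est = kalman_filter_1d(obs, q=0.05**2, r=1.0**2, x0=obs[0])
```

The filtered estimates track the underlying level far more closely than the raw measurements, which is the minimum-mean-squared-error property referred to above. The state-space parameters q and r are exactly the quantities whose estimation is flagged as the main challenge.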

1.3.1.4 Non-linear Time Series Models
To overcome the limitations of the linear time series models, a second generation of
non-linear statistical time series models has been developed. Some of the models, such as
autoregressive conditional heteroscedastic (ARCH) and generalized autoregressive conditional heteroscedastic (GARCH), attempt to model the variance of the time series as a function of its past values. These models achieved limited success for STLF since they were mostly specialized for particular problems in particular domains, for example volatility clustering in financial indices.
Regime-switching models, developed first for econometrics, are slowly being
successfully applied for STLF as well. As the name suggests, these models involve
switching between a finite number of linear regimes. The models only differ in their
assumptions about the stochastic process generating the regime.
i. The mixture of normal distributions model has state transition probabilities which are independent of the history of the regime. Compared to a single normal distribution, this approach is better able to model fatter-than-normal tails and skewness [5].

ii. In the Markov-switching model, the switching between two or more regimes is governed by a discrete-state homogeneous Markov chain [6]. So in a possible formulation of the Markov-switching model, the model can be divided into two parts: firstly, a regressive model to regress the model variable over hidden state variables, and secondly, an autoregressive model to describe the hidden state variables.

iii. In the threshold autoregressive (TAR) model [7][8], the switching between two or more linear autoregressive models is governed by an observable variable, called the threshold variable. In the case where this threshold variable is a lagged value of the time series, the model is called a self-exciting threshold autoregressive (SETAR) model.

iv. In the smooth transition autoregressive (STAR) model, the switching is governed by an observable threshold variable, similar to the TAR model, but a smooth transition between the two regimes is enforced.
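A minimal SETAR sketch illustrates the regime-switching idea: two AR(1) regimes selected by whether the previous value lies above or below a threshold. The regimes, threshold grid and coefficients below are invented for illustration; the threshold is found by a simple grid search minimizing the residual sum of squares.

```python
import numpy as np

# Simulate a SETAR(2; 1, 1) series with a threshold at zero.
rng = np.random.default_rng(3)
n = 3000
x = np.zeros(n)
for t in range(1, n):
    if x[t - 1] <= 0.0:          # lower regime
        x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.3)
    else:                        # upper regime
        x[t] = -0.4 * x[t - 1] + rng.normal(scale=0.3)

def fit_setar(series, candidate_thresholds):
    """Grid-search the threshold; fit AR(1) per regime by least squares."""
    y, lag = series[1:], series[:-1]
    best = None
    for c in candidate_thresholds:
        low = lag <= c
        sse, coefs = 0.0, []
        for mask in (low, ~low):
            if mask.sum() < 10:     # skip degenerate splits
                break
            X = np.column_stack([np.ones(mask.sum()), lag[mask]])
            b, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
            sse += np.sum((y[mask] - X @ b) ** 2)
            coefs.append(b)
        else:
            if best is None or sse < best[0]:
                best = (sse, c, coefs)
    return best[1], best[2]

thr, regimes = fit_setar(x, np.linspace(-0.5, 0.5, 21))
```

With a clear difference between the regimes, the grid search recovers a threshold near zero and per-regime slopes near 0.8 and -0.4.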
As a few of these non-linear time series models form the basis of the hybrid models proposed in this work, they are explained in detail in Chapter 2.
1.3.2 Computational Intelligence Methods
The deregulated markets and the constant need to improve the accuracy of load forecasting have forced electric utility operators to focus much attention on computational intelligence based forecasting methods. It has been calculated in [9] that a reduction of 1% in forecasting error could save up to $1.6 million annually for a utility.
Computational intelligence techniques broadly fall into four classes – expert
systems, fuzzy logic systems, neural networks and evolutionary computation systems. A
brief introduction to these four approaches is provided.


1.3.2.1 Expert Systems
An expert system is a computer program which simulates the judgment and behavior
of a human or an organization that has expert knowledge and experience in a particular
field. Typically an expert system would comprise four parts: a knowledge base, a data
base, an inference mechanism, and a user interface. For STLF, the knowledge base is
typically a set of rules represented in the IF-THEN form, and can consist of relationships
between the changes in the load demand and changes in factors which affect the use of
electricity. The data base is typically a collection of facts provided by the human experts
after interviewing them, and also facts obtained using the inference mechanism of the
system. The inference mechanism is the “thinking” part of the expert system, because it
makes the logical decisions using the knowledge from the knowledge base and
information from the data base. Forward chaining and backward chaining are two popular
reasoning mechanisms used by the inference mechanism [10].
In terms of advantages, the expert systems can be used to take decisions when the
human experts are unavailable, thus reducing the work burden of human experts. When
human experts retire, their knowledge can still be retained in these systems.

1.3.2.2 Fuzzy Logic Systems
Fuzzy systems are knowledge-based software environments which are constructed
from a collection of linguistic IF-THEN rules, and realize nonlinear mapping which has
interesting mathematical properties of “low-order interpolation” and “universal function
approximation”. These systems facilitate the design of reasoning mechanisms for partially known, nonlinear and complex processes.
A fuzzy logic system comprises four parts – fuzzifier, fuzzy inference engine,
fuzzy rule base and defuzzifier. The system takes the crisp input value, which is then
fuzzified (i.e. converted into corresponding membership grade in the input fuzzy sets),
thereafter it is fed to the fuzzy inference engine. Using the stored IF-THEN fuzzy rules
from the rule base, the inference engine produces a fuzzy output that undergoes further
defuzzification to result in crisp output.
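The fuzzify-infer-defuzzify pipeline can be sketched with a toy two-rule system. The rules, membership functions and numbers are entirely invented, and the defuzzifier is the common weighted-average simplification of centroid defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Toy rule base (entirely invented):
#   IF temperature is LOW  THEN extra load is SMALL (centre 10 MW)
#   IF temperature is HIGH THEN extra load is LARGE (centre 80 MW)
def forecast_extra_load(temp_c):
    # Fuzzification: membership grades in the input fuzzy sets.
    low = tri(temp_c, -10.0, 5.0, 20.0)
    high = tri(temp_c, 15.0, 30.0, 45.0)
    if low + high == 0.0:
        return 0.0
    # Inference + weighted-average defuzzification over rule consequents.
    return (low * 10.0 + high * 80.0) / (low + high)
```

A crisp temperature of 5 degrees fires only the LOW rule and yields 10 MW; intermediate temperatures blend the two consequents smoothly, which is what makes fuzzy systems attractive for partially known processes.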


Fuzzy logic is often combined with other computational intelligence methods such
as expert systems and neural networks.

1.3.2.3 Artificial Neural Networks (ANN)
Artificial neural networks are massively parallel, distributed processing systems
built on the analogy to the human neural network – the fundamental information
processing system. Generally speaking, the practical use of neural networks has been
recognized mainly because of such distinguished features as
i. general nonlinear mapping between a subset of the past time series values and the future time series values

ii. the capability of capturing essential functional relationships among the data, which is valuable when such relationships are not known a priori or are very difficult to describe mathematically and/or when the collected observation data are corrupted by noise

iii. universal function approximation capability that enables modeling of arbitrary nonlinear continuous functions to any degree of accuracy

iv. capability of learning and generalizing from examples using the data-driven self-adaptive approach [11]
In fact, there are several kinds of ANN models. Every neural network model can be classified by its architecture, processing and training. The architecture describes the neural
connections. Processing describes how the network produces output for every input and
weight. The training algorithm describes how the neural network adapts its weight for
every training vector.
The multilayer perceptron (MLP) is one of the most researched network architectures. It is a supervised learning neural architecture, and it has been very popular
for time series prediction in general, and STLF in particular. This is because in its simplest
form, a TSP problem can be rewritten as a supervised learning problem, with the current
and past values of the time series as the input values to the network, and the one-step-ahead value as the output value. This formulation allows one to explore the universal
function approximation and subsequent generalization capability of the MLP. The radial
basis function (RBF) network is another popular supervised learning architecture which
can also be used for the same purposes.
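The rewriting of a TSP problem as a supervised learning problem amounts to a sliding window over the series, for example:

```python
import numpy as np

def make_supervised(series, window):
    """Rewrite a time series as (input, target) pairs for supervised
    learning: each input holds `window` consecutive values, and the
    target is the one-step-ahead value."""
    x = np.asarray(series, dtype=float)
    inputs = np.array([x[t:t + window] for t in range(len(x) - window)])
    targets = x[window:]
    return inputs, targets

X, y = make_supervised([1, 2, 3, 4, 5, 6], window=3)
# X rows: [1,2,3], [2,3,4], [3,4,5];  y: [4, 5, 6]
```

The resulting (X, y) pairs can be fed to any supervised learner, MLP or otherwise; the window length becomes a model-selection choice.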
The Self-Organizing Map (SOM) is an important unsupervised learning neural
architecture, which is based on unsupervised competitive-cooperative learning paradigm.
In contrast to the supervised learning methods, SOM has not been popular for time series
prediction, or STLF. This is mostly because the SOM is traditionally viewed as a data vector quantization and clustering algorithm [12][13], less suitable for function approximation by itself. Hence when used for TSP, the SOM is usually used in a hybrid
model, where the SOM is first used for clustering, and subsequently another function
approximation method such as MLP or support vector regression (SVR) is used to learn
the function.
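A minimal SOM sketch (a one-dimensional map, my own simplified implementation) shows the clustering step of such hybrids: units compete for input vectors, and the winner and its map neighbours move toward each sample.

```python
import numpy as np

def train_som(data, n_units, epochs=30, lr0=0.5, seed=0):
    """Minimal 1-D SOM trained with a shrinking Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialize unit weight vectors from random data samples.
    w = data[rng.choice(len(data), n_units, replace=False)].copy()
    sigma0 = n_units / 2.0
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)                   # decaying learning rate
        sigma = max(sigma0 * (1 - e / epochs), 0.5)   # shrinking neighbourhood
        for s in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(w - s, axis=1))  # best matching unit
            d = np.abs(np.arange(n_units) - bmu)            # distance on the map
            h = np.exp(-(d ** 2) / (2 * sigma ** 2))        # neighbourhood kernel
            w += lr * h[:, None] * (s - w)
    return w

def assign_clusters(data, w):
    """Quantize each sample to its nearest SOM unit (its cluster label)."""
    data = np.asarray(data, dtype=float)
    return np.array([np.argmin(np.linalg.norm(w - s, axis=1)) for s in data])

# Example: two well-separated blobs are quantized by the map.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal([0, 0], 0.1, (50, 2)),
                  rng.normal([5, 5], 0.1, (50, 2))])
w = train_som(data, n_units=4)
labels = assign_clusters(data, w)
```

In the hybrid schemes described above, a separate regressor (MLP, SVR, or a local linear model) would then be fitted to the samples of each cluster label.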
As the MLP and SOM form the basis of the work proposed in this thesis, they are
reviewed in greater detail in Chapter 3.

1.3.2.4 Evolutionary Approach
The algorithms developed under the common term of evolutionary computation are
inspired by the study of the evolutionary behavior of biological processes. They are mainly based on maintaining a population of candidate solutions to a given problem. Through stepwise processing of the population using evolutionary operators, such as crossover, recombination, selection and mutation, the fitness of the population steadily improves.
Consider how a genetic algorithm might be applied to load forecasting. First an
appropriate model (either linear or nonlinear) is selected and an initial population of
candidate solutions is created. A candidate solution is produced by randomly choosing a
set of parameter values for the selected forecasting model. Each solution is then ranked
based on its prediction error over a set of training data. A new population of solutions is
generated by selecting fitter solutions and applying a crossover or mutation operation.


New populations are created until the fittest solution has a sufficiently small prediction
error or repeated generations produce no reduction of error.
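The steps above can be sketched as a toy genetic search for the parameters of a simple forecasting model. The model, operators and settings below are illustrative choices, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Training series from a known AR(1)-style process (toy setup, my own).
series = np.zeros(500)
for t in range(1, 500):
    series[t] = 2.0 + 0.7 * series[t - 1] + rng.normal(scale=0.2)

def prediction_error(params):
    """Fitness: one-step-ahead mean squared error on the training data."""
    c, a = params
    pred = c + a * series[:-1]
    return np.mean((series[1:] - pred) ** 2)

def genetic_search(pop_size=40, generations=60):
    # Initial population: random candidate (intercept, slope) pairs.
    pop = rng.uniform(-2, 4, size=(pop_size, 2))
    for _ in range(generations):
        # Rank by fitness (lower prediction error is better).
        pop = pop[np.argsort([prediction_error(p) for p in pop])]
        parents = pop[: pop_size // 2]          # selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.choice(len(parents), 2, replace=False)
            child = np.where(rng.random(2) < 0.5, parents[i], parents[j])  # crossover
            child = child + rng.normal(scale=0.05, size=2)                 # mutation
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmin([prediction_error(p) for p in pop])]

best = genetic_search()
```

For this smooth two-parameter problem, least squares would of course be faster; the genetic search becomes attractive when the forecasting model is nonlinear or the error surface is not differentiable.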
1.3.3 Hybrid Approaches
Hybrid models have been proposed to overcome the inadequacies of using an
individual model, either a statistical method or a computational intelligence method. Also
referred to as ensemble methods or combined methods, these models usually are
employed to improve the prediction accuracy. There is still an absence of good theory in this field on how to proceed with hybridization, though trends are emerging. Broadly speaking, hybrid methods can be implemented in three different ways: combining linear models, combining nonlinear models, or combining linear and nonlinear models.
In linear hybridization, two or more linear statistical models are combined together.
Though some work has been done in this field, as discussed in the section on literature review, this direction did not really pick up, because a linear hybrid model still suffers from many of the problems of its linear components.
The most heavily researched hybrid models would be those involving two nonlinear
models, especially two computational intelligence models. This is because the three
popular CI models - ANNs, fuzzy logic and evolutionary computation have their own
capabilities and restrictions, which are usually complementary to each other. For example, the black-box modeling approach of neural networks might be well suited for process modeling or for intelligent control, but not as suitable for decision control. Similarly the
fuzzy logic systems can easily handle imprecise data and explain their decisions in the
context of the available facts in linguistic form; however they cannot automatically
acquire the linguistic rules to make these decisions. It is these capabilities and restrictions
of individual intelligent technologies which have driven their fusion to create hybrid
intelligent systems which have been successfully applied for various complex problems,
including STLF.
The third class of hybrid models, which this thesis is about, involve one statistical
method and one computational intelligence method. Usually the CI method is a neural
network, chosen for its flexibility and powerful pattern recognition capabilities. But
when developed as a predictive model, neural networks become difficult to interpret due
to their black-box nature and it becomes hard to test the parameters for their statistical
significance. Hence, time series models, linear ones such as ARMA or ARIMA, or
nonlinear ones such as STAR are introduced in the hybrid model to handle the concern of
interpretation.


1.4 Motivation
Though a comfortable state of performance has been achieved for electricity load forecasting, market players will always bring in new dynamic bidding strategies, which, coupled with price-dependent load, shall introduce new variability and non-stationarity in the electricity load demand series. Besides, the stricter power quality
requirements and development of distributed energy resources are other reasons why the
modern power system will always require more advanced and more accurate load
forecasting tools.
Consider why a SOM based hybrid model is an appealing option. Though every
possible approach has been applied for STLF, the more popular ones are the time series
approaches and computational intelligence approaches of feed-forward neural networks.
An extensive literature review is done in Section 1.6. Both these approaches attempt to
build a single global model to describe the load dynamics. The difference between time
series approaches and supervised learning neural networks is that while time series
approaches build an exact model of the dynamics (“hard computing”), the supervised
learning neural networks allow some tolerance for imprecision and uncertainty to achieve
tractability and robustness (“soft computing”). However, there is an exciting alternative to
building a global model, which is to build local models for the series dynamics, where
each local model handles a smaller section of the series dynamics. This is definitely an
area which needs further study, because a time series such as load demand series shows
various stylized facts, discussed further in Chapter 4. The complexity of a global model
increases a lot if it is to handle all the stylized facts. Working with multiple local models
might bring down the complexity. On the other hand, the challenges faced in working with
local models are manifold. Firstly, what factors should decide the division of the series dynamics into local models? Secondly, how do we combine the results from multiple local
models to give the final prediction value?
In this thesis, SOM based hybrid models are proposed to explore the above-mentioned idea of local models. As mentioned earlier, SOMs have traditionally been less applied to STLF, which mostly has to do with the prevalent attitude among researchers
that SOMs are an unsupervised learning method, suitable only for data vector quantization
and clustering [12][13]. But this same property of clustering makes SOMs an excellent
tool for building local models.
Another motivation for this thesis is to further explore the idea of transitions
between local models. Once the local models have been built, how does the transition
from one model to another take place? Is it a sudden jump, where a local model M1 was
being used to describe the series on a particular day and a different local model M2 is
being used for the next day? After analyzing the electricity load demand series, it was
found that regimes were present in the series, due to season effects and market effects, and
the transition between these regimes was smooth. A sudden jump from one local model to another local model might therefore not be the best approach. Hence this thesis studies the NCSTAR model in Chapter 6, which allows smooth transitions between local models. The
idea is to be able to obtain a highly accurate learning and prediction not only for test
samples which clearly belong to a particular local model, but also for test samples which
represent the transition from one local model to another local model.
Earlier researchers have proposed working with local models for STLF in different
ways, and to attain different aims. For example, in [14], the same wavelet-based neural network is trained four times over different periods of the year to handle the four seasons. But this paper does not consider the transitions between the local models, i.e. the seasons, to be smooth. Not much work has been done on enforcing smooth transitions between regimes or
local models for STLF. After extensive literature review (please see Section 1.6), the only
paper which was found to be handling smooth transition between local models for
electricity load forecasting is [30]. So definitely more study has to be done on how to
identify local models, how to implement smooth transitions between local models, and
how introducing the smooth transition will affect the prediction accuracy of the overall
model for STLF. This is exactly what this thesis sets out to do.


1.5 Contribution of the Thesis
In this work, two SOM based hybrid models are proposed for STLF.
In the first model, a load forecasting technique is proposed which uses a weighted
SOM for splitting the past historical data into clusters. For the standard SOM, all the
inputs to the neural network are equally weighted. This is a drawback compared to other
supervised learning methods which have procedures to adjust their network weights, e.g.
back-propagation method for MLPs and pseudo-inverse method for RBFs. Hence, a
strategy is proposed which weights the inputs according to their correlation with the output. Once the training with the weighted SOM is complete, the time series has been divided into smaller clusters, one cluster for each neuron. Next, a local linear model is built for each of these clusters using an autoregressive model, which helps to smooth the results.
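One plausible reading of this correlation-based input weighting is sketched below; the normalization and weighted distance form are my own assumptions, not necessarily the thesis's exact formulation.

```python
import numpy as np

def correlation_weights(inputs, targets):
    """Weight each input dimension by the absolute value of its correlation
    with the output, normalized so the weights sum to the number of inputs
    (an equally weighted SOM corresponds to all weights equal to 1)."""
    corr = np.array([np.corrcoef(inputs[:, j], targets)[0, 1]
                     for j in range(inputs.shape[1])])
    w = np.abs(corr)
    return w * len(w) / w.sum()

def weighted_distance(a, b, w):
    """Weighted Euclidean distance, used in place of the plain distance
    when matching an input vector against a SOM unit's weight vector."""
    return np.sqrt(np.sum(w * (a - b) ** 2))

# Toy data: the target depends strongly on input 0 and barely on input 1,
# so input 0 should receive the larger weight.
rng = np.random.default_rng(5)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=500)
w = correlation_weights(X, y)
```

The effect is that the SOM's competition is dominated by the inputs most predictive of the output, so clusters align better with the forecasting task.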
In the second hybrid model, the aim is to allow for smooth transitions between the
local models. Here the model of interest is a linear model with time varying coefficients
which are the outputs of a single hidden layer feedforward neural network. The hidden
layer is responsible for partitioning the input space into multiple sub-spaces through
multivariate thresholds and smooth transition between the sub-spaces. Significant research
has already been done into the specification, estimation and evaluation of this model. In
this thesis, a new SOM-based method is proposed to smartly initialize the weights of the
hidden layer before the network training. First, a SOM network is applied to split the
historical data dynamics into clusters. Then the Ho-Kashyap algorithm is used to obtain
the equations of the hyperplanes separating the clusters. These hyperplanes' equations are
then used to smartly initialize the weights and biases of the hidden layer of the network.
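The classical Ho-Kashyap procedure referred to here can be sketched as follows; this is a generic version for two linearly separable clusters, not the thesis's exact implementation.

```python
import numpy as np

def ho_kashyap(class_a, class_b, lr=0.5, iters=500):
    """Ho-Kashyap procedure for a separating hyperplane.

    Finds weights a = (w, bias) with w.x + bias > 0 for class_a and
    < 0 for class_b, when the classes are linearly separable.
    """
    # Augment with a constant 1 and negate one class, so separability
    # becomes the single condition Y @ a > 0 for every row.
    Ya = np.column_stack([class_a, np.ones(len(class_a))])
    Yb = -np.column_stack([class_b, np.ones(len(class_b))])
    Y = np.vstack([Ya, Yb])
    Yp = np.linalg.pinv(Y)
    b = np.ones(len(Y))                # target margins, kept positive
    a = Yp @ b
    for _ in range(iters):
        e = Y @ a - b
        b = b + lr * (e + np.abs(e))   # raise b only where the error is positive
        a = Yp @ b                     # least-squares solve for the weights
        if np.all(Y @ a > 0):          # all samples on the correct side
            break
    return a

# Example: two linearly separable point clouds.
rng = np.random.default_rng(6)
A = rng.normal([2, 2], 0.3, (40, 2))
B = rng.normal([-2, -2], 0.3, (40, 2))
a = ho_kashyap(A, B)
```

The returned hyperplane coefficients are exactly the kind of quantity that can seed the thresholds of a hidden layer, so that training starts from a partition of the input space already consistent with the SOM clusters.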

1.6 Literature Survey
The two approaches to STLF, and TSP in general – statistical methods and CI methods – have already been discussed above, and their different sub-categories have been
introduced. Some of the approaches described, such as non-linear time series models,
SOMs, and MLPs are more relevant to the work in this thesis than other models. What
follows next is a bibliographical survey for methods in STLF, with more emphasis given
to methods relevant to work done in this thesis.
1.6.1 Statistical Methods
In the field of linear approaches to time series, the Box-Jenkins methodology is the most
popular approach to handling ARMA and ARIMA models, and consists of model
identification and selection, parameter estimation and model checking. Box-Jenkins
methodology is among the oldest methods applied to STLF. It was proposed in [15], and
further developed in [16]. With a more modern perspective, [17] is an influential text on
nonlinear time series models, including several of those described in Section 1.3.1.4.
ARMA and ARIMA continue to be very popular for STLF.
In [18], the load demand is modeled as the sum of the two terms, the first term
depending on the time of day and the normal weather pattern for that day, and the second
term being the residual term which models the random disturbances using an ARMA
model. Usually the Box-Jenkins models assume Gaussian noise. In [19], the ARMA-modeling method proposed allows for non-Gaussian noise as well. Other works which use the Box-Jenkins method for STLF are [20][21].
In [22], a periodic autoregression model is used to develop 24 seasonal equations,
using the last 48 load values within each equation. The motivation is that by following a
seasonal-modeling approach, it is possible to incorporate a priori information concerning
the seasonalities at several levels (daily, weekly, yearly, etc.) by appropriately choosing
the model structure and estimation method. In [23], an ARMAX model is proposed for
STLF, where the X represents an exogenous variable, temperature in this case. Actually this is a hybrid model, as it uses a computational intelligence method, particle swarm optimization, to determine the order of the model as well as its coefficients instead of the traditional Box-Jenkins approach.

An ARIMA model uses differencing to handle the non-stationarity of the series, and
then uses ARMA to handle the resulting stationary series. In [24], six methods are
compared for STLF, and ARIMA is found to be a suitable benchmark. In [25], a modified
ARIMA model is proposed. Basically this model not only takes past loads as input, but
also the estimates of past loads provided by human experts. Thus this model, in a sense,
incorporates the knowledge of experienced human operators. This method is shown to be
superior to both ANN and ARIMA.
Now consider the previous work in STLF on regime-switching models, i.e. the non-linear statistical time series models discussed earlier in Section 1.3.1.4. The threshold autoregressive (TAR) model was proposed by [7] and [8]. In [26], a TAR model with multiple thresholds is developed for load forecasting. The optimum number of thresholds is chosen as the one which minimizes the sum of threshold variances.
A generalization of the TAR model is the smooth transition autoregressive (STAR)
model, which was initially proposed in [27], and further developed in [28] and [29]. A
modified STAR model for load forecasting is proposed in [30] where temperature plays
the role of threshold variable. This method uses periodic autoregressive models to
represent the linear regime, as they better capture the fact that the autocorrelation at a
particular lag of one half-hour varies across the week. Such switching regime models have
also been proposed for electricity price forecasting [31] [32].
1.6.2 Computational Intelligence Methods
Four CI methods were introduced earlier in Section 1.3.2, but the following
literature review focuses mostly on neural networks, as these are the most popular
amongst the four for STLF, and also the most relevant to the work done in this thesis.
There are several kinds of ANN models, classified by their architecture, processing
and training. For STLF, the popular ones have been used, e.g. radial basis function
networks [33][34], self-organizing maps [35] and recurrent neural networks [36][37].
However the most popular network architecture is the multi-layer perceptron described in
Section 1.3.2.3, as its structure lends itself naturally to unknown function approximation. In [38], a fully connected three-layer feedforward ANN is implemented with the backpropagation learning rule, the input variables being historical hourly load data,
day of the week and temperature. In [3], a multi-layered feedforward ANN is developed
which takes three types of variables as inputs - season related inputs, weather related
inputs, and historical loads. In [39], electricity price is also considered as a main
characteristic of the load. Other recent work involving the MLP for STLF includes [40][41][34][42].
In [43], in order to reduce the neural network structure and learning time, a one-hour-ahead load forecasting method is proposed which uses the correction of similar day
data. In this proposed prediction method, the forecasted load power is obtained by adding
a correction to the selected similar day data. In [44], weather ensemble predictions are
used for STLF. A weather ensemble prediction consists of multiple scenarios for a weather variable. These scenarios are used to produce multiple scenarios for load
forecasts. In [45], the network committee technique, an ensemble approach from the neural network literature, is applied to improve the accuracy of forecasting the next-day peak load.
1.6.3 Hybrid Methods
Hybrid models combining statistical models and neural networks are rare for STLF,
though they have been proposed for other TSP fields. In [46], a hybrid ARIMA/ANN
model is proposed. Because of the complexity of a moving trend as well as a cyclic
seasonal variation, an adaptive ARIMA model is first used to forecast the monthly load
and then the forecast load of the ARIMA model is used as an additive input to the ANN.
The prediction accuracy of this approach is shown to be better than traditional methods of
time series models and regression methods. In [47], a recurrent neural network is trained
by features extracted from ARIMA analyses, and used for predicting the mid-term price
trend of the Taiwan stock exchange weighted index. In [48], again an ARIMA model and
neural network model are combined to forecast time series of reliability data with growth
trend, and the results are shown to be better than either of the component models. In [49],
seasonal ARIMA (SARIMA) model and the neural network MLP are combined to
forecast time series with seasonality.
It was mentioned earlier in Section 1.3.2.3 that a neural network can be
implemented for both, supervised as well as unsupervised learning. But unsupervised
learning architectures, such as SOMs have traditionally been used for data vector
quantization and clustering. Hence when used for TSP, the SOM is usually used in a
hybrid model, where the SOM is first used for clustering, and subsequently another
function approximation method such as MLP or support vector regression (SVR) is used
to learn the function. In [50][51][52], a two-stage adaptive hybrid network is proposed. In
the first stage, a SOM network is applied to cluster the input data into several subsets in an
unsupervised manner. In the next stage, support vector machines (SVMs) are used to fit

the training data of each subset in a supervised manner. In [53], profiling is done through
SOMs, followed by prediction through radial basis function networks. In [54], the first SOM
module is used to forecast normal and abnormal days, and the second MLP module makes
the load model sensitive to weather factors such as temperature.
As was mentioned in Section 1.3.3, the most heavily researched hybrid models for
TSP in general involve those where both the component models are computational
intelligence methods. In [55], a real-time pricing type scenario is envisioned where energy
prices change on an hourly basis, and the consumer is able to react to those price signals
by changing his load demand. In [56], attention is paid to special days. An ANN provides
the forecast scaled load curve and fuzzy inference models give the forecast maximum and
minimum loads of the special day. Similarly, significant work has also been done on
hybridizing evolutionary algorithms with neural networks. In [57], a genetic algorithm is
used to tune the parameters of a neural network which is used for STLF. A similar
approach is presented in [58]. In [59], a fuzzy neural network is combined with a chaos-search genetic algorithm and simulated annealing, and is found to be able to exploit all the
original methods' advantages. Similarly, particle swarm optimization is a recent CI
approach which has been hybridized with other CI approaches such as neural networks
[60][61] and support vector machines [62] to successfully improve the prediction accuracy
for STLF.

1.7 Structure of the Thesis
The thesis consists of the following chapters.

In this first chapter, short term load forecasting was introduced. The two approaches
to short term load forecasting, statistical approach and computational intelligence based
approach, were introduced, and their hybrid methods were discussed. Relevant work from
past research was presented. Finally the motivation for this thesis, and its contributions
were presented.

In the second chapter, statistical methods for time series analysis are briefly
discussed. These include the more traditional Box-Jenkins methodology, Holt-Winters
exponential smoothing, and the more recent regime-switching models.
In the third chapter, two popular neural network models, multilayer perceptron for
supervised learning and self-organizing maps for unsupervised learning are described. The
architecture, the learning rule and relevant issues are presented.
In the fourth chapter, the stylized facts of the load demand series are presented. It is
necessary to understand the unique properties of the load demand series before any
attempt is made to model them.
In the fifth chapter, the first hybrid model is presented. First it is explained how an
unsupervised model such as a self-organizing map can be used for time series prediction.
Then the hybrid model, involving autocorrelation weighted input to the self-organizing
map and autoregressive model is explained, along with the motivation for weighting with
autocorrelation coefficients.
In the sixth chapter, the second hybrid model is proposed to overcome certain issues
with the first proposed model. The need for smooth transitions between regimes in the
load series is highlighted. The contribution of this thesis, a novel method to smartly
initialize the weights of the hidden layer of the neural network model NCSTAR is
presented.
The final chapter concludes this thesis with some directions for future work.

Chapter 2 Statistical Models for Time Series Analysis
In this chapter, the classical tools for time series prediction are reviewed, and recent developments in
nonlinear modeling are detailed. First, the commonly used Box-Jenkins approach to time series analysis is
described. Then, another commonly used classical method, the Holt-Winters exponential smoothing
procedure is explained. Finally, an overview of the more recent regime-switching models is given.


2.1 Box-Jenkins Methodology
ARMA models, as described by the Box-Jenkins methodology, are a very rich class
of possible models. The assumptions for this class of models are (a) the series is stationary
or can be transformed to one using a simple transformation such as differencing (b) the
series follows a linear model.
The original Box-Jenkins modeling procedure involves an iterative three-stage
procedure of model identification, model estimation and model validation. Later work
[63] includes a preliminary stage for data preparation and a final stage for forecasting.


Data preparation can involve several sub-steps. If the variance of the series
changes with the level, then a transformation of the data, such as logarithms, might
be necessary to make it a homoscedastic (constant variance) series. Similarly, it
needs to be determined if the series is stationary, and if there is any significant
seasonality which needs to be modeled. The differencing approach can be used both to
achieve stationarity and to remove seasonality.



Model identification involves identifying the order of the autoregressive and
moving average terms to obtain a good fit to the data. Several graph based
approaches exist, which include the autocorrelation function and partial
autocorrelation function approaches, and new model selection tools such as
Akaike’s Information Criterion have been developed.



Model estimation involves finding the value of model coefficients in order to
obtain a good fit on the data. The main approaches are non-linear least squares and
maximum likelihood estimation.






Model validation involves testing the residuals. As the Box-Jenkins models
assume that the error term should follow a stationary univariate process, the
residuals should have nearly the properties of i.i.d. normal random variables. If the
assumptions are not satisfied, then a more appropriate model needs to be found.
The residual analysis should hopefully provide some clues on how to develop a
more appropriate model.

2.1.1 AR Model
An autoregressive model of order p ≥ 1 is defined as
Xt = b1 Xt-1 + … + bp Xt-p + εt        (2.1)
where {εt} ~ N(0, σ2), also known as white noise. This model can be written as an AR(p)
process. The equation explicitly specifies the linear relationship between the current value
and its past values.
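As an illustrative sketch (not part of the original text), an AR(2) process can be simulated and its coefficients recovered by ordinary least squares; the particular coefficient values and the use of NumPy here are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary AR(2) process: X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + eps_t
n, b1, b2 = 2000, 0.5, 0.3
x = np.zeros(n)
eps = rng.normal(0.0, 1.0, n)
for t in range(2, n):
    x[t] = b1 * x[t - 1] + b2 * x[t - 2] + eps[t]

# Recover (b1, b2) by regressing X_t on its two lagged values
X = np.column_stack([x[1:-1], x[:-2]])  # columns: X_{t-1}, X_{t-2}
y = x[2:]
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With 2000 observations, the least-squares estimates should lie close to the true coefficients (0.5, 0.3), illustrating the linear dependence on past values stated in Equation 2.1.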
2.1.2 MA Model
A moving average model of order q ≥ 1 is defined as
Xt = εt + a1 εt-1 + … + aq εt-q        (2.2)
where {εt} ~ N(0, σ2), or white noise. This model can be written as an MA(q) process. For
h ≤ q, there is a correlation between Xt and Xt-h due to the fact that they depend on the
same error terms εt-j.
2.1.3 ARMA Model
Combining the AR and MA forms together gives the popular autoregressive moving
average ARMA model, which can be defined as
Xt = b1 Xt-1 + … + bp Xt-p + εt + a1 εt-1 + … + aq εt-q        (2.3)
where {εt} ~ N(0, σ2), or white noise, and (p,q) are the order of the models. ARMA
models are a popular choice for approximating various stationary processes.



2.1.4 ARIMA Model
An autoregressive integrated moving average ARIMA model is a generalization of
an ARMA model. A time series which needs to be differenced to be made stationary is
said to be an “integrated” version of a stationary series. So an ARIMA(p,d,q) process is
one where the series needs to be differenced d times to obtain an ARMA(p,q) process.
This model, as mentioned in Section 1.6.1, continues to be popular for STLF, and has been
used as a benchmark in this work.
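The “integrated” idea can be illustrated with a minimal sketch (an assumption of this edit, not taken from the thesis): a random walk is an ARIMA(0,1,0) process, and differencing it once recovers a stationary white-noise series.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random walk (cumulative sum of white noise) is ARIMA(0,1,0)
eps = rng.normal(0.0, 1.0, 5000)
walk = np.cumsum(eps)    # integrated, non-stationary series
diff = np.diff(walk)     # one round of differencing (d = 1)

# The differenced series is exactly the original noise (shifted by one),
# and its lag-1 autocorrelation is near zero, as expected of white noise
lag1 = np.corrcoef(diff[:-1], diff[1:])[0, 1]
```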

2.2 Holt Winters Exponential Smoothing Method
2.2.1 Introduction
Exponential smoothing is a procedure where the forecast is continuously revised in
the light of more recent experience. This method assigns exponentially decreasing weights
as the observation gets older. The method consists of three steps: deciding on the model to
use and setting initial values of model parameters, updating the estimates of model
parameters, and finally forecasting.
Single exponential smoothing, used for short-range smoothing, assumes that the data

fluctuates around a reasonably stable mean (no trend or seasonality). Double exponential
smoothing method is used when the data shows a trend. Finally, the method which is most
interesting for this thesis, triple exponential smoothing, also called Holt-Winters
smoothing, can handle both trend and seasonality.
There are two main Holt-Winters smoothing models, depending on the type of
seasonality – multiplicative seasonal model and additive seasonal model. The difference
between the two is that in the multiplicative case, the size of the seasonal fluctuations
varies, depending on the overall level of the series, whereas in the additive case, the series
shows steady seasonal fluctuations. So an additive seasonal model is appropriate for a
time series when the amplitude of the seasonal pattern is independent of the average level
of the series.



2.2.2 Model Set-up
Consider the case when the series exhibits additive seasonality. In this model, the
assumption is that the time series can be represented by the model
yt = b1 + b2 t + St + εt        (2.4)

where
b1 is the base signal called the permanent component
b2 is a linear trend component, which may be deleted if necessary
St is an additive seasonal factor, such that for a season length of L periods the seasonal
factors sum to zero over one season: S1 + S2 + … + SL = 0
εt is the random error component
2.2.3 Notation Used for the Updating Process
Let the current deseasonalized level of the process at the end of period T be denoted
by RT. At the end of a time period t, let
Rt be the estimate of the deseasonalized level,
Gt be the estimate of the trend, and
St be the estimate of the seasonal component.

2.2.4 Procedure for Updating the Estimates of Model Parameters

2.2.4.1 Overall smoothing
Rt = α (yt - St-L) + (1 – α) (Rt-1 + Gt-1)        (2.5)
where 0 < α < 1 is a smoothing constant.
St-L is the seasonal factor for period t computed one season (L periods) ago.

Subtracting St-L from yt deseasonalizes the data so that only the trend component and the
prior value of the permanent component enter into the updating process for Rt.


2.2.4.2 Smoothing of the trend factor
Gt = β (Rt - Rt-1) + (1 – β) Gt-1        (2.6)
where 0 < β < 1 is another smoothing constant.
The estimate of the trend component is simply the smoothed difference between two
successive estimates of the deseasonalized level.

2.2.4.3 Smoothing of the seasonal component
St = γ (yt - Rt) + (1 – γ) St-L        (2.7)
where 0 < γ < 1 is the third smoothing constant.
The estimate of the seasonal component is a combination of the most recently
observed seasonal factor given by the demand yt after removing the deseasonalized series
level estimate Rt and the previous best seasonal factor estimate for this time period.
All the parameters in the method, α, β, and γ are estimated by minimizing the sum of
squared one step-ahead in-sample errors. The initial smoothed values for the level, trend
and seasonal components are estimated by averaging the early observations.

2.2.4.4 Value of forecast
The forecast for the next period is given by:
ŷt = Rt-1 + Gt-1 + St-L        (2.8)

Note that the best estimate of the seasonal factor for this time period in the season is
used, which was last updated L periods ago.
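The update and forecast equations 2.5-2.8 can be collected into a short sketch; the simple first-season initialization and the fixed smoothing constants below are assumptions for illustration (in practice α, β, and γ are estimated by minimizing in-sample error, as described above).

```python
import numpy as np

def holt_winters_additive(y, L, alpha=0.3, beta=0.1, gamma=0.2):
    """One-step-ahead forecasts via additive Holt-Winters (Eqs. 2.5-2.8).

    Initialization is a simple heuristic based on the first two seasons;
    production code would also estimate the smoothing constants.
    """
    y = np.asarray(y, dtype=float)
    # Initialize level, trend, and seasonal factors from the early observations
    R = y[:L].mean()
    G = (y[L:2 * L].mean() - y[:L].mean()) / L
    S = list(y[:L] - y[:L].mean())
    forecasts = []
    for t in range(len(y)):
        forecasts.append(R + G + S[t % L])                          # Eq. 2.8
        R_new = alpha * (y[t] - S[t % L]) + (1 - alpha) * (R + G)   # Eq. 2.5
        G = beta * (R_new - R) + (1 - beta) * G                     # Eq. 2.6
        S[t % L] = gamma * (y[t] - R_new) + (1 - gamma) * S[t % L]  # Eq. 2.7
        R = R_new
    return np.array(forecasts)
```

Applied to a series with a linear trend and steady additive seasonality, the one-step-ahead forecasts track the series closely after a short burn-in period.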
2.2.5 Exponential smoothing for double seasonality
When dealing with daily load forecasting, the series shows only one significant
seasonality, which is the within-week cycle. Hence the above proposed method can be
satisfactorily applied in that scenario.




But when concerned with hourly load forecasting, there are two seasonalities, the
within-day cycle and the within-week cycle. To handle this double seasonality scenario,
[64] proposes an extension of the classical seasonal Holt-Winters smoothing method.
Using a new formulation where St and Tt denote the smoothed level and trend, Dt and Wt
are seasonal indices (intra-day and intra-week), s1 and s2 are the seasonal periodicity
lengths for intra-day and intra-week periods respectively, α, γ, δ, and ω are the smoothing
parameters, and ŷt(k) is the k-step-ahead forecast made from forecast origin t, then:
St = α yt / (Dt-s1 Wt-s2) + (1 – α) (St-1 + Tt-1)        (2.9a)

Tt = γ (St - St-1) + (1 – γ) Tt-1        (2.9b)

Dt = δ yt / (St Wt-s2) + (1 – δ) Dt-s1        (2.9c)

Wt = ω yt / (St Dt-s1) + (1 – ω) Wt-s2        (2.9d)

ŷt(k) = (St + k Tt) Dt-s1+k Wt-s2+k + Φ^k (yt – (St-1 + Tt-1) Dt-s1 Wt-s2)        (2.9e)

The multiplicative seasonality formulation has been used here, though it is
mentioned in [64] that the additive seasonality gives similar results. The term
involving Φ is an adjustment for first-order autocorrelation.
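As a sketch of how the recursions 2.9a-e can be implemented (the initial values and smoothing parameters below are illustrative assumptions, not those of [64]), the seasonal indices can be stored in circular buffers of lengths s1 and s2:

```python
import numpy as np

def double_seasonal_hw(y, s1, s2, alpha=0.1, gamma=0.01,
                       delta=0.2, omega=0.3, phi=0.5):
    """One-step-ahead forecasts via the double seasonal recursions (2.9a-e)."""
    y = np.asarray(y, dtype=float)
    S, T, err = y[0], 0.0, 0.0     # level, trend, last one-step error
    D = np.ones(s1)                # intra-day seasonal indices
    W = np.ones(s2)                # intra-week seasonal indices
    forecasts = np.empty(len(y))
    for t in range(len(y)):
        d, w = D[t % s1], W[t % s2]                # D_{t-s1}, W_{t-s2}
        forecasts[t] = (S + T) * d * w + phi * err        # Eq. 2.9e, k = 1
        err = y[t] - (S + T) * d * w
        S_new = alpha * y[t] / (d * w) + (1 - alpha) * (S + T)    # Eq. 2.9a
        T = gamma * (S_new - S) + (1 - gamma) * T                 # Eq. 2.9b
        D[t % s1] = delta * y[t] / (S_new * w) + (1 - delta) * d  # Eq. 2.9c
        W[t % s2] = omega * y[t] / (S_new * d) + (1 - omega) * w  # Eq. 2.9d
        S = S_new
    return forecasts
```

For hourly load data, the natural choices are s1 = 24 (within-day cycle) and s2 = 168 (within-week cycle).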
In [65], a comparison of several univariate methods for STLF is presented. Besides
the exponential smoothing for double seasonality described above, the other methods
compared are double seasonal ARIMA model, artificial neural network, and a regression
method with principal component analysis. It is reported that in terms of mean absolute
percentage error (MAPE), the best approach is double seasonal exponential smoothing.
Hence in this thesis, the standard Holt-Winters exponential smoothing has been used as a
benchmark for daily load forecasting, and the double seasonal exponential smoothing as
proposed in [64] has been used as a benchmark for hourly load forecasting.



2.3 Nonlinear Models
Regime-switching models were earlier mentioned briefly in Section 1.3.1.4.
Nonlinear models can prove to be better in terms of estimation and forecasting compared
to linear models because of their flexibility in capturing the characteristics of the data. In
this thesis, only the threshold models will be considered.
In order to keep clarity within the various threshold models, a homogeneous
notation for all the models is described here.
Henceforth the following notation will be used:




yt is the value of a time series {yt} at time t;
x̃t ∈ ℜp is a p × 1 vector of lagged values of yt and/or some exogenous variables;
xt ∈ ℜp+1 is defined as xt = [1, x̃tT]T, where the first element is referred to as an
intercept.

The general nonlinear model is then expressed as
yt = Φ(xt ; ψ) + εt        (2.10)

where Φ(xt ; ψ) is a nonlinear function of the variable xt with parameter ψ, and {εt}
is a sequence of independently normally distributed random variables with zero
mean and variance σ2.



The logistic function which is used later on, when defined over the domain ℜp, is
usually written as
f(γ(xt - β)) = 1 / (1 + exp(-γ (xt - β)))        (2.11a)

where γ, or the slope parameter, determines the smoothness of the change between
models, i.e. the smoothness of the transition from one regime to another, and β can
be considered as the threshold which marks the regime switch. In its one-dimensional
form, it can be written as
f(γ(yt-d - c)) = 1 / (1 + exp(-γ (yt-d - c)))        (2.11b)

where yt-d is usually known as the transition or threshold variable, and d is called
the delay parameter.
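A small numerical sketch (with purely illustrative parameter values) shows how the slope parameter γ controls the transition of Equation 2.11b between the two regimes:

```python
import math

def logistic_transition(y_lag, c, gamma):
    # Eq. 2.11b: values near 0 favor one regime, values near 1 the other
    return 1.0 / (1.0 + math.exp(-gamma * (y_lag - c)))

# At the threshold c the function is always exactly halfway between regimes
mid = logistic_transition(5.0, c=5.0, gamma=2.0)

# A large gamma approximates an abrupt step function,
# while a small gamma blends the regimes gradually
steep = logistic_transition(5.5, c=5.0, gamma=100.0)   # close to 1
gentle = logistic_transition(5.5, c=5.0, gamma=0.5)    # just above 0.5
```

In the limit γ → ∞ the logistic function becomes the indicator (step) function of the TAR model described next.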

2.3.1 Threshold Autoregressive Model (TAR)
To overcome the limitations of the linear approach, a threshold autoregressive (TAR)
model was proposed, which allows for a locally linear approximation over a number of regimes,
and it can be formulated as
yt = ∑i=1…k ωi xt I(st ∈ Ai) + εt        (2.12a)

   = ∑i=1…k {ωi,0 + ωi,1 yt-1 + ωi,2 yt-2 + … + ωi,p yt-p} I(st ∈ Ai) + εt        (2.12b)

where st is the threshold variable, I is an indicator (or step) function, ωi are the
autoregressive parameters for the ith linear regime, and {Ai} forms a partition of (-∞, ∞)
with ∪i=1…k Ai = (-∞, ∞) and Ai ∩ Aj = ∅ for all i ≠ j. So basically one of the
autoregressive models is activated, depending upon the value of the threshold variable st
relative to the partitions {Ai}.
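A two-regime special case of Equation 2.12b, with st = yt-1 and threshold zero, can be simulated and each regime's autoregressive coefficient estimated separately once the threshold is known; the particular coefficients below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-regime TAR: y_t =  0.6 y_{t-1} + eps_t  if y_{t-1} <= 0
#                 y_t = -0.4 y_{t-1} + eps_t  if y_{t-1} >  0
n = 5000
y = np.zeros(n)
eps = rng.normal(0.0, 1.0, n)
for t in range(1, n):
    if y[t - 1] <= 0:                       # I(s_t in A_1)
        y[t] = 0.6 * y[t - 1] + eps[t]
    else:                                   # I(s_t in A_2)
        y[t] = -0.4 * y[t - 1] + eps[t]

# With the threshold known, each regime reduces to a linear AR(1) fit
mask = y[:-1] <= 0
b_low = np.sum(y[1:][mask] * y[:-1][mask]) / np.sum(y[:-1][mask] ** 2)
b_high = np.sum(y[1:][~mask] * y[:-1][~mask]) / np.sum(y[:-1][~mask] ** 2)
```

This illustrates the locally linear nature of the model: conditional on the regime, estimation is no harder than for an ordinary AR process.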

2.3.2 Smooth Transition Autoregressive Model (STAR)
If one has good reason to believe that the transitions between the regimes are
smooth, and not discontinuous as assumed by TAR model, then one can choose the
smooth transition autoregressive (STAR) model. In this model, the indicator function I(.)
changes from a step function to a smooth function, such as the sigmoid function, as in
Equation 2.11b. This STAR model with k regimes is defined as
yt = ∑i=1…k ωi xt Fi(st; γi, ci) + εt        (2.13)