Contents

PREFACE
ACKNOWLEDGMENTS

CHAPTER 1  INTRODUCTION
    1.1  Why Neural Networks, and Why Now?
    1.2  What Is a Neural Net?
         1.2.1  Artificial Neural Networks
         1.2.2  Biological Neural Networks
    1.3  Where Are Neural Nets Being Used?
         1.3.1  Signal Processing
         1.3.2  Control
         1.3.3  Pattern Recognition
         1.3.4  Medicine
         1.3.5  Speech Production
         1.3.6  Speech Recognition
         1.3.7  Business
    1.4  How Are Neural Networks Used?
         1.4.1  Typical Architectures
         1.4.2  Setting the Weights
         1.4.3  Common Activation Functions
         1.4.4  Summary of Notation
Preface
There has been a resurgence of interest in artificial neural networks over the last
few years, as researchers from diverse backgrounds have produced a firm theoretical foundation and demonstrated numerous applications of this rich field of
study. However, the interdisciplinary nature of neural networks complicates the
development of a comprehensive, but introductory, treatise on the subject. Neural
networks are useful tools for solving many types of problems. These problems
may be characterized as mapping (including pattern association and pattern classification), clustering, and constrained optimization. There are several neural networks available for each type of problem. In order to use these tools effectively,
it is important to understand the characteristics (strengths and limitations) of each.
This book presents a wide variety of standard neural networks, with diagrams of the architecture, detailed statements of the training algorithm, and several examples of the application for each net. In keeping with our intent to show
neural networks in a fair and objective light, typical results of simple experiments
are included (rather than the best possible). The emphasis is on computational
characteristics, rather than psychological interpretations. To illustrate the similarities and differences among the neural networks discussed, similar examples
are used wherever it is appropriate.
Fundamentals of Neural Networks has been written for students and for
researchers in academia, industry, and government who are interested in using
neural networks. It has been developed both as a textbook for a one-semester
or two-quarter Introduction to Neural Networks course at Florida Institute of
Technology, and as a resource book for researchers. Our course has been developed jointly by neural networks researchers from applied mathematics, computer science, and computer and electrical engineering. Our students are seniors,
or graduate students, in science and engineering; many work in local industry.
It is assumed that the reader is familiar with calculus and some vector-matrix
notation and operations. The mathematical treatment has been kept at a minimal
level, consistent with the primary aims of clarity and correctness. Derivations,
theorems and proofs are included when they serve to illustrate the important
features of a particular neural network. For example, the mathematical derivation
of the backpropagation training algorithm makes clear the correct order of the
operations. The level of mathematical sophistication increases somewhat in the
later chapters, as is appropriate for the networks presented in chapters 5, 6, and
7. However, derivations and proofs (when included) are presented at the end of
a section or chapter, so that they can be skipped without loss of continuity.
The order of presentation of the topics was chosen to reflect increasing
complexity of the networks. The material in each chapter is largely independent,
so that the chapters (after the first chapter) may be used in almost any order
desired. The McCulloch-Pitts neuron discussed at the end of Chapter 1 provides
a simple example of an early neural net. Single layer nets for pattern classification
and pattern association, covered in chapters 2 and 3, are two of the earliest applications of neural networks with adaptive weights. More complex networks,
discussed in later chapters, are also used for these types of problems, as well as
for more general mapping problems. Chapter 6, backpropagation, can logically
follow chapter 2, although the networks in chapters 3-5 are somewhat simpler in
structure. Chapters 4 and 5 treat networks for clustering problems (and mapping
networks that are based on these clustering networks). Chapter 7 presents a few
of the most widely used of the many other neural networks, including two for
constrained optimization problems.
Algorithms, rather than computer codes, are provided to encourage the
reader to develop a thorough understanding of the mechanisms of training and
applying the neural network, rather than fostering the more superficial familiarity
that sometimes results from using completely developed software packages. For
many applications, the formulation of the problem for solution by a neural network
(and choice of an appropriate network) requires the detailed understanding of the
networks that comes from performing both hand calculations and developing computer codes for extremely simple examples.
Acknowledgments
Many people have helped to make this book a reality. I can only mention a few
of them here.
I have benefited either directly or indirectly from short courses on neural
networks taught by Harold Szu, Robert Hecht-Nielsen, Steven Rogers, Bernard
Widrow, and Tony Martinez.
My thanks go also to my colleagues for stimulating discussions and encouragement, especially Harold K. Brown, Barry Grossman, Fred Ham, Demetrios Lainiotis, Moti Schneider, Nazif Tepedelenlioglu, and Mike Thursby.
My students have assisted in the development of this book in many ways;
several of the examples are based on student work. Joe Vandeville, Alan Lindsay,
and Francisco Gomez performed the computations for many of the examples in
Chapter 2. John Karp provided the results for Example 4.8. Judith Lipofsky did
Examples 4.9 and 4.10. Fred Parker obtained the results shown in Examples 4.12
and 4.13. Joseph Oslakovic performed the computations for several of the examples in Chapter 5. Laurie Walker assisted in the development of the backpropagation program for several of the examples in Chapter 6; Ti-Cheng Shih did the
computations for Example 6.5; Abdallah Said developed the logarithmic activation
function used in Examples 6.7 and 6.8. Todd Kovach, Robin Schumann, and
Hong-wei Du assisted with the Boltzmann machine and Hopfield net examples
in Chapter 7; Ki-suck Yoo provided Example 7.8.
Several of the network architecture diagrams are adapted from the original
publications as referenced in the text. The spanning tree test data (Figures 4.11,
4.12, 5.11, and 5.12) are used with permission from Springer-Verlag. The illustrations of modified Hebbian learning have been adapted from the original publications: Figure 7.10 has been adapted from Hertz, Krogh, and Palmer, Introduction
to the Theory of Neural Computation, © 1991 by Addison-Wesley Publishing
Company, Inc. Figure 7.11 has been adapted and reprinted from Neural Networks,
Vol. 5, Xu, Oja, and Suen, Modified Hebbian Learning for Curve and Surface
Fitting, pp. 441-457, 1992, with permission from Pergamon Press Ltd, Headington
Hill Hall, Oxford OX3 0BW, UK. Several of the figures for the neocognitron are
adapted from (Fukushima, et al., 1983); they are used with permission of IEEE.
The diagrams of the ART2 architecture are used with permission of the Optical
Society of America, and Carpenter and Grossberg. The diagrams of the simple
recurrent net for learning a context-sensitive grammar (Servan-Schreiber, et al.,
1989) are used with the permission of the authors.
The preparation of the manuscript and software for the examples has been
greatly facilitated by the use of a Macintosh IIci furnished by Apple Computers
under the AppleSeed project. I thank Maurice Kurtz for making it available to
me.
I appreciate the constructive and encouraging comments of the manuscript
reviewers: Stanley Ahalt, The Ohio State University; Peter Anderson, Rochester
Institute of Technology; and Nirmal Bose, Penn State University.
I would like to thank the Prentice-Hall editorial staff, and especially Rick
DeLorenzo, for their diligent efforts to produce an accurate and attractive product
within the inevitable time and budget constraints.
But first, last, and always, I would like to thank my husband and colleague,
Don Fausett, for introducing me to neural networks, and for his patience, encouragement, and advice, when asked, during the writing of this book (as well as
other times).
FUNDAMENTALS OF NEURAL NETWORKS

CHAPTER 1  INTRODUCTION
1.1 WHY NEURAL NETWORKS, AND WHY NOW?
As modern computers become ever more powerful, scientists continue to be challenged to use machines effectively for tasks that are relatively simple for humans.
Based on examples, together with some feedback from a “teacher,” we learn
easily to recognize the letter A or distinguish a cat from a bird. More experience
allows us to refine our responses and improve our performance. Although we may eventually be able to describe rules by which we can make such decisions,
these rules do not necessarily reflect the actual process we use. Even without a teacher,
we can group similar patterns together. Yet another common human activity is
trying to achieve a goal that involves maximizing a resource (time with one’s
family, for example) while satisfying certain constraints (such as the need to earn
a living). Each of these types of problems illustrates tasks for which computer
solutions may be sought.
Traditional, sequential, logic-based digital computing excels in many areas,
but has been less successful for other types of problems. The development of
artificial neural networks began approximately 50 years ago, motivated by a desire
to try both to understand the brain and to emulate some of its strengths. Early