
Springer Optimization and Its Applications  115

Boris Goldengorin Editor

Optimization and Its Applications in Control and Data Sciences
In Honor of Boris T. Polyak's 80th Birthday


Springer Optimization and Its Applications
VOLUME 115
Managing Editor
Panos M. Pardalos (University of Florida)
Editor–Combinatorial Optimization
Ding-Zhu Du (University of Texas at Dallas)
Advisory Board
J. Birge (University of Chicago)
C.A. Floudas (Texas A & M University)
F. Giannessi (University of Pisa)
H.D. Sherali (Virginia Polytechnic Institute and State University)
T. Terlaky (Lehigh University)
Y. Ye (Stanford University)

Aims and Scope
Optimization has been expanding in all directions at an astonishing rate
during the last few decades. New algorithmic and theoretical techniques
have been developed, the diffusion into other disciplines has proceeded at a
rapid pace, and our knowledge of all aspects of the field has grown even more


profound. At the same time, one of the most striking trends in optimization
is the constantly increasing emphasis on the interdisciplinary nature of the
field. Optimization has been a basic tool in all areas of applied mathematics,
engineering, medicine, economics, and other sciences.
The series Springer Optimization and Its Applications publishes undergraduate and graduate textbooks, monographs, and state-of-the-art expository works that focus on algorithms for solving optimization problems and also study applications involving such problems. Some of the topics covered include nonlinear optimization (convex and nonconvex), network flow problems, stochastic optimization, optimal control, discrete optimization, multi-objective programming, description of software packages, approximation techniques, and heuristic approaches.


Boris Goldengorin
Editor

Optimization and Its Applications in Control and Data Sciences
In Honor of Boris T. Polyak’s 80th Birthday



Editor
Boris Goldengorin
Department of Industrial
and Systems Engineering
Ohio University
Athens, OH, USA


ISSN 1931-6828
ISSN 1931-6836 (electronic)
Springer Optimization and Its Applications
ISBN 978-3-319-42054-7
ISBN 978-3-319-42056-1 (eBook)
DOI 10.1007/978-3-319-42056-1
Library of Congress Control Number: 2016954316
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland


This book is dedicated to Professor
Boris T. Polyak on the occasion of his
80th birthday.



Preface

This book is a collection of papers related to the International Conference "Optimization and Its Applications in Control and Data Sciences," held in Moscow, Russia, May 13–15, 2015, and dedicated to Professor Boris T. Polyak on the occasion of his 80th birthday.
Boris Polyak obtained his Ph.D. in mathematics from Moscow State University, USSR, in 1963 and the Dr.Sci. degree from the Moscow Institute of Control Sciences, USSR, in 1986. Between 1963 and 1971 he worked at Lomonosov Moscow State University, and in 1971 he moved to the V.A. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences. Professor Polyak was the Head of the Tsypkin Laboratory, and he is currently a Chief Researcher at the Institute. He has held visiting positions at universities in the USA, France, Italy, Israel, Finland, and Taiwan; he is currently a professor at the Moscow Institute of Physics and Technology. His research interests in optimization and control emphasize stochastic optimization and robust control. Professor Polyak is an IFAC Fellow and a recipient of the EURO-2012 Gold Medal of the European Operational Research Society. Currently, Boris Polyak's h-index is 45, with 11,807 citations, including 4,390 citations since 2011.
This volume contains papers reflecting developments in theory and applications rooted in Professor Polyak's fundamental contributions to constrained and unconstrained optimization, differentiable and nonsmooth functions, stochastic optimization and approximation, and optimal and robust algorithms for many problems of estimation, identification, and adaptation in control theory and its applications to nonparametric statistics and ill-posed problems.
This book focuses on recent research in modern optimization and its implications for control and data analysis. Researchers, students, and engineers will benefit from the original contributions and overviews included in this book. The book is of great interest to researchers in large-scale constrained and unconstrained, convex and nonlinear, continuous and discrete optimization. Since it presents open problems in optimization, game theory, and control theory, designers of efficient algorithms and software for solving optimization problems in market and data
analysis will benefit from new unified approaches in applications ranging from managing
portfolios of financial instruments to finding market equilibria. The book is also
beneficial to theoreticians in operations research, applied mathematics, algorithm
design, artificial intelligence, machine learning, and software engineering. Graduate
students will be updated with the state-of-the-art in modern optimization, control
theory, and data analysis.
Athens, OH, USA
March 2016

Boris Goldengorin


Acknowledgements

This volume collects contributions presented at the International Conference "Optimization and Its Applications in Control and Data Sciences" held in Moscow, Russia, May 13–15, 2015, or submitted in response to an open call for papers for the book "Optimization and Its Applications in Control Sciences and Data Analysis" announced at the same conference.
I would like to express my gratitude to Professors Alexander S. Belenky (National Research University Higher School of Economics and MIT) and Panos M. Pardalos (University of Florida) for their support in organizing the publication of this book, including their many efforts in inviting top researchers to contribute and to review the submitted papers.

I am thankful to the reviewers for their comprehensive feedback on every
submitted paper and their timely replies. They greatly improved the quality of
submitted contributions and hence of this volume. Here is the list of all reviewers:
1. Anatoly Antipin, Federal Research Center “Computer Science and Control”
of Russian Academy of Sciences, Moscow, Russia
2. Saman Babaie-Kafaki, Faculty of Mathematics, Statistics, and Computer
Science, Semnan University, Semnan, Iran
3. Amit Bhaya, Graduate School of Engineering (COPPE), Federal University of
Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
4. Lev Bregman, Department of Mathematics, Ben Gurion University, Beer
Sheva, Israel
5. Arkadii A. Chikrii, Optimization Department of Controlled Processes,
Cybernetics Institute, National Academy of Sciences, Kiev, Ukraine
6. Giacomo Como, The Department of Automatic Control, Lund University,
Lund, Sweden
7. Xiao Liang Dong, School of Mathematics and Statistics, Xidian University,
Xi’an, People’s Republic of China
8. Trevor Fenner, School of Computer Science and Information Systems,
Birkbeck College, University of London, London, UK

9. Sjur Didrik Flåm, Institute of Economics, University of Bergen, Bergen,
Norway
10. Sergey Frenkel, The Institute of Informatics Problems, Russian Academy of Sciences, Moscow, Russia
11. Piyush Grover, Mitsubishi Electric Research Laboratories, Cambridge, MA,
USA
12. Jacek Gondzio, School of Mathematics, The University of Edinburgh,
Edinburgh, Scotland, UK
13. Rita Giuliano, Dipartimento di Matematica, Università di Pisa, Pisa, Italy
14. Grigori Kolesnik, Department of Mathematics, California State University,
Los Angeles, CA, USA
15. Pavlo S. Knopov, Department of Applied Statistics, Faculty of Cybernetics,
Taras Shevchenko National University, Kiev, Ukraine
16. Arthur Krener, Mathematics Department, University of California, Davis,
CA, USA
17. Bernard C. Levy, Department of Electrical and Computer Engineering,
University of California, Davis, CA, USA
18. Vyacheslav I. Maksimov, Institute of Mathematics and Mechanics, Ural
Branch of the Russian Academy of Sciences, Ekaterinburg, Russia
19. Yuri Merkuryev, Department of Modelling and Simulation, Riga Technical
University, Riga, Latvia
20. Arkadi Nemirovski, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta,
GA, USA
21. José Valente de Oliveira, Faculty of Science and Technology, University of
Algarve Campus de Gambelas, Faro, Portugal
22. Alex Poznyak, Dept. Control Automatico CINVESTAV-IPN, Mexico D.F.,
Mexico
23. Vladimir Yu. Protasov, Faculty of Mechanics and Mathematics, Lomonosov
Moscow State University, and Faculty of Computer Science of National
Research University Higher School of Economics, Moscow, Russia
24. Simeon Reich, Department of Mathematics, Technion-Israel Institute of
Technology, Haifa, Israel
25. Alessandro Rizzo, Computer Engineering, Politecnico di Torino, Torino, Italy

26. Carsten W. Scherer, Institute of Mathematical Methods in Engineering,
University of Stuttgart, Stuttgart, Germany
27. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta,
GA, USA
28. Lieven Vandenberghe, UCLA Electrical Engineering Department, Los
Angeles, CA, USA
29. Yuri Yatsenko, School of Business, Houston Baptist University, Houston, TX,
USA
I would like to acknowledge the superb assistance that the staff of Springer
has provided (thank you Razia Amzad). Also I would like to acknowledge help
in preparation of this book from Silembarasanh Panneerselvam.



Technical assistance with reformatting some papers and compilation of this
book’s many versions by Ehsan Ahmadi (PhD student, Industrial and Systems Engineering Department, Ohio University, Athens, OH, USA) is greatly appreciated.
Finally, I would like to thank all my colleagues from the Department of Industrial and Systems Engineering, The Russ College of Engineering and Technology, Ohio University, Athens, OH, USA, for providing me with a pleasant atmosphere in which to work during my C. Paul Stocker Visiting Professorship.


Contents

A New Adaptive Conjugate Gradient Algorithm for Large-Scale Unconstrained Optimization . . . . . . . . . . 1
Neculai Andrei

On Methods of Terminal Control with Boundary-Value Problems: Lagrange Approach . . . . . . . . . . 17
Anatoly Antipin and Elena Khoroshilova

Optimization of Portfolio Compositions for Small and Medium Price-Taking Traders . . . . . . . . . . 51
Alexander S. Belenky and Lyudmila G. Egorova

Indirect Maximum Likelihood Estimation . . . . . . . . . . 119
Daniel Berend and Luba Sapir

Lagrangian Duality in Complex Pose Graph Optimization . . . . . . . . . . 139
Giuseppe C. Calafiore, Luca Carlone, and Frank Dellaert

State-Feedback Control of Positive Switching Systems with Markovian Jumps . . . . . . . . . . 185
Patrizio Colaneri, Paolo Bolzern, José C. Geromel, and Grace S. Deaecto

Matrix-Free Convex Optimization Modeling . . . . . . . . . . 221
Steven Diamond and Stephen Boyd

Invariance Conditions for Nonlinear Dynamical Systems . . . . . . . . . . 265
Zoltán Horváth, Yunfei Song, and Tamás Terlaky

Modeling of Stationary Periodic Time Series by ARMA Representations . . . . . . . . . . 281
Anders Lindquist and Giorgio Picci

A New Two-Step Proximal Algorithm of Solving the Problem of Equilibrium Programming . . . . . . . . . . 315
Sergey I. Lyashko and Vladimir V. Semenov

Nonparametric Ellipsoidal Approximation of Compact Sets of Random Points . . . . . . . . . . 327
Sergey I. Lyashko, Dmitry A. Klyushin, Vladimir V. Semenov, Maryna V. Prysiazhna, and Maksym P. Shlykov

Extremal Results for Algebraic Linear Interval Systems . . . . . . . . . . 341
Daniel N. Mohsenizadeh, Vilma A. Oliveira, Lee H. Keel, and Shankar P. Bhattacharyya

Applying the Gradient Projection Method to a Model of Proportional Membership for Fuzzy Cluster Analysis . . . . . . . . . . 353
Susana Nascimento

Algorithmic Principle of Least Revenue for Finding Market Equilibria . . . . . . . . . . 381
Yurii Nesterov and Vladimir Shikhman

The Legendre Transformation in Modern Optimization . . . . . . . . . . 437
Roman A. Polyak


Contributors


Neculai Andrei Center for Advanced Modeling and Optimization, Research Institute for Informatics, Bucharest, Romania
Academy of Romanian Scientists, Bucharest, Romania
Anatoly Antipin Federal Research Center “Computer Science and Control” of
Russian Academy of Sciences, Dorodnicyn Computing Centre, Moscow, Russia
Alexander S. Belenky National Research University Higher School of Economics,
Moscow, Russia
Center for Engineering Systems Fundamentals, Massachusetts Institute of Technology, Cambridge, MA, USA
Daniel Berend Departments of Mathematics and Computer Science, Ben-Gurion
University, Beer Sheva, Israel
Shankar P. Bhattacharyya Department of Electrical and Computer Engineering,
Texas A&M University, College Station, TX, USA
Paolo Bolzern Politecnico di Milano, DEIB, Milano, Italy
Stephen Boyd Department of Electrical Engineering, Stanford University,
Stanford, CA, USA
Giuseppe C. Calafiore Politecnico di Torino, Torino, Italy
Luca Carlone Massachusetts Institute of Technology, Cambridge, MA, USA
Patrizio Colaneri Politecnico di Milano, DEIB, IEIIT-CNR, Milano, Italy
Grace S. Deaecto School of Mechanical Engineering, UNICAMP, Campinas,
Brazil
Frank Dellaert Georgia Institute of Technology, Atlanta, GA, USA
Steven Diamond Department of Computer Science, Stanford University, Stanford,
CA, USA

Lyudmila G. Egorova National Research University Higher School of Economics, Moscow, Russia
José C. Geromel School of Electrical and Computer Engineering, UNICAMP,
Campinas, Brazil
Zoltán Horváth Department of Mathematics and Computational Sciences,
Széchenyi István University, Győr, Hungary
Lee H. Keel Department of Electrical and Computer Engineering, Tennessee State
University, Nashville, USA
Elena Khoroshilova Faculty of Computational Mathematics and Cybernetics,
Lomonosov Moscow State University, Moscow, Russia
Dmitry A. Klyushin Kiev National Taras Shevchenko University, Kiev, Ukraine
Anders Lindquist Shanghai Jiao Tong University, Shanghai, China
Royal Institute of Technology, Stockholm, Sweden
Sergey I. Lyashko Department of Computational Mathematics, Kiev National
Taras Shevchenko University, Kiev, Ukraine
Daniel N. Mohsenizadeh Department of Electrical and Computer Engineering,
Texas A&M University, College Station, TX, USA
Susana Nascimento Department of Computer Science and NOVA Laboratory
for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e
Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
Yurii Nesterov Center for Operations Research and Econometrics (CORE),
Catholic University of Louvain (UCL), Louvain-la-Neuve, Belgium
Vilma A. Oliveira Department of Electrical and Computer Engineering, University of São Paulo at São Carlos, São Carlos, SP, Brazil
Giorgio Picci University of Padova, Padova, Italy
Roman A. Polyak Department of Mathematics, The Technion – Israel Institute of
Technology, Haifa, Israel
Maryna V. Prysiazhna Kiev National Taras Shevchenko University, Kiev, Ukraine
Luba Sapir Department of Mathematics, Ben-Gurion University and Deutsche
Telekom Laboratories at Ben-Gurion University, Beer Sheva, Israel
Vladimir V. Semenov Department of Computational Mathematics, Kiev National
Taras Shevchenko University, Kiev, Ukraine

Vladimir Shikhman Center for Operations Research and Econometrics (CORE),
Catholic University of Louvain (UCL), Louvain-la-Neuve, Belgium
Maksym P. Shlykov Kiev National Taras Shevchenko University, Kiev, Ukraine



Yunfei Song Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA, USA
Tamás Terlaky Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA, USA


A New Adaptive Conjugate Gradient Algorithm
for Large-Scale Unconstrained Optimization
Neculai Andrei

This paper is dedicated to Prof. Boris T. Polyak on the occasion
of his 80th birthday. Prof. Polyak’s contributions to linear and
nonlinear optimization methods, linear algebra, numerical
mathematics, linear and nonlinear control systems are
well-known. His articles and books give careful attention to
both mathematical rigor and practical relevance. In all his
publications he proves to be a refined expert in understanding
the nature, purpose and limitations of nonlinear optimization
algorithms and applied mathematics in general. It is my great
pleasure and honour to dedicate this paper to Prof. Polyak, a
pioneer and a great contributor in his area of interests.

Abstract An adaptive conjugate gradient algorithm is presented. The search direction is computed as the sum of the negative gradient and a vector determined by minimizing the quadratic approximation of the objective function at the current point. Using a special approximation of the inverse Hessian of the objective function, which depends on a positive parameter, we obtain a search direction that satisfies both the sufficient descent condition and the Dai-Liao conjugacy condition. The parameter in the search direction is determined in an adaptive manner by clustering the eigenvalues of the matrix defining it. The global convergence of the algorithm is proved for uniformly convex functions. Using a set of 800 unconstrained optimization test problems, we show that our algorithm is significantly more efficient and more robust than the CG-DESCENT algorithm. By solving five applications from the MINPACK-2 test problem collection, each with $10^6$ variables, we show that the suggested adaptive conjugate gradient algorithm is a top performer compared with CG-DESCENT.

Keywords Unconstrained optimization • Adaptive conjugate gradient method • Sufficient descent condition • Conjugacy condition • Eigenvalues clustering • Numerical comparisons

N. Andrei
Center for Advanced Modeling and Optimization, Research Institute for Informatics, 8-10,
Averescu Avenue, Bucharest, Romania
Academy of Romanian Scientists, Splaiul Independentei Nr. 54, Sector 5, Bucharest, Romania
e-mail:
© Springer International Publishing Switzerland 2016
B. Goldengorin (ed.), Optimization and Its Applications in Control
and Data Sciences, Springer Optimization and Its Applications 115,
DOI 10.1007/978-3-319-42056-1_1


1 Introduction
For solving the large-scale unconstrained optimization problem
$\min\{ f(x) : x \in \mathbb{R}^n \},$   (1)

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, we consider the following algorithm

$x_{k+1} = x_k + \alpha_k d_k,$   (2)

where the step size $\alpha_k$ is positive and the directions $d_k$ are computed using the updating formula:

$d_{k+1} = -g_{k+1} + u_{k+1}.$   (3)

Here, $g_k = \nabla f(x_k)$, and $u_{k+1} \in \mathbb{R}^n$ is a vector to be determined. Usually, in (2), the steplength $\alpha_k$ is computed using the Wolfe line search conditions [34, 35]:

$f(x_k + \alpha_k d_k) \le f(x_k) + \rho \alpha_k g_k^T d_k,$   (4)

$g_{k+1}^T d_k \ge \sigma g_k^T d_k,$   (5)

where $0 < \rho \le \sigma < 1$. Also, the strong Wolfe line search conditions, consisting of (4) and the following strengthened version of (5),

$|g_{k+1}^T d_k| \le -\sigma g_k^T d_k,$   (6)

can be used.
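To make conditions (4)-(6) concrete, here is a minimal Python sketch (our illustration, not part of the original paper; the function and parameter names are ours) that checks whether a trial step length satisfies the standard and the strong Wolfe conditions for given $\rho$ and $\sigma$.

import numpy as np

def wolfe_conditions(f, grad, x, d, alpha, rho=1e-4, sigma=0.8):
    # Check the Wolfe conditions (4)-(5) and the strong variant (4)+(6).
    # f, grad: callables returning f(x) and the gradient of f at x (numpy arrays).
    g0_d = grad(x) @ d                   # g_k^T d_k, negative for a descent direction
    x_new = x + alpha * d
    g1_d = grad(x_new) @ d               # g_{k+1}^T d_k
    sufficient_decrease = f(x_new) <= f(x) + rho * alpha * g0_d    # (4)
    curvature = g1_d >= sigma * g0_d                               # (5)
    strong_curvature = abs(g1_d) <= -sigma * g0_d                  # (6)
    return sufficient_decrease and curvature, sufficient_decrease and strong_curvature

# Example on the convex quadratic f(x) = 0.5 ||x||^2:
f = lambda x: 0.5 * float(x @ x)
grad = lambda x: x
x, d = np.array([1.0, 2.0]), -np.array([1.0, 2.0])
print(wolfe_conditions(f, grad, x, d, alpha=0.5))    # (True, True)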
Observe that (3) is a general updating formula for the search direction computation. The following particularizations of (3) can be presented. If $u_{k+1} = 0$, then we get the steepest descent algorithm. If $u_{k+1} = (I - \nabla^2 f(x_{k+1})^{-1}) g_{k+1}$, then the Newton method is obtained. Besides, if $u_{k+1} = (I - B_{k+1}^{-1}) g_{k+1}$, where $B_{k+1}$ is an approximation of the Hessian $\nabla^2 f(x_{k+1})$, then we find the quasi-Newton methods. On the other hand, if $u_{k+1} = \beta_k d_k$, where $\beta_k$ is a scalar and $d_0 = -g_0$, the family of conjugate gradient algorithms is generated.
In this paper we focus on the conjugate gradient method. This method was introduced by Hestenes and Stiefel [21] and Stiefel [31] ($\beta_k^{HS} = g_{k+1}^T y_k / y_k^T d_k$, where $y_k = g_{k+1} - g_k$) to minimize positive definite quadratic objective functions. This algorithm for solving positive definite linear algebraic systems of equations is known as the linear conjugate gradient method. Later, the algorithm was generalized to the nonlinear conjugate gradient method, in order to minimize arbitrary differentiable nonlinear functions, by Fletcher and Reeves [14] ($\beta_k^{FR} = \|g_{k+1}\|^2 / \|g_k\|^2$), Polak and Ribière [27] and Polyak [28] ($\beta_k^{PRP} = g_{k+1}^T y_k / \|g_k\|^2$), Dai and Yuan [10] ($\beta_k^{DY} = \|g_{k+1}\|^2 / y_k^T d_k$), and many others. An impressive number of nonlinear conjugate gradient algorithms have been established, and a lot of papers have been published on this subject, addressing both theoretical and computational aspects. An excellent survey of the development of different versions of nonlinear conjugate gradient methods, with special attention to global convergence properties, is presented by Hager and Zhang [20].
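For reference, the classical conjugacy parameters recalled above can be computed in a few lines; the sketch below (ours, for illustration only, with variable names of our choosing) evaluates $\beta_k^{HS}$, $\beta_k^{FR}$, $\beta_k^{PRP}$ and $\beta_k^{DY}$ from the two gradients and the previous direction.

import numpy as np

def classical_betas(g_old, g_new, d_old):
    # Conjugacy parameters for the update d_{k+1} = -g_{k+1} + beta_k d_k.
    y = g_new - g_old                               # y_k = g_{k+1} - g_k
    return {
        "HS":  (g_new @ y) / (y @ d_old),           # Hestenes-Stiefel
        "FR":  (g_new @ g_new) / (g_old @ g_old),   # Fletcher-Reeves
        "PRP": (g_new @ y) / (g_old @ g_old),       # Polak-Ribiere-Polyak
        "DY":  (g_new @ g_new) / (y @ d_old),       # Dai-Yuan
    }

g_old, g_new = np.array([3.0, -1.0]), np.array([1.0, 0.5])
print(classical_betas(g_old, g_new, d_old=-g_old))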
In this paper we consider another approach to generate an efficient and robust conjugate gradient algorithm. We suggest a procedure for computing $u_{k+1}$ by minimizing the quadratic approximation of the function $f$ at $x_{k+1}$ and using a special representation of the inverse Hessian which depends on a positive parameter. The parameter in the matrix representing the search direction is determined in an adaptive manner by minimizing the largest eigenvalue of that matrix. The idea, taken from the linear conjugate gradient method, is to cluster the eigenvalues of the matrix representing the search direction.
The algorithm and its properties are presented in Sect. 2. We prove that the search direction used by this algorithm satisfies both the sufficient descent condition and the Dai and Liao conjugacy condition [11]. Using standard assumptions, Sect. 3 presents the global convergence of the algorithm for uniformly convex functions. In Sect. 4 the numerical comparisons of our algorithm versus the CG-DESCENT conjugate gradient algorithm [18] are presented. The computational results, for a set of 800 unconstrained optimization test problems, show that this new algorithm substantially outperforms CG-DESCENT, being more efficient and more robust. Considering five applications from the MINPACK-2 test problem collection [4], each with $10^6$ variables, we show that our algorithm is considerably more efficient and more robust than CG-DESCENT.

2 The Algorithm
In this section we describe the algorithm and its properties. Let us consider that at the $k$th iteration of the algorithm an inexact Wolfe line search is executed, that is, the step-length $\alpha_k$ satisfying (4) and (5) is computed. With these, the elements $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$ are computed. Now, let us take the quadratic approximation of the function $f$ at $x_{k+1}$ as

$\Phi_{k+1}(d) = f_{k+1} + g_{k+1}^T d + \frac{1}{2} d^T B_{k+1} d,$   (7)

where $B_{k+1}$ is an approximation of the Hessian $\nabla^2 f(x_{k+1})$ of the function $f$ and $d$ is the direction to be determined. The search direction $d_{k+1}$ is computed as in (3), where $u_{k+1}$ is computed as the solution of the following minimization problem:

$\min_{u_{k+1} \in \mathbb{R}^n} \Phi_{k+1}(d_{k+1}).$   (8)



Introducing $d_{k+1}$ from (3) in the minimization problem (8), $u_{k+1}$ is obtained as

$u_{k+1} = (I - B_{k+1}^{-1}) g_{k+1}.$   (9)
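As a quick sanity check of (9) (our illustration, not from the paper), the following sketch builds a random symmetric positive definite approximation $B_{k+1}$ and verifies that $d_{k+1} = -g_{k+1} + u_{k+1}$, with $u_{k+1}$ from (9), is exactly the minimizer $-B_{k+1}^{-1} g_{k+1}$ of the quadratic model (7).

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n))
B = A @ A.T + n * np.eye(n)          # a symmetric positive definite Hessian approximation
g = rng.normal(size=n)               # stands for g_{k+1}

u = (np.eye(n) - np.linalg.inv(B)) @ g          # u_{k+1} from (9)
d = -g + u                                      # d_{k+1} from (3)

assert np.allclose(d, -np.linalg.solve(B, g))   # minimizer of the quadratic model (7)
assert np.allclose(g + B @ d, 0.0)              # stationarity: grad Phi_{k+1}(d) = 0
print("u_{k+1} from (9) reproduces the quasi-Newton direction -B^{-1} g")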

Clearly, using different approximations $B_{k+1}$ of the Hessian $\nabla^2 f(x_{k+1})$, different search directions $d_{k+1}$ can be obtained. In this paper we consider the following expression of $B_{k+1}^{-1}$:

$B_{k+1}^{-1} = I - \dfrac{s_k y_k^T - y_k s_k^T}{y_k^T s_k} + \omega_k \dfrac{s_k s_k^T}{y_k^T s_k},$   (10)

where $\omega_k$ is a positive parameter which is to be determined. Observe that $B_{k+1}^{-1}$ is the sum of a skew-symmetric matrix with zero diagonal elements, $(y_k s_k^T - s_k y_k^T)/y_k^T s_k$, and a symmetric and positive definite one, $I + \omega_k s_k s_k^T / y_k^T s_k$.
The expression of $B_{k+1}^{-1}$ in (10) is a small modification of the BFGS quasi-Newton updating formula without memory. This is considered here in order to obtain the sufficient descent and the conjugacy conditions for the corresponding search direction. Now, from (9) we get:
$u_{k+1} = \left[ \dfrac{s_k y_k^T - y_k s_k^T}{y_k^T s_k} - \omega_k \dfrac{s_k s_k^T}{y_k^T s_k} \right] g_{k+1}.$   (11)

Denote $H_{k+1} = B_{k+1}^{-1}$. Therefore, using (11) in (3), the search direction can be expressed as

$d_{k+1} = -H_{k+1} g_{k+1},$   (12)

where

$H_{k+1} = I - \dfrac{s_k y_k^T - y_k s_k^T}{y_k^T s_k} + \omega_k \dfrac{s_k s_k^T}{y_k^T s_k}.$   (13)

Observe that the search direction (12), where $H_{k+1}$ is given by (13), obtained by using the expression (10) of the inverse Hessian $B_{k+1}^{-1}$, is given by:

$d_{k+1} = -g_{k+1} + \left( \dfrac{y_k^T g_{k+1}}{y_k^T s_k} - \omega_k \dfrac{s_k^T g_{k+1}}{y_k^T s_k} \right) s_k - \dfrac{s_k^T g_{k+1}}{y_k^T s_k} y_k.$   (14)

Proposition 2.1. Assume that $\omega_k > 0$ and that the step length $\alpha_k$ in (2) is determined by the Wolfe line search conditions (4) and (5). Then the search direction (14) satisfies the descent condition $g_{k+1}^T d_{k+1} \le 0$.



Proof. By direct computation, since $\omega_k > 0$, we get:

$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 - \omega_k \dfrac{(g_{k+1}^T s_k)^2}{y_k^T s_k} \le 0.$

Proposition 2.2. Assume that $\omega_k > 0$ and that the step length $\alpha_k$ in (2) is determined by the Wolfe line search conditions (4) and (5). Then the search direction (14) satisfies the Dai and Liao conjugacy condition $y_k^T d_{k+1} = -v_k (s_k^T g_{k+1})$, where $v_k \ge 0$.

Proof. By direct computation we have

$y_k^T d_{k+1} = -\left[ \omega_k + \dfrac{\|y_k\|^2}{y_k^T s_k} \right] (s_k^T g_{k+1}) \equiv -v_k (s_k^T g_{k+1}),$

where $v_k \equiv \omega_k + \dfrac{\|y_k\|^2}{y_k^T s_k}$. By the Wolfe line search conditions (4) and (5) it follows that $y_k^T s_k > 0$; therefore $v_k > 0$.

Observe that, although we have considered the expression of the inverse Hessian as that given by (10), which is a non-symmetric matrix, the search direction (14) obtained in this manner satisfies both the descent condition and the Dai and Liao conjugacy condition. Therefore, the search direction (14) leads us to a genuine conjugate gradient algorithm. The expression (10) of the inverse Hessian is only a technical device to obtain the search direction (14). It is worth noting that, from (12), our method can be considered a quasi-Newton method in which the inverse Hessian, at each iteration, is expressed by the non-symmetric matrix $H_{k+1}$. Moreover, the algorithm based on the search direction given by (14) can be considered a three-term conjugate gradient algorithm.
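As a small numerical illustration (ours, not the author's code), the sketch below forms the search direction (14) for given $g_{k+1}$, $s_k$, $y_k$ and $\omega_k > 0$, and checks the descent and Dai-Liao conjugacy properties stated in Propositions 2.1 and 2.2.

import numpy as np

def direction_eq14(g_new, s, y, omega):
    # Search direction (14): d = -g + ((y'g - omega*s'g)/(y's)) s - (s'g/(y's)) y
    ys = y @ s                        # y_k^T s_k, positive under the Wolfe line search
    sg = s @ g_new
    yg = y @ g_new
    return -g_new + ((yg - omega * sg) / ys) * s - (sg / ys) * y

rng = np.random.default_rng(0)
g = rng.normal(size=5)
s = rng.normal(size=5)
y = s + 0.1 * rng.normal(size=5)      # keeps y_k^T s_k > 0 for this seed
omega = 1.5
d = direction_eq14(g, s, y, omega)
assert g @ d <= 0.0                                   # descent (Proposition 2.1)
v = omega + (y @ y) / (y @ s)
assert np.isclose(y @ d, -v * (s @ g))                # Dai-Liao condition (Proposition 2.2)
print("descent and conjugacy conditions verified")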
At this point, to define the algorithm, the only problem we face is to specify a suitable value for the positive parameter $\omega_k$. As we know, the convergence rate of nonlinear conjugate gradient algorithms depends on the structure of the eigenvalues of the Hessian and on the condition number of this matrix. The standard approach is based on a singular value analysis of the matrix $H_{k+1}$ (see for example [6, 7]), i.e., the numerical performance and the efficiency of quasi-Newton methods are based on the condition number of the successive approximations of the inverse Hessian. A matrix with a large condition number is called an ill-conditioned matrix. Ill-conditioned matrices may produce instability in numerical computations. Unfortunately, many difficulties occur when applying this approach to general nonlinear optimization problems. Mainly, these difficulties are associated with the computation of the condition number of a matrix, which is based on its singular values, a difficult and laborious task. However, if the matrix $H_{k+1}$ is a normal matrix, then the analysis is simplified because the condition number of a normal matrix is based on its eigenvalues, which are easier to compute.
As we know, generally, in a small neighborhood of the current point, the nonlinear objective function in the unconstrained optimization problem (1) behaves


like a quadratic one, for which the results from the linear conjugate gradient method apply. For faster convergence of linear conjugate gradient algorithms, several aspects can be exploited: the presence of isolated smallest and/or largest eigenvalues of the matrix $H_{k+1}$, as well as gaps inside the eigenvalue spectrum [5], clustering of the eigenvalues about one point [33] or about several points [23], or preconditioning [22]. If the matrix has a number of distinct eigenvalues contained in $m$ disjoint intervals of very small length, then the linear conjugate gradient method will produce a very small residual after $m$ iterations [24]. This is an important property of the linear conjugate gradient method, and we try to use it in the nonlinear case in order to get efficient and robust conjugate gradient algorithms. Therefore, we consider the extension of the method of clustering the eigenvalues of the matrix defining the search direction from linear conjugate gradient algorithms to the nonlinear case.
The idea is to determine $\omega_k$ by clustering the eigenvalues of $H_{k+1}$, given by (13), through minimizing the largest eigenvalue in the spectrum of this matrix. The structure of the eigenvalues of the matrix $H_{k+1}$ is given by the following theorem.
Theorem 2.1. Let $H_{k+1}$ be defined by (13). Then $H_{k+1}$ is a nonsingular matrix and its eigenvalues consist of 1 (with multiplicity $n-2$), $\lambda^{+}_{k+1}$, and $\lambda^{-}_{k+1}$, where

$\lambda^{+}_{k+1} = \frac{1}{2}\left[ (2 + \omega_k b_k) + \sqrt{\omega_k^2 b_k^2 - 4 a_k + 4} \right],$   (15)

$\lambda^{-}_{k+1} = \frac{1}{2}\left[ (2 + \omega_k b_k) - \sqrt{\omega_k^2 b_k^2 - 4 a_k + 4} \right],$   (16)

and

$a_k = \dfrac{\|y_k\|^2 \|s_k\|^2}{(y_k^T s_k)^2} > 1 \quad \text{and} \quad b_k = \dfrac{\|s_k\|^2}{y_k^T s_k} \ge 0.$   (17)

Proof. By the Wolfe line search conditions (4) and (5) we have that $y_k^T s_k > 0$. Therefore, the vectors $y_k$ and $s_k$ are nonzero vectors. Let $V$ be the vector space spanned by $\{s_k, y_k\}$. Clearly, $\dim(V) \le 2$ and $\dim(V^{\perp}) \ge n-2$. Thus, there exists a set of mutually orthogonal unit vectors $\{u_k^i\}_{i=1}^{n-2} \subset V^{\perp}$ such that

$s_k^T u_k^i = y_k^T u_k^i = 0, \quad i = 1, \ldots, n-2,$

which from (13) leads to

$H_{k+1} u_k^i = u_k^i, \quad i = 1, \ldots, n-2.$

Therefore, the matrix $H_{k+1}$ has $n-2$ eigenvalues equal to 1, which correspond to $\{u_k^i\}_{i=1}^{n-2}$ as eigenvectors.


Now, we are interested in finding the two remaining eigenvalues, denoted as $\lambda^{+}_{k+1}$ and $\lambda^{-}_{k+1}$, respectively. From the algebraic formula (see for example [32])

$\det(I + p q^T + u v^T) = (1 + q^T p)(1 + v^T u) - (p^T v)(q^T u),$

where $p = \dfrac{y_k + \omega_k s_k}{y_k^T s_k}$, $q = s_k$, $u = -\dfrac{s_k}{y_k^T s_k}$, and $v = y_k$, it follows that

$\det(H_{k+1}) = \dfrac{\|s_k\|^2 \|y_k\|^2}{(y_k^T s_k)^2} + \omega_k \dfrac{\|s_k\|^2}{y_k^T s_k} \equiv a_k + \omega_k b_k.$   (18)

But $a_k > 1$ and $b_k \ge 0$; therefore, $H_{k+1}$ is a nonsingular matrix. On the other hand, by direct computation,

$\mathrm{tr}(H_{k+1}) = n + \omega_k \dfrac{\|s_k\|^2}{y_k^T s_k} \equiv n + \omega_k b_k.$   (19)

By the relationships between the determinant and the trace of a matrix and its eigenvalues, it follows that the other eigenvalues of $H_{k+1}$ are the roots of the following quadratic polynomial:

$\lambda^2 - (2 + \omega_k b_k)\lambda + (a_k + \omega_k b_k) = 0.$   (20)

Clearly, the other two eigenvalues of the matrix $H_{k+1}$ are determined from (20) as (15) and (16), respectively. Observe that $a_k > 1$ follows from the Wolfe conditions and the inequality

$\dfrac{y_k^T s_k}{\|s_k\|^2} \le \dfrac{\|y_k\|^2}{y_k^T s_k}.$

In order for both $\lambda^{+}_{k+1}$ and $\lambda^{-}_{k+1}$ to be real eigenvalues, from (15) and (16) the following condition must be fulfilled: $\omega_k^2 b_k^2 - 4 a_k + 4 \ge 0$, out of which the following estimate of the parameter $\omega_k$ can be determined:

$\omega_k \ge \dfrac{2\sqrt{a_k - 1}}{b_k}.$   (21)

Since $a_k > 1$, if $\|s_k\| > 0$ it follows that the estimate of $\omega_k$ given in (21) is well defined. From (20) we have

$\lambda^{+}_{k+1} + \lambda^{-}_{k+1} = 2 + \omega_k b_k > 0,$   (22)

$\lambda^{+}_{k+1} \lambda^{-}_{k+1} = a_k + \omega_k b_k > 0.$   (23)



Therefore, from (22) and (23) we have that both $\lambda^{+}_{k+1}$ and $\lambda^{-}_{k+1}$ are positive eigenvalues. Since $\omega_k^2 b_k^2 - 4 a_k + 4 \ge 0$, from (15) and (16) we have that $\lambda^{+}_{k+1} \ge \lambda^{-}_{k+1}$. By direct computation, from (15), using (21) we get

$\lambda^{+}_{k+1} \ge 1 + \sqrt{a_k - 1} > 1.$   (24)

A simple analysis of Eq. (20) shows that $1 \le \lambda^{-}_{k+1} \le \lambda^{+}_{k+1}$. Therefore $H_{k+1}$ is a positive definite matrix. The maximum eigenvalue of $H_{k+1}$ is $\lambda^{+}_{k+1}$ and its minimum eigenvalue is 1.
Proposition 2.3. The largest eigenvalue

$\lambda^{+}_{k+1} = \frac{1}{2}\left[ (2 + \omega_k b_k) + \sqrt{\omega_k^2 b_k^2 - 4 a_k + 4} \right]$   (25)

attains its minimum, $1 + \sqrt{a_k - 1}$, when $\omega_k = \dfrac{2\sqrt{a_k - 1}}{b_k}$.

Proof. Observe that $a_k > 1$. By direct computation, the minimum of (25) is obtained for $\omega_k = 2\sqrt{a_k - 1}/b_k$, for which its minimum value is $1 + \sqrt{a_k - 1}$.
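A quick numerical experiment (ours, for illustration only) confirms the spectrum described by Theorem 2.1: for random $s_k$, $y_k$ with $y_k^T s_k > 0$ and an admissible $\omega_k$, the matrix $H_{k+1}$ of (13) has $n-2$ unit eigenvalues plus the pair $\lambda^{\pm}_{k+1}$ of (15)-(16).

import numpy as np

rng = np.random.default_rng(1)
n = 6
s = rng.normal(size=n)
y = s + 0.2 * rng.normal(size=n)            # y_k^T s_k > 0 for this seed
ys = y @ s

a = (y @ y) * (s @ s) / ys**2               # a_k in (17)
b = (s @ s) / ys                            # b_k in (17)
omega = 2.0 * np.sqrt(a - 1.0) / b + 0.1    # any omega >= 2*sqrt(a_k - 1)/b_k, cf. (21)

# H_{k+1} as in (13)
H = np.eye(n) - (np.outer(s, y) - np.outer(y, s)) / ys + omega * np.outer(s, s) / ys

disc = np.sqrt(omega**2 * b**2 - 4.0 * a + 4.0)
lam_plus = 0.5 * ((2.0 + omega * b) + disc)     # (15)
lam_minus = 0.5 * ((2.0 + omega * b) - disc)    # (16)

eigs = np.sort(np.linalg.eigvals(H).real)
expected = np.sort(np.r_[np.ones(n - 2), lam_minus, lam_plus])
print(np.allclose(eigs, expected))              # expected output: True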
We see that, according to Proposition 2.3, when $\omega_k = 2\sqrt{a_k - 1}/b_k$ the largest eigenvalue of $H_{k+1}$ attains its minimum value, i.e., the spectrum of $H_{k+1}$ is clustered. In fact, for $\omega_k = 2\sqrt{a_k - 1}/b_k$, $\lambda^{+}_{k+1} = \lambda^{-}_{k+1} = 1 + \sqrt{a_k - 1}$. Therefore, from (17) the following estimate of $\omega_k$ can be obtained:

$\omega_k = 2 \dfrac{y_k^T s_k}{\|s_k\|^2} \sqrt{a_k - 1} \le 2 \dfrac{\|y_k\|}{\|s_k\|} \sqrt{a_k - 1}.$   (26)

From (17), $a_k > 1$; hence if $\|s_k\| > 0$ it follows that the estimate of $\omega_k$ given by (26) is well defined. However, we see that the minimum of $\lambda^{+}_{k+1}$, obtained for $\omega_k = 2\sqrt{a_k - 1}/b_k$, is given by $1 + \sqrt{a_k - 1}$. Therefore, if $a_k$ is large, then the largest eigenvalue of the matrix $H_{k+1}$ will be large. This motivates the parameter $\omega_k$ to be computed as

$\omega_k = \begin{cases} 2\sqrt{a_k - 1}\,\dfrac{\|y_k\|}{\|s_k\|}, & \text{if } a_k \le \gamma, \\[2mm] 2\sqrt{\gamma - 1}\,\dfrac{\|y_k\|}{\|s_k\|}, & \text{otherwise}, \end{cases}$   (27)

where $\gamma > 1$ is a positive constant. Therefore, our algorithm is an adaptive conjugate gradient algorithm in which the value of the parameter $\omega_k$ in the search direction (14) is computed as in (27), trying to cluster all the eigenvalues of $H_{k+1}$ defining the search direction of the algorithm.
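The adaptive rule (27) is inexpensive; the following sketch (our reading of (27), with function and variable names of our choosing) computes $a_k$ and the corresponding $\omega_k$ from $s_k$ and $y_k$, with the threshold $\gamma > 1$ supplied by the user.

import numpy as np

def omega_adaptive(s, y, gamma=3.0):
    # Adaptive parameter omega_k of (27); gamma > 1 is a user-chosen constant.
    ys = y @ s                                       # positive under the Wolfe line search
    a = (y @ y) * (s @ s) / ys**2                    # a_k from (17), a_k >= 1
    ratio = np.sqrt(y @ y) / np.sqrt(s @ s)          # ||y_k|| / ||s_k||
    if a <= gamma:
        return 2.0 * np.sqrt(max(a - 1.0, 0.0)) * ratio    # clusters the spectrum of H_{k+1}
    return 2.0 * np.sqrt(gamma - 1.0) * ratio              # cap the value when a_k is large

The first branch reproduces the eigenvalue-clustering value suggested by (26); the second caps it so that the largest eigenvalue of $H_{k+1}$ stays bounded when $a_k$ is large.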
Now, as we know, Powell [30] constructed a three-dimensional nonlinear unconstrained optimization problem showing that the PRP and HS methods could cycle infinitely without converging to a solution. Based on the insight gained by his example, Powell [30] proposed a simple modification of the PRP method in which the conjugate gradient parameter $\beta_k^{PRP}$ is modified as $\beta_k^{PRP+} = \max\{\beta_k^{PRP}, 0\}$. Later on, for general nonlinear objective functions, Gilbert and Nocedal [15] studied the theoretical convergence and the efficiency of the PRP+ method. In the following, to attain a good computational performance of the algorithm, we apply Powell's idea and consider the following modification of the search direction given by (14):

$d_{k+1} = -g_{k+1} + \max\left\{ \dfrac{y_k^T g_{k+1} - \omega_k s_k^T g_{k+1}}{y_k^T s_k},\, 0 \right\} s_k - \dfrac{s_k^T g_{k+1}}{y_k^T s_k} y_k,$   (28)

where $\omega_k$ is computed as in (27).
Using the procedure of acceleration of conjugate gradient algorithms presented in [1], and taking into consideration the above developments, the following algorithm can be presented.

NADCG Algorithm (New Adaptive Conjugate Gradient Algorithm)
Step 1. Select a starting point $x_0 \in \mathbb{R}^n$ and compute $f(x_0)$ and $g_0 = \nabla f(x_0)$. Select some positive values for $\rho$ and $\sigma$ used in the Wolfe line search conditions. Consider a positive value for the parameter $\gamma$ ($\gamma > 1$). Set $d_0 = -g_0$ and $k = 0$.
Step 2. Test a criterion for stopping the iterations. If this test is satisfied, then stop; otherwise continue with step 3.
Step 3. Determine the steplength $\alpha_k$ by using the Wolfe line search conditions (4) and (5).
Step 4. Compute $z = x_k + \alpha_k d_k$, $g_z = \nabla f(z)$ and $y_k = g_k - g_z$.
Step 5. Compute $\bar{a}_k = \alpha_k g_z^T d_k$ and $\bar{b}_k = -\alpha_k y_k^T d_k$.
Step 6. Acceleration scheme. If $\bar{b}_k > 0$, then compute $\eta_k = \bar{a}_k / \bar{b}_k$ and update the variables as $x_{k+1} = x_k + \eta_k \alpha_k d_k$; otherwise update the variables as $x_{k+1} = x_k + \alpha_k d_k$.
Step 7. Compute $\omega_k$ as in (27).
Step 8. Compute the search direction as in (28).
Step 9. Powell restart criterion. If $|g_{k+1}^T g_k| > 0.2\|g_{k+1}\|^2$, then set $d_{k+1} = -g_{k+1}$.
Step 10. Set $k = k+1$ and go to step 2.

If the function $f$ is bounded along the direction $d_k$, then there exists a stepsize $\alpha_k$ satisfying the Wolfe line search conditions (see for example [13] or [29]). In our algorithm, when the Beale-Powell restart condition is satisfied, we restart the algorithm with the negative gradient $-g_{k+1}$. More sophisticated reasons for restarting the algorithms have been proposed in the literature [12], but we are interested in the performance of a conjugate gradient algorithm that uses this restart criterion associated with a direction satisfying both the descent and the conjugacy conditions. Under reasonable assumptions, the Wolfe conditions and the Powell restart criterion are sufficient to prove the global convergence of the algorithm. The first trial of the step length crucially affects the practical behavior of the algorithm. At every iteration $k \ge 1$ the starting guess for the step $\alpha_k$ in the line search is computed as $\alpha_{k-1}\|d_{k-1}\| / \|d_k\|$. For uniformly convex functions, we can prove the linear convergence of the acceleration scheme used in the algorithm [1].
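To show how the pieces fit together, here is a compact driver in the spirit of NADCG (a simplified sketch of ours: it relies on SciPy's Wolfe line search rather than the author's implementation, omits the acceleration scheme of Steps 5-6 and the step-length initialization, and all names are hypothetical).

import numpy as np
from scipy.optimize import line_search

def nadcg_sketch(f, grad, x0, gamma=3.0, tol=1e-6, max_iter=500):
    # Simplified loop: direction (28) with omega_k chosen as in (27).
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) <= tol:                 # Step 2: stopping test
            break
        alpha = line_search(f, grad, x, d, gfk=g)[0]         # Step 3: Wolfe line search
        if alpha is None:                                    # line search failed; stop
            break
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        ys = y @ s
        a = (y @ y) * (s @ s) / ys**2                        # a_k from (17)
        omega = (2.0 * np.sqrt(max(min(a, gamma) - 1.0, 0.0))
                 * np.linalg.norm(y) / np.linalg.norm(s))    # omega_k as in (27)
        beta = max((y @ g_new - omega * (s @ g_new)) / ys, 0.0)
        d = -g_new + beta * s - ((s @ g_new) / ys) * y       # Step 8: direction (28)
        if abs(g_new @ g) > 0.2 * (g_new @ g_new):           # Step 9: Powell restart
            d = -g_new
        x, g = x_new, g_new
    return x

# Example: the extended Rosenbrock function from SciPy.
from scipy.optimize import rosen, rosen_der
print(nadcg_sketch(rosen, rosen_der, np.zeros(4)))           # should move toward [1, 1, 1, 1]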



3 Global Convergence Analysis
Assume that:
i. The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.
ii. In a neighborhood $N$ of $S$ the function $f$ is continuously differentiable and its gradient is Lipschitz continuous, i.e., there exists a constant $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in N$.
Under these assumptions on $f$ there exists a constant $\Gamma \ge 0$ such that $\|\nabla f(x)\| \le \Gamma$ for all $x \in S$. For any conjugate gradient method with strong Wolfe line search the following general result holds [26].
Proposition 3.1. Suppose that the above assumptions hold. Consider a conjugate gradient algorithm in which, for all $k \ge 0$, the search direction $d_k$ is a descent direction and the steplength $\alpha_k$ is determined by the Wolfe line search conditions. If

$\sum_{k \ge 0} \dfrac{1}{\|d_k\|^2} = \infty,$   (29)

then the algorithm converges in the sense that

$\liminf_{k \to \infty} \|g_k\| = 0.$   (30)

For uniformly convex functions we can prove that the norm of the direction $d_{k+1}$ computed as in (28) with (27) is bounded above. Therefore, by Proposition 3.1 we can prove the following result.
Theorem 3.1. Suppose that the assumptions (i) and (ii) hold. Consider the algorithm NADCG, where the search direction $d_k$ is given by (28) and $\omega_k$ is computed as in (27). Suppose that $d_k$ is a descent direction and $\alpha_k$ is computed by the strong Wolfe line search. Suppose that $f$ is a uniformly convex function on $S$, i.e., there exists a constant $\mu > 0$ such that

$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \|x - y\|^2$   (31)

for all $x, y \in N$. Then

$\lim_{k \to \infty} \|g_k\| = 0.$   (32)

Proof. From Lipschitz continuity we have $\|y_k\| \le L \|s_k\|$. On the other hand, from uniform convexity it follows that $y_k^T s_k \ge \mu \|s_k\|^2$. Now, from (27),

$\omega_k \le 2\sqrt{\gamma - 1}\,\dfrac{\|y_k\|}{\|s_k\|} \le 2\sqrt{\gamma - 1}\,\dfrac{L\|s_k\|}{\|s_k\|} = 2L\sqrt{\gamma - 1}.$
