
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY

INSTITUTE OF MATHEMATICS

NGUYEN THAI AN

VARIATIONAL ANALYSIS AND
SOME SPECIAL OPTIMIZATION PROBLEMS

Speciality: Applied Mathematics
Speciality code: 62 46 01 12

DISSERTATION
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY IN MATHEMATICS

HANOI - 2016


VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY

INSTITUTE OF MATHEMATICS

Nguyen Thai An

VARIATIONAL ANALYSIS AND
SOME SPECIAL OPTIMIZATION PROBLEMS
Speciality: Applied Mathematics
Speciality code: 62 46 01 12



DISSERTATION
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY IN MATHEMATICS
Supervisors:
1. Prof. Dr. Hab. Nguyen Dong Yen
2. Assoc. Prof. Nguyen Mau Nam

HANOI - 2016




Abstract
This dissertation uses tools from variational analysis and optimization theory to study some complex facility location problems involving distances to
sets. In contrast to the existing facility location models where the locations
are of negligible sizes, represented by points, the new approach allows us to
deal with facility location problems where the locations are of non-negligible
sizes, now represented by sets. Our efforts focus not only on studying theoretical aspects but also on developing effective algorithms for solving these
problems. In addition, we introduce an algorithm for minimizing a difference of functions.
Our main results include:
- Algorithms based on Nesterov’s smoothing technique and the majorization-minimization principle for solving new models of the Fermat-Torricelli problem.
- Theoretical properties as well as an algorithm based on the log-exponential
smoothing technique and Nesterov’s accelerated gradient method for the
smallest intersecting ball problem.

- Solution existence together with an algorithm based on the DC algorithm
and the Weiszfeld algorithm for a nonconvex facility location problem.
- Convergence analysis of a generalized proximal point algorithm for minimizing the difference of a nonconvex function and a convex function.



Confirmation
This dissertation was written on the basis of my research work carried
out at the Institute of Mathematics, Vietnam Academy of Science and Technology, under the guidance of Prof. Nguyen Dong Yen and Assoc. Prof.
Nguyen Mau Nam. All results presented in this dissertation have never been
published by others.
Hanoi, August 2016
The author

Nguyen Thai An



Acknowledgment
I would like to express my sincere gratitude to Prof. Nguyen Dong Yen
and Assoc. Prof. Nguyen Mau Nam for their guidance and support. I thank
them for always being there for me and providing many relevant suggestions
and worthy opinions through all stages of this dissertation. I am grateful
for being able to participate in their research groups where I have had the
pleasure of working with many active and accomplished researchers.
I would like to thank the Board of Directors and the research staff of the
Institute of Mathematics, Vietnam Academy of Science and Technology, for
providing me with a wonderful scientific environment.

I am also grateful to Prof. Hoang Xuan Phu, Assoc. Prof. Ta Duy Phuong,
Assoc. Prof. Phan Thanh An and all members of the Weekly Seminar at
the Department of Numerical Analysis and Scientific Computing, Institute
of Mathematics, for their valuable discussions.
Financial support from the Vietnam National Foundation for Science and
Technology Development (NAFOSTED), the Vietnam Institute for Advanced
Study in Mathematics (VIASM), and Thua Thien Hue College of Education
is gratefully acknowledged.
My deepest gratitude goes to my parents, my sisters and brothers, for
their support and continuing encouragement. I want to thank my loving
wife, who has always believed in me as I pursue my dreams. I want to thank her for
her sacrifices and support during the past three years. Finally, I would like
to thank and dedicate this dissertation to my little daughter, Bao Nguyen,
who is the greatest inspiration of my life.



Contents
Table of Notations . . . vii

List of Figures . . . viii

Introduction . . . ix

Chapter 1. Preliminaries . . . 1
1.1 Tools of Convex Analysis . . . 1
1.2 Majorization-Minimization Principle . . . 4
1.3 Nesterov’s Accelerated Gradient Method . . . 5
1.4 Nesterov’s Smoothing Technique . . . 7
1.5 DC Programming and DC Algorithm . . . 9
1.6 Conclusions . . . 10

Chapter 2. Effective Algorithms for Solving Generalized Fermat-Torricelli Problems . . . 11
2.1 Generalized Fermat-Torricelli Problems . . . 11
2.2 Nesterov’s Smoothing Technique and a General Form of the Majorization-Minimization Principle . . . 13
2.3 Problems Involving Points . . . 17
2.4 Problems Involving Sets . . . 21
2.5 Numerical Examples . . . 32
2.6 Conclusions . . . 34

Chapter 3. The Smallest Intersecting Ball Problem . . . 36
3.1 Problem Formulation and Theoretical Aspects . . . 36
3.2 A Smoothing Technique for the Smallest Intersecting Ball Problem . . . 47
3.3 A Majorization-Minimization Algorithm for the Smallest Intersecting Ball Problem . . . 53
3.4 Numerical Implementation . . . 59
3.5 Conclusions . . . 62

Chapter 4. A Nonconvex Location Problem Involving Sets . . . 64
4.1 Problem Formulation . . . 64
4.2 Solution Existence in the General Case . . . 66
4.3 Solution Existence in a Special Case . . . 73
4.4 A Combination of DCA and Generalized Weiszfeld Algorithm . . . 79
4.5 Conclusions . . . 85

Chapter 5. Convergence Analysis of a Proximal Point Algorithm for Minimizing a Difference of Functions . . . 87
5.1 The Kurdyka-Lojasiewicz Property . . . 87
5.2 A Generalized Proximal Point Algorithm for Minimizing a Difference of Functions . . . 91
5.3 Examples . . . 102
5.4 Conclusions . . . 106

General Conclusions . . . 107

List of Author’s Related Papers . . . 108

References . . . 109

Appendix A . . . 117

Index . . . 127


Table of Notations
IN := {0, 1, 2, . . .}        set of natural numbers
∅                             empty set
IR                            set of real numbers
IRn                           n-dimensional Euclidean vector space
(a, b)                        set of x ∈ IR with a < x < b
[a, b]                        set of x ∈ IR with a ≤ x ≤ b
median{α, β, γ}               the middle number of the list obtained after sorting α, β, γ from the smallest to the largest
⟨x, y⟩                        canonical inner product
|x|                           absolute value of x ∈ IR
‖x‖                           Euclidean norm of a vector x
I                             n × n unit matrix
A^T                           transposed matrix of a matrix A
bd Ω                          topological boundary of Ω
co Ω                          convex hull of Ω
cone Ω                        cone generated by Ω
d(x; Ω) or dist(x; Ω)         distance from x to Ω
P(x; Ω)                       Euclidean projection from x onto Ω
N(x; Ω)                       normal cone to Ω at x ∈ Ω
{x^k}                         sequence of vectors
x^k → x                       x^k converges to x in norm topology
liminf_{k→+∞} α_k             lower limit of a sequence {α_k} ⊂ IR
limsup_{k→+∞} α_k             upper limit of a sequence {α_k} ⊂ IR
δ(·; Ω)                       indicator function of Ω
f : IRn → IR ∪ {+∞}           extended-real-valued function
dom f                         effective domain of f
f*                            Fenchel conjugate function of f
∂f(x)                         subdifferential of f at x in the sense of convex analysis
∂^F f(x)                      Fréchet subdifferential of f at x
∂^L f(x)                      limiting subdifferential of f at x


List of Figures
2.1 MM algorithm for a generalized Fermat-Torricelli problem . . . 25
2.2 Generalized Fermat-Torricelli problems with different norms . . . 32
2.3 A generalized Fermat-Torricelli problem with US Cities . . . 33
2.4 A generalized Fermat-Torricelli problem with MM method . . . 34
3.1 A smallest intersecting ball problem for three balls in IR2 . . . 44
3.2 A smallest intersecting ball problem for disks in IR2 . . . 60
3.3 A smallest intersecting ball problem for cubes in IR3 . . . 61
3.4 Comparison between Algorithm 4, a subgradient algorithm, and a BFGS algorithm . . . 62
4.1 The surface and contour lines of the objective function . . . 65
4.2 A generalized Fermat-Torricelli problem with US Cities . . . 85


Introduction
Optimization techniques usually require differentiability of the function involved, while nondifferentiable structures appear frequently and naturally in
many mathematical models. Motivated by applications to optimization problems with nondifferentiable data, nonsmooth/variational analysis has been
developed to study generalized differentiability properties of functions and
set-valued mappings without imposing the smoothness of the data.
Starting with convex analysis, nonsmooth analysis has grown to be an
important theory with numerous applications. Before the 1960s, convexity appeared mostly in geometric forms. The geometry of convex sets had
been studied by prominent mathematicians: H. Minkowski (1864-1909), W.
Fenchel (1905-1988), and others. In the early 1960s, the development of convex analysis was mainly due to the pioneering works of J. J. Moreau (1923-2014) and R. T. Rockafellar (born 1935). Convex analysis is a branch of
mathematics devoted to studying convex sets and the continuity and generalized differentiability properties of convex functions. Convex analysis provides the mathematical foundation for convex optimization, a field with increasing impact on
optimal control, automatic control, signal processing, communications and
networks, electronic circuit design, data analysis, statistics, economics, finance, and other fields.

The successes of convex analysis urged people to look for a new theory
to deal with broader classes of sets and functions in which convexity is not
a major assumption. It was F. H. Clarke, a student of R. T. Rockafellar,
who initiated a generalized differentiation theory for locally Lipschitz functions. From 1973 until now, Clarke’s theory has attracted great attention
worldwide and has found significant applications. In 1976, using a dual approach, B. S. Mordukhovich started to develop a robust nonconvex generalized differentiation theory for not only extended-real-valued functions but
also for set-valued mappings. The limiting/Mordukhovich subdifferential,
defined not only for locally Lipschitz functions but also for lower semicontinuous functions, is smaller than Clarke’s counterpart in general and hence it
is more effective for applications, especially those to optimization and equilibrium problems. In spite of the nonconvexity of the limiting generalized
differentiation constructions, they possess well-developed calculus rules that
are comprehensive in many important classes of Banach spaces including the
reflexive ones.
Facility location, also known as location analysis, is a branch of operations research and computational geometry concerned with mathematical models and solution methods for problems of finding the right sites
for a set of facilities in a given space in order to supply some service to
a set of demands/customers. There are four fundamental ingredients in
each location problem: demands/customers, who are supposed to be already
present to receive service, facilities that will be located, a space in which
demands/customers are present and facilities are located, and a metric that
measures distances between customers and facilities. Facility location models include locating desirable sites such as supermarkets, schools, warehouses,
etc., to minimize the average travel time for nearby residents; locating obnoxious
materials that have undesirable effects on people or the environment such as
nuclear power stations, sewage plants, etc., to maximize their distances from
the public; locating antenna stations to transfer signals effectively; locating
automatic teller machines to serve bank customers better, etc. Depending
on specific applications, location models are very different in their objective
functions, the distance metrics applied, the number and size of the facilities
to locate; see, e.g., [29, 33] and the references therein.
The origin of location theory can be traced back as far as to the seventeenth

century when the French mathematician P. de Fermat (1601-1665) formulated
the problem of finding a fourth point such that the sum of its distances to the
three given points in the plane is minimal. This celebrated problem was then
solved by E. Torricelli (1608-1647), an Italian mathematician and physicist
who is mostly known for inventing the barometer. Torricelli’s solution is
stated as follows: If none of the interior angles of the triangle formed by the
three fixed points reaches or exceeds 120◦ , the minimizing point in question is
located inside this triangle in such a way that each side of the triangle is seen
at an angle of 120◦; otherwise it is the vertex whose angle is at least 120◦. This
point is often called the Torricelli point. At the beginning of the twentieth
century, the German economist A. Weber incorporated weights, and was able
to treat facility location problems with more than 3 points as follows
min { ∑_{i=1}^{m} α_i ‖x − a_i‖ : x ∈ IRn },

where αi > 0 for i = 1, . . . , m are given weights and the vectors ai ∈ IRn for
i = 1, . . . , m are given demand points. This problem was subsequently called
the Fermat-Weber problem. Other names for the problem are the Weber
problem, the Fermat-Torricelli problem, and other variants. This practical
problem has been the inspiration for many new problems in the fields of
computational geometry, logistics, and location science; see, e.g., [48, 53].
The first numerical algorithm for solving the Fermat-Torricelli problem was
introduced by the Hungarian mathematician E. Weiszfeld [96]. As pointed

out by H. W. Kuhn [45], the Weiszfeld algorithm may fail to converge when
the iterative sequence enters the set of demand points. The assumptions guaranteeing the convergence of the Weiszfeld algorithm along with a proof of the
convergence theorem were given in [45]. Generalized versions of the FermatTorricelli problem and several new algorithms have been introduced to solve
generalized Fermat-Torricelli problems as well as to improve the Weiszfeld
algorithm; see, e.g., [18, 32, 46, 61, 91, 94, 95]. The Fermat-Torricelli problem has also been revisited several times from different viewpoints; see, e.g.,
[17, 26, 28].
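To make the classical iteration concrete, the sketch below gives one common form of the Weiszfeld update for the weighted problem. It is our own illustration, not code from the dissertation or its appendix; the function name, arguments, and stopping rule are arbitrary choices, and the sketch simply stops if an iterate lands on a demand point, where the update is undefined (Kuhn's modification for that case is omitted).

```matlab
function x = weiszfeld_sketch(A, alpha, x0, maxit, tol)
% Weiszfeld iteration for min sum_i alpha_i*||x - a_i|| over x in IR^n (a sketch).
% A     : n-by-m matrix whose columns are the demand points a_i
% alpha : 1-by-m vector of positive weights
% x0    : n-by-1 starting point (not equal to any demand point)
x = x0;
m = size(A, 2);
for k = 1:maxit
    d = sqrt(sum((A - repmat(x, 1, m)).^2, 1));   % distances ||x - a_i||
    if any(d < 1e-12)
        break;                 % iterate hit a demand point; update undefined
    end
    w = alpha ./ d;            % Weiszfeld weights alpha_i / ||x - a_i||
    xnew = (A * w') / sum(w);  % weighted average of the demand points
    if norm(xnew - x) <= tol
        x = xnew;
        return;
    end
    x = xnew;
end
end
```

For instance, weiszfeld_sketch([0 4 0; 0 0 3], [1 1 1], [1; 1], 1000, 1e-10) approximates the Torricelli point of the triangle with vertices (0, 0), (4, 0), and (0, 3).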
The Fermat-Torricelli/Weber problem on the plane with some negative
weights was first introduced and solved in the triangle case by L.-N. Tellier
in 1985 and then generalized by Z. Drezner and G. O. Wesolowsky in [30]
with the following formulation in IR2 :
min { ∑_{i=1}^{p} α_i ‖x − a_i‖ − ∑_{j=1}^{q} β_j ‖x − b_j‖ : x ∈ IR2 },    (1)

where αi for i = 1, . . . , p and βj for j = 1, . . . , q are positive numbers; the
vectors ai ∈ IR2 for i = 1, . . . , p and bj ∈ IR2 for j = 1, . . . , q are given
demand points. According to Z. Drezner and G. O. Wesolowsky, a negative
weight for a demand point means that the cost is increased as the facility
approaches that demand point. One can view demand points as attracting

or repelling the facility, and the optimal location as the one that balances
the forces. Since the problem is nonconvex in general, traditional solution
methods of convex optimization widely used in the previous convex versions of
the Fermat-Torricelli problem, are no longer applicable to this case. Sufficient
conditions for solution existence, some properties of optimal solutions as well
as a result that limits the region of the plane where the optimal points of
(1) can be located are provided in [30]. The first numerical algorithm for
solving this nonconvex problem which is based on the outer-approximation
procedure from global optimization was given by P.-C. Chen, P. Hansen, B.
Jaumard, and H. Tuy in [19].
The smallest enclosing circle problem can be stated as follows: Given a
finite set of points in the plane, find the circle of smallest radius that encloses
all of the points. It was introduced in the nineteenth century by the English
mathematician J. J. Sylvester (1814–1897) in [90]. The mathematical model
of the problem in high dimensions can be formulated as follows
min { max_{1≤i≤m} ‖x − a_i‖ : x ∈ IRn },    (2)

where ai ∈ IRn for i = 1, . . . , m are given points. Problem (2) is both a
facility location problem and a major problem in computational geometry.
The smallest enclosing circle problem and its versions in higher dimensions are
also known under other names such as the smallest enclosing ball problem,

the minimum ball problem, or the bomb problem. Over a century later,
research on the smallest enclosing circle problem remains very active due
to its important applications to clustering, nearest neighbor search, data
classification, facility location, collision detection, computer graphics, and
military operations. The problem has been widely treated in the literature
from both theoretical and numerical standpoints; see [1, 20, 27, 34, 37, 73,
85, 97, 101, 104] and the references therein.
When dealing with a nonconvex programming problem, the most important property of convex problems, namely that local solutions
are also global ones, no longer holds true. Therefore, it is natural that solution
methods for nonconvex problems have to take into account the form of the
models. Progress beyond convexity was made by considering the class
of functions representable as a difference of convex functions. A pioneer in
this research direction is P. D. Tao who introduced a simple algorithm called
the DCA based on generalized differentiation of the functions involved as
well as their Fenchel conjugates [80]. Over the past three decades, P. D. Tao,
L. T. H. An and many others have contributed to providing mathematical
foundation for the algorithm and making it accessible for applications. The
DCA has nowadays become a classical tool in the field of optimization due to
several key features including simplicity, inexpensiveness, flexibility and efficiency; see [63, 76, 77, 78].
The proximal point algorithm (PPA for short) was suggested by Martinet
[52] for solving convex optimization problems and was extensively developed
by Rockafellar [82] in the context of monotone variational inequalities. The
main idea of this method consists of replacing the initial problem with a
sequence of regularized problems, so that each particular auxiliary problem
can be solved by one of the well-known algorithms. Along with the DCA, a
number of proximal point algorithms have been proposed in [12, 62, 88, 89]
to minimize differences of convex functions. Although convergence results
for the DCA and the proximal point algorithms for minimizing differences
of convex functions have been addressed in some recent research, it is still
an open research question to study the convergence analysis of algorithms

for minimizing a difference of functions in which only the second function
involved is required to be convex.


In this dissertation, we use tools from nonsmooth analysis and optimization
theory to study some complex facility location problems involving distances to
sets in a finite dimensional space. In contrast to the existing facility location
models where the locations are of negligible sizes, represented by points, the
approach adopted in this dissertation allows us to deal with facility location
problems where the locations are of non-negligible sizes, now represented
by sets. Because of the intrinsic nondifferentiability of the problems under
consideration, they can be used as test problems for nonsmooth optimization
algorithms. Our efforts focus not only on studying theoretical aspects but
also on developing effective solution methods for these problems.
The dissertation has five chapters, a list of references, and an appendix
containing MATLAB codes of some numerical examples.
Chapter 1 collects several concepts and results from convex analysis and
DC programming that are useful for subsequent studies. We also describe
briefly the majorization-minimization principle, Nesterov’s accelerated gradient method and smoothing technique, as well as P. D. Tao and L. T. H.
An’s DC algorithm.
Chapter 2 is devoted to numerically solving a number of new models of facility location which generalize the classical Fermat-Torricelli problem. Convergence of the proposed algorithms is proved and numerical tests are presented.
Chapter 3 studies a generalized version of problem (2) from both theoretical
and numerical viewpoints. Sufficient conditions guaranteeing the existence
and uniqueness of solutions, optimality conditions, constructions of the solutions in special cases are addressed. We also propose an algorithm based on
the log-exponential smoothing technique and Nesterov’s accelerated gradient
method for solving the problem under consideration.
Chapter 4 is dedicated to studying a nonconvex facility location problem
that is a generalization of problem (1). After establishing some theoretical
properties, we propose an algorithm by combining the DC algorithm and the

Weiszfeld algorithm for solving the problem.
Chapter 5 is totally different from the preceding parts of the dissertation.
Motivated by the methods developed recently in [5, 6, 14, 79], we introduce
a generalized proximal point algorithm for solving optimization problems in
which the objective functions can be represented as differences of nonconvex
and convex functions. Convergence of this algorithm under the main assumption that the objective function satisfies the Kurdyka-Lojasiewicz property is
established.
The dissertation is written on the basis of one paper [64] in SIAM Journal
on Optimization, two papers [4, 65] in Journal of Convex Analysis, one paper
[2] in Journal of Optimization Theory and Applications, and one preprint [3]
which has been submitted.
The results of this dissertation were presented at the International Workshop
on Some Selected Problems in Optimization and Control Theory (February 4-7, 2015, VIASM, Hanoi), the 11th Workshop on Optimization and Scientific
Computing (April 24-27, 2013, Ba Vi, Hanoi), and at the weekly seminar of
the Department of Numerical Analysis and Scientific Computing, Institute
of Mathematics, Vietnam Academy of Science and Technology.



Chapter 1

Preliminaries
Several concepts and results from convex analysis and DC programming
are recalled in this chapter. As a preparation for the investigations in Chapters 2–5, we also describe the majorization-minimization principle, Nesterov’s
accelerated gradient method and smoothing technique, as well as the DC algorithm.
The concepts and results discussed in this chapter can be found in [69, 71, 77, 78, 81, 83].

1.1 Tools of Convex Analysis

We use IRn to denote the n-dimensional Euclidean space, ⟨·, ·⟩ to denote
the inner product, and ‖·‖ to denote the associated Euclidean norm. For an
extended-real-valued function f : IRn → IR ∪ {+∞}, the domain of f is the
set
domf := {x ∈ IRn : f (x) < +∞}.
The function f is said to be proper if its domain is nonempty. One says that
f is convex if
f ((1 − λ)x + λy) ≤ (1 − λ)f (x) + λf (y),
for all x, y ∈ IRn and λ ∈ (0, 1). If the above inequality is strict whenever
x ≠ y, then f is said to be strictly convex. We say that a function f is
strongly convex with modulus σ if f − (σ/2)‖·‖² is a convex function. A strongly
convex function is also strictly convex, but not vice versa.
The subdifferential in the sense of convex analysis of a convex function f
at x̄ ∈ dom f is defined by

∂f(x̄) := {v ∈ IRn : ⟨v, x − x̄⟩ ≤ f(x) − f(x̄) ∀x ∈ IRn}.


For a nonempty closed convex subset Ω of IRn and a point x̄ ∈ Ω, the normal cone to Ω at x̄ is the set

N(x̄; Ω) := {v ∈ IRn : ⟨v, x − x̄⟩ ≤ 0 ∀x ∈ Ω}.

This normal cone is the subdifferential of the indicator function

δ(x; Ω) = 0 if x ∈ Ω,  and  δ(x; Ω) = +∞ if x ∉ Ω,

at x̄, i.e., N(x̄; Ω) = ∂δ(x̄; Ω). It follows from the definition that the normal cone mapping N(·; Ω) : Ω ⇒ IRn has closed graph in the sense that for any sequences x^k → x̄ and v^k → v̄ where v^k ∈ N(x^k; Ω), one has that v̄ ∈ N(x̄; Ω).
The distance function to a nonempty set Ω is defined by
d(x; Ω) := inf{ ‖x − ω‖ : ω ∈ Ω },  x ∈ IRn.    (1.1)

According to [24, Proposition 2.4.1], the distance function d(·; Ω) is Lipschitz continuous on IRn with Lipschitz constant 1, i.e.,

|d(x; Ω) − d(y; Ω)| ≤ ‖x − y‖    for all x, y ∈ IRn.

If Ω is a nonempty closed convex set in IRn , then the distance function d(·; Ω)
is convex; see, e.g., [84, Example 2.55].
The Euclidean projection from x̄ to Ω is the set

P(x̄; Ω) := {w̄ ∈ Ω : d(x̄; Ω) = ‖x̄ − w̄‖}.

If Ω is a nonempty closed convex set, then the Euclidean projection P(x; Ω) is a singleton for every x ∈ IRn. Furthermore, the projection operator P(·; Ω) : IRn → Ω is nonexpansive:

‖P(x; Ω) − P(y; Ω)‖ ≤ ‖x − y‖    for all x, y ∈ IRn.

The nonexpansiveness clearly implies that P(·; Ω) is continuous. Explicit formulae for the projection operator P(x; Ω) exist when Ω is a box, a Euclidean ball, a hyperplane, or a half-space. Fast algorithms for computing P(x; Ω) exist for the unit simplex and the ℓ1 ball; see [31].
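As a small illustration (our own sketch with arbitrary test data, not taken from the references), two of these closed-form projections read as follows.

```matlab
% Projection onto the box {x : lo <= x <= hi} and onto the Euclidean
% ball B(c, r); both formulas are standard closed forms (a sketch only).
x  = [3; -2; 0.5];
lo = -ones(3, 1);  hi = ones(3, 1);
p_box = min(max(x, lo), hi);                 % componentwise clipping

c = zeros(3, 1);  r = 1;                     % center and radius of the ball
if norm(x - c) <= r
    p_ball = x;                              % x already lies in the ball
else
    p_ball = c + r * (x - c) / norm(x - c);  % radial projection onto the sphere
end
```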
The projection operator and the distance function to a closed convex set
Ω are related through the identity
∇d²(x; Ω) = 2[x − P(x; Ω)]    ∀x ∈ IRn.    (1.2)

A standard proof of this fact can be found in [38, p. 186]. If d²(x; Ω) > 0, i.e., x ∉ Ω, then it follows from the chain rule that

∇d(x; Ω) = ∇√(d²(x; Ω)) = (x − P(x; Ω)) / d(x; Ω).


One has ∇d(x; Ω) = 0 if x is an interior point of Ω. However, the differentiability of d(·; Ω) at boundary points of Ω is not guaranteed. By [38, p. 259], the subdifferential of the distance function (1.1) at x̄ can be computed by the formula

∂d(x̄; Ω) = N(x̄; Ω) ∩ IB  if x̄ ∈ Ω,   ∂d(x̄; Ω) = { (x̄ − P(x̄; Ω)) / d(x̄; Ω) }  if x̄ ∉ Ω,    (1.3)
where IB denotes the Euclidean closed unit ball of IRn .
It is well-known that a convex function f : IRn → IR ∪ {+∞} has a global minimum on a convex set Ω at x̄ if and only if it has a local minimum on Ω at x̄. Furthermore, under the assumption that f is continuous at one point belonging to Ω, the following generalized version of the Fermat rule holds: x̄ ∈ Ω is a minimizer of f on Ω if and only if

0 ∈ ∂f(x̄) + N(x̄; Ω);    (1.4)

see, e.g., [84, Theorem 3.33]. If, in addition, f is differentiable on IRn, then (1.4) becomes

⟨∇f(x̄), x − x̄⟩ ≥ 0    ∀x ∈ Ω.    (1.5)

For a finite number of convex functions fi : IRn → IR, i = 1, . . . , m, one has
∂( ∑_{i=1}^{m} f_i )(x) = ∑_{i=1}^{m} ∂f_i(x),  x ∈ IRn.

We continue by recalling some basic facts concerning the conjugate functions.
The Fenchel conjugate of a convex function f : IRn → IR ∪ {+∞} is defined
by
f*(v) := sup{ ⟨v, x⟩ − f(x) : x ∈ IRn },  v ∈ IRn.
By [42, Proposition 3, p. 174], if f is proper and lower semicontinuous, then

f ∗ : IRn → IR ∪{+∞} is also a proper, lower semicontinuous convex function.
Proposition 1.1 (Properties of the Fenchel conjugates; see [42]) Let f :
IRn → IR ∪ {+∞} be a convex function.
(i) Given any x ∈ dom f , one has that v ∈ ∂f (x) if and only if
f(x) + f*(v) = ⟨v, x⟩.
(ii) If f is proper and lower semicontinuous, then for any x ∈ dom f one has
that v ∈ ∂f (x) if and only if x ∈ ∂f ∗ (v).
(iii) If f is proper and lower semicontinuous, then (f ∗ )∗ = f .
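As a standard illustration (our own example, not reproduced from [42]): for f(x) = (1/2)‖x‖² the supremum defining f*(v) is attained at x = v, so f*(v) = (1/2)‖v‖² and f coincides with its own conjugate; for f(x) = ‖x‖ one finds f*(v) = 0 if ‖v‖ ≤ 1 and f*(v) = +∞ otherwise, i.e., f* = δ(·; IB) with IB the closed unit ball.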
We refer to [16, 38, 57, 84] for a more complete theory of convex analysis
and applications to optimization from both theoretical and numerical aspects.


1.2 Majorization-Minimization Principle

First proposed by J. M. Ortega and W. C. Rheinboldt [75, Section 8.3], the
majorization-minimization (MM) principle currently has many applications
not only to computational statistics [11, 50] but also to imaging sciences
[23]. This principle has been an inspiration for many iterative methods in
optimization; see, e.g., [10, 22, 50, 51].
The basic idea of the MM principle is to convert a hard optimization problem
(for example, a non-differentiable problem) into a sequence of simpler ones
(for example, smooth problems). The objective function f : IRn → IR is said
to be majorized by a surrogate function M : IRn × IRn → IR on Ω if
f (x) ≤ M (x, y) and f (y) = M(y, y) for all x, y ∈ Ω.
Given x0 ∈ Ω, the iterates of the associated MM algorithm for minimizing f
on Ω are defined by the rule
x^{k+1} ∈ argmin_{x∈Ω} M(x, x^k),    (1.6)

where argmin_{x∈Ω} M(x, x^k) denotes the solution set of the problem min{M(x, x^k) : x ∈ Ω}. Because

f(x^{k+1}) ≤ M(x^{k+1}, x^k) ≤ M(x^k, x^k) = f(x^k),    (1.7)

the MM iterates generate a descent algorithm driving the objective function
downhill. Under appropriate regularity conditions, the iterative sequence
{xk } converges to a local minimum of the original problem; see [49].
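A generic MM loop following (1.6) can be sketched as below. This is our own illustration, not the dissertation's code; the handle surrogate_argmin, which is assumed to return a minimizer of M(·, x^k) over Ω, is an assumption of the sketch rather than notation from the text.

```matlab
function [x, fvals] = mm_sketch(f, surrogate_argmin, x0, maxit, tol)
% Generic majorization-minimization loop implementing the rule (1.6).
% f                : function handle for the objective
% surrogate_argmin : handle returning a minimizer of M(., xk) over Omega
x = x0;
fvals = f(x);
for k = 1:maxit
    xnew = surrogate_argmin(x);   % minimize the majorant built at x
    fvals(end+1) = f(xnew);       % objective values decrease by (1.7)
    if norm(xnew - x) <= tol
        x = xnew;
        break;
    end
    x = xnew;
end
end
```

For the weighted Fermat-Torricelli objective, one standard surrogate is M(x, x^k) = ∑_i α_i ‖x − a_i‖² / (2‖x^k − a_i‖) + constant, valid as long as no iterate hits a demand point; minimizing it in x recovers the classical Weiszfeld update.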
The convergence theory of MM algorithms relies heavily on the properties
of the algorithm map
ψ(x) := argmin_{y∈Ω} M(y, x).

The following simple version of Meyer’s monotone convergence theorem [54]
will be used to prove convergence results in our setting.
Proposition 1.2 (See [22, Proposition 1]) Let f (x) be a real-valued continuous function on a domain Ω and ψ : Ω → Ω be a single-valued continuous
algorithm map satisfying f(ψ(x)) < f(x) for all x ∈ Ω with ψ(x) ≠ x. Suppose for some initial point x^0 that the set

L_f(x^0) := {x ∈ Ω : f(x) ≤ f(x^0)}

is compact. Then
(a) lim_{k→+∞} ‖x^{k+1} − x^k‖ = 0,
(b) all cluster points of {x^k} are fixed points of ψ,
(c) {x^k} converges to one of the fixed points if the latter are finite in number.
J. Mairal [51] showed that, if some additional assumptions on surrogate
functions of f are satisfied, then the convergence rate of MM sequences can
be determined. A function g : IRn → IR is said to be a first-order (majorizing)
surrogate function of f near z ∈ Ω if g(x) ≥ f (x) for all x ∈ Ω, h = g − f is
differentiable with L - Lipschitz gradient, h(z) = 0, and ∇h(z) = 0. Here L
is a positive constant such that
‖∇h(x) − ∇h(y)‖ ≤ L‖x − y‖    for all x, y ∈ IRn.

Denote by SL (f, z) the set of first-order surrogate functions g of f near
z with ∇(g − f ) being L - Lipschitz. The subset of SL (f, z) containing all
ρ-strongly convex functions is abbreviated to SL, ρ (f, z).
Suppose that the objective function f is majorized by a surrogate function
M. For a given xk ∈ Ω, let gk+1 (x) = M(x, xk ) and hk+1 (x) := gk+1 (x)−f (x).
Let {xk } be the sequence generated by the MM algorithm:
x^{k+1} ∈ argmin_{x∈Ω} M(x, x^k) = argmin_{x∈Ω} g_{k+1}(x).

If gk+1 is strongly convex, then the vector xk+1 is uniquely defined.
Proposition 1.3 (See [51, Proposition 2.8]) Assume that f is convex, bounded
from below and x∗ is a minimizer of f on Ω. If gk+1 ∈ SL, ρ (f, xk ) for all k

and ρ ≥ L, then

f(x^k) − V* ≤ L‖x^0 − x*‖² / (2k)    for all k ∈ IN \ {0},

where V* = f(x*) is the optimal value. If, in addition, f is µ-strongly convex, then we have

f(x^k) − V* ≤ (L/(ρ + µ))^{k−1} · L‖x^0 − x*‖² / 2    for all k ∈ IN \ {0}.

1.3 Nesterov’s Accelerated Gradient Method

In his seminal papers [69, 71], Yu. Nesterov introduced a fast first-order
method for solving convex smooth optimization problems in which the cost
functions have Lipschitz gradients. In contrast to the convergence rate of
O(1/k) when applying the classical gradient method to this class of problems,
Nesterov’s accelerated gradient method gives a convergence rate of O(1/k²).


Let f : IRn → IR be a smooth convex function with Lipschitz gradient.
That is, there exists ℓ ≥ 0 such that

‖∇f(x) − ∇f(y)‖ ≤ ℓ‖x − y‖    for all x, y ∈ IRn.

Let Ω be a nonempty closed convex set. In [69, 71], Nesterov considered the optimization problem

min{ f(x) : x ∈ Ω }.    (1.8)

For x ∈ IRn, one defines

Ψ_Ω(x) := argmin{ ⟨∇f(x), y − x⟩ + (ℓ/2)‖x − y‖² : y ∈ Ω }.

Let d be a continuous and strongly convex function on Ω with modulus σ > 0.
The function d is called a prox-function of the set Ω. Since d is a strongly
convex function on Ω, it has a unique minimizer on this set. Denote
x0 = argmin{d(x) : x ∈ Ω}.
Without loss of generality, we assume that d(x0 ) = 0. Then Nesterov’s accelerated gradient algorithm for solving (1.8) is outlined as follows.
Algorithm 1.
INPUT: f, ℓ, x^0 ∈ Ω
set k = 0
repeat
    find y^k := Ψ_Ω(x^k)
    find z^k := argmin{ (ℓ/σ) d(x) + ∑_{i=0}^{k} ((i+1)/2) [f(x^i) + ⟨∇f(x^i), x − x^i⟩] : x ∈ Ω }
    set x^{k+1} := (2/(k+3)) z^k + ((k+1)/(k+3)) y^k
    set k := k + 1
until a stopping criterion is satisfied.
OUTPUT: y^k.


In our computational experiments, we often choose d(x) = (1/2)‖x − x^0‖², where x^0 ∈ Ω and σ = 1. In this case, it is not hard to see that

y^k = Ψ_Ω(x^k) = P( x^k − (1/ℓ) ∇f(x^k); Ω )

and

z^k = P( x^0 − (1/ℓ) ∑_{i=0}^{k} ((i+1)/2) ∇f(x^i); Ω ).
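With this choice of prox-function, Algorithm 1 can be sketched as follows. This is an illustrative implementation of ours, not the dissertation's code; gradf and proj are assumed handles for ∇f and P(·; Ω), and the stopping rule is just an iteration cap.

```matlab
function y = nesterov_ag_sketch(gradf, proj, ell, x0, maxit)
% Algorithm 1 with d(x) = 0.5*||x - x0||^2 and sigma = 1 (a sketch).
% gradf : handle for the gradient of f     proj : handle for P(.; Omega)
% ell   : Lipschitz constant of gradf      x0   : starting point in Omega
x = x0;
g_accum = zeros(size(x0));           % running sum of ((i+1)/2)*grad f(x^i)
for k = 0:maxit-1
    g = gradf(x);
    y = proj(x - g / ell);                        % y^k = Psi_Omega(x^k)
    g_accum = g_accum + (k + 1) / 2 * g;          % weights (i+1)/2, i = 0..k
    z = proj(x0 - g_accum / ell);                 % z^k
    x = 2 / (k + 3) * z + (k + 1) / (k + 3) * y;  % x^{k+1}
end
end
```

For example, minimizing f(x) = (1/2)‖x − b‖² over the box [0, 1]^n can be run as nesterov_ag_sketch(@(x) x - b, @(x) min(max(x, 0), 1), 1, zeros(size(b)), 200).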

Theorem 1.1 (See [71, Theorem 2]) Consider the sequences {xk } and {y k }
generated by Algorithm 1. Then, for any k ≥ 0, we have
((k + 1)(k + 2)/4) f(y^k) ≤ min_{x∈Ω} { (ℓ/σ) d(x) + ∑_{i=0}^{k} ((i+1)/2) [f(x^i) + ⟨∇f(x^i), x − x^i⟩] }.

Therefore,

f(y^k) − f(x*) ≤ 4ℓ d(x*) / (σ(k + 1)(k + 2)),

where x∗ is an optimal solution of (1.8).

1.4 Nesterov’s Smoothing Technique

Let Ω be a nonempty closed convex subset of IRn and let Q be a nonempty
compact convex subset of IRm . Consider the constrained optimization problem (1.8) in which f : IRn → IR is a convex function of the type
f(x) := max{ ⟨Ax, u⟩ − φ(u) : u ∈ Q },  x ∈ IRn,    (1.9)

where A is an m × n matrix and φ is a continuous convex function on Q.
The main difficulty in solving (1.8) in this case arises from the nondifferentiability of the objective function. In [71], Nesterov made use of the
special structure of f to approximate it by a function with Lipschitz continuous gradient and then applied his accelerated gradient method to minimize
the smooth approximation. To this end, let d1 be a prox-function of Q with
modulus σ1 > 0. Denote by
ū := argmin{d1(u) : u ∈ Q},

the unique minimizer of d1 on Q. Assume that d1(ū) = 0. In the sequel, we will work mainly with d1(u) = (1/2)‖u − ū‖², where ū ∈ Q. Let µ be a positive number called a smoothing parameter. Define

fµ(x) := max{ ⟨Ax, u⟩ − φ(u) − µ d1(u) : u ∈ Q }.    (1.10)

Since d1 (u) is strongly convex, problem (1.10) has a unique solution. For an
m × n matrix A = (aij ), the norm of A is defined by
‖A‖ := max{ ‖Ax‖ : ‖x‖ ≤ 1 }.    (1.11)

The definition gives us

‖Ax‖ ≤ ‖A‖ ‖x‖    ∀x ∈ IRn.

The following statement is a simplified version of [71, Theorem 1].
Theorem 1.2 (See [71, Theorem 1]) The function fµ in (1.10) is well defined
and continuously differentiable on IRn . The gradient of the function is
∇fµ(x) = A^T uµ(x),

where uµ(x) is the unique element of Q such that the maximum in (1.10) is attained. Moreover, ∇fµ is a Lipschitz function with the Lipschitz constant

ℓµ = (1/(µσ1)) ‖A‖².

Let D1 := max{d1(u) : u ∈ Q}. Then

fµ(x) ≤ f(x) ≤ fµ(x) + µD1    ∀x ∈ IRn.
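As a simple illustration of Theorem 1.2 (our own example, not taken from [71] or from the later chapters), take A = I, φ = 0, Q = IB and d1(u) = (1/2)‖u‖², so that f(x) = ‖x‖ = max{⟨x, u⟩ : ‖u‖ ≤ 1}; then ‖A‖ = 1, σ1 = 1, D1 = 1/2, and both fµ and its gradient admit closed forms, which the sketch below evaluates.

```matlab
function [val, grad] = smoothed_norm_sketch(x, mu)
% Nesterov smoothing of f(x) = ||x|| with Q the closed unit ball IB,
% phi = 0 and d1(u) = 0.5*||u||^2 (an illustrative example only).
% The maximizer u_mu(x) in (1.10) is x/mu if ||x|| <= mu, and x/||x|| otherwise.
nx = norm(x);
if nx <= mu
    val  = nx^2 / (2 * mu);   % quadratic region of the Huber-type function
    grad = x / mu;            % gradient = u_mu(x), Lipschitz with constant 1/mu
else
    val  = nx - mu / 2;       % linear region
    grad = x / nx;
end
end
```

Here fµ(x) ≤ ‖x‖ ≤ fµ(x) + µ/2 = fµ(x) + µD1, in agreement with Theorem 1.2, and ∇fµ is Lipschitz with constant ‖A‖²/(µσ1) = 1/µ; this recovers the familiar Huber-type approximation of the Euclidean norm.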

Theorem 1.2 shows that the function fµ is a smooth approximation
of f . Let us put Theorem 1.1 and Theorem 1.2 together. When f has the
particular form (1.9), for numerically solving (1.8), we apply Algorithm 1 to
the following smooth optimization problem
min{ fµ(x) : x ∈ Ω }.

Assume that fµ* = inf_{x∈Ω} fµ(x) > −∞. Since f(x) ≥ fµ(x) for all x ∈ IRn, we have f* = inf_{x∈Ω} f(x) ≥ fµ*. Moreover, it follows from f(x) ≤ fµ(x) + µD1 for every x ∈ IRn that

f(x) − f* ≤ fµ(x) − fµ* + µD1.

In order to achieve an ε-suboptimal point of f, i.e., a point x such that f(x) − f* ≤ ε, we need

fµ(x) − fµ* ≤ εµ,

where εµ = ε − µD1. If we choose the value of the smoothing parameter µ to be ε/(2D1), then

ℓµ / εµ = ‖A‖² / (µσ1(ε − µD1)) = 4D1‖A‖² / (σ1 ε²).
Suppose that there exists a positive constant D satisfying d(x∗µ ) ≤ D for
any minimizer x∗µ of fµ on Ω, for example, we can choose

D = max{d(x) : x ∈ Ω},
when Ω is compact. By Theorem 1.1, we have
fµ(y^k) − fµ* ≤ 4ℓµ D / (σ k²).
Therefore, to get an ε-suboptimal point of f, we perform Algorithm 1 for the smooth problem at least C/ε iterations, where

C = 4‖A‖ √( D D1 / (σ σ1) ).

For this reason, we say that the combination of Nesterov’s smoothing techniques and Nesterov’s accelerated gradient methods ends up with a scheme
having an efficiency estimate of the order O(1/ε). Comparing this with the order O(1/ε²) of the subgradient method for minimizing the nonsmooth function
(1.9), the difference is enormous; see [71] for more details.


1.5 DC Programming and DC Algorithm

In nonconvex programming problems, local solutions are not necessarily
global ones. Therefore, solution methods for nonconvex problems have to
take into account the specific form of the models in question. One of the most
successful approaches to go beyond convexity was made by considering the
class of DC functions, where DC stands for Difference of Convex functions.
The class of DC functions is closed under many operations usually considered in optimization and is large enough to contain many objective functions

in applications of optimization. Moreover, this class of functions possesses
beautiful generalized differentiation properties and is favorable for applying
numerical optimization schemes; see [9, 39, 40, 63, 93] and the references
therein.
We now briefly outline some important results from the theory of DC
programming and DC algorithm. Let g : IRn → IR ∪ {+∞} and h : IRn → IR
be convex functions. Throughout the forthcoming we assume that g is proper
and lower semicontinuous. Consider the DC programming problem
min{f (x) := g(x) − h(x) : x ∈ IRn }.

(1.12)

The following first-order necessary optimality condition was first proved by
J. B. Hiriart-Urruty.
Proposition 1.4 (See [77, Theorem 1]) If x̄ ∈ dom f is a local minimizer of (1.12), then

∂h(x̄) ⊂ ∂g(x̄).    (1.13)
Any point x̄ ∈ dom f that satisfies (1.13) is called a stationary point of (1.12), and any point x̄ ∈ dom f such that ∂g(x̄) ∩ ∂h(x̄) ≠ ∅ is called a critical point of this problem. Since h is a finite convex function, its subdifferential at
any point is nonempty, and hence any stationary point of (1.12) is a critical
point. More details about the optimality condition (1.13) can be seen in
[39, 77].
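In its standard form, the DCA picks, at each step, a subgradient y^k ∈ ∂h(x^k) and then takes x^{k+1} as a minimizer of the convex function g(·) − ⟨y^k, ·⟩, i.e., x^{k+1} ∈ ∂g*(y^k). The sketch below is our own illustration of this scheme, not the dissertation's code; the two function handles are assumptions of the sketch.

```matlab
function x = dca_sketch(subgrad_h, argmin_g_linear, x0, maxit, tol)
% Standard DCA iteration for min g(x) - h(x) (an illustrative sketch).
% subgrad_h       : handle returning some element of the subdifferential of h at x
% argmin_g_linear : handle returning a minimizer of g(x) - <y, x> for a given y
x = x0;
for k = 1:maxit
    y = subgrad_h(x);            % linearize h at x^k
    xnew = argmin_g_linear(y);   % solve the convex subproblem
    if norm(xnew - x) <= tol
        x = xnew;
        break;
    end
    x = xnew;
end
end
```

For instance, with g(x) = (1/2)‖x‖² the convex subproblem has the explicit solution x^{k+1} = y^k, so dca_sketch(@(x) some_subgradient_of_h(x), @(y) y, x0, 100, 1e-8) runs the scheme for (1/2)‖x‖² − h(x); here some_subgradient_of_h is a hypothetical user-supplied handle.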
The J. F. Toland dual of (1.12) is the problem
min{ h*(y) − g*(y) : y ∈ IRn }.    (1.14)

Using the convention (+∞) − (+∞) = +∞, we can state Toland’s duality
theorem as follows.
Proposition 1.5 (See [77, Section 2]) Under the assumptions made on the
functions g and h, one has
inf{g(x) − h(x) : x ∈ IRn } = inf{h∗ (y) − g ∗ (y) : y ∈ IRn },
i.e., the optimal values of (1.12) and (1.14) coincide.