Subgradients
Hoàng Nam Dũng
Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội
Last time: gradient descent
Consider the problem
    min_x f(x)
for f convex and differentiable, dom(f) = R^n.
Gradient descent: choose initial x (0) ∈ Rn , repeat
    x^(k) = x^(k−1) − t_k · ∇f(x^(k−1)),   k = 1, 2, 3, . . .
Step sizes t_k chosen to be fixed and small, or by backtracking line search.
If ∇f Lipschitz, gradient descent has convergence rate O(1/ε)
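As a quick recap in code, a minimal Python sketch of this update; the quadratic objective and the fixed step size below are illustrative assumptions, not from the slides:

import numpy as np

# Illustrative smooth convex objective: f(x) = 0.5 * ||A x - b||_2^2 (an assumption).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    return A.T @ (A @ x - b)

x = np.zeros(2)   # initial point x^(0)
t = 0.1           # fixed, small step size t_k
for k in range(200):
    x = x - t * grad_f(x)   # x^(k) = x^(k-1) - t_k * grad f(x^(k-1))
# x is now close to the minimizer A^{-1} b = (0.5, -1)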
Downsides:
Requires f differentiable
Can be slow to converge
Outline
Today:
Subgradients
Examples
Properties
Optimality characterizations
Basic inequality
Recall the basic inequality for convex and differentiable f:

    f(y) ≥ f(x) + ∇f(x)^T (y − x),   ∀x, y ∈ dom(f).

[Figure: f and its first-order approximation at (x, f(x)), with slope ∇f(x).]

The first-order approximation of f at x is a global lower bound.

∇f(x) defines a non-vertical supporting hyperplane to epi(f) at (x, f(x)):

    (∇f(x), −1)^T ((y, t) − (x, f(x))) ≤ 0,   ∀(y, t) ∈ epi(f).
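As a quick numerical sanity check of the lower bound, a short Python sketch; the choice f(x) = x^2 (so ∇f(x) = 2x) and the test grid are assumptions for illustration:

import numpy as np

# First-order lower bound check for the illustrative choice f(x) = x^2.
f = lambda x: x**2
grad_f = lambda x: 2.0 * x

x = 1.0
ys = np.linspace(-3.0, 3.0, 61)
lower = f(x) + grad_f(x) * (ys - x)    # first-order approximation at x
assert np.all(f(ys) >= lower - 1e-12)  # global lower bound, since f is convex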
Subgradients
A subgradient of a convex function f at x is any g ∈ Rn such that
f (y ) ≥ f (x) + g T (y − x), ∀y ∈ dom(f ).
Always exists (on the relative interior of dom(f ))
If f differentiable at x, then g = ∇f (x) uniquely
Same definition works for nonconvex f (however, subgradients
need not exist).
[Figure: a convex f with affine lower bounds f(x1) + g1^T(y − x1), f(x1) + g2^T(y − x1), and f(x2) + g3^T(y − x2).]

g1 and g2 are subgradients at x1; g3 is a subgradient at x2.
Examples of subgradients
Consider f : R → R, f(x) = |x|.

[Figure: plot of f(x) = |x| for x ∈ [−2, 2].]

For x ≠ 0, unique subgradient g = sign(x).
For x = 0, subgradient g is any element of [−1, 1].
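A quick numerical check of the definition at x = 0, in Python (the candidate values of g and the test grid are arbitrary illustrative choices):

import numpy as np

# f(x) = |x|: any g in [-1, 1] is a subgradient at x = 0.
f = lambda x: np.abs(x)
x0 = 0.0
ys = np.linspace(-2.0, 2.0, 81)
for g in (-1.0, -0.3, 0.0, 0.7, 1.0):              # candidate subgradients at 0
    assert np.all(f(ys) >= f(x0) + g * (ys - x0))  # f(y) >= f(x0) + g (y - x0)
# g = 1.5 would violate the inequality, e.g. at y = 1.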
Examples of subgradients
Consider f : R^n → R, f(x) = ‖x‖_2.

[Figure: surface plot of f(x) = ‖x‖_2 over (x1, x2).]

For x ≠ 0, unique subgradient g = x/‖x‖_2.
For x = 0, subgradient g is any element of {z : ‖z‖_2 ≤ 1}.
Examples of subgradients
Consider f : R^n → R, f(x) = ‖x‖_1.

[Figure: surface plot of f(x) = ‖x‖_1 over (x1, x2).]

For x_i ≠ 0, unique ith component g_i = sign(x_i).
For x_i = 0, ith component g_i is any element of [−1, 1].
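In code, one valid subgradient of the ℓ1 norm can be picked componentwise as above (taking g_i = 0 ∈ [−1, 1] where x_i = 0); the function name below is hypothetical, for illustration only:

import numpy as np

def subgrad_l1(x):
    # One subgradient of f(x) = ||x||_1: sign(x_i) where x_i != 0, and 0 where x_i = 0.
    return np.sign(x)

x = np.array([1.5, 0.0, -2.0])
g = subgrad_l1(x)                 # g = [1, 0, -1]
y = np.array([0.3, -1.0, 0.5])
assert np.sum(np.abs(y)) >= np.sum(np.abs(x)) + g @ (y - x)  # subgradient inequality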
Examples of subgradients
Consider f(x) = max{f1(x), f2(x)}, for f1, f2 : R^n → R convex and differentiable.

[Figure: plot of f = max{f1, f2} on [−2, 2], with a kink where f1 = f2.]

For f1(x) > f2(x), unique subgradient g = ∇f1(x).
For f2(x) > f1(x), unique subgradient g = ∇f2(x).
For f1(x) = f2(x), subgradient g is any point on the line segment between ∇f1(x) and ∇f2(x).
Subdifferential
Set of all subgradients of convex f is called the subdifferential:
∂f (x) = {g ∈ Rn : g is a subgradient of f at x}.
Properties:
Nonempty for convex f at x ∈ int(dom(f))
∂f (x) is closed and convex (even for nonconvex f )
If f is differentiable at x, then ∂f (x) = {∇f (x)}
If ∂f (x) = {g }, then f is differentiable at x and ∇f (x) = g .
Proof: See />lectures/subgradients.pdf
Monotonicity
Theorem
The subdifferential of a convex function f is a monotone operator
(u − v )T (x − y ) ≥ 0, ∀u ∈ ∂f (x), v ∈ ∂f (y ).
Proof.
By definition we have
f (y ) ≥ f (x) + u T (y − x) and f (x) ≥ f (y ) + v T (x − y ).
Adding the two inequalities gives (u − v)^T (x − y) ≥ 0, i.e., monotonicity.
Question: what does monotonicity say for a differentiable convex function? It reads

    (∇f(x) − ∇f(y))^T (x − y) ≥ 0,

which follows directly from the first-order characterization of convex functions.
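For instance, with f(x) = |x| and x, y ≠ 0, the subgradients are u = sign(x) and v = sign(y), and (sign(x) − sign(y))(x − y) is 0 when the signs agree and 2|x − y| > 0 when they differ, so monotonicity holds.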
Examples of non-subdifferentiable functions
The following functions are not subdifferentiable at x = 0
f : R → R, dom(f) = R_+,

    f(x) = 1 if x = 0,
           0 if x > 0.

f : R → R, dom(f) = R_+,

    f(x) = −√x.
The only supporting hyperplane to epi(f ) at (0, f (0)) is vertical.
Connection to convex geometry
Convex set C ⊆ R^n, consider the indicator function I_C : R^n → R,

    I_C(x) = I{x ∈ C} = 0 if x ∈ C,
                        ∞ if x ∉ C.

For x ∈ C, ∂I_C(x) = N_C(x), the normal cone of C at x; recall

    N_C(x) = {g ∈ R^n : g^T x ≥ g^T y for any y ∈ C}.

Why? By definition of subgradient g,

    I_C(y) ≥ I_C(x) + g^T (y − x) for all y.

For y ∉ C, I_C(y) = ∞.
For y ∈ C, this means 0 ≥ g^T (y − x).
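For example, take C = [0, 1] ⊂ R. At an interior point x ∈ (0, 1), the condition g·x ≥ g·y for all y ∈ [0, 1] forces g = 0, so ∂I_C(x) = N_C(x) = {0}; at x = 1 it gives N_C(1) = [0, ∞), and at x = 0 it gives N_C(0) = (−∞, 0].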
Subgradient calculus
Basic rules for convex functions:
Scaling: ∂(af ) = a · ∂f provided a > 0.
Addition: ∂(f1 + f2 ) = ∂f1 + ∂f2 .
Affine composition: if g (x) = f (Ax + b), then
∂g (x) = AT ∂f (Ax + b).
Finite pointwise maximum: if f(x) = max_{i=1,...,m} f_i(x), then

    ∂f(x) = conv( ∪_{i : f_i(x) = f(x)} ∂f_i(x) ),

the convex hull of the union of subdifferentials of active functions at x.
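As a small illustration of the affine composition and finite pointwise maximum rules, a Python sketch computing one valid subgradient of f(x) = max_i (a_i^T x + b_i); here ∂f_i(x) = {a_i}, so any active a_i is a subgradient (the function name and the data A, b are made up for illustration):

import numpy as np

def subgrad_max_affine(A, b, x):
    # f(x) = max_i (a_i^T x + b_i); each f_i is affine with gradient a_i.
    # Any a_i attaining the max lies in the convex hull of active gradients,
    # hence is a subgradient of f at x.
    vals = A @ x + b
    i = int(np.argmax(vals))   # index of one active function
    return A[i]

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # rows a_i (illustrative)
b = np.array([0.0, 0.0, 1.0])
x = np.array([2.0, -1.0])
g = subgrad_max_affine(A, b, x)   # f_1 is active here, so g = a_1 = [1, 0]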
Subgradient calculus
General pointwise maximum: if f(x) = max_{s ∈ S} f_s(x), then

    ∂f(x) ⊇ cl conv( ∪_{s : f_s(x) = f(x)} ∂f_s(x) ).

Under some regularity conditions (on S, f_s), we get equality.

Norms: important special case, f(x) = ‖x‖_p. Let q be such that 1/p + 1/q = 1; then

    ‖x‖_p = max_{‖z‖_q ≤ 1} z^T x,

and

    ∂f(x) = argmax_{‖z‖_q ≤ 1} z^T x.
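For instance, with p = 1 and q = ∞ this gives ∂‖x‖_1 = argmax_{‖z‖_∞ ≤ 1} z^T x = {z : z_i = sign(x_i) if x_i ≠ 0, z_i ∈ [−1, 1] if x_i = 0}, matching the earlier example.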
Why subgradients?
Subgradients are important for two reasons:
Convex analysis: optimality characterization via subgradients,
monotonicity, relationship to duality.
Convex optimization: if you can compute subgradients, then
you can minimize any convex function.
Optimality condition
Subgradient optimality condition: For any f (convex or not),
    f(x∗) = min_x f(x) ⇐⇒ 0 ∈ ∂f(x∗),
i.e., x ∗ is a minimizer if and only if 0 is a subgradient of f at x ∗ .
Why? Easy: g = 0 being a subgradient means that for all y
f (y ) ≥ f (x ∗ ) + 0T (y − x ∗ ) = f (x ∗ ).
Note the implication for a convex and differentiable function f, for which ∂f(x) = {∇f(x)}: the condition reduces to ∇f(x∗) = 0.
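As a concrete instance from the earlier example, f(x) = |x| has ∂f(0) = [−1, 1], which contains 0, so x∗ = 0 is a minimizer even though f is not differentiable there.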
Derivation of first-order optimality
Example of the power of subgradients: we can use what we have
learned so far to derive the first-order optimality condition.
Theorem
For f convex and differentiable and C convex
    min_x f(x) subject to x ∈ C
is solved at x if and only if
    ∇f(x)^T (y − x) ≥ 0 for all y ∈ C.
(For a direct proof see, e.g., />Teaching/ORF523/S16/ORF523_S16_Lec7_gh.pdf; a proof using subgradients is on the next slide.)
Intuitively: f cannot decrease, to first order, as we move from x toward any feasible point y ∈ C.
Note that for C = R^n (the unconstrained case) it reduces to ∇f(x) = 0.
Derivation of first-order optimality
Proof.
First recast the problem as

    min_x f(x) + I_C(x).
Now apply subgradient optimality: 0 ∈ ∂(f (x) + IC (x)).
Observe
0 ∈ ∂(f (x) + IC (x)) ⇔ 0 ∈ {∇f (x)} + NC (x)
⇔ −∇f (x) ∈ NC (x)
⇔ −∇f (x)T x ≥ −∇f (x)T y for all y ∈ C
⇔ ∇f (x)T (y − x) ≥ 0 for all y ∈ C
as desired.
Note: the condition 0 ∈ ∂f (x) + NC (x) is a fully general condition
for optimality in convex problems. But it’s not always easy to work
with (KKT conditions, later, are easier).