
AUGMENTED LAGRANGIAN BASED ALGORITHMS FOR CONVEX OPTIMIZATION PROBLEMS WITH NON-SEPARABLE $\ell_1$-REGULARIZATION
GONG ZHENG
(B.Sc., NUS, Singapore)
A THESIS SUBMITTED
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF MATHEMATICS
NATIONAL UNIVERSITY OF SINGAPORE
2013

DECLARATION
I hereby declare that the thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis.
This thesis has also not been submitted for any degree in any university previously.
Gong, Zheng
23 August, 2013

To my parents

Acknowledgements
The effort and time that my supervisor, Professor Toh Kim-Chuan, has spent on me throughout this five-year endeavor deserve more than a simple word of thanks. His guidance has been constant at every stage of the preparation of this thesis, from mathematical proofs and algorithm design to the analysis of numerical results, and extends to paper writing. I have learned a lot from him, and not only about scientific ideas. His integrity and enthusiasm for research are infectious, and working with him has been a true pleasure for me.
My deepest gratitude also goes to Professor Shen Zuowei, my co-supervisor and, perhaps more importantly, my first guide in academic research. I still remember my first research project, done with him as a third-year undergraduate taking his graduate course on wavelets. It was challenging yet motivating, and it led to where I am now. It has been my great fortune to have the opportunity to work with him again during my Ph.D. studies. The discussions in his office every Friday afternoon have been extremely inspiring and helpful.
I am equally indebted to Professor Sun Defeng, who has included me in his research seminar group and treated me as his own student. I have benefited greatly from the weekly seminar discussions throughout the five years, as well as from his Conic Programming course. His deep understanding of and great experience in optimization and nonsmooth analysis have been more than helpful in building up the theoretical aspects of this thesis. His kindness and generosity are exceptional. I feel very grateful and honored to have been invited to his family parties almost every year.
It has been my privilege to be a member of both the optimization group and the wavelets and signal processing group, which have provided me a great source of knowledge and friendship. Many thanks to Professor Zhao Gongyun, Zhao Xinyuan, Liu Yongjing, Wang Chengjing, Li Lu, Gao Yan, Ding Chao, Miao Weimin, Jiang Kaifeng, Wu Bin, Shi Dongjian, Yang Junfeng, Chen Caihua, Li Xudong and Du Mengyu in the optimization group; and Professor Ji Hui, Xu Yuhong, Hou Likun, Li Jia, Wang Kang, Bao Chenglong, Fan Zhitao, Wu Chunlin, Xie Peichu and Heinecke Andreas in the wavelets and signal processing group. Especially to Chao, Weimin, Kaifeng and Bin: I am sincerely grateful for your dedication to the weekly reading seminar on Convex Analysis, which lasted for more than two years and is absolutely the most memorable experience of all.
This acknowledgement would remain incomplete without expressing my gratitude to some of my other fellow colleagues and friends at NUS, in particular Cai Yongyong, Ye Shengkui, Gao Bin, Ma Jiajun, Gao Rui, Zhang Yongchao, Cai Ruilun, Xue Hansong, Sun Xiang, Wang Fei, Jiao Qian, Shi Yan and Gu Weijia, for their friendship, (academic) discussions and, of course, the (birthday) gatherings and chit-chats. I am also thankful to the university and the department for providing me with a full scholarship to complete the degree and the financial support for conference trips. Last but not least, thanks to all the administrative and IT staff for their consistent help during the past years.
Finally, they will not read this thesis, nor do they even read English, yet this
thesis is dedicated to them, my parents, for their unfaltering love and support.
Gong, Zheng
August, 2013
Contents
Acknowledgements vii
Summary xi
1 Introduction 1
1.1 Motivations and Related Methods . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Sparse Structured Regression . . . . . . . . . . . . . . . . . . 2
1.1.2 Image Restoration . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Limitations of the Existing First-order Methods . . . . . . . . 4
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Preliminaries 9
2.1 Monotone Operators and the Proximal Point Algorithm . . . . . . . 9
2.2 Basics of Nonsmooth Analysis . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Tight Wavelet Frames . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Tight Wavelet Frames Generated from MRA . . . . . . . . . 15
2.3.2 Decomposition and Reconstruction Algorithms . . . . . . . . . 16
3 A Semismooth Newton-CG Augmented Lagrangian Algorithm 19
3.1 Reformulation of (1.1) . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 The General Augmented Lagrangian Framework . . . . . . . . . . . . 22
3.3 An Inexact Semismooth Newton Method for Solving (3.8) . . . . . . . 23
3.4 Convergence of the Inexact SSNCG Method . . . . . . . . . . . . . . 26
3.5 The SSNAL Algorithm and Its Convergence . . . . . . . . . . . . . . 32
3.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 First-order Methods 41
4.1 Alternating Direction Method of Multipliers . . . . . . . . . . . . . . 41
4.2 Inexact Accelerated Proximal Gradient Method . . . . . . . . . . . . 44
4.3 Smoothing Accelerated Proximal Gradient Method . . . . . . . . . . 45
5 Applications of (1.1) in Statistics 49
5.1 Sparse Structured Regression Models . . . . . . . . . . . . . . . . . . 49
5.2 Results on Randomly Generated Data . . . . . . . . . . . . . . . . . 52
5.2.1 Fused Lasso . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.2 Clustered Lasso . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6 Applications of (1.1) in Image Processing 61
6.1 Image Restorations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.2 Results on Image Restorations with Mixed Noises . . . . . . . . . . . 63
6.2.1 Synthetic Image Denoising . . . . . . . . . . . . . . . . . . . . 65
6.2.2 Real Image Denoising . . . . . . . . . . . . . . . . . . . . . . . 69
6.2.3 Image Deblurring with Mixed Noises . . . . . . . . . . . . . . 71
6.2.4 Stopping Criteria . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3 Comparison with Other Models on Specified Noises . . . . . . . . . . 77
6.3.1 Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.3.2 Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3.3 Recovery from Images with Randomly Missing Pixels . . . . . 87
6.4 Further Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.4.1 Reduced Model . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.4.2 ALM-APG versus ADMM . . . . . . . . . . . . . . . . . . . . 94

Bibliography 97
Summary
This thesis is concerned with the problem of minimizing the sum of a convex function $f$ and a non-separable $\ell_1$-regularization term. The motivation for this work comes from recent interest in various high-dimensional sparse feature learning problems in statistics, as well as from problems in image processing. We present these problems under the unified framework of convex minimization with non-separable $\ell_1$-regularization, and propose an inexact semismooth Newton augmented Lagrangian (SSNAL) algorithm to solve an equivalent reformulation of the problem. Comprehensive results on the global convergence and local rate of convergence of the SSNAL algorithm are established, together with a characterization of the positive definiteness of the generalized Hessian of the objective function arising in each subproblem of the algorithm.

For the purpose of exposition and comparison, we also summarize or design three first-order methods to solve the problem under consideration, namely the alternating direction method of multipliers (ADMM), the inexact accelerated proximal gradient (APG) method and the smoothing accelerated proximal gradient (SAPG) method. Numerical experiments show that the SSNAL algorithm performs favourably in comparison with several state-of-the-art first-order algorithms for solving fused lasso problems, and outperforms the best available algorithms for clustered lasso problems.

With the available numerical methods, we propose a simple model to solve various image restoration problems in the presence of mixed or unknown noises. The proposed model takes a weighted sum of $\ell_1$- and $\ell_2$-norm based distance functions as the data fitting term and utilizes the sparsity prior of images in the wavelet tight frame domain. Since a moderately accurate result is usually sufficient for image restoration problems, an augmented Lagrangian method (ALM) with the inner subproblem solved by an accelerated proximal gradient (APG) algorithm is used to solve the proposed model.

The numerical simulation results show that the proposed model, together with the numerical algorithm, is surprisingly robust and efficient in solving several image restoration problems, including denoising, deblurring and inpainting, in the presence of additive and non-additive noises and their mixtures. This single one-for-all fitting model does not depend on any prior knowledge of the noise. Thus, it has the potential to perform effectively in real colour image denoising problems, where the noise type is difficult to model.
Chapter 1
Introduction
In this thesis, we focus on solving minimization problems of the following form:
$$\min_{x \in \mathbb{R}^n} \; f(x) + \rho \|Bx\|_1, \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a convex and twice continuously differentiable function, $B \in \mathbb{R}^{p \times n}$ is a given matrix, and $\rho$ is a given positive parameter. For any $x \in \mathbb{R}^n$, we denote its 2-norm by $\|x\|$ and let $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. We assume that the objective function in (1.1) is coercive, and hence the optimal solution set of (1.1) is nonempty and bounded.
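To make the setup concrete, the following minimal sketch (an illustration, not code from the thesis; the quadratic loss, the random data and all names below are assumptions) evaluates the objective of (1.1) for a given $f$ and $B$:

```python
import numpy as np

def objective(x, f, B, rho):
    """Evaluate f(x) + rho * ||B x||_1, the objective of (1.1)."""
    return f(x) + rho * np.sum(np.abs(B @ x))

# Hypothetical instance: quadratic loss f(x) = 0.5*||Ax - b||^2 with random data.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 50)), rng.standard_normal(30)
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
B = np.eye(50)   # B = I recovers the separable lasso special case (1.2) below
print(objective(np.zeros(50), f, B, rho=1.0))
```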
1.1 Motivations and Related Methods
As the $\ell_1$-norm regularization term encourages sparsity in the optimal solution, the special case of problem (1.1) with $f(x) = \frac{1}{2}\|Ax - b\|^2$ and $B = I$, i.e.,
$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|^2 + \rho \|x\|_1, \tag{1.2}$$
has drawn particular attention in both the signal processing (basis pursuit [24]) and statistics (lasso [99]) communities for almost twenty years. Due to the separability of $\|x\|_1$ and the simple structure of the squared loss term, a great variety of algorithms have been designed to solve problem (1.2). Ever since compressed sensing theory, in the context of signal processing, established the theoretical guarantee for stable recovery of the original sparse signal by solving (1.2) under certain conditions [19, 35], problem (1.2) has regained immense interest among the signal processing, statistics and optimization communities during the past ten years.
Here we briefly describe some of the methods available for solving (1.2). These methods mainly fall into three broad categories: (1) first-order methods [6, 45, 53, 103, 106, 108], which are specifically designed to exploit the separability of $\|x\|_1$ to ensure that a certain subproblem at each iteration admits an analytical solution. These methods have been very successful in solving large scale problems where $A$ satisfies a certain restricted isometry property [20], which ensures that the Hessian $A^T A$ is well conditioned on the subspace corresponding to the non-zero components of the optimal $x^*$; (2) homotopy-type methods [36, 38], which attempt to solve (1.2) by sequentially finding the break-points of the solution $x(\rho)$ of (1.2), starting from the initial parameter value $\|A^T b\|_\infty$ and ending with the desired target value. These methods rely on the property that each component of the solution $x(\rho)$ of (1.2) is a piecewise linear function of $\rho$; (3) inexact interior-point methods [24, 46, 60], which solve a convex quadratic programming reformulation of (1.2). The literature on algorithms for solving (1.2) is vast, and here we only mention those that are known to be the most efficient. We refer the reader to the recent paper [46] for more details on the relative performance and merits of various algorithms. Numerical experiments have shown that first-order methods are generally quite efficient if one requires only a moderately accurate approximate solution for large scale problems. More recently, the authors in [9] have proposed an active-set method using the semismooth Newton framework to solve (1.2) by reformulating it as a bound-constrained convex quadratic programming problem.
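To make category (1) concrete, the following sketch (an illustration with assumed random data, not an algorithm taken from the thesis) implements the classic iterative shrinkage/thresholding scheme for (1.2); the separability of $\|x\|_1$ is exactly what makes the componentwise soft-thresholding step available in closed form:

```python
import numpy as np

def soft_threshold(v, t):
    # Closed-form proximal map of t*||.||_1: componentwise shrinkage.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, rho, iters=500):
    """Proximal gradient (ISTA) for min 0.5*||Ax - b||^2 + rho*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - grad / L, rho / L)
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((40, 100)), rng.standard_normal(40)
x = ista(A, b, rho=0.5)
print(np.count_nonzero(np.abs(x) > 1e-8), "nonzero components")
```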
However, many applications require one to solve the general problem (1.1), where $f$ is non-quadratic and/or the regularization term is non-separable; examples include various extensions of the $\ell_1$-norm lasso penalty, regression models with loss functions other than the least-squares loss, total variation (TV) regularized image restoration models, etc. Most of the algorithms mentioned in the last paragraph are specifically designed to exploit the special structure of (1.2), and as a result, they are either not applicable or become very inefficient when applied to (1.1).
1.1.1 Sparse Structured Regression
One of the main motivations for studying problem (1.1) comes from high-dimensional regression models with structured sparse regularizations, such as the group lasso [107, 109], fused lasso [100], clustered lasso [78, 90], OSCAR [7], etc. In these statistical applications, $f(x)$ is the data fitting term (known as the loss function), and $B$ is typically structured or sparse.
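For example, in the one-dimensional fused lasso setting, the non-separable part of the penalty is $\rho\|Bx\|_1$ with $B$ the first-order difference matrix; a minimal construction (illustrative, not from the thesis) is:

```python
import numpy as np

def difference_matrix(n):
    """First-order difference matrix B in R^{(n-1) x n}: (Bx)_i = x_{i+1} - x_i,
    so ||Bx||_1 is the non-separable part of the fused lasso penalty."""
    return np.diff(np.eye(n), axis=0)

x = np.array([1.0, 1.0, 1.0, 4.0, 4.0])   # piecewise-constant signal
B = difference_matrix(5)
print(np.sum(np.abs(B @ x)))               # 3.0: a single jump, small penalty
```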
Efficient first-order algorithms that exploit the special structures of the corresponding regularization terms have been developed for different structured lasso problems. For example, proximal gradient methods have been designed in [5, 74] for non-overlapping grouped lasso problems, and coordinate descent methods [47] and accelerated proximal gradient based methods [65] have been proposed for fused lasso problems with a quadratic loss function. Unfortunately, there are many more complex structured lasso problems, such as the overlapping grouped lasso, graph-guided fused lasso and clustered lasso, for which the aforementioned first-order algorithms are not applicable.
Although problem (1.1) with a quadratic loss function can always be formulated as a second-order cone programming (SOCP) problem or a convex quadratic programming (QP) problem, which are solvable by interior-point solvers such as [101] or [98], the high computational cost and the limitation on the scale of problems solvable usually prohibit one from doing so, especially when the problem is large.
1.1.2 Image Restoration
Image restoration is another major area that gives rise to problems of the form (1.1), where $f$ is typically the quadratic loss function.
In TV-regularized image restoration (originally introduced by Rudin, Osher and Fatemi [88]), the regularization term is essentially the $\ell_1$-norm of the first-order forward difference of $x$ in the one-dimensional case, which is a non-separable $\ell_1$-term similar to the fused lasso regularization term. With $f$ being a quadratic loss function as in (1.2), the authors in [75] considered half-quadratic reformulations of (1.1) and applied alternating minimization methods to solve the reformulated problems. In [56, 102], the authors independently developed alternating minimization algorithms for some types of TV image restoration problems. We should mention here that those alternating minimization methods only solve an approximate version (obtained by smoothing the TV term) of the original problem (1.1), and hence the approximate solution obtained is at best moderately accurate for (1.1). More recently, [104] proposed to use the alternating direction method of multipliers (ADMM) to solve the original TV-regularized problem (1.1) with quadratic loss, and demonstrated very good numerical performance of the ADMM for such a problem.
In frame based image restoration, since wavelet tight frame systems are redundant, the mapping from an image to its coefficients is not one-to-one, i.e., the representation of the image in the frame domain is not unique. Therefore, based on different assumptions, there are three formulations for the sparse approximation of the underlying image, namely the analysis based approach, the synthesis based approach and the balanced approach. The analysis based approach proposed in [39, 96] assumes that the coefficient vector can be sparsely approximated; it is therefore formulated as the general problem (1.1) with a non-separable $\ell_1$-regularization, where $B$ is the framelet decomposition operator. The synthesis based approach introduced in [31, 41-44] and the balanced approach first used in [21, 22] assume that the underlying image is synthesized from some sparse coefficient vector via the framelet reconstruction operator; therefore, these models directly penalize the $\ell_1$-norm of the coefficient vector, which leads to the special separable case (1.2). The proximal forward-backward splitting (PFBS) algorithm, also known as the iterative shrinkage/thresholding (IST) algorithm, was first used to solve the synthesis based model in [29, 31, 41-44], and the balanced model in [12-14, 18]. Later, a linearized Bregman algorithm was designed to solve the synthesis based model in [16], and an APG algorithm was proposed to solve the balanced model in [92], both of which demonstrated faster convergence than the PFBS (IST) algorithm. For the analysis based approach, where a non-separable $\ell_1$-term is involved, the split Bregman iteration was used to develop a fast algorithm in [17]. It was later observed that the resulting split Bregman algorithm is equivalent to the ADMM mentioned previously.
1.1.3 Limitations of the Existing First-order Methods
To summarize, first-order methods have been very popular for structured convex minimization problems (especially those with the simple regularization term $\|x\|_1$) arising from statistics, machine learning, and image processing. In those applications, the optimization models serve as a guide to obtain a good feasible solution to the underlying application problems, and the goal is not necessarily to compute the optimal solutions of the optimization models. As a result, first-order methods are mostly adequate for many such application problems, since the required accuracy (with respect to the optimization model) of the computed solution is rather modest. Even then, the efficiency of first-order methods is heavily dependent on the structures of the particular problem they are designed to exploit. To avoid having a multitude of first-order algorithms, each catering to a particular problem structure, it is therefore desirable to design an algorithm which can be applied efficiently to (1.1), whose efficiency is not completely dictated by the particular problem structure at hand, and which at the same time is able to deliver a high accuracy solution when required.
For the general problem (1.1), so far there is no single unifying algorithmic framework that has been demonstrated to be efficient and robust for solving the problem. Although some general first-order methods (derived from the ADMM [37] and accelerated proximal gradient methods [73], [5]) are available for solving (1.1), their practical efficiency is highly dependent on the problem structure of (1.1), especially on the structure of the non-separable $\ell_1$-term $\|Bx\|_1$. One can also use the commonly employed strategy of approximating the non-smooth term $\|Bx\|_1$ by some smooth surrogate to approximately solve (1.1). Indeed, this has been done in [27], which proposed to use the accelerated proximal gradient method in [5] to solve smoothed surrogates of some structured lasso problems. But the efficiency of such an approach has yet to be demonstrated convincingly. A detailed discussion of those first-order methods will be given in Chapter 4.
Above all, the main purpose of this work is to design a unifying algorithmic framework, the semismooth Newton augmented Lagrangian (SSNAL) method, for solving (1.1), which does not depend heavily on the structure of $\|Bx\|_1$. Unlike first-order methods, our SSNAL based algorithm exploits second-order information of the problem to achieve high efficiency in computing accurate solutions of (1.1).
1.2 Contributions
The main contributions of this thesis are three-fold. First, we provide a unified algorithmic framework for a wide variety of $\ell_1$-regularized (not necessarily separable) convex minimization problems that have been studied in the literature. The algorithm we develop is a semismooth Newton augmented Lagrangian (SSNAL) method applied to (1.1), where the inner subproblem is solved by a semismooth Newton method in which the linear system at each iteration is solved by a preconditioned conjugate gradient method. An important feature of our algorithm is that its efficiency does not depend critically on the separability of the $\ell_1$-term, in contrast to many existing efficient methods. Also, unlike many existing algorithms which are designed only for quadratic loss functions, our algorithm can handle a wide variety of convex loss functions. Moreover, based on the general convergence theory of the ALM [84, 85], we are able to provide comprehensive global and local convergence results for our algorithm. Second, our algorithm solves (1.1) and its dual simultaneously, and hence there is a natural stopping criterion based on duality theory (or the KKT conditions). Third, our algorithm utilizes second-order information, and hence it can obtain accurate solutions much more efficiently than first-order methods for (1.1), while at the same time remaining competitive with state-of-the-art first-order algorithms (for which a high accuracy solution may not be achievable) for solving large scale problems. We evaluate our algorithm and compare its performance with state-of-the-art algorithms for solving the fused lasso and clustered lasso problems.
In addition, we propose a simple model for image restoration with mixed or unknown noises. While most of the existing methods for image restoration are designed specifically for a given type of noise, our model appears to be the first versatile model for handling image restoration with various mixed noises and noises of unknown type. This feature is particularly important for solving real-life image restoration problems since, under various constraints, images are always degraded by mixed noise, and it is often impossible to determine what type of noise is involved. The proposed model falls within the framework of the general non-separable $\ell_1$-regularized problem (1.1). Since a moderately accurate solution is usually sufficient for image processing problems, we use an accelerated proximal gradient (APG) algorithm to solve the inner subproblem. The simulations on synthetic data show that our method is effective and robust in restoring images contaminated by additive Gaussian noise, Poisson noise, random-valued impulse noise, multiplicative Gamma noise and mixtures of these noises. Numerical results on real digital colour images are also given, which confirm the effectiveness and robustness of our method in removing unknown noises.
1.3 Thesis Organization
The rest of the thesis is organized as follows. In Chapter 2, we present some preliminaries related to the subsequent discussions. We first introduce the idea of monotone operators and the proximal point algorithm; the augmented Lagrangian method is essentially the dual application of the proximal point algorithm. Secondly, some basic concepts in nonsmooth analysis are provided; the convergence of the SSNAL algorithm proposed here relies on the semismoothness of the projection operator (onto an $\ell_1$-ball). Finally, a brief introduction to tight wavelet frames is given, which includes (1) the multiresolution analysis (MRA) based tight frames derived from the unitary extension principle, and (2) the fast algorithms for framelet decomposition and reconstruction. All of the applications to image restoration problems presented in this thesis are based on, but not limited to, the assumption that the images are sparse in the tight wavelet frame domain.
In Chapter 3, we first reformulate the original unconstrained problem (1.1) as an equivalent constrained one, and build up the general augmented Lagrangian framework. Then we propose an inexact semismooth Newton augmented Lagrangian (SSNAL) algorithm to solve this reformulated constrained problem. We also characterize the conditions under which the generalized Hessian of the objective function is positive definite, and provide the convergence analysis of the proposed SSNAL algorithm. Finally, extensions of the SSNAL framework for solving some generalizations of (1.1) are described.
We summarize or design some first-order algorithms which are promising for solving the general problem (1.1) in Chapter 4. Although the computational efficiency of these first-order methods depends crucially on the problem structure of (1.1), our SSNAL algorithm can always capitalize on the strength (of rapid initial progress) of first-order methods by using them to generate a good starting point to warm-start the algorithm.
Chapter 5 is devoted to the application of the SSNAL algorithm to solving the structured lasso problems of major concern in the statistics community. We first introduce the various sparse structured regression models and discuss how they can be fitted into our unified framework. The numerical performance of our SSNAL algorithm for fused lasso and clustered lasso problems on randomly generated data, as well as comparisons with other state-of-the-art algorithms, is presented.
In Chapter 6, we propose a simple model for image restoration with mixed or unknown noises. Numerical results for various image restorations with mixed noise, and examples of noise removal from real digital colour images, are presented. While, as far as we are aware, there are no results in the literature for image restoration with such a wide range of mixed noises, comparisons with some of the available models for removing specific noises, such as a single type of noise, mixed Poisson-Gaussian noise, and impulse noise mixed with Gaussian noise, are given. Some additional remarks on our proposed model and numerical algorithm are also addressed.
Chapter 2
Preliminaries
In this chapter, we present some preliminaries related to the subsequent discussions. We first introduce the idea of monotone operators and the proximal point algorithm. The augmented Lagrangian method (ALM) is essentially the dual application of the proximal point algorithm. Secondly, some basic concepts in nonsmooth analysis are provided. The convergence of the SSNAL algorithm proposed here relies on the semismoothness of the projection operator (onto an $\ell_1$-ball). Finally, a brief introduction to tight wavelet frames is given, which includes (1) the multiresolution analysis (MRA) based tight frames derived from the unitary extension principle, and (2) the fast algorithms for framelet decomposition and reconstruction. The proposed simple model for image restoration with mixed and unknown noises is based on, but not limited to, the assumption that the images are sparse in the tight wavelet frame domain.
2.1 Monotone Operators and the Proximal Point Algorithm
Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$. A multifunction $T : \mathcal{H} \to \mathcal{H}$ is said to be a monotone operator if
$$\langle z - z', w - w' \rangle \geq 0 \quad \text{whenever } w \in T(z), \; w' \in T(z').$$
It is said to be maximal monotone if, in addition, the graph
$$G(T) = \{ (z, w) \in \mathcal{H} \times \mathcal{H} \mid w \in T(z) \}$$
is not properly contained in the graph of any other monotone operator $T' : \mathcal{H} \to \mathcal{H}$. Such operators have been studied extensively for their important role in convex analysis. A fundamental problem is that of determining an element $z$ such that $0 \in T(z)$. For example, the subdifferential mapping $\partial f$ of a proper closed convex function $f$ is maximal monotone, and the inclusion $0 \in \partial f(z)$ means that $f(z) = \min f$. The problem is then one of minimization subject to implicit constraints.
A fundamental algorithm for solving $0 \in T(z)$, in the case of an arbitrary maximal monotone operator $T$, is based on the fact that for each $z \in \mathcal{H}$ and $c > 0$ there is a unique $u \in \mathcal{H}$ such that $z - u \in cT(u)$, i.e., $z \in (I + cT)(u)$ [70]. The operator $P := (I + cT)^{-1}$ is therefore single-valued from all of $\mathcal{H}$ to $\mathcal{H}$. It is also nonexpansive:
$$\|P(z) - P(z')\| \leq \|z - z'\|, \tag{2.1}$$
and one has $P(z) = z$ if and only if $0 \in T(z)$. $P$ is called the proximal mapping associated with $cT$, following the terminology of Moreau [71] for the case $T = \partial f$. The proximal point algorithm generates, for any starting point $z^0$, a sequence $\{z^k\}$ in $\mathcal{H}$ by the approximate rule
$$z^{k+1} \approx P_k(z^k), \quad \text{where } P_k = (I + c_k T)^{-1}. \tag{2.2}$$
Here $\{c_k\}$ is some sequence of positive real numbers. In the case of $T = \partial f$, this procedure reduces to
$$z^{k+1} \approx \arg\min_z \left\{ f(z) + \frac{1}{2c_k} \|z - z^k\|^2 \right\}.$$
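For a concrete sketch of this iteration (an illustration under assumed data, not code from the thesis): when $f(z) = \frac{1}{2} z^T Q z - b^T z$ with $Q$ positive definite, the proximal step has the closed form $z^{k+1} = (Q + I/c_k)^{-1}(b + z^k/c_k)$, and the iterates converge to the minimizer $Q^{-1}b$:

```python
import numpy as np

def proximal_point(Q, b, c=10.0, iters=50):
    """Proximal point algorithm for min 0.5*z'Qz - b'z  (T = grad f).
    Each strongly convex subproblem is solved exactly."""
    z = np.zeros(len(b))
    for _ in range(iters):
        # argmin_w f(w) + 1/(2c)*||w - z||^2  <=>  (Q + I/c) w = b + z/c
        z = np.linalg.solve(Q + np.eye(len(b)) / c, b + z / c)
    return z

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
Q = M @ M.T + 0.1 * np.eye(5)            # positive definite Hessian
b = rng.standard_normal(5)
print(np.linalg.norm(proximal_point(Q, b) - np.linalg.solve(Q, b)))  # ~ 0
```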
In [85], Rockafellar introduced the following two general criteria for the approximate calculation of $P_k(z^k)$:
$$\|z^{k+1} - P_k(z^k)\| \leq \varepsilon_k, \quad \sum_{k=0}^{\infty} \varepsilon_k < \infty, \tag{2.3}$$
$$\|z^{k+1} - P_k(z^k)\| \leq \delta_k \|z^{k+1} - z^k\|, \quad \sum_{k=0}^{\infty} \delta_k < \infty. \tag{2.4}$$
He proved, under very mild assumptions, that for any starting point $z^0$, the criterion (2.3) guarantees weak convergence of $\{z^k\}$ to a particular solution $z^\infty$ of $0 \in T(z)$. In general, the set of all such points $z$ forms a closed convex set in $\mathcal{H}$, denoted by $T^{-1}(0)$. If, in addition, the criterion (2.4) is also satisfied and $T^{-1}$ is Lipschitz continuous at $0$, then it can be shown that the convergence is at least at a linear rate, where the modulus can be brought arbitrarily close to zero by taking $c_k$ large enough. If $c_k \to \infty$, one has superlinear convergence.
Note that $T^{-1}$ is Lipschitz continuous at $0$ with modulus $a \geq 0$ if there is a unique solution $\bar{z}$ to $0 \in T(z)$, i.e., $T^{-1}(0) = \{\bar{z}\}$, and for some $\tau > 0$, we have
$$\|z - \bar{z}\| \leq a \|w\| \quad \text{whenever } z \in T^{-1}(w) \text{ and } \|w\| \leq \tau.$$
This assumption can be fulfilled very naturally in applications to convex programming, for instance, under certain standard second-order conditions characterizing a "nice" optimal solution (see [84] for detailed discussions).
There are three distinct types of applications of the proximal point algorithm in convex programming: (1) to $T = \partial f$, where $f$ is the objective function in the primal problem; (2) to $T = -\partial g$, where $g$ is the concave objective function in the dual problem; and (3) to the monotone operator corresponding to the convex-concave Lagrangian function. The augmented Lagrangian method that will be discussed further in Chapter 3 corresponds to the second application.
2.2 Basics of Nonsmooth Analysis
Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite-dimensional real Hilbert spaces. Let $O$ be an open set in $\mathcal{X}$ and $f : O \subseteq \mathcal{X} \to \mathcal{Y}$ be a locally Lipschitz continuous function on the open set $O$. Then $f$ is almost everywhere F(réchet)-differentiable by Rademacher's theorem. Let $D_f$ denote the set of F-differentiable points of $f$ in $O$. Then the B(ouligand)-subdifferential of $f$ at $x \in O$, denoted by $\partial_B f(x)$, is
$$\partial_B f(x) := \left\{ \lim_{k \to \infty} f'(x^k) \;\middle|\; x^k \in D_f, \; x^k \to x \right\},$$
and Clarke's generalized Jacobian [28] at $x$ is the convex hull of $\partial_B f(x)$, i.e.,
$$\partial f(x) = \operatorname{conv}\{\partial_B f(x)\}.$$
In addition, $f$ is said to be directionally differentiable at $x$ if for any $\Delta x \in \mathcal{X}$, the directional derivative of $f$ at $x$ along $\Delta x$, denoted by $f'(x; \Delta x)$, exists.
Definition 2.2.1. Let $f : O \subseteq \mathcal{X} \to \mathcal{Y}$ be a locally Lipschitz continuous function on the open set $O$. We say that $f$ is semismooth at a point $x \in O$ if
(i) $f$ is directionally differentiable at $x$; and
(ii) for any $\Delta x \in \mathcal{X}$ and $V \in \partial f(x + \Delta x)$ with $\Delta x \to 0$,
$$f(x + \Delta x) - f(x) - V(\Delta x) = o(\|\Delta x\|). \tag{2.5}$$
Furthermore, if (2.5) is replaced by
$$f(x + \Delta x) - f(x) - V(\Delta x) = O(\|\Delta x\|^2), \tag{2.6}$$
then $f$ is said to be strongly semismooth at $x$.
Semismoothness was originally introduced by Mifflin [69] for functionals. Qi and Sun [81] extended the concept to vector-valued functions.
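As a simple illustration (the particular equation below is our own choice, not an example from the thesis): $x \mapsto |x|$ is strongly semismooth, and a Newton-type iteration that uses any element $V \in \partial F(x)$ retains fast local convergence. The sketch solves the piecewise linear equation $F(x) = 2x + |x| - 1 = 0$, whose root is $x = 1/3$:

```python
import numpy as np

def F(x):
    # Piecewise linear, strongly semismooth: F(x) = 2x + |x| - 1.
    return 2.0 * x + abs(x) - 1.0

def V(x):
    # An element of the generalized Jacobian dF(x) = 2 + sign(x);
    # at the kink x = 0 any V in [1, 3] works, and np.sign(0) = 0 gives V = 2.
    return 2.0 + np.sign(x)

x = -3.0                         # arbitrary starting point
for k in range(20):
    x -= F(x) / V(x)             # semismooth Newton step
    if abs(F(x)) < 1e-12:
        break
print(k, x)                      # reaches the root 1/3 in a couple of steps
```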
2.3 Tight Wavelet Frames
We introduce the notion of tight wavelet frames in the space $L^2(\mathbb{R})$, as well as some other basic concepts and notation. The space $L^2(\mathbb{R})$ is the set of all functions $f(x)$ satisfying $\|f\|_{L^2(\mathbb{R})} := (\int_{\mathbb{R}} |f(x)|^2 \, dx)^{1/2} < \infty$, and the space $\ell^2(\mathbb{Z})$ is the set of all sequences $h$ defined on $\mathbb{Z}$ satisfying $\|h\|_{\ell^2(\mathbb{Z})} := (\sum_{k \in \mathbb{Z}} |h[k]|^2)^{1/2} < \infty$.
For any function $f \in L^2(\mathbb{R})$, the dyadic dilation operator $D$ is defined by $Df(x) := \sqrt{2} f(2x)$, and the translation operator $T_a$ is defined by $T_a f(x) := f(x - a)$ for $a \in \mathbb{R}$. Given $j \in \mathbb{Z}$, we have $T_a D^j = D^j T_{2^j a}$.
For given $\Psi := \{\psi_1, \ldots, \psi_r\} \subset L^2(\mathbb{R})$, define the wavelet system by
$$X(\Psi) := \{ \psi_{\ell,j,k} : 1 \leq \ell \leq r; \; j, k \in \mathbb{Z} \},$$
where $\psi_{\ell,j,k} = D^j T_k \psi_\ell = 2^{j/2} \psi_\ell(2^j \cdot - k)$. The system $X(\Psi)$ is called a tight wavelet frame of $L^2(\mathbb{R})$ if
$$\|f\|^2_{L^2(\mathbb{R})} = \sum_{g \in X(\Psi)} |\langle f, g \rangle|^2$$
holds for all $f \in L^2(\mathbb{R})$, where $\langle \cdot, \cdot \rangle$ is the inner product in $L^2(\mathbb{R})$ and $\|\cdot\|_{L^2(\mathbb{R})} = \sqrt{\langle \cdot, \cdot \rangle}$. This is equivalent to $f = \sum_{g \in X(\Psi)} \langle f, g \rangle g$ for all $f \in L^2(\mathbb{R})$.
Note that when $X(\Psi)$ forms an orthonormal basis of $L^2(\mathbb{R})$, it is called an orthonormal wavelet basis. It is clear that an orthonormal basis is a tight frame.
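A discrete analogue may help fix ideas (a self-contained illustration, not taken from the thesis): under circular convolution, the undecimated Haar filter pair $h_0 = [1/2, 1/2]$, $h_1 = [1/2, -1/2]$ forms a Parseval (tight) frame of $\mathbb{R}^n$, so the analysis coefficients preserve energy and the adjoint operator reconstructs the signal exactly:

```python
import numpy as np

def analysis(x):
    # One level of the undecimated Haar transform: circular convolutions.
    c0 = 0.5 * (x + np.roll(x, 1))     # lowpass  h0 = [1/2,  1/2]
    c1 = 0.5 * (x - np.roll(x, 1))     # highpass h1 = [1/2, -1/2]
    return c0, c1

def synthesis(c0, c1):
    # Adjoint of the analysis operator; tightness gives exact reconstruction.
    return 0.5 * (c0 + np.roll(c0, -1)) + 0.5 * (c1 - np.roll(c1, -1))

x = np.random.default_rng(3).standard_normal(8)
c0, c1 = analysis(x)
print(np.allclose(np.sum(c0**2) + np.sum(c1**2), np.sum(x**2)))  # energy identity
print(np.allclose(synthesis(c0, c1), x))                          # perfect recovery
```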
The Fourier transform of a function $f \in L^1(\mathbb{R})$ is usually defined by
$$\hat{f}(\omega) := \int_{\mathbb{R}} f(x) e^{-i\omega x} \, dx, \quad \omega \in \mathbb{R},$$
and the corresponding inverse is
$$f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\omega) e^{i\omega x} \, d\omega, \quad x \in \mathbb{R}.$$
They can be extended to more general functions, e.g. the functions in $L^2(\mathbb{R})$. Similarly, we can define the Fourier series of a sequence $h \in \ell^2(\mathbb{Z})$ by
$$\hat{h}(\omega) := \sum_{k \in \mathbb{Z}} h[k] e^{-ik\omega}, \quad \omega \in \mathbb{R}.$$
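For instance (a standard computation, added here for concreteness), the Haar lowpass mask $h_0$ with $h_0[0] = h_0[1] = 1/2$ has Fourier series
$$\hat{h}_0(\omega) = \tfrac{1}{2}\left(1 + e^{-i\omega}\right) = e^{-i\omega/2} \cos(\omega/2), \qquad |\hat{h}_0(\omega)|^2 = \cos^2(\omega/2),$$
and the corresponding highpass mask $h_1[0] = -h_1[1] = 1/2$ gives $|\hat{h}_1(\omega)|^2 = \sin^2(\omega/2)$, so that $|\hat{h}_0(\omega)|^2 + |\hat{h}_1(\omega)|^2 \equiv 1$. This is the discrete identity behind tight frame constructions of the kind characterized next.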

To characterise when the wavelet system $X(\Psi)$ is a tight frame, or even an orthonormal basis, of $L^2(\mathbb{R})$ in terms of its generators $\Psi$, the dual Gramian analysis [86] is used in [87].
Theorem 2.3.1. The wavelet system $X(\Psi)$ is a tight frame of $L^2(\mathbb{R})$ if and only if the identities
$$\sum_{\psi \in \Psi} \sum_{k \in \mathbb{Z}} |\hat{\psi}(2^k \xi)|^2 = 1; \qquad \sum_{\psi \in \Psi} \sum_{k=0}^{\infty} \hat{\psi}(2^k \xi) \, \overline{\hat{\psi}\left(2^k (\xi + (2j+1)2\pi)\right)} = 0, \quad j \in \mathbb{Z}, \tag{2.7}$$
hold for almost every $\xi \in \mathbb{R}$.
