Machine Learning Summer School, Ile de Ré - Learning with sparsity-inducing norms (slides)


Learning with sparsity-inducing norms
Francis Bach
INRIA - Ecole Normale Supérieure
MLSS 2008 - Ile de Ré, 2008

Supervised learning and regularization

• Data: x_i ∈ X, y_i ∈ Y, i = 1, ..., n
• Minimize with respect to function f ∈ F:

      ∑_{i=1}^n ℓ(y_i, f(x_i)) + (λ/2) ‖f‖²

      Error on data  +  Regularization
      (loss & function space?)   (norm?)

• Two issues:
  – Loss
  – Function space / norm
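A minimal sketch of this objective (not from the slides): choosing a linear function f(x) = wᵀx, the quadratic loss, and the squared Euclidean norm gives ridge regression, which has a closed-form minimizer. The function names below are illustrative only.

```python
import numpy as np

def regularized_risk(w, X, y, lam):
    """sum_i l(y_i, f(x_i)) + (lam/2) * ||f||^2 for f(x) = w'x and the quadratic loss."""
    residuals = y - X @ w
    data_fit = 0.5 * np.sum(residuals ** 2)   # error on data
    penalty = 0.5 * lam * np.dot(w, w)        # regularization by the squared Euclidean norm
    return data_fit + penalty

def ridge_fit(X, y, lam):
    """Closed-form minimizer of the objective above (ridge regression)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```
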
Usual losses [SS01, STC04]

• Regression: y ∈ R, prediction ŷ = f(x)
  – quadratic cost ℓ(y, f(x)) = (1/2)(y − f(x))²
• Classification: y ∈ {−1, 1}, prediction ŷ = sign(f(x))
  – loss of the form ℓ(y, f(x)) = ℓ(y f(x))
  – "True" cost: ℓ(y f(x)) = 1_{y f(x) < 0}
  – Usual convex costs:

  [Figure: 0−1, hinge, square and logistic losses plotted as functions of the margin y f(x)]
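A minimal sketch of these losses written as functions of the margin u = y f(x) (Python, for illustration; the 1/2 factor on the square loss follows the slides' convention):

```python
import numpy as np

# Classification losses as functions of the margin u = y * f(x).
def zero_one(u):
    return (u < 0).astype(float)          # "true" cost 1_{y f(x) < 0}

def hinge(u):
    return np.maximum(0.0, 1.0 - u)       # hinge loss (support vector machines)

def square(u):
    return 0.5 * (1.0 - u) ** 2           # (1/2)(y - f(x))^2 = (1/2)(1 - u)^2 when y is in {-1, 1}

def logistic(u):
    return np.log1p(np.exp(-u))           # logistic loss
```
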
Regularizations

• Main goal: control the “capacity” of the learning problem
• Two main lines of work:
  1. Use Hilbertian (RKHS) norms
     – Non-parametric supervised learning and kernel methods
     – Well-developed theory [SS01, STC04, Wah90]
  2. Use “sparsity-inducing” norms
     – Main example: ℓ1-norm ‖w‖_1 = ∑_{i=1}^p |w_i|
     – Perform model selection as well as regularization
     – Often used heuristically
• Goal of the course: understand how and when to use sparsity-inducing norms
Why ℓ1-norms lead to sparsity?

• Example 1: quadratic problem in 1D, i.e. min_{x ∈ R} (1/2)x² − xy + λ|x|
• Piecewise quadratic function with a kink at zero
  – Derivative at 0⁺: g_+ = λ − y, and at 0⁻: g_− = −λ − y
  – x = 0 is the solution iff g_+ ≥ 0 and g_− ≤ 0 (i.e., |y| ≤ λ)
  – x ≥ 0 is the solution iff g_+ ≤ 0 (i.e., y ≥ λ) ⇒ x* = y − λ
  – x ≤ 0 is the solution iff g_− ≥ 0 (i.e., y ≤ −λ) ⇒ x* = y + λ
• Solution x* = sign(y)(|y| − λ)_+ = soft thresholding
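A small numerical sketch (not from the slides) implementing this closed form and checking it against a brute-force grid minimization of the 1D objective; all names are illustrative.

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form solution of min_x 0.5*x**2 - x*y + lam*|x| (soft thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Brute-force check on a grid for one value of (y, lam).
y, lam = 1.7, 0.5
grid = np.linspace(-5, 5, 200001)
objective = 0.5 * grid**2 - grid * y + lam * np.abs(grid)
x_grid = grid[np.argmin(objective)]
print(soft_threshold(y, lam), x_grid)   # both should be close to y - lam = 1.2
```
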
Why ℓ1-norms lead to sparsity?

• Example 2: isotropic quadratic problem

      min_{x ∈ R^p} ∑_{i=1}^p [ (1/2)x_i² − x_i y_i ] + λ‖x‖_1 = min_{x ∈ R^p} (1/2)xᵀx − xᵀy + λ‖x‖_1

• Solution: x_i* = sign(y_i)(|y_i| − λ)_+
• Decoupled soft thresholding
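Because the coordinates decouple, the scalar soft threshold above applies componentwise; a short vectorized sketch (illustrative), showing that components with |y_i| ≤ λ are set exactly to zero:

```python
import numpy as np

def soft_threshold_vec(y, lam):
    """Componentwise solution of the isotropic problem: x_i = sign(y_i) * (|y_i| - lam)_+."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

print(soft_threshold_vec(np.array([2.0, -0.3, 0.8, -1.5]), 1.0))
# -> approximately [1.0, 0.0, 0.0, -0.5]: small components are exact zeros
```
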
Why ℓ1-norms lead to sparsity?

• Example 3: general quadratic problem
  – coupled soft thresholding
• Geometric interpretation
  – NB: penalizing is “equivalent” to constraining
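One way to see the coupling (an illustrative sketch under stated assumptions, not necessarily the algorithms discussed later in the course): for a general quadratic min_x (1/2)xᵀQx − xᵀy + λ‖x‖_1, cyclic coordinate descent updates each coordinate by a soft threshold whose argument depends on all the other coordinates.

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def lasso_coordinate_descent(Q, y, lam, n_iter=100):
    """Minimize 0.5*x'Qx - x'y + lam*||x||_1 by cyclic coordinate descent.

    Assumes Q is symmetric with a strictly positive diagonal (e.g. Q = X'X with
    no zero columns). Each coordinate update is a soft threshold, but its
    argument involves the other coordinates through the off-diagonal entries
    of Q -- the "coupling".
    """
    p = len(y)
    x = np.zeros(p)
    for _ in range(n_iter):
        for i in range(p):
            # Everything in the i-th gradient component except the Q[i, i] * x[i] term.
            r_i = y[i] - Q[i, :] @ x + Q[i, i] * x[i]
            x[i] = soft_threshold(r_i, lam) / Q[i, i]
    return x
```
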
Course Outline

1. ℓ1-norm regularization
   • Review of nonsmooth optimization problems and algorithms
   • Algorithms for the Lasso (generic or dedicated)
   • Examples
2. Extensions
   • Group Lasso and multiple kernel learning (MKL) + case study
   • Sparse methods for matrices
   • Sparse PCA
3. Theory - consistency of pattern selection
   • Low- and high-dimensional settings
   • Links with compressed sensing
ℓ1-norm regularization

• Data: covariates x_i ∈ R^p, responses y_i ∈ Y, i = 1, ..., n, given in the vector y ∈ R^n and the matrix X ∈ R^{n×p}
• Minimize with respect to the loadings/weights w ∈ R^p:

      ∑_{i=1}^n ℓ(y_i, wᵀx_i) + λ‖w‖_1

      Error on data  +  Regularization

• Including a constant term b?
• Assumptions on the loss:
  – convex and differentiable in the second variable
  – NB: with the square loss ⇒ basis pursuit (signal processing) [CDS01], Lasso (statistics/machine learning) [Tib96]
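Since the slides point to the Lasso, here is a minimal usage sketch with scikit-learn (an assumption of this note, not part of the slides) illustrating the model-selection effect: most coefficients are driven exactly to zero. Note that scikit-learn's `Lasso` scales the data-fitting term by 1/(2n), so its `alpha` roughly corresponds to λ/n in the notation above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]                 # only three relevant covariates
y = X @ w_true + 0.1 * rng.standard_normal(n)

# scikit-learn minimizes (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1
model = Lasso(alpha=0.1).fit(X, y)
print(np.nonzero(model.coef_)[0])             # indices of non-zero coefficients, typically (close to) the first three
```
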
A review of nonsmooth convex analysis and optimization

• Analysis: optimality conditions
• Optimization: algorithms
  – First-order methods
  – Second-order methods
• Books: Boyd & Vandenberghe [BV03], Bonnans et al. [BGLS03], Nocedal & Wright [NW06], Borwein & Lewis [BL00]
