28
pages
Français
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Découvre YouScribe et accède à tout notre catalogue !
Découvre YouScribe et accède à tout notre catalogue !
28
pages
Français
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Publié par
Langue
Français
Mixture Models
Simplification
Comparison
Simplifying and comparing Kernel Density
Estimators
Journée Transport Optimal et Géométrie de l’information
IHP, Paris, France
Olivier Schwander Frank Nielsen
joint work with A. Sim, M. Levitt, J. Bernauer, A. Schutz, Y. Berthoumieu
École Polytechnique
Sony CSL
February 10, 2012
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models
Simplification
Comparison
Outline
Mixture Models
Statistical mixtures
Getting
Simplification
Bregman Hard Clustering
Model Hard Clustering
Experiments
Comparison
Majoring Kullback-Liebler
Other ground distances
Comparing and EMD
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models
Statistical mixtures
Simplification
Getting
Comparison
Mixture models
Mixture
P
I Pr(X = x) = ! Pr(X = xj; )i i ii
I each Pr(X = xj; ) is a probability density functioni i
Famous special case
Gaussian Mixtures Models (GMM)
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models
Statistical mixtures
Simplification
Getting
Comparison
Getting mixtures
Expectation-Maximization
Kernel density estimation
I or Parzen windows methods
I one Kernel by data point (often a Gaussian kernel)
I fixed bandwidth
Olivier Schwander Simplifying and comparing Kernel Density Estimators
2502000.00220001000.0041000.0062000.00800.0101500.01250150250100150505000.000Mixture Models
Statistical mixtures
Simplification
Getting
Comparison
Why simplification ?
A lot of components
I 120120 = 14400 Gaussian in the previous curve
KDE: good approximation but
I very large mixture: time and memory problems
I low number of components is often enough (EM)
EM: small approximation but
We may want a fixed number of components without learning a
new mixture
I EM is slow
I we don’t have the original dataset, just the model
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models Bregman Hard Clustering
Simplification Model Hard
Comparison Experiments
k-means
Olivier Schwander Simplifying and comparing Kernel Density Estimators
200250150502000.0080.0021000.00400.0060.0120.0082000.01000.01210000.006500.0101001001500200150250503002500.0001500.002500.0040.000Mixture Models Bregman Hard Clustering
Simplification Model Hard
Comparison Experiments
k-means
What do we need ?
I A distance (or a divergence, or a dissimilarity measure)
I A centroid
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models Bregman Hard Clustering
Simplification Model Hard
Comparison Experiments
Kullback-Liebler divergence
Divergence
Z
p(x)
I D(PkQ) = p(x)log dx
q(x)
I Not a symmetric divergence
Centroids
P
I Left-sided one: min ! B (x;p )x i F iiP
I Right-sided one: min ! B (p;x)x i F ii
I Various symmetrizations !
I Known in closed-form
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models Bregman Hard Clustering
Simplification Model Hard
Comparison Experiments
Fisher divergence
Riemannian metric on the statistical manifold
Fisher information matrix
@ @~ ~g = I (; ) = E logp(X ;) logp(X ;)ij i j @ @i j
XX
ds = g d dij i j
Olivier Schwander Simplifying and comparing Kernel Density EstimatorsMixture Models Bregman Hard Clustering
Simplification Model Hard
Comparison Experiments
Fisher divergence formula
Known for 0-mean Gaussian
I Not really interesting for general case mixtures...
I Open problem for others cases
For 1D data
I Poincaré hyperbolic distance in the Poincaré upper half-plane
FRD(f ;f ) =p q