Smaller Coresets for k Median and k Means Clustering

icon

15

pages

icon

English

icon

Documents

Lire un extrait
Lire un extrait

Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris
icon

15

pages

icon

English

icon

Documents

Lire un extrait
Lire un extrait

Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus

Smaller Coresets for k-Median and k-Means Clustering? Sariel Har-Peled† Akash Kushal‡ November 29, 2004 Abstract In this paper, we show that there exists a (k, ?)-coreset for k-median and k-means clustering of n points in IRd, which is of size independent of n. In particular, we con- struct a (k, ?)-coreset of size O(k2/?d) for k-median clustering, and of size O(k3/?d+1) for k-means clustering. 1 Introduction Clustering is a widely used technique in Computer Science with applications to unsupervised learning, classification, data mining and other fields. We study two variants of the clustering problem in the geometric setting. The geometric k-median clustering problem is the follow- ing: Given a set P of n points in IRd, compute a set of k points (i.e., medians) such that the sum of the distances of the points in P to their respective nearest median is minimized. The k-means differs from the above in that instead of the sum of distances, we minimize the sum of squares of distances. Interestingly the 1-mean is the center of mass of the points, while the 1-median problem, also known as the Fermat-Weber problem, has no such closed form.

  • such line

  • ?opt

  • clustering

  • problem has

  • smaller coresets

  • resulting batches

  • line into


Voir icon arrow

Publié par

Nombre de lectures

4

Langue

English

SmallerCoresetsfork-Medianandk-MeansClusteringSarielHar-PeledAkashKushalNovember29,2004AbstractInthispaper,weshowthatthereexistsa(k,ε)-coresetfork-medianandk-meansclusteringofnpointsinIRd,whichisofsizeindependentofn.Inparticular,wecon-structa(k,ε)-coresetofsizeO(k2d)fork-medianclustering,andofsizeO(k3d+1)fork-meansclustering.1IntroductionClusteringisawidelyusedtechniqueinComputerSciencewithapplicationstounsupervisedlearning,classification,dataminingandotherfields.Westudytwovariantsoftheclusteringprobleminthegeometricsetting.Thegeometrick-medianclusteringproblemisthefollow-ing:GivenasetPofnpointsinIRd,computeasetofkpoints(i.e.,medians)suchthatthesumofthedistancesofthepointsinPtotheirrespectivenearestmedianisminimized.Thek-meansdiffersfromtheaboveinthatinsteadofthesumofdistances,weminimizethesumofsquaresofdistances.Interestinglythe1-meanisthecenterofmassofthepoints,whilethe1-medianproblem,alsoknownastheFermat-Weberproblem,hasnosuchclosedform.Assuchtheproblemshaveusuallybeenstudiedseparatelyfromeachotherevenintheapproximatesetting.Thebasicquestionunderlyingapproximationalgorithms,iswhatportionofthedataisnecessarytocompute(approximately)acertainquantity.Thesmallerthisportionis,themoreefficienttheresultingalgorithmwouldbe.Acoresetisasmallportionofthedata,suchthatrunningaclusteringalgorithmonit,generatesaclusteringforthewholedata,whichisapproximatelyoptimal.Inparticular,asmallcoresetindicatesthataproblemiseasytoapproximate.Furthermore,itimpliesthatonecansummarizeandsketchthedataefficiently.Thisisusefulfordatabaseapplications,whereonecanstoresuchsketchesefficiently,andperformefficientclusteringonadatabase,orportionsofitusingthesketches.Seehttp://www.uiuc.edu/˜sariel/papers/04/smallcoreset/forthemostrecentversionofthispaper.DepartmentofComputerScience;UniversityofIllinois;201N.GoodwinAvenue;Urbana,IL,61801,USA;sariel@uiuc.edu;http://www.uiuc.edu/~sariel/.WorkonthispaperwaspartiallysupportedbyaNSFCAREERawardCCR-0132901.DepartmentofComputerScience;UniversityofIllinois;201N.GoodwinAvenue;Urbana,IL,61801,USA;kushal@uiuc.edu;http://www-cvr.ai.uiuc.edu/~kushal/.1
Voir icon more
Alternate Text