A fast k-means implementation using coresets

G. Frahling, C. Sohler, in: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry  - SCG ’06, 2006.

Download
No fulltext has been uploaded.
Conference Paper | Published | English
Author
Frahling, Gereon; Sohler, Christian
Abstract
In this paper we develop an efficient implementation for a k-means clustering algorithm. Our algorithm is a variant of KMHybrid [28, 20], i.e. it uses a combination of Lloyd-steps and random swaps, but as a novel feature it uses coresets to speed up the algorithm. A coreset is a small weighted set of points that approximates the original point set with respect to the considered problem. The main strength of the algorithm is that it can quickly determine clusterings of the same point set for many values of k. This is necessary in many applications, since, typically, one does not know a good value for k in advance. Once we have clusterings for many different values of k we can determine a good choice of k using a quality measure of clusterings that is independent of k, for example the average silhouette coefficient. The average silhouette coefficient can be approximated using coresets.To evaluate the performance of our algorithm we compare it with algorithm KMHybrid [28] on typical 3D data sets for an image compression application and on artificially created instances. Our data sets consist of 300,000 to 4.9 million points. We show that our algorithm significantly outperforms KMHybrid on most of these input instances. Additionally, the quality of the solutions computed by our algorithm deviates less than that of KMHybrid.We also computed clusterings and approximate average silhouette coefficient for k=1,…,100 for our input instances and discuss the performance of our algorithm in detail.
Publishing Year
Proceedings Title
Proceedings of the twenty-second annual symposium on Computational geometry - SCG '06
LibreCat-ID

Cite this

Frahling G, Sohler C. A fast k-means implementation using coresets. In: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry  - SCG ’06. ; 2006. doi:10.1145/1137856.1137879
Frahling, G., & Sohler, C. (2006). A fast k-means implementation using coresets. In Proceedings of the twenty-second annual symposium on Computational geometry  - SCG ’06. https://doi.org/10.1145/1137856.1137879
@inproceedings{Frahling_Sohler_2006, title={A fast k-means implementation using coresets}, DOI={10.1145/1137856.1137879}, booktitle={Proceedings of the twenty-second annual symposium on Computational geometry  - SCG ’06}, author={Frahling, Gereon and Sohler, Christian}, year={2006} }
Frahling, Gereon, and Christian Sohler. “A Fast K-Means Implementation Using Coresets.” In Proceedings of the Twenty-Second Annual Symposium on Computational Geometry  - SCG ’06, 2006. https://doi.org/10.1145/1137856.1137879.
G. Frahling and C. Sohler, “A fast k-means implementation using coresets,” in Proceedings of the twenty-second annual symposium on Computational geometry  - SCG ’06, 2006.
Frahling, Gereon, and Christian Sohler. “A Fast K-Means Implementation Using Coresets.” Proceedings of the Twenty-Second Annual Symposium on Computational Geometry  - SCG ’06, 2006, doi:10.1145/1137856.1137879.

Export

Marked Publications

Open Data LibreCat

Search this title in

Google Scholar