evoStream — Evolutionary Stream Clustering Utilizing Idle Times
M. Carnein, H. Trautmann, Big Data Research 14 (2018) 101–111.
Journal Article | English
Author
Carnein, Matthias;
Trautmann, Heike
Abstract
Clustering is an important field in data mining that aims to reveal hidden patterns in data sets. It is widely used in marketing and medical applications to identify groups of similar objects. Clustering possibly unbounded and evolving data streams is of particular interest due to the widespread deployment of large and fast data sources such as sensors. The vast majority of stream clustering algorithms employ a two-phase approach in which the stream is first summarized in an online phase. Upon request, an offline phase reclusters the aggregations into the final clusters. In this setup, the online component idles and waits for the next observation whenever the stream is slow. This paper proposes a new stream clustering algorithm called evoStream, which performs evolutionary optimization in the idle times of the online phase to incrementally build and refine the final clusters. Since the online phase would otherwise idle, our approach does not reduce the processing speed while effectively removing the computational overhead of the offline phase. In extensive experiments on real data streams, we show that the proposed algorithm can output clusters of high quality at any time within the stream without the need for additional computational resources.
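The idea described in the abstract can be illustrated with a small sketch. This is not the published evoStream algorithm: the class name `IdleTimeClusterer`, its parameters (`k`, `radius`, `pop_size`, `seed`), the micro-cluster update rule, and the crossover and mutation operators are all assumptions chosen for brevity. It only shows the general pattern of summarizing points online and spending idle time on one evolutionary generation over candidate sets of final cluster centers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IdleTimeClusterer:
    """Toy two-phase stream clusterer (hypothetical, not the paper's method)."""

    def __init__(self, k, radius, pop_size=10, seed=0):
        self.k = k                  # number of final clusters
        self.radius = radius        # absorption radius of a micro-cluster
        self.pop_size = pop_size    # candidate solutions kept (must be >= 2)
        self.rng = random.Random(seed)
        self.micro = []             # list of [center, weight]
        self.population = []        # each candidate is a list of k centers

    def insert(self, point):
        """Online phase: absorb the point into the nearest micro-cluster."""
        point = tuple(point)
        if self.micro:
            mc = min(self.micro, key=lambda m: sq_dist(m[0], point))
            if sq_dist(mc[0], point) <= self.radius ** 2:
                w = mc[1]
                # Move the center to the weighted mean including the new point.
                mc[0] = tuple((c * w + p) / (w + 1) for c, p in zip(mc[0], point))
                mc[1] = w + 1
                return
        self.micro.append([point, 1])

    def cost(self, centers):
        """Weighted sum of squared distances from micro-clusters to centers."""
        return sum(w * min(sq_dist(c, z) for z in centers) for c, w in self.micro)

    def evolve(self):
        """Idle phase: run one generation; return False if too little data."""
        if len(self.micro) < self.k:
            return False
        if not self.population:
            pts = [m[0] for m in self.micro]
            self.population = [self.rng.sample(pts, self.k)
                               for _ in range(self.pop_size)]
        p1, p2 = self.rng.sample(self.population, 2)
        # Uniform crossover followed by a small Gaussian mutation.
        child = [self.rng.choice(pair) for pair in zip(p1, p2)]
        child = [tuple(x + self.rng.gauss(0, 0.1 * self.radius) for x in c)
                 for c in child]
        # Replace the worst candidate only if the child is strictly better.
        worst = max(range(len(self.population)),
                    key=lambda i: self.cost(self.population[i]))
        if self.cost(child) < self.cost(self.population[worst]):
            self.population[worst] = child
        return True

    def best(self):
        """Current best set of final cluster centers."""
        return min(self.population, key=self.cost)
```

A caller would invoke `insert` whenever an observation arrives and `evolve` whenever the stream is idle, so `best` can be read at any time without a separate offline reclustering step. Because a child only ever replaces the worst candidate, the best cost never gets worse between insertions in this sketch.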
Publishing Year
2018
Journal Title
Big Data Research
Volume
14
Page
101–111
LibreCat-ID
Cite this
Carnein M, Trautmann H. evoStream — Evolutionary Stream Clustering Utilizing Idle Times. Big Data Research. 2018;14:101–111. doi:10.1016/j.bdr.2018.05.005
Carnein, M., & Trautmann, H. (2018). evoStream — Evolutionary Stream Clustering Utilizing Idle Times. Big Data Research, 14, 101–111. https://doi.org/10.1016/j.bdr.2018.05.005
@article{Carnein_Trautmann_2018, title={evoStream — Evolutionary Stream Clustering Utilizing Idle Times}, volume={14}, DOI={10.1016/j.bdr.2018.05.005}, journal={Big Data Research}, author={Carnein, Matthias and Trautmann, Heike}, year={2018}, pages={101–111} }
Carnein, Matthias, and Heike Trautmann. “EvoStream — Evolutionary Stream Clustering Utilizing Idle Times.” Big Data Research 14 (2018): 101–111. https://doi.org/10.1016/j.bdr.2018.05.005.
M. Carnein and H. Trautmann, “evoStream — Evolutionary Stream Clustering Utilizing Idle Times,” Big Data Research, vol. 14, pp. 101–111, 2018, doi: 10.1016/j.bdr.2018.05.005.
Carnein, Matthias, and Heike Trautmann. “EvoStream — Evolutionary Stream Clustering Utilizing Idle Times.” Big Data Research, vol. 14, 2018, pp. 101–111, doi:10.1016/j.bdr.2018.05.005.