Abstract:
Clustering algorithms aim to group similar data points in a dataset in an unsupervised manner. Although batch clustering algorithms achieve relatively high accuracy, they cannot make efficient use of dynamic clustering results. Because they must use the whole dataset in every calculation, they waste resources and incur a high time cost. In contrast, incremental clustering only needs to update the affected part of a model when new data arrives, so the whole dataset does not have to be reclustered each time. This property suits streaming data processing well, but it reduces the accuracy of the algorithms and cannot satisfy the low-latency requirement of real-time data processing. In response to this problem, the paper proposes a novel unified batch and streaming clustering model (UBSCM) based on streaming computation, which includes a streaming cluster feature updating mechanism (SCFUM). The Flink framework is used to implement a new streaming KMeans algorithm based on UBSCM (KMeansUBSP). Experiments on real-world datasets validate that the new streaming KMeans algorithm is effective in clustering batch and streaming data in a unified manner. © 2021, Springer Nature Singapore Pte Ltd.
ISSN: 1865-0929
Year: 2021
Volume: 1330 CCIS
Page: 639-649
Language: English
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0