A Comparative Study of the Performance of Spark-based k-Means Algorithm Based on Euclidean Distance and Manhattan Distance

author：

Cheng, Fang [1]

Indexed by：

EI Scopus

Abstract：

With the widespread use of distributed information, traditional clustering algorithms can no longer meet the demands of processing massive data in terms of either accuracy or computational efficiency, so clustering algorithms based on the Spark distributed platform have become a research hotspot. The K-Means algorithm is widely used in both academic research and business because it is easy to implement and highly scalable. However, traditional K-Means is based on Euclidean distance, which is not applicable in some scenarios, and it is inefficient when dealing with large-scale data. This study implements and tests the K-means, K-means++, and Canopy + K-means algorithms with both Euclidean distance and Manhattan distance on the Spark distributed platform and analyses the resulting performance changes. The experimental results show that introducing Manhattan distance lengthens the clustering time and that the optimisation effect changes differently for each algorithm. © 2024 IEEE.
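
To illustrate the distinction the abstract draws between the two metrics, the following is a minimal single-machine sketch of the K-means assignment step with a pluggable distance function. It is not the paper's Spark implementation; the names `euclidean`, `manhattan`, and `assign_clusters` are hypothetical and chosen only for this example, and the abstract does not say how cluster centers are updated under Manhattan distance, so only the assignment step is shown.

```python
import math


def euclidean(p, q):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign_clusters(points, centers, dist):
    """Return, for each point, the index of its nearest center under `dist`.

    Hypothetical helper for illustration: this is only the assignment
    step of K-means, not a full clustering loop.
    """
    return [
        min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        for p in points
    ]


if __name__ == "__main__":
    points = [(0, 0)]
    centers = [(3, 3), (0, 5)]
    # The same point can be assigned to different centers under each metric.
    print(assign_clusters(points, centers, euclidean))
    print(assign_clusters(points, centers, manhattan))
```

In this example the point (0, 0) is closer to (3, 3) under Euclidean distance (about 4.24 versus 5) but closer to (0, 5) under Manhattan distance (5 versus 6), so the choice of metric alone can change cluster membership.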

Keyword：

K-means clustering

Community：

[1] [Cheng, Fang] Fuzhou University, Fujian 350000, China

Reprint's Address：

To be verified

Version：

A Comparative Study of the Performance of Spark-based k-Means Algorithm Based on Euclidean Distance and Manhattan Distance
2024, Proceedings - 2024 3rd International Conference on Big Data, Information and Computer Network, BDICN 2024

Related Keywords：

Improved initial cluster center selection in K-means clustering
2014, ENGINEERING COMPUTATIONS
Development of a representative driving cycle for urban buses based on the K-means cluster method
2019, CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
Potential Sources and Transport Pathways of PM2.5 in Shanghai, China
2015, 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM)
Solid lanes extraction from mobile laser scanning point clouds
2019, Acta Geodaetica et Cartographica Sinica
Automatic extraction of fuzzy and touching leukocyte using improved FWSA K-means in peripheral blood and bone marrow cell images
2019, Journal of Computers (Taiwan)

Source ：

Year： 2024

Page： 15-20

Language： English

Affiliated Colleges：

College pending claim (data not clearly attributed to a college or department)
