K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method

Authors

  • Yongyi Li, College of Electronic and Information Engineering, Qinzhou University, Qinzhou 535011, China
  • Zhongqiang Yang, College of Electronic and Information Engineering, Qinzhou University, Qinzhou 535011, China
  • Kaixu Han, College of Electronic and Information Engineering, Qinzhou University, Qinzhou 535011, China

Abstract

With the growth of the offshore industry in the Beibu Gulf, data clustering has become an important task in intelligent ocean monitoring. However, the traditional K-means algorithm is not suitable for large-scale marine data. To address the characteristics of marine big data, a parallel K-means algorithm for big data clustering based on MapReduce (MR K-means) is proposed. First, according to the characteristics of the MapReduce framework, a partition, communication, combination and mapping (PCAM) model is established. The parallel K-means algorithm is then designed, and its execution process is analyzed. Finally, data and experimental analysis demonstrate that the MR K-means parallel algorithm reduces the time and space complexity and the data point missing rate compared with the traditional algorithm.
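
To make the idea concrete, the following is a minimal sketch of how one K-means iteration can be split into map, combine and reduce steps. It is not the authors' implementation: all function names (`nearest`, `map_combine`, `reduce_centroids`, `kmeans_mapreduce`) are hypothetical, the partitions are processed sequentially in one process rather than on a MapReduce cluster, and the rule of keeping the old centroid for an empty cluster is an assumption made for this example.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_combine(partition, centroids):
    """Map + combine on one partition: per-cluster (vector sum, point count)."""
    dim = len(centroids[0])
    partial = defaultdict(lambda: ([0.0] * dim, 0))
    for point in partition:
        j = nearest(point, centroids)
        total, count = partial[j]
        partial[j] = ([t + p for t, p in zip(total, point)], count + 1)
    return dict(partial)


def reduce_centroids(partials, centroids):
    """Reduce: merge the partial sums of all partitions into new centroids."""
    merged = {}
    for partial in partials:
        for j, (total, count) in partial.items():
            if j in merged:
                old_total, old_count = merged[j]
                merged[j] = ([a + b for a, b in zip(old_total, total)],
                             old_count + count)
            else:
                merged[j] = (total, count)
    # Assumption: a cluster that received no points keeps its old centroid.
    return [
        [t / merged[j][1] for t in merged[j][0]] if j in merged
        else list(centroids[j])
        for j in range(len(centroids))
    ]


def kmeans_mapreduce(partitions, centroids, max_iter=20, tol=1e-9):
    """Iterate map/combine/reduce until the centroids stop moving."""
    for _ in range(max_iter):
        # Sequential stand-in for map tasks that would run in parallel.
        partials = [map_combine(part, centroids) for part in partitions]
        new_centroids = reduce_centroids(partials, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

Under this reading, each partition sends only one (sum, count) pair per cluster to the reduce step instead of every assigned point, which is where a combine step would cut communication cost.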

Keywords: Clustering, K-means, Parallel, MapReduce, PCAM

Cite As

Y. Li, Z. Yang, K. Han, "K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method", Engineering Intelligent Systems, vol. 29, no. 6, pp. 411-418, 2021.

Published

2021-11-01