K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method
With the growth of the offshore industry in the Beibu Gulf, data clustering has become an important task in intelligent ocean monitoring. However, the traditional K-means algorithm is not well suited to large-scale marine data. To address the characteristics of marine big data, a MapReduce-based parallel K-means algorithm for big data clustering is proposed. First, a partition, communication, combination and mapping (PCAM) model is established according to the characteristics of the MapReduce framework. The MapReduce-based (MR) parallel K-means algorithm is then designed, and its execution process is analyzed. Finally, data and experimental analysis demonstrate that the MR K-means parallel algorithm reduces the time complexity, the space complexity, and the data point missing rate compared with the traditional algorithm.
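The map, combine, and reduce flow of a MapReduce-style K-means can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names (`map_phase`, `combine_phase`, `reduce_phase`, `mr_kmeans`), the use of plain Python lists as stand-ins for data partitions, and the convergence tolerance are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (nearest centroid index, (point, 1)) for each point in one partition."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p, 1)


def combine_phase(pairs):
    """Combine: sum coordinates and counts per centroid inside one partition."""
    partial = {}
    for idx, (p, n) in pairs:
        if idx in partial:
            s, c = partial[idx]
            partial[idx] = (tuple(a + b for a, b in zip(s, p)), c + n)
        else:
            partial[idx] = (tuple(p), n)
    return partial


def reduce_phase(partials, centroids):
    """Reduce: merge the partial sums of all partitions into new centroids."""
    totals = {}
    for partial in partials:
        for idx, (s, c) in partial.items():
            if idx in totals:
                ts, tc = totals[idx]
                totals[idx] = (tuple(a + b for a, b in zip(ts, s)), tc + c)
            else:
                totals[idx] = (s, c)
    new_centroids = []
    for i, old in enumerate(centroids):
        if i in totals:
            s, c = totals[i]
            new_centroids.append(tuple(v / c for v in s))
        else:
            # Assumption: a cluster that received no points keeps its old centroid.
            new_centroids.append(tuple(old))
    return new_centroids


def mr_kmeans(partitions, centroids, max_iter=20, tol=1e-9):
    """Iterate map -> combine -> reduce until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        partials = [combine_phase(map_phase(part, centroids)) for part in partitions]
        new_centroids = reduce_phase(partials, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Two hypothetical partitions of 2-D points and two initial centroids.
    parts = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
    print(mr_kmeans(parts, [(0, 0), (10, 10)]))
```

In this sketch the combine step sends only one (sum, count) pair per cluster from each partition to the reducer, which is the usual way to limit communication between the map and reduce stages.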
Keywords: Clustering, K-means, Parallel, MapReduce, PCAM
Y. Li, Z. Yang, K. Han, "K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method", Engineering Intelligent Systems, vol. 29, no. 6, pp. 411-418, 2021.