Massive High-Dimensional Big Data Feature Selection Algorithm in a Cloud Computing Environment
Abstract
Because high-dimensional spaces contain many redundant features, the distances between sample points become almost equal across the whole feature space. Existing algorithms fail to fully retain and exploit the key features of the original data and therefore do not achieve higher classification prediction accuracy. This paper proposes a PageRank-based feature selection algorithm for high-dimensional big data. Massive high-dimensional data are first screened in the cloud computing environment against a preset information entropy threshold: attributes that carry almost no useful information are discarded, which completes the dimensionality reduction. The sample is then determined from the reduced data. Each feature in the sample is treated as a network node, and edges between nodes are created according to mutual information. PageRank is used to evaluate the global redundancy of the nodes, the nodes are sorted by this criterion, and the first g features in the resulting sequence form the optimal feature subset. Experimental results show that the proposed method produces a good visual division result and achieves higher classification prediction accuracy.
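The pipeline summarized in the abstract (entropy screening, a mutual-information graph over features, PageRank scoring, and selection of the first g features) can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: features are discrete, edge weights equal pairwise mutual information, the damping factor is 0.85, and features are sorted in ascending PageRank order so that the least redundant come first. The function names `entropy`, `mutual_information`, `pagerank`, and `select_features` are hypothetical, and no cloud or distributed execution is shown.

```python
import numpy as np


def entropy(column):
    """Shannon entropy (in bits) of a discrete feature column."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete columns."""
    joint = np.stack([x, y], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-(p * np.log2(p)).sum())
    # Clip tiny negative values caused by floating-point rounding.
    return max(0.0, entropy(x) + entropy(y) - h_xy)


def pagerank(weights, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a non-negative weight matrix."""
    n = weights.shape[0]
    col_sums = weights.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise; nodes without edges spread their rank uniformly.
    transition = np.where(col_sums > 0, weights / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_rank = (1.0 - damping) / n + damping * (transition @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def select_features(X, entropy_threshold, g):
    """Return the column indices of the g selected features."""
    # Step 1: discard attributes whose entropy falls below the threshold.
    entropies = np.array([entropy(X[:, j]) for j in range(X.shape[1])])
    kept = np.flatnonzero(entropies >= entropy_threshold)

    # Step 2: one node per remaining feature, edges weighted by
    # mutual information (assumed weighting).
    n = len(kept)
    weights = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            mi = mutual_information(X[:, kept[a]], X[:, kept[b]])
            weights[a, b] = weights[b, a] = mi

    # Step 3: PageRank as a global redundancy score. Ascending order is
    # an assumption: a low score is read as low redundancy.
    scores = pagerank(weights)
    order = np.argsort(scores, kind="stable")
    return kept[order[:g]].tolist()
```

With a matrix `X` of shape (samples, features), a call such as `select_features(X, entropy_threshold=0.5, g=10)` returns the column indices of the selected subset. The threshold and g shown here are placeholders, since the abstract does not state the values used in the experiments.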
Keywords: Cloud Computing Environment; Massive; High-Dimensional; Big Data; Feature Selection.
Cite As
X. Zheng, "Massive High-Dimensional Big Data Feature Selection Algorithm in a Cloud Computing Environment", Engineering Intelligent Systems, vol. 29, no. 5, pp. 323-330, 2021.