Massive High-Dimensional Big Data Feature Selection Algorithm in a Cloud Computing Environment

Xiaochang Zheng

Authors

Xiaochang Zheng, Information and Network Center, Minnan Normal University, Zhangzhou 363000, China

Abstract

Because high-dimensional space contains a large number of redundant features, the distances between sample points are almost equal throughout the feature space. Current algorithms cannot fully retain and exploit the key features of the original data, and so fail to achieve higher classification prediction accuracy. This paper proposes a PageRank-based feature selection algorithm for high-dimensional big data. First, massive high-dimensional big data are screened in the cloud computing environment against a preset information entropy threshold. Attributes of the raw data that carry almost no useful information are discarded, which completes the dimensionality reduction. A sample is then determined from the reduced data. Each data feature in the sample is regarded as a network node, and edges between nodes are created according to their mutual information. The PageRank algorithm is used to evaluate the global redundancy of the network nodes, and the nodes are sorted according to this criterion. The first g features of the sorted sequence form the optimal feature subset. The experimental results show that the proposed method not only yields good visual partitioning results but also achieves higher classification prediction accuracy.
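The pipeline summarized in the abstract can be sketched in Python with NumPy. It runs in four stages: entropy-threshold screening, a mutual-information feature graph, PageRank ranking, and top-g selection.

This is a minimal single-machine sketch under stated assumptions, not the author's implementation. The following choices are all hypothetical:

- the helper names discretize, entropy, mutual_information, pagerank and select_features;
- the equal-width binning of continuous features before estimating entropy and mutual information;
- the damping factor of 0.85;
- the direction of the final sort.

The abstract does not state whether a high or a low PageRank score is preferred. Here a high score is assumed to mean high global redundancy, so features are sorted in ascending order of score. The distributed, cloud-side execution described in the paper is not modeled.

```python
import numpy as np


def discretize(column, bins=10):
    """Map a continuous column to integer bin indices 0..bins-1 (equal-width bins)."""
    edges = np.histogram_bin_edges(column, bins=bins)
    # Use interior edges only, so every value falls into one of `bins` buckets.
    return np.digitize(column, edges[1:-1])


def entropy(x):
    """Shannon entropy, in bits, of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete 1-D arrays."""
    _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    joint_entropy = float(-(p * np.log2(p)).sum())
    return entropy(x) + entropy(y) - joint_entropy


def pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a symmetric, non-negative weight matrix."""
    n = weights.shape[0]
    strength = weights.sum(axis=1, keepdims=True)
    # Row i holds the probability of stepping from node i to each neighbor;
    # isolated nodes (zero strength) jump uniformly to every node.
    rows = np.where(strength > 0, weights / np.where(strength > 0, strength, 1.0), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * rows.T @ rank
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def select_features(X, entropy_threshold, g, bins=10):
    """Return the column indices of the g selected features of X."""
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(X.shape[1])])
    # Step 1: discard attributes whose entropy falls below the threshold.
    kept = [j for j in range(Xd.shape[1]) if entropy(Xd[:, j]) >= entropy_threshold]
    m = len(kept)
    if m == 0:
        return []
    # Step 2: each kept feature is a node; edge weight = pairwise mutual information.
    W = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            W[a, b] = W[b, a] = mutual_information(Xd[:, kept[a]], Xd[:, kept[b]])
    # Step 3 (assumed direction): a higher PageRank score is read as higher
    # global redundancy, so nodes are sorted ascending and the first g are kept.
    scores = pagerank(W)
    order = np.argsort(scores, kind="stable")
    return [kept[i] for i in order[:g]]
```

For example, select_features(X, entropy_threshold=0.5, g=20) returns the column indices of 20 features from an n_samples by n_features array X. The pairwise mutual-information loop is quadratic in the number of retained features. It is therefore the natural part to distribute across workers in a cloud setting.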

Keywords: Cloud Computing Environment; Massive; High-Dimensional; Big Data; Feature Selection.

Cite As

X. Zheng, "Massive High-Dimensional Big Data Feature Selection Algorithm in a Cloud Computing Environment", Engineering Intelligent Systems, vol. 29, no. 5, pp. 323-330, 2021.

Published

2021-09-01

Issue

Vol. 29 No. 5 (2021): International Journal of Engineering Intelligent Systems

Section

General Submission

License

The submission of a paper implies that, if accepted for publication, it will not be published elsewhere in the same form, in any language, without the prior consent of the publisher. Before publication, authors are requested to assign copyright to CRL Publishing Ltd. This allows CRL to sanction photocopying, and to authorize the reprinting of issues or volumes according to demand. Authors' traditional rights will not be jeopardized by assigning Copyright in this way, as they retain the right to reuse the material following publication, and to veto third-party publication.