Time: 2021-05-10
doi:10.16186/j.cnki.1673-9787.2020040086
Received:2019/04/23
Revised:2019/05/29
Published:2021/05/15
Decision tree construction algorithm based on attribute dispersion and feature measurement
WANG Lei, LIU Yu, LIU Zhizhong, QI Junyan
College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, Henan, China
Abstract: To address multi-valued attribute bias, poor handling of continuous attributes, and high time complexity in decision tree algorithms based on information entropy, a decision tree feature measurement method based on the concept of dispersion ratio was proposed. First, the K-means clustering algorithm was used to discretize continuous numerical attributes. Then, the weight ratio of an attribute within each class and its weight ratio over the entire set of condition attributes were used to calculate the attribute's dispersion ratio, which avoided the complex logarithmic operations of entropy calculation. Finally, the topological structure among the feature attributes was determined according to the magnitude of their dispersion ratios, completing the construction of the tree. The experimental results showed that, compared with two improved decision tree algorithms, K_C4.5 and Id3_improved, the dispersion-based algorithm could resolve multi-valued attribute bias more effectively than attribute segmentation algorithms, reduce time complexity, and further improve classification performance on continuous data sets.
Key words: decision tree; attribute dispersion; dispersion ratio; K-means
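
The discretization step mentioned in the abstract (using K-means to turn a continuous attribute into a small number of discrete values) can be illustrated with a minimal one-dimensional sketch. The abstract does not state how the number of clusters or the initial centers are chosen, so the function name `kmeans_1d`, the parameter `k`, and the evenly spaced initialization below are assumptions made purely for illustration and should not be read as the authors' implementation. The dispersion ratio itself is not sketched, because its formula is not given here.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D numeric values with Lloyd's algorithm.

    Returns (labels, centers); labels[i] is the cluster index of values[i]
    and can serve as the discretized value of a continuous attribute.
    """
    distinct = sorted(set(values))
    n = len(distinct)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distinct values")

    # Assumption: start from centers spread evenly over the sorted distinct values.
    if k == 1:
        centers = [distinct[0]]
    else:
        centers = [distinct[(i * (n - 1)) // (k - 1)] for i in range(k)]

    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]

        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[j])

        if new_centers == centers:  # converged
            break
        centers = new_centers

    return labels, centers


# Hypothetical continuous attribute with three visible groups of values.
attribute = [1.0, 1.2, 0.8, 5.0, 5.5, 4.9, 9.8, 10.1, 10.4]
labels, centers = kmeans_1d(attribute, k=3)
print(labels)
print(centers)
```

After this step, each record carries a cluster index in place of the raw numeric value, so the attribute can be handled like any other discrete condition attribute when the tree is built.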