Natural Science Edition, 2017, Issue 05
A Semi-Supervised Feature Selection Algorithm for Imbalanced Data
Submitted by: Du Limin; Xu Yang  Date: 2018-01-16

Authors: Du Limin (杜利敏), Xu Yang (徐扬)

First author's affiliation: Intelligent Control Development Center, Southwest Jiaotong University

Abstract: To address the high feature dimensionality and the scarcity of labeled samples in imbalanced data, a semi-supervised feature selection algorithm based on a genetic algorithm (GA) and Biased-SVM is proposed. First, a Biased-SVM model, which can handle imbalanced data, is trained on the initial labeled sample set. The trained Biased-SVM model then assigns labels to the unlabeled samples, and the newly labeled samples are added to the initial labeled sample set to form a new labeled sample set. Finally, a GA-based feature selection method for imbalanced data selects the optimal feature subset. Experimental results show that, across different labeled-sample rates, the proposed method achieves a high average feature-subset reduction rate and a high average minority-class recognition rate.
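The three stages named in the abstract (train Biased-SVM on the labeled set, pseudo-label the unlabeled samples, run a GA over feature subsets) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the solver, the chromosome encoding, or the fitness function, so the sketch assumes a linear SVM trained by subgradient descent with class-dependent penalties `c_pos` and `c_neg`, a bit-mask encoding, and a fitness equal to the G-mean of the two class recalls minus a subset-size penalty `alpha`. All function and parameter names are hypothetical.

```python
import math
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_biased_svm(X, y, c_pos=5.0, c_neg=1.0, epochs=200, lr=0.01, lam=0.01):
    """Linear SVM with class-dependent hinge penalties (assumed stand-in for
    Biased-SVM): c_pos for the minority class (+1), c_neg for the majority (-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1      # margin check before update
            w = [wj * (1 - lr * lam) for wj in w]     # L2 shrinkage
            if violated:
                c = c_pos if yi == 1 else c_neg
                w = [wj + lr * c * yi * xj for wj, xj in zip(w, xi)]
                b += lr * c * yi
    return w, b

def predict(model, X):
    w, b = model
    return [1 if dot(w, xi) + b >= 0 else -1 for xi in X]

def project(X, mask):
    """Keep only the features whose mask bit is 1."""
    return [[xj for xj, m in zip(xi, mask) if m] for xi in X]

def g_mean(y_true, y_pred):
    """Geometric mean of minority-class and majority-class recall."""
    recalls = []
    for cls in (1, -1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx) if idx else 0.0)
    return math.sqrt(recalls[0] * recalls[1])

def fitness(mask, X, y, alpha=0.1):
    """Assumed fitness: G-mean on the selected features minus a size penalty."""
    if not any(mask):
        return -1.0                                   # empty subsets always lose
    Xs = project(X, mask)
    model = train_biased_svm(Xs, y, epochs=20)
    return g_mean(y, predict(model, Xs)) - alpha * sum(mask) / len(mask)

def ga_select(X, y, rng, pop_size=12, generations=10, p_mut=0.1):
    """Bit-mask GA: elitism, binary tournaments, one-point crossover, bit flips."""
    d = len(X[0])
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(((fitness(m, X, y), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        nxt = [scored[0][1][:]]                       # carry the best mask forward
        while len(nxt) < pop_size:
            a = max(rng.sample(scored, 2), key=lambda t: t[0])[1]
            b = max(rng.sample(scored, 2), key=lambda t: t[0])[1]
            cut = rng.randrange(1, d) if d > 1 else 0
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ 1 if rng.random() < p_mut else bit for bit in child])
        pop = nxt
    return max(pop, key=lambda m: fitness(m, X, y))

def semi_supervised_select(X_lab, y_lab, X_unlab, seed=0):
    """Train on labeled data, pseudo-label the rest, then select features."""
    model = train_biased_svm(X_lab, y_lab)
    pseudo = predict(model, X_unlab)
    return ga_select(X_lab + X_unlab, y_lab + pseudo, random.Random(seed))
```

Calling `semi_supervised_select(X_lab, y_lab, X_unlab)` with lists of feature vectors and labels in {+1, -1} returns a 0/1 mask over the features. Setting `c_pos` larger than `c_neg` is what makes the classifier "biased" here: misclassifying a minority-class sample costs more, which pushes the decision boundary toward the majority class.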

Funding: Young Scientists Fund of the National Natural Science Foundation of China (61305074)

Keywords: genetic algorithm; Biased-SVM; imbalanced data; semi-supervised learning; feature selection

DOI:10.16186/j.cnki.1673-9787.2017.05.016

Classification number (CLC): TP18
