To address the class imbalance in software defect data, which limits classifier performance, the POSS (Pareto optimization for subset selection) feature selection algorithm and random undersampling are introduced into software defect detection, and a support vector machine (SVM) is used to build the prediction model. Experimental results show that repeated random undersampling effectively addresses the imbalance in software defect data, while POSS performs bi-directional optimization of the target subset and thereby improves classification accuracy. The results are better than those obtained with the Relief, Fisher, and MI (mutual information) feature selection algorithms.
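The two ingredients named in the abstract, random undersampling and POSS-style subset selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 0/1 label convention, the `loss` callable, and the rule that treats the empty subset and subsets of size 2k or more as infeasible are all assumptions of this sketch. In a setting like the one described, `loss` would presumably be the SVM classification error measured on an undersampled training set; here it is left as a user-supplied function.

```python
import random


def random_undersample(X, y, rng, majority=0, minority=1):
    """Keep every minority row plus an equally sized random draw of majority rows."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    keep = min_idx + rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def poss(n_features, k, loss, iterations, rng):
    """Pareto-optimization sketch: jointly minimize (loss, subset size).

    `loss` is a hypothetical callable mapping a 0/1 feature mask to a number.
    Returns the lowest-loss mask in the final population with at most k features.
    """
    def objectives(mask):
        size = sum(mask)
        # Assumption of this sketch: empty and oversized subsets are infeasible.
        infeasible = size == 0 or size >= 2 * k
        return (float("inf") if infeasible else loss(mask), size)

    def weakly_dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1]

    empty = (0,) * n_features
    population = {empty: objectives(empty)}
    for _ in range(iterations):
        parent = rng.choice(list(population))
        # Flip each bit independently with probability 1/n.
        child = tuple(bit ^ 1 if rng.random() < 1.0 / n_features else bit
                      for bit in parent)
        if child in population:
            continue
        c = objectives(child)
        # Discard the child if some member strictly dominates it.
        if any(weakly_dominates(p, c) and p != c for p in population.values()):
            continue
        # Otherwise drop members the child weakly dominates, then add it.
        population = {m: o for m, o in population.items()
                      if not weakly_dominates(c, o)}
        population[child] = c
    feasible = [(o[0], m) for m, o in population.items() if o[1] <= k]
    return min(feasible)[1]
```

One possible way to combine the two pieces is to call `random_undersample` several times with different random draws, evaluate a candidate feature mask on each balanced sample, and pass the averaged error to `poss` as `loss`. Keeping the subset size as a second objective, rather than a hard constraint, lets the search retain smaller and slightly worse subsets as stepping stones toward better ones.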