Objective To propose a multi-step dimensionality-reduction feature extraction method based on semi-supervised learning. Methods First, a t-test was applied to screen the sample features, giving an initial reduction in feature dimensionality. Second, a discrete wavelet transform was performed, the wavelet coefficients were ranked by relative entropy, and a new feature subset was selected. Third, principal component analysis was used to extract principal components. Finally, the semi-supervised learning algorithm BB-LLGC was used for label propagation, making full use of the discriminative information in both labeled and unlabeled samples. Results Classification accuracies of 99.13% and 97.20% were obtained on the public ovarian cancer dataset OC-WCX2b and the public prostate cancer dataset PC-H4, respectively. On the clinical breast cancer dataset BC-WCX2a from Zhejiang Cancer Hospital, a classification accuracy of 92.78% and a sensitivity of 100% were obtained. Conclusion The multi-step dimensionality-reduction feature extraction method can effectively reduce the feature dimension of SELDI mass spectrometry data, and, combined with the semi-supervised learning algorithm BB-LLGC, it achieves good classification performance.
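The abstract lists the reduction steps but not their parameters. The sketch below chains them under stated assumptions. It uses a Welch t statistic for the t-test screen and a hand-written Haar wavelet for the discrete wavelet transform. The relative-entropy score is a symmetric Kullback-Leibler divergence between per-class Gaussian fits, and PCA is computed via SVD. The wavelet family, decomposition level, relative-entropy estimator, function names, and every default value (200 t-test features, 3 levels, 50 coefficients, 10 components) are hypothetical and not taken from the paper. Supervised scores are computed on labeled samples only; the label -1 marking an unlabeled sample is also an assumed convention. The selected indices are then applied to all samples so that unlabeled ones can reach the label-propagation stage.

```python
import numpy as np


def ttest_screen(X, y, k):
    """Indices of the k features with the largest |Welch t| between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)  # guard against zero variance
    return np.argsort(-np.abs(t))[:k]


def haar_dwt(X, levels):
    """Row-wise multi-level Haar DWT (assumed wavelet); returns [cA_L, cD_L, ..., cD_1]."""
    approx, details = X.astype(float), []
    for _ in range(levels):
        if approx.shape[1] % 2:  # pad odd lengths by repeating the last column
            approx = np.hstack([approx, approx[:, -1:]])
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)  # approximation coefficients
    return np.hstack([approx] + details[::-1])


def relative_entropy_rank(C, y, k):
    """Indices of the k coefficients with the largest symmetric KL divergence,
    assuming each class is a 1-D Gaussian per coefficient."""
    a, b = C[y == 0], C[y == 1]
    v1 = a.var(axis=0, ddof=1) + 1e-12
    v2 = b.var(axis=0, ddof=1) + 1e-12
    d2 = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    # KL(p||q) + KL(q||p) for Gaussians; the log-variance terms cancel
    kl = (v1 + d2) / (2 * v2) + (v2 + d2) / (2 * v1) - 1.0
    return np.argsort(-kl)[:k]


def pca(X, n_components):
    """Scores of the centred data on its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def multistep_reduce(X, y, k_t=200, levels=3, k_re=50, n_pc=10):
    """Hypothetical chain: t-test -> DWT -> relative entropy -> PCA.
    y holds 0/1 for labelled samples and -1 for unlabelled ones (assumed convention);
    scores use labelled rows only, selections are applied to every row."""
    lab = y >= 0
    X1 = X[:, ttest_screen(X[lab], y[lab], k_t)]
    C = haar_dwt(X1, levels)
    X2 = C[:, relative_entropy_rank(C[lab], y[lab], k_re)]
    return pca(X2, n_pc)


# Synthetic stand-in for spectra: 60 samples x 1000 m/z features
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))
y = np.repeat([0, 1], 30)
X[y == 1, :20] += 2.0  # 20 informative features
y_obs = np.where(np.arange(60) % 3 == 0, y, -1)  # keep 20 labels, hide 40
Z = multistep_reduce(X, y_obs)
print(Z.shape)  # (60, 10)
```

The synthetic matrix only stands in for SELDI spectra and is not drawn from the paper's datasets. The orthonormal Haar transform preserves each row's energy, so the wavelet step reorganizes information without losing any. The actual reduction in that stage comes from the relative-entropy selection.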
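The abstract names BB-LLGC but does not describe it. The sketch below shows only standard LLGC (Learning with Local and Global Consistency) label propagation, the method of Zhou et al. on which the name suggests BB-LLGC builds. Whatever the "BB" part adds is not reproduced here. The Gaussian affinity, `sigma=1.0`, `alpha=0.99`, and the function name `llgc` are illustrative assumptions. In the pipeline described in the abstract, `X` would hold the principal-component scores of all samples, labeled and unlabeled.

```python
import numpy as np


def llgc(X, y, sigma=1.0, alpha=0.99):
    """Standard LLGC label propagation (not the paper's BB variant).
    y: integer labels 0..c-1 for labelled samples, -1 for unlabelled ones.
    Returns a predicted label for every sample."""
    # Gaussian affinity matrix with zero diagonal (assumed kernel)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # symmetric normalisation S = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # one-hot initial label matrix Y; unlabelled rows stay all-zero
    Y = np.zeros((len(y), int(y.max()) + 1))
    idx = np.flatnonzero(y >= 0)
    Y[idx, y[idx]] = 1.0
    # fixed point of F <- alpha*S*F + (1 - alpha)*Y, up to a constant factor
    F = np.linalg.solve(np.eye(len(y)) - alpha * S, Y)
    return F.argmax(axis=1)


# Two well-separated synthetic clusters with one labelled sample each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)), rng.normal(6.0, 0.5, size=(20, 2))])
y_true = np.repeat([0, 1], 20)
y_obs = np.full(40, -1)
y_obs[[0, 20]] = y_true[[0, 20]]
pred = llgc(X, y_obs)
print("accuracy on all samples:", (pred == y_true).mean())
```

Solving the linear system gives the closed-form limit of the iterative propagation, because the constant factor (1 - alpha) does not change the argmax. For large sample counts, iterating the update avoids the cubic-cost solve.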