VC dimension theory and the structural risk minimization (SRM) principle are central to statistical learning theory. The support vector machine (SVM), which is built on this theory, has attracted attention for its good generalization performance and has been studied for text categorization. Earlier work based on polynomial kernels held that the generalization ability of an SVM is not affected by the polynomial order, that SVMs can handle very high-dimensional classification problems, and that text categorization therefore needs no feature selection. This study finds that, as the order of the polynomial kernel increases, SVM text classifiers overfit, and that the effect becomes more pronounced as the number of features grows, so feature selection is necessary. The problem is analyzed under structural risk minimization theory by estimating the VC dimension of the function set, and the conclusions agree with the experimental results.
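The abstract does not give the VC dimension estimate itself, so the following is only an illustrative sketch and not the authors' derivation. It assumes a crude capacity proxy: the number of monomials of total degree at most d in n input features, which is the dimension of the feature space induced by an inhomogeneous polynomial kernel of order d. It then plugs that proxy into the standard VC confidence term from Vapnik's risk bound. The function names `poly_feature_dim` and `vc_confidence`, the sample size, and the confidence level `eta` are all hypothetical choices made for illustration.

```python
import math


def poly_feature_dim(n_features: int, degree: int) -> int:
    """Count monomials of total degree <= degree in n_features variables.

    This equals C(n_features + degree, degree) and is used here as a
    crude capacity proxy (an assumption, not the paper's estimate).
    """
    return math.comb(n_features + degree, degree)


def vc_confidence(h: int, n_samples: int, eta: float = 0.05) -> float:
    """VC confidence term sqrt((h*(ln(2l/h) + 1) - ln(eta/4)) / l).

    The term is only informative when h is well below the sample size l,
    so h >= l is treated as a vacuous bound and reported as infinity.
    """
    if h >= n_samples:
        return math.inf
    numerator = h * (math.log(2 * n_samples / h) + 1) - math.log(eta / 4)
    return math.sqrt(numerator / n_samples)


if __name__ == "__main__":
    n_samples = 10_000  # hypothetical training-set size
    for n_features in (10, 1000):
        for degree in (1, 2, 3):
            h = poly_feature_dim(n_features, degree)
            term = vc_confidence(h, n_samples)
            print(f"features={n_features} degree={degree} h<={h} confidence={term:.3f}")
```

Under these assumptions the capacity proxy grows with both the kernel order and the number of features, and the confidence term grows with it, which matches the direction of the effect the abstract reports. Margin-based bounds for SVMs can be much tighter than this dimension count, so the numbers are indicative only.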