To classify scientific and technical literature by its structural characteristics, this paper uses association-rule-based classification to divide documents into three types: review, theoretical, and applied. First, the literature data are preprocessed, including word segmentation. Next, the PredictiveApriori association algorithm mines frequent itemsets of class feature terms, which are used to construct the document classifier. The documents to be classified are then matched against the classification rules to determine their categories. Finally, experiments evaluate the classification performance, and a comparison demonstrates the effectiveness of the method.
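The pipeline summarized above (mine class rules from feature terms, then match documents against them) can be illustrated with a minimal sketch. It is not the paper's implementation: it substitutes a plain support/confidence enumeration for PredictiveApriori, and the toy corpus, the thresholds, and the names `TRAIN`, `mine_rules`, and `classify` are all assumptions made for illustration. Word segmentation is assumed to have already produced the term sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: (set of feature terms, class label) per document.
# Real input would come from word segmentation and feature selection.
TRAIN = [
    ({"survey", "overview", "progress"}, "review"),
    ({"survey", "trends", "overview"}, "review"),
    ({"theorem", "proof", "model"}, "theoretical"),
    ({"proof", "lemma", "model"}, "theoretical"),
    ({"system", "implementation", "experiment"}, "applied"),
    ({"system", "deployment", "experiment"}, "applied"),
]


def mine_rules(docs, min_support=2, min_confidence=0.8, max_len=2):
    """Mine rules of the form term set -> label.

    Assumption: simple support/confidence filtering stands in for
    PredictiveApriori, which ranks rules by predictive accuracy instead.
    """
    itemset_counts = Counter()  # documents containing the term set
    rule_counts = Counter()     # documents containing the term set with a label
    for terms, label in docs:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(terms), k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), count in rule_counts.items():
        confidence = count / itemset_counts[itemset]
        if count >= min_support and confidence >= min_confidence:
            rules.append((frozenset(itemset), label, confidence, count))

    # Prefer higher confidence, then higher support, then longer antecedents.
    rules.sort(key=lambda r: (-r[2], -r[3], -len(r[0])))
    return rules


def classify(terms, rules, default=None):
    """Return the label of the first (best-ranked) rule the document satisfies."""
    for antecedent, label, _confidence, _support in rules:
        if antecedent <= terms:
            return label
    return default


RULES = mine_rules(TRAIN)
```

Calling `classify({"survey", "overview", "method"}, RULES)` returns the label of the best-ranked rule whose terms all occur in the document, and a document that matches no rule falls back to `default`. A real system would typically replace that fallback with a majority class or a secondary classifier.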