Existing classification systems usually ignore the hierarchical structure of the category system, so when classifying documents they often struggle to decide which of several closely related categories a document belongs to. Building on the vector space model, this paper proposes classifying documents top-down, level by level, following the hierarchy of the category system, with the aim of improving classification accuracy. In addition, a concept dictionary is used to map synonyms and subordinate (narrower) concepts onto a single concept word; these concept words form a very small feature set, which reduces the dimensionality of the feature vector space and hence the computational cost of the classification system. Furthermore, analysis of the category hierarchy is used to compress the feature vectors, reducing the computational cost from another direction.
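
The two main ideas in the abstract, mapping terms to concept words and descending the category hierarchy one level at a time, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy concept dictionary `CONCEPT_DICT`, the category tree `TREE`, the hand-written prototype vectors in `PROTOTYPES`, and the choice of cosine similarity over raw concept counts. The feature-vector compression step is not shown, because the abstract does not say how it is performed.

```python
import math
from collections import Counter

# Hypothetical concept dictionary: synonyms and narrower terms map to one concept word.
CONCEPT_DICT = {
    "automobile": "car", "sedan": "car", "car": "car",
    "lorry": "truck", "truck": "truck",
    "soccer": "football", "football": "football",
    "hoops": "basketball", "basketball": "basketball",
}

# Hypothetical category hierarchy: parent category -> child categories.
TREE = {
    "root": ["vehicles", "sports"],
    "vehicles": ["passenger_cars", "freight_trucks"],
    "sports": ["football_reports", "basketball_reports"],
}

# Hypothetical prototype vectors over concept words, one per category.
# In practice these would be estimated from labelled training documents.
PROTOTYPES = {
    "vehicles": Counter({"car": 1, "truck": 1}),
    "sports": Counter({"football": 1, "basketball": 1}),
    "passenger_cars": Counter({"car": 1}),
    "freight_trucks": Counter({"truck": 1}),
    "football_reports": Counter({"football": 1}),
    "basketball_reports": Counter({"basketball": 1}),
}


def to_concept_vector(tokens):
    """Count concept words; tokens missing from the dictionary are dropped."""
    return Counter(CONCEPT_DICT[t] for t in tokens if t in CONCEPT_DICT)


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, node="root"):
    """Descend the hierarchy, picking the most similar child at each level.

    Ties (including an all-zero document vector) fall to the first child listed.
    """
    vec = to_concept_vector(tokens)
    path = []
    while node in TREE:
        node = max(TREE[node], key=lambda child: cosine(vec, PROTOTYPES[child]))
        path.append(node)
    return path


if __name__ == "__main__":
    print(classify("the sedan and the automobile beat the lorry".split()))
```

In this sketch the document is compared only against the children of the node chosen so far, so sibling categories that are close to each other are separated by a comparison restricted to that branch rather than by one flat comparison against every category. The concept mapping shrinks the feature space from all surface words to four concept words in this toy setup.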