This paper proposes a method for automatically indexing scientific and technical literature based on multiple filtering strategies. The method does not depend on a large-scale training corpus and can easily be embedded as a processing module in other text-processing pipelines. Experimental results confirm its feasibility. The paper also proposes a method for evaluating index terms against secondary literature. Although this evaluation method relies heavily on the quality of the abstracts and keywords given in the secondary literature, it is valuable when human and material resources are insufficient to build a high-quality test set. Developing a more reasonable and effective evaluation scheme nevertheless remains imperative.
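The evaluation idea can be illustrated with a minimal sketch. It assumes that the keywords listed in a secondary-literature record serve as the reference set and that index terms are scored by exact match after case and whitespace normalization. The abstract does not specify the matching rule or the metrics used, so the function name `score_index_terms` and the choice of precision, recall, and F1 are assumptions made for illustration only.

```python
def score_index_terms(predicted, reference):
    """Score automatically assigned index terms against reference keywords.

    Assumption: terms match only if they are identical after lowercasing
    and trimming whitespace. The paper may use a different matching rule.
    Returns a (precision, recall, f1) tuple.
    """
    pred = {t.strip().lower() for t in predicted if t.strip()}
    ref = {t.strip().lower() for t in reference if t.strip()}

    hits = len(pred & ref)  # terms found in both sets
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical data: terms produced by an indexer vs. keywords
    # taken from a secondary-literature record.
    predicted = ["automatic indexing", "Filtering", "corpus"]
    reference = ["Automatic Indexing", "filtering", "evaluation",
                 "secondary literature"]
    p, r, f = score_index_terms(predicted, reference)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

A score computed this way is only as reliable as the reference keywords, which is the dependence on secondary-literature quality that the abstract points out.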