[Purpose/Significance] This paper proposes a similarity-based model for mapping patent classification categories to industry classification categories. The model is accurate, easily extensible, and efficient, and can serve as a reference for subsequent research. [Method/Process] Existing methods for mapping patent categories to industry categories are reviewed. Taking the "International Patent Classification" and the "Classification of National Economic Industries" as examples, a category mapping model is designed and tested in a mapping experiment. Cosine similarity results are processed with Z-score standardization to complete a partial mapping between the subclasses of the "International Patent Classification" and the subclasses of the "Classification of National Economic Industries", and the model is evaluated comprehensively against the trial-version concordance released by the State Intellectual Property Office. [Result/Conclusion] The model combines the standardized, concise wording of official patent classification notes with the broad coverage of large volumes of patent data, and uses natural language processing to derive patent-to-industry category mappings automatically. Compared with existing methods, it saves substantial labor costs while maintaining accuracy, allows the granularity of the mapped categories to be adjusted easily, and can be applied to other patent and industry classifications whose data meet the model's format requirements.
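The core step named in the abstract, scoring category pairs by cosine similarity and then standardizing the scores with Z-scores, can be illustrated with a minimal sketch. Everything beyond those two techniques is an assumption made for illustration and is not taken from the paper: the whitespace bag-of-words vectors, the per-patent-category standardization, the threshold of 1.0, the function names, and the toy category labels (`P1`, `I1`, ...) and descriptions.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def map_categories(patent_texts, industry_texts, threshold=1.0):
    """Map each patent category to the industry categories whose
    standardized similarity is at least `threshold` (assumed cut-off)."""
    codes = list(industry_texts)
    industry_vecs = [Counter(industry_texts[c].lower().split()) for c in codes]
    mapping = {}
    for p_code, p_text in patent_texts.items():
        p_vec = Counter(p_text.lower().split())
        sims = [cosine_similarity(p_vec, vec) for vec in industry_vecs]
        # Assumption: scores are standardized within each patent category.
        zs = z_scores(sims)
        kept = [(c, z) for c, z in zip(codes, zs) if z >= threshold]
        mapping[p_code] = sorted(kept, key=lambda pair: -pair[1])
    return mapping


# Hypothetical toy descriptions, not real classification text.
patent_texts = {
    "P1": "soil working machines for agriculture",
    "P2": "spinning yarn and textile fibres",
}
industry_texts = {
    "I1": "manufacture of agricultural machines for soil preparation",
    "I2": "textile spinning and yarn production",
    "I3": "retail sale of food",
}

mapping = map_categories(patent_texts, industry_texts)
for patent_code, matches in mapping.items():
    print(patent_code, [code for code, _ in matches])
```

Standardizing before thresholding makes the cut-off relative to each category's own score distribution rather than to an absolute similarity value, and raising or lowering `threshold` is one simple way to make the resulting mapping stricter or looser.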