Taking web information retrieval as its setting, this paper examines, both theoretically and empirically, the relationship between the occurrence probability of a single keyword and the amount of information it carries. It analyzes how search terms of different probabilities differ in the amount of information they contribute to expressing an information need and, building on a multi-dimensional description of information needs, studies the crowding-out effect that high-frequency keywords exert on low-frequency keywords in terms of need information. To counter this crowding-out effect, it proposes a retrieval relevance measurement scheme that groups keywords into categories and removes duplicates, drawing on the inter-term relationships defined in a thesaurus.
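The link between keyword probability and information amount, and the idea of de-duplicating keywords by category, can be illustrated with a minimal sketch. It assumes that "amount of information" means Shannon self-information, -log2(p), which the abstract does not state explicitly. The function names, the `category_of` mapping (standing in for thesaurus term relationships), and the rule of counting each category once by its most informative term are all hypothetical choices made for illustration, not the paper's actual scheme.

```python
import math


def self_information(p: float) -> float:
    """Shannon self-information, in bits, of a keyword with occurrence probability p.

    Assumption: the abstract's "amount of information" is taken to be -log2(p).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def deduplicated_relevance(query_terms, doc_terms, category_of, prob_of):
    """Hypothetical relevance score: each thesaurus category counts only once.

    category_of maps a keyword to its category (a stand-in for thesaurus
    relationships); keywords without an entry form their own category.
    prob_of maps a keyword to its occurrence probability.
    """
    doc_categories = {category_of.get(t, t) for t in doc_terms}
    best_per_category = {}
    for term in query_terms:
        category = category_of.get(term, term)
        if category in doc_categories:
            info = self_information(prob_of[term])
            # Keep only the most informative term of each category, so that
            # several near-synonymous keywords cannot be counted repeatedly.
            previous = best_per_category.get(category, 0.0)
            best_per_category[category] = max(previous, info)
    return sum(best_per_category.values())


# Illustrative data only; these probabilities are invented for the example.
category_of = {"car": "vehicle", "automobile": "vehicle"}
prob_of = {"car": 0.5, "automobile": 0.25, "carburetor": 0.125}
query = ["car", "automobile", "carburetor"]
document = ["car", "carburetor"]
score = deduplicated_relevance(query, document, category_of, prob_of)
```

In this toy data, "car" and "automobile" share one category, so only one of them contributes to the score, while the rarer "carburetor" contributes its full self-information. Taking the maximum within a category is just one possible design; a different aggregation could be substituted without changing the overall idea.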