With the rapid growth of the Internet, the volume of Web information has increased sharply, and Web information retrieval has become correspondingly more difficult. This paper proposes using techniques such as query sampling and Web resource classification to automatically build hierarchical, category-based resource features similar to the Yahoo! directory, and to establish a tree-structured retrieval system over these Web resource features. Because sampling only needs to collect a portion of the information in each category, it avoids the heavy waste of resources that collecting everything would incur. At retrieval time, the resource features are used to select the resources a query needs. An interoperability mechanism then remotely invokes each selected resource's own full-text index to carry out the search, which improves both recall and precision. Because only relevant Web resources are selected and less relevant ones are discarded, query efficiency improves as well.
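The resource selection step can be illustrated with a minimal sketch. It is not the paper's implementation: the term-frequency scoring, the function names (`build_profile`, `score`, `select_resources`), and the toy sample data are all assumptions made for illustration. Each resource is summarized by term counts over a small sample of its documents, and a query is routed only to the resources whose summary matches it.

```python
from collections import Counter


def build_profile(sampled_docs):
    """Summarize a resource by term counts over a sample of its documents."""
    profile = Counter()
    for doc in sampled_docs:
        profile.update(doc.lower().split())
    return profile


def score(profile, query):
    """Share of the sampled term occurrences that match a query term (assumed scoring)."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(profile[term] for term in query.lower().split()) / total


def select_resources(profiles, query, top_k=1):
    """Return up to top_k resource names with a non-zero score, best first."""
    ranked = sorted(profiles, key=lambda name: score(profiles[name], query), reverse=True)
    return [name for name in ranked[:top_k] if score(profiles[name], query) > 0]


# Hypothetical samples: only a few documents per resource are collected.
samples = {
    "sports": ["football match score", "tennis match final"],
    "finance": ["stock market index", "bank interest rate"],
}
profiles = {name: build_profile(docs) for name, docs in samples.items()}
selected = select_resources(profiles, "stock market")
```

In a full system, the query would then be forwarded only to the resources in `selected`, each of which answers it using its own full-text index. Resources that score zero are never contacted.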