This paper applies all-weighted association rule (AWAR) mining to Indonesian-Chinese cross-language query expansion (CLQE). It proposes AWAR-CLQE-Miner, an algorithm for mining all-weighted association patterns between terms for cross-language query expansion, together with an Indonesian-Chinese CLQE model. Based on AWAR mining, it presents two algorithms: IC_CLPRF_AWAR, for Indonesian-Chinese cross-language query expansion with pseudo relevance feedback (PRF), and IC_CLURF_AWAR, for Indonesian-Chinese cross-language query expansion with user relevance feedback (URF). First, the original Indonesian query is translated into Chinese by a machine translation system and used for cross-language retrieval. Next, the set of relevant documents from this initial retrieval is built using PRF or URF, respectively. Finally, AWAR-CLQE-Miner mines this document set for expansion terms related to the original query, which achieves post-translation cross-language query expansion. Experiments on the NTCIR-5 CLIR standard test collection compare the proposed algorithms with existing ones. The results show that the proposed algorithms improve Indonesian-Chinese cross-language information retrieval performance and are more effective for long queries, and that IC_CLURF_AWAR achieves better retrieval performance than IC_CLPRF_AWAR.
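
The post-translation expansion step can be illustrated with a minimal Python sketch. This is not the AWAR-CLQE-Miner algorithm itself, whose support and confidence definitions are not given in the abstract. It assumes a simplified weighted support and confidence over single candidate terms, and the names `mine_expansion_terms` and `expand_query`, the thresholds, and the sample data are all hypothetical. The query is assumed to be already translated, and `feedback_docs` stands for the initial-retrieval relevant document set, built either from top-ranked results (PRF) or from user judgments (URF).

```python
from collections import defaultdict


def mine_expansion_terms(query_terms, feedback_docs,
                         min_support=0.1, min_confidence=0.5, top_k=5):
    """Mine candidate expansion terms from weighted feedback documents.

    feedback_docs: list of dicts mapping term -> weight (e.g. tf-idf).
    Assumed, simplified measures (not the paper's definitions):
      support(t)    = weight of query terms and t in docs containing both,
                      divided by the total weight of all docs
      confidence(t) = query-term weight in docs that also contain t,
                      divided by query-term weight in all docs
    Returns up to top_k (term, confidence) pairs, best first.
    """
    query = set(query_terms)
    total_weight = sum(sum(doc.values()) for doc in feedback_docs)
    query_weight = 0.0
    support_num = defaultdict(float)
    confidence_num = defaultdict(float)

    for doc in feedback_docs:
        q_w = sum(w for t, w in doc.items() if t in query)
        if q_w == 0:
            continue  # document contains no query term
        query_weight += q_w
        for term, w in doc.items():
            if term in query:
                continue
            support_num[term] += q_w + w
            confidence_num[term] += q_w

    if total_weight == 0 or query_weight == 0:
        return []

    rules = []
    for term in support_num:
        support = support_num[term] / total_weight
        confidence = confidence_num[term] / query_weight
        if support >= min_support and confidence >= min_confidence:
            rules.append((term, confidence))

    # Highest confidence first; ties broken alphabetically for determinism.
    rules.sort(key=lambda r: (-r[1], r[0]))
    return rules[:top_k]


def expand_query(query_terms, feedback_docs, **kwargs):
    """Append the mined expansion terms to the translated query."""
    mined = mine_expansion_terms(query_terms, feedback_docs, **kwargs)
    return list(query_terms) + [term for term, _ in mined]


# Hypothetical feedback set: three documents with made-up term weights.
sample_docs = [
    {"retrieval": 0.5, "query": 0.3, "expansion": 0.2},
    {"retrieval": 0.4, "expansion": 0.4, "weather": 0.2},
    {"weather": 0.6, "forecast": 0.4},
]
print(expand_query(["retrieval"], sample_docs))
```

In this sketch, PRF and URF differ only in how `feedback_docs` is chosen. The mining step is the same in both cases, which mirrors the way the abstract describes both algorithms as calling the same miner.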