Lexical paraphrasing research aims to acquire paraphrases for words. Lexical paraphrases are context-dependent: the same word should receive different paraphrases in different contexts. This paper proposes a method for acquiring context-dependent lexical paraphrases. The method has two parts: acquisition of candidate paraphrases through web mining, and validation of those candidates through binary classification. Experiments on the People's Daily corpus show that: (1) the web-mining approach to candidate acquisition is feasible, yielding on average 2.3 correct paraphrases for each target word in each given context sentence; (2) binary classification is effective for paraphrase validation, reaching an F-value of 0.6023; (3) of the paraphrases extracted by this method, 75.11% and 98.31% cannot be obtained by two commonly used context-independent methods, namely the dictionary-based and the clustering-based methods, respectively. This demonstrates that the proposed context-dependent paraphrasing method can effectively complement traditional context-independent methods.
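As a reading aid for the F-value reported above, the following is a minimal sketch of how an F-measure is computed for a binary classifier. It assumes the balanced F1 form (beta = 1), which the abstract does not confirm. The function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure for the positive class, computed from confusion counts.

    tp: candidates correctly accepted as paraphrases
    fp: candidates wrongly accepted as paraphrases
    fn: true paraphrases wrongly rejected
    beta: weight of recall relative to precision (1.0 gives balanced F1;
          assumed here, since the abstract only says "F-value")
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only, not the paper's data.
score = f_measure(tp=60, fp=40, fn=40)
print(round(score, 4))
```

With equal precision and recall, the F-measure equals that shared value, so a single F-value by itself does not reveal how a classifier trades off the two.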