A Study of the Techniques of Automatic Abstracting and Knowledge Acquisition Systems

Source: The Journal of China Universities of Posts and Telecommunica
ing; automatic knowledge acquisition; machine learning; natural language processing

Abstract

One of the most important signs of the information society is the explosion of information. Information on the Internet is unorganized and mostly written in natural language, so it has to be handled with natural language processing techniques. When you search for specific information through a search engine, the sheer number of results it returns can be overwhelming. If the search engine embeds an Automatic Abstracting (AA) system, however, you can locate the information quickly, or take in more information within a limited time. AA technology is therefore valuable both scientifically and in practice.

The work of this thesis began when we took over a project called “The Key Technology Research of Computer Networks Providing Intelligent Information Services”, which belongs to the national 863 plan. One of its tasks is “The Key Technology Research of Automatic Abstracting Systems of Chinese Text”. As a member of this research group, I took part in designing and implementing an AA system called Literature Abstract and Digest Information Extract System (LADIES). I have worked in this field ever since, and this paper sums up that work.

The main topic of the thesis is AA technology, treated in two parts. The first concerns understanding-based AA systems; the second investigates Automatic Knowledge Acquisition (AKA) in AA systems.

In the first part, AA technology is introduced and an understanding-based AA model is put forward. LADIES is implemented on the basis of this model. It has two major features: (1) it understands text using the grammatical, semantic and pragmatic information of words; (2) it groups words into relatively independent units using chunking rules, which take the place of syntactic analysis rules.
The results demonstrate that LADIES performs better than statistics-based AA systems. However, its application is limited by its knowledge bases, and it is difficult to use in other fields because those knowledge bases are set up manually. We therefore investigate techniques of automatic knowledge acquisition in order to solve these problems to some extent.

In the second part, we introduce the basic ideas of AKA and some of the Machine Learning (ML) methods that AKA applies. We then propose a comprehensive dictionary model that contains the grammatical, semantic and pragmatic information of words, and investigate a strategy for automatically learning pragmatic information for words. We also put forward a strategy for automatically learning rules that identify salient sentences in texts and, based on it, establish an AA system, LADIES NEW. Finally, we suggest an AKA-based AA system model called the hierarchical feature extracting AA system model.
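
To make the dictionary model and the salient-sentence idea more concrete, here is a minimal Python sketch. It is not the LADIES or LADIES NEW implementation: the field names, the toy entries and weights, the naive sentence splitter and the additive scoring rule are all assumptions made for illustration, and the weights are written by hand here whereas the abstract describes such information as being learned automatically.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """One dictionary entry. All field names are hypothetical."""
    word: str
    pos: str             # grammatical information, e.g. part of speech
    semantic_class: str  # semantic information, e.g. a coarse concept label
    salience: float      # pragmatic information, used here as a cue weight


# Toy dictionary invented for this example only.
DICTIONARY = {
    entry.word: entry
    for entry in [
        LexicalEntry("propose", "verb", "communication", 2.0),
        LexicalEntry("results", "noun", "outcome", 1.5),
        LexicalEntry("system", "noun", "artifact", 1.0),
        LexicalEntry("weather", "noun", "phenomenon", 0.0),
    ]
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(sentence):
    """Sum the salience weights of dictionary words found in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(DICTIONARY[t].salience for t in tokens if t in DICTIONARY)


def abstract(text, k=1):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i]),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    sample = "The weather was fine. We propose a new system. The results are good."
    print(abstract(sample, k=2))
```

Keeping the grammatical, semantic and pragmatic fields in one entry mirrors the "comprehensive dictionary" wording of the abstract. An acquisition component would, under this sketch's assumptions, update the `salience` values from data instead of relying on hand-set constants.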