Error Analysis of English-Chinese Machine Translation

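Source: The 15th China National Conference on Computational Linguistics (CCL 2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD) | Times cited: 0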
Abstract
In order to explore a practical way to improve machine translation (MT) quality, the types and distribution of errors in MT output must first be analyzed. This paper analyzes English-Chinese MT errors from the perspective of the naming-telling clause (hereafter, NT clause). MT output was obtained from two kinds of input: in the first, whole original English sentences were fed into an MT engine; in the second, English sentences were parsed into English NT clauses, which were then fed into the MT engine in order. Errors in the MT output are categorized into three classes: incorrect lexical choices, structural errors, and component omissions. Structural errors are further divided into SV-structure errors and non-SV-structure errors. The analysis shows, first, that structural errors are the major error type, with non-SV-structure errors accounting for the larger proportion; and second, that translation errors decrease significantly once English sentences are parsed into NT clauses. These results indicate that non-SV clauses are the main source of MT errors, and suggest that long English sentences should be parsed into NT clauses before translation.
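The error taxonomy and the two-condition comparison described above can be sketched as a small tallying routine. This is a minimal illustration only, assuming each error in the MT output has already been hand-annotated with exactly one label. The names `ErrorType`, `STRUCTURAL`, `distribution`, `structural_share`, and `compare_inputs` are hypothetical, and the sample annotations in the `__main__` block are invented for demonstration; they are not the paper's data or its actual procedure.

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    """Hypothetical labels for the error classes named in the abstract."""
    LEXICAL = "incorrect lexical choice"
    SV_STRUCTURE = "SV-structure error"
    NON_SV_STRUCTURE = "non-SV-structure error"
    OMISSION = "component omission"


# Both subclasses belong to the broader "structural error" class.
STRUCTURAL = frozenset({ErrorType.SV_STRUCTURE, ErrorType.NON_SV_STRUCTURE})


def distribution(errors):
    """Map every error type to its share of all annotated errors."""
    counts = Counter(errors)
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in ErrorType}
    return {t: counts[t] / total for t in ErrorType}


def structural_share(errors):
    """Fraction of annotated errors that are structural (SV or non-SV)."""
    if not errors:
        return 0.0
    return sum(1 for e in errors if e in STRUCTURAL) / len(errors)


def compare_inputs(sentence_errors, clause_errors):
    """Summarize raw error counts for whole-sentence vs. NT-clause input."""
    return {
        "sentence_total": len(sentence_errors),
        "clause_total": len(clause_errors),
        "reduction": len(sentence_errors) - len(clause_errors),
    }


if __name__ == "__main__":
    # Invented annotations for illustration only; NOT the paper's data.
    sentence_input = [
        ErrorType.NON_SV_STRUCTURE, ErrorType.NON_SV_STRUCTURE,
        ErrorType.SV_STRUCTURE, ErrorType.LEXICAL, ErrorType.OMISSION,
    ]
    clause_input = [ErrorType.NON_SV_STRUCTURE, ErrorType.LEXICAL]

    for error_type, share in distribution(sentence_input).items():
        print(f"{error_type.value}: {share:.0%}")
    print(compare_inputs(sentence_input, clause_input))
```

Counting by label keeps the comparison between the two input conditions direct; a real study would also need to normalize by the amount of text translated in each condition, which this sketch does not attempt.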