Construction of an English-Uyghur WordNet Dataset

Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China
  Automatically building semantic resources is essential for low-resource languages like Uyghur. However, Uyghur lacks publicly available evaluation datasets for automatically building semantic resources such as WordNet. To address this problem, we first build the largest Uyghur-English and English-Uyghur dictionaries by exploiting many available online and offline resources. Then, using Princeton WordNet (PWN) 3.0 and the Contemporary Uyghur Detailed Dictionary (CUDD), we construct an English-Uyghur WordNet evaluation dataset, which is publicly available (https://github.com/kaharjan/uywordnet). In this dataset, more than 73,000 English synsets are mapped to Uyghur automatically, of which over 20,000 are annotated manually. The corresponding Uyghur words include definitions and examples in Uyghur-language context. We also propose a Synset Mapping based on Word Embeddings (SMWE) method. The experimental results on the dataset are promising.
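The abstract does not describe how SMWE works internally, so the following is only a minimal sketch of the general idea its name suggests: collect Uyghur candidates for an English synset through a bilingual dictionary, then rank them by embedding similarity. Everything here is an assumption made for illustration: the function names, the placeholder tokens such as `uy_finance` (which stand in for real Uyghur words), the toy vectors, and the premise that English and Uyghur embeddings share one vector space. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_candidates(lemmas, en_to_uy, en_vecs, uy_vecs):
    """Rank dictionary candidates for one synset by similarity to the synset centroid.

    Hypothetical helper: assumes source and target embeddings are comparable.
    """
    # Candidate generation: every dictionary translation of every lemma.
    candidates = {t for lemma in lemmas for t in en_to_uy.get(lemma, [])}
    known = [en_vecs[lemma] for lemma in lemmas if lemma in en_vecs]
    if not known:
        # No embedding evidence: return candidates unscored, in a stable order.
        return [(c, 0.0) for c in sorted(candidates)]
    centroid = mean_vector(known)
    scored = [(c, cosine(centroid, uy_vecs[c])) for c in candidates if c in uy_vecs]
    # Highest similarity first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy data (all invented for illustration).
synset_lemmas = ["bank", "depository"]
en_to_uy = {
    "bank": ["uy_finance", "uy_riverside"],
    "depository": ["uy_finance", "uy_storehouse"],
}
en_vecs = {"bank": [0.9, 0.1, 0.2], "depository": [0.8, 0.0, 0.4]}
uy_vecs = {
    "uy_finance": [0.85, 0.05, 0.3],
    "uy_riverside": [0.1, 0.9, 0.0],
    "uy_storehouse": [0.4, 0.1, 0.8],
}

ranking = rank_candidates(synset_lemmas, en_to_uy, en_vecs, uy_vecs)
for word, score in ranking:
    print(f"{word}\t{score:.3f}")
```

In a setting like the one the abstract describes, the manually annotated portion of the dataset could serve as the reference against which such a ranking is evaluated.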