Tibetan Syllable-based Functional Chunk Boundary Identification

Source: The 16th China National Conference on Computational Linguistics (CCL 2017) and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data | Citations: 0
  Tibetan syntactic functional chunk parsing aims to identify the syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method that groups syllables instead of performing word segmentation and tagging, and uses Conditional Random Fields (CRFs) to identify the functional chunk boundaries of a sentence. According to the actual characteristics of the Tibetan language, in the text preprocessing stage we first identify and extract the syntactic markers, which occur in both sticky (bound) and non-sticky (free) written forms, as identification features for syntactic functional chunk boundaries. Afterwards we identify the syntactic functional chunk boundaries using CRFs. Experiments were performed on a Tibetan corpus containing 46,783 syllables; the precision, recall, and F-value reach 75.70%, 82.54%, and 79.12%, respectively. The results show that the proposed method is effective when applied to a small-scale unlabeled corpus and can provide foundational support for many natural language processing applications such as machine translation.
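The preprocessing step described above — operating on syllables rather than segmented words, and flagging syntactic markers as boundary features for a CRF — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the marker inventory below is hypothetical, and the CRF training itself (library-dependent) is omitted; only the syllable splitting on the tsheg separator and the per-syllable feature extraction are shown.

```python
# -*- coding: utf-8 -*-
# Sketch of syllable-level CRF preprocessing for Tibetan chunk boundary
# identification: split a sentence into syllables on the tsheg (U+0F0B),
# flag syntactic-marker syllables, and emit per-syllable feature dicts
# that a CRF tagger could consume. The MARKERS set is illustrative only.

TSHEG = "\u0f0b"  # Tibetan intersyllabic separator (tsheg)

# Hypothetical marker syllables for illustration (e.g. the dative-locative
# particle la); the paper's actual marker inventory is not reproduced here.
MARKERS = {"\u0f63"}  # ལ

def syllables(sentence):
    """Split a Tibetan sentence into syllables on the tsheg."""
    return [s for s in sentence.split(TSHEG) if s]

def crf_features(sylls, i):
    """Feature dict for syllable i: identity, marker flag, neighbors."""
    return {
        "syl": sylls[i],
        "is_marker": sylls[i] in MARKERS,
        "prev": sylls[i - 1] if i > 0 else "<BOS>",
        "next": sylls[i + 1] if i + 1 < len(sylls) else "<EOS>",
    }

# Example sentence (illustrative syllables joined by tsheg).
sent = "\u0f44\u0f66" + TSHEG + "\u0f63" + TSHEG + "\u0f60\u0f42\u0fb2\u0f7c"
sylls = syllables(sent)
X = [crf_features(sylls, i) for i in range(len(sylls))]
```

Each feature dict in `X` would pair with a boundary tag (e.g. B/I for chunk-initial vs. chunk-internal syllables) to form the training sequences for a CRF toolkit; the marker flag carries the boundary cue the paper extracts in preprocessing.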
Other documents
With the development of the Internet and advances in hardware, neural network models have been widely applied in natural language processing, image recognition, and other fields. Combining traditional natural language processing methods with neural network models is increasingly becoming a research hotspot. Introducing prior knowledge is a convention of traditional methods, yet its impact on neural-network-based natural language processing tasks remains unclear. In view of this, this paper explores the influence of linguistic-level prior knowledge on several neural-network-based natural language processing tasks. According to the characteristics of different tasks, we compare the effects of different prior knowledge and different input positions on different neural network mod
We take the generation of Chinese classical poetry as a sequence-to-sequence learning problem, and investigate the suitability of recurrent neural networks (RNN) for the poetry generation task by various qual
Understanding chemical-disease relations (CDR) from biomedical literature is important for biomedical research and chemical discovery. This paper uses a k-max pooling convolutional neural network (CNN) to
Most state-of-the-art models for named entity recognition (NER) rely on recurrent neural networks (RNNs), in particular long short-term memory (LSTM). Those models learn local and global features automatic
Conference
Word deletion (WD) errors can lead to poor comprehension of the meaning of source translated sentences in phrase-based statistical machine translation (SMT), and have a critical impact on the adequacy of
Answer selection is a crucial subtask of the open-domain question answering problem. In this paper, we introduce the Bi-directional Gated Memory Network (BGMN) to model the interactions between question a
In the last decades, named entity recognition has been extensively studied with various supervised learning approaches that depend on massive labeled data. In this paper, we focus on person name recognition in
Enabling a computer to understand a document so that it can answer comprehension questions is a central, yet unsolved goal of Natural Language Processing, so reading comprehension of text is an important
Generating textual entailment (GTE) is a recently proposed task to study how to infer a sentence from a given premise. Current sequence-to-sequence GTE models are prone to produce invalid sentences when
Recently the long short-term memory language model (LSTM LM) has received tremendous interest from both the language and speech communities, due to its superiority in modelling long-term dependency. Moreover, integ