Tibetan syntactic functional chunk parsing aims to identify the syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method that groups syllables directly, instead of relying on word segmentation and tagging, and uses Conditional Random Fields (CRFs) to identify the functional chunk boundaries of a sentence. Reflecting the characteristics of the Tibetan language, we first identify and extract syntactic markers in the text preprocessing stage to serve as features for chunk boundary identification; these markers occur in both a sticky written form and a non-sticky written form. We then identify the syntactic functional chunk boundaries using the CRF model. Experiments on a Tibetan corpus containing 46,783 syllables achieve precision, recall, and F-value of 75.70%, 82.54%, and 79.12%, respectively. The experimental results show that the proposed method is effective when applied to a small-scale unlabeled corpus and can provide foundational support for natural language processing applications such as machine translation.
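The preprocessing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the marker inventory, the label names ("free" for the non-sticky form, "sticky" for a marker fused onto the preceding syllable), and the feature names are all assumptions chosen for illustration. The sketch splits text into syllables at the tsheg and shad, flags syllables that are or carry a syntactic marker, and builds per-syllable feature dictionaries of the kind a CRF toolkit could consume.

```python
import re

TSHEG = "\u0f0b"  # syllable delimiter
SHAD = "\u0f0d"   # clause/sentence delimiter

# Hypothetical, deliberately tiny marker inventory (an assumption, not the paper's list).
# Non-sticky markers stand as syllables of their own: la, nas, gis, kyi.
FREE_MARKERS = {"\u0f63", "\u0f53\u0f66", "\u0f42\u0f72\u0f66", "\u0f40\u0fb1\u0f72"}
# Sticky markers are written fused onto the preceding syllable: the genitive 'i.
STICKY_SUFFIXES = ("\u0f60\u0f72",)


def split_syllables(text):
    """Split Tibetan text into syllables at tsheg, shad, and whitespace."""
    return [s for s in re.split("[" + TSHEG + SHAD + r"\s]+", text) if s]


def marker_type(syllable):
    """Return 'free', 'sticky', or 'none' for one syllable."""
    if syllable in FREE_MARKERS:
        return "free"
    for suffix in STICKY_SUFFIXES:
        if len(syllable) > len(suffix) and syllable.endswith(suffix):
            return "sticky"
    return "none"


def syllable_features(syllables):
    """Build one feature dict per syllable, with marker context to the left and right."""
    types = [marker_type(s) for s in syllables]
    features = []
    for i, syl in enumerate(syllables):
        features.append({
            "syllable": syl,
            "marker": types[i],
            "prev_marker": types[i - 1] if i > 0 else "BOS",
            "next_marker": types[i + 1] if i < len(syllables) - 1 else "EOS",
        })
    return features


# Example input: three syllables, nga'i / deb / la.
sample = "\u0f44\u0f60\u0f72\u0f0b\u0f51\u0f7a\u0f56\u0f0b\u0f63\u0f0b"
sample_features = syllable_features(split_syllables(sample))
```

In a full pipeline, each syllable would also receive a boundary label (for example a B/I scheme, again an assumption), and the feature sequences would be passed to a CRF trainer.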