Table-to-Text Generation via Row-Aware Hierarchical Encoder

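Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China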
  In this paper, we present a neural model that maps a structured table to document-scale descriptive text. Most existing neural-network-based approaches encode a table record by record and generate long summaries with an attentional encoder-decoder model, which leads to two problems: (1) portions of the generated text are incoherent because of mismatches between a row and its corresponding records; (2) much irrelevant information is described in the generated text because redundant records are incorrectly selected. Our approach addresses both problems by modeling the row representation as an intermediate structure of the table. In the encoding phase, we first learn record-level representations with a transformer encoder. We then obtain each row's representation from the representations of its records and model row-level dependencies with another transformer encoder. In the decoding phase, we first attend to the row-level representations to find important rows, and then attend to specific records to generate text. Experiments were conducted on ROTOWIRE, a dataset whose task is to produce a document-scale NBA game summary from a structured table of game statistics. Our approach improves a strong baseline's BLEU score from 14.19 to 15.65 (+10.29% relative). Furthermore, three extractive evaluation metrics and a human evaluation show that our model is able to select salient records and that the generated game summaries are more accurate.
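
The two-step decoding described in the abstract (attend to rows first, then to records within rows) can be illustrated with a minimal sketch. It is not the authors' implementation: the learned transformer encoders are omitted, the record vectors are toy inputs, and both the mean pooling used to form row vectors and the product used to combine row-level and record-level weights are assumptions made only for illustration. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(vectors):
    """Average a list of equal-length vectors (assumed pooling choice)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hierarchical_attention(query, table):
    """Compute attention weights over every record in a table.

    table: list of rows, each row a list of record vectors.
    Returns one list of weights per row; all weights sum to 1.
    """
    # Row-level step: score each pooled row vector against the query.
    row_vectors = [mean_pool(row) for row in table]
    row_weights = softmax([dot(query, r) for r in row_vectors])

    weights = []
    for row, w_row in zip(table, row_weights):
        # Record-level step: normalize within the row, then scale by
        # the row weight (assumed combination rule).
        rec_weights = softmax([dot(query, rec) for rec in row])
        weights.append([w_row * w for w in rec_weights])
    return weights


# Toy table: 2 rows x 2 records, each record a 2-dimensional vector.
table = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
query = [2.0, 0.0]  # stands in for a decoder state
weights = hierarchical_attention(query, table)
```

Because the record weights are normalized within each row and then scaled by that row's weight, a record in a low-weight row cannot receive a large share of attention, which is the intuition behind using rows as an intermediate structure for record selection.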