The purpose of this thesis is to explore the relevant issues in automatic dialogue summarization (ADS) within the context of the speech summarization project. While the majority of summarization techniques in computational linguistics are "computationally" oriented, our approach to the task is largely "linguistically" oriented.
To start with, we build a corpus of dialogues from the business domain and classify them into 16 sub-domains, each of which concerns a specific activity in business transactions, such as inquiring, offering, or price negotiation. The dialogues are then segmented and tagged with part-of-speech labels.
Treating summarization as sentence extraction, we try to identify the distinguishing linguistic features of dialogues. Through linguistic analysis of the dialogues, we define two information structures: the information inquiry dialogue, with parallel information allocation, and the negotiation dialogue, with sequential information allocation. Based on this observation, we develop different summarization methods for the two types of dialogues in our corpus. We use the Edmundsonian paradigm to assign a score to each utterance in the dialogue, rank the utterances, and select the top n (adjustable) utterances as the summary. We also propose a new evaluation method that assesses both the informativeness and the redundancy of a summary. The evaluation results indicate that our methods outperform the MMR baseline.
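As a rough illustration of the score-rank-select procedure described above, the following Python sketch scores each utterance as a weighted linear combination of feature values, in the spirit of the Edmundsonian paradigm, and keeps the top n utterances in their original order. The two features (`position_feature`, `cue_feature`), the cue-word list, and the weights are hypothetical placeholders chosen for the example; they are not the features or weights used in the thesis.

```python
from typing import Callable, Dict, List

# A feature maps (utterance index, all utterances) to a numeric value.
Feature = Callable[[int, List[str]], float]


def position_feature(i: int, utterances: List[str]) -> float:
    """Hypothetical location feature: earlier utterances score higher."""
    return 1.0 - i / max(len(utterances) - 1, 1)


def cue_feature(i: int, utterances: List[str]) -> float:
    """Hypothetical cue feature: count of assumed business cue words."""
    cues = {"price", "offer", "order"}  # illustrative list only
    tokens = utterances[i].lower().split()
    return float(sum(t.strip(".,?!") in cues for t in tokens))


def summarize(
    utterances: List[str],
    features: Dict[str, Feature],
    weights: Dict[str, float],
    n: int = 2,
) -> List[str]:
    """Score every utterance, rank by score, and return the top n
    utterances restored to dialogue order."""
    scores = [
        sum(weights[name] * f(i, utterances) for name, f in features.items())
        for i in range(len(utterances))
    ]
    ranked = sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)
    return [utterances[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    dialogue = [
        "Hello, how are you?",
        "Could you quote a price for 500 units?",
        "Sure, let me check.",
        "Our best offer is 12 dollars per unit.",
    ]
    features = {"position": position_feature, "cue": cue_feature}
    weights = {"position": 0.5, "cue": 1.0}  # assumed weights
    for line in summarize(dialogue, features, weights, n=2):
        print(line)
```

The parameter `n` corresponds to the adjustable summary length. Separate feature sets or weights could be supplied for inquiry dialogues and negotiation dialogues, though how the thesis differentiates the two is not specified here.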