Online hot topic detection and tracking has become a frontier research topic in public opinion analysis, with broad application prospects. This paper studies hot topic tracking in online forums (BBS) based on a topic evolution graph. Building on the detection of BBS hot topics with co-word analysis and the bisecting K-means clustering algorithm, we propose a method for computing the attention degree of a hot topic that jointly considers the number of posts in the topic and the popularity of those posts. We then give a method for computing the semantic distance between hot topics based on relative entropy. Finally, we construct a topic evolution graph to track BBS hot topics automatically. Experiments on a test set built from real BBS forum data show that the proposed method is effective.
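To make the relative-entropy distance more concrete, the following Python sketch compares two topics by the word distributions of their posts. The abstract does not give the paper's exact formula, so everything here is an assumption for illustration: the function names, the additive smoothing parameter `alpha`, and the choice to symmetrize the Kullback-Leibler divergence by averaging both directions.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Smoothed word probabilities over a shared vocabulary.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    every probability positive so the logarithm below is always defined.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """Relative entropy D(p || q) for two distributions on the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def topic_distance(tokens_a, tokens_b, alpha=1.0):
    """Symmetrized relative entropy between two topics' word lists.

    Averaging both directions is a hypothetical choice, used here only
    because plain relative entropy is not symmetric.
    """
    vocabulary = set(tokens_a) | set(tokens_b)
    p = word_distribution(tokens_a, vocabulary, alpha)
    q = word_distribution(tokens_b, vocabulary, alpha)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    # Toy keyword lists standing in for topics from consecutive time windows.
    earlier = ["housing", "price", "loan", "price"]
    later = ["housing", "price", "policy"]
    unrelated = ["football", "match", "goal"]
    print(topic_distance(earlier, later))
    print(topic_distance(earlier, unrelated))
```

In a tracking setting, a small distance between a topic in one time window and a topic in the next would suggest linking them with an edge in the topic evolution graph; the threshold for "small" would have to be chosen empirically.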