【Objective】 To examine how tagging content, user attributes, and the combination of the two affect clustering performance in tag clustering. 【Method】 Using blog data from ScienceNet (科学网), we performed feature extraction, model construction, and similarity calculation, weighted the similarities with a linear function and a Sigmoid function, and clustered the tags with the Affinity Propagation (AP) algorithm. 【Results】 Under the discipline classification scheme, combining user attributes with tagging content improved the tag clustering results under both weighting methods, with Sigmoid weighting performing best; under the system classification scheme, neither combination performed as well as tagging content alone. 【Limitations】 The dataset is small, the classification schemes used to evaluate tag clustering are incomplete, and the AP clustering algorithm is not suited to processing large-scale data. 【Conclusion】 Combining the two features improves clustering performance in some cases; tag clustering should pay more attention to the content features of tags.
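The linear and Sigmoid weighting named in the Method can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: the function names, the weight `alpha`, the `steepness` and `midpoint` parameters, and in particular the choice to use the Sigmoid of the user similarity as the mixing weight are hypothetical and may differ from the authors' actual method.

```python
import math


def linear_combine(content_sim, user_sim, alpha=0.5):
    """Weighted sum of two similarities.

    alpha is a hypothetical weight on the content similarity;
    the abstract does not state the value the authors used.
    """
    return alpha * content_sim + (1.0 - alpha) * user_sim


def sigmoid(x, steepness=10.0, midpoint=0.5):
    """Logistic function rescaled for inputs in [0, 1] (assumed parameters)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def sigmoid_combine(content_sim, user_sim):
    """One plausible reading of 'Sigmoid weighting' (an assumption):
    the user similarity only gets a large weight when it is itself high."""
    w = sigmoid(user_sim)
    return (1.0 - w) * content_sim + w * user_sim


def combine_matrices(content, user, combine):
    """Apply a combining function element-wise to two square similarity matrices."""
    n = len(content)
    return [[combine(content[i][j], user[i][j]) for j in range(n)] for i in range(n)]


# Toy similarity matrices for three tags (made-up values, not data from the paper).
content = [[1.0, 0.8, 0.1],
           [0.8, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
user = [[1.0, 0.3, 0.6],
        [0.3, 1.0, 0.4],
        [0.6, 0.4, 1.0]]

linear_matrix = combine_matrices(content, user, linear_combine)
sigmoid_matrix = combine_matrices(content, user, sigmoid_combine)
```

Either combined matrix could then be passed to an Affinity Propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`; the abstract does not say which implementation the authors used.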