Affiliation: Institute of Computing Technology, Department of Computer Science
Excerpt from the paper
As high-quality descriptors of web page semantics, social annotations (tags) have been used for web document clustering with promising results. However, most web pages have few tags (fewer than 10), and this sparsity seriously limits the usefulness of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem. It incorporates additional useful tags into the original tag document by using user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may change. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for topic-specific documents; and 3) the proposed tag-based clustering methods significantly outperform the word-based methods, which indicates that tags could be a better resource for the clustering task.
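The abstract does not spell out how expansion tags are selected, so the following is only a minimal sketch of one plausible reading of "user-related tag expansion": candidate tags are those that the users who bookmarked a page have applied to other pages, ranked by how many of those users used them. The function name `expand_tags`, the `(user, url, tag)` triple format, and the `top_k` cutoff are all assumptions made for illustration and are not taken from the paper. The original and expanded tags are returned separately, loosely mirroring the idea of treating them as distinct observations; Folk-LDA itself is not implemented here.

```python
def expand_tags(url, bookmarks, top_k=5):
    """Return (original_tags, expanded_tags) for one page.

    bookmarks: iterable of (user, url, tag) triples (hypothetical format).
    The ranking rule below is an illustrative assumption, not the paper's.
    """
    bookmarks = list(bookmarks)

    # Tags already attached to the page, and the users who attached them.
    original = sorted({t for u, p, t in bookmarks if p == url})
    users = {u for u, p, t in bookmarks if p == url}

    # For every tag those users applied to OTHER pages, record which of
    # them used it. Tags already on the page are skipped.
    candidate_users = {}
    for u, p, t in bookmarks:
        if u in users and p != url and t not in original:
            candidate_users.setdefault(t, set()).add(u)

    # Rank candidates by number of distinct supporting users, breaking
    # ties alphabetically so the output is deterministic.
    ranked = sorted(candidate_users,
                    key=lambda t: (-len(candidate_users[t]), t))
    return original, ranked[:top_k]


# Toy data, invented purely for demonstration.
bookmarks = [
    ("alice", "a.com", "python"),
    ("bob", "a.com", "code"),
    ("alice", "b.com", "programming"),
    ("alice", "b.com", "python"),
    ("bob", "c.com", "programming"),
    ("bob", "c.com", "tutorial"),
    ("carol", "d.com", "cooking"),
]

original, expanded = expand_tags("a.com", bookmarks, top_k=2)
print(original, expanded)
```

In this toy run, "cooking" is never proposed because carol did not tag the target page. A ranking like this can still pull in tags unrelated to the page's main topic, which is the topic-drift risk the abstract attributes to naive expansion.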