Statistical analysis of real Web access logs shows that the interests users exhibit in a log are concentrated: users access the Web driven by stable interests far more often than by incidental ones. A Web access log covering a given period of time therefore necessarily contains the users' stable interests. This paper attempts to apply factor analysis to the user access frequency matrix in order to extract the users' stable interest factors, uses these factors to construct a user interest space, and clusters Web documents in that space. The user interest space is an orthogonal space that emphasizes the interests users have in common. Experimental results show that clustering Web documents in the user interest space outperforms clustering directly on the user access frequency matrix (i.e., in the user space). The change of space also has the effect of compressing the data.
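The pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which factor extraction method is used, so the sketch substitutes the principal-component method (leading eigenvectors of the user correlation matrix, found by power iteration) and clusters with a plain k-means. The toy matrix X, the function names (standardize_columns, interest_space, kmeans), and the choice of 2 factors and 3 clusters are all hypothetical.

```python
import math

def standardize_columns(X):
    """Center each user column and scale it to unit variance."""
    n, m = len(X), len(X[0])
    Z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    return Z

def interest_space(X, k, iters=500):
    """Project documents (rows) from user space onto k orthogonal factor axes.

    Assumption: principal-component factor extraction stands in for the
    paper's (unspecified) factor analysis procedure.
    """
    Z = standardize_columns(X)
    n, m = len(Z), len(Z[0])
    # Correlation matrix between users (columns of the frequency matrix).
    R = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / n for b in range(m)]
         for a in range(m)]
    axes = []
    for c in range(k):
        v = [(i + 1.0) ** (c + 1) for i in range(m)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(R[i][j] * v[j] for j in range(m)) for i in range(m)]
            for u in axes:  # Gram-Schmidt step keeps the axes orthogonal
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                raise ValueError("fewer than k non-trivial factors")
            v = [a / norm for a in w]
        axes.append(v)
    # Coordinates of each document in the k-dimensional interest space.
    scores = [[sum(z * a for z, a in zip(row, axis)) for axis in axes] for row in Z]
    return scores, axes

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical access frequency matrix: rows are Web documents, columns are users.
# Users 0-1, 2-3 and 4-5 each share a stable interest; the small counts
# elsewhere mimic incidental visits.
X = [
    [9, 8, 0, 1, 0, 0],
    [8, 9, 1, 0, 0, 0],
    [9, 7, 0, 0, 1, 0],
    [0, 1, 9, 8, 0, 0],
    [0, 0, 8, 9, 0, 1],
    [1, 0, 7, 9, 0, 0],
    [0, 0, 1, 0, 9, 8],
    [0, 1, 0, 0, 8, 9],
    [0, 0, 0, 1, 9, 7],
]

scores, axes = interest_space(X, k=2)  # 6 user dimensions reduced to 2 factors
labels = kmeans(scores, 3)             # cluster documents in the interest space
print(labels)
```

In this toy setting each document is described by 2 factor coordinates instead of 6 user counts, which is the kind of compression the abstract refers to. A real log would need a principled choice of the number of factors, which the abstract does not specify.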