Applying Web content mining, Map/Reduce, Hive, and a particle swarm clustering algorithm together requires an integrated platform that implements a distributed computing system. Because massive volumes of Web log data must be analyzed, a Web log mining platform is used first: the raw data is acquired and organized, sample data is extracted, and the particle swarm optimization (PSO) algorithm is then used to cluster the valid data in the user logs.
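The clustering step can be illustrated with a minimal, single-machine sketch of PSO-based clustering. This is not the paper's implementation: the feature vectors (for example, page views and dwell time per session), the function names `quantization_error`, `assign`, and `pso_cluster`, and all parameter values are assumptions chosen for illustration. Each particle encodes one candidate set of `k` centroids, and the swarm searches for the set that minimizes the summed squared distance from each record to its nearest centroid.

```python
import random


def squared_distance(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def quantization_error(centroids, points):
    """Fitness: sum of squared distances from each point to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def assign(centroids, points):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def pso_cluster(points, k, n_particles=20, n_iters=100,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Search for k centroids with a basic global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])

    # Each particle is a candidate set of k centroids, seeded from random data points.
    positions = [[list(rng.choice(points)) for _ in range(k)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [[c[:] for c in pos] for pos in positions]
    pbest_fit = [quantization_error(pos, points) for pos in positions]

    # Global best is the best personal best seen so far.
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + cognitive + social terms.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d])
                    )
                    positions[i][j][d] += velocities[i][j][d]

            fit = quantization_error(positions[i], points)
            if fit < pbest_fit[i]:
                pbest[i] = [c[:] for c in positions[i]]
                pbest_fit[i] = fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in positions[i]], fit

    return gbest, gbest_fit


if __name__ == "__main__":
    # Hypothetical session features extracted from cleaned log records.
    sessions = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
                (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
    centroids, error = pso_cluster(sessions, k=2)
    print(assign(centroids, sessions), error)
```

In a distributed setting, the fitness evaluation is the natural part to parallelize, since the per-record nearest-centroid distances can be computed independently and summed; how the paper maps this onto Map/Reduce and Hive is not stated in this excerpt.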