To address the classification of large-scale imbalanced data in real-world applications, a cost-sensitive ensemble learning classification algorithm based on a cloud computing platform is designed. The Hadoop cloud computing platform partitions the massive data set for parallel learning, and the resulting base classifiers are combined by cost-sensitive weighted integration, yielding a cost-sensitive ensemble learning classification model on the cloud computing platform. Simulation experiments show that the model clearly improves the recall of the minority class. In addition, Hadoop's parallel mechanism substantially reduces ensemble learning time in the cloud environment compared with a centralized environment, further improving learning efficiency for large-scale imbalanced data classification.
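The partition, train, and weighted-combine pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the abstract does not give the base learner, the cost values, or the weighting formula, so the threshold "stump" learner, the costs `COST_FN` and `COST_FP`, the AdaBoost-style log weight, and the `toy_data` set are all assumptions chosen for illustration. The chunks are processed sequentially here, whereas on Hadoop each chunk would be handled by a separate map task.

```python
import math

# Assumed misclassification costs (illustrative, not taken from the paper):
# missing a minority-class sample (label 1) costs more than a false alarm.
COST_FN = 5.0  # true 1 predicted as 0
COST_FP = 1.0  # true 0 predicted as 1


def partition(data, n_parts):
    """Split the data into n_parts chunks, standing in for Hadoop input splits."""
    return [data[i::n_parts] for i in range(n_parts)]


def cost_of(threshold, samples):
    """Total misclassification cost of the rule 'predict 1 when x >= threshold'."""
    cost = 0.0
    for x, y in samples:
        pred = int(x >= threshold)
        if y == 1 and pred == 0:
            cost += COST_FN
        elif y == 0 and pred == 1:
            cost += COST_FP
    return cost


def train_stump(chunk):
    """'Map' step: pick the threshold with the lowest cost on one chunk."""
    candidates = sorted({x for x, _ in chunk})
    return min(candidates, key=lambda t: cost_of(t, chunk))


def classifier_weight(threshold, validation):
    """Weight a base classifier by its normalised cost (assumed AdaBoost-style formula)."""
    worst = sum(COST_FN if y == 1 else COST_FP for _, y in validation)
    err = cost_of(threshold, validation) / worst
    err = min(max(err, 1e-6), 1 - 1e-6)  # keep the logarithm finite
    return math.log((1 - err) / err)


def train_ensemble(data, n_parts):
    """Train one stump per chunk, then attach a cost-based weight to each.

    For brevity the full data set doubles as the validation set.
    """
    thresholds = [train_stump(chunk) for chunk in partition(data, n_parts)]
    return [(t, classifier_weight(t, data)) for t in thresholds]


def predict(ensemble, x):
    """'Reduce' step: weighted vote of the base classifiers."""
    score = sum(w if x >= t else -w for t, w in ensemble)
    return int(score > 0)


def recall(ensemble, data):
    """Fraction of minority-class samples the ensemble labels as 1."""
    positives = [x for x, y in data if y == 1]
    return sum(predict(ensemble, x) for x in positives) / len(positives)


def toy_data():
    """Hypothetical imbalanced data: 190 majority samples, 10 minority samples."""
    majority = [(i % 10 * 0.1, 0) for i in range(190)]
    minority = [(2.0 + i % 5 * 0.1, 1) for i in range(10)]
    return majority + minority


if __name__ == "__main__":
    data = toy_data()
    ensemble = train_ensemble(data, n_parts=4)
    print("minority recall:", recall(ensemble, data))
```

Because each chunk sees only part of the minority class, individual stumps can place their threshold too high and miss some minority samples. The cost-based weights give more influence to the stumps whose errors are cheaper, which is one plausible way a weighted ensemble could recover minority-class recall.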