The fuzzy c-means (FCM) algorithm is an iterative local-search method: it easily becomes trapped in local minima, and it does not take into account how much each sample contributes to the clustering. To address these shortcomings of traditional FCM and the high noise level of gene expression data, we propose a clustering model based on the wavelet transform and an improved FCM algorithm, and apply it to the analysis of leukemia gene expression data. Without any prior knowledge, the optimal number of clusters is determined from the Xie-Beni index. To evaluate how accurately the proposed algorithm clusters the samples, we also apply the traditional FCM algorithm and hierarchical clustering under the same experimental conditions. The sample clustering results show that the proposed method assigns samples to their subtypes with high accuracy.
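The abstract does not spell out the improved FCM or the wavelet preprocessing, so the sketch below only illustrates the two standard ingredients the method builds on: plain fuzzy c-means (Bezdek's alternating centre and membership updates) and choosing the number of clusters c by minimising the Xie-Beni index, XB = sum_i sum_k u_ik^m ||x_k - v_i||^2 / (n * min_{i != j} ||v_i - v_j||^2). It is a minimal sketch under stated assumptions, not the authors' algorithm: the sample-contribution weighting and wavelet denoising proposed in the paper are not reproduced, and the function names (`fcm`, `xie_beni`, `choose_c`, `hard_labels`), the fuzzifier m = 2, the tolerance, the seed and the toy data are all hypothetical choices for illustration.

```python
import math
import random


def fcm(data, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Bezdek's alternating updates (hypothetical helper).

    Returns (centers, u) where u[i][k] is the membership of sample k in
    cluster i and every column of u sums to 1.
    """
    n, d = len(data), len(data[0])
    rng = random.Random(seed)
    # Random initial memberships, normalised so each sample's column sums to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the samples weighted by u_ik^m.
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * data[k][j] for k in range(n)) / total
                            for j in range(d)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dist = [math.dist(data[k], centers[i]) for i in range(c)]
            at_center = [i for i in range(c) if dist[i] == 0.0]
            if at_center:
                # Sample sits exactly on one or more centres: split membership among them.
                for i in at_center:
                    new_u[i][k] = 1.0 / len(at_center)
                continue
            for i in range(c):
                new_u[i][k] = 1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                                        for j in range(c))
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if shift < tol:
            break
    return centers, u


def xie_beni(data, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness / (n * min squared centre separation).

    Smaller is better. Assumption: the generalised form with u^m is used;
    Xie and Beni's original definition fixes m = 2. Requires c >= 2.
    """
    n, c = len(data), len(centers)
    compact = sum(u[i][k] ** m * math.dist(data[k], centers[i]) ** 2
                  for i in range(c) for k in range(n))
    sep = min(math.dist(centers[i], centers[j]) ** 2
              for i in range(c) for j in range(i + 1, c))
    return compact / (n * sep) if sep > 0 else math.inf


def choose_c(data, c_max, m=2.0, seed=0):
    """Run FCM for c = 2..c_max and return the c with the smallest Xie-Beni index."""
    scores = {}
    for c in range(2, c_max + 1):
        centers, u = fcm(data, c, m=m, seed=seed)
        scores[c] = xie_beni(data, centers, u, m=m)
    return min(scores, key=scores.get), scores


def hard_labels(u):
    """Defuzzify: assign each sample to the cluster with the highest membership."""
    c, n = len(u), len(u[0])
    return [max(range(c), key=lambda i: u[i][k]) for k in range(n)]


# Toy data (illustrative only, not leukemia data): two well-separated groups.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5),
          (10.0, 10.0), (10.3, 9.8), (9.7, 10.4), (10.2, 10.1)]
best_c, scores = choose_c(points, c_max=4)
centers, u = fcm(points, best_c)
print(best_c, hard_labels(u))
```

On a real expression matrix each sample would be one row of (denoised) expression values, and the range of c to scan is itself an assumption. Because FCM is a local search, a common precaution is to repeat each run with several seeds and keep the solution with the lowest objective before computing the index.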