The heterogeneity of microbial community structure plays a very important role in host health and disease. Temporal and spatial heterogeneity of community structure is studied mainly with unsupervised and supervised learning algorithms. Because microbial community data share key characteristics with text data, this paper uses latent Dirichlet allocation (LDA), an unsupervised probabilistic topic model, to study the temporal heterogeneity of community structure, and compares it with two other methods, hierarchical clustering and K-Means clustering. LDA with collapsed Gibbs sampling, a Monte Carlo algorithm, was applied to temporal-heterogeneity OTU (operational taxonomic unit) datasets from two data sources: the vaginal microbiota of the northern pig-tailed macaque (Macaca leonina) (MVB) and the microbiota associated with minimal hepatic encephalopathy (MHE). The LDA model divided the OTU datasets of the 27 MVB samples and the 77 MHE samples into 6 topics and 4 topics, respectively. These numbers differ from the numbers of clusters produced by hierarchical clustering and K-Means clustering (5, 3 and 4, 3, respectively). In addition, when the physiological pH data of the MVB samples and the α diversity of the MHE samples are taken into account, the experiments show that the grouping of samples by pH and by α value agrees more closely with the sample classification given by the LDA model. LDA therefore classifies the OTU datasets more precisely in terms of how samples aggregate. More importantly, the LDA model can also identify the representative OTUs in each topic. Compared with hierarchical clustering and K-Means clustering, the LDA model not only quantifies the heterogeneity of community structure more effectively, but also identifies the OTUs that drive that heterogeneity.
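To make the text analogy concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on an OTU count table, where each sample plays the role of a document and each sequencing read assigned to an OTU plays the role of a word. It is not the authors' implementation: the function name `lda_collapsed_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the tiny `otu_table` are all assumptions made up for illustration, and none of the numbers come from the MVB or MHE data.

```python
import random

def lda_collapsed_gibbs(counts, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical helper: counts[d][w] is the read count of OTU w in sample d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Expand the count table into one token per read, like words in a document.
    tokens = [[w for w, c in enumerate(row) for _ in range(c)] for row in counts]
    ndk = [[0] * n_topics for _ in range(n_docs)]   # sample-by-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-by-OTU counts
    nk = [0] * n_topics                             # total tokens per topic
    z = []                                          # topic assignment of every token
    for d, doc in enumerate(tokens):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)             # random initial assignment
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Point estimates: topic mixture per sample and OTU distribution per topic.
    theta = [[(ndk[d][k] + alpha) / (len(tokens[d]) + n_topics * alpha)
              for k in range(n_topics)] for d in range(n_docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + n_words * beta)
            for w in range(n_words)] for k in range(n_topics)]
    return theta, phi

# Invented toy table: 4 samples (rows) by 4 OTUs (columns), not real data.
otu_table = [
    [20, 15, 0, 0],
    [18, 22, 0, 0],
    [0, 0, 25, 10],
    [0, 0, 12, 30],
]
theta, phi = lda_collapsed_gibbs(otu_table, n_topics=2)
# Dominant topic per sample, and the highest-probability OTU index per topic.
dominant_topic = [max(range(2), key=lambda k: row[k]) for row in theta]
top_otu = [max(range(4), key=lambda w: row[w]) for row in phi]
```

Here `theta` corresponds to the per-sample topic mixture that would be used to group samples, and `phi` to the per-topic OTU distribution from which representative OTUs would be read off. The number of topics is fixed by hand in this sketch; how it was selected for the MVB and MHE datasets is not stated in the abstract.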