Objective: To characterize single nucleotide polymorphisms (SNPs) in Mycobacterium tuberculosis (MTB) through whole-genome sequence analysis, and to provide a reference for the prevention, control, and treatment of tuberculosis. Methods: Whole-genome sequences of 2 372 MTB strains from around the world were downloaded from the National Center for Biotechnology Information (NCBI) and the European Nucleotide Archive (ENA). Redundant records were removed from the raw data according to quality-control requirements, and the sequencing reads of each strain were mapped to the M. tuberculosis reference genome H37Rv with BWA v0.7.12. SAMtools v1.3, Picard v1.112, and VarScan were used to call SNP sites and to remove non-specific SNP sites. A phylogenetic tree was constructed with the maximum-likelihood software RAxML v8.2.8, the genetic differentiation coefficient (Fst) of each SNP site was calculated with Genepop v4.5.1, and the SNPs were annotated with SnpEff v4.3c. Results: Initial screening yielded 107 654 SNP sites, and the resulting phylogenetic tree clearly divided 2 347 MTB strains into 7 lineages and 69 sublineages. After optimization, 285 lineage-defining SNP sites were obtained, which accurately divided the 2 347 MTB strains into 7 branches and 67 sublineages. Conclusion: Through genome sequence analysis, this study identified a set of phylogenetically informative SNP sites. The 285 phylogeny-based SNP sites can be used not only for phylogenetic and evolutionary analyses but also as genotyping targets in the molecular epidemiology of tuberculosis.
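To make the per-site Fst step in the Methods more concrete, the following is a minimal Python sketch of how a genetic differentiation coefficient could be computed for a single SNP site in haploid genomes. It is an illustration under stated assumptions, not the study's procedure: it uses the simple heterozygosity-based form Fst = (H_T - H_S) / H_T, whereas Genepop implements its own estimator. The function names `fst_site` and `fully_differentiated_sites`, the input layout, and the cutoff of 1.0 are hypothetical and do not come from the paper.

```python
from collections import Counter


def _heterozygosity(alleles):
    """Expected heterozygosity 1 - sum(p_i^2) for a list of haploid alleles."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def fst_site(alleles_by_group):
    """Heterozygosity-based Fst for one SNP site.

    alleles_by_group maps a group label (for example a lineage) to the list
    of alleles observed at this site in that group's strains.
    NOTE: simplified illustration; not the estimator used by Genepop.
    """
    groups = [a for a in alleles_by_group.values() if a]  # skip empty groups
    pooled = [allele for group in groups for allele in group]
    if not pooled:
        raise ValueError("no alleles supplied")

    h_total = _heterozygosity(pooled)
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure

    total = len(pooled)
    # Within-group heterozygosity, weighted by group size.
    h_within = sum(len(g) / total * _heterozygosity(g) for g in groups)
    return (h_total - h_within) / h_total


def fully_differentiated_sites(sites, cutoff=1.0):
    """Return positions whose Fst reaches the (hypothetical) cutoff.

    sites maps a genome position to an alleles_by_group dictionary.
    """
    return [pos for pos, groups in sites.items() if fst_site(groups) >= cutoff]


# Toy data with made-up positions and alleles, for illustration only.
toy_sites = {
    1001: {"lineage_A": ["A", "A"], "lineage_B": ["G", "G"]},  # fixed difference
    2002: {"lineage_A": ["A", "G"], "lineage_B": ["A", "G"]},  # shared variation
    3003: {"lineage_A": ["C", "C"], "lineage_B": ["C", "C"]},  # monomorphic
}
selected = fully_differentiated_sites(toy_sites)
```

In this toy input only position 1001 carries a different fixed allele in each group, so it is the only site the filter keeps. A site of that kind is the sort of candidate a lineage-defining marker set would be drawn from, although the selection criteria actually applied in the study are not given in the abstract.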