In this study, 86,132 peanut EST sequences published in the NCBI GenBank database and 12,501 EST sequences from a cDNA library constructed from the high-oleic-acid variety E12 were preprocessed, yielding a total of 11,260 singletons and 9,972 contigs that were non-redundant and assembled into longer sequences. Analysis with the MISA software identified 3,104 SSR loci across the two EST sets, accounting for 11.08% of the total non-redundant sequences. These SSR loci were classified as dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, hexanucleotide, and mixed-nucleotide repeats. Trinucleotide repeats were the most abundant class, accounting for 43.0% and 56.8% of the loci in the NCBI and cDNA library sets, respectively; dinucleotide and pentanucleotide repeats ranked second and third among all repeat loci, and hexanucleotide repeats were the least frequent. Among all repeat motifs, AG/TC was the most numerous, accounting for 8.65% and 13.42% in the NCBI and cDNA library sets, respectively. Among trinucleotide repeats, CTT/GAA occurred most frequently, accounting for 6.7% and 13.42%, respectively. The lengths of all these SSR motifs ranged from 4 to 51.
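The kind of search that an SSR-mining tool performs on each non-redundant sequence can be illustrated with a minimal Python sketch. This is not MISA itself, and it does not reproduce the settings used in this study: the minimum repeat counts in `MIN_REPEATS`, the function names, and the example sequence are all assumptions chosen for illustration, since the thresholds applied here are not stated. Mixed-nucleotide repeats are not handled.

```python
import re
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
# These values are assumptions for illustration, not the settings of the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_repeat_of_shorter_unit(motif):
    """Return True if the motif is a shorter unit repeated (e.g. 'AAA', 'ATAT')."""
    n = len(motif)
    return any(
        n % k == 0 and motif == motif[:k] * (n // k)
        for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # e.g. do not report 'AGAG' as a tetranucleotide motif
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def count_by_unit_length(hits):
    """Count SSR loci per motif length (2 = dinucleotide, 3 = trinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Made-up example sequence containing one (AG)7 and one (CTT)5 repeat.
example = "TT" + "AG" * 7 + "CC" + "CTT" * 5 + "A"
hits = find_ssrs(example)
print(hits)
print(count_by_unit_length(hits))
```

Dividing the per-class counts by the total number of loci would give class proportions of the kind reported above. Grouping complementary motifs such as AG/TC into one class would need an additional normalization step that is omitted here.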