Background: There is currently no satisfactory method for detecting differentially expressed genes from RNA-seq data when only a single biological replicate is available. Surprisingly, even as sequencing costs decrease, most published RNA-seq studies do not have biological replicates. For example, over the last four years, almost 70% of all human RNA-seq samples in the Gene Expression Omnibus (GEO) lacked biological replicates. From 2010 to 2011, the number of un-replicated RNA-seq samples increased even faster than the number of replicated RNA-seq samples.

Methods: In this paper, we describe a technique for measuring fold change that takes into account the uncertainty of gene expression measurement by RNA-seq. Our representation of fold change is derived from the posterior distribution of the raw fold change. This representation, denoted GFOLD, balances the estimated degree of change with the significance of this change. We also built a hierarchical model for cases in which biological replicates are available. The calculation is based on MCMC.

Results: We applied GFOLD to five datasets (4 RNA-seq and 1 GRO-seq) with biological replicates and compared it with edgeR, DESeq, DEGseq, Poisson, Cufflinks, and fold change with offset. The comparisons show that GFOLD outperforms all other methods in most cases when there is only a single replicate. When biological replicates are available, GFOLD provides results comparable to those of existing methods.

Conclusions: GFOLD provides a more consistent and more biologically meaningful approach to ranking differentially expressed genes than other commonly used methods for RNA-seq data without biological replicates. The concept of GFOLD can be broadly applied, beyond RNA-seq or GRO-seq, to other types of genomic data, including ChIP-seq.
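
The idea described under Methods, a fold change read off the posterior distribution of the raw fold change so that it reflects both magnitude and significance, can be illustrated with a minimal sketch. This is not the GFOLD implementation. It assumes Poisson read counts with a flat prior on each rate (so each posterior is Gamma(count + 1, 1)), plain Monte Carlo sampling rather than the MCMC used for the hierarchical model, and a conservative quantile rule. The function name `gfold_like_score`, the cutoff `c`, and the depth parameters are hypothetical names introduced only for this illustration.

```python
import math
import random


def gfold_like_score(count_a, count_b, depth_a, depth_b,
                     c=0.01, n_samples=20000, seed=0):
    """Conservative log2 fold change (condition B vs. A) for one gene.

    Assumption for illustration: Poisson counts with a flat prior on the
    rate, so the posterior of each rate is Gamma(count + 1, 1).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Draw posterior expression rates, normalized by sequencing depth.
        rate_a = max(rng.gammavariate(count_a + 1, 1.0), 1e-300) / depth_a
        rate_b = max(rng.gammavariate(count_b + 1, 1.0), 1e-300) / depth_b
        samples.append(math.log2(rate_b / rate_a))
    samples.sort()

    # Empirical c and (1 - c) quantiles of the posterior log2 fold change.
    lower = samples[int(c * (n_samples - 1))]
    upper = samples[int((1 - c) * (n_samples - 1))]

    if lower > 0:
        return lower   # confidently up-regulated: report the cautious bound
    if upper < 0:
        return upper   # confidently down-regulated: report the cautious bound
    return 0.0         # posterior straddles zero: no reliable change


if __name__ == "__main__":
    # Same raw two-fold change, very different amounts of evidence.
    print(gfold_like_score(2, 4, 1e6, 1e6))
    print(gfold_like_score(2000, 4000, 1e6, 1e6))
```

Under these assumptions, a gene with counts 2 and 4 and a gene with counts 2000 and 4000 share the same raw two-fold change, but only the second receives a nonzero score, because its posterior is concentrated well away from zero. Ranking genes by such a score favors changes that are both large and well supported by the data.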