Epigenetic mechanisms, such as DNA methylation, histone modification and non-coding RNA, can change gene expression and cause cancer without changing the underlying DNA sequence. Over the past ~10 years, next-generation sequencing has transformed the field of epigenetics by producing vast amounts of BS-seq, ChIP-seq and RNA-seq data. However, these data also pose great challenges for analysis. To harness the full power of such big data, we have developed a series of bioinformatics algorithms, several of which are now widely used in the field. Through integrative analysis of epigenetic data from ENCODE and cancer genomics data from TCGA, we have made several seminal discoveries: 1) DNA methylation canyons (regions longer than 3.5 kb with very low DNA methylation) as a new genomic feature in all normal cells (Nature Genetics 2014), and hypermethylation of canyons as a novel mechanism of oncogene activation (unpublished); 2) broad H3K4me3 peaks (wider than 4 kb) as the first epigenetic signature of tumor suppressors such as TP53 and PTEN (Nature Genetics 2015); 3) DaPars, the first bioinformatics algorithm for Dynamic Analyses of Alternative Polyadenylation (APA) directly from RNA-seq (Nature Communications 2014). We used DaPars to identify CFIm25, a master APA regulator, as a glioblastoma (GBM) tumor suppressor (Nature 2014). We also showed that 3′ UTR shortening can repress tumor suppressors in trans by disrupting competing endogenous RNA (ceRNA) crosstalk (Nature Genetics, in revision). Finally, I will show how our bioinformatics analyses corrected several major errors in two highly cited Cell papers.
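Both canyons and broad H3K4me3 peaks are defined above by a length cutoff on genomic intervals (more than 3.5 kb and more than 4 kb, respectively). The Python sketch below illustrates only that idea; it is not the published calling procedure. It assumes per-CpG methylation levels in [0, 1] sorted by position on one chromosome, and the helper names `low_methylation_regions` and `wider_than`, the "very low" cutoff `max_level=0.1` and the CpG spacing limit `max_gap=1000` are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive


def low_methylation_regions(cpgs: List[Tuple[int, float]],
                            max_level: float = 0.1,
                            max_gap: int = 1000) -> List[Interval]:
    """Merge runs of consecutive low-methylation CpGs into regions.

    cpgs: (position, methylation level in [0, 1]) for one chromosome,
    sorted by position. max_level and max_gap are hypothetical defaults.
    A run ends at a CpG above max_level, or when two neighbouring
    low CpGs are more than max_gap bp apart.
    """
    regions: List[Interval] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for pos, level in cpgs:
        if level <= max_level:
            if start is not None and pos - prev <= max_gap:
                prev = pos  # extend the current low-methylation run
            else:
                if start is not None:
                    regions.append((start, prev + 1))  # gap too large: close run
                start = prev = pos  # open a new run at this CpG
        elif start is not None:
            regions.append((start, prev + 1))  # methylated CpG closes the run
            start = prev = None
    if start is not None:
        regions.append((start, prev + 1))  # close a run reaching the last CpG
    return regions


def wider_than(intervals: List[Interval], min_len: int) -> List[Interval]:
    """Keep only intervals strictly longer than min_len base pairs."""
    return [(s, e) for s, e in intervals if e - s > min_len]


# Toy data: unmethylated CpGs every 200 bp over 0..5000, then a methylated CpG.
cpgs = [(p, 0.02) for p in range(0, 5001, 200)] + [(5200, 0.9)]
print(wider_than(low_methylation_regions(cpgs), 3500))  # canyon-like: [(0, 5001)]

# Toy H3K4me3 peaks: only the second is broader than 4 kb.
print(wider_than([(100, 1600), (10000, 15500)], 4000))  # [(10000, 15500)]
```

The same `wider_than` filter serves both definitions, since each reduces to "keep intervals longer than a threshold"; in practice the upstream steps (methylation calling, peak calling) determine most of the result.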
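The abstract does not describe how DaPars works internally. As a rough illustration of inferring alternative polyadenylation from RNA-seq coverage alone, the sketch below fits a simplified single-sample two-segment model to 3′ UTR read coverage. Upstream of a proximal poly(A) site both the short and the long isoform contribute reads; downstream only the long isoform does. The breakpoint is chosen by least squares, and a distal-usage fraction is then computed. This is an assumption-laden simplification, not the DaPars implementation: the function names `fit_proximal_site` and `pdui` and the `min_seg` parameter are hypothetical.

```python
from typing import List, Tuple


def fit_proximal_site(coverage: List[float],
                      min_seg: int = 1) -> Tuple[int, float, float]:
    """Two-segment least-squares fit of per-base 3' UTR coverage.

    Simplified model (an assumption): coverage[i] ~ long + short for
    i < k, and coverage[i] ~ long for i >= k. Returns
    (k, long_level, short_level) for the k with minimal squared error.
    """
    n = len(coverage)
    if n < 2 * min_seg:
        raise ValueError("UTR too short for a two-segment fit")
    # Prefix sums of x and x^2 let each candidate breakpoint be scored in O(1).
    s, s2 = [0.0], [0.0]
    for c in coverage:
        s.append(s[-1] + c)
        s2.append(s2[-1] + c * c)

    def sse(i: int, j: int) -> float:
        # Squared error of coverage[i:j] around its own mean.
        tot = s[j] - s[i]
        return (s2[j] - s2[i]) - tot * tot / (j - i)

    k = min(range(min_seg, n - min_seg + 1),
            key=lambda b: sse(0, b) + sse(b, n))
    upstream = s[k] / k                 # mean coverage, both isoforms
    downstream = (s[n] - s[k]) / (n - k)  # mean coverage, long isoform only
    long_level = downstream
    short_level = max(upstream - downstream, 0.0)
    return k, long_level, short_level


def pdui(long_level: float, short_level: float) -> float:
    """Fraction of 3' UTR expression attributed to the distal (long) isoform."""
    total = long_level + short_level
    return long_level / total if total > 0 else float("nan")


# Toy UTR: coverage drops from 30 to 10 at position 60.
k, long_level, short_level = fit_proximal_site([30.0] * 60 + [10.0] * 40)
print(k, long_level, short_level)           # 60 10.0 20.0
print(round(pdui(long_level, short_level), 3))  # 0.333
```

Under this model, 3′ UTR shortening shows up as a lower distal-usage fraction. A real analysis would also need coverage normalization, filters for low-coverage transcripts, and a statistical test across samples, none of which this sketch attempts.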