Orthogonal Projection Correction for Confounders

Source: The 5th National Conference on Bioinformatics and Systems Biology
  Background: Machine learning methods are widely used in bioinformatics, for example to discover genes that are important for a specific disease or phenotype, to classify proteins based on their structure, and to predict an individual's disease risk from their genome. However, confounders such as age, gender, data source, and population structure make these tasks more difficult, especially when the data are not sampled randomly. It remains unclear how to correct for confounders effectively.

  Methods: In this work, we propose an orthogonal projection correction (OPC) method to correct for confounders. The method seeks a decomposition of each feature into a confounding component and a non-confounding component, such that the original data can be best reconstructed from the non-confounding components alone. We show that this can be done by orthogonally projecting each feature onto the complement of a confounder subspace, and that the OPC procedure is kernelizable. We further propose proSVM, a method that integrates OPC with the support vector machine for classification.

  Results: We applied the OPC method to cross-platform microarray data for tumor diagnosis and to SNP data of Arabidopsis thaliana, in the presence of population structure, for genome-wide analysis. In both experiments, OPC outperformed the competing methods for confounder correction.

  Conclusions: We proposed the OPC method for confounder correction and showed that it performs well in biological applications.
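  The projection step described in the Methods can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so this is only one plausible linear reading of it: the confounders are assumed to be observed and stacked as columns of a matrix `C`, and each feature column of `X` is projected onto the orthogonal complement of the column space of `C`, which is the same as taking least-squares residuals. The names `opc_correct`, `X`, `C`, and `add_intercept` are hypothetical and do not come from the paper. The kernelized variant and the proSVM classifier are not shown.

```python
import numpy as np


def opc_correct(X, C, add_intercept=True):
    """Project each feature (column of X) onto the orthogonal
    complement of the subspace spanned by the confounders in C.

    X: (n_samples, n_features) data matrix.
    C: (n_samples, n_confounders) observed confounders (assumption).
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    if add_intercept:
        # Also remove the constant direction, i.e. center each feature.
        C = np.hstack([np.ones((C.shape[0], 1)), C])
    # Least-squares fit of every feature on the confounders;
    # C @ beta is the confounding component of each feature.
    beta, *_ = np.linalg.lstsq(C, X, rcond=None)
    # The residual is the non-confounding component.
    return X - C @ beta


# Synthetic illustration: features are signal plus a linear confounder effect.
rng = np.random.default_rng(0)
n = 200
C = rng.normal(size=(n, 2))                  # e.g. age and a batch score
signal = rng.normal(size=(n, 5))
X = signal + C @ rng.normal(size=(2, 5))

X_corr = opc_correct(X, C)
# After correction, every feature is orthogonal to every confounder.
max_abs_cross = np.abs(C.T @ X_corr).max()
```

  In this sketch `max_abs_cross` is zero up to floating-point error, and applying `opc_correct` a second time leaves `X_corr` unchanged, as expected of an orthogonal projection.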