Graduate Thesis Or Dissertation


Regularization-Based Data Integration and Its Application to Genome-Wide Association Studies



Abstract
  • In genome-wide association studies, it is often difficult to obtain consistent findings from independent studies. This "lack of reproducibility" is partly caused by the small sample size of each individual study relative to the number of genetic markers. Researchers have therefore developed statistical methods that integrate multiple studies, both to increase statistical power and to reach more consistent conclusions. This class of methods is called "data integration" or "integrative analysis," and regularization often plays an important role in it by imposing a particular structure on the solutions. Regularization methods are widely used to build simpler statistical models for high-dimensional data, and different regularization methods have different properties. In this dissertation, we propose two new regularization-based data integration methods, each equipped with an appropriate penalty function: one incorporates useful prior information, and the other jointly analyzes datasets from multiple studies. Numerical studies show that the new methods outperform existing ones in variable and group selection, parameter estimation, and prediction. Moreover, the new methods overcome the lack-of-reproducibility problem by combining information from different studies. Finally, applying the proposed methods to real genome-wide association studies, we identify promising genetic signals.
Embargo reason
  • Pending Publication
Embargo date range
  • 2018-01-05 to 2019-02-05


