DAISY (YING) YU

Title: Penalized Likelihood Methods for Sparse Datasets, with Applications to Genetic Epidemiology
Date: Thursday, August 17th, 2023
Time: 10:00AM
Location: Zoom
Supervised by: Dr. Brad McNeney

Abstract: Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare exposures. Sparseness leads to maximum likelihood estimates (MLEs) of log odds-ratio parameters that are biased away from their null value of zero and to tests with inflated type I error rates. Different penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic regression and penalized conditional logistic regression using a class of log-F priors, indexed by a shrinkage parameter m, to shrink the biased MLE towards zero. The thesis is organized in three parts. First, we propose a two-step methodology for implementing log-F penalization for inference of regression parameters from logistic regression, with application to genome-wide association studies. In the first step we estimate the shrinkage parameter, and in the second step we use the penalized regression estimator to estimate single-variant associations across the genome. Next, we explore log-F penalization for inference of regression parameters from conditional logistic regression, with application to data from matched case-control and case-parent trio studies. In the first two projects we use simulation to study the statistical properties of our methods and make comparisons to methods that use Firth penalization. Finally, we apply log-F-penalized logistic regression to data from the UK Biobank, to investigate the method’s feasibility for genome-wide, biobank-scale data. The complexity and size of biobank data present unique challenges, and we make modifications to our methodology to increase its flexibility and adaptability to such datasets.
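
To make the idea of log-F penalization concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a single binary exposure, an unpenalized intercept, and a fixed shrinkage parameter m (the thesis estimates m in a first step; that step is not shown). The log-F(m, m) penalty adds (m/2)β − m log(1 + e^β) to the logistic log-likelihood. The function names and the toy data, in which no control is exposed so that the unpenalized MLE of β is infinite, are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def penalized_loglik(alpha, beta, x, y, m):
    # Logistic log-likelihood plus the log-F(m, m) log-prior on beta.
    ll = sum(yi * (alpha + beta * xi) - log1pexp(alpha + beta * xi)
             for xi, yi in zip(x, y))
    return ll + 0.5 * m * beta - m * log1pexp(beta)

def fit_logf(x, y, m, tol=1e-10, max_iter=100):
    """Newton-Raphson with step halving; the intercept is left unpenalized (assumption)."""
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        p = [expit(alpha + beta * xi) for xi in x]
        pb = expit(beta)
        # Gradient of the penalized log-likelihood.
        ga = sum(yi - pi for yi, pi in zip(y, p))
        gb = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p)) + 0.5 * m - m * pb
        # Entries of the negative Hessian (penalized information matrix).
        w = [pi * (1.0 - pi) for pi in p]
        haa = sum(w)
        hab = sum(wi * xi for wi, xi in zip(w, x))
        hbb = sum(wi * xi * xi for wi, xi in zip(w, x)) + m * pb * (1.0 - pb)
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        # Halve the step until the objective does not decrease.
        old = penalized_loglik(alpha, beta, x, y, m)
        step = 1.0
        while (penalized_loglik(alpha + step * da, beta + step * db, x, y, m) < old
               and step > 1e-8):
            step *= 0.5
        alpha += step * da
        beta += step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    return alpha, beta

# Hypothetical sparse data: 3 exposed cases, 17 unexposed cases, 200 unexposed controls.
x = [1] * 3 + [0] * 17 + [0] * 200
y = [1] * 3 + [1] * 17 + [0] * 200

_, beta_m1 = fit_logf(x, y, m=1.0)
_, beta_m4 = fit_logf(x, y, m=4.0)
print(beta_m1, beta_m4)
```

In this toy setting both penalized estimates are finite, and the larger value of m pulls the log odds-ratio estimate closer to zero, which is the shrinkage behaviour the abstract describes.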


Keywords: rare-variant analysis; penalized logistic regression; conditional logistic regression; sparse-data bias; empirical Bayes; UK Biobank