A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging.

Publication Type:

Journal Article


Bioinformatics (Oxford, England), Volume 28, Issue 13, p.1738-1744 (2012)


2012, Center-Authored Paper, May 2012, Public Health Sciences Division, Vaccine and Infectious Disease Division


MOTIVATION: For many complex traits, including height, most variants identified by genome-wide association studies (GWAS) have small effects, leaving a substantial proportion of the heritable variation unexplained. Although many penalized multiple regression methods have been proposed to increase the power to detect associations under complex genetic architectures, they generally lack mechanisms for false positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls type I error rates and provides model over-fitting diagnostics. It does so through a novel normally distributed statistic, defined for every marker in the GWAS and computed from the results of a variational Bayes spike regression algorithm.

RESULTS: We compare our method with the lasso and with single-marker analysis on simulated data and show that our approach has superior power and type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has the power to detect additional novel associations with body height. These findings were replicated in a larger cohort, where they reached a stringent cutoff for marginal association.

AVAILABILITY: An R package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html.

CONTACT: blogsdon@fhcrc.org.
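The abstract does not give the formula for the per-marker statistic, so the following is only a rough illustration under stated assumptions, not the vBsr package's API and not necessarily the authors' exact definition. It assumes one plausible construction: for each marker j, add that marker's fitted contribution back to the residual of the multi-marker fit, regress this partial residual on the marker alone, and divide the estimate by its standard error. The posterior mean effects `beta_mean` and the residual variance `sigma2` are assumed to come from an already fitted spike regression (fitting it is not shown), and the helper names `conditional_z_scores` and `bonferroni_hits` are hypothetical. A Bonferroni cutoff on the normal scale stands in as one simple form of type I error control.

```python
import numpy as np
from statistics import NormalDist


def conditional_z_scores(X, y, beta_mean, sigma2):
    """Per-marker Z-statistics from partial residuals (illustrative sketch).

    X         : (n, p) genotype matrix, columns assumed centred
    y         : (n,) phenotype, assumed centred
    beta_mean : (p,) posterior mean effects from a fitted spike regression
                (hypothetical input; the fit itself is not shown)
    sigma2    : residual variance estimate (hypothetical input)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_mean = np.asarray(beta_mean, dtype=float)

    fitted = X @ beta_mean
    xtx = np.sum(X * X, axis=0)  # x_j^T x_j for every marker j
    # Partial residual r_j = y - X beta + x_j beta_j, hence
    # x_j^T r_j = x_j^T (y - X beta) + (x_j^T x_j) beta_j, computed for all j at once.
    score = X.T @ (y - fitted) + xtx * beta_mean
    b_hat = score / xtx              # single-marker least-squares estimate on r_j
    se = np.sqrt(sigma2 / xtx)       # its standard error given sigma2
    return b_hat / se


def bonferroni_hits(z, alpha=0.05):
    """Markers whose two-sided normal p-value is below alpha / p."""
    p = len(z)
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * p))
    return np.flatnonzero(np.abs(z) > cutoff), cutoff


# Usage on simulated data: one causal marker among 50.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
X -= X.mean(axis=0)
y = 0.5 * X[:, 0] + rng.standard_normal(500)
y -= y.mean()

beta = np.zeros(50)
beta[0] = 0.5  # pretend the spike regression recovered the true effect
z = conditional_z_scores(X, y, beta, sigma2=1.0)
hits, cutoff = bonferroni_hits(z)
print(hits, round(cutoff, 3))
```

Passing an all-zero `beta_mean` makes the statistic reduce to an ordinary single-marker Z-score. Comparing the two cases is one way to see what conditioning on the multi-marker fit changes: the causal marker's effect is removed from the residual used to test every other marker.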