SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association.

Publication Type:

Journal Article

Source:

Biostatistics (Oxford, England), Volume 10, Issue 4, p.680-93 (2009)

Keywords:

2009; Aged; Algorithms; Biostatistics; Center-Authored Paper; Female; Genome-Wide Association Study; Haplotypes; Humans; Linkage Disequilibrium; Lipoproteins; Middle Aged; Models, Statistical; Polymorphism, Single Nucleotide; Public Health Sciences Division; Regression Analysis; Venous Thrombosis

Abstract:

Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region one single nucleotide polymorphism (SNP) at a time may not fully exploit linkage disequilibrium, haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes one SNP at a time in a stepwise fashion, and comparing the prediction errors of different models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
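The stepwise grow-and-shrink search with cross-validation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes phased haplotypes are already known (so it omits the haplotype reconstruction step the abstract describes), a quantitative trait, an additive least-squares model on haplotype counts, and a simple greedy stopping rule. The function names `haplotype_design`, `cv_error`, and `stepwise_select` are hypothetical.

```python
import numpy as np

def haplotype_design(h1, h2, snps, levels=None):
    """Count copies (0, 1, 2) of each haplotype formed by the chosen SNPs.

    h1, h2: (n, p) 0/1 arrays holding each person's two phased chromosomes
    (assumption: phase is known; the paper instead learns it).
    """
    n = h1.shape[0]
    if not snps:
        return np.ones((n, 1)), []          # empty SNP set: intercept-only model
    cols = list(snps)
    a = [tuple(r) for r in h1[:, cols]]
    b = [tuple(r) for r in h2[:, cols]]
    if levels is None:
        levels = sorted(set(a) | set(b))    # haplotypes seen in this sample
    index = {hap: j for j, hap in enumerate(levels)}
    X = np.zeros((n, len(levels)))
    for i in range(n):
        for hap in (a[i], b[i]):
            j = index.get(hap)
            if j is not None:               # haplotypes unseen in training get no column
                X[i, j] += 1
    return X, levels

def cv_error(h1, h2, y, snps, folds):
    """Cross-validated mean squared prediction error for one SNP set."""
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        X_train, levels = haplotype_design(h1[train], h2[train], snps)
        beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        X_test, _ = haplotype_design(h1[test], h2[test], snps, levels)
        sq_err += np.sum((y[test] - X_test @ beta) ** 2)
    return sq_err / len(y)

def stepwise_select(h1, h2, y, n_folds=5, seed=0):
    """Greedily add or drop one SNP at a time while the CV error keeps falling."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    current = frozenset()
    best = cv_error(h1, h2, y, sorted(current), folds)
    while True:
        # Candidate moves: grow by one SNP, or shrink by one SNP.
        moves = [current | {j} for j in range(h1.shape[1]) if j not in current]
        moves += [current - {j} for j in current]
        scored = [(cv_error(h1, h2, y, sorted(m), folds), m) for m in moves]
        err, candidate = min(scored, key=lambda t: t[0])
        if err >= best:                     # no single move improves prediction
            return sorted(current), best
        current, best = candidate, err
```

Calling `stepwise_select(h1, h2, y)` returns the selected SNP indices together with their cross-validated error. The selected set may be a single SNP or several SNPs defining a haplotype, depending on which model predicts best.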