SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association.

Publication Type: Journal Article


Biostatistics (Oxford, England), Volume 10, Issue 4, p.680-93 (2009)


2009, Aged, Algorithms, Biostatistics, Center-Authored Paper, Female, Genome-Wide Association Study, Haplotypes, Humans, Linkage Disequilibrium, Lipoproteins, Middle Aged, Models, Statistical, Polymorphism, Single Nucleotide, Public Health Sciences Division, Regression Analysis, Venous Thrombosis


Association studies have been widely used to identify genetic liability variants for complex diseases. Scanning a chromosomal region one single-nucleotide polymorphism (SNP) at a time may not fully exploit linkage disequilibrium, whereas haplotype analyses tend to require a fairly large number of parameters and can therefore lose power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes one SNP at a time in a stepwise fashion and comparing the prediction errors of the resulting models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
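
The grow-and-shrink search with cross-validated model comparison can be illustrated with a small sketch. This is not the authors' implementation, and it simplifies heavily: it assumes haplotype phase is already known (SHARE instead treats haplotype reconstruction as part of the learning procedure), and it swaps the regression model for a plain class-mean predictor over haplotype pairs. All names (`restrict`, `cv_error`, `make_folds`, `select_snps`, `toy_data`) and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import product


def restrict(person, snps):
    """Unordered pair of one person's haplotypes, restricted to the SNP indices in `snps`."""
    return tuple(sorted(tuple(h[j] for j in snps) for h in person))


def make_folds(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_error(haps, y, snps, folds):
    """Cross-validated mean squared prediction error for one SNP set.

    Assumption: the prediction is the training-fold mean phenotype of people
    with the same restricted haplotype pair (a stand-in for a regression fit).
    """
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        sums, counts = defaultdict(float), defaultdict(int)
        train_sum, train_n = 0.0, 0
        for i in range(len(y)):
            if i in held_out:
                continue
            key = restrict(haps[i], snps)
            sums[key] += y[i]
            counts[key] += 1
            train_sum += y[i]
            train_n += 1
        overall = train_sum / train_n
        for i in test_idx:
            key = restrict(haps[i], snps)
            seen = counts.get(key, 0)
            # Fall back to the overall training mean for unseen haplotype pairs.
            pred = sums[key] / seen if seen else overall
            total += (y[i] - pred) ** 2
    return total / len(y)


def select_snps(haps, y, max_size=3, k=5, seed=0):
    """Grow the SNP set one SNP at a time, then shrink it, keeping the best CV error."""
    n_snps = len(haps[0][0])
    folds = make_folds(len(y), k, seed)

    def err(s):
        return cv_error(haps, y, s, folds)

    current = ()
    best_set, best_err = current, err(current)  # empty set = no-SNP baseline
    # Growing phase: add the SNP that gives the lowest CV error.
    while len(current) < min(max_size, n_snps):
        candidates = [tuple(sorted(current + (j,)))
                      for j in range(n_snps) if j not in current]
        current = min(candidates, key=err)
        if err(current) < best_err:
            best_set, best_err = current, err(current)
    # Shrinking phase: drop the SNP whose removal gives the lowest CV error.
    while len(current) > 1:
        candidates = [tuple(j for j in current if j != d) for d in current]
        current = min(candidates, key=err)
        if err(current) < best_err:
            best_set, best_err = current, err(current)
    return best_set, best_err


def toy_data(copies=3):
    """Hypothetical noise-free data: 3 SNPs, phenotype driven by SNP index 2 only."""
    haplotypes = list(product((0, 1), repeat=3))
    haps = [(h1, h2) for h1 in haplotypes for h2 in haplotypes] * copies
    y = [1 if (h1[2] or h2[2]) else 0 for h1, h2 in haps]
    return haps, y


if __name__ == "__main__":
    haps, y = toy_data()
    print(select_snps(haps, y))
```

On this noise-free toy data the phenotype depends on a single SNP, so the search settles on a one-SNP set; with real data the selected set could equally be a multi-SNP haplotype, which is the case the abstract describes as laying a foundation for haplotype analyses.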