Predicting multiallelic genes using unphased and flanking single nucleotide polymorphisms.

Publication Type:

Journal Article


Genetic epidemiology, Volume 35, Issue 2, p.85-92 (2011)


2011, Alleles, Center-Authored Paper, Clinical Research Division, Cohort Studies, Genetic Techniques, Genetics, Population, Genomics Core Facility, Haplotypes, HLA Antigens, HLA-A Antigens, HLA-B Antigens, HLA-C Antigens, HLA-DQ Antigens, HLA-DR Antigens, Humans, Molecular Epidemiology, Polymorphism, Genetic, Polymorphism, Single Nucleotide, Public Health Sciences Division, Reproducibility of Results, Research Trials Office Core Facility - Biostatistics Service, Shared Resources


Recent advances in genotyping technologies have enabled genomewide association studies (GWAS) of many complex traits, including autoimmune disease, infectious disease, cancer and heart disease. To facilitate interpretation and establish a biological basis, it could be advantageous to identify alleles of functional genes, beyond just single nucleotide polymorphisms (SNPs) within or near genes. Leslie et al. ([2008] Am J Hum Genet 82:48–56) proposed an identity-by-descent (IBD)-based method for predicting human leukocyte antigen (HLA) alleles, which are multiallelic and highly polymorphic, from SNP data, and its predictions achieved a satisfactory accuracy on the order of 97%. Building upon their success, we introduce a complementary method for predicting highly polymorphic alleles using unphased SNP data as the training data set. Due to its generality and flexibility, the new method is readily applicable to large population studies. Applying it to HLA genes in a cohort of 630 healthy individuals as a training set, we constructed predictive models for HLA-A, B, C, DRB1 and DQB1. We then performed a validation study with another cohort of 630 healthy individuals, in which the predictive models achieved accuracies for HLA alleles defined at intermediate and high resolution, respectively, as high as (100%, 97%) for HLA-A, (98%, 96%) for B, (98%, 98%) for C, (97%, 96%) for DRB1 and (98%, 95%) for DQB1. These preliminary results suggest the feasibility of predicting other polymorphic genetic alleles, since HLA loci are almost certainly among the most polymorphic genes.
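
The abstract reports per-locus accuracies but does not say how a prediction was scored. As a minimal sketch only, the Python below assumes one common convention: each individual contributes two alleles per locus, and a predicted genotype is compared with the typed genotype as an unordered pair, since unphased data carries no allele order. The function names and the example allele calls are hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter


def allele_matches(predicted, observed):
    """Count alleles shared by two unordered genotype pairs (0, 1 or 2)."""
    # Multiset intersection keeps the minimum count of each allele,
    # so a homozygous call matches a heterozygous one only once.
    overlap = Counter(predicted) & Counter(observed)
    return sum(overlap.values())


def allele_accuracy(predictions, truths):
    """Fraction of alleles predicted correctly across all individuals."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one individual is required")
    correct = sum(allele_matches(p, t) for p, t in zip(predictions, truths))
    return correct / (2 * len(truths))


# Hypothetical HLA-A calls for three individuals (illustrative only).
predicted = [("A*02:01", "A*01:01"), ("A*03:01", "A*03:01"), ("A*24:02", "A*02:01")]
observed = [("A*01:01", "A*02:01"), ("A*03:01", "A*11:01"), ("A*24:02", "A*02:01")]

accuracy = allele_accuracy(predicted, observed)
print(f"Allele-level accuracy: {accuracy:.3f}")
```

Under this assumed convention, the second individual counts as one correct allele out of two, because the homozygous prediction matches only one of the two typed alleles. A stricter genotype-level score, which requires both alleles to match, would give lower values on the same calls.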