Charles L. Kooperberg
Ph.D., University of California, Berkeley, Statistics, 1991.
M.A., University of California, Berkeley, Statistics, 1988.
B.Sc., Delft University of Technology, Mathematics, 1985.
Adaptive function estimation for genomic data.
My main research area is nonparametric function estimation and the analysis of high-dimensional data, in particular as applied to genomic and proteomic data.
The publication of the sequence of the human genome and breakthroughs in high-throughput technologies for single nucleotide polymorphism (SNP) genotyping, gene expression, and protein measurements have offered new opportunities for the study of genome complexity. New technologies are generating large amounts of high-dimensional data at an astounding speed. Relative to the high dimension of the data, the number of independent samples is often rather small, either because the techniques are too expensive or because it is hard to obtain enough independent biological samples. Clearly, extracting useful biological information from such data requires the development of new statistical techniques.
Adaptive regression methods, which combine variable selection and nonlinear modeling, are well suited for many of these problems. In my research I try to develop and enhance these methods to address the practical problems that arise directly from several collaborative projects. In particular, I focus on association studies with SNP, microarray, and proteomics data.
For SNP association studies we have developed Logic Regression. Logic Regression is a methodology for regression problems in which all (or most) of the predictors are binary, and in which the goal is to discover potential high-order interactions between these predictors. In Logic Regression, new predictors are constructed as logic (Boolean) combinations of the binary predictors.
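As a rough illustration of what "a new predictor that is a Boolean combination of binary predictors" means, the sketch below scores a few hand-written candidate expressions on toy data. Everything in it is hypothetical: the data, the names X1 to X3, and the three candidates. It enumerates candidates exhaustively, which is only a stand-in for the search that Logic Regression itself performs.

```python
# Hypothetical toy data: each row holds three binary predictors (X1, X2, X3),
# for example carrier indicators for three SNPs.
X = [
    (1, 0, 1),
    (0, 1, 1),
    (1, 1, 0),
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 0),
]
y = [1, 1, 0, 0, 0, 1]  # hypothetical binary outcome

# Candidate logic (Boolean) combinations of the binary predictors.
# These are illustrative only; a real search would construct them adaptively.
candidates = {
    "X1 and X2": lambda x1, x2, x3: x1 and x2,
    "X1 or X3": lambda x1, x2, x3: x1 or x3,
    "(X1 and not X2) or X3": lambda x1, x2, x3: (x1 and not x2) or x3,
}

def misclassified(rule):
    """Count rows where the Boolean predictor disagrees with the outcome."""
    return sum(int(bool(rule(*row))) != label for row, label in zip(X, y))

errors = {name: misclassified(rule) for name, rule in candidates.items()}
best_name = min(errors, key=errors.get)
best_err = errors[best_name]
print(best_name, best_err)
```

In this toy setting the interaction term involving all three predictors fits the outcome exactly, while the simpler combinations do not.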
There are numerous situations in which data are generated by some (unknown) mechanism, and interest lies in estimating a function that is related to a model for the data. In the polynomial spline approach we model such a function as a member of a linear space of smooth piecewise polynomials (splines). In practice we often use stepwise algorithms to determine this space adaptively. For example, in the popular proportional hazards model the dependence of survival times on the covariates is modeled fully parametrically. Hazard regression (HARE) employs an adaptive algorithm based on splines to model the conditional log-hazard function. It does not assume a proportional hazards model, but it contains these models as a special subclass. I am interested both in developing similar nonparametric methodologies for other function estimation problems and in extending existing methods to new applications.
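To make "a linear space of piecewise polynomials" concrete, here is a minimal sketch using linear splines in the truncated power basis 1, x, (x - k)_+. The knot location, the coefficients, and the function names are assumptions chosen for illustration; the sketch does not perform the adaptive, stepwise knot selection described above, and it is not the HARE algorithm.

```python
def linear_spline_basis(x, knots):
    """Evaluate the basis functions 1, x, and (x - k)_+ for each knot k at x."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]

def spline_value(x, knots, coefs):
    """A spline is a linear combination of the basis functions."""
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(x, knots)))

# With a single (hypothetical) knot at 0.5, the function |x - 0.5| lies in
# the spline space: |x - 0.5| = 0.5 - x + 2 * (x - 0.5)_+.
knots = [0.5]
coefs = [0.5, -1.0, 2.0]

grid = [i / 10 for i in range(11)]
max_gap = max(abs(spline_value(x, knots, coefs) - abs(x - 0.5)) for x in grid)
print(max_gap)
```

Adding or deleting a knot changes the dimension of the space, which is what a stepwise algorithm would vary when choosing the space adaptively.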
In addition, I am actively involved in the activities of the Clinical Coordinating Center of the Women's Health Initiative. This is a 15-year program that involves a clinical trial of 67,500 postmenopausal women and an observational study of an additional 100,000 women. The primary outcomes that are studied are breast cancer, colorectal cancer, coronary heart disease and hip fractures. Within the coordinating center I am primarily involved in the outcomes procedures and the periodic reporting to the Data Safety and Monitoring Board.
1999-2003, Affiliate Associate Professor, University of Washington, School of Public Health and Community Medicine, Biostatistics
1997-2002, Associate Member, Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Biostatistics
1991-1997, Assistant Professor, University of Washington, College of Arts and Sciences, Statistics
Rare and low-frequency coding variants alter human adult height. Nature. 542(7640):186-190. 2017.
Trans-ethnic Fine Mapping Highlights Kidney-Function Genes Linked to Salt Sensitivity. American journal of human genetics. 99(3):636-46. 2016.
Replication of Genome-Wide Association Study Findings of Longevity in White, African American, and Hispanic Women: The Women's Health Initiative. The journals of gerontology. Series A, Biological sciences and medical sciences. 2016.
Meta-analysis identifies common and rare variants influencing blood pressure and overlapping with metabolic trait loci. Nature genetics. 48(10):1162-1170. 2016.
A reference panel of 64,976 haplotypes for genotype imputation. Nature genetics. 48(10):1279-1283. 2016.
Three new pancreatic cancer susceptibility signals identified on chromosomes 1q32.1, 5p15.33 and 8q24.21. Oncotarget. 7(41):66328-66343. 2016.
Generalization and fine mapping of European ancestry-based central adiposity variants in African ancestry populations. International journal of obesity (2005). 2016.
Long-term oral bisphosphonate use in relation to fracture risk in postmenopausal women with breast cancer: findings from the Women's Health Initiative. Menopause (New York, N.Y.). 23(11):1168-1175. 2016.
Augmented case-only designs for randomized clinical trials with failure time endpoints. Biometrics. 72(1):30-38. 2016.
A meta-analysis of 120 246 individuals identifies 18 new loci for fibrinogen concentration. Human molecular genetics. 25(2):358-70. 2016.
Testing the Role of Predicted Gene Knockouts in Human Anthropometric Trait Variation. Human molecular genetics. 2016.
Gene by Environment Investigation of Incident Lung Cancer Risk in African-Americans. EBioMedicine. 4:153-61. 2016.
Genome-wide Trans-ethnic Meta-analysis Identifies Seven Genetic Loci Influencing Erythrocyte Traits and a Role for RBPMS in Erythropoiesis. American journal of human genetics. 2016.
SOS2 and ACP1 Loci Identified through Large-Scale Exome Chip Analysis Regulate Kidney Development and Function. Journal of the American Society of Nephrology : JASN. 2016.
Four Susceptibility Loci for Gallstone Disease Identified in a Meta-analysis of Genome-wide Association Studies. Gastroenterology. 151(2):351-363. 2016.
Female chromosome X mosaicism is age-related and preferentially affects the inactivated X chromosome. Nature communications. 7:11843. 2016.
Strategies for Enriching Variant Coverage in Candidate Disease Loci on a Multiethnic Genotyping Array. PloS one. 11(12):e0167758. 2016.
Group association test using a hidden Markov model. Biostatistics (Oxford, England). 2015.
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature. 526(7571):112-7. 2015.
Leukocyte Telomere Length and Risks of Incident Coronary Heart Disease and Mortality in a Racially Diverse Population of Postmenopausal Women. Arteriosclerosis, thrombosis, and vascular biology. 35(10):2225-31. 2015.