Using the Whole Cohort in the Analysis of Case-Control Data: Application to the Women's Health Initiative.

Publication Type:

Journal Article


Statistics in biosciences, Volume 5, Issue 2 (2013)


2013, January 2014, Public Health Sciences Division


Standard analyses of data from case-control studies that are nested in a large cohort ignore information available for cohort members not sampled for the sub-study. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two or three phase stratified sample. When applied to a study of coronary heart disease among women in the hormone trials of the Women's Health Initiative, modest but increasing gains in precision of regression coefficients were observed depending on the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates whose validity depends on the assumed model being correct. Larger standard errors were obtained for coefficients estimated by inverse probability weighted methods that are more robust to model misspecification. Such misspecification may have been responsible for an important difference in one key regression coefficient estimated using the weighted compared with the more efficient methods.