Time series modeling of pathogen-specific disease probabilities with subsampled data.

Publication Type:

Journal Article


Biometrics (2016)


Many diseases arise due to exposure to one of multiple possible pathogens. We consider the situation in which disease counts are available over time from a study region, along with a measure of clinical disease severity, for example, mild or severe. In addition, we suppose a subset of the cases are lab tested in order to determine the pathogen responsible for disease. In such a context, we focus on modeling the probabilities of disease incidence given pathogen type. The time course of these probabilities is of great interest, as is the association with time-varying covariates such as meteorological variables. In this setup, a natural Bayesian approach would be based on imputation of the unsampled pathogen information using Markov chain Monte Carlo, but this is computationally challenging. We describe a practical approach to inference that is easy to implement. We use an empirical Bayes procedure in a first step to estimate summary statistics. We then treat these summary statistics as the observed data and develop a Bayesian generalized additive model. We analyze data on hand, foot, and mouth disease (HFMD) in China in which there are two pathogens of primary interest, enterovirus 71 (EV71) and Coxsackie A16 (CA16). We find that both EV71 and CA16 are associated with temperature, relative humidity, and wind speed, with reasonably similar functional forms for both pathogens. The important issue of confounding by time is modeled using a penalized B-spline model with a random effects representation. The level of smoothing is addressed by a careful choice of the prior on the tuning variance.
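
As a rough illustration of the first step only, the sketch below shows one generic way an empirical Bayes procedure can turn subsampled lab results into stabilized summary statistics. It is not the procedure used in the paper, whose details the abstract does not give. It assumes a beta-binomial setup in which a Beta(a, b) prior is fitted by a crude method of moments to the weekly raw proportions, ignoring binomial sampling noise. The function names and the weekly counts are hypothetical and serve only as an example.

```python
from statistics import mean, pvariance


def eb_beta_prior(positives, tested):
    """Fit a Beta(a, b) prior by method of moments to weekly raw proportions.

    Assumption: a crude moment match that ignores binomial sampling noise.
    """
    props = [y / n for y, n in zip(positives, tested) if n > 0]
    m = mean(props)
    v = pvariance(props)
    # The moment match needs 0 < v < m(1 - m); otherwise fall back to a
    # uniform Beta(1, 1) prior (an arbitrary choice for this sketch).
    if v <= 0 or v >= m * (1 - m):
        return 1.0, 1.0
    k = m * (1 - m) / v - 1
    return m * k, (1 - m) * k


def shrunken_proportions(positives, tested):
    """Posterior-mean pathogen proportion per week under the fitted prior."""
    a, b = eb_beta_prior(positives, tested)
    return [(y + a) / (n + a + b) for y, n in zip(positives, tested)]


# Hypothetical weekly data: cases lab tested, how many were EV71-positive,
# and the total reported case count (tested plus untested).
tested = [10, 0, 25, 40, 5]
ev71_positive = [4, 0, 15, 18, 5]
total_cases = [120, 80, 300, 410, 60]

p_hat = shrunken_proportions(ev71_positive, tested)
# Estimated EV71 cases per week: a summary statistic that a second-stage
# model could treat as observed data.
ev71_cases = [c * p for c, p in zip(total_cases, p_hat)]
```

Weeks with few or no tested cases are pulled toward the overall proportion, while well-sampled weeks stay close to their raw values. This is the usual reason for shrinking before passing summaries to a second-stage model.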