Bayesian Hierarchical Modeling to Assess Pathogen Risk in Natural Water Supplies


Ciprian Crainiceanu1, David Ruppert, Jery Stedinger, and Christopher Behr




Keywords: Bayesian analysis, waterborne pathogens, Generalized Linear Mixed Model





Cryptosporidium parvum is a microscopic waterborne organism that, once ingested, can produce gastrointestinal illness and, in individuals with a weakened immune system, even death. The Environmental Protection Agency (EPA) conducted national investigations of Cryptosporidium concentrations under the Information Collection Rule (ICR). The statistical analysis of these data is challenging because of the discrete nature of the response variable (the number of oocysts actually counted), the high frequency of zero counts (90%), seasonality, regional effects, and missing observations.


Observed count data serve as the basis of a Generalized Linear Mixed Model (GLMM) with a hierarchical structure that includes sites, regions, and an overall national average. Possible covariates include site characteristics (type of water source, population served) and time-dependent covariates (sampling date, flow rate, water turbidity). A fully Bayesian approach is used for modeling and subsequent risk analysis. Markov chain Monte Carlo (MCMC) simulation is employed to compute the posterior distributions of the parameters.
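
To make the site/region/national hierarchy concrete, the following is a minimal generative sketch, not the model fitted in this work. It assumes a Poisson response with a log link, normally distributed region and site effects, and a simple annual cosine term for seasonality; none of these choices is specified in the abstract. The function names, the national mean of -3.0 on the log scale, and the standard deviations are all hypothetical values chosen only so that most simulated counts are zero.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n_regions=3, sites_per_region=4, n_months=12, seed=0):
    """Simulate oocyst counts from an assumed three-level hierarchy.

    log(rate) = national mean + region effect + site effect + seasonal term
    All numeric values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    national_mean = -3.0  # hypothetical log expected count per sample
    records = []
    for region in range(n_regions):
        region_effect = rng.gauss(0.0, 0.5)  # assumed between-region spread
        for site in range(sites_per_region):
            site_effect = rng.gauss(0.0, 1.0)  # assumed between-site spread
            for month in range(n_months):
                # Assumed annual cycle standing in for the sampling-date covariate
                seasonal = 0.5 * math.cos(2.0 * math.pi * month / 12.0)
                log_rate = national_mean + region_effect + site_effect + seasonal
                count = sample_poisson(math.exp(log_rate), rng)
                records.append((region, site, month, count))
    return records


if __name__ == "__main__":
    data = simulate_counts()
    zero_share = sum(1 for *_, c in data if c == 0) / len(data)
    print(f"{len(data)} samples, share of zero counts: {zero_share:.2f}")
```

In a Bayesian fit the direction is reversed: the counts are observed, priors are placed on the national mean and on the variances of the region and site effects, and MCMC draws from their joint posterior. Simulating forward from such a sketch is mainly useful for checking that a sampler recovers known parameter values.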


Results illustrate the steps involved in parameter estimation, model selection, and risk assessment. The replicates generated by the MCMC simulation describe both parameter uncertainty and the predictive distribution of Cryptosporidium concentrations; these feed into the subsequent analysis of the cost-effectiveness of alternative EPA information collection strategies and treatment rules. Several strategies for improving MCMC simulation performance are discussed.




1 Department of Statistical Science, Cornell University, 301 Malott Hall, Ithaca, NY 14853. E-mail: cmc59@cornell.edu