Exploratory Data Analysis

Suppose you own a company that produces chocolate chip cookies. In mass production, you prepare a large amount of dough, mix in a large number of chips, and then portion out individual cookies. In this process, the number of chips per cookie approximately follows a Poisson distribution with some mean lambda, the average number of chips per cookie. You decide to carry out an experiment: you produce 150 test cookies, 30 from your own location and 30 from each of four other factory locations. Let’s take a look at the data you got:

[Figures: exploratory plots of the observed number of chips per cookie at each location]

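The raw data are not reproduced here, but the structure is easy to sketch. The snippet below simulates a data set with the same layout (five locations, 30 cookies each) purely for illustration; the location means in true_lam are made up, and the data frame name dat is hypothetical.

set.seed(42)
n_loc    <- 5
n_per    <- 30
true_lam <- c(9, 11, 10, 8, 12)   # made-up location means, illustration only
dat <- data.frame(
  location = rep(1:n_loc, each = n_per),
  chips    = rpois(n_loc * n_per, rep(true_lam, each = n_per))
)
boxplot(chips ~ location, data = dat, xlab = "Location", ylab = "Chips per cookie")
tapply(dat$chips, dat$location, mean)   # sample mean chips per cookie by location
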
Bayesian Hierarchical Modeling
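
One way to set this up, consistent with the monitored nodes lam[1] through lam[5], mu and sig, is a hierarchical Poisson model in which each location gets its own mean lam[j], drawn from a common gamma distribution whose mean mu and standard deviation sig have their own priors. The rjags sketch below (using the simulated dat from above, with priors chosen purely for illustration) is one such specification; the model actually fitted may differ in its priors.

library(rjags)

mod_string <- " model {
  for (i in 1:length(chips)) {
    chips[i] ~ dpois(lam[location[i]])     # observation level
  }
  for (j in 1:5) {
    lam[j] ~ dgamma(alpha, beta)           # location-level means
  }
  alpha <- mu^2 / sig^2                    # reparameterize the gamma prior
  beta  <- mu / sig^2
  mu  ~ dgamma(2.0, 1.0/5.0)               # prior on the overall mean (illustrative)
  sig ~ dexp(1.0)                          # prior on the spread of the lam[j] (illustrative)
} "

dat_jags <- list(chips = dat$chips, location = dat$location)
mod      <- jags.model(textConnection(mod_string), data = dat_jags, n.chains = 3)
update(mod, 1e3)                                           # burn-in
mod_sim  <- coda.samples(mod, c("lam", "mu", "sig"), n.iter = 5e3)
mod_csim <- as.mcmc(do.call(rbind, mod_sim))               # chains combined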

Convergence Diagnostics I

The plots look really good and it appears that everything has converged, but you want to run more precise checks on the convergence of the Markov chain Monte Carlo (MCMC) sampler, for instance the Gelman-Rubin diagnostic.
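
Assuming the chains are stored in a coda mcmc.list (mod_sim in the sketches above), the diagnostic can be computed with coda's gelman.diag():

library(coda)
gelman.diag(mod_sim)      # potential scale reduction factor per parameter
# gelman.plot(mod_sim)    # optional: evolution of the shrink factor over iterations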

Potential scale reduction factors:

       Point est. Upper C.I.
lam[1]       1.00       1.00
lam[2]       1.00       1.00
lam[3]       1.00       1.00
lam[4]       1.00       1.00
lam[5]       1.00       1.00
mu           1.00       1.00
sig          1.01       1.02

Multivariate psrf

1

Gelman and Rubin’s convergence diagnostic looks good: approximate convergence is diagnosed when the upper confidence limit is close to 1, which is the case here for all parameters.

Convergence Diagnostics II

The autocorrelation of the Markov chains (plots not shown here) also looks good.
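
Output in this format is produced by coda's autocorr.diag(); with the same (assumed) mod_sim and mod_csim objects:

autocorr.diag(mod_sim)       # autocorrelation at lags 0, 1, 5, 10, 50 (the default)
# autocorr.plot(mod_csim)    # the autocorrelation plots mentioned above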

              lam[1]       lam[2]       lam[3]       lam[4]       lam[5]
Lag 0   1.0000000000  1.000000000  1.000000000  1.000000000  1.000000000
Lag 1   0.0093518171  0.125299590  0.021157613  0.023630411  0.077075415
Lag 5   0.0002784837 -0.004181718 -0.006751679  0.002617047  0.007110552
Lag 10 -0.0176500262  0.005656024  0.001876984  0.011418855 -0.010427011
Lag 50 -0.0026948190 -0.018116694  0.001672026 -0.001343748  0.007007317
                 mu         sig
Lag 0   1.000000000  1.00000000
Lag 1   0.369341221  0.58415339
Lag 5   0.018586853  0.10616631
Lag 10  0.005823401  0.01955953
Lag 50 -0.007425217 -0.01253007

Convergence Diagnostics III

The effective sample sizes for estimating the means, adjusted for autocorrelation and summed across chains, are pretty much the same as the simulation size, so this looks fine as well. Lastly, you compute the deviance information criterion (DIC) and the penalized expected deviance.
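
Both quantities are available from coda and rjags; a minimal sketch, assuming the mod and mod_sim objects from the earlier sketches and an illustrative number of DIC iterations:

effectiveSize(mod_sim)            # ESS per parameter, summed across the chains
dic.samples(mod, n.iter = 1e3)    # mean deviance, penalty, penalized deviance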

   lam[1]    lam[2]    lam[3]    lam[4]    lam[5]        mu       sig 
14927.946 10522.670 14285.616 14447.184 11385.385  6679.432  3457.466 
Mean deviance:  783.6 
penalty 4.787 
Penalized deviance: 788.4 

Model Checking I

After assessing convergence, you must check the fit of the model via residuals. Since this is a hierarchical model, there are two levels of residuals: the observation level and the location-mean level. To keep things simple, you look at the residuals associated with the posterior means of the parameters. First, you check the observation-level residuals, based on the estimates of the location means. The plot looks good. There seems to be an increase in the variance of the residuals, but that is to be expected because this is a Poisson likelihood, where the variance increases as the mean increases. Another way to check this is to calculate the variance of the residuals for each group.
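
A sketch of how these checks might be coded, assuming the combined chains mod_csim and the data frame dat from the earlier sketches:

pm_params <- colMeans(as.matrix(mod_csim))                   # posterior means
lam_hat   <- pm_params[paste0("lam[", dat$location, "]")]    # fitted mean per cookie
obs_resid <- dat$chips - lam_hat                             # observation-level residuals

plot(lam_hat, obs_resid, xlab = "Fitted location mean", ylab = "Residual")
tapply(obs_resid, dat$location, var)                         # residual variance by location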

Model Checking II

Now you can look at how the location means lambda differ from the overall mean mu: the location-level residuals, lam[j] - mu.
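
Continuing the same sketch, the location-level residuals are simply the posterior-mean lam[j] minus the posterior mean of mu:

lam_means <- pm_params[paste0("lam[", 1:5, "]")]   # posterior means of the five locations
loc_resid <- lam_means - pm_params["mu"]           # location-level residuals
plot(loc_resid, ylab = "Location-level residual")
abline(h = 0, lty = 2)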

Model Results