Bayesian diagnostics - personal logistic regression

The following checks were made when implementing the Bayesian logistic regressions using the first imputed data set. This follows the When to worry and how to Avoid the Misuse of Bayesian Statistics (WAMBS)-Checklist. As uninformative priors were used, the steps involving comparing informative to uninformative priors were skipped.

The priors were the same for all models, so these are only presented once. However, steps 2-6 are presented for each model (corresponding to goal type). All these steps are also repeated for the ordinal regression found here.

1) Description of priors

There was limited previous research which could be used to construct informative priors. Consequently, as this was an exploratory study, generic and unformative priors were chosen. These were considered uninformative, when compared to the sample size and explanatory variable scale (all continuous variables were scaled and centered). These are generic, and so the same prior distributional form was supplied for all coefficient and intercept parameters, as there was no prior evidence suggesting a specific form, and the normal distribution was considered to be parsimonious. Consequently, all coefficient and intercepts priors were fixed to

\[ prior \sim \mathcal{N}(0,100)\ \]

where $ $ denotes the normal distribution. Additionally, the default weakly informative priors were used for the standard deviation of random effects. This was a half student-t distribution with 3 degrees of freedom, and a scale parameter of 2.5. This prior was used in all models.

The cumulative logit models were run with the above specified priors, for 15000 iterations with a burn-in of 7500 iterations which were discarded, with four chains.

Personal model - goal endorsement

2) Examining convergence

The trace-plots for each model and chain, and for each parameter, were examined. Here, we’re looking for marked jumps indicating the model has not converged.

PSRF values from the Gelman-Rubin Diagnostic were also examined, checking if the upper CI was near 1, which it was for all models and parameters.

# Convert to MCMC object 
personal_modelposterior.a <- as.mcmc(personal_model.a)

# Gelman-Rubin Diagnostic
gelman.plot(personal_modelposterior.a[,1:param_num])

The Geweke Diagnostic, testing for equality between the first and last half of the iterations, was performed. The estimates were also plotted, examining if the values exceed the 1.96 or below -1.96 z-score boundaries. Both tests indicated the models had converged.

#  Geweke diagnostic
geweke.plot(personal_modelposterior.a[, 1:param_num])

3) Examination of convergence after doubling the number of iterations

To check that the models had not simply reached local convergence, the model was re-run with twice the number of iterations.

# Load double model
s3load("personal_model.a.d.RData", bucket = "cor-analysis")

# Burn in and iterations for double model 
burn_in.d <- personal_model.a.d$fit@sim$warmup
iterations.d <- personal_model.a.d$fit@sim$iter

We can look at the trace-plots.

The Gelman-Rubin Diagnostic.

# Convert to MCMC object 
personal_model_posterior_d <- as.mcmc(personal_model.a.d)

# Gelman-Rubin Diagnostic
gelman.plot(personal_model_posterior_d[,1:param_num])

The Geweke Diagnostic.

#  Geweke diagnostic
geweke.plot(personal_model_posterior_d[, 1:param_num])

4) Examine the histogram of estimates in each iteration to determine if there were enough iterations to approximate the posterior

Examining a histogram of posterior distribution density indicates if it contains enough information. Regression coefficients do not have to be symmetric, but should form a peak with sloping sides.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

5) Examine the degree of autocorrelation between the chains

The autocorrelation between chains was examined. High autocorrelation alone is not an issue, but if accompanied by poor convergence, might be indicative of several problems.

## Using lag as id variables

6) Evaluate if the posterior distribution is sensible

The posterior distributions were checked to see if they are sensible. This included checking they were smooth, did not have a posterior SD greater than the scale of the original parameter, and the range of the posterior CI was not greater than the scale of the original parameter.

# Personal 
mcmc_plot(personal_model.a, type = "dens", pars = "^b_")

# Remove used objects 
remove(list=c("personal_model.a"))
file.remove("personal_model.a.RData")

## Warning in file.remove("personal_model.a.RData"): cannot remove file 'personal_model.a.RData', reason 'No such file or directory'

## [1] FALSE