Material is derived primarily from Kruschke, J. K. (2015), Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan.

There are two foundational ideas in Bayesian data analysis.

  1. Bayesian inference is the reallocation of credibility across plausible outcomes.

    • Our beliefs can be quantified as credibility assigned to possible values of an outcome. The reallocated credibility across those possibilities is the posterior distribution (our belief after taking new observations into account).

    • The posterior distribution becomes the prior belief for future observations.

  2. Plausible outcomes are parameter values in meaningful mathematical models.

    • An important step in Bayesian analysis is defining the set of possibilities over which credibility is allocated. It is not feasible to capture all conceivable outcomes, so we choose a set of possibilities that covers the range of outcomes we are interested in.

Steps for Bayesian Analysis:

  1. Identify the measurement scales of the data of interest.
  2. Define a descriptive model whose parameters are meaningful and appropriate for the analysis.
  3. Specify a prior distribution on the parameters that stands up to scrutiny.
  4. Use Bayesian inference to reallocate credibility across parameter values, and interpret the posterior distribution.
  5. Check whether the posterior distribution mimics the data (i.e., a posterior predictive check).

Bayes' rule applied to parameters and data: \[ \underbrace{p(\theta|D)}_{posterior} ~=~ \frac{\overbrace{p(D|\theta)}^{likelihood}~ \overbrace{p(\theta)}^{prior}} {\underbrace{p(D)}_{evidence}} \] where \[ p(D)=\sum_{\theta^*}~p(D|\theta^*)~p(\theta^*) \]

  • \(\theta^*\) reminds us that it is distinct from the specific \(\theta\) in the numerator; the sum ranges over all possible parameter values.
  • Likelihood \(p(D|\theta)\) is the probability that the data could be generated by the model with parameter value \(\theta\).
  • Prior \(p(\theta)\) is the credibility of the \(\theta\) values without the data.
  • Evidence, or marginal likelihood, \(p(D)\) is the overall probability of the data according to the model: the likelihood averaged over all possible parameter values, weighted by the strength of belief (prior) in those parameter values.
  • For continuous parameters, the marginal likelihood would be an integral: \(p(D)=\int d\theta^*~p(D|\theta^*)~p(\theta^*)\)
  • Posterior \(p(\theta|D)\) is the credibility of \(\theta\) values with the data \(D\) taken into account. (A small numerical sketch of these quantities follows.)
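As a concrete illustration, here is a minimal grid-approximation sketch of the discrete form of Bayes' rule above (the coin-flip counts z and N are made up for illustration):

theta <- seq(0.01, 0.99, by=0.01)            # grid of candidate parameter values
prior <- dbeta(theta, 2, 2)                  # prior credibility at each value
prior <- prior / sum(prior)                  # normalize over the grid
z <- 7; N <- 10                              # illustrative data: 7 heads in 10 flips
likelihood <- theta^z * (1 - theta)^(N - z)  # p(D|theta) at each grid value
evidence <- sum(likelihood * prior)          # p(D): prior-weighted average
posterior <- likelihood * prior / evidence   # reallocated credibility over the grid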

Another interpretation of Bayes' theorem:

\[ p(\theta)~(Prior ~Beliefs)~ \Rightarrow ~y~(data)~ \Rightarrow ~p(\theta|y) ~(Posterior ~Beliefs) \] In this representation, \(\theta\) can be a parameter, hypothesis, model, or data point subject to uncertainty, and Bayes' theorem rearranges prior beliefs in light of new data to generate the posterior belief \(p(\theta|y)\). In this way we learn from the data: the posterior distribution assigns updated probabilities over the values of \(\theta\).

Comparing to Frequentist Inference:

The main difference is in how each methodology conditions.

  • Bayesian procedure = p(hypothesis|data): conditions on the data to make a statement about a hypothesis or parameter.

    • This approach is realistic considering that in most cases we cannot conduct repeated sampling, which undermines the notion of standard errors and p-values.

  • Frequentist procedure = p(data|hypothesis): conditions on the hypothesis to infer the plausibility of the data, and proceeds to reject or fail to reject the null hypothesis.

    • This relates to “statistical significance,” interpreting results with a p-value (defined as the relative frequency of obtaining a result at least as extreme as the result actually obtained, assuming the null hypothesis is true).

    • What this approach is really asking is “how often” we would obtain results at least as extreme as the actual results, under the assumption that the null hypothesis is true; the sketch below contrasts the two quantities.
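As a rough numerical contrast, here is a minimal sketch with made-up data (7 successes in 10 trials, null value 0.5) comparing the frequentist p-value with a Bayesian posterior probability under a uniform Beta(1,1) prior:

binom.test(7, 10, p = 0.5)$p.value  # p(result at least this extreme | H0 is true)
1 - pbeta(0.5, 1 + 7, 1 + 3)        # p(theta > 0.5 | data): posterior is Beta(8, 4)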

A second distinction between the two approaches is in the interpretation of interval estimates.

  • Frequentist: parameter values in the population are fixed, so a given confidence interval either contains the parameter value or it does not. Interpretation: 95% of intervals constructed from repeated samples of size n will contain the true parameter value.

  • Bayesian: credible intervals are stated as a probability: “I am 95% sure that the parameter (not assumed to be fixed) lies between the two credible-interval bounds.” A short sketch follows.
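Because the posterior is a full distribution over the parameter, a 95% credible interval can be read directly off its quantiles; a minimal sketch using an arbitrary Beta(10, 5) posterior:

qbeta(c(0.025, 0.975), 10, 5)  # central 95% credible interval for theta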

Based on Doing Bayesian Data Analysis:

Just Another Gibbs Sampler (JAGS) communicates with R through the rjags and runjags packages and is used to build Markov chain Monte Carlo (MCMC) samplers based on Gibbs sampling.
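If the packages are not yet available, they can be installed in the usual way (note that the JAGS program itself is a separate, system-level install):

install.packages(c("rjags", "runjags"))  # requires JAGS to be installed on the system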

Let’s load the Titanic dataset:

library(COUNT)
library(dplyr)
data("titanic", package="COUNT")
head(titanic)
##       class    age sex survived
## 1 1st class adults man      yes
## 2 1st class adults man      yes
## 3 1st class adults man      yes
## 4 1st class adults man      yes
## 5 1st class adults man      yes
## 6 1st class adults man      yes
titanic %>% group_by(sex) %>% summarise(count=n())
## # A tibble: 2 x 2
##      sex count
##   <fctr> <int>
## 1  women   447
## 2    man   869

The model we will fit to the data is a Bernoulli likelihood function for y = did not survive vs. survived, conditional on sex s = female or male: \[y_{i|s}\sim dbern(\theta_s)\] Each group's survival bias \(\theta_s\) has an independent beta-distributed prior: \[\theta_{s}\sim dbeta(A, B)\] where the shape constants A and B are set below.

Building the Model:

First, we need to create a numeric outcome from “survived”, with 0 = did not survive and 1 = survived.

s <- relevel(titanic$sex, ref = "man")  # make "man" the first factor level
y <- as.numeric(titanic$survived)       # factor levels: 1 = "no", 2 = "yes"
y[y == 1] = 0                           # did not survive
y[y == 2] = 1                           # survived
y <- as.integer(y)
myData <- data.frame(y, s)

We will also want to examine the data.

library(gmodels)
Sur_Sex <- table(y,s) # 1=survived, 0=did not survive
Response_joint <- CrossTable(Sur_Sex, prop.chisq=F)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1316 
## 
##  
##              | s 
##            y |       man |     women | Row Total | 
## -------------|-----------|-----------|-----------|
##            0 |       694 |       123 |       817 | 
##              |     0.849 |     0.151 |     0.621 | 
##              |     0.799 |     0.275 |           | 
##              |     0.527 |     0.093 |           | 
## -------------|-----------|-----------|-----------|
##            1 |       175 |       324 |       499 | 
##              |     0.351 |     0.649 |     0.379 | 
##              |     0.201 |     0.725 |           | 
##              |     0.133 |     0.246 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |       869 |       447 |      1316 | 
##              |     0.660 |     0.340 |           | 
## -------------|-----------|-----------|-----------|
## 
## 
Response_joint$prop.col
##    s
## y         man     women
##   0 0.7986191 0.2751678
##   1 0.2013809 0.7248322

We see that the observed proportions are \(p(survived|man)=.201\) for males and \(p(survived|woman)=.725\) for females.

Hypothesis: we would expect females to out-survive males in our data, hence our prior will be set to A=3, B=2, which is slightly skewed to the left (mass toward higher survival probabilities).
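To see the shape of this prior, we can plot its density with a quick base-R sketch:

curve(dbeta(x, 3, 2), from = 0, to = 1,
      xlab = expression(theta), ylab = "prior density")  # the Beta(3,2) prior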

We want to load a few functions from the Doing Bayesian Data Analysis book (assuming its DBDA2E-utilities.R script is in the working directory):

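source("DBDA2E-utilities.R")  # provides diagMCMC, smryMCMC, plotMCMC, etc.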
## 
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************

The first step is to package the data we created earlier into a list for JAGS.

dataList <- list(y=myData$y, s=as.numeric(myData$s),  # numeric group index: 1=man, 2=women
                 Ntotal=length(myData$y), 
                 Nsubj=length(unique(myData$s)))

Next, we define the model in a string for JAGS to read.

modelString = "
  model {
    for ( i in 1:Ntotal ) {
      y[i] ~ dbern( theta[s[i]] )    # Bernoulli likelihood for each passenger
    }
    for ( sIdx in 1:Nsubj ) {        # loop index renamed to avoid clashing with the data vector s
      theta[sIdx] ~ dbeta( 3 , 2 )   # Beta(3,2) prior on each group's survival bias
    }
  }
  "
writeLines( modelString , con="TEMPmodel.txt" )  # optionally save the model to a file

We run the model in JAGS with n.chains set to 2, since this example was performed on a laptop with a dual-core processor. n.adapt sets the number of adaptation (tuning) steps; the default is 1000.

library(rjags)  # provides jags.model and coda.samples
model <- jags.model(textConnection(modelString), data = dataList, n.chains = 2, n.adapt= 500) 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1316
##    Unobserved stochastic nodes: 2
##    Total graph size: 2640
## 
## Initializing model

Using the update function, we run the MCMC chains for 1000 iterations. This is a short burn-in period; these samples are discarded.

update(model, n.iter = 1000)  # burn-in for 1000 samples 

After burn-in we generate MCMC samples to represent the posterior distribution. We will generate 4446 steps in total, since we specified 2 chains (number of chains × n.iter = 2 × 2223).

codaSamples <- coda.samples(model, variable.names=c("theta"), n.iter=2223)

Let’s take a glance at the summary of the posterior with credible intervals.

summary(codaSamples)
## 
## Iterations = 1001:3223
## Thinning interval = 1 
## Number of chains = 2 
## Sample size per chain = 2223 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##            Mean      SD  Naive SE Time-series SE
## theta[1] 0.2036 0.01348 0.0002022      0.0002022
## theta[2] 0.7230 0.02083 0.0003124      0.0003027
## 
## 2. Quantiles for each variable:
## 
##            2.5%    25%    50%    75% 97.5%
## theta[1] 0.1778 0.1943 0.2032 0.2122 0.231
## theta[2] 0.6816 0.7089 0.7230 0.7373 0.762
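The same posterior samples can also be manipulated directly; for example, a quick sketch of the posterior difference in survival biases between the two groups:

mcmcMat <- as.matrix(codaSamples)                  # stack both chains into one matrix
diffSamples <- mcmcMat[, "theta[2]"] - mcmcMat[, "theta[1]"]
quantile(diffSamples, c(0.025, 0.5, 0.975))        # 95% credible interval for the difference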

We can display convergence diagnostics of the chains for a specified parameter.

diagMCMC(codaSamples, parName="theta[1]")  # draws diagnostic plots for theta[1]
diagMCMC(codaSamples, parName="theta[2]")  # draws diagnostic plots for theta[2]

smryMCMC displays numerical summary statistics of the chains.

smryMCMC(codaSamples, compVal = NULL, compValDiff = 0.0)
##                         Mean     Median       Mode  ESS HDImass     HDIlow
## theta[1]           0.2035516  0.2031885  0.2030383 4446    0.95  0.1765473
## theta[2]           0.7229916  0.7230156  0.7260973 4446    0.95  0.6821370
## theta[1]-theta[2] -0.5194400 -0.5195290 -0.5205342 4446    0.95 -0.5673920
##                      HDIhigh CompVal PcntGtCompVal ROPElow ROPEhigh
## theta[1]           0.2292045      NA            NA      NA       NA
## theta[2]           0.7622933      NA            NA      NA       NA
## theta[1]-theta[2] -0.4709618       0             0      NA       NA
##                   PcntLtROPE PcntInROPE PcntGtROPE
## theta[1]                  NA         NA         NA
## theta[2]                  NA         NA         NA
## theta[1]-theta[2]         NA         NA         NA

plotMCMC displays the posterior information graphically.

plotMCMC(codaSamples, data=myData, compVal=NULL, compValDiff=0.0)  # posterior plots for theta[1], theta[2], and their difference

Notice that the modes of our parameter estimates closely match the observed proportions we computed earlier (.201 for men and .725 for women).

It is also a good habit to examine the prior distribution for the difference of biases. When the data y are left out, JAGS samples the parameters directly from the prior, so the only change required is to comment out y in our list.

dataList <- list(
 # y=myData$y,                  # commented out: JAGS will now sample theta from the prior
  s=as.numeric(myData$s), 
  Ntotal=length(myData$y),
  Nsubj=length(unique(myData$s)))
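Re-running the model with this prior-only list proceeds exactly as before; a minimal sketch:

priorModel <- jags.model(textConnection(modelString), data = dataList, n.chains = 2, n.adapt = 500)
update(priorModel, n.iter = 1000)                                        # burn-in
priorSamples <- coda.samples(priorModel, variable.names=c("theta"), n.iter=2223)
plotMCMC(priorSamples, data=myData, compVal=NULL, compValDiff=0.0)       # the Beta(3,2) priors and their difference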