The reallocation of credibility across plausible outcomes is Bayesian inference.
Our beliefs can be quantified as credibility assigned to possible values of an outcome. The re-allocated credibility across those possibilities is the posterior distribution (our belief after taking new observations into account).
The posterior distribution becomes the prior belief for future observations.
Plausible outcomes are parameter values in meaningful mathematical models.
Bayes' rule applied to parameters and data: \[ \underbrace{p(\theta|D)}_{posterior} ~=~ \frac{\overbrace{p(D|\theta)}^{likelihood}~ \overbrace{p(\theta)}^{prior}} {\underbrace{p(D)}_{evidence}} \] where \[ p(D)=\sum_{\theta^*}~p(D|\theta^*)~p(\theta^*) \]
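As a minimal sketch of this computation over a discrete set of candidate values (the coin-flip data, grid, and prior below are purely illustrative):

# Discrete Bayes' rule: posterior = likelihood * prior / evidence
theta <- c(0.25, 0.50, 0.75)                       # candidate parameter values
prior <- c(0.25, 0.50, 0.25)                       # p(theta): prior credibility
likelihood <- dbinom(7, size = 10, prob = theta)   # p(D|theta) for 7 heads in 10 flips
evidence <- sum(likelihood * prior)                # p(D), the sum in the formula above
posterior <- likelihood * prior / evidence         # credibility re-allocated by the data
round(posterior, 3)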
\[ p(\theta)~(Prior ~Beliefs)~ \Rightarrow ~y~(data)~ \Rightarrow ~p(\theta|y) ~(Posterior ~Beliefs) \]

In this graphical representation, \(\theta\) can be a parameter, hypothesis, model, or data point subject to uncertainty, and Bayes' theorem rearranges prior beliefs in light of new data to generate a posterior belief \(p(\theta|y)\). Therefore, we learn from the data: the posterior distribution assigns probabilities over the values of \(\theta\).
The main difference is in how each methodology conditions.

* Bayesian procedure = p(hypothesis|data) - conditions on the data to make a statement about a hypothesis or parameter.
    + This approach is realistic, considering that in most cases we cannot conduct repeated sampling, which undermines the notion of standard errors and p-values.
* Frequentist procedure = p(data|hypothesis) - conditions on the hypothesis to infer the plausibility of the data and proceeds to reject or fail to reject the null hypothesis.
    + This relates to “statistical significance,” interpreting results with a p-value (defined as the relative frequency of obtaining a result at least as extreme as the result actually obtained).
    + What this approach is really asking is “how often” we would obtain results at least as extreme as the actual results, under the assumption that the null hypothesis is true.
A second distinction between the two approaches is in the interpretation of intervals.

* Frequentist: parameter values in the population are fixed, so a given confidence interval either contains the true parameter or it does not. Interpretation: 95% of intervals constructed from repeated samples of size n will contain the true parameter value.
* Bayesian: credible intervals are stated as a probability: “I am 95% sure that the parameter (not assumed to be fixed) lies between the two credible-interval bounds.”
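As a small illustration of the Bayesian statement (assuming, purely for demonstration, that a parameter has a Beta(15, 7) posterior):

# Central 95% credible interval read directly off the posterior distribution
round(qbeta(c(0.025, 0.975), shape1 = 15, shape2 = 7), 3)
# Bayesian reading: "the parameter lies in this interval with 95% probability"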
Just Another Gibbs Sampler (JAGS) communicates with R through the rjags and runjags packages and is used for Markov chain Monte Carlo (MCMC) sampling via Gibbs sampling.
Let’s load the Titanic dataset from the COUNT package:
library(COUNT)
library(dplyr)
data("titanic", package="COUNT")
head(titanic)
## class age sex survived
## 1 1st class adults man yes
## 2 1st class adults man yes
## 3 1st class adults man yes
## 4 1st class adults man yes
## 5 1st class adults man yes
## 6 1st class adults man yes
titanic %>% group_by(sex) %>% summarise(count=n())
## # A tibble: 2 x 2
## sex count
## <fctr> <int>
## 1 women 447
## 2 man 869
The model we will fit to the data is a Bernoulli likelihood function for y = did not survive vs. survived, conditional on sex s = female or male: \[y_{i|s}\sim dbern(\theta_s)\] Each sex’s survival bias, \(\theta_s\), has an independent beta-distributed prior: \[\theta_{s}\sim dbeta(A, B)\] where we will set A=3 and B=2 (see the hypothesis below).
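Because the beta prior is conjugate to the Bernoulli likelihood, the posterior is also beta-distributed, which gives us a closed-form check on the MCMC estimates later: \[\theta_s \mid y \sim dbeta\Big(A + \textstyle\sum_i y_{i|s},~ B + N_s - \textstyle\sum_i y_{i|s}\Big)\] where \(N_s\) is the number of passengers of sex \(s\).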
Building the Model:
First, we need to create a numerical outcome “survived,” where 0 = did not survive and 1 = survived.
s <- relevel(titanic$sex, ref = "man")   # make "man" the reference (first) level: man = 1, women = 2
y <- as.numeric(titanic$survived)        # factor codes: 1 = no, 2 = yes
y[y == 1] <- 0                           # 0 = did not survive
y[y == 2] <- 1                           # 1 = survived
y <- as.integer(y)
myData <- data.frame(y, s)
We will also want to examine the data.
library(gmodels)
Sur_Sex <- table(y,s) # 1=survived, 0=did not survive
Response_joint <- CrossTable(Sur_Sex, prop.chisq=F)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 1316
##
##
## | s
## y | man | women | Row Total |
## -------------|-----------|-----------|-----------|
## 0 | 694 | 123 | 817 |
## | 0.849 | 0.151 | 0.621 |
## | 0.799 | 0.275 | |
## | 0.527 | 0.093 | |
## -------------|-----------|-----------|-----------|
## 1 | 175 | 324 | 499 |
## | 0.351 | 0.649 | 0.379 |
## | 0.201 | 0.725 | |
## | 0.133 | 0.246 | |
## -------------|-----------|-----------|-----------|
## Column Total | 869 | 447 | 1316 |
## | 0.660 | 0.340 | |
## -------------|-----------|-----------|-----------|
##
##
Response_joint$prop.col
## s
## y man women
## 0 0.7986191 0.2751678
## 1 0.2013809 0.7248322
We see that \(p(survived|male)=.201\) and \(p(survived|female)=.725\).
Hypothesis: we expect females to survive at a higher rate than males in our data, hence our prior will be set to a=3, b=2, which is slightly skewed to the left.
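A quick sketch to visualize this prior:

# Beta(3, 2) prior: mean 0.6, mode 2/3, slightly favoring higher survival rates
curve(dbeta(x, 3, 2), from = 0, to = 1, xlab = "theta", ylab = "density")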
We want to load a few functions from the Doing Bayesian Data Analysis book.
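(The file names below are assumptions based on how the book distributes its code; diagMCMC is defined in DBDA2E-utilities.R, while the smryMCMC and plotMCMC variants used below come with the book’s Jags-Ydich-XnomSsubj-MbernBeta.R script. Adjust the paths to wherever you saved them.)

source("DBDA2E-utilities.R")                 # prints the citation message below
source("Jags-Ydich-XnomSsubj-MbernBeta.R")   # defines smryMCMC and plotMCMC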
##
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************
The first step is to package the dataset we created earlier into a list for JAGS.
dataList <- list(y=myData$y, s=myData$s, Ntotal=length(myData$y),
                 Nsubj=length(unique(myData$s)))  # s's factor codes (1=man, 2=women) index theta
Next we create the model within a string for JAGS to call.
modelString = "
model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dbern( theta[s[i]] )    # Bernoulli likelihood, indexed by sex
  }
  for ( sIdx in 1:Nsubj ) {
    theta[sIdx] ~ dbeta( 3 , 2 )   # independent Beta(3, 2) prior for each sex
  }
}
"
writeLines( modelString , con="TEMPmodel.txt" )  # save a copy of the model specification
We run the model in JAGS with n.chains set to 2, since this example was performed on a laptop with a dual-core processor. n.adapt sets the number of adaptation (tuning) steps; the default is 1000.
library(rjags)
model <- jags.model(textConnection(modelString), data = dataList, n.chains = 2, n.adapt = 500)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1316
## Unobserved stochastic nodes: 2
## Total graph size: 2640
##
## Initializing model
Using the update function, we run the chains for 1000 iterations as a short burn-in; these samples are discarded before collecting the posterior draws.
update(model, n.iter = 1000) # burn-in of 1000 samples
After burn-in we generate MCMC samples to represent the posterior distribution. We will collect 4446 total steps, since we specified 2 chains (number of chains × n.iter = 2 × 2223).
codaSamples <- coda.samples(model, variable.names=c("theta"), n.iter=2223)
Let’s take a glance at the summary of the posterior with credible intervals.
summary(codaSamples)
##
## Iterations = 1001:3223
## Thinning interval = 1
## Number of chains = 2
## Sample size per chain = 2223
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## theta[1] 0.2036 0.01348 0.0002022 0.0002022
## theta[2] 0.7230 0.02083 0.0003124 0.0003027
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## theta[1] 0.1778 0.1943 0.2032 0.2122 0.231
## theta[2] 0.6816 0.7089 0.7230 0.7373 0.762
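Because the prior is conjugate, we can sanity-check these MCMC means against the closed-form posterior means, using the survival counts from the cross-table (175 of 869 men and 324 of 447 women survived):

# Analytical posterior means under the Beta(3, 2) prior: Beta(3 + survived, 2 + died)
(3 + 175) / (3 + 175 + 2 + 694)   # men:   ~0.2037, matching theta[1]
(3 + 324) / (3 + 324 + 2 + 123)   # women: ~0.7235, matching theta[2]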
We can display diagnostics for each parameter’s chains; diagMCMC draws the diagnostic plots and returns NULL (hence the NULL printed below).
diag_MCMC_theta1 <- diagMCMC(codaSamples, parName="theta[1]")
diag_MCMC_theta1
## NULL
diag_MCMC_theta2 <- diagMCMC(codaSamples, parName="theta[2]")
diag_MCMC_theta2
## NULL
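If the book’s utility functions are unavailable, the coda package (loaded alongside rjags) provides comparable diagnostics; a brief sketch:

gelman.diag(codaSamples)     # potential scale reduction factor; values near 1 indicate convergence
effectiveSize(codaSamples)   # effective sample size for each parameter
traceplot(codaSamples)       # visual check that the two chains mix well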
smryMCMC displays numerical summary statistics of the chains.
smryMCMC(codaSamples, compVal = NULL, compValDiff = 0.0)
## Mean Median Mode ESS HDImass HDIlow
## theta[1] 0.2035516 0.2031885 0.2030383 4446 0.95 0.1765473
## theta[2] 0.7229916 0.7230156 0.7260973 4446 0.95 0.6821370
## theta[1]-theta[2] -0.5194400 -0.5195290 -0.5205342 4446 0.95 -0.5673920
## HDIhigh CompVal PcntGtCompVal ROPElow ROPEhigh
## theta[1] 0.2292045 NA NA NA NA
## theta[2] 0.7622933 NA NA NA NA
## theta[1]-theta[2] -0.4709618 0 0 NA NA
## PcntLtROPE PcntInROPE PcntGtROPE
## theta[1] NA NA NA
## theta[2] NA NA NA
## theta[1]-theta[2] NA NA NA
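The theta[1]-theta[2] row can also be reproduced directly from the raw samples; a quick sketch:

chains <- as.matrix(codaSamples)                           # stack both chains into one matrix
diff_theta <- chains[, "theta[1]"] - chains[, "theta[2]"]  # men's bias minus women's
mean(diff_theta)                                           # ~ -0.52
quantile(diff_theta, c(0.025, 0.975))                      # central 95% interval for the difference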
plotMCMC displays graphical posterior information (it draws the plots and returns NULL).
plot_mcmc <- plotMCMC(codaSamples, data=myData, compVal=NULL, compValDiff=0.0)
plot_mcmc
## NULL
Notice that the modes of our parameter estimates closely resemble the conditional probabilities we computed earlier from the cross-table.
It is also a good habit to examine the prior distribution for the difference of biases. The only change required is to comment out “y” in our data list; without observed y, JAGS samples theta directly from the prior.
dataList <- list(
# y=myData$y,
s=myData$s,
Ntotal=length(myData$y),
Nsubj=length(unique(myData$s)))
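Rerunning the same model with this data list then samples \(\theta\) from its prior, since y is now unobserved (a sketch; the object names are illustrative):

prior_model <- jags.model(textConnection(modelString), data = dataList,
                          n.chains = 2, n.adapt = 500)
update(prior_model, n.iter = 1000)
priorSamples <- coda.samples(prior_model, variable.names = c("theta"), n.iter = 2223)
summary(priorSamples)   # both thetas should center near the prior mean 3/(3+2) = 0.6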