Frequentists ask, “What is the likelihood of observing the data, given the parameter(s) of the model?” (Maximum likelihood methods work by iteratively finding the values of \(\theta\) that maximize this function.)
\[ p(y \,|\, \theta) \]
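To make that concrete, here is a minimal sketch of maximum likelihood estimation in R (not part of the regression example below; the simulated data, starting values, and log-scale parameterization of the SD are just illustrative choices): we hand optim() the negative log-likelihood and let it search for the \(\theta\) that minimizes it.
# Minimal maximum-likelihood sketch: estimate a normal mean and SD with optim()
set.seed(1)
y.sim <- rnorm(50, mean = 3, sd = 2)
neg.log.lik <- function(par, y) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}
mle <- optim(par = c(0, 0), fn = neg.log.lik, y = y.sim)
mle$par[1]       # MLE of the mean
exp(mle$par[2])  # MLE of the SD (optimized on the log scale to keep it positive)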
Bayesian statistics, by contrast, describes the probability of the parameters given the data.
\[ p(\theta \,|\, y) = \frac{p(y \,|\, \theta) \, p(\theta)} {p(y)} \]
Bayes’ rule is commonly presented in the proportional form below, which drops the denominator \(p(y)\): it is a normalizing constant that does not depend on \(\theta\) and only serves to make the posterior sum (or integrate) to 1.
\[ p(\theta \,|\, y) \propto p(y \,|\, \theta) \, p(\theta) \]
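A toy illustration of the proportional form (a made-up coin-flip example, nothing to do with the regression below): over a grid of candidate values of \(\theta\), the posterior is just likelihood times prior, renormalized.
# Toy grid approximation of Bayes' rule for a coin's probability of heads
theta <- seq(0, 1, by = 0.01)
prior <- rep(1, length(theta))                    # flat prior over theta
likelihood <- dbinom(7, size = 10, prob = theta)  # hypothetical data: 7 heads in 10 flips
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)           # renormalize so it sums to 1
theta[which.max(posterior)]                       # posterior mode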
The inference we are used to making concerns the probability of observing data at least as extreme as ours, given that the null hypothesis is true.
\[ p(y \geq Y \,|\, H_0) \]
There are two problems with this:
People confuse \(p(y \geq Y \,|\, H_0)\) with \(p(H_0 \,|\, y \geq Y)\). If the probability of the data, given the null is true, is small, the probability that the null is true, given the data, must be small, too, right?! RIGHT?! Sadly, NO.
Through Bayes’ Rule, we know they’re only equal if the marginal probability of \(H_0\) being true is equal to the marginal probability of \(y\) being greater than or equal to \(Y\).
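A quick simulation (the 90% base rate and 0.5 SD effect size are made up purely for illustration) shows how different the two quantities can be: even among results that clear p < .05, the null can still be true a sizable fraction of the time.
# Hypothetical simulation: how often is the null actually true among "significant" results?
set.seed(42)
n.sims <- 10000
null.true <- rbinom(n.sims, 1, 0.9)          # suppose 90% of tested nulls are true
effect <- ifelse(null.true == 1, 0, 0.5)     # 0.5 SD effect when the null is false
p.vals <- sapply(effect, function(d) t.test(rnorm(30, mean = d, sd = 1), mu = 0)$p.value)
sig <- p.vals < 0.05
mean(null.true[sig])                         # p(H0 | "significant") -- nowhere near .05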
# Simulate example data: one predictor and two groups with different
# slopes (2 vs. 1/3) and different residual SDs (2 vs. 3)
x <- rnorm(200, 0, 1) * 5
dat <- data.frame(x)
dat[1:100, 'group'] <- 'A'
dat[1:100, 'y'] <- (x[1:100] * 2) + rnorm(100, 0, 2)
dat[101:200, 'group'] <- 'B'
dat[101:200, 'y'] <- (x[101:200] / 3) + rnorm(100, 0, 3)
The JAGS model, which we save as a .bug file:
rjags.example.bug <- '
model {
  for (i in 1:N){
    y[i] ~ dnorm(y.hat[i], tau)
    y.hat[i] <- a + b * x[i]
  }
  a ~ dnorm(0, .0001)
  b ~ dnorm(0, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
'
# Write the model out so jags.model() can read it from disk
writeLines(rjags.example.bug, con = 'rjags.example.bug')
Run model
library(rjags)

# Data passed to JAGS; the names must match the variables used in the model file
xVar <- dat$x
yVar <- dat$y
N <- 200

jags <- jags.model('rjags.example.bug',
                   data = list('x' = xVar,
                               'y' = yVar,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100,
                   quiet = TRUE)
Generate posterior samples
coefs <- jags.samples(jags,
                      variable.names = c('a', 'b'),
                      n.iter = 2000)
Extract estimates: posterior means from the JAGS samples, and coefficients from a frequentist lm() model
# Posterior means, averaging over all iterations and chains
intercept <- mean(coefs$a[])
b <- mean(coefs$b[])
# Frequentist fit for comparison
lm1 <- lm(y ~ x, data = dat)
Compare the results
# results from frequentist lm() model
round(coefficients(lm1), 3)
## (Intercept) x
## -0.405 1.210
# results from bayesian jags.samples() model
round(data.frame(intercept, b), 3)
## intercept b
## 1 -0.404 1.209
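The point estimates agree closely. One thing the posterior draws give us beyond lm()'s point estimates is the full distribution of each coefficient; for example, a rough 95% credible interval for the slope can be read straight off the sampled values (this reuses the coefs object from above and pools draws across the four chains).
# 95% credible interval for the slope from the posterior samples
round(quantile(coefs$b[], probs = c(0.025, 0.975)), 3)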