Mike Grosskopf
bayesglm()StanBayesian Data Analysis; 3\( ^{rd} \) edition
Data Analysis Using Regression and Multilevel/Hierarchical Models
Catagorical Data Analysis
The one advantage of the Bayesian approach is the use of external information to improve the estimates of the linear model coefficients
Uncertainty introduced by adding addtional model complexity leads to a natural regularization
\[ p(Y_i \mid X_i,\beta) \sim \text{Binomial}(n_i,p_i) \\ \text{where } p_i = \text{logit}^{-1}(\beta^TX) \]
\[ p(Y_i \mid X_i,\beta) \sim \text{Poisson}(\lambda_i) \\ \text{where } \lambda_i = e^{\beta^TX} \]
bayesglm()
glm command in Rsim() and coef()
Can generate simulated data where covariate \( x1 \) has no association with the outcome and covariate \( x2 \) completely determines the outcome:
y x1 x2
1 0 -0.5605 -0.8572
2 0 -0.2302 -0.5855
3 0 1.5587 -0.5863
y x1 x2
18 1 -1.9666 0.7533
19 1 0.7014 0.8950
20 1 -0.4728 0.3745
glm(formula = y ~ x1 + x2, family = "binomial", data = d)
coef.est coef.se
(Intercept) 5.28 33507.11
x1 25.40 65424.13
x2 117.66 170423.35
---
n = 20, k = 3
residual deviance = 0.0, null deviance = 27.7 (difference = 27.7)
bayesglm() does some automatic scaling to priorsbayesglm(formula = y ~ x1 + x2, family = "binomial", data = d,
prior.scale = 10, prior.df = Inf)
coef.est coef.se
(Intercept) 0.64 1.41
x1 1.49 1.75
x2 9.36 4.16
---
n = 20, k = 3
residual deviance = 1.3, null deviance = 3.3 (difference = 2.0)
bayesglm(formula = y ~ x1 + x2, family = "binomial", data = d,
prior.scale = 10)
coef.est coef.se
(Intercept) 0.69 1.50
x1 1.50 1.84
x2 10.21 4.88
---
n = 20, k = 3
residual deviance = 1.1, null deviance = 2.9 (difference = 1.8)
bayesglm(formula = y ~ x1 + x2, family = "binomial", data = d,
prior.scale = Inf)
coef.est coef.se
(Intercept) 5.53 55242.70
x1 26.54 107862.04
x2 122.82 280964.40
---
n = 20, k = 3
residual deviance = 0.0, null deviance = 0.0 (difference = 0.0)
bayesglm(formula = y ~ x1 + x2, family = "binomial", data = d,
prior.mean = 100)
coef.est coef.se
(Intercept) -1.721e+14 1.539e+07
x1 4.956e+14 1.590e+07
x2 3.138e+15 2.513e+07
---
n = 20, k = 3
residual deviance = 0.0, null deviance = 0.0 (difference = 0.0)
logitReg <- 'data {
int<lower=0> N; real x1[N]; real x2[N]; int<lower=0,upper=1> y[N];}
parameters {
real alpha; real beta1; real beta2;}
model {
alpha ~ normal(0,10.); beta1 ~ normal(0,10.); beta2 ~ normal(0,10.);
for (n in 1:N)
y[n] ~ bernoulli(inv_logit(alpha + beta1 * x1[n]+beta2 * x2[n]));
}'
logitData <- list(N=10, x1=c(runif(5,-1,0),runif(5,0,1)), x2=c(rnorm(10)),y=c(rep(0,5),rep(1,5)))
fit <- stan(model_code=logitReg,
data=logitData,iter=1000,chains=4)
TRANSLATING MODEL 'logitReg' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'logitReg' NOW.
SAMPLING FOR MODEL 'logitReg' NOW (CHAIN 1).
Iteration: 1 / 1000 [ 0%] (Warmup)
Iteration: 100 / 1000 [ 10%] (Warmup)
Iteration: 200 / 1000 [ 20%] (Warmup)
Iteration: 300 / 1000 [ 30%] (Warmup)
Iteration: 400 / 1000 [ 40%] (Warmup)
Iteration: 500 / 1000 [ 50%] (Warmup)
Iteration: 600 / 1000 [ 60%] (Sampling)
Iteration: 700 / 1000 [ 70%] (Sampling)
Iteration: 800 / 1000 [ 80%] (Sampling)
Iteration: 900 / 1000 [ 90%] (Sampling)
Iteration: 1000 / 1000 [100%] (Sampling)
Elapsed Time: 0.01 seconds (Warm-up)
0.01 seconds (Sampling)
0.02 seconds (Total)
SAMPLING FOR MODEL 'logitReg' NOW (CHAIN 2).
Iteration: 1 / 1000 [ 0%] (Warmup)
Iteration: 100 / 1000 [ 10%] (Warmup)
Iteration: 200 / 1000 [ 20%] (Warmup)
Iteration: 300 / 1000 [ 30%] (Warmup)
Iteration: 400 / 1000 [ 40%] (Warmup)
Iteration: 500 / 1000 [ 50%] (Warmup)
Iteration: 600 / 1000 [ 60%] (Sampling)
Iteration: 700 / 1000 [ 70%] (Sampling)
Iteration: 800 / 1000 [ 80%] (Sampling)
Iteration: 900 / 1000 [ 90%] (Sampling)
Iteration: 1000 / 1000 [100%] (Sampling)
Elapsed Time: 0.01 seconds (Warm-up)
0.01 seconds (Sampling)
0.02 seconds (Total)
SAMPLING FOR MODEL 'logitReg' NOW (CHAIN 3).
Iteration: 1 / 1000 [ 0%] (Warmup)
Iteration: 100 / 1000 [ 10%] (Warmup)
Iteration: 200 / 1000 [ 20%] (Warmup)
Iteration: 300 / 1000 [ 30%] (Warmup)
Iteration: 400 / 1000 [ 40%] (Warmup)
Iteration: 500 / 1000 [ 50%] (Warmup)
Iteration: 600 / 1000 [ 60%] (Sampling)
Iteration: 700 / 1000 [ 70%] (Sampling)
Iteration: 800 / 1000 [ 80%] (Sampling)
Iteration: 900 / 1000 [ 90%] (Sampling)
Iteration: 1000 / 1000 [100%] (Sampling)
Elapsed Time: 0.01 seconds (Warm-up)
0.01 seconds (Sampling)
0.02 seconds (Total)
SAMPLING FOR MODEL 'logitReg' NOW (CHAIN 4).
Iteration: 1 / 1000 [ 0%] (Warmup)
Iteration: 100 / 1000 [ 10%] (Warmup)
Iteration: 200 / 1000 [ 20%] (Warmup)
Iteration: 300 / 1000 [ 30%] (Warmup)
Iteration: 400 / 1000 [ 40%] (Warmup)
Iteration: 500 / 1000 [ 50%] (Warmup)
Iteration: 600 / 1000 [ 60%] (Sampling)
Iteration: 700 / 1000 [ 70%] (Sampling)
Iteration: 800 / 1000 [ 80%] (Sampling)
Iteration: 900 / 1000 [ 90%] (Sampling)
Iteration: 1000 / 1000 [100%] (Sampling)
Elapsed Time: 0.01 seconds (Warm-up)
0.01 seconds (Sampling)
0.02 seconds (Total)
Inference for Stan model: logitReg.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha -1.2 0.1 2.9 -8.1 -3.0 -1.0 0.7 4.1 430 1
beta1 16.5 0.3 6.6 5.8 11.7 15.8 20.8 30.5 448 1
beta2 0.8 0.2 3.5 -5.9 -1.4 0.7 2.9 8.3 481 1
lp__ -2.3 0.1 1.4 -5.7 -2.9 -1.9 -1.2 -0.7 451 1
Samples were drawn using NUTS(diag_e) at Wed Apr 2 00:02:24 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Comparing data from the model to the observed data, \( y_{obs} \):
Graphical and quantitative methods are useful
Bayesian Data Analysis; 3\( ^{rd} \) edition
Data Analysis Using Regression and Multilevel/Hierarchical Models
Catagorical Data Analysis