This is my first attempt at using Stan, loosely following this tutorial. First we load the libraries and set some basic options; setting the seed is for reproducibility.
library(rstan)
## Loading required package: ggplot2
## rstan (Version 2.8.2, packaged: 2015-11-26 15:27:02 UTC, GitRev: 05c3d0058b6a)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
library(reshape2)
rstan_options(auto_write=TRUE)
options(mc.cores=parallel::detectCores())
set.seed(39)
This is a super basic model, just to make sure everything behaves as expected. The parameters are the intercept \(\mu\), a vector of regression coefficients \(\beta\), and the variance parameter \(\sigma^2\). The model is: \[Y\sim\mathcal{N}(\mu+X\beta,\sigma^2 \mathbb{I})\]
#super basic example: linear regression with homoskedastic errors
#Y~N(mu+Xb,sigma2)
# X is a 50x9 matrix where each column is gaussian(mean=5,sd=3)
X<-replicate(9,rnorm(50,5,3))
colMeans(X) #should be 5
## [1] 4.672748 5.640947 4.616894 4.392355 4.956233 4.944081 5.829872 3.914864
## [9] 5.035238
apply(X,2,sd) #should be 3
## [1] 2.895687 2.929268 2.532625 2.803260 2.676363 3.424549 2.628690 2.425035
## [9] 2.835277
Now we set the parameters to their true values, which we will later try to recover with the Stan model. We intentionally make some coefficients near zero to see whether Stan flags any false positives.
mu<-33
b<-c(-3,-1.5,-.5,-.25,-0.01,0.01,.5,1,1.5)
#strong negative: 1,2
#weak negative: 3,4
#insignificant effects: 5,6
#weak positive: 7
#strong positive: 8,9
sigma2<-4 #fairly high level of noise
y<-rnorm(50,mu+X%*%b,sqrt(sigma2))
First we will use the standard frequentist approach (maximum likelihood) to get a baseline for comparison.
frq<-lm(y~X)
summary(frq)
##
## Call:
## lm(formula = y ~ X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1788 -1.3673 -0.1481 1.2069 3.9545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.87521 1.72958 17.851 < 2e-16 ***
## X1 -2.94321 0.10594 -27.782 < 2e-16 ***
## X2 -1.43153 0.10758 -13.306 2.82e-16 ***
## X3 -0.43621 0.12254 -3.560 0.000975 ***
## X4 -0.17815 0.10954 -1.626 0.111737
## X5 0.04598 0.12041 0.382 0.704573
## X6 -0.01712 0.09222 -0.186 0.853665
## X7 0.67480 0.11788 5.724 1.15e-06 ***
## X8 1.08066 0.13431 8.046 6.85e-10 ***
## X9 1.40413 0.11140 12.604 1.64e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.057 on 40 degrees of freedom
## Multiple R-squared: 0.9717, Adjusted R-squared: 0.9653
## F-statistic: 152.6 on 9 and 40 DF, p-value: < 2.2e-16
The standard approach accurately recovers the true parameter values. We will save the estimates for later to compare relative errors.
frq_params<-c(coef(frq),summary(frq)$sigma) #point estimates: intercept, slopes, residual SD
N<-nrow(X)
K<-ncol(X)
true_params<-c(mu,b,sqrt(sigma2))
frq_err<-frq_params-true_params
frq_rel_err<-frq_err/true_params
Now we run the Stan program for linear regression. There is no need to pass the data in a list as in the example on the Stan website: when the data argument is omitted, rstan searches the calling environment for objects matching the names declared in the data block. The Stan code is in a separate file, linreg.stan, which looks like this:
data {
int<lower=0> N;
int<lower=0> K;
matrix[N,K] X;
vector[N] y;
}
parameters {
real alpha;
vector[K] beta;
real<lower=0> sigma;
}
model {
y ~ normal(alpha + X*beta, sigma);
}
Note that we did not specify priors, so Stan uses its default "flat" (improper uniform) priors. This is the closest analogue to the frequentist approach, so the results should be very similar as well.
system.time(lrfit<-stan(file="linreg.stan",iter=1000,chains=4))
## user system elapsed
## 21.840 0.936 28.084
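As an aside, the same fit can be obtained by passing the data explicitly, as in the example on the Stan website; a minimal sketch (not run here), where the list names must match the declarations in the data block:
#equivalent call with an explicit data list
#lrfit<-stan(file="linreg.stan",data=list(N=N,K=K,X=X,y=y),iter=1000,chains=4)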
The stan() call took about 20 seconds the first time I ran it, most of that presumably spent compiling the model to C++. When I ran the command again it only took about 5 seconds, which is still a lot slower than the frequentist approach.
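The rerun is fast because rstan_options(auto_write=TRUE) caches the compiled model to disk; a previous stanfit object can also be recycled to skip recompilation. A minimal sketch (not run here; lrfit2 and the iteration count are my own):
#reuse the compiled model from an earlier fit
#lrfit2<-stan(fit=lrfit,iter=2000,chains=4)
Next, let's explore the output of the Stan program.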
print(lrfit) #lrfit is an S4 object of class stanfit
## Inference for Stan model: linreg.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 30.88 0.06 1.78 27.32 29.71 30.85 32.04 34.44 913 1.01
## beta[1] -2.94 0.00 0.11 -3.15 -3.02 -2.95 -2.87 -2.73 1835 1.00
## beta[2] -1.43 0.00 0.12 -1.65 -1.51 -1.43 -1.36 -1.20 1661 1.00
## beta[3] -0.44 0.00 0.13 -0.71 -0.53 -0.44 -0.35 -0.18 1736 1.00
## beta[4] -0.18 0.00 0.11 -0.39 -0.26 -0.18 -0.11 0.04 1846 1.00
## beta[5] 0.05 0.00 0.13 -0.21 -0.04 0.05 0.13 0.30 1811 1.00
## beta[6] -0.01 0.00 0.09 -0.20 -0.08 -0.01 0.05 0.17 2000 1.00
## beta[7] 0.67 0.00 0.12 0.41 0.60 0.67 0.76 0.91 1367 1.00
## beta[8] 1.08 0.00 0.14 0.80 0.99 1.09 1.18 1.35 1485 1.00
## beta[9] 1.40 0.00 0.12 1.17 1.32 1.40 1.48 1.64 1563 1.00
## sigma 2.13 0.01 0.24 1.72 1.96 2.10 2.28 2.67 1495 1.00
## lp__ -61.18 0.11 2.72 -67.69 -62.74 -60.82 -59.21 -56.96 656 1.00
##
## Samples were drawn using NUTS(diag_e) at Thu Feb 25 23:59:21 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
#help("stanfit-class")
#get posterior means
#summary(lrfit)
stan_params<-summary(lrfit)$summary
#a parameter is "significant" if its 95% interval excludes zero,
#i.e. the 2.5% and 97.5% quantiles have the same sign
stan_signif<-stan_params[,"2.5%"]*stan_params[,"97.5%"]>0
Stan's 95% intervals for coefficients 4, 5, and 6 include zero, even though only coefficients 5 and 6 are truly near zero, so coefficient 4 is missed. The frequentist approach (OLS) made the same mistake (see above).
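As a cross-check on the interval test, we could also work directly with the posterior draws; here is a minimal sketch (the names draws and prob_pos are my own) using rstan's extract() to estimate the posterior probability that each coefficient is positive:
draws<-extract(lrfit) #named list of draws: alpha, beta (draws x K matrix), sigma
prob_pos<-colMeans(draws$beta>0) #posterior P(beta_k>0) for each coefficient
Probabilities near 0 or 1 indicate coefficients the posterior places firmly on one side of zero. Let's look at some pretty graphs! We now compare the relative errors of Stan and lm.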
stan_params<-stan_params[-12,1] #drop the lp__ row, keep the posterior means
stan_err<-stan_params-true_params
stan_rel_err<-stan_err/true_params
rel_err<-data.frame(var=names(stan_params),OLS=frq_rel_err,STAN=stan_rel_err)
plot_dat<-melt(rel_err,variable.name="method",id.vars="var",value.name="rel_err")
ggplot(data=plot_dat,aes(var,rel_err))+geom_bar(aes(fill=method),position="dodge",stat="identity")+theme_bw()
#beta[5] and beta[6] have near-zero true values, which inflates their relative errors, so drop them for a readable plot
ggplot(data=plot_dat[-c(6,7,17,18),],aes(var,rel_err))+geom_bar(aes(fill=method),position="dodge",stat="identity")+theme_bw()
mse_frq<-sum(frq_err^2)/11 #mean squared error over the 11 parameters (intercept, 9 slopes, sigma)
mse_stan<-sum(stan_err^2)/11
mse_frq
## [1] 0.4168444
mse_stan
## [1] 0.416226
We see that the Stan method gives very similar results to the OLS method, as expected given the uninformative prior.
I will use this template to practice with Stan on more interesting, complicated, and useful models in the future.