This is my first attempt at using Stan, loosely following this tutorial. First we load the libraries and set some basic options; setting the seed is for reproducibility.
library(rstan)
## Loading required package: ggplot2
## rstan (Version 2.8.2, packaged: 2015-11-26 15:27:02 UTC, GitRev: 05c3d0058b6a)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
library(reshape2)
rstan_options(auto_write=TRUE)
options(mc.cores=parallel::detectCores())
set.seed(39)
This is a super basic model, just to make sure everything behaves as expected. The parameters are the intercept \(\mu\), a vector of regression coefficients \(\beta\), and the variance parameter \(\sigma^2\). The model is: \[Y\sim\mathcal{N}(\mu+X\beta,\sigma^2 \mathbb{I})\]
#super basic example: linear regression with homoskedastic errors
#Y~N(mu+Xb,sigma2)
# X is a 50x9 matrix where each column is gaussian(mean=5,sd=3)
X<-replicate(9,rnorm(50,5,3))
colMeans(X) #should be 5
## [1] 4.672748 5.640947 4.616894 4.392355 4.956233 4.944081 5.829872 3.914864
## [9] 5.035238
apply(X,2,sd) #should be 3
## [1] 2.895687 2.929268 2.532625 2.803260 2.676363 3.424549 2.628690 2.425035
## [9] 2.835277
Now we set the parameters to their true values, which we will later try to recover with the Stan model. We intentionally make some coefficients near zero to see whether Stan flags any false positives.
mu<-33
b<-c(-3,-1.5,-.5,-.25,-0.01,0.01,.5,1,1.5)
#strong negative: 1,2
#weak negative: 3,4
#insignificant effects: 5,6
#weak positive: 7
#strong positive: 8,9
sigma2<-4 #fairly high level of noise
y<-rnorm(50,mu+X%*%b,sqrt(sigma2))
First we will use the standard frequentist approach (maximum likelihood) to get a baseline for comparison.
frq<-lm(y~X)
summary(frq)
##
## Call:
## lm(formula = y ~ X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1788 -1.3673 -0.1481 1.2069 3.9545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.87521 1.72958 17.851 < 2e-16 ***
## X1 -2.94321 0.10594 -27.782 < 2e-16 ***
## X2 -1.43153 0.10758 -13.306 2.82e-16 ***
## X3 -0.43621 0.12254 -3.560 0.000975 ***
## X4 -0.17815 0.10954 -1.626 0.111737
## X5 0.04598 0.12041 0.382 0.704573
## X6 -0.01712 0.09222 -0.186 0.853665
## X7 0.67480 0.11788 5.724 1.15e-06 ***
## X8 1.08066 0.13431 8.046 6.85e-10 ***
## X9 1.40413 0.11140 12.604 1.64e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.057 on 40 degrees of freedom
## Multiple R-squared: 0.9717, Adjusted R-squared: 0.9653
## F-statistic: 152.6 on 9 and 40 DF, p-value: < 2.2e-16
The standard approach accurately recovers the true parameter values. We will save the estimates for later to compare relative errors.
frq_params<-c(coef(frq),summary(frq)$sigma) #point estimates: intercept, slopes, residual SD
N<-nrow(X)
K<-ncol(X)
true_params<-c(mu,b,sqrt(sigma2))
frq_err<-frq_params-true_params
frq_rel_err<-frq_err/true_params
Now we run the Stan program for linear regression. There is no need to pass the data in a list as in the example on the Stan website: when the data argument is omitted, rstan searches the calling environment for objects matching the names declared in the data block. The Stan code is in a separate file, linreg.stan, which looks like this:
data {
int<lower=0> N;
int<lower=0> K;
matrix[N,K] X;
vector[N] y;
}
parameters {
real alpha;
vector[K] beta;
real<lower=0> sigma;
}
model {
y ~ normal(alpha + X*beta, sigma);
}
Note that we did not specify priors, so Stan uses its default "flat" (improper uniform) priors. This is the closest analogue to the frequentist approach, so the results should be very similar as well.
system.time(lrfit<-stan(file="linreg.stan",iter=1000,chains=4))
## user system elapsed
## 21.840 0.936 28.084
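As an aside, the same fit can be obtained by passing the data explicitly, as in the example on the Stan website; a minimal sketch (not run here), where the list names must match the declarations in the data block:
#equivalent call with an explicit data list
#lrfit<-stan(file="linreg.stan",data=list(N=N,K=K,X=X,y=y),iter=1000,chains=4)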
The stan() call took about 20 seconds the first time I ran it, most of that presumably spent compiling the model to C++. When I ran the command again it only took about 5 seconds, which is still a lot slower than the frequentist approach.
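The rerun is fast because rstan_options(auto_write=TRUE) caches the compiled model to disk; a previous stanfit object can also be recycled to skip recompilation. A minimal sketch (not run here; lrfit2 and the iteration count are my own):
#reuse the compiled model from an earlier fit
#lrfit2<-stan(fit=lrfit,iter=2000,chains=4)
Next, let's explore the output of the Stan program.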
print(lrfit) #lrfit is an S4 object of class stanfit
## Inference for Stan model: linreg.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 30.88 0.06 1.78 27.32 29.71 30.85 32.04 34.44 913 1.01
## beta[1] -2.94 0.00 0.11 -3.15 -3.02 -2.95 -2.87 -2.73 1835 1.00
## beta[2] -1.43 0.00 0.12 -1.65 -1.51 -1.43 -1.36 -1.20 1661 1.00
## beta[3] -0.44 0.00 0.13 -0.71 -0.53 -0.44 -0.35 -0.18 1736 1.00
## beta[4] -0.18 0.00 0.11 -0.39 -0.26 -0.18 -0.11 0.04 1846 1.00
## beta[5] 0.05 0.00 0.13 -0.21 -0.04 0.05 0.13 0.30 1811 1.00
## beta[6] -0.01 0.00 0.09 -0.20 -0.08 -0.01 0.05 0.17 2000 1.00
## beta[7] 0.67 0.00 0.12 0.41 0.60 0.67 0.76 0.91 1367 1.00
## beta[8] 1.08 0.00 0.14 0.80 0.99 1.09 1.18 1.35 1485 1.00
## beta[9] 1.40 0.00 0.12 1.17 1.32 1.40 1.48 1.64 1563 1.00
## sigma 2.13 0.01 0.24 1.72 1.96 2.10 2.28 2.67 1495 1.00
## lp__ -61.18 0.11 2.72 -67.69 -62.74 -60.82 -59.21 -56.96 656 1.00
##
## Samples were drawn using NUTS(diag_e) at Thu Feb 25 23:59:21 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
#help("stanfit-class")
#get posterior means
#summary(lrfit)
stan_params<-summary(lrfit)$summary
#a parameter is "significant" if its 95% interval excludes zero,
#i.e. the 2.5% and 97.5% quantiles have the same sign
stan_signif<-stan_params[,"2.5%"]*stan_params[,"97.5%"]>0
Stan's 95% intervals for coefficients 4, 5, and 6 include zero, even though only coefficients 5 and 6 are truly near zero, so coefficient 4 is missed. The frequentist approach (OLS) made the same mistake (see above).
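As a cross-check on the interval test, we could also work directly with the posterior draws; here is a minimal sketch (the names draws and prob_pos are my own) using rstan's extract() to estimate the posterior probability that each coefficient is positive:
draws<-extract(lrfit) #named list of draws: alpha, beta (draws x K matrix), sigma
prob_pos<-colMeans(draws$beta>0) #posterior P(beta_k>0) for each coefficient
Probabilities near 0 or 1 indicate coefficients the posterior places firmly on one side of zero. Let's look at some pretty graphs! We now compare the relative errors of Stan and lm.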
stan_params<-stan_params[-12,1] #drop the lp__ row, keep the posterior means
stan_err<-stan_params-true_params
stan_rel_err<-stan_err/true_params
rel_err<-data.frame(var=names(stan_params),OLS=frq_rel_err,STAN=stan_rel_err)
plot_dat<-melt(rel_err,variable.name="method",id.vars="var",value.name="rel_err")
ggplot(data=plot_dat,aes(var,rel_err))+geom_bar(aes(fill=method),position="dodge",stat="identity")+theme_bw()
#beta[5] and beta[6] have near-zero true values, which inflates their relative errors, so drop them for a readable plot
ggplot(data=plot_dat[-c(6,7,17,18),],aes(var,rel_err))+geom_bar(aes(fill=method),position="dodge",stat="identity")+theme_bw()
mse_frq<-sum(frq_err^2)/11 #mean squared error over the 11 parameters (intercept, 9 slopes, sigma)
mse_stan<-sum(stan_err^2)/11
mse_frq
## [1] 0.4168444
mse_stan
## [1] 0.416226
We see that the Stan method gives very similar results to the OLS method, as expected given the uninformative prior.
I will use this template to practice with Stan on more interesting, complicated, and useful models in the future.