For economists used to fitting frequentist models in software like Stata, the transition to Stan can be a bit jarring. For most models, Stata and similar packages calculate point estimates and a covariance matrix for the parameters of interest using a pre-specified formula or optimization method. For example, for a simple linear model with homoscedastic errors, Stata uses the familiar formulas:
\[ \hat{\beta}= (X'X)^{-1}X'y; \qquad \widehat{Var}(\hat{\beta})=\hat{\sigma}^2(X'X)^{-1}\] To test the hypothesis that an element of \(\beta\) is equal to 0, the economist would typically use the point estimate and the estimated standard error of that element to conduct a Wald test.
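For a single coefficient, this test statistic takes the familiar ratio form (written out here just as a refresher; under the null hypothesis that \(\beta_j=0\) it is compared to a standard normal or t distribution):

\[ z_j = \frac{\hat{\beta}_j}{\widehat{se}(\hat{\beta}_j)}, \qquad \widehat{se}(\hat{\beta}_j) = \sqrt{\left[\hat{\sigma}^2 (X'X)^{-1}\right]_{jj}} \]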
Unfortunately, except for very simple Bayesian models, there is no closed-form solution for the posterior distribution, and the assumption of normality is often a dangerous one. Thus, rather than generate point estimates and a covariance matrix, Stan instead simulates draws from the posterior distribution of our parameters using a sophisticated algorithm called Hamiltonian Monte Carlo. Stan output is a matrix in which each row is a simulated draw from the posterior and each column is a parameter (or function of parameters). To conduct inference using Stan output, the analyst calculates the proportion of draws which meet some condition. For example, to estimate \(Pr(\beta_1>0)\) the analyst would calculate the proportion of draws (i.e. rows in the matrix) for which the simulated \(\beta_1>0\).
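In R that calculation is a one-liner. Here is a minimal sketch, assuming draws is a matrix of posterior draws with a column named beta_1 (both names are hypothetical and just for illustration):

# draws: hypothetical matrix of posterior draws, one row per draw,
# one column per parameter; "beta_1" is an illustrative column name
mean(draws[, "beta_1"] > 0)  # estimate of Pr(beta_1 > 0 | data)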
(I am over-simplifying slightly here: Stata can fit some basic Bayesian models, and Stan can also generate point estimates rather than simulated draws from a posterior.)
Stan was designed specifically for defining and fitting statistical models. You can’t clean or process data in Stan or use it to create graphs or other output. Thus, Stan must be called from within another, more general-purpose language such as R, Julia, or Python. Regardless of the general-purpose language used to call Stan, the basic workflow is:

1. Prepare the data in the host language and pass it to Stan.
2. Define the model in the Stan language.
3. Fit the model, producing simulated draws from the posterior.
4. Conduct inference using those simulated draws.
In addition to these basic steps, the analyst will often also a) perform a variety of checks to ensure that Stan “converged” (i.e. that its output can be trusted) and b) perform model checks.
We provide a very quick tour of a very simple Stan model. Suppose we have data from 20 flips of a coin. We suspect that the coin is not an honest one. In fact, our prior is that the probability that the coin lands heads up, \(\theta\), is distributed Beta(2,6) (i.e. we think we are being cheated!).
Our likelihood is \(Pr(y_i=1)=\theta\) and our prior is \(\theta\sim Beta(2,6)\).
This model is simple enough that we could easily derive the posterior analytically but it is useful for illustrating how to define a simple model in Stan.
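For the record, the Beta prior is conjugate to the Bernoulli likelihood, so the exact posterior is available in closed form:

\[ \theta \mid y \sim Beta\left(2 + \sum_{i=1}^{N} y_i,\; 6 + N - \sum_{i=1}^{N} y_i\right) \]

where \(N=20\) is the number of flips. This gives us a benchmark against which we can check Stan’s simulated draws.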
The first step in our workflow is preparing the data. RStan expects the data as a list, so we first save our fictitious coin toss data as a vector and then save the vector, along with a scalar indicating the length of the vector, as a list. (You will see why Stan requires the length of the vector below.)
coin_tosses <- c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1)
coin_toss_dat <- list(N = length(coin_tosses),
                      y = coin_tosses)
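As an optional sanity check, we can confirm that the list contains the two elements (N and y) that the data block of the Stan model below will expect:

str(coin_toss_dat)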
This is where most of the action happens. In the code below we define the Stan model and save it as a string named “stan_model”. (Note that we have assigned the Stan model to an R string so that you can run this notebook directly with minimal changes, but the Stan manual advises users to instead save Stan models as external files.)
We walk through each line of the Stan model in detail below. Note that any text after “//” characters is a comment.
stan_model <- "
data {
  // # of obs; constrained to be non-negative
  int<lower=0> N;
  // define y as an array of integers of length N;
  // each element is either 0 or 1
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  // our prior for theta
  theta ~ beta(2,6);
  // our likelihood for y
  y ~ bernoulli(theta);
}"
The code above defines the three blocks our model needs: a data block, a parameters block, and a model block. In addition to these three block types, Stan models may also include four additional blocks (functions, transformed data, transformed parameters, and generated quantities).
Each block serves a distinct purpose and blocks must be defined in a set order (e.g. the parameters block can’t come before the data block).
The data block defines what data the model expects and assigns this data to Stan variables with specified types. Note that the names of the variables in the data block must match the names of the data in the list passed to Stan. The parameters block defines the parameters that will be used in the model. In Stan, all parameters and data must be declared as variables with a specific type. (This is a bit of a pain but it allows Stan to be really fast.) The Stan Reference Manual section “Data Types and Declarations” provides a comprehensive list of the different variable types, but here is a whirlwind tour:

- int and real are the basic scalar types for integers and continuous values.
- vector, row_vector, and matrix hold real values and support matrix arithmetic.
- Any type can be collected into an array (e.g. int y[N] above is an array of N integers).
- Declarations can carry constraints such as <lower=0> or <lower=0,upper=1>, which Stan checks and uses when sampling.
- There are also specialized types such as simplex, ordered, and cov_matrix for objects with common constraints.
The model block defines the statistical model, including both the prior and the likelihood. The Stan manual doesn’t seem to have a single comprehensive list of built-in distributions and the syntax for each (if you find one, let us know!), but for those familiar with R, the syntax should be fairly straightforward.
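As a rough illustration of that parallel (this aside is not part of the Stan workflow), the sampling statements in the model block correspond to density functions an R user already knows, so we could evaluate the same unnormalized log posterior in plain R with a throwaway helper like this:

# unnormalized log posterior implied by the model block above:
# a Beta(2,6) prior plus a Bernoulli likelihood for each coin toss
log_post <- function(theta, y) {
  dbeta(theta, 2, 6, log = TRUE) +
    sum(dbinom(y, size = 1, prob = theta, log = TRUE))
}
log_post(0.25, coin_tosses)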
After defining the model in Stan and preparing the data, we fit the model. First, we must load the rstan package. We then use the “stan” function to compile and fit the model in one step. (If we want, we can perform these steps separately.) In the arguments of the function call, we specify that we want Stan to run 4 MCMC chains of 1000 iterations each (by default, Stan discards the first half of each chain as warmup, leaving 500 simulated draws from the posterior per chain).
library(rstan)
fit <- stan(model_code = stan_model, data = coin_toss_dat, iter = 1000, chains = 4)
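As mentioned above, we should check that Stan “converged” before trusting its output. A minimal check (not a full diagnostic workflow) is to print the fitted object and confirm that the Rhat values are close to 1, and to ask rstan for its built-in HMC diagnostics:

# Rhat near 1 and a reasonable effective sample size (n_eff) suggest convergence
print(fit)
# flags divergent transitions, saturated tree depths, and low E-BFMI
check_hmc_diagnostics(fit)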
Finally, we may perform inference by calculating the proportion of draws that meet a certain condition. We first extract the simulated draws from the fitted model object and then, to calculate the probability that the coin is “unfair” (which we define as \(\theta<.5\)), we simply calculate the proportion of draws in which the simulated theta is < .5. According to our data, it seems like we are definitely being cheated!
extracted_values <- extract(fit)
mean(extracted_values$theta < .5)
[1] 0.995
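If we also want a point estimate and an interval for \(\theta\), we can summarize the same simulated draws directly. For example, one simple way is to apply base R’s quantile function to the extracted draws:

# posterior median and 95% credible interval for theta
quantile(extracted_values$theta, probs = c(0.025, 0.5, 0.975))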