ANLY505 - Intro to Stan Problem Set #1

Intro to STAN Homework Set #1

After our Intro to Stan lecture I think it would be valuable to have you go through a similar exercise. Let’s test a second research question.

Research question: Is sea ice extent declining in the Southern Hemisphere over time? Is the same pattern happening in the Antarctic as in the Arctic? Fit a Stan model to find out!

Make sure you follow the steps we used in class.

What do your Stan model results indicate so far?

Load and Inspect Data

library(rstan)
library(gdata)
library(bayesplot)

sea_ice <- read.csv("seaice.csv", stringsAsFactors = F)
colnames(sea_ice) <- c("year", "extent_north", "extent_south")

Plot the data

plot(extent_south ~ year, pch = 20, col='red',main="sea ice extent  in the Southern Hemisphere over time ", data = sea_ice)

Run a general linear model using lm()

lm1 <- lm(extent_south ~ year, data = sea_ice)
summary(lm1)

## 
## Call:
## lm(formula = extent_south ~ year, data = sea_ice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -14.199551  10.925576  -1.300   0.2018  
## year          0.012953   0.005468   2.369   0.0232 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

Prepare the data, re-run the lm() and extract summary statistics

x <- I(sea_ice$year - 1978)#this is to give numeric values to year starting from 1 
y <- sea_ice$extent_south
N <- length(sea_ice$year)

lm2 <- lm(y ~ x)
summary(lm2)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.421555   0.125490  91.015   <2e-16 ***
## x            0.012953   0.005468   2.369   0.0232 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

lm_alpha <- summary(lm2)$coeff[1]  
lm_beta <- summary(lm2)$coeff[2] 
lm_sigma <- sigma(lm2) 


#input data in the form of a list for stan model

stan_input_data <- list(N=N,x=x,y=y)

Write the Stan model

write("// Stan model for simple linear regression

data {
 int < lower = 1 > N; // Sample size
 vector[N] x; // Predictor
 vector[N] y; // Outcome
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

model {
 y ~ normal(alpha + x * beta , sigma);
}

generated quantities {
} // The posterior predictive distribution",

"stan_model1.stan")

stanc("stan_model1.stan")

## $status
## [1] TRUE
## 
## $model_cppname
## [1] "model35d476a93d90_stan_model1"
## 
## $cppcode
## [1] "// Code generated by Stan version 2.18.1\n\n#include <stan/model/model_header.hpp>\n\nnamespace model35d476a93d90_stan_model1_namespace {\n\nusing std::istream;\nusing std::string;\nusing std::stringstream;\nusing std::vector;\nusing stan::io::dump;\nusing stan::math::lgamma;\nusing stan::model::prob_grad;\nusing namespace stan::math;\n\nstatic int current_statement_begin__;\n\nstan::io::program_reader prog_reader__() {\n    stan::io::program_reader reader;\n    reader.add_event(0, 0, \"start\", \"model35d476a93d90_stan_model1\");\n    reader.add_event(22, 20, \"end\", \"model35d476a93d90_stan_model1\");\n    return reader;\n}\n\nclass model35d476a93d90_stan_model1 : public prob_grad {\nprivate:\n    int N;\n    vector_d x;\n    vector_d y;\npublic:\n    model35d476a93d90_stan_model1(stan::io::var_context& context__,\n        std::ostream* pstream__ = 0)\n        : prob_grad(0) {\n        ctor_body(context__, 0, pstream__);\n    }\n\n    model35d476a93d90_stan_model1(stan::io::var_context& context__,\n        unsigned int random_seed__,\n        std::ostream* pstream__ = 0)\n        : prob_grad(0) {\n        ctor_body(context__, random_seed__, pstream__);\n    }\n\n    void ctor_body(stan::io::var_context& context__,\n                   unsigned int random_seed__,\n                   std::ostream* pstream__) {\n        typedef double local_scalar_t__;\n\n        boost::ecuyer1988 base_rng__ =\n          stan::services::util::create_rng(random_seed__, 0);\n        (void) base_rng__;  // suppress unused var warning\n\n        current_statement_begin__ = -1;\n\n        static const char* function__ = \"model35d476a93d90_stan_model1_namespace::model35d476a93d90_stan_model1\";\n        (void) function__;  // dummy to suppress unused var warning\n        size_t pos__;\n        (void) pos__;  // dummy to suppress unused var warning\n        std::vector<int> vals_i__;\n        std::vector<double> vals_r__;\n        local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());\n        (void) DUMMY_VAR__;  // suppress unused var warning\n\n        // initialize member variables\n        try {\n            current_statement_begin__ = 4;\n            context__.validate_dims(\"data initialization\", \"N\", \"int\", context__.to_vec());\n            N = int(0);\n            vals_i__ = context__.vals_i(\"N\");\n            pos__ = 0;\n            N = vals_i__[pos__++];\n            current_statement_begin__ = 5;\n            validate_non_negative_index(\"x\", \"N\", N);\n            context__.validate_dims(\"data initialization\", \"x\", \"vector_d\", context__.to_vec(N));\n            validate_non_negative_index(\"x\", \"N\", N);\n            x = vector_d(static_cast<Eigen::VectorXd::Index>(N));\n            vals_r__ = context__.vals_r(\"x\");\n            pos__ = 0;\n            size_t x_i_vec_lim__ = N;\n            for (size_t i_vec__ = 0; i_vec__ < x_i_vec_lim__; ++i_vec__) {\n                x[i_vec__] = vals_r__[pos__++];\n            }\n            current_statement_begin__ = 6;\n            validate_non_negative_index(\"y\", \"N\", N);\n            context__.validate_dims(\"data initialization\", \"y\", \"vector_d\", context__.to_vec(N));\n            validate_non_negative_index(\"y\", \"N\", N);\n            y = vector_d(static_cast<Eigen::VectorXd::Index>(N));\n            vals_r__ = context__.vals_r(\"y\");\n            pos__ = 0;\n            size_t y_i_vec_lim__ = N;\n            for (size_t i_vec__ = 0; i_vec__ < y_i_vec_lim__; ++i_vec__) {\n                y[i_vec__] = vals_r__[pos__++];\n            }\n\n            // validate, data variables\n            current_statement_begin__ = 4;\n            check_greater_or_equal(function__,\"N\",N,1);\n            current_statement_begin__ = 5;\n            current_statement_begin__ = 6;\n            // initialize data variables\n\n\n            // validate transformed data\n\n            // validate, set parameter ranges\n            num_params_r__ = 0U;\n            param_ranges_i__.clear();\n            current_statement_begin__ = 10;\n            ++num_params_r__;\n            current_statement_begin__ = 11;\n            ++num_params_r__;\n            current_statement_begin__ = 12;\n            ++num_params_r__;\n        } catch (const std::exception& e) {\n            stan::lang::rethrow_located(e, current_statement_begin__, prog_reader__());\n            // Next line prevents compiler griping about no return\n            throw std::runtime_error(\"*** IF YOU SEE THIS, PLEASE REPORT A BUG ***\");\n        }\n    }\n\n    ~model35d476a93d90_stan_model1() { }\n\n\n    void transform_inits(const stan::io::var_context& context__,\n                         std::vector<int>& params_i__,\n                         std::vector<double>& params_r__,\n                         std::ostream* pstream__) const {\n        stan::io::writer<double> writer__(params_r__,params_i__);\n        size_t pos__;\n        (void) pos__; // dummy call to supress warning\n        std::vector<double> vals_r__;\n        std::vector<int> vals_i__;\n\n        if (!(context__.contains_r(\"alpha\")))\n            throw std::runtime_error(\"variable alpha missing\");\n        vals_r__ = context__.vals_r(\"alpha\");\n        pos__ = 0U;\n        context__.validate_dims(\"initialization\", \"alpha\", \"double\", context__.to_vec());\n        double alpha(0);\n        alpha = vals_r__[pos__++];\n        try {\n            writer__.scalar_unconstrain(alpha);\n        } catch (const std::exception& e) { \n            throw std::runtime_error(std::string(\"Error transforming variable alpha: \") + e.what());\n        }\n\n        if (!(context__.contains_r(\"beta\")))\n            throw std::runtime_error(\"variable beta missing\");\n        vals_r__ = context__.vals_r(\"beta\");\n        pos__ = 0U;\n        context__.validate_dims(\"initialization\", \"beta\", \"double\", context__.to_vec());\n        double beta(0);\n        beta = vals_r__[pos__++];\n        try {\n            writer__.scalar_unconstrain(beta);\n        } catch (const std::exception& e) { \n            throw std::runtime_error(std::string(\"Error transforming variable beta: \") + e.what());\n        }\n\n        if (!(context__.contains_r(\"sigma\")))\n            throw std::runtime_error(\"variable sigma missing\");\n        vals_r__ = context__.vals_r(\"sigma\");\n        pos__ = 0U;\n        context__.validate_dims(\"initialization\", \"sigma\", \"double\", context__.to_vec());\n        double sigma(0);\n        sigma = vals_r__[pos__++];\n        try {\n            writer__.scalar_lb_unconstrain(0,sigma);\n        } catch (const std::exception& e) { \n            throw std::runtime_error(std::string(\"Error transforming variable sigma: \") + e.what());\n        }\n\n        params_r__ = writer__.data_r();\n        params_i__ = writer__.data_i();\n    }\n\n    void transform_inits(const stan::io::var_context& context,\n                         Eigen::Matrix<double,Eigen::Dynamic,1>& params_r,\n                         std::ostream* pstream__) const {\n      std::vector<double> params_r_vec;\n      std::vector<int> params_i_vec;\n      transform_inits(context, params_i_vec, params_r_vec, pstream__);\n      params_r.resize(params_r_vec.size());\n      for (int i = 0; i < params_r.size(); ++i)\n        params_r(i) = params_r_vec[i];\n    }\n\n\n    template <bool propto__, bool jacobian__, typename T__>\n    T__ log_prob(vector<T__>& params_r__,\n                 vector<int>& params_i__,\n                 std::ostream* pstream__ = 0) const {\n\n        typedef T__ local_scalar_t__;\n\n        local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());\n        (void) DUMMY_VAR__;  // suppress unused var warning\n\n        T__ lp__(0.0);\n        stan::math::accumulator<T__> lp_accum__;\n\n        try {\n            // model parameters\n            stan::io::reader<local_scalar_t__> in__(params_r__,params_i__);\n\n            local_scalar_t__ alpha;\n            (void) alpha;  // dummy to suppress unused var warning\n            if (jacobian__)\n                alpha = in__.scalar_constrain(lp__);\n            else\n                alpha = in__.scalar_constrain();\n\n            local_scalar_t__ beta;\n            (void) beta;  // dummy to suppress unused var warning\n            if (jacobian__)\n                beta = in__.scalar_constrain(lp__);\n            else\n                beta = in__.scalar_constrain();\n\n            local_scalar_t__ sigma;\n            (void) sigma;  // dummy to suppress unused var warning\n            if (jacobian__)\n                sigma = in__.scalar_lb_constrain(0,lp__);\n            else\n                sigma = in__.scalar_lb_constrain(0);\n\n\n            // transformed parameters\n\n\n\n            // validate transformed parameters\n\n            const char* function__ = \"validate transformed params\";\n            (void) function__;  // dummy to suppress unused var warning\n\n            // model body\n\n            current_statement_begin__ = 16;\n            lp_accum__.add(normal_log<propto__>(y, add(alpha,multiply(x,beta)), sigma));\n\n        } catch (const std::exception& e) {\n            stan::lang::rethrow_located(e, current_statement_begin__, prog_reader__());\n            // Next line prevents compiler griping about no return\n            throw std::runtime_error(\"*** IF YOU SEE THIS, PLEASE REPORT A BUG ***\");\n        }\n\n        lp_accum__.add(lp__);\n        return lp_accum__.sum();\n\n    } // log_prob()\n\n    template <bool propto, bool jacobian, typename T_>\n    T_ log_prob(Eigen::Matrix<T_,Eigen::Dynamic,1>& params_r,\n               std::ostream* pstream = 0) const {\n      std::vector<T_> vec_params_r;\n      vec_params_r.reserve(params_r.size());\n      for (int i = 0; i < params_r.size(); ++i)\n        vec_params_r.push_back(params_r(i));\n      std::vector<int> vec_params_i;\n      return log_prob<propto,jacobian,T_>(vec_params_r, vec_params_i, pstream);\n    }\n\n\n    void get_param_names(std::vector<std::string>& names__) const {\n        names__.resize(0);\n        names__.push_back(\"alpha\");\n        names__.push_back(\"beta\");\n        names__.push_back(\"sigma\");\n    }\n\n\n    void get_dims(std::vector<std::vector<size_t> >& dimss__) const {\n        dimss__.resize(0);\n        std::vector<size_t> dims__;\n        dims__.resize(0);\n        dimss__.push_back(dims__);\n        dims__.resize(0);\n        dimss__.push_back(dims__);\n        dims__.resize(0);\n        dimss__.push_back(dims__);\n    }\n\n    template <typename RNG>\n    void write_array(RNG& base_rng__,\n                     std::vector<double>& params_r__,\n                     std::vector<int>& params_i__,\n                     std::vector<double>& vars__,\n                     bool include_tparams__ = true,\n                     bool include_gqs__ = true,\n                     std::ostream* pstream__ = 0) const {\n        typedef double local_scalar_t__;\n\n        vars__.resize(0);\n        stan::io::reader<local_scalar_t__> in__(params_r__,params_i__);\n        static const char* function__ = \"model35d476a93d90_stan_model1_namespace::write_array\";\n        (void) function__;  // dummy to suppress unused var warning\n        // read-transform, write parameters\n        double alpha = in__.scalar_constrain();\n        double beta = in__.scalar_constrain();\n        double sigma = in__.scalar_lb_constrain(0);\n        vars__.push_back(alpha);\n        vars__.push_back(beta);\n        vars__.push_back(sigma);\n\n        // declare and define transformed parameters\n        double lp__ = 0.0;\n        (void) lp__;  // dummy to suppress unused var warning\n        stan::math::accumulator<double> lp_accum__;\n\n        local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());\n        (void) DUMMY_VAR__;  // suppress unused var warning\n\n        try {\n\n\n\n            // validate transformed parameters\n\n            // write transformed parameters\n            if (include_tparams__) {\n            }\n            if (!include_gqs__) return;\n            // declare and define generated quantities\n\n\n\n            // validate generated quantities\n\n            // write generated quantities\n        } catch (const std::exception& e) {\n            stan::lang::rethrow_located(e, current_statement_begin__, prog_reader__());\n            // Next line prevents compiler griping about no return\n            throw std::runtime_error(\"*** IF YOU SEE THIS, PLEASE REPORT A BUG ***\");\n        }\n    }\n\n    template <typename RNG>\n    void write_array(RNG& base_rng,\n                     Eigen::Matrix<double,Eigen::Dynamic,1>& params_r,\n                     Eigen::Matrix<double,Eigen::Dynamic,1>& vars,\n                     bool include_tparams = true,\n                     bool include_gqs = true,\n                     std::ostream* pstream = 0) const {\n      std::vector<double> params_r_vec(params_r.size());\n      for (int i = 0; i < params_r.size(); ++i)\n        params_r_vec[i] = params_r(i);\n      std::vector<double> vars_vec;\n      std::vector<int> params_i_vec;\n      write_array(base_rng,params_r_vec,params_i_vec,vars_vec,include_tparams,include_gqs,pstream);\n      vars.resize(vars_vec.size());\n      for (int i = 0; i < vars.size(); ++i)\n        vars(i) = vars_vec[i];\n    }\n\n    static std::string model_name() {\n        return \"model35d476a93d90_stan_model1\";\n    }\n\n\n    void constrained_param_names(std::vector<std::string>& param_names__,\n                                 bool include_tparams__ = true,\n                                 bool include_gqs__ = true) const {\n        std::stringstream param_name_stream__;\n        param_name_stream__.str(std::string());\n        param_name_stream__ << \"alpha\";\n        param_names__.push_back(param_name_stream__.str());\n        param_name_stream__.str(std::string());\n        param_name_stream__ << \"beta\";\n        param_names__.push_back(param_name_stream__.str());\n        param_name_stream__.str(std::string());\n        param_name_stream__ << \"sigma\";\n        param_names__.push_back(param_name_stream__.str());\n\n        if (!include_gqs__ && !include_tparams__) return;\n\n        if (include_tparams__) {\n        }\n\n\n        if (!include_gqs__) return;\n    }\n\n\n    void unconstrained_param_names(std::vector<std::string>& param_names__,\n                                   bool include_tparams__ = true,\n                                   bool include_gqs__ = true) const {\n        std::stringstream param_name_stream__;\n        param_name_stream__.str(std::string());\n        param_name_stream__ << \"alpha\";\n        param_names__.push_back(param_name_stream__.str());\n        param_name_stream__.str(std::string());\n        param_name_stream__ << \"beta\";\n        param_names__.push_back(param_name_stream__.str());\n        param_name_stream__.str(std::string());\n        param_name_stream__ << \"sigma\";\n        param_names__.push_back(param_name_stream__.str());\n\n        if (!include_gqs__ && !include_tparams__) return;\n\n        if (include_tparams__) {\n        }\n\n\n        if (!include_gqs__) return;\n    }\n\n}; // model\n\n}\n\ntypedef model35d476a93d90_stan_model1_namespace::model35d476a93d90_stan_model1 stan_model;\n\n"
## 
## $model_name
## [1] "stan_model1"
## 
## $model_code
## [1] "// Stan model for simple linear regression\n\ndata {\n int < lower = 1 > N; // Sample size\n vector[N] x; // Predictor\n vector[N] y; // Outcome\n}\n\nparameters {\n real alpha; // Intercept\n real beta; // Slope (regression coefficients)\n real < lower = 0 > sigma; // Error SD\n}\n\nmodel {\n y ~ normal(alpha + x * beta , sigma);\n}\n\ngenerated quantities {\n} // The posterior predictive distribution"
## attr(,"model_name2")
## [1] "stan_model1"

Run the Stan model

stan_model1 <- "stan_model1.stan"

#run the model
fit <- stan(file = stan_model1, data = stan_input_data, warmup = 400, iter = 1500, chains = 4, cores = 2, thin = 1)
fit

## Inference for Stan model: stan_model1.
## 4 chains, each with iter=1500; warmup=400; thin=1; 
## post-warmup draws per chain=1100, total post-warmup draws=4400.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha 11.42    0.00 0.13 11.16 11.34 11.42 11.52 11.68  2042    1
## beta   0.01    0.00 0.01  0.00  0.01  0.01  0.02  0.02  2176    1
## sigma  0.40    0.00 0.05  0.32  0.36  0.39  0.43  0.50  1953    1
## lp__  16.29    0.03 1.31 12.87 15.76 16.64 17.20 17.73  1623    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Jun 12 21:50:43 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

posterior <- extract(fit)
str(posterior)

## List of 4
##  $ alpha: num [1:4400(1d)] 11.6 10.9 11.5 11.5 11.5 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ beta : num [1:4400(1d)] 0.00394 0.03461 0.01363 0.00848 0.01159 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ sigma: num [1:4400(1d)] 0.369 0.511 0.351 0.361 0.481 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ lp__ : num [1:4400(1d)] 16.3 10 16.7 17.3 15.8 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL

Compare your results to our results to “lm”

plot(y ~ x, pch = 20, main="Comaring the resut of Linear and Stan Model")
abline(lm2, col = "navy", pch=22, lty = 2, lw = 3)
abline( mean(posterior$alpha), mean(posterior$beta), col = "red", lw = 1)
legend("topleft",c("Linear Model","Stan Model"),fill=c("navy","red"))

plot(y ~ x, pch = 20, main="Comaring the result of Linear and 'All' Stan Models")

for (i in 1:500) {
 abline(posterior$alpha[i], posterior$beta[i], col = "gray", lty = 1)
}

abline(lm2, col = "navy", pch=22, lty = 2, lw = 3)
abline( mean(posterior$alpha), mean(posterior$beta), col = "red", lw = 1)
legend("topleft",c("Linear Model","Mean Stan Model", "All stan Models"),fill=c("navy","red","grey"))

Comments: Question: What do your Stan model results indicate so far?

We can clearly see from the output visualization of stan model and linear model for sourthern hemisphere that the ice extent does NOT seem to be declining. This is opposite to what we observed for “northern hemisphere”