ANLY505-Intro to Stan

Intro to STAN Homework Set #1 After our Intro to Stan lecture I think it would be valuable to have you go through a similar exercise. Let’s test a second research question. Research question: Is sea ice extent declining in the Southern Hemisphere over time? Is the same pattern happening in the Antarctic as in the Arctic? Fit a Stan model to find out! Make sure you follow the steps we used in class. What do your Stan model results indicate so far?

Load and Inspect Data

library(rstan)

## Loading required package: StanHeaders

## Loading required package: ggplot2

## rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)

## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)

## For improved execution time, we recommend calling
## Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
## although this causes Stan to throw an error on a few processors.

library(gdata)

## gdata: Unable to locate valid perl interpreter
## gdata: 
## gdata: read.xls() will be unable to read Excel XLS and XLSX files
## gdata: unless the 'perl=' argument is used to specify the location
## gdata: of a valid perl intrpreter.
## gdata: 
## gdata: (To avoid display of this message in the future, please
## gdata: ensure perl is installed and available on the executable
## gdata: search path.)

## gdata: Unable to load perl libaries needed by read.xls()
## gdata: to support 'XLX' (Excel 97-2004) files.

##

## gdata: Unable to load perl libaries needed by read.xls()
## gdata: to support 'XLSX' (Excel 2007+) files.

##

## gdata: Run the function 'installXLSXsupport()'
## gdata: to automatically download and install the perl
## gdata: libaries needed to support Excel XLS and XLSX formats.

## 
## Attaching package: 'gdata'

## The following object is masked from 'package:stats':
## 
##     nobs

## The following object is masked from 'package:utils':
## 
##     object.size

## The following object is masked from 'package:base':
## 
##     startsWith

library(bayesplot)

## This is bayesplot version 1.7.0

## - Online documentation and vignettes at mc-stan.org/bayesplot

## - bayesplot theme set to bayesplot::theme_default()

##    * Does _not_ affect other ggplot2 plots

##    * See ?bayesplot_theme_set for details on theme setting

library(readxl)
seaice_1_ <- read_excel("C:/Users/yanru/Desktop/R File/seaice (1).xlsx")
head(seaice_1_)

## # A tibble: 6 x 3
##    year extent_north extent_south
##   <dbl>        <dbl>        <dbl>
## 1  1979         12.3         11.7
## 2  1980         12.3         11.2
## 3  1981         12.1         11.4
## 4  1982         12.4         11.6
## 5  1983         12.3         11.4
## 6  1984         11.9         11.5

colnames(seaice_1_)=c("year","extent_north","extent_south")
str(seaice_1_)

## Classes 'tbl_df', 'tbl' and 'data.frame':    39 obs. of  3 variables:
##  $ year        : num  1979 1980 1981 1982 1983 ...
##  $ extent_north: num  12.3 12.3 12.1 12.4 12.3 ...
##  $ extent_south: num  11.7 11.2 11.4 11.6 11.4 ...

Plot the data

Run a general linear model using lm()

lmseaice=lm(extent_south~year,data=seaice_1_)
summary(lmseaice)

## 
## Call:
## lm(formula = extent_south ~ year, data = seaice_1_)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -14.199551  10.925576  -1.300   0.2018  
## year          0.012953   0.005468   2.369   0.0232 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

plot(extent_south~year,pch=15,data=seaice_1_)
abline(lmseaice,col=3,lty=2)

Prepare the data, re-run the lm() and extract summary statistics

x=I(seaice_1_$year-1978)
y=seaice_1_$extent_south
lmseaice1=lm(y~x)
summary(lmseaice1)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.421555   0.125490  91.015   <2e-16 ***
## x            0.012953   0.005468   2.369   0.0232 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

Write the Stan model

N=length(seaice_1_$year)
stan_data = list(N = N, x = x, y = y)
write("//stan model

      data {
      int < lower = 1 > N; // sample size
      vector[N] x; //predictor
      vector[N] y; //outcome
      }
      
      parameters {
      real alpha; // intercept
      real beta; // slope (regression coefficients)
      real < lower = 0 > sigma; // Error SD
      }
      
      model {
      y ~ normal(alpha + x * beta , sigma);
      }
      
      generated quantities {
      } // The posterior predictive distribution",
"stan_model1.stan")

Run the Stan model

stan_model1="stan_model1.stan"
fit <- stan(file = stan_model1, data = stan_data, warmup = 500, iter = 1500, chains = 4, cores = 2, thin = 1)
fit

## Inference for Stan model: stan_model1.
## 4 chains, each with iter=1500; warmup=500; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha 11.42    0.00 0.13 11.16 11.33 11.42 11.50 11.66  1691    1
## beta   0.01    0.00 0.01  0.00  0.01  0.01  0.02  0.02  1767    1
## sigma  0.40    0.00 0.05  0.32  0.36  0.40  0.43  0.50  1951    1
## lp__  16.27    0.03 1.31 12.77 15.71 16.62 17.20 17.72  1476    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Oct 13 22:45:47 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

posterior1=extract(fit)
str(posterior1)

## List of 4
##  $ alpha: num [1:4000(1d)] 11.5 11.3 11.5 11.6 11.3 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ beta : num [1:4000(1d)] 0.00784 0.01835 0.00743 0.00588 0.01805 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ sigma: num [1:4000(1d)] 0.362 0.337 0.407 0.345 0.404 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ lp__ : num [1:4000(1d)] 17.2 16.6 17.2 16.3 17.3 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL

Compare your results to our results to “lm”

plot(y~x,pch=15)
abline(lmseaice,col=3,lty=3)
abline(mean(posterior1$alpha),mean(posterior1$beta),col=5)

plot(y ~ x, pch = 15)
for (i in 1:500){
  abline(posterior1$alpha[i], posterior1$beta[i], col = "light gray", lty = 2)
}
abline(mean(posterior1$alpha), mean(posterior1$beta), col = 5)

What do your Stan model results indicate so far?

From both the stan model and general linear model, we can tell the ice extent increases in the southern hemosphere.

ANLY505-Intro to Stan

Rui Yan

Oct.13,2019