Intro to STAN Homework Set #1 After our Intro to Stan lecture I think it would be valuable to have you go through a similar exercise. Let’s test a second research question. Research question: Is sea ice extent declining in the Southern Hemisphere over time? Is the same pattern happening in the Antarctic as in the Arctic? Fit a Stan model to find out! Make sure you follow the steps we used in class. What do your Stan model results indicate so far?
Load and Inspect Data
library(rstan)
## Loading required package: StanHeaders
## Loading required package: ggplot2
## rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For improved execution time, we recommend calling
## Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
## although this causes Stan to throw an error on a few processors.
library(gdata)
## gdata: Unable to locate valid perl interpreter
## gdata:
## gdata: read.xls() will be unable to read Excel XLS and XLSX files
## gdata: unless the 'perl=' argument is used to specify the location
## gdata: of a valid perl intrpreter.
## gdata:
## gdata: (To avoid display of this message in the future, please
## gdata: ensure perl is installed and available on the executable
## gdata: search path.)
## gdata: Unable to load perl libaries needed by read.xls()
## gdata: to support 'XLX' (Excel 97-2004) files.
##
## gdata: Unable to load perl libaries needed by read.xls()
## gdata: to support 'XLSX' (Excel 2007+) files.
##
## gdata: Run the function 'installXLSXsupport()'
## gdata: to automatically download and install the perl
## gdata: libaries needed to support Excel XLS and XLSX formats.
##
## Attaching package: 'gdata'
## The following object is masked from 'package:stats':
##
## nobs
## The following object is masked from 'package:utils':
##
## object.size
## The following object is masked from 'package:base':
##
## startsWith
library(bayesplot)
## This is bayesplot version 1.7.0
## - Online documentation and vignettes at mc-stan.org/bayesplot
## - bayesplot theme set to bayesplot::theme_default()
## * Does _not_ affect other ggplot2 plots
## * See ?bayesplot_theme_set for details on theme setting
library(readxl)
seaice_1_ <- read_excel("C:/Users/yanru/Desktop/R File/seaice (1).xlsx")
head(seaice_1_)
## # A tibble: 6 x 3
## year extent_north extent_south
## <dbl> <dbl> <dbl>
## 1 1979 12.3 11.7
## 2 1980 12.3 11.2
## 3 1981 12.1 11.4
## 4 1982 12.4 11.6
## 5 1983 12.3 11.4
## 6 1984 11.9 11.5
colnames(seaice_1_)=c("year","extent_north","extent_south")
str(seaice_1_)
## Classes 'tbl_df', 'tbl' and 'data.frame': 39 obs. of 3 variables:
## $ year : num 1979 1980 1981 1982 1983 ...
## $ extent_north: num 12.3 12.3 12.1 12.4 12.3 ...
## $ extent_south: num 11.7 11.2 11.4 11.6 11.4 ...
Plot the data
Run a general linear model using lm()
lmseaice=lm(extent_south~year,data=seaice_1_)
summary(lmseaice)
##
## Call:
## lm(formula = extent_south ~ year, data = seaice_1_)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.23372 -0.18142 0.01587 0.18465 0.88814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.199551 10.925576 -1.300 0.2018
## year 0.012953 0.005468 2.369 0.0232 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared: 0.1317, Adjusted R-squared: 0.1082
## F-statistic: 5.611 on 1 and 37 DF, p-value: 0.02318
plot(extent_south~year,pch=15,data=seaice_1_)
abline(lmseaice,col=3,lty=2)
Prepare the data, re-run the lm() and extract summary statistics
x=I(seaice_1_$year-1978)
y=seaice_1_$extent_south
lmseaice1=lm(y~x)
summary(lmseaice1)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.23372 -0.18142 0.01587 0.18465 0.88814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.421555 0.125490 91.015 <2e-16 ***
## x 0.012953 0.005468 2.369 0.0232 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared: 0.1317, Adjusted R-squared: 0.1082
## F-statistic: 5.611 on 1 and 37 DF, p-value: 0.02318
Write the Stan model
N=length(seaice_1_$year)
stan_data = list(N = N, x = x, y = y)
write("//stan model
data {
int < lower = 1 > N; // sample size
vector[N] x; //predictor
vector[N] y; //outcome
}
parameters {
real alpha; // intercept
real beta; // slope (regression coefficients)
real < lower = 0 > sigma; // Error SD
}
model {
y ~ normal(alpha + x * beta , sigma);
}
generated quantities {
} // The posterior predictive distribution",
"stan_model1.stan")
Run the Stan model
stan_model1="stan_model1.stan"
fit <- stan(file = stan_model1, data = stan_data, warmup = 500, iter = 1500, chains = 4, cores = 2, thin = 1)
fit
## Inference for Stan model: stan_model1.
## 4 chains, each with iter=1500; warmup=500; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 11.42 0.00 0.13 11.16 11.33 11.42 11.50 11.66 1691 1
## beta 0.01 0.00 0.01 0.00 0.01 0.01 0.02 0.02 1767 1
## sigma 0.40 0.00 0.05 0.32 0.36 0.40 0.43 0.50 1951 1
## lp__ 16.27 0.03 1.31 12.77 15.71 16.62 17.20 17.72 1476 1
##
## Samples were drawn using NUTS(diag_e) at Sun Oct 13 22:45:47 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
posterior1=extract(fit)
str(posterior1)
## List of 4
## $ alpha: num [1:4000(1d)] 11.5 11.3 11.5 11.6 11.3 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ beta : num [1:4000(1d)] 0.00784 0.01835 0.00743 0.00588 0.01805 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ sigma: num [1:4000(1d)] 0.362 0.337 0.407 0.345 0.404 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ lp__ : num [1:4000(1d)] 17.2 16.6 17.2 16.3 17.3 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
Compare your results to our results to “lm”
plot(y~x,pch=15)
abline(lmseaice,col=3,lty=3)
abline(mean(posterior1$alpha),mean(posterior1$beta),col=5)
plot(y ~ x, pch = 15)
for (i in 1:500){
abline(posterior1$alpha[i], posterior1$beta[i], col = "light gray", lty = 2)
}
abline(mean(posterior1$alpha), mean(posterior1$beta), col = 5)
What do your Stan model results indicate so far?
From both the stan model and general linear model, we can tell the ice extent increases in the southern hemosphere.