Intro to STAN Homework Part #1

After our Intro to Stan lecture I think it would be valuable to have you go through a similar exercise. Let’s test a second research question.

Research question: Is sea ice extent declining in the Southern Hemisphere over time? Is the same pattern happening in the Antarctic as in the Arctic? Fit a Stan model to find out!

Make sure you follow the steps we used in class.

What do your Stan model results indicate so far?

1. Load and Inspect Data

#place the code here

sea_ice <- read.csv("seaice.csv", stringsAsFactors = F)
colnames(sea_ice) <- c("year", "extent_north", "extent_south")

2. Plot the data

#plot data

plot(extent_south ~ year, pch = 20, col='red',main="sea ice extent  in the Southern Hemisphere over time ", data = sea_ice)

3. Run a general linear model using lm()

#write the code

lm1 <- lm(extent_south ~ year, data = sea_ice)
summary(lm1)
## 
## Call:
## lm(formula = extent_south ~ year, data = sea_ice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -14.199551  10.925576  -1.300   0.2018  
## year          0.012953   0.005468   2.369   0.0232 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

4. Index the data, re-run the lm(), extract summary statistics and turn the indexed data into a dataframe to pass into Stan

#write the code here

x <- I(sea_ice$year - 1978)#this is to give numeric values to year starting from 1 
y <- sea_ice$extent_south
N <- length(sea_ice$year)

lm2 <- lm(y ~ x)
summary(lm2)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.421555   0.125490  91.015   <2e-16 ***
## x            0.012953   0.005468   2.369   0.0232 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318
lm_alpha <- summary(lm2)$coeff[1]  
lm_beta <- summary(lm2)$coeff[2] 
lm_sigma <- sigma(lm2) 

#input data in the form of a list for stan model

stan_input_data <- list(N=N,x=x,y=y)

5. Write the Stan model

#write the code

 write("// Stan model for simple linear regression

data {
 int < lower = 1 > N; // Sample size
 vector[N] x; // Predictor
 vector[N] y; // Outcome
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

model {
 y ~ normal(alpha + x * beta , sigma);
}

generated quantities {
} // The posterior predictive distribution",

"stan_model1.stan")


stan_model1 <- "stan_model1.stan"

6. Run the Stan model and inspect the results

#code here

stan_model1 <- "stan_model1.stan"

#run the model
fit <- stan(file = stan_model1, data = stan_input_data, warmup = 400, iter = 1500, chains = 4, cores = 2, thin = 1)
fit
## Inference for Stan model: stan_model1.
## 4 chains, each with iter=1500; warmup=400; thin=1; 
## post-warmup draws per chain=1100, total post-warmup draws=4400.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha 11.42    0.00 0.13 11.16 11.34 11.42 11.51 11.68  1627    1
## beta   0.01    0.00 0.01  0.00  0.01  0.01  0.02  0.02  1728    1
## sigma  0.40    0.00 0.05  0.32  0.36  0.39  0.43  0.50  2118    1
## lp__  16.31    0.03 1.24 13.10 15.74 16.62 17.21 17.73  1501    1
## 
## Samples were drawn using NUTS(diag_e) at Mon Mar 30 21:43:42 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

7. Extract the posterior estimates into a list so we can plot them

#code here

posterior <- extract(fit)
str(posterior)
## List of 4
##  $ alpha: num [1:4400(1d)] 11.5 11.4 11.5 11.4 11.2 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ beta : num [1:4400(1d)] 0.00952 0.01618 0.01013 0.0136 0.01918 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ sigma: num [1:4400(1d)] 0.441 0.414 0.37 0.382 0.477 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ lp__ : num [1:4400(1d)] 16.8 17.4 17.6 17.8 13.9 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL

8. Compare your results to our results to “lm”

#code here

plot(y ~ x, pch = 20, main="Comaring the resut of Linear and Stan Model")
abline(lm2, col = "navy", pch=22, lty = 2, lw = 3)
abline( mean(posterior$alpha), mean(posterior$beta), col = "red", lw = 1)
legend("topleft",c("Linear Model","Stan Model"),fill=c("navy","red"))

9. Plot multiple estimates from the posterior

#code here


plot(y ~ x, pch = 20, main="Comaring the result of Linear and 'All' Stan Models")

for (i in 1:500) {
 abline(posterior$alpha[i], posterior$beta[i], col = "gray", lty = 1)
}

abline(lm2, col = "navy", pch=22, lty = 2, lw = 3)
abline( mean(posterior$alpha), mean(posterior$beta), col = "red", lw = 1)
legend("topleft",c("Linear Model","Mean Stan Model", "All stan Models"),fill=c("navy","red","grey"))

We can clearly see from the output visualization of stan model and linear model for southern hemisphere the ice extent is not declining.