After our Intro to Stan lecture I think it would be valuable to have you go through a similar exercise. Let’s test a second research question.
Research question: Is sea ice extent declining in the Southern Hemisphere over time? Is the same pattern happening in the Antarctic as in the Arctic? Fit a Stan model to find out!
Make sure you follow the steps we used in class.
What do your Stan model results indicate so far?
### Import the data in to R
seaice <- read.csv("C:/Users/Dhruva/Desktop/ANLY 505-90-O/Assignments/seaice.csv", fileEncoding="UTF-8-BOM")
str(seaice)
## 'data.frame': 39 obs. of 3 variables:
## $ year : int 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 ...
## $ extent_north: num 12.3 12.3 12.1 12.4 12.3 ...
## $ extent_south: num 11.7 11.2 11.4 11.6 11.4 ...
summary(seaice)
## year extent_north extent_south
## Min. :1979 Min. :10.15 Min. :10.69
## 1st Qu.:1988 1st Qu.:10.90 1st Qu.:11.42
## Median :1998 Median :11.67 Median :11.67
## Mean :1998 Mean :11.46 Mean :11.68
## 3rd Qu.:2008 3rd Qu.:11.98 3rd Qu.:11.88
## Max. :2017 Max. :12.45 Max. :12.78
The seaice dataset contains 39 observations and 3 variables year, extent_north and extent_south, starting from year 1979 to 2017.
#plot data
plot(extent_south ~ year, data = seaice)
lm <- lm(extent_south ~ year, data = seaice)
summary(lm)
##
## Call:
## lm(formula = extent_south ~ year, data = seaice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.23372 -0.18142 0.01587 0.18465 0.88814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.199551 10.925576 -1.300 0.2018
## year 0.012953 0.005468 2.369 0.0232 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared: 0.1317, Adjusted R-squared: 0.1082
## F-statistic: 5.611 on 1 and 37 DF, p-value: 0.02318
x <- I(seaice$year - 1978)
y <- seaice$extent_south
N <- length(seaice$year)
lm1 <- lm(y ~ x)
summary(lm1)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.23372 -0.18142 0.01587 0.18465 0.88814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.421555 0.125490 91.015 <2e-16 ***
## x 0.012953 0.005468 2.369 0.0232 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared: 0.1317, Adjusted R-squared: 0.1082
## F-statistic: 5.611 on 1 and 37 DF, p-value: 0.02318
stan_data <- list(N = N, x = x, y = y)
write("// Stan model for simple linear regression
data {
int < lower = 1 > N; // Sample size
vector[N] x; // Predictor
vector[N] y; // Outcome
}
parameters {
real alpha; // Intercept
real beta; // Slope (regression coefficients)
real < lower = 0 > sigma; // Error SD
}
model {
y ~ normal(alpha + x * beta , sigma);
}
generated quantities {
} // The posterior predictive distribution",
"stan_model1.stan")
stan_model1 <- "stan_model1.stan"
fit <- stan(file = stan_model1, data = stan_data, warmup = 500, iter = 1000, chains = 4, cores = 2, thin = 1)
fit
## Inference for Stan model: stan_model1.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 11.41 0.00 0.13 11.16 11.33 11.41 11.50 11.67 792 1.00
## beta 0.01 0.00 0.01 0.00 0.01 0.01 0.02 0.02 854 1.00
## sigma 0.40 0.00 0.05 0.32 0.37 0.39 0.42 0.51 920 1.01
## lp__ 16.33 0.04 1.25 12.99 15.80 16.67 17.26 17.72 818 1.01
##
## Samples were drawn using NUTS(diag_e) at Sun Jun 14 12:44:10 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
posterior <- extract(fit)
str(posterior)
## List of 4
## $ alpha: num [1:2000(1d)] 11.5 11.5 11.4 11.2 11.4 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ beta : num [1:2000(1d)] 0.01073 0.00237 0.02079 0.01846 0.01438 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ sigma: num [1:2000(1d)] 0.514 0.419 0.393 0.442 0.354 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ lp__ : num [1:2000(1d)] 14.9 13.6 15.4 16.3 17.5 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
plot(y ~ x, pch = 20)
abline(lm1, col = 2, lty = 2, lw = 3)
abline( mean(posterior$alpha), mean(posterior$beta), col = 6, lw = 2)
plot(y ~ x, pch = 20, main="Multiple estimates from the posterior")
for (i in 1:500) {
abline(posterior$alpha[i], posterior$beta[i], col = "pink", lty = 1)
}
abline(lm1, col = "blue", pch=22, lty = 2, lw = 3)
abline( mean(posterior$alpha), mean(posterior$beta), col = "red", lw = 1)
legend("topleft",c("Linear Model","Stan Model", "Multi Estimates"),fill=c("blue","red","pink"))
The sea ice extent is increasing in the Southern Hemisphere over time, which means the Antarctic sea ice extent pattern is completely opposite to the sea ice extent Arctic pattern.