ANLY505 - Intro to Stan Problem Set #1

Intro to STAN Homework Set #1

After our Intro to Stan lecture I think it would be valuable to have you go through a similar exercise. Let’s test a second research question.

Research question: Is sea ice extent declining in the Southern Hemisphere over time? Is the same pattern happening in the Antarctic as in the Arctic? Fit a Stan model to find out!

Make sure you follow the steps we used in class.

What do your Stan model results indicate so far?

Load and Inspect Data

seaice <- read_excel("~/Grad School/ANLY 505/Problem Sets/Problem Set #5 - Intro to STAN/seaice.xlsx")
head(seaice)

## # A tibble: 6 x 3
##    year extent_north extent_south
##   <dbl>        <dbl>        <dbl>
## 1  1979         12.3         11.7
## 2  1980         12.3         11.2
## 3  1981         12.1         11.4
## 4  1982         12.4         11.6
## 5  1983         12.3         11.4
## 6  1984         11.9         11.5

Plot the data

plot(extent_south ~ year, pch = 20, data = seaice)

At initial inspection, it appears that there isn’t any loss in sea ice extent in the southern hemisphere, as time moves forward. To be sure, we will run a linear model.

Run a general linear model using lm()

lm1 <- lm(extent_south ~ year, data = seaice)
summary(lm1)

## 
## Call:
## lm(formula = extent_south ~ year, data = seaice)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -14.199551  10.925576  -1.300   0.2018  
## year          0.012953   0.005468   2.369   0.0232 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

With a p-value of 0.02318, we have a statistically significant model. Looking at the year coefficient, we see a value of 0.012953. This tells us that for every increase in year, we basically see zero effect on the extent of sea ice.

Prepare the data, re-run the lm() and extract summary statistics

x <- I(seaice$year - 1978)
y <- seaice$extent_south
N <- length(seaice$year)

lm1 <- lm(y ~ x)
summary(lm1)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23372 -0.18142  0.01587  0.18465  0.88814 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.421555   0.125490  91.015   <2e-16 ***
## x            0.012953   0.005468   2.369   0.0232 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 37 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1082 
## F-statistic: 5.611 on 1 and 37 DF,  p-value: 0.02318

lm_alpha <- summary(lm1)$coeff[1]; lm_alpha  # the intercept

## [1] 11.42155

lm_beta <- summary(lm1)$coeff[2]; lm_beta  # the slope

## [1] 0.01295304

lm_sigma <- sigma(lm1); lm_sigma  # the residual error

## [1] 0.384331

Write the Stan model

stan_data <- list(N = N, x = x, y = y)

write("// Stan model for simple linear regression

      data {
      int < lower = 1 > N; // Sample size
      vector[N] x; // Predictor
      vector[N] y; // Outcome
      }
      
      parameters {
      real alpha; // Intercept
      real beta; // Slope (regression coefficients)
      real < lower = 0 > sigma; // Error SD
      }
      
      model {
      y ~ normal(alpha + x * beta , sigma);
      }
      
      generated quantities {
      } // The posterior predictive distribution",

      "stan_model1.stan")

# Now let's save that file path
stan_model1 <- "stan_model1.stan"

Run the Stan model

fit <- stan(file = stan_model1, data = stan_data, warmup = 500, iter = 1000, chains = 4, cores = 2, thin = 1)
fit

## Inference for Stan model: stan_model1.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha 11.42    0.00 0.13 11.16 11.33 11.42 11.51 11.67   800    1
## beta   0.01    0.00 0.01  0.00  0.01  0.01  0.02  0.02   829    1
## sigma  0.40    0.00 0.05  0.32  0.37  0.39  0.42  0.51  1079    1
## lp__  16.30    0.05 1.30 12.88 15.74 16.65 17.25 17.75   644    1
## 
## Samples were drawn using NUTS(diag_e) at Mon Jul 29 19:11:17 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Compare your results to our results to “lm”

posterior <- extract(fit)
plot(y ~ x, pch = 20)
abline(lm1, col = 2, lty = 2, lw = 3)

abline( mean(posterior$alpha), mean(posterior$beta), col = 6, lw = 2)

The results are identical to our “lm” model.