Here, I am going to run through some sample questions from the ends of several chapters of the Bayesian statistics textbook “Statistical Rethinking”. These are all my interpretations of the chapter problems.

2H4 (p. 47)

I’m going to answer the first part of the question using the Bayes’ theorem formula, and the second part using the counting method. I just wanted to mix things up a bit to see if I could answer the second question the way Marta answered similar questions in class.


Part 1.

Compute the probability that the species is Panda A, given the test says it is Panda A.


Solve this:

\(p(panda = A|test = A) = \displaystyle \frac{p(test = A|panda = A) * p(panda = A)}{p(test = A)}\)


Variables

\(p(panda = A) = 0.5\)
\(p(panda = B) = 0.5\)
\(p(test = A|panda = A) = 0.8\)
\(p(test = B|panda = B) = 0.65\)
\(p(test = A|panda = B) = 0.35\)
\(p(test = B|panda = A) = 0.2\)

Marginal Probability

\(p(test = A)\)
\(= p(test = A|panda = A) * p(panda = A) + p(test = A|panda = B) * p(panda = B)\)
\(= ((0.8)(0.5)) + ((0.35)(0.5))\)
\(= 0.575\)

Solution

\(p(panda = A|test = A)\)

\(=\frac{((0.8) (0.5))}{0.575}\)

\(= 0.695\)


Answer: The probability that the species is Panda A, given the test says it is Panda A, is ~ 70%.
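
As a quick sanity check, here is the same calculation in R (a minimal sketch; the variable names are my own):

p_A <- 0.5                 # p(panda = A)
p_testA_A <- 0.8           # p(test = A | panda = A)
p_testA_B <- 0.35          # p(test = A | panda = B)
p_testA <- p_testA_A * p_A + p_testA_B * (1 - p_A)  # marginal p(test = A)
p_testA_A * p_A / p_testA  # p(panda = A | test = A)
## [1] 0.6956522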

Part 2.

Compute the probability that the species is Panda A, given the test says it is Panda A, the Panda had a twin born first, and the panda had a singleton born second.


Solve this:

[Tree diagram: probability of panda A given test = A, firstborn = twin, secondborn = singleton]

Above is a tree diagram I put together using a website that lets you make flowcharts and save the images. If we start with 100,000 pandas, and if we are testing panda A, 3,600 will have the test come back as A, have a twin as the firstborn, and have a singleton as the secondborn. I just follow the chart from panda A through all of the relevant conditions: Pandas (100,000) -> A (50,000) -> test = A (40,000) -> twins (4,000) -> singleton (3,600). The only other way to get a test that says A, a twin as the firstborn, and a singleton as the secondborn is if we are testing panda B, but the test mistakes panda B for panda A. In that case, 2,800 pandas out of 100,000 would fall into this category: Pandas (100,000) -> B (50,000) -> test = A (17,500) -> twins (3,500) -> singleton (2,800). That gives us 6,400 pandas (3,600 + 2,800) out of 100,000 that could have a test identify them as panda A, have a twin as the firstborn, and have a singleton as the secondborn:

3600/(3600 + 2800) = 0.5625 ≈ 56%.

Therefore, there is a 56% probability that we are testing panda A given the test says panda A, the panda’s firstborn was twins, and the secondborn was a singleton.
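
The counting answer can be double-checked with Bayes’ theorem in R (a minimal sketch; the twin rates of 10% for species A and 20% for species B are the ones implied by the tree above):

like_A <- 0.8 * 0.1 * 0.9   # p(test = A, twin first, singleton second | panda = A)
like_B <- 0.35 * 0.2 * 0.8  # p(test = A, twin first, singleton second | panda = B)
like_A * 0.5 / (like_A * 0.5 + like_B * 0.5)
## [1] 0.5625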





3M5 (p. 70)

First, I need to redo what I did for 3M1 to 3M4, and then build on that.

#Load libraries.
library(tidyverse)
library(rethinking)

First, I’ll compute the model with a flat prior.

# 3M1

# Construct a standardised posterior.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- rep(1, 1000)
likelihood <- dbinom(8, size = 15, prob = p_grid)
posterior <- likelihood * prior
posterior_flat <- posterior/sum(posterior)
# 3M2

samples_flat <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior_flat)

# Set plotting layout.
par(mfrow=c(1,2))
plot(samples_flat)

dens(samples_flat, show.HPDI = 0.9)

HPDI(samples_flat, prob = 0.90)
##      |0.9      0.9| 
## 0.3373373 0.7157157

The posterior seems to be centered around 0.5, close to the 8/15 tosses that produced it. The 90% highest posterior density interval is between approximately 0.33 and 0.71, so the 90% most probable values span quite a large range. There is a decent amount of uncertainty.

# 3M3

# Posterior predictive check. Draw a random parameter value from the posterior
# samples, construct a binomial likelihood with n = 15 and that parameter value,
# and take one draw from it. Do this 10,000 times and plot the resulting
# distribution of simulated data. It looks like 8 out of 15 is the most common
# value drawn from all the likelihood distributions.
w <- rbinom(1e4, size = 15, prob = samples_flat)
simplehist(w)

# Create data table with the probability of each possible draw (0 to 15).
tab <- table(w)/1e4

# Turn table into dataframe.
tab <- as.data.frame(tab)

# Display the probability of exactly 8 out of 15 draws (~0.15).
tab[tab$w == 8, ]$Freq
## [1] 0.1491

The most probable value of water based off draws from the posterior is 8. It has approximately a 14.9% probability of producing the data we have.

Now it asks us to calculate the probability of observing 6/9 waters, using the posterior constructed from the new (8/15) data.

# 3M4

# Posterior predictive check
w <- rbinom(1e4, size = 9, prob = samples_flat)
simplehist(w)

# Create data table with the probability of each possible draw (0 to 9).
tab <- table(w)/1e4

# Turn table into dataframe.
tab <- as.data.frame(tab)

# Display the probability of exactly 6 out of 9 draws (~0.18).
tab[tab$w == 6, ]$Freq
## [1] 0.1845

The probability of observing 6/9 waters is approximately 18%.

Now, I’ll answer question 3M5.

First, it says to construct a posterior distribution via grid approximation using a prior that is zero below 0.5 and constant above that. This prior encodes the knowledge that a majority of the Earth is covered by water, so the posterior should be closer to the true proportion (I think).

# Construct standardised posterior.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- ifelse(p_grid < 0.5, 0, 1)
likelihood <- dbinom(8, size = 15, prob = p_grid)
posterior <- likelihood * prior
posterior_skewed <- posterior/sum(posterior)

Now sample from the posterior, and plot samples.

samples_skewed <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior_skewed)

# Set plotting layout.
par(mfrow=c(1,2))
plot(samples_skewed)

dens(samples_skewed, show.HPDI = 0.9)

HPDI(samples_skewed, prob = 0.90)
##      |0.9      0.9| 
## 0.5005005 0.7137137

Because the prior is zero below 0.5, it biases the posterior to values higher than 0.5. Now the HPDI is approximately between 0.5 and 0.7. With the previous totally flat prior, the 90% HPDI was from approximately 0.33 to 0.72. In this case our upper estimate is relatively the same, sitting around ~0.70, while the lower estimate is obviously biased upwards by the modified prior.

w <- rbinom(1e4, size = 15, prob = samples_skewed)
simplehist(w)

# Create data table with the probability of each possible draw (0 to 15).
tab <- table(w)/1e4

# Turn table into dataframe.
tab <- as.data.frame(tab)

# Display the probability of exactly 8 out of 15 draws (~0.15).
tab[tab$w == 8,]$Freq
## [1] 0.1532

Now the probability of getting 8/15 via the PPC is approximately 15.3%, as opposed to about 14.9% under the flat prior. So the more “informative” prior makes the true data-generating value a bit more likely, but not by very much; the new prior enhanced our estimate only a little. The numbers differ a bit each time I rerun the sampling, so these comparisons may not exactly match the output shown, but the estimate with the skewed prior always makes 8/15 slightly more probable than the flat prior does.

# Posterior predictive check
w <- rbinom(1e4, size = 9, prob = samples_skewed)
simplehist(w)

# Create data table with the probability of each possible draw (0 to 9).
tab <- table(w)/1e4

# Turn table into dataframe.
tab <- as.data.frame(tab)

# Display the probability of exactly 6 out of 9 draws (~0.23).
tab[tab$w == 6,]$Freq
## [1] 0.2318

Now, the probability of getting 6 of 9 water is about 23%, rather than the previous 18%.

Overall, the skewed prior does seem to slightly enhance our estimate, as the prior is a more realistic estimate of the true proportion of water. However, it does not seem to improve our estimates very much. Perhaps more data is needed to do that.
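
Just as a thought experiment (this is hypothetical data, not part of the question): if the same 8/15 proportion of water had come from four times as many tosses, the HPDI should tighten around the observed proportion regardless of the prior.

# Hypothetical: same proportion of water, but 32/60 tosses instead of 8/15.
likelihood_big <- dbinom(32, size = 60, prob = p_grid)
posterior_big <- likelihood_big * prior # `prior` is still the step prior from above
posterior_big <- posterior_big/sum(posterior_big)
samples_big <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior_big)
HPDI(samples_big, prob = 0.90) # should be noticeably narrower than before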





4H3 (p. 116)

# Load library
library(rethinking)

# Load data
data("Howell1")

# Make dataframe
d <- Howell1

# Take log weight
d$log_weight <- log(d$weight)

# Check variable
str(d)
## 'data.frame':    544 obs. of  5 variables:
##  $ height    : num  152 140 137 157 145 ...
##  $ weight    : num  47.8 36.5 31.9 53 41.3 ...
##  $ age       : num  63 63 65 41 51 35 32 27 19 54 ...
##  $ male      : int  1 0 0 1 0 1 0 1 0 1 ...
##  $ log_weight: num  3.87 3.6 3.46 3.97 3.72 ...

a)

# Make model with height as a function of weight.
M4H3 <- map(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + bW*log_weight ,
a ~ dnorm( 178 , 100 ) ,
bW ~ dnorm( 0 , 100 ) , 
sigma ~ dunif( 0 , 50 )
) ,
data=d )

precis(M4H3)
##             mean        sd       5.5%      94.5%
## a     -23.783346 1.3351205 -25.917127 -21.649566
## bW     47.075078 0.3825466  46.463694  47.686461
## sigma   5.134716 0.1556694   4.885927   5.383506

Height increases by 47.08 cm for every one-unit increase in log-weight. Because the predictor is not centered, the intercept does not make sense on its own, but we aren’t interpreting it here, so I think it’s fine to leave it as is.
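
Since the predictor is the natural log of weight, a one-unit increase corresponds to multiplying weight by e ≈ 2.72, which is why the coefficient looks so large. Here is a sketch of translating the slope to a more intuitive scale (47.08 is the rounded MAP estimate of bW from the precis output above; the 45 kg and 50 kg values are just an illustrative pair):

bW <- 47.08
bW * log(50/45)  # predicted height difference (cm) between a 45 kg and a 50 kg person
## [1] 4.960373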

b)

Here is the plot we are told to begin with. In addition, the plot beside it is the superimposed function, 97% HPDI for the mean, and the 97% HPDI for predicted height.

par(mfrow = c(1,2))
plot(height ~ weight, data=Howell1, col = col.alpha(rangi2, 0.4))

# Plot model
plot(height ~ weight, data = Howell1, col = col.alpha(rangi2, 0.5))
weight.seq <- seq(from = 1, to = 70, by = 1) # start at 1: log(0) is -Inf
mu <- link(M4H3, data = data.frame(log_weight = log(weight.seq))) # Added log to weight.seq
mu.means <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.97)
sim.height <- sim(M4H3, data = list(log_weight = log(weight.seq))) # Added log to weight.seq
height.PI <- apply(sim.height, 2, PI, prob = 0.97)
lines(weight.seq, mu.means)
shade(mu.HPDI, weight.seq)
shade(height.PI, weight.seq)

It appears to be a really good fit! So the colleague from the question, who told us to model the relationship like this, was right.





5H3 (p. 164)

Here is the fox data.

library(rethinking)
data("foxes")
foxes <- foxes
str(foxes)
## 'data.frame':    116 obs. of  5 variables:
##  $ group    : int  1 1 2 2 3 3 4 4 5 5 ...
##  $ avgfood  : num  0.37 0.37 0.53 0.53 0.49 0.49 0.45 0.45 0.74 0.74 ...
##  $ groupsize: int  2 2 2 2 2 2 2 2 3 3 ...
##  $ area     : num  1.09 1.09 2.05 2.05 2.12 2.12 1.29 1.29 3.78 3.78 ...
##  $ weight   : num  5.02 2.84 5.33 6.07 5.85 3.25 4.53 4.09 6.13 5.59 ...

We are asked to make two models, and then compare them to the models from the previous questions.

####### All models required for 5H3.

# 1: weight ~ food + groupsize
Weight_x_Food.GroupSize <- map(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bF * avgfood + bG * groupsize,
    a ~ dnorm(0, 100),
    bF ~ dnorm(0, 10),
    bG ~ dnorm(0, 10),
    sigma ~ dunif(0, 10)
  ), data = foxes
)

# 2: weight ~ food + groupsize + area
Weight_x_Food.GroupSize.Area <- map(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bF * avgfood + bG * groupsize + bA * area,
    a ~ dnorm(0, 100),
    bF ~ dnorm(0, 10),
    bG ~ dnorm(0, 10),
    bA ~ dnorm(0, 10),
    sigma ~ dunif(0, 10)
  ), data = foxes
)

# Models from H1.

# H1: weight ~ area.
Weight_x_Area <- map(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * area,
    a ~ dnorm(0, 100),
    bA ~ dnorm(0, 10),
    sigma ~ dunif(0, 10)
  ), data = foxes
)

# H1: weight ~ groupsize.
Weight_x_Groupsize <- map(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bG * groupsize,
    a ~ dnorm(0, 100),
    bG ~ dnorm(0, 10),
    sigma ~ dunif(0, 10)
  ), data = foxes
)

# Model from H2.

# H2: weight ~ area + groupsize.
Weight_x_Area.Groupsize <- map(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * area + bG * groupsize,
    a ~ dnorm(0, 100),
    bA ~ dnorm(0, 10),
    bG ~ dnorm(0, 10),
    sigma ~ dunif(0, 10)
  ), data = foxes
)
par(mfrow=c(2,3))
precis_plot(precis(Weight_x_Area), main = "H1: Weight ~ Area")
precis_plot(precis(Weight_x_Groupsize), main = "H1: Weight ~ Groupsize")
precis_plot(precis(Weight_x_Area.Groupsize), main = "H2: Weight ~ Area + Groupsize")
precis_plot(precis(Weight_x_Food.GroupSize), main = "H3: Weight ~ Food + Groupsize")
precis_plot(precis(Weight_x_Food.GroupSize.Area), main = "H3: Weight ~ Food + Groupsize + Area")

a)

This question asks if food or area is a better predictor of weight. On its own, area does not predict weight very well at all. However, when controlling for group size, area DOES seem to predict weight in a positive direction. When groupsize is controlled for, average food positively predicts weight, although there is very large uncertainty associated with the coefficient. Before looking at what happens when they are both put in the same model (which is what happens in the second part of the question), I’ll just look at them on their own below.

# Area for: Weight ~ Area model:

# Prepare new counterfactual data.
A1.seq <- seq(from = 0, to = 6, length.out = 30)
pred.data.area <- data.frame(
  area = A1.seq
)
# Compute counterfactual mean weight (mu)
mu.area <- link(Weight_x_Area, data = pred.data.area)
mu.mean.area <- apply(mu.area, 2, mean)
mu.PI.area <- apply(mu.area, 2, PI)
# Simulate counterfactual weight
R.sim.area <- sim(Weight_x_Area, data = pred.data.area, n = 1e4)
R.PI.area <- apply(R.sim.area, 2, PI)


# Area for: Weight ~ Area + Groupsize

# Prepare new counterfactual data.
G.avg <- mean(foxes$groupsize)
A2.seq <- seq(from = 0, to = 6, length.out = 30)
pred.data.avgarea <- data.frame(
  area = A2.seq,
  groupsize = G.avg
)
# Compute counterfactual mean weight (mu)
mu.avgarea <- link(Weight_x_Area.Groupsize, data = pred.data.avgarea)
mu.mean.avgarea <- apply(mu.avgarea, 2, mean)
mu.PI.avgarea <- apply(mu.avgarea, 2, PI)
# Simulate counterfactual weight
R.sim.avgarea <- sim(Weight_x_Area.Groupsize, data = pred.data.avgarea, n = 1e4)
R.PI.avgarea <- apply(R.sim.avgarea, 2, PI)

par(mfrow=c(1,2))
# Display predictions, hiding raw data with type = "n"
plot(weight ~ area, data = foxes, type = "n")
mtext("Weight ~ Area")
#lines(A1.seq, mu.mean.area)
shade(mu.PI.area, A1.seq)
shade(R.PI.area, A1.seq)

# Display predictions, hiding raw data with type = "n"
plot(weight ~ area, data = foxes, type = "n")
mtext("Groupsize = 0")
#lines(A2.seq, mu.avgarea)
shade(mu.PI.avgarea, A2.seq)
shade(R.PI.avgarea, A2.seq)

It’s pretty clear that area does not predict weight, unless group size is controlled for.

# Avgfood for: Weight ~ Avgfood + Groupsize model:

# Prepare new counterfactual data.
AF1.seq <- seq(from = 0, to = 2, length.out = 30)
G.avg <- mean(foxes$groupsize)
pred.data.area <- data.frame(
  avgfood = AF1.seq,
  groupsize = G.avg
)
# Compute counterfactual mean weight (mu)
mu.averagefood <- link(Weight_x_Food.GroupSize, data = pred.data.area)
mu.mean.averagefood <- apply(mu.averagefood, 2, mean)
mu.PI.averagefood <- apply(mu.averagefood, 2, PI)
# Simulate counterfactual weight
R.sim.averagefood <- sim(Weight_x_Food.GroupSize, data = pred.data.area, n = 1e4)
R.PI.averagefood <- apply(R.sim.averagefood, 2, PI)


# Avgfood for: Weight ~ Avgfood + Groupsize + Area model:

# Prepare new counterfactual data.
AF2.seq <- seq(from = 0, to = 2, length.out = 30)
G.avg <- mean(foxes$groupsize)
A.avg <- mean(foxes$area)
pred.data.area2 <- data.frame(
  avgfood = AF2.seq,
  groupsize = G.avg,
  area = A.avg
)
# Compute counterfactual mean weight (mu)
mu.averagefood2 <- link(Weight_x_Food.GroupSize.Area, data = pred.data.area2)
mu.mean.averagefood2 <- apply(mu.averagefood2, 2, mean)
mu.PI.averagefood2 <- apply(mu.averagefood2, 2, PI)
# Simulate counterfactual weight
R.sim.averagefood2 <- sim(Weight_x_Food.GroupSize.Area, data = pred.data.area2, n = 1e4)
R.PI.averagefood2 <- apply(R.sim.averagefood2, 2, PI)

par(mfrow=c(1,2))
# Display predictions, hiding raw data with type = "n"
plot(weight ~ avgfood, data = foxes, type = "n")
mtext("Groupsize = 0")
#lines(A1.seq, mu.mean.area)
shade(mu.PI.averagefood, AF1.seq)
shade(R.PI.averagefood, AF1.seq)

# Display predictions, hiding raw data with type = "n"
plot(weight ~ avgfood, data = foxes, type = "n")
mtext("Area = 0, Groupsize = 0")
#lines(A2.seq, mu.avgarea)
shade(mu.PI.averagefood2, AF2.seq)
shade(R.PI.averagefood2, AF2.seq)

Average food does predict weight in both models that control for other variables, but there is a lot of uncertainty, and the interval estimates are huge compared to the other intervals plotted above.

In terms of picking just area or average food, it is hard to say. Here is a pairs-plot of the data, which might help figure out what the associations are between predictors in the dataset.

plot(foxes)

On its own, area does not visually seem to predict weight, and neither does groupsize. However, groupsize and area strongly correlate with each other. Average food also does not seem to be associated with weight, but it does correlate strongly with the other predictors we controlled for: area and groupsize. I’m still not sure which is better, but here’s what I think: neither area nor average food correlates strongly with weight; it is only when controlling for other factors that either does. With average food, we are controlling for two other predictors which both correlate strongly with it, whereas with area we are only controlling for one. I think that the very large intervals associated with average food might be due to multicollinearity. Like the left and right legs example from the book, it might be that once we know area and groupsize, average food does not tell us much, which widens its range of plausible posterior values. The small posterior intervals on groupsize might indicate that, when controlling for area, groupsize is still an important predictor. Therefore, I would choose group size over average food if I could only use one.
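
To check the multicollinearity suspicion numerically, the pairwise correlations can be computed directly (a quick sketch; I haven’t reproduced the output here):

# Correlation matrix of the fox predictors and outcome.
round(cor(foxes[, c("avgfood", "groupsize", "area", "weight")]), 2)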

b)

I think this only requires a brief answer. As mentioned above, average food and area are highly correlated. When two highly correlated predictors are included in the same model, multicollinearity becomes an issue. With multicollinearity, the individual estimates become smaller and more uncertain (wider intervals) than when each predictor is used on its own. This can be seen on page 143 of Statistical Rethinking, for example.
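
A toy simulation in the spirit of the book’s legs example (entirely made-up data, not the fox data) shows the mechanism: once two predictors are nearly copies of each other, their individual slopes become very uncertain.

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, 0, 0.1)  # x2 is nearly a copy of x1
y <- x1 + rnorm(100)
summary(lm(y ~ x1))$coefficients       # x1 alone: small standard error
summary(lm(y ~ x1 + x2))$coefficients  # x1 and x2 together: inflated standard errors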

# Area for: Weight ~ Area + Groupsize

# Prepare new counterfactual data.
G.avg <- mean(foxes$groupsize)
A2.seq <- seq(from = 0, to = 6, length.out = 30)
pred.data.avgarea <- data.frame(
  area = A2.seq,
  groupsize = G.avg
)
# Compute counterfactual mean weight (mu)
mu.avgarea <- link(Weight_x_Area.Groupsize, data = pred.data.avgarea)
mu.mean.avgarea <- apply(mu.avgarea, 2, mean)
mu.PI.avgarea <- apply(mu.avgarea, 2, PI)
# Simulate counterfactual weight
R.sim.avgarea <- sim(Weight_x_Area.Groupsize, data = pred.data.avgarea, n = 1e4)
R.PI.avgarea <- apply(R.sim.avgarea, 2, PI)


# Area for: Weight ~ Avg Food + Groupsize + Area

# Prepare new counterfactual data.
A3.seq <- seq(from = 0, to = 6, length.out = 30)
G3.avg <- mean(foxes$groupsize)
AF3.avg <- mean(foxes$avgfood)
pred.data.area3 <- data.frame(
  avgfood = AF3.avg,
  groupsize = G3.avg,
  area = A3.seq
)
# Compute counterfactual mean weight (mu)
mu.area3 <- link(Weight_x_Food.GroupSize.Area, data = pred.data.area3)
mu.mean.area3 <- apply(mu.area3, 2, mean)
mu.PI.area3 <- apply(mu.area3, 2, PI)
# Simulate counterfactual weight
R.sim.area3 <- sim(Weight_x_Food.GroupSize.Area, data = pred.data.area3, n = 1e4)
R.PI.area3 <- apply(R.sim.area3, 2, PI)



par(mfrow=c(1,2))

# Display predictions, hiding raw data with type = "n"
plot(weight ~ area, data = foxes, type = "n")
mtext("Groupsize = 0")
#lines(A2.seq, mu.avgarea)
shade(mu.PI.avgarea, A2.seq)
shade(R.PI.avgarea, A2.seq)

# Display predictions, hiding raw data with type = "n"
plot(weight ~ area, data = foxes, type = "n")
mtext("Groupsize = 0, Avg Food = 0")
#lines(A2.seq, mu.avgarea)
shade(mu.PI.area3, A3.seq)
shade(R.PI.area3, A3.seq)

It’s hard to see, but the slope for area reduces in steepness with the additional control (right graph). This can be seen in the outputs below.

precis(Weight_x_Area.Groupsize)
##             mean         sd       5.5%      94.5%
## a      4.4503660 0.37084851  3.8576784  5.0430535
## bA     0.6178377 0.20009185  0.2980523  0.9376231
## bG    -0.4324057 0.12074263 -0.6253758 -0.2394357
## sigma  1.1184508 0.07342971  1.0010960  1.2358057
precis(Weight_x_Food.GroupSize.Area)
##             mean         sd         5.5%      94.5%
## a      4.0724910 0.42789824  3.388626986  4.7563551
## bF     2.4530256 1.43762049  0.155430409  4.7506208
## bG    -0.6034956 0.15583637 -0.852552216 -0.3544390
## bA     0.3897667 0.23845717  0.008666069  0.7708673
## sigma  1.1042838 0.07249348  0.988425215  1.2201424

Looking at the coefficient for area, its steepness is reduced with the additional control, and its interval is wider. The same occurs with average food.





6H6 (p. 207)

library(rethinking)
data("Howell1")
d <- Howell1
d$age <- (d$age - mean(d$age))/sd(d$age)
set.seed(1000)
i <- sample(1:nrow(d), size = nrow(d)/2)
d1 <- d[i, ]
d2 <- d[-i, ]

a.start <- mean(d$height)
sigma.start <- log(sd(d$height))

# Need to assume flat, or nearly flat prior on a and theta.
M6H6 <- map(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b1*age + b2*(age^2) + b3*(age^3) + b4*(age^4) + b5*(age^5) + b6*(age^6),
a ~ dnorm(150, 100) ,
b1 ~ dnorm(0 ,5) ,
b2 ~ dnorm(0 ,5) ,
b3 ~ dnorm(0 ,5) ,
b4 ~ dnorm(0 ,5) ,
b5 ~ dnorm(0 ,5) ,
b6 ~ dnorm(0 ,5) ,
sigma ~ dunif(0, 50)
) ,
data=d1, start = list(a = a.start, b1 = 0, b2 = 0, b3 = 0, b4 = 0, b5 = 0, b6 = 0, sigma = sigma.start))

precis(M6H6)
##              mean        sd        5.5%       94.5%
## a     155.4079341 0.8487858 154.0514104 156.7644578
## b1      6.1820710 1.7680811   3.3563359   9.0078060
## b2    -14.0228496 2.0914092 -17.3653255 -10.6803738
## b3     12.8188337 2.6845996   8.5283250  17.1093424
## b4     -4.9347570 1.1679793  -6.8014135  -3.0681005
## b5     -0.1235567 1.0265953  -1.7642543   1.5171408
## b6      0.3076826 0.3349884  -0.2276936   0.8430588
## sigma   7.8805084 0.3407644   7.3359011   8.4251157

Here is the model from the book fitted to the data in d1.

precis_plot(precis(M6H6))

And here are the parameter estimates plotted.

# m6
age.seq <- seq(from=-5,to=5,length.out=272)
d.predict <- list(
    height = rep(0,272), # empty outcome
    age = age.seq,
    age2 = age.seq^2,
    age3 = age.seq^3,
    age4 = age.seq^4,
    age5 = age.seq^5,
    age6 = age.seq^6)  

m6h <- link( M6H6 , data=d.predict )
mu <- apply( m6h , 2 , mean )
mu.PI <- apply( m6h , 2 , PI ,0.97)

# plot it all
plot( height ~ age , d1 , col=rangi2 )

predictions <- sim(M6H6, data=d.predict)
predictions.pi <- apply(predictions, 2, PI, .97)

shade(predictions.pi, age.seq)
lines(age.seq, mu)
shade(mu.PI, age.seq)

The implied predictions for the model fitted to the first 50% of the data (d1) are plotted here. It actually does seem to fit the data well, but this could be due to overfitting (although the priors were regularised).

Now the question asks us to compute the out-of-sample deviance using the other 50% of the data (d2). Below I fit the d2 data to the model parameters used to fit d1.

M6H6_OOS <- map(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + M6H6@coef[2]*age + M6H6@coef[3]*(age^2) + M6H6@coef[4]*(age^3) + M6H6@coef[5]*(age^4) + M6H6@coef[6]*(age^5) + M6H6@coef[7]*(age^6)
) ,
data=d2, start = list(a = a.start, sigma = sigma.start))
d1_Deviance <- (-2) * logLik(M6H6)
d2_Deviance <- (-2) * logLik(M6H6_OOS)

cbind(d1_Deviance, d2_Deviance)
##      d1_Deviance d2_Deviance
## [1,]    1894.926    4785.252

d1_Deviance refers to the deviance of the model fit to d1, and d2_Deviance refers to the deviance when the d2 data are evaluated around the fixed d1 model parameters. Obviously, we have higher deviance in the d2_Deviance calculation, since the model was not fit to this part of the dataset.
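
An alternative sketch that gets the out-of-sample deviance without refitting a second model is to plug the d1 MAP estimates straight into the log-likelihood of d2 (assuming M6H6 and d2 as defined above):

theta <- coef(M6H6)
mu2 <- theta["a"] + theta["b1"]*d2$age + theta["b2"]*d2$age^2 + theta["b3"]*d2$age^3 +
  theta["b4"]*d2$age^4 + theta["b5"]*d2$age^5 + theta["b6"]*d2$age^6
-2 * sum(dnorm(d2$height, mu2, theta["sigma"], log = TRUE)) # out-of-sample deviance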

I decided to see how the regression plot would look over the data for d1 and d2 separately.

# m6
age.seq <- seq(from=-5,to=5,length.out=272)
d.predict <- list(
    height = rep(0,272), # empty outcome
    age = age.seq,
    age2 = age.seq^2,
    age3 = age.seq^3,
    age4 = age.seq^4,
    age5 = age.seq^5,
    age6 = age.seq^6)  

m6h <- link( M6H6 , data=d.predict )
mu <- apply( m6h , 2 , mean )
mu.PI <- apply( m6h , 2 , PI ,0.97)

par(mfrow = c(1,2))
# plot it all
plot( height ~ age , d1 , col=rangi2, xlim=c(-1.5, 2.5), ylim=c(55, 185))

predictions <- sim(M6H6, data=d.predict)
predictions.pi <- apply(predictions, 2, PI, .97)

shade(predictions.pi, age.seq)
lines(age.seq, mu)
shade(mu.PI, age.seq)
mtext("d1: In Sample Predictions")

# plot it all
plot(height ~ age , d2 , col=rangi2, xlim=c(-1.5, 2.5), ylim=c(55, 185))

shade(predictions.pi, age.seq)
lines(age.seq, mu)
shade(mu.PI, age.seq)
mtext("d2: Out of Sample Predictions")

They actually both look pretty nice. The plot on the left is the model plotted over the data it was fit to, and the plot on the right is the same model superimposed over the d2 data (which it was not fit to).

The question asks me to compare this model to the ones from earlier, so first I’m just going to rerun all the models I fit for the earlier homework questions.

library(rethinking)
data("Howell1")
d <- Howell1
d$age <- (d$age - mean(d$age))/sd(d$age)
set.seed(1000)
i <- sample(1:nrow(d), size = nrow(d)/2)
d1 <- d[i, ]
d2 <- d[-i, ]

a.start <- mean(d1$height)
sigma.start <- log(sd(d1$height))

m1 <- map(
  alist(
    height ~ dnorm(mu, exp(log.sigma)),
    mu <- a + B * age
    ),
  data = d1, start = list(a = a.start, B = 0, log.sigma = sigma.start))

m2 <- map(
  alist(
    height ~ dnorm(mu, exp(log.sigma)),
    mu <- a + B * age + B2 * (age^2)
    ),
  data = d1, start = list(a = a.start, B = 0, B2 = 0, log.sigma = sigma.start))

m3 <- map(
  alist(
    height ~ dnorm(mu, exp(log.sigma)),
    mu <- a + B * age + B2 * (age^2) + B3 * (age^3)
    ),
  data = d1, start = list(a = a.start, B = 0, B2 = 0, B3 = 0, log.sigma = sigma.start))

m4 <- map(
  alist(
    height ~ dnorm(mu, exp(log.sigma)),
    mu <- a + B * age + B2 * (age^2) + B3 * (age^3) + B4 * (age^4)
    ),
  data = d1, start = list(a = a.start, B = 0, B2 = 0, B3 = 0, B4 = 0, log.sigma = sigma.start))

m5 <- map(
  alist(
    height ~ dnorm(mu, exp(log.sigma)),
    mu <- a + B * age + B2 * (age^2) + B3 * (age^3) + B4 * (age^4) + B5 * (age^5)
    ),
  data = d1, start = list(a = a.start, B = 0, B2 = 0, B3 = 0, B4 = 0, B5 = 0, log.sigma = sigma.start))

m6 <- map(
  alist(
    height ~ dnorm(mu, exp(log.sigma)),
    mu <- a + B * age + B2 * (age^2) + B3 * (age^3) + B4 * (age^4) + B5 * (age^5) + B6 * (age^6)
    ),
  data = d1, start = list(a = a.start, B = 0, B2 = 0, B3 = 0, B4 = 0, B5 = 0, B6 = 0, log.sigma = sigma.start))
compare(m1,m2,m3,m4,m5,m6)
##        WAIC       SE      dWAIC       dSE    pWAIC        weight
## m6 1907.940 26.72866   0.000000        NA 7.792796  4.119777e-01
## m4 1907.967 26.88884   0.026680  3.658191 6.008738  4.065184e-01
## m5 1909.580 27.12894   1.639392  3.669776 6.848590  1.815032e-01
## m3 1934.468 24.54594  26.528008 11.577907 5.366777  7.151380e-07
## m2 2122.456 25.59802 214.515801 28.471578 5.984609  1.079840e-47
## m1 2393.109 21.69836 485.169111 30.876995 3.350710 1.827005e-106

It looks like model 6 has the highest weighting. Now to compare it with the model from this question.

compare(m6, M6H6)
##          WAIC       SE    dWAIC      dSE    pWAIC    weight
## m6   1907.416 26.58093 0.000000       NA 7.544135 0.7783666
## M6H6 1909.929 27.61250 2.512346 3.787761 6.987681 0.2216334

The model I used for this question, with the regularised priors, does not seem to fit as well as the m6 model. Perhaps the data really do fit a sixth-order polynomial without regularisation better than the current one.

7H4 (p. 238)

library(rethinking)
data("nettle")
n <- nettle
n$lang.per.cap <- n$num.lang/n$k.pop
n$log.lang.per.cap <- log(n$lang.per.cap)
n$log.area <- log(n$area)

a)

Hypothesis: Language diversity is positively associated with average length of growing season. Consider log-area as a covariate. To address this hypothesis, I have two regression models: both regress language diversity as a function of mean growing season, but only the second model adds log-area as a covariate.

m7h4.1 <- map(
  alist(
    log.lang.per.cap ~ dnorm(mu, sigma),
    mu <- a + bMGS*mean.growing.season,
    a ~ dnorm(0, 10),
    bMGS ~ dnorm(0, 5),
    sigma ~ dunif(0, 100)
  ),
  data = n
)

m7h4.2 <- map(
  alist(
    log.lang.per.cap ~ dnorm(mu, sigma),
    mu <- a + bMGS*mean.growing.season + bLA*log.area,
    a ~ dnorm(0, 10),
    bMGS ~ dnorm(0, 5),
    bLA ~ dnorm(0, 5),
    sigma ~ dunif(0, 100)
  ),
  data = n
)
par(mfrow = c(1,2))
precis_plot(precis(m7h4.1))
precis_plot(precis(m7h4.2))

Here, the intercept gains a lot more uncertainty when the model controls for log-area, but the coefficient estimate of the predictor “mean growing season” does not change. It still positively predicts language diversity.

I decided to plot the posteriors as well.

# Make posteriors
post.m7h4.1 <- extract.samples(m7h4.1, n = 300000)
post.m7h4.2 <- extract.samples(m7h4.2, n = 300000)

post.m7h4.1_alpha <- post.m7h4.1$a
post.m7h4.2_alpha <- post.m7h4.2$a

post.m7h4.1_bMGS <- post.m7h4.1$bMGS
post.m7h4.2_bMGS <- post.m7h4.2$bMGS

post.m7h4.2_bLA <- post.m7h4.2$bLA

post.m7h4.1_sigma <- post.m7h4.1$sigma
post.m7h4.2_sigma <- post.m7h4.2$sigma
# Plot posteriors
par(mfrow=c(2,2))

dens(post.m7h4.1_alpha, add = F, xlim=c(xmin = -10, xmax = 2))
dens(post.m7h4.2_alpha, add = T, col = "blue")
mtext("Intercept")
legend(-4.25, .9, legend=c("bMGS", "bMGS + bLA"),
       col=c("black", "blue"), lty=1:2, cex=0.8)

dens(post.m7h4.1_bMGS, add = F, xlim=c(xmin = -1, xmax = 1))
dens(post.m7h4.2_bMGS, add = T, col = "blue")
mtext("bMGS")
legend(-1, 7, legend=c("bMGS", "bMGS + bLA"),
       col=c("black", "blue"), lty=1:2, cex=0.8)

dens(post.m7h4.2_bLA, add = F,col = "blue", xlim=c(xmin = -1.5, xmax = .5))
mtext("bLA")
legend(-1.5, 3, legend=c("bMGS + bLA"),
       col=c("blue"), lty=1:2, cex=0.8)


dens(post.m7h4.1_sigma, add = F, xlim=c(xmin = 0.5, xmax = 2))
dens(post.m7h4.2_sigma, add = T, col = "blue")
mtext("Sigma")
legend(0.5, 3, legend=c("bMGS", "bMGS + bLA"),
       col=c("black", "blue"), lty=1:2, cex=0.8)

While controlling for log-area has a substantial impact on the distribution of the intercept, it has no noticable impact on the position of the posterior of the predictor, nor sigma.

# Plot all
par(mfrow=c(1,3))

# plot model 1
mgs.seq1 <- seq(from=0,to=12,length.out=30) # mean.growing.season ranges 0-12
mu1 <- link(m7h4.1, data = list(mean.growing.season = mgs.seq1))
mu.mean1 <- apply(mu1, 2, mean)
mu.PI1 <- apply(mu1, 2, PI)
simu1 <- sim(m7h4.1, data = list(mean.growing.season = mgs.seq1))
simu.mean1 <- apply(simu1, 2, mean)
simu.PI1 <- apply(simu1, 2, PI)
plot(log.lang.per.cap ~ mean.growing.season, data = n, pch = 16, col = 'blue')
lines(mgs.seq1, mu.mean1)
shade(mu.PI1, mgs.seq1)
shade(simu.PI1, mgs.seq1)
mtext("growing season predictor only")

# Main effect: mean growth seasons AFTER controlling for log area.
a.avg2 <- mean(n$log.area)
mgs.seq2 <- seq(from=0,to=12,length.out=30)
pred.data <- data.frame(
  mean.growing.season = mgs.seq2,
  log.area = a.avg2
)
mu2 <- link(m7h4.2, data = pred.data)
mu.mean2 <- apply(mu2, 2, mean)
mu.PI2 <- apply(mu2, 2, PI)
R.sim2 <- sim(m7h4.2, data = pred.data, n = 1e4)
R.PI2 <- apply(R.sim2, 2, PI)
plot(log.lang.per.cap ~ mean.growing.season, data = n, pch = 16, col = 'blue')
lines(mgs.seq2, mu.mean2)
shade(mu.PI2, mgs.seq2)
shade(R.PI2, mgs.seq2)
mtext("log area = 0")

# Main effect: log area AFTER controlling for mean growth seasons.
mgs.avg <- mean(n$mean.growing.season)
a.seq <- seq(from=min(n$log.area),to=max(n$log.area),length.out=30)
pred.data <- data.frame(
  log.area = a.seq,
  mean.growing.season = mgs.avg
)
mu <- link(m7h4.2, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m7h4.2, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(log.lang.per.cap ~ log.area, data = n, pch = 16, col = 'blue')
lines(a.seq, mu.mean)
shade(mu.PI, a.seq)
shade(R.PI, a.seq)
mtext("mean growth seasons = 0")

These counterfactual plots also demonstrate this result. Here, the slope for mean growing season does not change meaningfully when we control for log-area.

Now to interpret the results.

precis(m7h4.1)
##             mean         sd       5.5%     94.5%
## a     -8.0765675 0.41909622 -8.7463642 -7.406771
## bMGS   0.1723822 0.05239153  0.0886504  0.256114
## sigma  1.4083057 0.11562285  1.2235180  1.593093

When log-area is not controlled for, for every one-unit increase in mean growing season, there is a 0.17 increase in log-languages per capita.
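
Because the outcome is log-languages per capita, the coefficient is easier to read after exponentiating it (a sketch using the MAP estimate from the precis output above; the growing-season variable in the nettle data is measured in months):

exp(0.1723822)  # ≈ 1.19: each additional month of growing season is associated
                # with roughly 19% more languages per capita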

precis(m7h4.2)
##             mean         sd        5.5%        94.5%
## a     -5.1772696 1.92811064 -8.25876281 -2.095776411
## bMGS   0.1421535 0.05532824  0.05372825  0.230578666
## bLA   -0.2062335 0.13491466 -0.42185319  0.009386167
## sigma  1.3886572 0.11406126  1.20636524  1.570949100

Even when we control for log-area, the relationship between mean growing season and log-languages per capita remains about the same. Additionally, there is a negative relationship between log-area and log-languages per capita, such that for every one-unit increase in log-area, there is a 0.21 decrease in log-languages per capita. The 89% interval does overlap zero, but only barely, and I don’t like dichotomizing interpretations like that, so I would say the coefficient is a meaningful predictor.

compare(m7h4.1, m7h4.2)
##            WAIC       SE     dWAIC      dSE    pWAIC    weight
## m7h4.1 268.0272 15.24899 0.0000000       NA 3.733474 0.5732988
## m7h4.2 268.6178 15.94141 0.5906458 3.856145 5.160413 0.4267012

However, it appears that the model that does not control for log-area is weighted higher than the model that does. I imagine this is because the second model produces huge uncertainty in the intercept estimate. However, the weighting is roughly a 60/40 split, so it is not as if the first model is clearly superior.

To relate this back to the hypothesis, there seem to be more languages in regions with longer growing seasons. Longer growing seasons mean greater food security, so I assume increasing language diversity is a proxy for a decreased need to form social networks with others. If you don’t have a long growing season, you may rely on others, which means it is adaptive for everyone to speak the same language, since communities likely form. Log-area having a negative association with language diversity per capita also makes sense, since the number of languages spoken moves further from zero as log-area increases. This is likely because larger areas naturally divide communities of people, giving them time to develop and evolve their own languages.

b)

Hypothesis: Language diversity is negatively associated with the standard deviation of the length of the growing season. First I will estimate two models: one models log-language per capita as a function of the standard deviation of the growing season, and the other controls for log-area in addition to this.

m7h4.3 <- map(
  alist(
    log.lang.per.cap ~ dnorm(mu, sigma),
    mu <- a + bGS_sd*sd.growing.season,
    a ~ dnorm(0, 10),
    bGS_sd ~ dnorm(0, 5),
    sigma ~ dunif(0, 100)
  ),
  data = n
)

m7h4.4 <- map(
  alist(
    log.lang.per.cap ~ dnorm(mu, sigma),
    mu <- a + bGS_sd*sd.growing.season + bLA*log.area,
    a ~ dnorm(0, 10),
    bGS_sd ~ dnorm(0, 5),
    bLA ~ dnorm(0, 5),
    sigma ~ dunif(0, 100)
  ),
  data = n
)
par(mfrow = c(1,2))
precis_plot(precis(m7h4.3))
precis_plot(precis(m7h4.4))

Here again, the intercept gains a lot more uncertainty when the model controls for log-area, but the coefficient estimate of the predictor “standard deviation of the growing season” does not really change, even if it slightly overlaps zero when log-area is controlled for (which doesn’t really matter). I’m also a bit unclear as to why the intercept becomes so much more uncertain when log-area is controlled for. I’m thinking this might be because log-area has a negative impact on language diversity, the opposite of the positive impact growing season has. So maybe controlling for something that has the opposite effect of your predictor introduces a lot of uncertainty. That makes sense, I think.

Now I will plot the posteriors.

# Make posteriors
post.m7h4.3 <- extract.samples(m7h4.3, n = 300000)
post.m7h4.4 <- extract.samples(m7h4.4, n = 300000)

post.m7h4.3_alpha <- post.m7h4.3$a
post.m7h4.4_alpha <- post.m7h4.4$a

post.m7h4.3_bGS_sd <- post.m7h4.3$bGS_sd
post.m7h4.4_bGS_sd <- post.m7h4.4$bGS_sd

post.m7h4.4_bLA <- post.m7h4.4$bLA

post.m7h4.3_sigma <- post.m7h4.3$sigma
post.m7h4.4_sigma <- post.m7h4.4$sigma
# Plot posteriors
par(mfrow=c(2,2))

dens(post.m7h4.3_alpha, add = F, xlim=c(xmin = -10, xmax = 2))
dens(post.m7h4.4_alpha, add = T, col = "blue")
mtext("Intercept")
legend(-4.25, .9, legend=c("bGS_sd", "bGS_sd + bLA"),
       col=c("black", "blue"), lty=1:2, cex=0.8)

dens(post.m7h4.3_bGS_sd, add = F, xlim=c(xmin = -1, xmax = 1))
dens(post.m7h4.4_bGS_sd, add = T, col = "blue")
mtext("bGS_sd")
legend(0.1, 2.5, legend=c("bGS_sd", "bGS_sd + bLA"),
       col=c("black", "blue"), lty=1:2, cex=0.8)

dens(post.m7h4.4_bLA, add = F,col = "blue", xlim=c(xmin = -1.5, xmax = .5))
mtext("bLA")
legend(-1.5, 2.5, legend=c("bGS_sd + bLA"),
       col=c("blue"), lty=1:2, cex=0.8)


dens(post.m7h4.3_sigma, add = F, xlim=c(xmin = 0.5, xmax = 2))
dens(post.m7h4.4_sigma, add = T, col = "blue")
mtext("Sigma")
legend(0.5, 3, legend=c("bGS_sd", "bGS_sd + bLA"),
       col=c("black", "blue"), lty=1:2, cex=0.8)

These look similar to the plots in part (a), although controlling for log-area shifts the coefficient estimate for the standard deviation of the growing season closer to zero.

# Plot all
par(mfrow=c(1,3))

# plot model 1
gs_sd.seq <- seq(from=0,to=8,length.out=30) # sd.growing.season stays well under 8
mu <- link(m7h4.3, data = list(sd.growing.season = gs_sd.seq))
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
simu <- sim(m7h4.3, data = list(sd.growing.season = gs_sd.seq))
simu.mean <- apply(simu, 2, mean)
simu.PI <- apply(simu, 2, PI)
plot(log.lang.per.cap ~ sd.growing.season, data = n, pch = 16, col = 'blue')
lines(gs_sd.seq, mu.mean)
shade(mu.PI, gs_sd.seq)
shade(simu.PI, gs_sd.seq)
mtext("growing season sd predictor only")

# Main effect: mean growth seasons sd AFTER controlling for log area.
a.avg <- mean(n$log.area)
gs_sd.seq <- seq(from=-2,to=10,length.out=30)
pred.data <- data.frame(
  sd.growing.season = gs_sd.seq,
  log.area = a.avg
)
mu <- link(m7h4.4, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m7h4.4, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(log.lang.per.cap ~ sd.growing.season, data = n, pch = 16, col = 'blue')
lines(gs_sd.seq, mu.mean)
shade(mu.PI, gs_sd.seq)
shade(R.PI, gs_sd.seq)
mtext("log area = 0")

# Main effect: log area AFTER controlling for growth seasons sd.
gs_sd.avg <- mean(n$sd.growing.season)
a.seq <- seq(from=min(n$log.area),to=max(n$log.area),length.out=30)
pred.data <- data.frame(
  log.area = a.seq,
  sd.growing.season = gs_sd.avg
)
mu <- link(m7h4.4, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m7h4.4, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(log.lang.per.cap ~ log.area, data = n, pch = 16, col = 'blue')
lines(a.seq, mu.mean)
shade(mu.PI, a.seq)
shade(R.PI, a.seq)
mtext("growth seasons sd = 0")

The slope can also be shown to reduce in steepness when log-area is accounted for.

Now to interpret the results.

precis(m7h4.3)
##              mean        sd       5.5%     94.5%
## a      -6.2968122 0.3419898 -6.8433780 -5.750247
## bGS_sd -0.3638317 0.1601732 -0.6198194 -0.107844
## sigma   1.4602381 0.1199109  1.2685974  1.651879

When log-area is not controlled for, for every one-unit increase in the standard deviation of the growing season, there is a 0.36 decrease in log-languages per capita.
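
The same exponentiation trick works here (a sketch using the MAP estimate from the precis output above):

exp(-0.3638317)  # ≈ 0.70: each extra unit of growing-season sd is associated
                 # with roughly 30% fewer languages per capita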

precis(m7h4.4)
##              mean        sd       5.5%        94.5%
## a      -3.3237495 1.8474768 -6.2763743 -0.371124659
## bGS_sd -0.2044317 0.1857509 -0.5012975  0.092434132
## bLA    -0.2491111 0.1534725 -0.4943898 -0.003832353
## sigma   1.4377788 0.1180974  1.2490363  1.626521247

This relationship persists, even when controlling for log-area.

compare(m7h4.3, m7h4.4)
##            WAIC       SE      dWAIC      dSE    pWAIC    weight
## m7h4.3 273.7348 17.17599 0.00000000       NA 4.057636 0.5040035
## m7h4.4 273.7668 16.86402 0.03202901 3.848152 5.309418 0.4959965

Again, the model with the log-area control does not seem to perform better than the model where only the standard deviation of the growing season is a predictor. However, like before, one model does not necessarily outperform the other.

c)

The final question asks us to evaluate the interaction between mean growing season and the standard deviation of the growing season.

m7h4c.1 <- map(
  alist(
    log.lang.per.cap ~ dnorm(mu, sigma),
    mu <- a + bMGS*mean.growing.season + bGS_sd*sd.growing.season + bMGSbGS_sd*mean.growing.season*sd.growing.season,
    a ~ dnorm(0, 10),
    bMGS ~ dnorm(0, 5),
    bGS_sd ~ dnorm(0, 5),
    bMGSbGS_sd ~ dnorm(0, 5),
    sigma ~ dunif(0, 100)
  ),
  data = n
)
precis_plot(precis(m7h4c.1))

Now, both mean growing season and the standard deviation of the growing season positively predict language diversity per capita. The interaction is negative. I’ll look at this in more detail now.

First, I will plot the posteriors.

# Make posteriors
post.m7h4c.1 <- extract.samples(m7h4c.1, n = 300000)

post.m7h4c.1_alpha <- post.m7h4c.1$a

post.m7h4c.1_bMGS <- post.m7h4c.1$bMGS

post.m7h4c.1_bGS_sd <- post.m7h4c.1$bGS_sd

post.m7h4c.1_bMGSbGS_sd <- post.m7h4c.1$bMGSbGS_sd

post.m7h4c.1_sigma <- post.m7h4c.1$sigma
# Plot posteriors
par(mfrow=c(3,2))

dens(post.m7h4c.1_alpha,xlim=c(xmin = -11, xmax =-5))
mtext("Intercept")

dens(post.m7h4c.1_bMGS, xlim=c(xmin = 0, xmax = .6))
mtext("bMGS")

dens(post.m7h4c.1_bGS_sd, xlim=c(xmin = -1, xmax = 2))
mtext("bGS_sd")

dens(post.m7h4c.1_bMGSbGS_sd, xlim=c(xmin = -0.3, xmax = .1))
mtext("bMGSbGS_sd")

dens(post.m7h4c.1_sigma, add = F, xlim=c(xmin = 0.8, xmax = 1.8))
mtext("Sigma")

I found a function called “cut” when trying to figure out how to partition continuous data for the triptych plot, although I am unsure I have separated the data correctly. As far as I can tell, it separates the data at the break points you provide it. So I used the quantiles at 0, 1/3, 2/3, and 1, such that each regression is performed on only the values of each predictor in the bottom third, middle third, and top third. Then, when I plot the data by copying R code 7.28 on page 233, I end up with 3 panels, each superimposed over the data, but each only displaying a regression on part of the data. That way we can see how the regression changes from low to high predictor values.

# Discretize variables into groups
n$mean.growing.season_cut <- cut(
  n$mean.growing.season, 
  breaks = quantile(n$mean.growing.season, probs = c(0, 1/3, 2/3, 1)),
  include.lowest = TRUE, 
  dig.lab = 2
)
n$sd.growing.season_cut <- cut(
  n$sd.growing.season, 
  breaks = quantile(n$sd.growing.season, probs = c(0, 1/3, 2/3, 1)),
  include.lowest = TRUE, 
  dig.lab = 2
)

par(mfrow = c(2, 3))
# Plot first row as mean against SD
mgs.seq <- seq(from = 0, to = 12, length.out = 50)
for (i in levels(n$sd.growing.season_cut)) {
  nt <- n[n$sd.growing.season_cut == i, ]
  plot(log.lang.per.cap ~ mean.growing.season, data = n, col = rangi2, xlim = c(1.5, 11.5), ylim = c(-10, 0),
    main = paste("Standard Deviation Of Growing Season = ", i), xlab = "Mean Growing Season")
  mu <- link(m7h4c.1, data = data.frame(mean.growing.season = mgs.seq, sd.growing.season = mean(nt$sd.growing.season)),
    refresh = 0)
  mu.mean <- apply(mu, 2, mean)
  mu.PI <- apply(mu, 2, PI, prob = 0.97)
  lines(mgs.seq, mu.mean)
  lines(mgs.seq, mu.PI[1, ], lty = 2)
  lines(mgs.seq, mu.PI[2, ], lty = 2)
}

# Plot second row as SD against mean
gs_sd.seq <- seq(from = 0, to = 8, length.out = 50)
for (i in levels(n$mean.growing.season_cut)) {
  ntt <- n[n$mean.growing.season_cut == i, ]
  plot(log.lang.per.cap ~ sd.growing.season, data = n, col = rangi2, xlim = c(0.5, 7.5), ylim = c(-10, 0),
    main = paste("Mean Growing Season = ", i), xlab = "Standard Deviation Of Growing Season")
  mu <- link(m7h4c.1, data = data.frame(sd.growing.season = gs_sd.seq, mean.growing.season = mean(ntt$mean.growing.season)),
    refresh = 0)
  mu.mean <- apply(mu, 2, mean)
  mu.PI <- apply(mu, 2, PI, prob = 0.97)
  lines(gs_sd.seq, mu.mean)
  lines(gs_sd.seq, mu.PI[1, ], lty = 2)
  lines(gs_sd.seq, mu.PI[2, ], lty = 2)
}

The slope of the line associated with mean growing season predicting language diversity is positive for low values of the standard deviation of the growing season, but becomes much flatter as the standard deviation of the growing season increases. Conversely, the slope of the line associated with the standard deviation of the growing season predicting language diversity is relatively flat when the mean growing season is short, and becomes more negative as the mean growing season increases.





10H4 (p. 330)

library(rethinking)
data("salamanders")
s <- salamanders

# Site = Plot of land where salamanders were counted
# SALAMAN = Number of salamanders counted in each plot.
# PCTCOVER = % of ground cover
# FORESTAGE = Age of trees in plot

a)

Here I ran a quadratic approximation and an MCMC model with salamander count as a function of percent coverage. The question asks for weakly informative priors, which is what I’ve included.

# Quadratic approximation: percent coverage predicting salamander count.
m10h4_a_QUAD <- map(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER,
    a ~ dnorm(0,15),
    bC ~ dnorm(0,10)
  ),
  data=s)

# MCMC: percent coverage predicting salamander count.
m10h4_a_MCMC <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER,
    a ~ dnorm(0,10), # weakly informative prior on intercept
    bC ~ dnorm(0,10) # weakly informative prior on beta
  ),
  data=s , chains=4, cores=4 , iter=10000 , warmup=1000 )
# Plot Quadratic Approximation and MCMC estimates.
precis(m10h4_a_QUAD)
##           mean          sd        5.5%       94.5%
## a  -1.45202067 0.450341023 -2.17175261 -0.73228874
## bC  0.03206237 0.005324519  0.02355276  0.04057198
precis(m10h4_a_MCMC)
##          mean         sd        5.5%       94.5%    n_eff    Rhat4
## a  -1.5586611 0.45807557 -2.31607933 -0.86034594 6521.007 1.000411
## bC  0.0332238 0.00540622  0.02486183  0.04213632 6553.472 1.000330

The intercepts seem similar between the two models, and the interval estimates for the intercept and beta coefficient are nearly identical in length.
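
Since this is a Poisson regression with a log link, the coefficients act multiplicatively on the expected count. As a sketch, reading the MCMC MAP estimate of bC on the count scale (the ten-point increase is just an illustrative step):

exp(0.0332238 * 10)  # ≈ 1.39: ten extra percentage points of ground cover scale
                     # the expected salamander count by roughly 39%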

# Plot Quadratic Approximation and MCMC estimates.
plot(coeftab(m10h4_a_QUAD, m10h4_a_MCMC))

The quadratic approximation and MCMC models appear similar in terms of interval lengths, although the point estimate of the intercept differs slightly between them. Additionally, the intervals seem similar for the beta coefficient, but they are very small, so it is difficult to see the difference from this plot alone. So I’m going to plot the distributions themselves.

# Make posteriors by extracting samples from the two models.
post.Quad <- extract.samples(m10h4_a_QUAD, n = 300000)
post.MCMC <- extract.samples(m10h4_a_MCMC, n = 300000)

# Create intercept posteriors.
post.Quad_alpha <- post.Quad$a
post.MCMC_alpha <- post.MCMC$a

# Create percent coverage coefficient posteriors.
post.Quad_bC <- post.Quad$bC
post.MCMC_bC <- post.MCMC$bC
# Plot posteriors
par(mfrow=c(1,2))

# Graph intercept posteriors.
dens(post.Quad_alpha, add = F)
dens(post.MCMC_alpha, add = T, col = "blue")
mtext("Intercept")

# Graph percent coverage coefficient posteriors.
dens(post.Quad_bC, add = F)
dens(post.MCMC_bC, add = T, col = "blue")
mtext("bC")

This makes it easier to assess the differences between the estimates from quadratic approximation and MCMC. The MCMC distributions seem very slightly shifted relative to the quadratic approximation distributions, and the posteriors appear Gaussian in the MCMC case too (they should always be Gaussian in the quadratic case). Even though they are similar, I personally prefer MCMC since I don’t like the Gaussian assumption of quadratic approximation. So I will use the MCMC posteriors for the next part.

Before I do that, I’m curious as to which model might have more weighting. So I’m trying a model comparison, even though the models were fit with different functions.

# Compare models.
compare(m10h4_a_QUAD, m10h4_a_MCMC)
##                  WAIC       SE     dWAIC       dSE    pWAIC    weight
## m10h4_a_MCMC 213.3117 26.45514 0.0000000        NA 4.715254 0.5331643
## m10h4_a_QUAD 213.5774 26.12382 0.2657049 0.7332848 4.928047 0.4668357

As expected, I get a warning indicating the models were fit with different algorithms. The weighting is roughly equal, but I’m going with MCMC.

The next part of the question asks us to plot the expected counts and their intervals against percent coverage.

# plot model predictions
pctcover.seq <- seq(from=0,to=100,length.out=30)
lambda <- link(m10h4_a_MCMC, data = list(PCTCOVER = pctcover.seq))
lambda.mean <- apply(lambda, 2, mean)
lambda.PI <- apply(lambda, 2, PI)
counts <- sim(m10h4_a_MCMC, data = list(PCTCOVER = pctcover.seq))
counts.mean <- apply(counts, 2, mean)
counts.PI <- apply(counts, 2, PI)
plot(SALAMAN ~ PCTCOVER, data = s, pch = 16, col = 'blue')
lines(pctcover.seq, lambda.mean)
shade(lambda.PI, pctcover.seq)
shade(counts.PI, pctcover.seq)

This model does a good job predicting salamander count when the percent of ground covered is low. However, the model seems to underpredict salamander count when the percent of ground covered is very high. There is, however, a lot of variability in salamander count at the higher-coverage sites, so it might just be difficult to account for this.

b)

First, I use MCMC to generate models with: main effect of percent coverage, main effect of forest age, main effects of both percent coverage and forest age, and main effects of both percent coverage and forest age with their interaction.

# MCMC: Main effect of percent coverage.
m10h4_b_MCMC.PCTCOVER <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER,
    a ~ dnorm(0,15),
    bC ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )

# MCMC: Main effect of forest age.
m10h4_b_MCMC.FORESTAGE <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bF*FORESTAGE,
    a ~ dnorm(0,10),
    bF ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )

# MCMC: Main effects of percent coverage and forest age.
m10h4_b_MCMC.PCTCOVER_FORESTAGE <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER + bF*FORESTAGE,
    a ~ dnorm(0,15),
    bC ~ dnorm(0,10),
    bF ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )

# MCMC: Main effects of percent coverage and forest age + their interaction.
m10h4_b_MCMC.PCTCOVERFORESTAGE <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER + bF*FORESTAGE + bCF*PCTCOVER*FORESTAGE,
    a ~ dnorm(0,10),
    bC ~ dnorm(0,10),
    bF ~ dnorm(0,10),
    bCF ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )
# Plot MCMC coefficient comparison of the 4 models.
plot(coeftab(m10h4_b_MCMC.PCTCOVER, m10h4_b_MCMC.FORESTAGE, m10h4_b_MCMC.PCTCOVER_FORESTAGE, m10h4_b_MCMC.PCTCOVERFORESTAGE ))

Interestingly, the intercept intervals become widest with the interaction, and the model with only forest age as a predictor has a positive intercept. There must be something going on with the forest age estimate when percent coverage is not accounted for. I’m going to look at the posteriors more closely.

# Extract samples for posterior.
post.PCTCOVER <- extract.samples(m10h4_b_MCMC.PCTCOVER, n = 300000)
post.FORESTAGE <- extract.samples(m10h4_b_MCMC.FORESTAGE, n = 300000)
post.PCTCOVER_FORESTAGE <- extract.samples(m10h4_b_MCMC.PCTCOVER_FORESTAGE, n = 300000)
post.PCTCOVERFORESTAGE <- extract.samples(m10h4_b_MCMC.PCTCOVERFORESTAGE, n = 300000)

post.PCTCOVER_a <- post.PCTCOVER$a
post.FORESTAGE_a <- post.FORESTAGE$a
post.PCTCOVER_FORESTAGE_a <- post.PCTCOVER_FORESTAGE$a
post.PCTCOVERFORESTAGE_a <- post.PCTCOVERFORESTAGE$a

post.PCTCOVER_bC <- post.PCTCOVER$bC
post.PCTCOVER_FORESTAGE_bC <- post.PCTCOVER_FORESTAGE$bC
post.PCTCOVERFORESTAGE_bC <- post.PCTCOVERFORESTAGE$bC

post.FORESTAGE_bF <- post.FORESTAGE$bF
post.PCTCOVER_FORESTAGE_bF <- post.PCTCOVER_FORESTAGE$bF
post.PCTCOVERFORESTAGE_bF <- post.PCTCOVERFORESTAGE$bF

post.PCTCOVERFORESTAGE <- post.PCTCOVERFORESTAGE$bCF
# Set plotting layout.
par(mfrow=c(2,2))

# Plot posteriors.
dens(post.PCTCOVER_a, add = F, xlim=c(xmin = -4, xmax = 3), ylim = c(ymin = 0, ymax = 5))
dens(post.FORESTAGE_a, add = T, col = "blue")
dens(post.PCTCOVER_FORESTAGE_a, add = T, col = "red")
dens(post.PCTCOVERFORESTAGE_a, add = T, col = "green")
mtext("Intercept")

dens(post.PCTCOVER_bC, add = F, xlim=c(xmin = -.03, xmax = .08), ylim = c(ymin = 0, ymax = 90))
dens(post.PCTCOVER_FORESTAGE_bC, add = T, col = "red")
dens(post.PCTCOVERFORESTAGE_bC, add = T, col = "green")
mtext("bCOVERAGE")

dens(post.FORESTAGE_bF, add = F, col = "blue", xlim=c(xmin = -.03, xmax = .17), ylim = c(ymin = 0, ymax = 160))
dens(post.PCTCOVER_FORESTAGE_bF, add = T, col = "red")
dens(post.PCTCOVERFORESTAGE_bF, add = T, col = "green")
mtext("bFORESTAGE")

dens(post.PCTCOVERFORESTAGE, add = F, col = "green", xlim=c(xmin = -.0003, xmax = .0004))
mtext("bCOVERAGEFORESTAGE")

Black = Main effect of coverage model. Blue = Main effect of forest age model. Red = Main effect of both coverage and forest age model. Green = Interaction model.

The model with a main effect of percent coverage (black) produces posterior estimates that appear Gaussian for all parameters. The model with only a main effect of forest age (blue) produces a weird posterior on the intercept, as does the interaction model (green). There is something weird going on when we estimate forest age but don’t account for percent coverage.

I’m plotting counterfactuals to see how the estimates look.

# Set plotting layout.
par(mfrow=c(2,2))

# Main effect: Percent Coverage
pctcover.seq <- seq(from=0,to=100,length.out=30)
lambda <- link(m10h4_b_MCMC.PCTCOVER, data = list(PCTCOVER = pctcover.seq))
lambda.mean <- apply(lambda, 2, mean)
lambda.PI <- apply(lambda, 2, PI)
counts <- sim(m10h4_b_MCMC.PCTCOVER, data = list(PCTCOVER = pctcover.seq))
counts.mean <- apply(counts, 2, mean)
counts.PI <- apply(counts, 2, PI)
plot(SALAMAN ~ PCTCOVER, data = s, pch = 16, col = 'blue')
lines(pctcover.seq, lambda.mean)
shade(lambda.PI, pctcover.seq)
shade(counts.PI, pctcover.seq)
mtext("Percent cover only")

# Main effect: Forest Age
forestage.seq <- seq(from=0,to=800,length.out=30)
lambda <- link(m10h4_b_MCMC.FORESTAGE, data = list(FORESTAGE = forestage.seq))
lambda.mean <- apply(lambda, 2, mean)
lambda.PI <- apply(lambda, 2, PI)
counts <- sim(m10h4_b_MCMC.FORESTAGE, data = list(FORESTAGE = forestage.seq))
counts.mean <- apply(counts, 2, mean)
counts.PI <- apply(counts, 2, PI)
plot(SALAMAN ~ FORESTAGE, data = s, pch = 16, col = 'blue')
lines(forestage.seq, lambda.mean)
shade(lambda.PI, forestage.seq)
shade(counts.PI, forestage.seq)
mtext("Forest age only")

# Main effect: Percent coverage AFTER controlling for forest age
f.avg <- mean(s$FORESTAGE)
p.seq <- seq(from=0,to=100,length.out=30)
pred.data <- data.frame(
  PCTCOVER = p.seq,
  FORESTAGE = f.avg
)
mu <- link(m10h4_b_MCMC.PCTCOVER_FORESTAGE, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m10h4_b_MCMC.PCTCOVER_FORESTAGE, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(SALAMAN ~ PCTCOVER, data = s, pch = 16, col = 'blue')
lines(p.seq, mu.mean)
shade(mu.PI, p.seq)
shade(R.PI, p.seq)
mtext("Forest age = 0")

# Main effect: Forest age AFTER controlling for percent coverage.
p.avg <- mean(s$PCTCOVER)
f.seq <- seq(from=0,to=800,length.out=30)
pred.data <- data.frame(
  FORESTAGE = f.seq,
  PCTCOVER = p.avg
)
mu <- link(m10h4_b_MCMC.PCTCOVER_FORESTAGE, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m10h4_b_MCMC.PCTCOVER_FORESTAGE, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(SALAMAN ~ FORESTAGE, data = s, pch = 16, col = 'blue')
lines(f.seq, mu.mean)
shade(mu.PI, f.seq)
shade(R.PI, f.seq)
mtext("Percent cover = 0")

The estimate for percent coverage barely changes whether or not we control for forest age. Controlling for percent coverage, however, does affect our estimate of forest age: when forest age is the only predictor, the slope is much more pronounced than when percent coverage is in the model. So once we know percent coverage, forest age does not seem to matter much.
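
One way to put a number on that, using the samples extracted earlier, is the posterior probability that the forest-age slope is positive in each model. A quick sketch:

# Sketch: proportion of posterior samples with a positive forest-age slope.
mean(post.FORESTAGE_bF > 0)            # forest age as the only predictor
mean(post.PCTCOVER_FORESTAGE_bF > 0)   # after controlling for percent coverage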

# Compare models.
compare(m10h4_b_MCMC.PCTCOVER,m10h4_b_MCMC.FORESTAGE,m10h4_b_MCMC.PCTCOVER_FORESTAGE, m10h4_b_MCMC.PCTCOVERFORESTAGE)
##                                          WAIC       SE         dWAIC      dSE
## m10h4_b_MCMC.PCTCOVER            2.134629e+02 26.44297  0.000000e+00       NA
## m10h4_b_MCMC.PCTCOVER_FORESTAGE  2.179883e+02 27.38000  4.525439e+00 1.489895
## m10h4_b_MCMC.PCTCOVERFORESTAGE   2.212725e+02 28.02070  7.809567e+00 3.296466
## m10h4_b_MCMC.FORESTAGE          1.378436e+175      NaN 1.378436e+175      Inf
##                                         pWAIC     weight
## m10h4_b_MCMC.PCTCOVER            4.786347e+00 0.88951163
## m10h4_b_MCMC.PCTCOVER_FORESTAGE  7.843904e+00 0.09256887
## m10h4_b_MCMC.PCTCOVERFORESTAGE   9.957465e+00 0.01791950
## m10h4_b_MCMC.FORESTAGE          6.892179e+174 0.00000000

When comparing models, the model with only a main effect of percent coverage gets almost all of the weight (~89%), while the model with both main effects gets ~9%. The forest-age-only model produced an absurd WAIC (consistent with its strange posterior), so it gets no weight at all. Clearly, knowing forest age does not add much when assessing salamander numbers.
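
As a reminder of where those weights come from: the Akaike weight is just a softmax over -0.5 * dWAIC. We can reproduce compare()'s weights by hand from the WAIC values in the table above (a sketch; I drop the broken forest-age model, whose weight is zero anyway):

# Sketch: recompute Akaike weights from the WAIC values reported above.
waic_vals <- c(213.46, 217.99, 221.27)
dwaic <- waic_vals - min(waic_vals)
weights <- exp(-0.5 * dwaic) / sum(exp(-0.5 * dwaic))
round(weights, 3)  # ~0.890, 0.092, 0.018, matching compare()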

I decided to plot the variables against each other to see if I could find any associations.

# Set plotting window layout.
par(mfrow=c(2,2))

# Look at associations.
plot(s$SALAMAN~s$FORESTAGE)
plot(s$SALAMAN~s$PCTCOVER)
plot(s$PCTCOVER~s$FORESTAGE)

Looking at the distributions above, I wonder if this explains why forest age does not improve the model. Plotting salamander count against forest age (top left), there is no strong association, but the range of forest ages is huge: half the ages are under 100, and the other half span 100 to 700. I suspect a confound. Older forests do tend to have more salamanders than very young forests, whose counts are near zero (top left). Salamanders mostly turn up only where a very large percentage of the ground is covered (top right). And young forests have very little ground cover, while older forests have a great deal (bottom left). So forest age cannot tell us much about salamander count beyond what ground cover already tells us: young forests simply lack cover.
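
A quick numeric check on that suspicion is the raw correlation between the two predictors (a sketch):

# Sketch: how strongly are the two predictors associated?
cor(s$PCTCOVER, s$FORESTAGE)
# Spearman's rank correlation is more robust to the heavily skewed forest ages.
cor(s$PCTCOVER, s$FORESTAGE, method = "spearman")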

However, I still wanted to see if scaling the data improves the model weighting, so I standardized the predictors and ran the models again.

# Scale predictors.
s$PCTCOVER_Scale <- scale(s$PCTCOVER)
s$FORESTAGE_Scale <- scale(s$FORESTAGE)
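
For reference, scale() both centers and divides by the standard deviation, so it standardizes rather than just centers. The equivalent by hand would be the following sketch (PCTCOVER_Scale2 is just a throwaway column for the check):

# Sketch: what scale() does under the hood.
s$PCTCOVER_Scale2 <- (s$PCTCOVER - mean(s$PCTCOVER)) / sd(s$PCTCOVER)
all.equal(as.numeric(s$PCTCOVER_Scale), s$PCTCOVER_Scale2)  # TRUE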

# Run scaled models.
m10h4_b_MCMC.PCTCOVER_Scale <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER_Scale,
    a ~ dnorm(0,15),
    bC ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )

m10h4_b_MCMC.FORESTAGE_Scale <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bF*FORESTAGE_Scale,
    a ~ dnorm(0,15),
    bF ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )

m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER_Scale + bF*FORESTAGE_Scale,
    a ~ dnorm(0,15),
    bC ~ dnorm(0,10),
    bF ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )

m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale <- map2stan(
  alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*PCTCOVER_Scale + bF*FORESTAGE_Scale + bCF*PCTCOVER_Scale*FORESTAGE_Scale,
    a ~ dnorm(0,15),
    bC ~ dnorm(0,10),
    bF ~ dnorm(0,10),
    bCF ~ dnorm(0,10)
  ),
  data=s , chains=4, cores=4 , iter=4000 , warmup=1000 )
# Plot coefficients of scaled models.
plot(coeftab(m10h4_b_MCMC.PCTCOVER_Scale, m10h4_b_MCMC.FORESTAGE_Scale, m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale))

Now the intercept estimates look more similar, although there is still a lot of uncertainty in the interaction model. The estimate for percent coverage is also similar between models, though the interaction model again introduces more uncertainty. The forest age predictor only seems to matter when percent coverage is not in the model; otherwise its posterior contains about as many positive as negative values. Finally, the interaction coefficient is highly uncertain.
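
To quantify that last impression, we can pull the interaction coefficient's samples and summarise them directly. A sketch:

# Sketch: how uncertain is the interaction coefficient?
bCF_samples <- extract.samples(m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale)$bCF
HPDI(bCF_samples, prob = 0.9)
mean(bCF_samples > 0)  # near 0.5 would mean even the sign is undetermined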

# Extract samples for posterior.
post.PCTCOVER_Scale <- extract.samples(m10h4_b_MCMC.PCTCOVER_Scale, n = 300000)
post.FORESTAGE_Scale <- extract.samples(m10h4_b_MCMC.FORESTAGE_Scale, n = 300000)
post.PCTCOVER_FORESTAGE_Scale <- extract.samples(m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, n = 300000)
post.PCTCOVERFORESTAGE_Scale <- extract.samples(m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale, n = 300000)

post.PCTCOVER_a_Scale <- post.PCTCOVER_Scale$a
post.FORESTAGE_a_Scale <- post.FORESTAGE_Scale$a
post.PCTCOVER_FORESTAGE_a_Scale <- post.PCTCOVER_FORESTAGE_Scale$a
post.PCTCOVERFORESTAGE_a_Scale <- post.PCTCOVERFORESTAGE_Scale$a

post.PCTCOVER_bC_Scale <- post.PCTCOVER_Scale$bC
post.PCTCOVER_FORESTAGE_bC_Scale <- post.PCTCOVER_FORESTAGE_Scale$bC
post.PCTCOVERFORESTAGE_bC_Scale <- post.PCTCOVERFORESTAGE_Scale$bC

post.FORESTAGE_bF_Scale <- post.FORESTAGE_Scale$bF
post.PCTCOVER_FORESTAGE_bF_Scale <- post.PCTCOVER_FORESTAGE_Scale$bF
post.PCTCOVERFORESTAGE_bF_Scale <- post.PCTCOVERFORESTAGE_Scale$bF

post.PCTCOVERFORESTAGE_bCF_Scale <- post.PCTCOVERFORESTAGE_Scale$bCF
# Set plotting layout.
par(mfrow=c(2,2))

# Plot posteriors.
dens(post.PCTCOVER_a_Scale, add = F, xlim=c(xmin = -1, xmax = 1.2), ylim=c(ymin = 0, ymax = 4.1))
dens(post.FORESTAGE_a_Scale, add = T, col = "blue")
dens(post.PCTCOVER_FORESTAGE_a_Scale, add = T, col = "red")
dens(post.PCTCOVERFORESTAGE_a_Scale, add = T, col = "green")
mtext("Intercept")

dens(post.PCTCOVER_bC_Scale, add = F, xlim=c(xmin = 0.5, xmax = 2.5), ylim=c(ymin = 0, ymax = 3))
dens(post.PCTCOVER_FORESTAGE_bC_Scale, add = T, col = "red")
dens(post.PCTCOVERFORESTAGE_bC_Scale, add = T, col = "green")
mtext("bCOVERAGE")

dens(post.FORESTAGE_bF_Scale, add = F, col = "blue", xlim=c(xmin = -1.2, xmax = 1.5), ylim=c(ymin = 0, ymax = 5))
dens(post.PCTCOVER_FORESTAGE_bF_Scale, add = T, col = "red")
dens(post.PCTCOVERFORESTAGE_bF_Scale, add = T, col = "green")
mtext("bFORESTAGE")

dens(post.PCTCOVERFORESTAGE_bCF_Scale, col = "green", xlim=c(xmin = -2, xmax = 2.2))
mtext("bCF")

Black = Main effect of coverage model. Blue = Main effect of forest age model. Red = Main effect of both coverage and forest age model. Green = Interaction model.

The intercept is about the same whether we use only percent coverage or both percent coverage and forest age; the interaction introduces a lot of uncertainty, and using only forest age pushes the estimate higher. As discussed before, older forests have greater cover. Most importantly, while the forest age coefficient is positive when it is the only predictor, adding percent coverage (or the interaction) to the model shrinks its estimate to essentially zero. So forest age really doesn't add anything to the estimate of salamander count.
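
The same point in numbers, as a sketch: compare the 90% HPDI of the forest-age slope with and without percent coverage in the model.

# Sketch: 90% HPDI for the forest-age slope across the scaled models.
HPDI(post.FORESTAGE_bF_Scale, prob = 0.9)           # forest age alone
HPDI(post.PCTCOVER_FORESTAGE_bF_Scale, prob = 0.9)  # controlling for coverage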

Counterfactual plots

# Set graphing window
par(mfrow=c(2,2))

# Main effect: Percent Coverage
pctcover_Scale.seq <- seq(from=-2,to=2,length.out=30)
lambda <- link(m10h4_b_MCMC.PCTCOVER_Scale, data = list(PCTCOVER_Scale = pctcover_Scale.seq))
lambda.mean <- apply(lambda, 2, mean)
lambda.PI <- apply(lambda, 2, PI)
counts <- sim(m10h4_b_MCMC.PCTCOVER_Scale, data = list(PCTCOVER_Scale = pctcover_Scale.seq))
counts.mean <- apply(counts, 2, mean)
counts.PI <- apply(counts, 2, PI)
plot(SALAMAN ~ PCTCOVER_Scale, data = s, pch = 16, col = 'blue')
lines(pctcover_Scale.seq, lambda.mean)
shade(lambda.PI, pctcover_Scale.seq)
shade(counts.PI, pctcover_Scale.seq)
mtext("Scaled: Percent cover only")

# Main effect: Forest Age
forestage_Scale.seq <- seq(from=-2,to=3,length.out=30)
lambda <- link(m10h4_b_MCMC.FORESTAGE_Scale, data = list(FORESTAGE_Scale = forestage_Scale.seq))
lambda.mean <- apply(lambda, 2, mean)
lambda.PI <- apply(lambda, 2, PI)
counts <- sim(m10h4_b_MCMC.FORESTAGE_Scale, data = list(FORESTAGE_Scale = forestage_Scale.seq))
counts.mean <- apply(counts, 2, mean)
counts.PI <- apply(counts, 2, PI)
plot(SALAMAN ~ FORESTAGE_Scale, data = s, pch = 16, col = 'blue')
lines(forestage_Scale.seq, lambda.mean)
shade(lambda.PI, forestage_Scale.seq)
shade(counts.PI, forestage_Scale.seq)
mtext("Scaled: Forest age only")

# Main effect: Percent coverage AFTER controlling for forest age
f.avg <- mean(s$FORESTAGE_Scale)
p.seq <- seq(from=-2,to=2,length.out=30)
pred.data <- data.frame(
  PCTCOVER_Scale = p.seq,
  FORESTAGE_Scale = f.avg
)
mu <- link(m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(SALAMAN ~ PCTCOVER_Scale, data = s, pch = 16, col = 'blue')
lines(p.seq, mu.mean)
shade(mu.PI, p.seq)
shade(R.PI, p.seq)
mtext("Scaled: Forest age = 0")

# Main effect: Forest age AFTER controlling for percent coverage.
p.avg <- mean(s$PCTCOVER_Scale)
f.seq <- seq(from=-2,to=3,length.out=30)
pred.data <- data.frame(
  FORESTAGE_Scale = f.seq,
  PCTCOVER_Scale = p.avg
)
mu <- link(m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
R.sim <- sim(m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)
plot(SALAMAN ~ FORESTAGE_Scale, data = s, pch = 16, col = 'blue')
lines(f.seq, mu.mean)
shade(mu.PI, f.seq)
shade(R.PI, f.seq)
mtext("Scaled: Percent cover = 0")

As can be seen again, percent coverage predicts salamander count whether or not forest age is included in the model. And while forest age predicts salamander count on its own, its effect shrinks to essentially zero once we control for percent coverage.

# Compare scaled models.
compare(m10h4_b_MCMC.PCTCOVER_Scale,m10h4_b_MCMC.FORESTAGE_Scale,m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale, m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale)
##                                           WAIC       SE     dWAIC       dSE
## m10h4_b_MCMC.PCTCOVER_Scale           213.3772 26.41707  0.000000        NA
## m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale 217.7827 27.32375  4.405513  1.477482
## m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale  221.8168 28.22411  8.439580  2.521683
## m10h4_b_MCMC.FORESTAGE_Scale          265.4304 35.65343 52.053154 22.720632
##                                           pWAIC       weight
## m10h4_b_MCMC.PCTCOVER_Scale            4.725921 8.887310e-01
## m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale  7.692984 9.820313e-02
## m10h4_b_MCMC.PCTCOVERFORESTAGE_Scale  10.404798 1.306588e-02
## m10h4_b_MCMC.FORESTAGE_Scale           7.217552 4.421520e-12

Scaling the data does not change the model weights much, although it does fix the absurd WAIC of the forest-age-only model. The best model is still the one that includes only percent coverage as a predictor. In conclusion, the forest age predictor appears confounded with ground cover: older forests have more ground cover, and it is the amount of ground cover at a site that is most strongly associated with salamander counts.
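
Since the percent-coverage-only model takes ~89% of the weight, model averaging would change little here, but for completeness, here is a sketch of how WAIC-weighted ensemble predictions could be formed with rethinking's ensemble() (reusing p.seq from the scaled counterfactuals above):

# Sketch: WAIC-weighted ensemble predictions over percent coverage.
pred.data <- data.frame(PCTCOVER_Scale = p.seq, FORESTAGE_Scale = 0)
s.ensemble <- ensemble(m10h4_b_MCMC.PCTCOVER_Scale,
                       m10h4_b_MCMC.PCTCOVER_FORESTAGE_Scale,
                       data = pred.data)
mu.mean <- apply(s.ensemble$link, 2, mean)
mu.PI <- apply(s.ensemble$link, 2, PI)
plot(SALAMAN ~ PCTCOVER_Scale, data = s, pch = 16, col = 'blue')
lines(p.seq, mu.mean)
shade(mu.PI, p.seq)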

knitr::knit_exit()