4E1. Solution: The data is labeled by \(y_i\) and thus \[ y_i \sim \text{Normal}(\mu,\sigma) \] is your likelihood. An interesting note from the solution manual

… a likelihood is effectively a prior for the residuals.

4E2. Solution: The two parameters in the posterior are \(\mu\) and \(\sigma\).

4E3. Solution: \[\begin{align} P(m,s|Y=y) &= \frac{P(\mu=m)P(\sigma=s)\mathcal{L}(Y=y|m,s)}{\int_{\mu}\int_{\sigma} P(m,s,y) \, d\sigma \, d\mu} \\ &=\frac{\mathcal{N}(m;0,10)\, Exp(s;1) \, \mathcal{N}(y;m,s)}{\iint_{\mathbb{R}^{+}\times\mathbb{R}}\mathcal{N}(\mu;0,10)\, Exp(\sigma;1) \, \mathcal{N}(y;\mu,\sigma) \, d\mu \, d\sigma} \end{align}\]

4E4. Solution: The line \(\mu_i = \alpha + \beta x_i\) is the linear model.

4E5. Solution: There are 3 parameters in the posterior: \(\alpha, \beta, \sigma\). \(\mu\) is not a paramater since it is entirely determined by \(\alpha, \beta, x\).

4M1. Solution: Simulating from the prior and displaying the density of \(y_i\):

sim_size=1e4
set.seed(1234)
mu <- rnorm(sim_size, sd=10)
sigma <- rexp(sim_size)
ys <- rnorm(sim_size, mu, sigma)
dens(ys)

4M2. Solution: Translating into a quap formula:

m4.2 <-
  alist(
    y ~ dnorm(mu, sigma),
    mu ~ dnorm(0, 10),
    sigma ~ dexp(1)
  )

4M3. Solution: as a mathematical model: \[\begin{align} y_i &\sim \mathcal{N}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta\,x_i \\ \alpha &\sim \mathcal{N}(0,10) \\ \beta &\sim \mathcal{U}(0,1) \\ \sigma &\sim Exp(1) \end{align}\]

4M4. Solution: Let’s assume we’re working with high schoolers, so that we can expect a leveling out of height as we conclude our study: \[\begin{align} h_{ij} &\sim \mathcal{N}(\mu_{ij}, \sigma) \\ \mu_{ij} &= \alpha + \beta_1 y_j - \beta_2 y_j^2 \;\; (y_j \in \{1,2,3\}) \\ \alpha &\sim \mathcal{N}(100,10) \\ \beta_1 &\sim \mathcal{N}(0,10) \\ \beta_2 &\sim \text{lognormal}(0,1) \\ \sigma &\sim Exp(1) \end{align}\] Where \(h_{ij}\) is the height for student \(i\) at year \(j\) (\(y_j \in \{1,2,3\}\)) and while \(\mu_{ij}\) is indexed for each student and year, it’s clear that once \(\alpha, \beta_{1}, \beta_{2}, \sigma\) are chosen, \(\mu_{ij}\) is the same for each student, in a given year… So that’s not amazing. Better yet would be extra predictors like gender, or something that was auto-regressive, where \(\mu_{ij}\) involved \(h_{i(j-1)}\).

To get an idea of what our priors actually mean, let’s plot them:

Clearly, our priors are crap, given we have the ability for mu to have a negative slope.

4M5. Solution: To fix our issue with the shrinking students, we can modify the functional form of \(mu\): \[\begin{align} \mu &= \alpha + \beta y \, (\gamma - y) \\ \alpha &\sim \mathcal{N}(100,10) \\ \beta &\sim 3+\text{lognormal}(0,1) \\ \gamma &\sim 6+\text{lognormal}(1,1) \end{align}\] and the purpose of this is to force \(\mu\) to be a concave down parabola whose apex is never earlier than \(y=3\). The following prior densities arise:

This isn’t perfect, but it might do.

4M6. Solution: Given variance is \(\sigma^2\), we just need the modeled standard deviation, \(\sigma\), won’t exceed \(\sqrt{64}=8\). So we can model that with \[ \sigma \sim \mathcal{U}(0, 8) \]

4M7. Solution:

data("Howell1"); d <- Howell1; d2 <- d[d$age >= 18,]
xbar <- mean(d2$weight)
m4.3 <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <-a+b*(weight-xbar),
    a ~ dnorm(178,20),
    b ~ dlnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d2)

m4.3b <-quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <-a+b*(weight),
    a ~ dnorm(178,20),
    b ~ dlnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d2, start = list(a=178, b=1, sigma=25))

Here’s the correlation matrix for m4.3:

a b sigma
a 1.0000000 0.0055064 0.0161530
b 0.0055064 1.0000000 0.0052493
sigma 0.0161530 0.0052493 1.0000000

Here’s the correlation matrix for m4.3b:

a b sigma
a 1.0000000 -0.9896766 0.0280503
b -0.9896766 1.0000000 -0.0286594
sigma 0.0280503 -0.0286594 1.0000000

The second model has stronger associations between all variables.

If we make predictions from the second model:

library(rethinking)
weights <- seq(from=31, to=63, by=1)
# simulate heights:
set.seed(1234)
height_m4.3.b <- sim(m4.3b, data=list(weight=weights))
height_PI <- apply(height_m4.3.b, 2, PI, prob=0.89)
mu_m4.3.b <- link(m4.3b, data=data.frame(weight=weights))
mu_mean <- apply(mu_m4.3.b, 2, mean)
mu_hdpi <- apply(mu_m4.3.b, 2, PI, prob=0.89)

plot(height ~ weight, data=d2, col=col.alpha(rangi2, 0.5))
lines(weights, mu_mean)
shade(mu_hdpi, weights)
shade(height_PI, weights)

we see that we’re getting the same posterior predictions.

4M8. Solution:

library(splines)
data("cherry_blossoms")
d <- cherry_blossoms
d2 <- d[complete.cases(d$doy),]
knot_analysis <- function(num_knots=15, w_sd=10) {
  
  knot_list <-
    quantile(d2$year, probs=seq(0,1,length.out=num_knots))
  
  B <- bs(d2$year,
          knots=knot_list[-c(1,num_knots)],
          degree=3,
          intercept=TRUE)
  
  m4.7 <- quap(
    alist(
      D ~ dnorm(mu,sigma),
      mu <- a+B%*%w,
      a ~ dnorm(100,10),
      w ~ dnorm(0,w_sd),
      sigma ~ dexp(1)
    ),
    data=list(D=d2$doy,B=B, w_sd=w_sd),
    start=list(w=rep(0,ncol(B)))
  )
  
  post <- extract.samples(m4.7)
  #w <- apply(post$w, 2, mean)
  mu <- link(m4.7)
  mu_mean <- apply(mu, 2, mean)
  mu_PI <- apply(mu, 2, PI)
  with(d2, {
    plot(doy ~ year, col=col.alpha(rangi2, 0.3), pch=16)
    lines(year, mu_mean)
    shade(mu_PI, year, col=col.alpha("black", 0.5))
    mtext(paste("knots:", num_knots, "& w_sd:", w_sd))
  })
}

Below is the posterior distribution for \(\mu\) as a function of knots:

It appears as if the prior on the weights affects how “wiggly” the splines can be, and the number knots affects how many places the spline can switch directions…

4H1. Solution: Let’s use the model fit earlier in the chapter, m4.3:

weight expected_height lb ub
46.95 156.2759 148.0415 163.3756
43.72 153.5507 144.9212 160.6191
64.78 172.7135 165.1116 181.2886
32.59 143.5252 134.8237 151.0139
54.63 163.5319 156.1532 171.6025

4H2. Solution: Let’s fit the under 18 data:

d <- Howell1; d2 <- d[d$age < 18,]
xbar <- mean(d2$weight)
# average child age in dataset is 7yrs old, and
# average height for a 7yr old is 120cm
m4 <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <-a+b*(weight-xbar),
    a ~ dnorm(120,20),
    b ~ dlnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d2)

Here are the estimates:

            mean         sd       5.5%      94.5%
a     179.967408 1.90785800 176.918283 183.016534
b       2.698049 0.06809083   2.589226   2.806871
sigma   8.439549 0.43092727   7.750844   9.128254

This indicates that for every 10 units, we can expect an change in around 27cm. To see the model’s predicted heights versus the data check out the following graph:

As we can see, the data have a non-linear response to weight that isn’t being captured by our linear model. In particular, the association between weight and height “levels out” as we get heavier (and most likely older). By contrast, the slope is much steeper when the weight is lower, as we’re probably observing children experiencing growth spurts. If I had to change any assumptions, I’d make our model not assume that the standard deviation is independent of weight/age. Clearly, there’s a larger spread among possible heights when weights are under 20 units (as compared to over 25).

4H3. Solution: Let’s say that we want to model height against log(weight):

d$log_weight <- log(d$weight)
m5 <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <-a+b*log_weight,
    a ~ dnorm(120,20),
    b ~ dlnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d)

Our estimates look like:

mean sd X5.5. X94.5.
a -23.132421 1.3334245 -25.263491 -21.00135
b 46.890724 0.3820800 46.280087 47.50136
sigma 5.135978 0.1557838 4.887005 5.38495

Which means for a 1 log-kg we can expect an increase in around 47cm.

Now lets look at predictions:

The model does a pretty good job fitting the data! Now let’s see what happens when we change the scale back.

plot(d$height ~ d$weight)
shade(heights_hdpi, exp(weights), col=col.alpha("tomato", 0.2))
shade(mu_hdpi, exp(weights), col=col.alpha("blue", 0.75))
lines(exp(weights), mu_mean)