4E1. Solution: The data are labeled by \(y_i\), so \[ y_i \sim \text{Normal}(\mu,\sigma) \] is the likelihood. An interesting note from the solution manual:

> … a likelihood is effectively a prior for the residuals.
4E2. Solution: The two parameters in the posterior are \(\mu\) and \(\sigma\).
4E3. Solution: \[\begin{align} P(\mu=m,\sigma=s \mid Y=y) &= \frac{P(\mu=m)\,P(\sigma=s)\,\mathcal{L}(Y=y \mid m,s)}{\iint P(\mu)\,P(\sigma)\,\mathcal{L}(y \mid \mu,\sigma)\, d\mu \, d\sigma} \\ &=\frac{\mathcal{N}(m;0,10)\, \text{Exp}(s;1) \, \mathcal{N}(y;m,s)}{\iint_{\mathbb{R}\times\mathbb{R}^{+}}\mathcal{N}(\mu;0,10)\, \text{Exp}(\sigma;1) \, \mathcal{N}(y;\mu,\sigma) \, d\mu \, d\sigma} \end{align}\]
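As a sanity check, this posterior can also be evaluated numerically with a grid approximation, in the style of the chapter; the single observation \(y = 1\) below is a hypothetical choice for illustration:

```r
# grid approximation of the 4E3 posterior for one hypothetical observation y = 1
y <- 1
grid <- expand.grid(mu    = seq(-30, 30, length.out = 200),
                    sigma = seq(0.01, 10, length.out = 200))

# unnormalized log posterior = log prior(mu) + log prior(sigma) + log likelihood
log_post <- dnorm(grid$mu, 0, 10, log = TRUE) +
  dexp(grid$sigma, 1, log = TRUE) +
  dnorm(y, grid$mu, grid$sigma, log = TRUE)

post <- exp(log_post - max(log_post))
post <- post / sum(post)   # normalize over the grid cells
```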
4E4. Solution: The line \(\mu_i = \alpha + \beta x_i\) is the linear model.
4E5. Solution: There are 3 parameters in the posterior: \(\alpha, \beta, \sigma\). \(\mu\) is not a parameter, since it is entirely determined by \(\alpha, \beta, x\).
4M1. Solution: Simulating from the prior and displaying the density of \(y_i\):
```r
library(rethinking)   # for dens()

sim_size <- 1e4
set.seed(1234)

# sample from the priors: mu ~ Normal(0, 10), sigma ~ Exponential(1)
mu <- rnorm(sim_size, sd = 10)
sigma <- rexp(sim_size)

# prior predictive distribution of y
ys <- rnorm(sim_size, mu, sigma)
dens(ys)
```

4M2. Solution: Translating into a quap formula:
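A direct transcription of the 4M1 model into an `alist` (the name `f4M2` is a placeholder; fitting with `quap` would additionally require data):

```r
f4M2 <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dexp(1)
)
```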
4M3. Solution: As a mathematical model: \[\begin{align} y_i &\sim \mathcal{N}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta\,x_i \\ \alpha &\sim \mathcal{N}(0,10) \\ \beta &\sim \mathcal{U}(0,1) \\ \sigma &\sim \text{Exp}(1) \end{align}\]
4M4. Solution: Let’s assume we’re working with high schoolers, so we can expect height to level off by the end of the study: \[\begin{align} h_{ij} &\sim \mathcal{N}(\mu_{ij}, \sigma) \\ \mu_{ij} &= \alpha + \beta_1 y_j - \beta_2 y_j^2 \;\; (y_j \in \{1,2,3\}) \\ \alpha &\sim \mathcal{N}(100,10) \\ \beta_1 &\sim \mathcal{N}(0,10) \\ \beta_2 &\sim \text{lognormal}(0,1) \\ \sigma &\sim \text{Exp}(1) \end{align}\] Here \(h_{ij}\) is the height of student \(i\) in year \(j\) (\(y_j \in \{1,2,3\}\)). Although \(\mu_{ij}\) is indexed by student and year, once \(\alpha, \beta_{1}, \beta_{2}, \sigma\) are chosen, \(\mu_{ij}\) is identical for every student in a given year, so the model captures no individual variation. That’s not amazing. Better would be extra predictors such as gender, or an auto-regressive term where \(\mu_{ij}\) depends on \(h_{i(j-1)}\).
To get an idea of what our priors actually mean, let’s plot them:
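Here is a sketch of that prior predictive check, sampling 50 growth curves implied by the priors (the number of draws and the axis limits are arbitrary choices):

```r
set.seed(1234)
n <- 50
alpha <- rnorm(n, 100, 10)
beta1 <- rnorm(n, 0, 10)
beta2 <- rlnorm(n, 0, 1)

# each draw implies a mean-height trajectory over the three years
plot(NULL, xlim = c(1, 3), ylim = c(0, 250),
     xlab = "year", ylab = "mean height (cm)")
for (i in 1:n)
  curve(alpha[i] + beta1[i] * x - beta2[i] * x^2,
        from = 1, to = 3, add = TRUE, col = col.alpha("black", 0.3))
```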
Clearly, our priors are crap: they allow \(\mu\) to have a negative slope, i.e. students who shrink from year to year.
4M5. Solution: To fix our issue with the shrinking students, we can modify the functional form of \(\mu\): \[\begin{align} \mu &= \alpha + \beta y \, (\gamma - y) \\ \alpha &\sim \mathcal{N}(100,10) \\ \beta &\sim 3+\text{lognormal}(0,1) \\ \gamma &\sim 6+\text{lognormal}(1,1) \end{align}\] This forces \(\mu\) to be a concave-down parabola (\(\beta > 3 > 0\)) whose apex, at \(y = \gamma/2 > 3\), never occurs before the study ends. The following prior densities arise:
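A sketch of how those prior curves can be simulated (shifting the lognormal draws by 3 and 6 as in the model; the plotting choices are arbitrary):

```r
set.seed(1234)
n <- 50
alpha <- rnorm(n, 100, 10)
beta  <- 3 + rlnorm(n, 0, 1)
gamma <- 6 + rlnorm(n, 1, 1)

# every sampled curve is concave down with its apex at y = gamma/2 > 3
plot(NULL, xlim = c(1, 3), ylim = c(0, 400),
     xlab = "year", ylab = "mean height (cm)")
for (i in 1:n)
  curve(alpha[i] + beta[i] * x * (gamma[i] - x),
        from = 1, to = 3, add = TRUE, col = col.alpha("black", 0.3))
```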
This isn’t perfect, but it might do.
4M6. Solution: Since the variance is \(\sigma^2\), we just need to ensure that the modeled standard deviation \(\sigma\) never exceeds \(\sqrt{64}=8\). We can encode that with \[ \sigma \sim \mathcal{U}(0, 8) \]
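In `quap` notation, this prior would simply slot into the earlier 4M1-style model (a sketch; the likelihood is assumed unchanged):

```r
f4M6 <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dunif(0, 8)   # sd capped at 8, so variance never exceeds 64
)
```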
4M7. Solution:
data("Howell1"); d <- Howell1; d2 <- d[d$age >= 18,]
xbar <- mean(d2$weight)
m4.3 <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <-a+b*(weight-xbar),
a ~ dnorm(178,20),
b ~ dlnorm(0,1),
sigma ~ dunif(0,50)
), data=d2)
```r
# modified model: weight NOT centered
m4.3b <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * weight,
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ), data = d2, start = list(a = 178, b = 1, sigma = 25))
```
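The parameter correlations can be read off the covariance matrix of the quadratic approximation, which is presumably how the tables below were produced:

```r
round(cov2cor(vcov(m4.3)), 7)
round(cov2cor(vcov(m4.3b)), 7)
```

Here’s the correlation matrix for m4.3: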
|       |         a |         b |     sigma |
|-------|-----------|-----------|-----------|
| a     | 1.0000000 | 0.0055064 | 0.0161530 |
| b     | 0.0055064 | 1.0000000 | 0.0052493 |
| sigma | 0.0161530 | 0.0052493 | 1.0000000 |
Here’s the correlation matrix for m4.3b:
|       |          a |          b |      sigma |
|-------|------------|------------|------------|
| a     |  1.0000000 | -0.9896766 |  0.0280503 |
| b     | -0.9896766 |  1.0000000 | -0.0286594 |
| sigma |  0.0280503 | -0.0286594 |  1.0000000 |
The second model shows much stronger associations, most dramatically between `a` and `b` (correlation ≈ −0.99). Without centering, `a` is the predicted height at weight 0, far outside the observed data, so the posterior can trade intercept against slope almost one-for-one.
If we make predictions from the second model:
```r
weights <- seq(from = 31, to = 63, by = 1)

# posterior predictive simulation of heights:
set.seed(1234)
height_m4.3b <- sim(m4.3b, data = list(weight = weights))
height_PI <- apply(height_m4.3b, 2, PI, prob = 0.89)

# posterior distribution of mu:
mu_m4.3b <- link(m4.3b, data = data.frame(weight = weights))
mu_mean <- apply(mu_m4.3b, 2, mean)
mu_PI <- apply(mu_m4.3b, 2, PI, prob = 0.89)

plot(height ~ weight, data = d2, col = col.alpha(rangi2, 0.5))
lines(weights, mu_mean)
shade(mu_PI, weights)
shade(height_PI, weights)
```

we see that we’re getting the same posterior predictions as from `m4.3`.
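As a quick visual check of that claim, the posterior mean from `m4.3` can be overlaid on the same plot (a sketch; plain weights are passed because the model centers them internally, resolving `xbar` from the environment):

```r
mu_m4.3 <- link(m4.3, data = data.frame(weight = weights))
lines(weights, apply(mu_m4.3, 2, mean), lty = 2)   # dashed: m4.3's posterior mean
```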
4M8. Solution:
```r
library(splines)

data("cherry_blossoms")
d <- cherry_blossoms
d2 <- d[complete.cases(d$doy), ]

# fit a cubic B-spline of day-of-first-blossom against year,
# then plot the posterior mean of mu with its percentile interval
knot_analysis <- function(num_knots = 15, w_sd = 10) {
  knot_list <- quantile(d2$year, probs = seq(0, 1, length.out = num_knots))
  B <- bs(d2$year,
          knots = knot_list[-c(1, num_knots)],
          degree = 3,
          intercept = TRUE)
  m4.7 <- quap(
    alist(
      D ~ dnorm(mu, sigma),
      mu <- a + B %*% w,
      a ~ dnorm(100, 10),
      w ~ dnorm(0, w_sd),
      sigma ~ dexp(1)
    ),
    data = list(D = d2$doy, B = B, w_sd = w_sd),
    start = list(w = rep(0, ncol(B)))
  )
  mu <- link(m4.7)
  mu_mean <- apply(mu, 2, mean)
  mu_PI <- apply(mu, 2, PI)
  with(d2, {
    plot(doy ~ year, col = col.alpha(rangi2, 0.3), pch = 16)
    lines(year, mu_mean)
    shade(mu_PI, year, col = col.alpha("black", 0.5))
    mtext(paste("knots:", num_knots, "& w_sd:", w_sd))
  })
}
```
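The panels were generated with calls along these lines (the particular knot counts and prior widths here are arbitrary choices):

```r
knot_analysis(num_knots = 5)
knot_analysis(num_knots = 30)
knot_analysis(num_knots = 15, w_sd = 1)
knot_analysis(num_knots = 15, w_sd = 100)
```

Below is the posterior distribution for \(\mu\) as a function of the number of knots and the prior width on the weights: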
It appears that the prior on the weights affects how “wiggly” the spline can be, while the number of knots affects how many places the spline can change direction.