Chapter 4 - Geocentric Models

This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Problems are labeled Easy (E), Medium (M), and Hard(H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

# the first line is the likelihood
# the second line is the prior for μ, and the third line is the prior for σ

4E2. In the model definition just above, how many parameters are in the posterior distribution?

# 2 parameters to be estimated: μ and σ
# yi is the observed data

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.

\[\begin{align*} P(\mu, \sigma| y_i) &\propto \text{Likelihood } \times \text{ Prior probability} \\ & \propto \mathcal{L}(y | \mu, \sigma ) \times P(\mu) \times P(\sigma) \end{align*}\] For the likelihood, we get the following term: \[\begin{align*} \mathcal{L}(y | \mu, \sigma ) &= \prod_i \text{Normal}(y_i|\mu, \sigma) \\ & = \prod_i \frac{1}{\sqrt{2\pi \sigma^2} }\exp(- \frac{(y_i-\mu)^2}{2\sigma^2}). \end{align*}\] and for the two priors we have \[\begin{align*} P(\mu) &= \text{Normal}(\mu| 0,10) \\ &= \frac{1}{\sqrt{2\pi \times 10^2} }\exp(- \frac{\mu^2}{2\times 10^2}) \\ \\ P(\sigma) &= \text{Exponential}(\sigma|1) \\ &= e^{-\sigma}. \end{align*}\] Plugging everything in one equation, we get the following: \[\begin{align*} P(\mu, \sigma| y_i) &= \frac{\prod_i \text{Normal}(y_i|\mu, \sigma) \times \text{Normal}(\mu| 0,10) \times \text{Exponential}(\sigma|1) } {\int \prod_i \text{Normal}(y_i|\mu, \sigma) \times \text{Normal}(\mu| 0,10) \times \text{Exponential}(\sigma|1) \text{ d}\mu\text{d}\sigma} \\ \\ &\propto \prod_i \frac{1}{\sqrt{2\pi \sigma^2} }\exp(- \frac{(y_i-\mu)^2}{2\sigma^2}) \times \frac{1}{\sqrt{2\pi \times 10^2} }\exp(- \frac{\mu^2}{2\times 10^2}) \times e^{-\sigma}. \end{align*}\]

4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]

# the second line is linear model: μ_i = α + βx_i
# The first line is the likelihood, the second line is the linear model, the third line is the prior for α. 
# The fourth line is the prior for β, and the fifth line is the prior for σ.

4E5. In the model definition just above, how many parameters are in the posterior distribution?

# three parameters: α , β, and σ

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- runif(1e4, 0, 10)
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_y)

4M2. Translate the model just above into a quap formula.

formula <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dunif(0, 10)
)
4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

flist <- alist(
  y ~ dnorm(mu, sigma),
  mu <- a + b*x,
  a ~ dnorm(0, 50),
  b ~ dunif(0, 10),
  sigma ~ dunif(0, 50)
)

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.

\[\begin{align*} h_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta t_i \\ \alpha &\sim \text{Normal}(150, 20) \\ \beta &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Uniform}(0, 50) \end{align*}\]

Alpah: average height of a student (normal distriution with Mean = 150 and SD = 20) beta: average growth in height every year (normal distriution with Mean = 0 and SD = 10) sigma: variance heights among students (uniform distribution with variance between 0 to 50)

N <- 1000
a <- rnorm( N, 150, 20 )
b <- rnorm( N, 0, 10 )
sigma <- runif( N, 0, 50 )


# Predicted distribution of height at the first year:
height_0 <- rnorm( N, a + b * 0, sigma )
dens( height_0 )

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How?

\[\begin{align*} \alpha &\sim \text{Normal}(150, 20) \\ \beta &\sim \text{Log-Normal}(0, 2) \end{align*}\]

# resulting modle:
set.seed(123)
N <- 1000
a <- rnorm( N, 150, 20 )
b <- rlnorm( N, 0, 2 )
sigma <- runif( N, 0, 50 )

Since every student got taller each yearand the beta is changed to a log-normal distriution, and te beta is greater than zero. In this case, the SDs are decreased as well to fit the story.

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?

\[\sigma \sim \text{Uniform}(0, 64).\]

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.

# set up
data("Howell1")
d <- Howell1
d2 <- d[ d$age >= 18, ]

# define the average weight, x-bar
xbar <- mean( d2$weight )

# fit model
m4.3 <- quap(
        alist(
          height ~ dnorm( mu, sigma),
          mu <- a + b * ( weight - xbar ) ,
          a ~ dnorm( 178, 20),
          b ~ dlnorm( 0, 1),
          sigma ~ dunif(0, 50)
        ) ,
        data = d2
)
precis(m4.3)
##              mean         sd        5.5%       94.5%
## a     154.6013716 0.27030755 154.1693679 155.0333752
## b       0.9032841 0.04192361   0.8362821   0.9702862
## sigma   5.0718789 0.19115459   4.7663769   5.3773809
# refit uncentered
m4.3_u <- quap(
        alist(
          height ~ dnorm( mu, sigma),
          mu <- a + b * weight ,
          a ~ dnorm( 178, 20),
          b ~ dlnorm( 0, 1),
          sigma ~ dunif(0, 50)
        ) ,
        data = d2
)
precis(m4.3_u)
##              mean         sd        5.5%       94.5%
## a     114.5342929 1.89775143 111.5013196 117.5672662
## b       0.8907293 0.04175808   0.8239918   0.9574668
## sigma   5.0727308 0.19125007   4.7670763   5.3783854
# compute $\mu$ in the uncentered version for when $x$ is the average weight:
113.9 + 0.9 * xbar
## [1] 154.3914
# ( vcm <- vcov( m4.3_u ) )
# 
# vcm <- vcov( m4.3_u ) 
# vcm %>% prettify()
# 
# cov2cor( vcm )
# cov2cor( vcm ) %>%  prettify()
# pairs( m4.3_u )

plot_posterior <- function(model) {
  weight.seq <- seq( from=25, to=70, by=1)
  
  mu <- link(model, data = data.frame( weight=weight.seq ) )
  mu.mean <- apply(mu, 2, mean)
  mu.PI <- apply(mu, 2, PI, prob=0.89)
  sim.height <- sim( model, data=list(weight=weight.seq))
  height.PI <- apply(sim.height, 2, PI, prob=0.89)
  plot( height ~ weight, d2, col=col.alpha(rangi2, 0.5))
  
  # draw a MAP line
  lines(weight.seq, mu.mean)
  
  # draw PI region for line
  shade(mu.PI, weight.seq)
  
  # draw PI region for simulated heights
  shade(height.PI, weight.seq )
}
par(mfrow=c(1,2))
plot_posterior(m4.3_u)
mtext("Uncentered model")
plot_posterior(m4.3)
mtext("Centered model")

par(mfrow=c(1,1))

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

# load the data
data("cherry_blossoms")
d <- cherry_blossoms
d2 <- d[ complete.cases(d$doy) , ]

library(splines)
num_knots <- 30
knot_list <- quantile( d2$year, probs = seq(0, 1, length.out = num_knots ) )

B <- bs(d2$year,
        knots=knot_list[-c(1, num_knots)],
        degree=3, intercept=TRUE)

plot(NULL, xlim=range(d2$year),ylim=c(0,1),  ylab="basis")
for (i in 1:ncol(B)) lines(d2$year,B[,i])

num_knots2 <- 15
knot_list2 <- quantile( d2$year, probs = seq(0, 1, length.out = num_knots2 ) )

B2 <- bs(d2$year,
        knots=knot_list2[-c(1, num_knots2)],
        degree=3, intercept=TRUE)

plot(NULL, xlim=range(d2$year),ylim=c(0,1),  ylab="basis")
for (i in 1:ncol(B2)) lines(d2$year,B2[,i])

The curve with 30 knots is less smooth than the curve with 15 knots. More knots means there are more basis functions and so the curve can fit smaller local details. Fitting only a handful of knots will force the model to fit the more global trends. It’s a trade-off between model accuracy and interpretability. Both factors are critical in terms of choosing the models.

4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

  1. Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

  2. Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.

  3. What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

d2 <- Howell1[Howell1$age < 18, ]
nrow(d2)
## [1] 192
# a:
formula <- alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * weight,
  a ~ dnorm(110, 30),
  b ~ dnorm(0, 10),
  sigma ~ dunif(0, 60)
)
m <- map(formula, data = d2)
precis(m, corr = TRUE)
##            mean         sd      5.5%     94.5%
## a     58.345025 1.39572980 56.114379 60.575671
## b      2.715035 0.06823357  2.605984  2.824085
## sigma  8.437188 0.43056875  7.749056  9.125320
# b:
plot(height ~ weight, data = d2, col = col.alpha(rangi2, 0.3))

weight.seq <- seq(from = min(d2$weight), to = max(d2$weight), by = 1)
mu <- link(m, data = data.frame(weight = weight.seq))

mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)

sim.height <- sim(m, data = list(weight = weight.seq))

height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)

For part (c) will need to assess the fit of the model. It overestimates the height at both weight < 10 and high > 30. It underestimates height at a range of 10-30.

It may be related to the relationship μ and weight is linear, therefore violates the one of the assumptions when fitting a linear model.