Chapter 4 - Geocentric Models

This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

# The first one y_i ~ Normal(μ, σ) is the likelyhood.

4E2. In the model definition just above, how many parameters are in the posterior distribution?

# Two: μ and σ.

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.

# P(μ, σ|y) = [∏iNormal(yi|μ, σ)Normal(μ|0,10)Uniform(σ|0,10)i]/[∫∫∏iNormalhi|μ, σ)Normal(μ|0,10)Uniform(σ|0,10)dμdσ]

4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]

# The liner model is 2nd line: μ_i = α + βx_i

4E5. In the model definition just above, how many parameters are in the posterior distribution?

# Three: α, β, σ.

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

μ <- rnorm(1e4, 0, 10)
σ <- dexp(1)
prior_y <- rnorm(1e4, μ, σ)
dens(prior_y)

4M2. Translate the model just above into a quap formula.

# we can use alist() to generate formula
formula <- alist(
  y ~ dnorm(μ, σ),
  μ ~ dnorm(0, 10),
  σ ~ dexp(1)
)
4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

# y_i ∼ Normal(μ, σ) 
# μ_i = α + βx_i 
# α ∼ Normal(0, 10) 
# β ∼ Uniform(0, 1) 
# σ ∼ Exponential(1)

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?

# Since the range of the age is not provided, we can assume the range to be school age (5 years old) to university age (20 years old). For α, I choose 150cm as the center point of a normal distribution of heights, The standard diviation (SD) is 20cm. For β, I choose 4cm per year as the center of a normal distribution, and 2cm as the SD. For σ, a uniform of 0 to 50cm to cover both tight and wide distribution.

# Therefore, we have the following mathematical definition:
# y_i ~ Normal(μ, σ)
# μ_i = α + βx_i
# α ∼ Normal(150, 20) 
# β ∼ Normal(4, 2) 
# σ ~ Uniform(0, 50)

# Now let's simulate
N <- 50
α <- rnorm( N, 150, 20 )
β <- rnorm( N, 4, 2 )
σ <- runif( N, 0, 50 )

plot( NULL, xlim=range(0:3), ylim=c(0, 300),
      xlab="year", ylab="height")
for (i in 1:50) curve( α[i] + β[i] * x,
                      from=0, to=3, add  = TRUE,
                      col=col.alpha("black", 0.2))

# If we want to make these priors more likely, we would need more information, such as the range of the age, or the average height of the student in certain year. Those information could help us narrow down the prior.

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot.

# Now that we know the student got taller each year, we can narrow our prior range, since this usually happen in younger age. For α, I change it to 120cm, and 10cm as SD. For β, I change it to 6cm per year and 1cm as SD. For σ, I change to 20cm since we don't need to include older students.

# Therefore, we have the following mathematical definition:
# y_i ~ Normal(μ, σ)
# μ_i = α + βx_i
# α ∼ Normal(120, 10) 
# β ∼ Normal(6, 1) 
# σ ~ Uniform(0, 20)

# Now let's simulate
N <- 50
α <- rnorm( N, 120, 10 )
β <- rnorm( N, 6, 1 )
σ <- runif( N, 0, 20 )

plot( NULL, xlim=range(0:3), ylim=c(0, 300),
      xlab="year", ylab="height")
for (i in 1:50) curve( α[i] + β[i] * x,
                      from=0, to=3, add  = TRUE,
                      col=col.alpha("black", 0.2))

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?

# The variance is the square of σ, so the new σ should be 8.

# Therefore, we have the following mathematical definition:
# y_i ~ Normal(μ, σ)
# μ_i = α + βx_i
# α ∼ Normal(120, 10) 
# β ∼ Normal(6, 1) 
# σ ~ Uniform(0, 8)

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. Show the pairs() plot. What is different? Then compare the posterior predictions of both models.

# Let's first fit model m4.3

data(Howell1)
d<- Howell1
d2<- d[d$age>=18,]

xbar <- mean(d2$weight)

m4.3_orig <- quap( alist(height ~ dnorm( μ , σ ) , μ <- α + β*( weight - xbar ) , 
    α ~ dnorm( 178 , 20 ) , β ~ dlnorm( 0 , 1 ) , σ ~ dunif( 0 , 50 )) , data=d2)
m4.3_orig
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(µ, s)
## µ <- a + ß * (weight - xbar)
## a ~ dnorm(178, 20)
## ß ~ dlnorm(0, 1)
## s ~ dunif(0, 50)
## 
## Posterior means:
##           a           ß           s 
## 154.6013672   0.9032807   5.0718821 
## 
## Log-likelihood: -1071.01
# Now refit with omitting xbar
m4.3_refit <- quap( alist(height ~ dnorm( μ , σ ) , μ <- α + β* weight , 
    α ~ dnorm( 178 , 20 ) , β ~ dlnorm( 0 , 1 ) , σ ~ dunif( 0 , 50 )) , data=d2)
m4.3_refit
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(µ, s)
## µ <- a + ß * weight
## a ~ dnorm(178, 20)
## ß ~ dlnorm(0, 1)
## s ~ dunif(0, 50)
## 
## Posterior means:
##           a           ß           s 
## 114.5343275   0.8907304   5.0727414 
## 
## Log-likelihood: -1071.07
# Comparing with orignial m4.3, we can see the β and σ are about the same, but α has changed a lot, with broader width.

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

# Let's first plot the original spline.
library(splines)

data(cherry_blossoms) 
d <- cherry_blossoms 
d2 <- d[ complete.cases(d$doy) , ]
num_knots <- 15
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots))

B15 <- bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE )
plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis" )
for ( i in 1:ncol(B15) ) lines( d2$year , B15[,i])

# Now that we increase the knots to 25
num_knots <- 25
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots))

B30 <- bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE )
plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis" )
for ( i in 1:ncol(B30) ) lines( d2$year , B30[,i])

# Comparing with original 15 knots, we can see the curve is sharper with increasing knots. 

4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

  1. Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

  2. Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.

  3. What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

d <- Howell1
d2 <- d[ d$age < 18, ]
# (a)
xbar <- mean( d2$weight )

m4H2a <- quap(
  alist(
    height ~ dnorm( µ, σ) ,
    µ <- α + β * ( weight - xbar )  ,
    α ~ dnorm( 156, 20) ,
    β ~ dlnorm( 0, 1 ) ,
    σ ~ dunif(0, 50)
  ),
  data=d2
)
m4H2a
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(µ, s)
## µ <- a + ß * (weight - xbar)
## a ~ dnorm(156, 20)
## ß ~ dlnorm(0, 1)
## s ~ dunif(0, 50)
## 
## Posterior means:
##          a          ß          s 
## 108.362064   2.716689   8.436840 
## 
## Log-likelihood: -681.91
precis(m4H2a)
##         mean         sd       5.5%      94.5%
## a 108.362064 0.60861048 107.389387 109.334741
## ß   2.716689 0.06831275   2.607512   2.825866
## s   8.436840 0.43052482   7.748778   9.124902
# Looking at the precis, we can see α means the average height, which is 108.36. β is the slope, which indicates every 1 unit change can result a height increase of 2.72. σ as the SD indicates the unvertainty of the predictions.
# Therefore, every 10 units of increase in weight could result 27.2cm increase in height.

# (b)
# We can plot the regression line first, then compute the µ by sampling from the posterior distribution, and get the mean and 89% Prediction interval of µ.
# Next, we compute the 89% Prediction interval of hieght.

weight.seq <- seq(from=min(d2$weight), to = max(d2$weight), length.out = 30)          
post <- extract.samples(m4H2a)          

µ <- link( m4H2a, data = list(weight = weight.seq))
µ.mean <- apply(µ, 2, mean)
µ.PI <- apply(µ, 2, PI, prob=0.89)

sim.height <- sim( m4H2a, data = list(weight = weight.seq ))
height.PI <- apply(sim.height, 2, PI, prob=0.89)

plot(height ~ weight, data=d2, col=col.alpha(rangi2, 0.9), ylim=c(50, 180))
lines(weight.seq, µ.mean)
shade( µ.PI, weight.seq)
shade( height.PI, weight.seq)

# (c)
# As we can see from the plot, the weight range at < 10 and > 30 are off the regression line a lot, which means it doesn't fit the data so well. I believe a polynomial model can be used to improve the model.