Chapter 4 - Geocentric Models

This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

# y_i ∼ Normal(μ, σ) is the likelihood

4E2. In the model definition just above, how many parameters are in the posterior distribution?

#There are two paramters: μ, σ

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.

#Answer: Please see the Bayes' theorem below:

\[Pr(\mu, \sigma|y) = \frac{\prod_{i} Normal(y_i|\mu, \sigma) \ Normal(\mu|0,10) \ Exponential (\sigma|1)}{\iint \prod_{i} \ Normal(y_i|\mu, \sigma) \ Normal(\mu|0,10) \ Exponential (\sigma|1)d \mu d\sigma } \]

4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]

# The second line  μ_i = α + βx_i is the linear model

4E5. In the model definition just above, how many parameters are in the posterior distribution?

# There are three parameters in the posterior distribution: α, β, and σ.

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

sample_mu = rnorm(1e4, 0, 10)
sample_sigma = rexp(1e4, 1)
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_y)

4M2. Translate the model just above into a quap formula.

flist = alist(
  y ~ dnorm (mu, sigma),
  mu ~ dnorm(0,10),
  sigma ~ dexp(1)
  
)
4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

# Answer: Please see the mathematical model below:

\[\begin{align} \ y_i ∼ Normal(μ_i, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Uniform(0, 1) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?

# 1) Write down the mathematical model definition for this regression:

\[\begin{align} \ h_i ∼ Normal(μ_i, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(165, 15) \\ \ β ∼ Normal(0, 10) \\ \ σ ∼ Uniform(0,50) \\ \end{align}\]

#   sampling from the priors, simulate observed y values from the prior
set.seed(2971)
sample_alpha = rnorm(1e4, 165, 15)
sample_beta = rnorm(1e4, 0, 10)
sample_sigma = runif(1e4, 0, 50)
sample_x = runif(1e4, 0,3)
prior_y <- rnorm(1e4, sample_alpha + sample_beta*sample_x,sample_sigma )
dens(prior_y)

# Prior Predictive Simulations with 50 students
N <- 50
a <- rnorm(N, 165, 15)
b <- rnorm(N, 0, 10)
plot(NULL, xlim=c(0,3), ylim=c(-100,400),
     xlab="Year", ylab="Height")
abline(h=0, lty=2)
abline(h=272,lty=1,lwd=0.5)
mtext("b ~ dnorm(0,10)")
for(i in 1:N) curve(a[i]+b[i]*x,
                    from=0, to =3, add=TRUE,
                    col=col.alpha("black",0.2))

# Based on the simulation of observed y values from the prior, we notice that the majority of simulated height will be between 100cm to 240cm. The range makes sense.
# On the Prior Predictive Simulations plot, I've also addd a dashed line at zero - no one is shorter than 0; and a line at 272 cm for the world's tallest person. All the lines we simulated for student heights over 3 years are all within the range. From the plot, we believe the prior is sensible.

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot.

# Yes. Since every student got taller each year, it implies that the beta should be restricted to positive values. Therefore I will change my prior Beta as log-normal instead:
# β∼Log-Normal(0,2.5)

# Prior Predictive Simulations
set.seed(2971)
N <- 50
a <- rnorm(N, 165, 15)
b <- rlnorm(N, 0, 2.5)
plot(NULL, xlim=c(0,3), ylim=c(-100,400),
     xlab="Year", ylab="Height")
abline(h=0, lty=2)
abline(h=272,lty=1,lwd=0.5)
mtext("b ~ dlnorm(0,2.5)")
for(i in 1:N) curve(a[i]+b[i]*x,
                    from=0, to =3, add=TRUE,
                    col=col.alpha("black",0.2))

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?

# If variance among heights for students of the same age is never more than 64cm, then I will change my sigma prior to:
# σ ~ Uniform(0, 8)

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. Show the pairs() plot. What is different? Then compare the posterior predictions of both models.

# Refit the model and omit the mean weight xbar this time
set.seed(2971)
data(Howell1); d <- Howell1; d2 <- d[ d$age >= 18 , ]
m4.3_new <- quap(
  alist(
    height ~ dnorm( mu , sigma ) ,
    mu <- a + b*(weight) ,
    a ~ dnorm( 178 , 20 ) ,
    b ~ dlnorm( 0 , 1 ) ,
    sigma ~ dunif( 0 , 50 )
  ) , data=d2 )

# Compare new model's posterior to that of the original model
precis(m4.3_new)
##              mean         sd        5.5%       94.5%
## a     114.5343664 1.89775066 111.5013943 117.5673385
## b       0.8907286 0.04175806   0.8239912   0.9574661
## sigma   5.0727268 0.19124969   4.7670729   5.3783807
# The precis function gives the quadratic approximation for α, β and σ.
# Compared to the posterior of original model, there's no much difference in the marginal distribution for β and σ. However, the marginal distribution of α changes a lot. α in refit ~ N(114.53, 1.9) vs. α in original ~ N(154.60, 0.27)

# Covariance 
round(vcov(m4.3_new),3)
##            a      b sigma
## a      3.601 -0.078 0.009
## b     -0.078  0.002 0.000
## sigma  0.009  0.000 0.037
# The variance of β and σ, and the covariance between β and σ are unchanged. However, the variance of refit α increases a lot. And the covariance between α and β/σ also changes.

# pairs plot
pairs(m4.3_new)

# From the comparison between pairs() for refit and original m4.3, we notice that α and β demonstrates a very strong correlation in the refit model without the xbar. The correlation is -0.99 in refit model vs. 0.02 in the original model. The strong correlation is also supported by the scatterplot between α and β shown on the upper right above the diagonal.

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

data(cherry_blossoms)
d <- cherry_blossoms
d2 <- d[complete.cases(d$temp), ] 

# Step 1: Original 15 knots
num_knots = 15 
knot_list = quantile(d2$year, probs=seq(0,1,length.out=num_knots))

library(splines) 
B = bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE)

m4.7 = quap( 
  alist(
    T ~ dnorm(mu , sigma) , 
    mu <- a + B %*% w , 
    a ~ dnorm(6,10), 
    w ~ dnorm(0,1), 
    sigma ~ dexp(1)), 
  data=list(T=d2$temp , B=B ) , 
  start=list(w=rep(0, ncol(B))))

plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i] )

post <- extract.samples( m4.7 )
w <- apply( post$w , 2 , mean )
plot( NULL , xlim=range(d2$year) , ylim=c(-2,2) ,
xlab="year" , ylab="basis * weight" )
for ( i in 1:ncol(B) ) lines( d2$year , w[i]*B[,i] )

mu <- link( m4.7 )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$temp , col=col.alpha(rangi2,0.3) , pch=16 )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )

# Step 2: Increase the number of knots to 40
num_knots = 40
knot_list = quantile(d2$year, probs=seq(0,1,length.out=num_knots))

library(splines) 
B = bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE)

m4.7 = quap( 
  alist(
    T ~ dnorm(mu , sigma) , 
    mu <- a + B %*% w , 
    a ~ dnorm(6,10), 
    w ~ dnorm(0,1), 
    sigma ~ dexp(1)), 
  data=list(T=d2$temp , B=B ) , 
  start=list(w=rep(0, ncol(B))))

plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i] )

post <- extract.samples( m4.7 )
w <- apply( post$w , 2 , mean )
plot( NULL , xlim=range(d2$year) , ylim=c(-2,2) ,
xlab="year" , ylab="basis * weight" )
for ( i in 1:ncol(B) ) lines( d2$year , w[i]*B[,i] )

mu <- link( m4.7 )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$temp , col=col.alpha(rangi2,0.3) , pch=16 )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )

# Step3: adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens
# Here I changed the standard deviation to 10
num_knots = 40
knot_list = quantile(d2$year, probs=seq(0,1,length.out=num_knots))

library(splines) 
B = bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE)

m4.7 = quap( 
  alist(
    T ~ dnorm(mu , sigma) , 
    mu <- a + B %*% w , 
    a ~ dnorm(6,10), 
    w ~ dnorm(0,10), 
    sigma ~ dexp(1)), 
  data=list(T=d2$temp , B=B ) , 
  start=list(w=rep(0, ncol(B))))

plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i] )

post <- extract.samples( m4.7 )
w <- apply( post$w , 2 , mean )
plot( NULL , xlim=range(d2$year) , ylim=c(-2,2) ,
xlab="year" , ylab="basis * weight" )
for ( i in 1:ncol(B) ) lines( d2$year , w[i]*B[,i] )

mu <- link( m4.7 )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$temp , col=col.alpha(rangi2,0.3) , pch=16 )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )

# From Comparison between step 1 "Original 15 knots" and Step 2 "Increase to 40 knots", I notice that the bigger the number of knots is, the more wiggly the resulting spline is. However, I would not recommend to include to many knots, as it may lead to the overfitting problem.
# From Comparison between Step 2 "Increase to 40 knots" and Step 3 "Increased standard deviation", we can observe that the bigger the stadard deviation of the prior weight is, the thicker the resulting spline is.

4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

  1. Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

  2. Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.

  3. What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

data(Howell1)
set.seed(42)

d18 <- Howell1[Howell1$age < 18, ]
nrow(d18)
## [1] 192
# a)
formula <- alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * weight,
  a ~ dnorm(160, 30),
  b ~ dnorm(0, 10),
  sigma ~ dunif(0, 60)
)
m <- map(formula, data = d18)
precis(m, corr = TRUE)
##            mean         sd      5.5%     94.5%
## a     58.454019 1.39590218 56.223098 60.684940
## b      2.710225 0.06824076  2.601163  2.819287
## sigma  8.437404 0.43060681  7.749212  9.125597
# The estimate of a indicates that around 58.4 cm is a plausible height for a participant below 18 years old with a weight of 0 kg 
# The estimate of b indicates that, in this sample, we can expect an increase in height of around 2.72 cm for each additional unit of weight. 
# The estimate of σ indicates that, for participants below 18 years old, the standard deviation of heights is around 8.44 cm. 
# For each 10 unit increase in weight, the model predicts a 27.2 cm increase in height.

# b)
plot(height ~ weight, data = d18, col = col.alpha(rangi2, 0.3))

weight.seq <- seq(from = min(d18$weight), to = max(d18$weight), by = 1)
mu <- link(m, data = data.frame(weight = weight.seq))
mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)

sim.height <- sim(m, data = list(weight = weight.seq))
height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)

# c) The model is not very good. We can observe that the model does not fit well when weight < 10 or weight > 30. From the raw data plot between weight and height, we can observa a parabola, which may suggest a polynomial regression or spline.