Chapter 4 - Geocentric Models

This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.


4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

#y_i ∼ Normal(μ, σ) 

4E2. In the model definition just above, how many parameters are in the posterior distribution?

#There are 2 of them:μ and σ.

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.


4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]


4E5. In the model definition just above, how many parameters are in the posterior distribution?

#Three: α , β, and σ

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]


sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- runif(1e4, 0, 10)
prior_y <- rnorm(1e4, sample_mu, sample_sigma)

4M2. Translate the model just above into a quap formula.

formula <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dunif(0, 10)
4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

flist <- alist(
  y ~ dnorm(mu, sigma),
  mu <- a + b*x,
  a ~ dnorm(0, 50),
  b ~ dunif(0, 10),
  sigma ~ dunif(0, 50)

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?


# Every each, if we use the same sample of students, then the observations are not independent and we should use a linear mixed model. In this case, we haven’t learned that yet, therefore, it would make more sense to use a linear model. We don't have specific students' age, so it's good to start with a weak prior for the intercept, α, that will capture likely heights for students all the way from school age children to college age young adults. The same, I will use a weak prior for the slope, β, that will capture likely yearly growth rates for this wide age range. Last but not the least, I will use a uniform prior for the standard deviation of heights that can cover the full range if students from all ages are included.

# For the alpha prior, a normal distribution centered on 150 cm with an SD of 25 cm; 150 cm is in the middle of the expected distribution if both school and college students are included and 25 cm is enough variability that two SDs around the mean should include most students at the high and low end of the age distribution. For the beta prior, I chose a normal distribution centered on 4 cm/year with an SD of 2 cm/year. For the sigma prior, the uniform distribution is from 0 cm to 50 cm; this range includes both a tight distribution of students around the same age/height and a wide range of students at both school and college ages/heights; 50 cm is a bit high

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot.

# Since the average height at the first year was 120 cm and that every student got taller each year makes me more confident that we are talking about school age students. As a result, I can reduce the range of my prior distributions to make heights and growth rates from older ages less plausible. The expectation for 𝜎 is also much lower now too as I no longer expect a balanced mix of young and old students.

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?

# The variance is the square of σ.If the variance is never more than 64 cm, then σ is never more than 8 cm. Therefore, the maximum of the σ prior will be changed. THe new updates would impact the 𝛼 prior. However, I am not 100% confident about the relationship to update that prior.

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. Show the pairs() plot. What is different? Then compare the posterior predictions of both models.

d <- Howell1
d2 <- d[ d$age >= 18, ]
xbar <- mean(d2$weight)
m4.3 <- quap(alist(
      height ~ dnorm( mu , sigma ) , 
      mu <- a + b*( weight - xbar ) , 
      a ~ dnorm( 178 , 20 ) ,
      b ~ dlnorm( 0 , 1 ) ,
      sigma ~ dunif( 0 , 50 )), data=d2 )
##              mean         sd        5.5%      94.5%
## a     154.6013684 0.27030769 154.1693645 155.033372
## b       0.9032809 0.04192364   0.8362789   0.970283
## sigma   5.0718815 0.19115484   4.7663791   5.377384
#Ommitting xbar
m4.3_new <- quap(alist(
      height ~ dnorm( mu , sigma ) , 
      mu <- a + b*( weight ) , 
      a ~ dnorm( 178 , 20 ) ,
      b ~ dlnorm( 0 , 1 ) ,
      sigma ~ dunif( 0 , 50 )), data=d2 )
##              mean         sd        5.5%       94.5%
## a     114.5349267 1.89761390 111.5021732 117.5676802
## b       0.8907079 0.04175507   0.8239753   0.9574406
## sigma   5.0723551 0.19121469   4.7667571   5.3779531

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

d <- cherry_blossoms 
d2 <- d[ complete.cases(d$temp) , ]
num_knots <- 30
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots))
B <- bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE )
plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis value" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i])