This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.
Place each answer inside the code chunk (grey box). Each code chunk should contain a text response or code that completes or answers the question or activity. Problems are labeled Easy (E), Medium (M), and Hard (H).
Finally, upon completion, name your final output .html file YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.
4E1. In the model definition below, which line is the likelihood? \[\begin{align} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]
# y_i ∼ Normal(μ, σ) is the likelihood
4E2. In the model definition just above, how many parameters are in the posterior distribution?
# There are two parameters in the posterior: μ and σ.
4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.
#P(μ, σ|y) = [P(y|μ, σ)P(μ)P(σ)] / [∫∫P(y|μ, σ)P(μ)P(σ)dμdσ]
#P(μ, σ|y) = [Normal(y|μ, σ)Normal(μ|0,10)Exponential(σ|1)] / [∫∫Normal(y|μ, σ)Normal(μ|0,10)Exponential(σ|1)dμdσ]
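As a concrete illustration of the denominator (a minimal sketch with hypothetical observations; the `y` values and grid bounds are assumptions), a grid approximation replaces the double integral with a sum:

```r
# grid-approximate the 4E3 posterior; the normalizing sum stands in
# for the double integral over mu and sigma
y <- c( -2 , 1 , 3 )   # hypothetical observations
grid <- expand.grid( mu = seq( -15 , 15 , length.out=200 ) ,
                     sigma = seq( 0.1 , 10 , length.out=200 ) )
log_lik <- sapply( 1:nrow(grid) , function(i)
  sum( dnorm( y , grid$mu[i] , grid$sigma[i] , log=TRUE ) ) )
log_post <- log_lik + dnorm( grid$mu , 0 , 10 , log=TRUE ) +
  dexp( grid$sigma , 1 , log=TRUE )
post <- exp( log_post - max(log_post) )
post <- post / sum(post)   # normalize: the discrete analogue of the double integral
```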
4E4. In the model definition below, which line is the linear model? \[\begin{align} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(2) \end{align}\]
#μ_i = α + βx_i
4E5. In the model definition just above, how many parameters are in the posterior distribution?
# 3 parameters in the posterior: α, β, σ.
4M1. For the model definition below, simulate observed y values from the prior (not the posterior). \[\begin{align} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]
library(rethinking)   # for dens()
set.seed(10)
# draw parameters from their priors, then simulate observations from the likelihood
mu <- rnorm( 1e4 , 0 , 10 )
sigma <- rexp( 1e4 , 1 )
yprior <- rnorm( 1e4 , mu , sigma )
dens( yprior )   # prior predictive distribution of y
4M2. Translate the model just above into a quap formula.
quapformula <- alist(
  y ~ dnorm( mu , sigma ) ,
  mu ~ dnorm( 0 , 10 ) ,
  sigma ~ dexp( 1 )   # Exponential(1), not Normal: matches the model definition
)
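As a quick sanity check (a minimal sketch; it reuses the `yprior` draws from 4M1 as stand-in data, and `m4m2` is a hypothetical name):

```r
# fit the formula to a subset of the prior-simulated draws from 4M1
m4m2 <- quap( quapformula , data=list( y=yprior[1:100] ) )
precis( m4m2 )
```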
4M3. Translate the quap model formula below into a mathematical model definition:
y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
# y_i ∼ Normal(μ, σ)
# μ_i = α + βx_i
# α ∼ Normal(0, 10)
# β ∼ Uniform(0, 1)
# σ ∼ Exponential(1)
4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
# h_ij ∼ Normal(μ_ij, σ)
# μ_ij = α + β(y_j − ȳ)
# α ∼ Normal(130, 10)
# β ∼ Normal(0, 10)
# σ ∼ Exponential(1)
#
# where:
# h -- height
# y -- year
# ȳ -- average year (2, for years 1 to 3)
# index i -- student
# index j -- year
# α -- average height at the average year
# β -- change in height per year (slope)
# σ -- standard deviation of heights among students in the same year
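A prior predictive simulation helps defend these priors (a minimal sketch; the year range 1-3 and its mean of 2 come from the setup above):

```r
library(rethinking)
set.seed(4)
N <- 50
a <- rnorm( N , 130 , 10 )   # draws from the prior for average height
b <- rnorm( N , 0 , 10 )     # draws from the prior for yearly growth
plot( NULL , xlim=c(1,3) , ylim=c(50,200) ,
      xlab="year" , ylab="height (cm)" )
# each line is one plausible height-by-year relationship under the priors
for ( i in 1:N ) abline( a=a[i] - 2*b[i] , b=b[i] , col=col.alpha("black",0.3) )
```

Note that Normal(0, 10) for β allows implausible negative growth, which 4M5 addresses.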
4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How?
# Yes. Since every student got taller each year, the slope must be positive,
# so a Log-Normal distribution is a better prior for β:
# h_i_j ∼ Normal(μ_i_j, σ)
# μ_i_j = α + β(y_j −¯y)
# α ∼ Normal(130, 10)
# β ∼ Log-Normal(1, 0.5)
# σ ∼ Exponential(1)
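A quick check of what Log-Normal(1, 0.5) implies for yearly growth (a minimal sketch):

```r
# 89% prior interval for the slope, in cm of growth per year
exp( qnorm( c( 0.055 , 0.945 ) , 1 , 0.5 ) )   # roughly 1.2 to 6.0, all positive
```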
4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
# A variance of at most 64 cm² implies a standard deviation of at most 8 cm,
# so bound σ with Uniform(0, 8):
# h_i_j ∼ Normal(μ_i_j, σ)
# μ_i_j = α + β(y_j −¯y)
# α ∼ Normal(130, 10)
# β ∼ Log-Normal(1, 0.5)
# σ ∼ Uniform(0, 8)
4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.
# omit the mean weight xbar this time
library(rethinking)
data(Howell1); d <- Howell1; d2 <- d[ d$age >= 18 , ]
m4.3b <- quap(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*weight ,
a ~ dnorm( 178 , 20 ) ,
b ~ dlnorm( 0 , 1 ) ,
sigma ~ dunif( 0 , 50 )
) , data=d2 )
precis( m4.3b )
## mean sd 5.5% 94.5%
## a 114.5355708 1.89755043 111.5029187 117.5682229
## b 0.8906939 0.04175367 0.8239634 0.9574243
## sigma 5.0721810 0.19119831 4.7666092 5.3777528
# define the average weight, x-bar
xbar <- mean(d2$weight)
m4.3 <- quap(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*( weight - xbar ) ,
a ~ dnorm( 178 , 20 ) ,
b ~ dlnorm( 0 , 1 ) ,
sigma ~ dunif( 0 , 50 )
) , data=d2 )
precis( m4.3 )
## mean sd 5.5% 94.5%
## a 154.6013554 0.2703113 154.1693457 155.0333651
## b 0.9032684 0.0419242 0.8362655 0.9702714
## sigma 5.0719496 0.1911613 4.7664370 5.3774623
round( vcov( m4.3b ) , 2 )
## a b sigma
## a 3.60 -0.08 0.01
## b -0.08 0.00 0.00
## sigma 0.01 0.00 0.04
# the covariances look near zero only because of scale; convert to a correlation matrix
round( cov2cor(vcov( m4.3b )) , 2 )
## a b sigma
## a 1.00 -0.99 0.03
## b -0.99 1.00 -0.03
## sigma 0.03 -0.03 1.00
pairs(m4.3b)
The parameters have different meanings and relationships in the two models. In the uncentered model, a is the expected height at weight 0, an extrapolation far outside the data, so a and b are strongly negatively correlated (r ≈ -0.99); in the centered model, a is the expected height at the average weight and the parameters are nearly uncorrelated. Despite this, the two models make essentially the same posterior predictions. In general, centering the predictor makes priors easier to set and removes the correlation; prior predictive simulation helps choose sensible priors either way.
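To confirm the matching predictions (a minimal sketch overlaying both models' posterior mean lines; `link` resolves `xbar` from the environment, as in the chapter):

```r
# compare posterior mean predictions from the uncentered and centered models
w.seq <- seq( from=min(d2$weight) , to=max(d2$weight) , length.out=50 )
mu.b <- link( m4.3b , data=data.frame( weight=w.seq ) )
mu.c <- link( m4.3 , data=data.frame( weight=w.seq ) )
plot( height ~ weight , data=d2 , col=col.alpha("slateblue",0.5) )
lines( w.seq , apply( mu.b , 2 , mean ) , lwd=3 )
lines( w.seq , apply( mu.c , 2 , mean ) , lty=2 , col="red" )
# the two mean lines overlap almost exactly
```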
4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
# More knots make the spline wigglier, up to a point.
# The width of the prior on the weights also controls how wiggly the spline is:
# a wider prior lets the weights vary more, so the spline wiggles more.
# Together, the number of knots and the prior width control how tightly
# the spline conforms to the sample.
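A sketch of the experiment, adapting the chapter's m4.7 spline code (the knot count of 30 and the tighter prior sd of 1 are example values to vary against the chapter's 15 and 10):

```r
library(rethinking)
library(splines)
data(cherry_blossoms)
d <- cherry_blossoms
d2 <- d[ complete.cases(d$doy) , ]
num_knots <- 30   # chapter used 15; increase and refit
knot_list <- quantile( d2$year , probs=seq( 0 , 1 , length.out=num_knots ) )
B <- bs( d2$year , knots=knot_list[ -c(1,num_knots) ] ,
         degree=3 , intercept=TRUE )
m_spline <- quap(
  alist(
    D ~ dnorm( mu , sigma ) ,
    mu <- a + B %*% w ,
    a ~ dnorm( 100 , 10 ) ,
    w ~ dnorm( 0 , 1 ) ,   # chapter used sd 10; tighten or widen and refit
    sigma ~ dexp( 1 )
  ) , data=list( D=d2$doy , B=B ) ,
  start=list( w=rep( 0 , ncol(B) ) ) )
mu <- link( m_spline )
plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16 )
lines( d2$year , apply( mu , 2 , mean ) , lwd=2 )
```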
4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
#(a)
# new data frame without adults
library(rethinking)
data(Howell1)
d <- Howell1
d3 <- d[ d$age < 18 , ]
str(d3)
## 'data.frame': 192 obs. of 4 variables:
## $ height: num 121.9 105.4 86.4 129.5 109.2 ...
## $ weight: num 19.6 13.9 10.5 23.6 16 ...
## $ age : num 12 8 6.5 13 7 17 16 11 17 8 ...
## $ male : int 1 0 0 1 0 1 0 1 0 1 ...
# fit the quadratic approximation of the posterior to the under-18 data
xbar <- mean( d3$weight )
m <- quap(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*( weight - xbar ) ,
a ~ dnorm( 178 , 20 ),
b ~ dnorm( 0 , 10 ) ,
sigma ~ dunif( 0 , 50 )
) , data=d3 )
precis(m)
## mean sd 5.5% 94.5%
## a 108.383377 0.60866721 107.410609 109.356145
## b 2.719962 0.06829287 2.610817 2.829108
## sigma 8.437355 0.43059259 7.749185 9.125525
The slope estimate is b ≈ 2.72, so the model predicts about 27 cm of additional height for every 10 units of increase in weight.
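To answer the 10-unit question directly from the posterior (a minimal sketch):

```r
# posterior distribution of the height gain per 10 units of weight
post <- extract.samples( m )
precis( data.frame( gain10 = 10 * post$b ) )
```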
#(b)
#compute 89% intervals for the mean and predicted heights.
post <- extract.samples( m )
w.seq <- seq(from=1,to=45,length.out=50)
mu <- sapply( w.seq , function(z) mean( post$a + post$b*(z-xbar) ) )
mu.ci <- sapply( w.seq , function(z)
PI( post$a + post$b*(z-xbar) , prob=0.89 ) )
pred.ci <- sapply( w.seq , function(z)
PI( rnorm( 10000 , post$a + post$b*(z-xbar) , post$sigma) , 0.89 ) )
#plot everything
plot( height ~ weight , data=d3 ,
col=col.alpha("slateblue",0.5) , cex=0.5 )
lines( w.seq , mu )
lines( w.seq , mu.ci[1,] , lty=2 )
lines( w.seq , mu.ci[2,] , lty=2 )
lines( w.seq , pred.ci[1,] , lty=2 )
lines( w.seq , pred.ci[2,] , lty=2 )
At low weights, the predicted mean heights sit above most of the actual heights; in the middle range, the predictions fall below most heights; at high weights, the predictions again sit above the observed heights. The straight line cannot capture the bend in the data: the slope of height on weight flattens as weight increases. A curved model, such as a parabolic (second-order polynomial) regression, should fit better.
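A sketch of the hypothesized improvement, following the chapter's polynomial approach of standardizing weight and adding a squared term (the intercept prior mean of 120 cm is an assumption for the child data, and `m.quad` is a hypothetical name):

```r
# second-order polynomial: lets the slope flatten as weight increases
d3$weight_s <- ( d3$weight - mean(d3$weight) ) / sd(d3$weight)
d3$weight_s2 <- d3$weight_s^2
m.quad <- quap(
  alist(
    height ~ dnorm( mu , sigma ) ,
    mu <- a + b1*weight_s + b2*weight_s2 ,
    a ~ dnorm( 120 , 20 ) ,   # assumed prior mean for children's average height
    b1 ~ dlnorm( 0 , 1 ) ,
    b2 ~ dnorm( 0 , 1 ) ,
    sigma ~ dunif( 0 , 50 )
  ) , data=d3 )
precis( m.quad )
```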