4E1. In the model definition below, which line is the likelihood? \[\begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

#y_i ∼ Normal(μ, σ) is the likelihood

4E2. In the model definition just above, how many parameters are in the posterior distribution?

#μ, σ are the parameters in the posterior distribution

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.

\[\begin{align} \Pr(\mu,\sigma \mid y) =\frac{\prod_{i}\text{Normal}(y_{i}\mid\mu,\sigma)\,\text{Normal}(\mu\mid 0,10)\,\text{Exponential}(\sigma\mid 1)}{\int\int\prod_{i}\text{Normal}(y_{i}\mid\mu,\sigma)\,\text{Normal}(\mu\mid 0,10)\,\text{Exponential}(\sigma\mid 1)\,d\mu\, d\sigma} \end{align}\]
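
A minimal grid-approximation sketch of this posterior, assuming the priors above; the data vector y and the grid bounds are hypothetical, chosen only for illustration:

y <- c(-1.2, 0.5, 2.3)                       # hypothetical observations
mu_grid    <- seq(-20, 20, length.out = 200)
sigma_grid <- seq(0.1, 10, length.out = 200)
grd <- expand.grid(mu = mu_grid, sigma = sigma_grid)
# log-likelihood of the data at every grid point
log_lik <- sapply(1:nrow(grd), function(i)
  sum(dnorm(y, grd$mu[i], grd$sigma[i], log = TRUE)))
# add the log-priors, then exponentiate and normalize over the grid
log_post <- log_lik +
  dnorm(grd$mu, 0, 10, log = TRUE) +
  dexp(grd$sigma, 1, log = TRUE)
post <- exp(log_post - max(log_post))
post <- post / sum(post)
# samples from the grid posterior, if needed
samples <- grd[sample(1:nrow(grd), 1e4, replace = TRUE, prob = post), ]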

4E4. In the model definition below, which line is the linear model? \[\begin{align*} y_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(2) \end{align*}\]

#μ_i = α + βx_i is the linear model

4E5. In the model definition just above, how many parameters are in the posterior distribution?

#There are three parameters in the posterior distribution: α, β, and σ

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). \[\begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

library(rethinking)   # for dens() and quap()
set.seed(102)
# sample parameters from their priors, then simulate observations from those draws
sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- rexp(1e4, 1)   # sigma ~ Exponential(1), as in the model definition
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_y)

4M2. Translate the model just above into a quap formula.

formula <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dexp(1)
)
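
A quick check of this formula (a sketch, assuming the rethinking package is loaded; the data are simulated purely for illustration):

set.seed(7)
fake <- data.frame(y = rnorm(50, 3, 2))   # hypothetical observations
m4M2 <- quap(formula, data = fake)
precis(m4M2)
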
4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

\[\begin{align*} y_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Uniform}(0, 1) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. \[\begin{align*} h_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta t_i \\ \alpha &\sim \text{Normal}(140, 25) \\ \beta &\sim \text{Normal}(3, 2) \\ \sigma &\sim \text{Uniform}(0, 50) \end{align*}\]

#No age was mentioned in the problem, so I do not include age as a predictor. Here h_i is the height of student i and t_i is the year of the observation. Because age is not in the model, the sample could contain students anywhere from roughly 100 cm (a young child) to over 180 cm (an older teenager), so for α I chose a Normal prior with mean 140 cm and standard deviation 25 cm. For β I chose a Normal prior with mean 3 and standard deviation 2, meaning that on average a student grows about 3 cm per year, give or take 2 cm. For σ I chose a Uniform prior on [0, 50], reflecting the expectation that the standard deviation of heights among students of the same age is no larger than 50 cm.
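
A prior predictive simulation for these priors (a sketch, assuming the rethinking package is loaded for dens(); years are coded 1 to 3):

set.seed(11)
N <- 1e4
a <- rnorm(N, 140, 25)
b <- rnorm(N, 3, 2)
sigma <- runif(N, 0, 50)
t <- sample(1:3, N, replace = TRUE)   # year of observation
prior_h <- rnorm(N, a + b * t, sigma)
dens(prior_h)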

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? \[\begin{align*} h_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta t_i \\ \alpha &\sim \text{Normal}(140, 20) \\ \beta &\sim \text{Normal}(3, 1) \\ \sigma &\sim \text{Uniform}(0, 20) \end{align*}\]

#Knowing that every student got taller each year tells me that growth is always positive, so the priors can be tightened. I keep the α prior centered at 140 cm but reduce its SD to 20 cm. I keep the β prior centered at 3 cm/year and reduce its SD to 1 cm/year, so that nearly all of its mass is on positive growth rates. Finally, I lower the maximum of the σ prior to 20 cm, since a very large spread in heights is implausible for students of the same age.

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors? \[\begin{align*} h_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta t_i \\ \alpha &\sim \text{Normal}(140, 20) \\ \beta &\sim \text{Normal}(3, 1) \\ \sigma &\sim \text{Uniform}(0, 8) \end{align*}\]

#The variance is the square of σ, so if the variance is never more than 64 cm², then σ is never more than 8 cm. We can therefore lower the maximum of the Uniform prior on σ to 8. This information could also justify tightening the α prior, since less spread among students of the same age means the mean height is better constrained.
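
To confirm the revised priors still produce plausible heights, a prior predictive simulation (a sketch, same setup as for 4M4 with the rethinking package loaded):

set.seed(12)
N <- 1e4
a <- rnorm(N, 140, 20)
b <- rnorm(N, 3, 1)
sigma <- runif(N, 0, 8)
t <- sample(1:3, N, replace = TRUE)
prior_h <- rnorm(N, a + b * t, sigma)
dens(prior_h)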

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.

data(Howell1)
d <- Howell1
d2 <- d[ d$age >= 18 , ]
xbar <- mean(d2$weight)
m4.3 <- quap(
alist(
      height ~ dnorm( mu , sigma ) , 
      mu <- a + b*( weight - xbar ) , 
      a ~ dnorm( 178 , 20 ) ,
      b ~ dlnorm( 0 , 1 ) ,
      sigma ~ dunif( 0 , 50 )), data=d2 )
m4.3
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * (weight - xbar)
## a ~ dnorm(178, 20)
## b ~ dlnorm(0, 1)
## sigma ~ dunif(0, 50)
## 
## Posterior means:
##           a           b       sigma 
## 154.6013638   0.9032809   5.0718798 
## 
## Log-likelihood: -1071.01
round( vcov( m4.3 ) , 3 )
##           a     b sigma
## a     0.073 0.000 0.000
## b     0.000 0.002 0.000
## sigma 0.000 0.000 0.037
post <- extract.samples(m4.3)
mu.link <- function(weight) post$a + post$b*( weight - xbar ) 
weight.seq <- seq( from=25 , to=70 , by=1 )
mu <- sapply( weight.seq , mu.link )
mu.mean <- apply( mu , 2 , mean )
mu.CI <- apply( mu , 2 , PI , prob=0.89)
mu.HPDI <- apply( mu , 2 , HPDI , prob=0.89)
sim.height <- sim( m4.3 , data=list(weight=weight.seq) )
height.PI <- apply( sim.height , 2 , PI , prob=0.89 )

plot( height ~ weight , d2 , col=col.alpha(rangi2,0.5) )

lines( weight.seq , mu.mean )

shade( mu.HPDI , weight.seq )
## Warning in if (class(object) == "formula") {: the condition has length > 1 and
## only the first element will be used
## Warning in if (class(object) == "density") {: the condition has length > 1 and
## only the first element will be used
## Warning in if (class(object) == "matrix" & length(dim(object)) == 2) {: the
## condition has length > 1 and only the first element will be used
## Warning in if (class(object) == "matrix") {: the condition has length > 1 and
## only the first element will be used
shade( height.PI , weight.seq )
## Warning in if (class(object) == "formula") {: the condition has length > 1 and
## only the first element will be used
## Warning in if (class(object) == "density") {: the condition has length > 1 and
## only the first element will be used
## Warning in if (class(object) == "matrix" & length(dim(object)) == 2) {: the
## condition has length > 1 and only the first element will be used
## Warning in if (class(object) == "matrix") {: the condition has length > 1 and
## only the first element will be used

m4.3_new <- quap(
alist(
      height ~ dnorm( mu , sigma ) , 
      mu <- a + b*weight, 
      a ~ dnorm( 178 , 20 ) ,
      b ~ dlnorm( 0 , 1 ) ,
      sigma ~ dunif( 0 , 50 )), data=d2 )

m4.3_new
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * weight
## a ~ dnorm(178, 20)
## b ~ dlnorm(0, 1)
## sigma ~ dunif(0, 50)
## 
## Posterior means:
##           a           b       sigma 
## 114.5339490   0.8907382   5.0727217 
## 
## Log-likelihood: -1071.07
round( vcov( m4.3_new) , 3 )
##            a      b sigma
## a      3.601 -0.078 0.009
## b     -0.078  0.002 0.000
## sigma  0.009  0.000 0.037
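
To compare the parameter covariances more directly, they can be converted to correlations (a small sketch using base R's cov2cor() on the quap covariance matrices):

round( cov2cor( vcov( m4.3 ) ) , 2 )
round( cov2cor( vcov( m4.3_new ) ) , 2 )
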
post <- extract.samples(m4.3_new)
mu.link <- function(weight) post$a + post$b*weight 
weight.seq <- seq( from=25 , to=70 , by=1 )
mu <- sapply( weight.seq , mu.link )
mu.mean <- apply( mu , 2 , mean )
mu.CI <- apply( mu , 2 , PI , prob=0.89)
mu.HPDI <- apply( mu , 2 , HPDI , prob=0.89)
sim.height <- sim( m4.3_new , data=list(weight=weight.seq) )
height.PI <- apply( sim.height , 2 , PI , prob=0.89 )

plot( height ~ weight , d2 , col=col.alpha(rangi2,0.5) )

lines( weight.seq , mu.mean )

shade( mu.HPDI , weight.seq )
shade( height.PI , weight.seq )

#With xbar omitted, a and b now have a clear negative covariance (about -0.078, versus essentially zero in the centered model), because the intercept now refers to the predicted height at weight = 0, far outside the range of the data. The posterior means of b and sigma barely change, and the posterior predictions (regression line and intervals) are essentially identical to those of the original model; only the interpretation of a and the covariance structure differ.

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

data(cherry_blossoms) 
d <- cherry_blossoms 
d2 <- d[ complete.cases(d$temp) , ]
num_knots <- 30   # increased from the 15 knots used in the chapter
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots))
library(splines) 
B <- bs(d2$year, knots=knot_list[-c(1,num_knots)] , degree=3 , intercept=TRUE )
plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis value" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i])

m4.7 <- quap( alist(
                      T ~ dnorm( mu , sigma ) ,
                      mu <- a + B %*% w ,
                      a ~ dnorm(6,50),
                      w ~ dnorm(0,1),
                      sigma ~ dexp(1) ),
              data=list( T=d2$temp , B=B ) ,
              start=list( w=rep( 0 , ncol(B) ) ) )

post <- extract.samples(m4.7)
w <- apply( post$w , 2 , mean )
plot( NULL , xlim=range(d2$year) , ylim=c(-2,2) , xlab="year" , ylab="basis * weight" )
for ( i in 1:ncol(B) ) lines( d2$year , w[i]*B[,i] )

mu <- link( m4.7 )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$temp , col=col.alpha(rangi2,0.3) , pch=16 ) 
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )
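
To see the effect of the prior width on the weights, the same spline can be refit with a tighter prior, for example w ~ dnorm(0, 0.1) (an illustrative choice; the model name m4.7_tight is mine):

m4.7_tight <- quap( alist(
                      T ~ dnorm( mu , sigma ) ,
                      mu <- a + B %*% w ,
                      a ~ dnorm(6,50),
                      w ~ dnorm(0,0.1),   # narrower prior on the basis weights
                      sigma ~ dexp(1) ),
              data=list( T=d2$temp , B=B ) ,
              start=list( w=rep( 0 , ncol(B) ) ) )

mu_tight <- link( m4.7_tight )
mu_tight_PI <- apply( mu_tight , 2 , PI , 0.97 )
plot( d2$year , d2$temp , col=col.alpha(rangi2,0.3) , pch=16 )
shade( mu_tight_PI , d2$year , col=col.alpha("black",0.5) )

#More knots allow the spline to bend in more places, and a wider prior on the weights allows each bend to be larger, so the fit becomes wigglier; fewer knots or a narrower prior shrink the weights toward zero and produce a smoother curve. Together, the number of knots and the standard deviation of the prior on the weights control how flexible (wiggly) the spline can be.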

4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

  1. Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

  2. Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.

  3. What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

d2 <- Howell1[Howell1$age < 18, ]
nrow(d2)
## [1] 192
formula <- alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * weight,
  a ~ dnorm(110, 30),
  b ~ dnorm(0, 10),
  sigma ~ dunif(0, 60)
)
m <- quap(formula, data = d2)
precis(m)
##            mean         sd      5.5%     94.5%
## a     58.344299 1.39573943 56.113638 60.574960
## b      2.715063 0.06823405  2.606012  2.824114
## sigma  8.437251 0.43057670  7.749106  9.125396

#The slope b is about 2.72, so the model predicts that for every 10 units of increase in weight, a child gets about 27 cm taller (10 × 2.72 ≈ 27 cm).
plot(height ~ weight, data = d2, col = col.alpha(rangi2, 0.3))

weight.seq <- seq(from = min(d2$weight), to = max(d2$weight), by = 1)
mu <- link(m, data = data.frame(weight = weight.seq))
mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)
sim.height <- sim(m, data = list(weight = weight.seq))
height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)
#The main concern is that the relationship between weight and height in these data is visibly curved, while the model forces a straight line, so the fit is poor at the low and high ends of the weight range. A better model would relax the linearity assumption, for example by regressing height on log(weight) or by adding a polynomial or spline term for weight.