4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

Thus, the first line yi∼Normal(μ,σ) is the likelihood.

4E2. In the model definition just above, how many parameters are in the posterior distribution?

There are two parameters to be estimated in this model: μ and σ.

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.

Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and prior.:

Pr(μ,σ|y)= ( ∏iNormal(yi|μ,σ)Normal(μ|0,10)Uniform(σ|0,10) ) / ( ∫∫∏iNormal(hi|μ,σ)Normal(μ|0,10)Uniform(σ|0,10)dμdσ )

4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]

Using the example on page 93. The first line is the likelihood, the second line is the linear model, the third line is the prior for α, the fourth line is the prior for β, and the fifth line is the prior for σ.

Thus, the linear model is μi=α+βxi.

4E5. In the model definition just above, how many parameters are in the posterior distribution?

There are three parameters in the posterior distribution: α, β, and σ

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- runif(1e4, 0, 10)
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_y)

4M2. Translate the model just above into a quap formula.

formula <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dunif(0, 10)
)
4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

flist <- alist(
  y ~ dnorm(mu, sigma),
  mu <- a + b*x,
  a ~ dnorm(0, 50),
  b ~ dunif(0, 10),
  sigma ~ dunif(0, 50)
)

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.

Supposing that the same sample of students are repeatedly sampled each year, Then the observations are not independent.

I try to use linear model to predict the student height.

Assuming a weak prior for the intercept, α, that will represent the heights for students.

And a weak prior for the slope, β, that will represent yearly growth rates for students.

Finally, I will use a uniform prior for the standard deviation of heights that can cover the full range if students from all ages are included.

hi∼Normal(μ,σ)

μi=α+βxi

α∼Normal(150,25)

β∼Normal(4,2)

σ∼Uniform(0,50)

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How?

Now knowing that the average height at the first year was 120 cm and every student got taller each year.

Thus, I can narrow the range of my prior distributions to make heights and growth rates from older ages less plausible. My expectation for σ is also much lower.

hi∼Normal(μ,σ)

μi=α+βxi

α∼Normal(120,10)

β∼Normal(7,1)

σ∼Uniform(0,20)

For example, I can center the α prior around 120 cm and lower its standard deviation to 10 cm, assuming the average height.

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?

Because the variance is the square of σ, so when variance is never more than 64 cm, then σ should be never more than 8 cm.

Then I can adjust the maximum of the σ prior.

hi∼Normal(μ,σ)

μi=α+βxi

α∼Normal(120,10)

β∼Normal(7,1)

σ∼Uniform(0,8)

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.

data("Howell1")
d<-Howell1
d2<-d[d$age >=18,]
xbar<-mean(d2$weight)

m4.3 <-quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu<-a+b*(weight-xbar),
    a ~ dnorm(178, 20),
    b ~dlnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d2)
m4.3
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * (weight - xbar)
## a ~ dnorm(178, 20)
## b ~ dlnorm(0, 1)
## sigma ~ dunif(0, 50)
## 
## Posterior means:
##           a           b       sigma 
## 154.6013677   0.9032809   5.0718804 
## 
## Log-likelihood: -1071.01
data("Howell1")
m4.3b <-quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu<-a+b*(weight),
    a ~ dnorm(178, 20),
    b ~dlnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d2)
m4.3b
## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * (weight)
## a ~ dnorm(178, 20)
## b ~ dlnorm(0, 1)
## sigma ~ dunif(0, 50)
## 
## Posterior means:
##           a           b       sigma 
## 114.5336016   0.8907673   5.0729439 
## 
## Log-likelihood: -1071.07

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

data("cherry_blossoms")
d<-cherry_blossoms
precis(d)
##                   mean          sd      5.5%      94.5%
## year       1408.000000 350.8845964 867.77000 1948.23000
## doy         104.540508   6.4070362  94.43000  115.00000
## temp          6.141886   0.6636479   5.15000    7.29470
## temp_upper    7.185151   0.9929206   5.89765    8.90235
## temp_lower    5.098941   0.8503496   3.78765    6.37000
##                                                                                                                           histogram
## year                       <U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2581>
## doy                                                                <U+2581><U+2582><U+2585><U+2587><U+2587><U+2583><U+2581><U+2581>
## temp                                                               <U+2581><U+2583><U+2585><U+2587><U+2583><U+2582><U+2581><U+2581>
## temp_upper <U+2581><U+2582><U+2585><U+2587><U+2587><U+2585><U+2582><U+2582><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581>
## temp_lower <U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2583><U+2585><U+2587><U+2583><U+2582><U+2581><U+2581><U+2581>
d2<-d[complete.cases(d$doy),]
numknots<-15
knot_list<-quantile(d2$year, probs=seq(0,1,length.out=numknots))
knot_list
##        0% 7.142857% 14.28571% 21.42857% 28.57143% 35.71429% 42.85714%       50% 
##       812      1036      1174      1269      1377      1454      1518      1583 
## 57.14286% 64.28571% 71.42857% 78.57143% 85.71429% 92.85714%      100% 
##      1650      1714      1774      1833      1893      1956      2015
numknots2<-50
knot_list2<-quantile(d2$year, probs=seq(0,1,length.out=numknots2))
knot_list2
##        0% 2.040816% 4.081633% 6.122449% 8.163265% 10.20408%  12.2449% 14.28571% 
##  812.0000  912.8571  965.7143 1016.5714 1071.8571 1112.5714 1141.2857 1174.0000 
## 16.32653% 18.36735% 20.40816% 22.44898%  24.4898% 26.53061% 28.57143% 30.61224% 
## 1201.8571 1226.7143 1255.7143 1285.4286 1319.5714 1356.1429 1377.0000 1406.8571 
## 32.65306% 34.69388% 36.73469% 38.77551% 40.81633% 42.85714% 44.89796% 46.93878% 
## 1425.7143 1445.1429 1462.4286 1484.2857 1501.1429 1518.0000 1534.8571 1551.7143 
## 48.97959% 51.02041% 53.06122% 55.10204% 57.14286% 59.18367% 61.22449% 63.26531% 
## 1570.5714 1591.4286 1609.2857 1629.4286 1650.0000 1668.8571 1685.7143 1702.5714 
## 65.30612% 67.34694% 69.38776% 71.42857% 73.46939%  75.5102% 77.55102% 79.59184% 
## 1722.4286 1740.2857 1757.1429 1774.0000 1790.8571 1807.7143 1824.5714 1841.4286 
## 81.63265% 83.67347% 85.71429%  87.7551% 89.79592% 91.83673% 93.87755% 95.91837% 
## 1858.2857 1876.1429 1893.0000 1910.8571 1929.7143 1947.5714 1964.4286 1981.2857 
## 97.95918%      100% 
## 1998.1429 2015.0000

4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

  1. Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

  2. Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.

  3. What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

data(Howell1)
set.seed(4)

d3 <- Howell1[Howell1$age < 18, ]
nrow(d3)
## [1] 192
formula <- alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * weight,
  a ~ dnorm(110, 30),
  b ~ dnorm(0, 10),
  sigma ~ dunif(0, 60)
)
m <- map(formula, data = d3)
precis(m, corr = TRUE)
##            mean         sd      5.5%     94.5%
## a     58.344986 1.39574028 56.114323 60.575648
## b      2.715033 0.06823408  2.605982  2.824085
## sigma  8.437253 0.43057695  7.749108  9.125398
plot(height ~ weight, data = d3, col = col.alpha(rangi2, 0.3))

weight.seq <- seq(from = min(d3$weight), to = max(d3$weight), by = 1)
mu <- link(m, data = data.frame(weight = weight.seq))

mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)

sim.height <- sim(m, data = list(weight = weight.seq))

height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)

Probably I need to assess the model. The linear model is not a good fit to predict height at most weights. It may overestimate height at both low end and high end.

I think that the major problem here cuold be the relationship between μ and weight is not linear. So I’d better use another model for regression.