4E1. In the model definition below, which line is the likelihood? \[\begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
# See the examples on book page 84. The first line, \(y_i \sim \text{Normal}(\mu, \sigma)\), is the likelihood; the second line is the prior for \(\mu\) and the third is the prior for \(\sigma\).
4E2. In the model definition just above, how many parameters are in the posterior distribution?
# There are two parameters in the posterior distribution, \(\mu\) and \(\sigma\).
4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors. \[\begin{align*} \Pr(\mu, \sigma \mid y) = \frac{\prod_i \text{Normal}(y_i \mid \mu, \sigma) \times \text{Normal}(\mu \mid 0, 10) \times \text{Exponential}(\sigma \mid 1)}{\int \int \prod_i \text{Normal}(y_i \mid \mu, \sigma) \times \text{Normal}(\mu \mid 0, 10) \times \text{Exponential}(\sigma \mid 1) \, d\mu \, d\sigma} \end{align*}\]
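The denominator is just the normalizing constant, which can be computed by grid approximation. A minimal sketch, where the simulated data y and the grid ranges are assumptions chosen for illustration:
# grid approximation of the posterior above
set.seed(1)
y <- rnorm(20, 5, 2)  # hypothetical observations
grid <- expand.grid(mu = seq(-10, 15, length.out = 200),
                    sigma = seq(0.1, 10, length.out = 200))
log_lik <- sapply(1:nrow(grid), function(i)
  sum(dnorm(y, grid$mu[i], grid$sigma[i], log = TRUE)))
log_post <- log_lik + dnorm(grid$mu, 0, 10, log = TRUE) +
  dexp(grid$sigma, 1, log = TRUE)
post <- exp(log_post - max(log_post))  # subtract max for numerical stability
post <- post / sum(post)               # normalizing: the denominator above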
4E4. In the model definition below, which line is the linear model? \[\begin{align*} y_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(2) \end{align*}\]
# The second line, \(\mu_i = \alpha + \beta x_i\), is the linear model.
4E5. In the model definition just above, how many parameters are in the posterior distribution? There are three parameters in the posterior distribution: \(\alpha\), \(\beta\), and \(\sigma\). (\(\mu_i\) is not a parameter, since it is defined deterministically from \(\alpha\) and \(\beta\).)
4M1. For the model definition below, simulate observed y values from the prior (not the posterior). \[\begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
library(rethinking)  # for dens()
# sample parameter values from the priors
sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- rexp(1e4, 1)  # sigma ~ Exponential(1), matching the model
# simulate y from the prior predictive distribution
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_y)
4M2. Translate the model just above into a quap formula.
formula <- alist(
  y ~ dnorm( mu , sigma ),
  mu ~ dnorm( 0 , 10 ),
  sigma ~ dexp( 1 )  # Exponential(1) prior, matching the model
)
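A quick usage sketch, assuming a hypothetical vector of simulated observations y (not data from the book; the name m4m2 is made up):
set.seed(1)
dat <- list(y = rnorm(50, 2, 1))  # hypothetical data
m4m2 <- quap(formula, data = dat)
precis(m4m2)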
4M3. Translate the quap model formula below into a mathematical model definition:
y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
\[\begin{align*} y_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Uniform}(0, 1) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. For the \(\alpha\) prior, I chose a normal distribution centered on 160 cm with an SD of 25 cm; for the \(\beta\) prior, I chose a normal distribution centered on 2 cm/year with an SD of 2 cm/year; for the \(\sigma\) prior, I chose a uniform distribution from 0 cm to 50 cm. \[\begin{align*} height_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta \, year_i \\ \alpha &\sim \text{Normal}(160, 25)\\ \beta &\sim \text{Normal}(2, 2)\\ \sigma &\sim \text{Uniform}(0, 50) \end{align*}\]
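A quick prior predictive check helps defend these priors: simulate regression lines implied by them and see whether the resulting heights are plausible for students. A minimal base-R sketch, assuming year is coded 1 to 3:
# plot 100 regression lines drawn from the priors
set.seed(1)
N <- 100
a <- rnorm(N, 160, 25)
b <- rnorm(N, 2, 2)
plot(NULL, xlim = c(1, 3), ylim = c(50, 250), xlab = "year", ylab = "height (cm)")
for (i in 1:N) abline(a = a[i], b = b[i], col = rgb(0, 0, 0, 0.2))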
4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Since every student got taller each year, the students are presumably still growing, so the baseline height should not be high: I lowered the \(\alpha\) prior mean to 140 cm and its SD to 10 cm. I recentered the \(\beta\) prior on 5 cm/year and reduced its SD to 1 cm/year, since growth is clearly positive. I also reduced the upper bound of the \(\sigma\) prior to 20 cm, because a large spread among students is less plausible. \[\begin{align*} height_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta \, year_i \\ \alpha &\sim \text{Normal}(140, 10)\\ \beta &\sim \text{Normal}(5, 1)\\ \sigma &\sim \text{Uniform}(0, 20) \end{align*}\]
4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors? Since the variance is never more than 64, the standard deviation is never more than \(\sqrt{64} = 8\) cm, so I lower the upper bound of the \(\sigma\) prior to 8 cm. \[\begin{align*} height_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta \, year_i \\ \alpha &\sim \text{Normal}(140, 10)\\ \beta &\sim \text{Normal}(5, 1)\\ \sigma &\sim \text{Uniform}(0, 8) \end{align*}\]
4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.
# load data again, since it's a long way back
library(rethinking)
data(Howell1)
d <- Howell1
d4 <- d[ d$age >= 18 , ]
# define the average weight, x-bar
xbar <- mean(d4$weight)
# fit model
m4.31 <- quap(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*( weight - xbar ) ,
a ~ dnorm( 178 , 20 ) ,
b ~ dlnorm( 0 , 1 ) ,
sigma ~ dunif( 0 , 50 )
) ,
data=d4 )
m4.31
##
## Quadratic approximate posterior distribution
##
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * (weight - xbar)
## a ~ dnorm(178, 20)
## b ~ dlnorm(0, 1)
## sigma ~ dunif(0, 50)
##
## Posterior means:
## a b sigma
## 154.6016904 0.9032803 5.0718833
##
## Log-likelihood: -1071.01
# The first fit reproduces m4.3 from the chapter. Now refit, omitting the mean weight xbar:
m4.32 <- quap(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*weight ,
a ~ dnorm(178, 20) ,
b ~ dnorm( 0 , 1 ) ,
sigma ~ dunif( 0 , 50 )
), data=d4 )
m4.32
##
## Quadratic approximate posterior distribution
##
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * weight
## a ~ dnorm(178, 20)
## b ~ dnorm(0, 1)
## sigma ~ dunif(0, 50)
##
## Posterior means:
## a b sigma
## 114.5263198 0.8909078 5.0727021
##
## Log-likelihood: -1071.07
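Comparing the two fits: the posterior means for b and sigma are essentially unchanged, while a drops from about 154.6 to 114.5, because without centering a is the expected height at weight = 0 rather than at the mean weight. To look at the covariance the question asks about, one can inspect the variance-covariance matrix of each fit; a minimal sketch (output not shown):
round(vcov(m4.31), 3)  # centered model
round(vcov(m4.32), 3)  # uncentered model
cov2cor(vcov(m4.32))   # rescale covariances to correlations
In the centered model a and b are nearly uncorrelated, whereas in the uncentered model they are strongly negatively correlated. Despite the different parameterizations, the posterior predictions of the two models are nearly identical, since both describe essentially the same regression line.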
4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
data(cherry_blossoms)
d <- cherry_blossoms
precis(d)
## mean sd 5.5% 94.5% histogram
## year 1408.000000 350.8845964 867.77000 1948.23000 ▇▇▇▇▇▇▇▇▇▇▇▇▁
## doy 104.540508 6.4070362 94.43000 115.00000 ▁▂▅▇▇▃▁▁
## temp 6.141886 0.6636479 5.15000 7.29470 ▁▃▅▇▃▂▁▁
## temp_upper 7.185151 0.9929206 5.89765 8.90235 ▁▂▅▇▇▅▂▂▁▁▁▁▁▁▁
## temp_lower 5.098941 0.8503496 3.78765 6.37000 ▁▁▁▁▁▁▁▃▅▇▃▂▁▁▁
# keep only rows with a recorded doy
d4 <- d[complete.cases(d$doy), ]
numknots <- 15
knot_list <- quantile(d4$year, probs = seq(0, 1, length.out = numknots))
knot_list
## 0% 7.142857% 14.28571% 21.42857% 28.57143% 35.71429% 42.85714% 50%
## 812 1036 1174 1269 1377 1454 1518 1583
## 57.14286% 64.28571% 71.42857% 78.57143% 85.71429% 92.85714% 100%
## 1650 1714 1774 1833 1893 1956 2015
library(splines)
# cubic B-spline basis with 15 knots
B <- bs(d4$year, knots = knot_list[-c(1, numknots)], degree = 3, intercept = TRUE)
plot(NULL, xlim = range(d4$year), ylim = c(0, 1), ylab = "basis")
for (i in 1:ncol(B)) lines(d4$year, B[, i])
# repeat with 30 knots
numknots2 <- 30
knot_list2 <- quantile(d4$year, probs = seq(0, 1, length.out = numknots2))
#knot_list2
B2 <- bs(d4$year, knots = knot_list2[-c(1, numknots2)], degree = 3, intercept = TRUE)
plot(NULL, xlim = range(d4$year), ylim = c(0, 1), ylab = "basis")
for (i in 1:ncol(B2)) lines(d4$year, B2[, i])
With 30 knots the basis functions are narrower and more numerous, so the resulting spline is less smooth and wigglier than with 15 knots. Widening the prior on the weights has a similar effect: each basis function can then pull the curve further from the mean. Together, the number of knots and the width of the weight prior control how flexible (wiggly) the spline is allowed to be.
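The plots above show only the basis functions. To see the fitted spline itself, one can fit the chapter’s spline model with each basis and vary the prior SD on the weights; a sketch following the chapter’s m4.7 (the name m4.81 and the alternative prior widths are assumptions to experiment with):
m4.81 <- quap(
  alist(
    D ~ dnorm( mu , sigma ),
    mu <- a + B %*% w,
    a ~ dnorm( 100 , 10 ),
    w ~ dnorm( 0 , 10 ),  # try dnorm(0, 1) and dnorm(0, 100) as well
    sigma ~ dexp( 1 )
  ),
  data = list( D = d4$doy , B = B2 ),
  start = list( w = rep( 0 , ncol(B2) ) ) )
# posterior mean curve with a 97% interval over the raw data
mu <- link(m4.81)
mu_PI <- apply(mu, 2, PI, 0.97)
plot(d4$year, d4$doy, col = col.alpha(rangi2, 0.3), pch = 16)
shade(mu_PI, d4$year, col = col.alpha("black", 0.5))
A tighter prior such as dnorm(0, 1) shrinks the weights toward zero and smooths the spline, while a wider one lets the curve chase the data.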
4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
library(rethinking)
data(Howell1)
set.seed(4)
d5 <- Howell1[Howell1$age < 18, ]
nrow(d5)
## [1] 192
# Part a
formula <- alist(
height ~ dnorm(mu, sigma),
mu <- a + b * weight,
a ~ dnorm(140, 10),
b ~ dnorm(0, 10),
sigma ~ dunif(0, 60)
)
m <- quap(formula, data = d5)
precis(m, corr = TRUE)
## mean sd 5.5% 94.5%
## a 59.808780 1.39732445 57.575585 62.041974
## b 2.650638 0.06833996 2.541418 2.759859
## sigma 8.465144 0.43481329 7.770228 9.160059
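# Interpretation: for every 10 units of increase in weight, the model predicts a child gets about 26.5 cm taller (10 × 2.65).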
# Part b
plot(height ~ weight, data = d5, col = col.alpha(rangi2, 0.3))
weight.seq <- seq(from = min(d5$weight), to = max(d5$weight), by = 1)
# posterior distribution of the mean height at each weight
mu <- link(m, data = data.frame(weight = weight.seq))
mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
# superimpose the MAP regression line and the 89% HPDI for the mean
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)
# add the 89% HPDI for simulated (predicted) heights
sim.height <- sim(m, data = list(weight = weight.seq))
height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)
# Part c: The model underestimates heights for weights between about 10 and 30 and overestimates them below 10 and above 30. This suggests the linearity assumption is wrong: the relationship between weight and height is not linear over this age range.