Assignment #3

4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

# y_i ∼ Normal(μ, σ) is the likelihood.

4E2. In the model definition just above, how many parameters are in the posterior distribution?

# There are two parameters in the model: μ and σ.

4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.

# Pr(μ,σ|y)=∏iNormal(yi|μ,σ)Normal(μ|0,10)Exponential(σ|1) / ∫ ∫ ∏iNormal(hi|μ,σ)Normal(μ|0,10)Exponential(σ|1)dμdσ

4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]

# μ_i = α + βx_i is the linear model.

4E5. In the model definition just above, how many parameters are in the posterior distribution?

# There are two parameters α and βx_i.

4M1. For the model definition below, simulate observed y values from the prior (not the posterior). \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]

sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- runif(1e4, 0, 10)
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_y)

4M2. Translate the model just above into a quap formula.

formula <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dunif(0, 10)
)

4M3. Translate the quap model formula below into a mathematical model definition:

y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )

# fit model
flist <- alist(
  y ~ dnorm(mu, sigma),
  mu <- a + b*x,
  a ~ dnorm(0, 10),
  b ~ dunif(0, 1),
  sigma ~ dunif(0, 1)
)

4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.

#h_i ∼ Normal(μ, σ)
#μ_i = α + βx_i
#α ∼ Normal(150,25)
#β ∼ Normal(4,2)
#σ ∼ Uniform(0,30)
#For the α prior, I chose a normal distribution with centered on 150 cm with an standard deviation of 25 cm; 150 is the middele of the expected distribution if all students are inclueded and 2 standard deviation will be enough to cover the highest and lowest of the distribtion. 

#For the β prior, I chose a normal distribution centered on 4 cm/year with an standard deviation of 2 cm/year; 4 cm/year is in the middle of the expected distribution if students are included and 2 cm/year as standard deviation will be enough to cover the highest and lowest of the distribtion. 

#For the σ prior, I chose an uniform distribution from 0 cm to 40 cm. Even though 40 cm is a bit high, it would be good to have an conservative prior to begin with.

4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How?

# Yes, based on this info provided, we can narrow the range of my prior distributions to make heights and growth rates from older less likely. We will decrese prior mean from 150cm to 120 cm and decrease the SD from 20cm to 10 cm to reflect our knowledge about the average height. Also, We will recenter β prior around 7 cm/year and decrease the SD to 1cm/year. Then we will reduce the maximum value in the 𝜎 prior to 20cm because a higher SD is less likely with such a low average height.

#h_i ∼ Normal(μ, σ)
#μ_i = α + βx_i
#α ∼ Normal(120,10)
#β ∼ Normal(7,1)
#σ ∼ Uniform(0,20)

4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?

#Based on the information that variance would never more than 64cm, then σ would never more than 8cm. We should make below changes.
#h_i ∼ Normal(μ, σ)
#μ_i = α + βx_i
#α ∼ Normal(120,10)
#β ∼ Normal(7,1)
#σ ∼ Uniform(0,8)

4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.

library(rethinking)
data(Howell1)
d <- Howell1
d2 <- d[d$age >= 18 , ]
xbar <-mean(d2$weight)

m4.3a <-quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu<-a+b*(weight-xbar),
    a ~ dnorm(178, 20),
    b ~ dnorm(0,1),
    sigma ~ dunif(0,50)
  ), data=d2)
m4.3a

## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * (weight - xbar)
## a ~ dnorm(178, 20)
## b ~ dnorm(0, 1)
## sigma ~ dunif(0, 50)
## 
## Posterior means:
##          a          b      sigma 
## 154.601367   0.903448   5.072021 
## 
## Log-likelihood: -1071.01

m4.3b <- quap(
    alist(
        height ~ dnorm( mu , sigma ) ,
        mu <- a + b*weight ,
        a ~ dnorm(178, 20) ,
        b ~ dnorm( 0 , 1 ) ,
        sigma ~ dunif( 0 , 50 )
), data=d2 )
m4.3b

## 
## Quadratic approximate posterior distribution
## 
## Formula:
## height ~ dnorm(mu, sigma)
## mu <- a + b * weight
## a ~ dnorm(178, 20)
## b ~ dnorm(0, 1)
## sigma ~ dunif(0, 50)
## 
## Posterior means:
##           a           b       sigma 
## 114.5259764   0.8909155   5.0726724 
## 
## Log-likelihood: -1071.07

# Parameter a decreased while other parameters are still consistent.

4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?

data(cherry_blossoms)
d<-cherry_blossoms
precis(d)

##                   mean          sd      5.5%      94.5%
## year       1408.000000 350.8845964 867.77000 1948.23000
## doy         104.540508   6.4070362  94.43000  115.00000
## temp          6.141886   0.6636479   5.15000    7.29470
## temp_upper    7.185151   0.9929206   5.89765    8.90235
## temp_lower    5.098941   0.8503496   3.78765    6.37000
##                                                                                                                           histogram
## year                       <U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2587><U+2581>
## doy                                                                <U+2581><U+2582><U+2585><U+2587><U+2587><U+2583><U+2581><U+2581>
## temp                                                               <U+2581><U+2583><U+2585><U+2587><U+2583><U+2582><U+2581><U+2581>
## temp_upper <U+2581><U+2582><U+2585><U+2587><U+2587><U+2585><U+2582><U+2582><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581>
## temp_lower <U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2581><U+2583><U+2585><U+2587><U+2583><U+2582><U+2581><U+2581><U+2581>

d4<-d[complete.cases(d$doy),]
numknots<-15
knot_list<-quantile(d4$year, probs=seq(0,1,length.out=numknots))
knot_list

##        0% 7.142857% 14.28571% 21.42857% 28.57143% 35.71429% 42.85714%       50% 
##       812      1036      1174      1269      1377      1454      1518      1583 
## 57.14286% 64.28571% 71.42857% 78.57143% 85.71429% 92.85714%      100% 
##      1650      1714      1774      1833      1893      1956      2015

library(splines)
B<- bs(d4$year,knots=knot_list[-c(1,numknots)],degree=3,intercept = TRUE)

plot(NULL, xlim=range(d4$year),ylim=c(0,1),  ylab="basis")
for (i in 1:ncol(B)) lines(d4$year,B[,i])

numknots2<-50
knot_list2<-quantile(d4$year, probs=seq(0,1,length.out=numknots2))
knot_list2

##        0% 2.040816% 4.081633% 6.122449% 8.163265% 10.20408%  12.2449% 14.28571% 
##  812.0000  912.8571  965.7143 1016.5714 1071.8571 1112.5714 1141.2857 1174.0000 
## 16.32653% 18.36735% 20.40816% 22.44898%  24.4898% 26.53061% 28.57143% 30.61224% 
## 1201.8571 1226.7143 1255.7143 1285.4286 1319.5714 1356.1429 1377.0000 1406.8571 
## 32.65306% 34.69388% 36.73469% 38.77551% 40.81633% 42.85714% 44.89796% 46.93878% 
## 1425.7143 1445.1429 1462.4286 1484.2857 1501.1429 1518.0000 1534.8571 1551.7143 
## 48.97959% 51.02041% 53.06122% 55.10204% 57.14286% 59.18367% 61.22449% 63.26531% 
## 1570.5714 1591.4286 1609.2857 1629.4286 1650.0000 1668.8571 1685.7143 1702.5714 
## 65.30612% 67.34694% 69.38776% 71.42857% 73.46939%  75.5102% 77.55102% 79.59184% 
## 1722.4286 1740.2857 1757.1429 1774.0000 1790.8571 1807.7143 1824.5714 1841.4286 
## 81.63265% 83.67347% 85.71429%  87.7551% 89.79592% 91.83673% 93.87755% 95.91837% 
## 1858.2857 1876.1429 1893.0000 1910.8571 1929.7143 1947.5714 1964.4286 1981.2857 
## 97.95918%      100% 
## 1998.1429 2015.0000

B2<- bs(d4$year,knots=knot_list2[-c(1,numknots2)],degree=3,intercept = TRUE)

plot(NULL, xlim=range(d4$year),ylim=c(0,1),  ylab="basis")
for (i in 1:ncol(B2)) lines(d4$year,B2[,i])

#When increase the number of knots, the spline more wiggly. Based on the chart in chapter 4, the combination of knot number and the prior on the weights controls how wiggly the spline can be.

4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

data(Howell1)
set.seed(4)

d3 <- Howell1[Howell1$age < 18, ]
nrow(d3)

## [1] 192

#(a)
formula <- alist(
  height ~ dnorm(mu, sigma),
  mu <- a + b * weight,
  a ~ dnorm(110, 30),
  b ~ dnorm(0, 10),
  sigma ~ dunif(0, 60)
)
m <- map(formula, data = d3)
precis(m, corr = TRUE)

##            mean         sd      5.5%     94.5%
## a     58.344986 1.39574028 56.114323 60.575648
## b      2.715033 0.06823408  2.605982  2.824085
## sigma  8.437253 0.43057695  7.749108  9.125398

#For each 10 unit increase in weight, the model predicts a 27.2 cm increase in height.

#(b)
plot(height ~ weight, data = d3, col = col.alpha(rangi2, 0.3))

weight.seq <- seq(from = min(d3$weight), to = max(d3$weight), by = 1)
mu <- link(m, data = data.frame(weight = weight.seq))

mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)

## Warning in if (class(object) == "formula") {: the condition has length > 1 and
## only the first element will be used

## Warning in if (class(object) == "density") {: the condition has length > 1 and
## only the first element will be used

## Warning in if (class(object) == "matrix" & length(dim(object)) == 2) {: the
## condition has length > 1 and only the first element will be used

## Warning in if (class(object) == "matrix") {: the condition has length > 1 and
## only the first element will be used

sim.height <- sim(m, data = list(weight = weight.seq))

height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)

## Warning in if (class(object) == "formula") {: the condition has length > 1 and
## only the first element will be used

## Warning in if (class(object) == "density") {: the condition has length > 1 and
## only the first element will be used

## Warning in if (class(object) == "matrix" & length(dim(object)) == 2) {: the
## condition has length > 1 and only the first element will be used

## Warning in if (class(object) == "matrix") {: the condition has length > 1 and
## only the first element will be used

#(c)
#As I can see from the plot, the model did not fit as good as expected. It obviously overestimates height at both low (<10) and high (>30) weights and underestimates height for the middle (10 to 30) weights. I think the assumption that the relationship between 𝜇 and weight is linear can be problematic.I would try to use apolynomial (e.g., quadratic) regression.

Assignment #3

Xinrui Zhang

2020-06-07