This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
4E1. In the model definition below, which line is the likelihood? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]
#The line y_i ∼ Normal(μ, σ) is the likelihood
4E2. In the model definition just above, how many parameters are in the posterior distribution?
#There are two parameters in the posterior distribution: μ and σ
4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.
#\[Pr(\mu,\sigma |y) =\frac{\prod_{i}Normal(y_{i}|\mu,\sigma)Normal(\mu|0,10)Uniform(\sigma|0,10)}{\int\int\prod_{i}Normal(h_{i}|\mu,\sigma)Normal(\mu|0,10)Uniform(\sigma|0,10)d\mu d\sigma}\]
4E4. In the model definition below, which line is the linear model? \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i = α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(2) \\ \end{align}\]
#The line μ_i = α + βx_i is the linear model.
4E5. In the model definition just above, how many parameters are in the posterior distribution?
# There are three parameters in the posterior distribution: α, β and σ.
4M1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]
mu <- rnorm(10000, 0, 10)
sigma <- runif(10000, 0, 10)
prior <- rnorm(10000, mu, sigma)
dens(prior)
4M2. Translate the model just above into a quap formula.
quap <- alist(
mu ~ dnorm(0, 10),
sigma ~ dexp(1),
y ~ dnorm(mu, sigma))
4M3. Translate the quap model formula below into a mathematical model definition:
y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
#\[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ_i ∼ α + βx_i \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Uniform(0, 1) \\ \ σ ∼ Exponential(1) \\ \end{align}\]
4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?
#hi ∼ Normal(μ,σ)
#μi = α+βxi
#α ~ Normal(175, 15)
#β ~ Uniform(0, 15)
#σ ~ Uniform(0, 50)
Sim <- 10000
a <- rnorm(Sim, 175, 15)
b <- rnorm(Sim, 0, 15)
sigma <- runif(Sim, 3, 50)
height <- rnorm(Sim, a + b * 0, sigma)
dens(height)
#I chose an average of 66.5” height for the students taking into account both male and female students. Converting to cm, this equates to approximately 169 cm. I randomly chose 15 cm as the α (standard deviation). I chose a simulation N value of 10000 as that seems to be what we’ve used previously. For β, I chose an average of 0 and standard deviation of 15.Based on the plot below, it looks like most students will be roughly between 120 cm and 250 cm in height. One thing we can do to make these priors more likely is to have a greater understanding of what type of students the question is referring to. The choice of students is from a school or a college or kindergarten? Also, age plays a major role in this aspect when it related to students. This would impact what average we choose to start the simulations with.
4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot.
# This would lead me to change my priors because if the students are getting taller each year, it is more likely that the students are not in college or high school, perhaps elementary or primary school. In this case, we would begin with a smaller average number.
Sim <- 10000
a <- rnorm(Sim, 130, 15)
b <- rnorm(Sim, 0, 15)
sigma <- runif(Sim, 3, 50)
height <- rnorm(Sim, a + b * 0, sigma)
dens(height)
4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
# Variance is defined as the square of sigma. So, if sigma is never greater than 8 (8^2=64), we would adjust our sigma equation maximum prior to be sigma = runif(Sim, 3, 8).
4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. Show the pairs() plot. What is different? Then compare the posterior predictions of both models.
data(Howell1)
d<-Howell1
d2<-d[d$age>=18, ]
xbar <- mean(d2$weight)
m4.3 <-quap(
alist(
height ~dnorm(mu,sigma),
mu <-a+b*(weight-xbar),
a ~dnorm(178,20),
b ~dlnorm(0,1),
sigma ~dunif(0,50)
) ,data=d2)
precis(m4.3)
## mean sd 5.5% 94.5%
## a 154.6013710 0.27030750 154.1693674 155.0333746
## b 0.9032808 0.04192361 0.8362788 0.9702828
## sigma 5.0718779 0.19115450 4.7663761 5.3773797
m4.3n <-quap(
alist(
height ~dnorm(mu,sigma),
mu <-a+b,
a ~dnorm(178,20),
b ~dlnorm(0,1),
sigma ~dunif(0,50)
) ,data=d2)
precis(m4.3n)
## mean sd 5.5% 94.5%
## a 154.2465717 0.5452629 153.3751362 155.1180072
## b 0.3608285 0.3573249 -0.2102457 0.9319028
## sigma 7.7313161 0.2913844 7.2656274 8.1970047
# From the two results below, we see that the after removing the mean weight- xbar, the alpha value seems to be similar. The value of beta is lower in the second model m4.3n and sigma is higher in the second model.
4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
data(cherry_blossoms)
d <- cherry_blossoms
d2 <- d[complete.cases(d$doy), ]
library(splines)
num_knots <- 60
knot_list <- quantile(d2$year, probs=seq(0,1,length.out=num_knots))
B <- bs(d2$year, knots=knot_list[-c(1,num_knots)] ,degree=3, intercept=TRUE)
ms <- quap(
alist(
D ~ dnorm(mu, sigma),
mu <- a + B %*% w,
a ~ dnorm(100, 10),
w ~ dnorm(0, 10),
sigma ~ dexp(1)
), data=list(D=d2$doy, B=B),
start = list(w=rep(0, ncol(B)))
)
plot(NULL, xlim=range(d2$year), ylim=c(0,1), xlab="year", ylab="basis value")
for (i in 1:ncol(B)) lines(d2$year, B[,i])
ms <- quap(
alist(
D ~ dnorm(mu, sigma),
mu <- a + B %*% w,
a ~ dnorm(100, 10),
w ~ dnorm(0, 100),
sigma ~ dexp(1)
), data=list(D=d2$doy, B=B),
start = list(w=rep(0, ncol(B)))
)
# Looking at the plot, it is evident that by increasing the number of knots makes the chart look more compressed because more arches are fitted in the same area, making it more difficult to read. Also, by increasing the width, the chart looks more expanded and easier to read.
4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
data(Howell1)
d <- Howell1
d1 <- d[d$age < 18, ]
nrow(d1)
## [1] 192
xbar <- mean(d1$weight)
xbar
## [1] 18.41419
modeld1 <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b * (weight - xbar),
a ~ dnorm(156, 20),
b ~ dlnorm(0, 1),
sigma ~ dunif(0, 50)
),
data=d1
)
precis(modeld1)
## mean sd 5.5% 94.5%
## a 108.361855 0.60863579 107.389137 109.334572
## b 2.716606 0.06831565 2.607425 2.825788
## sigma 8.437193 0.43056992 7.749060 9.125327
# (A) From the result below, we see that as the beta mean for the model is 2.72, this means that for every 10 units, the child is expected to grow 10*2.72 = 27.2cm.
# (B)
plot(height ~ weight, data = d1, col = col.alpha(rangi2, 0.3))
weight.seq <- seq(from = min(d1$weight), to = max(d1$weight), by = 1)
mu <- link(modeld1, data = data.frame(weight = weight.seq))
mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, prob = 0.89)
lines(weight.seq, mu.mean)
shade(mu.HPDI, weight.seq)
sim.height <- sim(modeld1, data = list(weight = weight.seq))
height.HPDI <- apply(sim.height, 2, HPDI, prob = 0.89)
shade(height.HPDI, weight.seq)
# (C) The linear model appears to only do a good job of predicting height when weight is around 10 and 30. Less than 10 and greater than 30, the model appears to predict the height above the actual values while model appears to predict the height slightly lower than the actual values when the weight ranges from 10 to 30. Based on the result,this linear model is not the best fit for the data.