This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.
4E1. In the model definition below, which line is the likelihood? \[\begin{align} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]
# Using the model on page 82, y_i ∼ Normal(μ, σ) is the likelihood
4E2. In the model definition just above, how many parameters are in the posterior distribution?
# There are two parameters in the posterior distribution: μ and σ.
4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.
# Using the model on page 83:
# Pr(μ, σ | y) = ∏_i Normal(y_i | μ, σ) × Normal(μ | 0, 10) × Exponential(σ | 1)
#                / ∫∫ ∏_i Normal(y_i | μ, σ) × Normal(μ | 0, 10) × Exponential(σ | 1) dμ dσ
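To make the formula concrete, here is a minimal grid-approximation sketch (the data vector y below is made up for illustration): it evaluates the numerator on a grid of (μ, σ) values and then normalizes, which is exactly the role the double integral in the denominator plays.
# minimal grid-approximation sketch of the posterior above (y is made-up data)
y <- c( -0.5 , 1.2 , 0.3 )
grid <- expand.grid( mu = seq( -5 , 5 , length.out=100 ) ,
                     sigma = seq( 0.1 , 5 , length.out=100 ) )
# log numerator: log-likelihood summed over observations, plus log-priors
log_num <- sapply( 1:nrow(grid) , function(i)
    sum( dnorm( y , grid$mu[i] , grid$sigma[i] , log=TRUE ) ) ) +
  dnorm( grid$mu , 0 , 10 , log=TRUE ) + dexp( grid$sigma , 1 , log=TRUE )
post <- exp( log_num - max(log_num) )
post <- post / sum(post)   # dividing by the sum plays the role of the integral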
4E4. In the model definition below, which line is the linear model? \[\begin{align} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(2) \end{align}\]
# Using the model on page 93, μ_i = α + βx_i is the linear model
4E5. In the model definition just above, how many parameters are in the posterior distribution?
# There are three parameters in the posterior distribution α, β, and σ.
4M1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]
# sample 10,000 (mu, sigma) pairs from the priors, then one y per pair
sample_mu <- rnorm( 1e4 , mean=0 , sd=10 )
sample_sigma <- rexp( 1e4 , rate=1 )
prior_y <- rnorm( 1e4 , sample_mu , sample_sigma )
dens( prior_y )   # dens() comes from the rethinking package
4M2. Translate the model just above into a quap formula.
flist <- alist(
y ~ dnorm(mu, sigma),
mu ~ dnorm(0, 10),
sigma ~ dexp(1)
)
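To check that the formula parses, it can be fit to some simulated stand-in data (the data below is an assumption for illustration, not part of the exercise):
library(rethinking)
y <- rnorm( 50 , 1 , 2 )            # made-up data just to exercise the formula
m <- quap( flist , data=list(y=y) )
precis( m )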
4M3. Translate the quap model formula below into a mathematical model definition:
y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
# yi ~ Normal(µi, σ)
# µi = a + bxi
# a ~ Normal(0, 10)
# b ~ Uniform(0, 1)
# σ ~ Exponential(1)
4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?
Here hi is the height and xi is the year of the ith observation. α is the average height of a student at year zero. For α I considered a Normal distribution with mean 120 and standard deviation 20; for β, a Normal distribution with mean 0 and standard deviation 10; and for σ, a Uniform distribution over the interval [0, 50].
The simulated heights mostly stay between roughly 50 cm and 150 cm, so that is the range in which most first-year students would be expected to fall. Because the Normal prior on β puts half its mass on negative slopes, some simulated students shrink from year to year; constraining β to positive values (as in 4M5 below) would make these priors more plausible.
# The mathematical model definition:
# hi ~ Normal(µi, σ)
# µi = α + βxi
# α ~ Normal(120, 20)
# β ~ Normal(0, 10)
# σ ~ Uniform(0, 50)
library(rethinking)
set.seed(2971)
N <- 50                          # 50 students, one prior draw each
a <- rnorm( N , 120 , 20 )       # intercept: average height at year zero
b <- rnorm( N , 0 , 10 )         # slope: height change per year
sigma <- runif( N , 0 , 50 )
# plot the 50 prior regression lines over the three observed years
plot( NULL , xlim=c(1,3) , ylim=c(-100,400) ,
      xlab="year" , ylab="height (cm)" )
abline( h=0 , lty=2 )
text( 1.2 , 20 , "Embryo" , col="red" )
abline( h=272 , lty=1 , lwd=0.5 )
text( 1.6 , 292 , "World's tallest person (272cm)" , col="red" )
mtext( "b ~ dnorm(0,10)" )
for ( i in 1:N ) abline( a=a[i] , b=b[i] , col=col.alpha("black",0.2) )
# prior-predictive heights at year zero
height <- rnorm( N , a + b*0 , sigma )
dens( height )
4M5. Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot.
Yes: if every student got taller each year, the slope must be positive. Defining β as Log-Normal(0,1) claims that the logarithm of β has a Normal(0,1) distribution, which forces β itself to be strictly positive.
# The mathematical model definition:
# hi ~ Normal(µi, σ)
# µi = α + βxi
# α ~ Normal(120, 20)
# β ~ Log-Normal(0, 1)
# σ ~ Uniform(0, 50)
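A quick prior check of that claim (a small sketch, not required by the exercise): every draw from rlnorm(0, 1) is positive, and the logs of the draws are approximately standard normal.
# sanity check: Log-Normal(0,1) slopes are strictly positive
b_check <- rlnorm( 1e4 , 0 , 1 )
min( b_check ) > 0                                        # TRUE
round( c( mean(log(b_check)) , sd(log(b_check)) ) , 2 )   # near 0 and 1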
set.seed(2971)
N <- 50
a <- rnorm( N , 120 , 20 )
b <- rlnorm( N , 0 , 1 )         # Log-Normal prior keeps every slope positive
sigma <- runif( N , 0 , 50 )
plot( NULL , xlim=c(1,3) , ylim=c(-100,400) ,
      xlab="year" , ylab="height (cm)" )
abline( h=0 , lty=2 )
text( 1.2 , 20 , "Embryo" , col="red" )
abline( h=272 , lty=1 , lwd=0.5 )
text( 1.6 , 292 , "World's tallest person (272cm)" , col="red" )
mtext( "b ~ dlnorm(0,1)" )
for ( i in 1:N ) abline( a=a[i] , b=b[i] , col=col.alpha("black",0.2) )
4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
# A variance of 64 cm² corresponds to a standard deviation of sqrt(64) = 8 cm, so σ can never exceed 8 cm. Only the σ prior needs revising: its upper bound drops from 50 to 8.
# hi ~ Normal(µi, σ)
# µi = α + βxi
# α ~ Normal(120, 20)
# β ~ Log-Normal(0, 1)
# σ ~ Uniform(0, 8)
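A small prior-predictive sketch of what the revised σ prior implies: with σ at most 8 cm, heights within a single age group should rarely deviate more than about 16 cm (two standard deviations) from their mean.
# prior-predictive spread around the group mean under sigma ~ Uniform(0, 8)
sigma <- runif( 1e4 , 0 , 8 )
h_dev <- rnorm( 1e4 , 0 , sigma )       # deviations from the age-group mean
quantile( h_dev , c( 0.025 , 0.975 ) )  # well within roughly +/- 16 cm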
4M7. Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. Show the pairs() plot. What is different? Then compare the posterior predictions of both models.
The estimates for β and σ are essentially unchanged; only α differs. Before, α was the average height when weight − x̄ = 0, i.e. at the mean weight; now α is the expected height when weight = 0, an extrapolation far outside the observed data. The covariance structure also changes: in the uncentered model, α and β covary strongly and negatively (the a–b entry of the covariance matrix below is −0.078), so every time the slope increases a bit, the intercept decreases to compensate. The posterior predictions, however, look essentially the same for both models.
# load data again, since it's a long way back
library(rethinking)
data(Howell1);
d <- Howell1;
d2 <- d[ d$age >= 18 , ]
# define the average weight, x-bar
xbar <- mean(d2$weight)
# fit model
m4.3 <- quap(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*( weight - xbar ) ,
a ~ dnorm( 178 , 20 ) ,
b ~ dlnorm( 0 , 1 ) ,
sigma ~ dunif( 0 , 50 )
),
data=d2
)
precis(m4.3)
## mean sd 5.5% 94.5%
## a 154.6013689 0.27030769 154.1693650 155.033373
## b 0.9032809 0.04192364 0.8362788 0.970283
## sigma 5.0718815 0.19115484 4.7663791 5.377384
round(vcov(m4.3), 3)
## a b sigma
## a 0.073 0.000 0.000
## b 0.000 0.002 0.000
## sigma 0.000 0.000 0.037
plot( height ~ weight , data=d2 , col=rangi2, main = "Centered Model" )
post <- extract.samples( m4.3 )
a_map <- mean(post$a)
b_map <- mean(post$b)
curve( a_map + b_map*(x - xbar) , add=TRUE )
# uncentered model
m4.3_u <- quap(
alist(
height ~ dnorm( mu, sigma),
mu <- a + b * weight ,
a ~ dnorm( 178, 20),
b ~ dlnorm( 0, 1),
sigma ~ dunif(0, 50)
),
data = d2
)
precis(m4.3_u)
## mean sd 5.5% 94.5%
## a 114.5088794 1.89281648 111.4837931 117.5339657
## b 0.8913027 0.04164946 0.8247388 0.9578666
## sigma 5.0595455 0.19000935 4.7558739 5.3632172
round(vcov(m4.3_u), 3)
## a b sigma
## a 3.583 -0.078 0.009
## b -0.078 0.002 0.000
## sigma 0.009 0.000 0.036
plot( height ~ weight , data=d2 , col=rangi2, main = "Uncentered Model" )
post <- extract.samples( m4.3_u )
a_map <- mean(post$a)
b_map <- mean(post$b)
curve( a_map + b_map*x , add=TRUE )
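To make the covariance statement concrete, a small sketch reusing the two fitted models above: compute the a–b correlation in the uncentered posterior and overlay the two MAP lines to confirm that the predictions coincide.
# correlation between intercept and slope in the uncentered posterior
post_u <- extract.samples( m4.3_u )
round( cor( post_u$a , post_u$b ) , 2 )   # strongly negative
# overlay the two MAP lines: solid = centered, dashed red = uncentered
post_c <- extract.samples( m4.3 )
plot( height ~ weight , data=d2 , col=col.alpha(rangi2,0.3) )
curve( mean(post_c$a) + mean(post_c$b)*(x - xbar) , add=TRUE , lwd=2 )
curve( mean(post_u$a) + mean(post_u$b)*x , add=TRUE , lty=2 , col="red" )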
4M8. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
The curve with 30 knots is less smooth and wigglier than the curve with 15 knots, and with the wider prior w ~ N(0,100) it gets wigglier still. More knots means more basis functions, so the curve can fit smaller local details.
Widening the prior on the weights allows the weights to become larger. If the prior standard deviation is small, the weights stay close to zero and the curve hugs the mean line; if the weights can grow, the curve can develop more pronounced peaks. The combination of knot number and prior width therefore controls how flexible — how wiggly — the spline is allowed to be.
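The knot-to-basis-function relationship can be checked directly with splines::bs (a standalone sketch with made-up x values): with degree-3 splines and intercept=TRUE, the number of basis functions equals the number of interior knots plus four.
library(splines)
# made-up predictor just to count basis functions
x <- seq( 1 , 100 , length.out=200 )
B15 <- bs( x , knots=seq( 5 , 95 , length.out=13 ) , degree=3 , intercept=TRUE )
B30 <- bs( x , knots=seq( 5 , 95 , length.out=28 ) , degree=3 , intercept=TRUE )
ncol( B15 )   # 17 basis functions for 15 knots (13 interior)
ncol( B30 )   # 32 basis functions for 30 knots (28 interior)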
library(rethinking)
data(cherry_blossoms)
d <- cherry_blossoms
precis(d)
## mean sd 5.5% 94.5%
## year 1408.000000 350.8845964 867.77000 1948.23000
## doy 104.540508 6.4070362 94.43000 115.00000
## temp 6.141886 0.6636479 5.15000 7.29470
## temp_upper 7.185151 0.9929206 5.89765 8.90235
## temp_lower 5.098941 0.8503496 3.78765 6.37000
##                histogram
## year       ▇▇▇▇▇▇▇▇▇▇▇▇▁
## doy        ▁▂▅▇▇▃▁▁
## temp       ▁▃▅▇▃▂▁▁
## temp_upper ▁▂▅▇▇▅▂▂▁▁▁▁▁▁▁
## temp_lower ▁▁▁▁▁▁▁▃▅▇▃▂▁▁▁
d2 <- d[ complete.cases(d$doy) , ] # complete cases on doy
num_knots <- 15
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots) )
library(splines)
B <- bs(d2$year,
knots=knot_list[-c(1,num_knots)],
degree=3 , intercept=TRUE )
plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis", main = "15 Knots" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i] )
m4.7 <- quap(
alist(
D ~ dnorm( mu , sigma ) ,
mu <- a + B %*% w ,
a ~ dnorm(100,10),
w ~ dnorm(0,10),
sigma ~ dexp(1)
),
data=list( D=d2$doy , B=B ),
start=list( w=rep( 0 , ncol(B) ) )
)
post <- extract.samples( m4.7 )
w <- apply( post$w , 2 , mean )
plot( NULL , xlim=range(d2$year) , ylim=c(-6,6) ,
xlab="year" , ylab="basis * weight", main = "15 Knots" )
for ( i in 1:ncol(B) ) lines( d2$year , w[i]*B[,i] )
mu <- link( m4.7 )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16, main = "15 Knots" )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )
# change knots to 30
num_knots <- 30
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots) )
library(splines)
B <- bs(d2$year,
knots=knot_list[-c(1,num_knots)],
degree=3 , intercept=TRUE )
plot( NULL , xlim=range(d2$year) , ylim=c(0,1) , xlab="year" , ylab="basis", main = "30 Knots" )
for ( i in 1:ncol(B) ) lines( d2$year , B[,i] )
m4.7 <- quap(
alist(
D ~ dnorm( mu , sigma ) ,
mu <- a + B %*% w ,
a ~ dnorm(100,10),
w ~ dnorm(0,10),
sigma ~ dexp(1)
),
data=list( D=d2$doy , B=B ),
start=list( w=rep( 0 , ncol(B) ) )
)
post <- extract.samples( m4.7 )
w <- apply( post$w , 2 , mean )
plot( NULL , xlim=range(d2$year) , ylim=c(-6,6) ,
xlab="year" , ylab="basis * weight", main = "30 Knots" )
for ( i in 1:ncol(B) ) lines( d2$year , w[i]*B[,i] )
mu <- link( m4.7 )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16, main = "30 Knots" )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )
# change width to 100
ms <- quap(
alist(
D ~ dnorm( mu, sigma ),
mu <- a + B %*% w,
a ~ dnorm( 100, 10 ),
w ~ dnorm( 0, 100 ),
sigma ~ dexp(1)
), data=list(D=d2$doy, B=B) ,
start = list( w=rep( 0, ncol(B)))
)
mu <- link( ms )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16, main = "30 Knots with prior w ~ N(0,100)" )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )
# change sd to 1
ms <- quap(
alist(
D ~ dnorm( mu, sigma ),
mu <- a + B %*% w,
a ~ dnorm( 100, 10 ),
w ~ dnorm( 0, 1 ),
sigma ~ dexp(1)
), data=list(D=d2$doy, B=B) ,
start = list( w=rep( 0, ncol(B)))
)
mu <- link( ms )
mu_PI <- apply(mu,2,PI,0.97)
plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16, main = "30 Knots with prior w ~ N(0,1)" )
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )
4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Since the weight values are centered, the intercept a is the expected height at the centering weight x̄. Note that x̄ here is still the mean adult weight from 4M7 (about 45 kg), so a ≈ 179.1 is an extrapolation to the very heaviest children, not the average child height; centering on the children's own mean weight would make a the sample's average height. The slope b means that for every 10 kg of additional weight, a child is predicted to be about 26.7 cm taller. The relatively large sigma suggests considerable residual uncertainty in the predictions.
Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
We begin by computing the regression line: we create a sequence over the full range of weights, then sample from the posterior distribution to compute samples of mu at each weight, from which we compute the mean and the 89% PI. We also compute the 89% PI for the simulated (predicted) heights.
What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
The linear model does not seem to be a very good fit for these data; it performs especially poorly at the lowest and highest weights. The most problematic assumption is that the relationship between μ and weight is linear. One way to improve the model would be a polynomial (e.g., quadratic) or spline model; a sketch follows the code below.
d <- Howell1
d2 <- d[ d$age < 18 , ]
nrow(d2)
## [1] 192
# a)
model <- quap(
alist(
height ~ dnorm( mu, sigma) ,
mu <- a + b * ( weight - xbar ) ,   # xbar is still the adult mean weight (~45 kg) from 4M7
a ~ dnorm( 140, 10) ,
b ~ dlnorm( 0, 1 ) ,
sigma ~ dunif(0, 60)
),
data=d2
)
precis(model)
## mean sd 5.5% 94.5%
## a 179.081199 1.88973895 176.061032 182.101367
## b 2.668085 0.06752118 2.560173 2.775997
## sigma 8.450797 0.43261348 7.759397 9.142197
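The 10-unit answer can be read straight off the slope; a one-line check using the posterior samples:
# predicted height gain for 10 more kg of weight
post <- extract.samples( model )
round( 10 * mean( post$b ) , 1 )   # about 26.7 cm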
# b)
# define sequence of weights to compute predictions for
# these values will be on the horizontal axis
weight.seq <- seq( from=4, to=45, length.out = 30 )
# use link to compute mu
# for each sample from posterior
# and for each weight in weight.seq
mu <- link( model , data=data.frame(weight=weight.seq) )
str(mu)
## num [1:1000, 1:30] 69.4 70.2 69.1 70.2 69.7 ...
# summarize the distribution of mu
mu.mean <- apply( mu , 2 , mean )
mu.PI <- apply( mu , 2 , PI , prob=0.89 )
post <- extract.samples(model)
sim.height <- sim( model , data=list(weight=weight.seq) )
height.PI <- apply( sim.height , 2 , PI , prob=0.89 )
# plot raw data
plot(height ~ weight, data = d2, col = col.alpha(rangi2, 0.5))
# plot the MAP line, aka the mean mu for each weight
lines( weight.seq , mu.mean )
# plot a shaded region for the 89% PI of the mean
shade( mu.PI , weight.seq )
# plot the 89% PI for predicted heights
shade( height.PI , weight.seq )
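As a sketch of the improvement suggested in part (c) — the priors here are an assumption, loosely chosen rather than tuned — a quadratic model over standardized weights could look like this:
# hedged sketch of a quadratic model over standardized weight
d2$weight_s <- ( d2$weight - mean(d2$weight) ) / sd(d2$weight)
d2$weight_s2 <- d2$weight_s^2
model_quad <- quap(
  alist(
    height ~ dnorm( mu , sigma ) ,
    mu <- a + b1*weight_s + b2*weight_s2 ,
    a ~ dnorm( 140 , 20 ) ,        # assumed prior on height at the mean weight
    b1 ~ dlnorm( 0 , 1 ) ,         # positive linear term
    b2 ~ dnorm( 0 , 1 ) ,          # curvature, allowed either sign
    sigma ~ dunif( 0 , 60 )
  ) ,
  data=d2
)
precis( model_quad )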