This chapter (Chapter 4) introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for ßvisualizing prior and posterior distributions.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html
file as: YourName_ANLY505-Year-Semester.html and publish
the assignment to your R Pubs account and submit the link to Canvas.
Each question is worth 5 points.
4-1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]
# Get mean and standard deviation of the model
NUM_POINTS <- 1e4
sample_mean <- rnorm(NUM_POINTS, 0, 10)
sample_stdev <- runif(NUM_POINTS, 0, 10)
# Simulated distribution of prior
prior_y <- rnorm(NUM_POINTS, sample_mean, sample_stdev)
dens(prior_y)
# The observed y values from the prior generated a vaguely normal distribution centered around 0.
4-2. Translate the model just above into a quap formula. ß
# The quadratic approximation for the model above is as follow
flist <- alist(
y ~ dnorm(mu, sigma),
mu ~ dnorm(0, 10),
sigma ~ dunif(0, 10)
)
4-3. Translate the quap model formula below into a
mathematical model definition:
y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
# Mathematical model
# y_i ~ Normal(μ, σ)
# μ_i = α + β*x_i
# α ~ Normal(0, 10)
# β ~ Uniform(0, 1)
# σ ~ Exponential(1)
4-4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height (cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?
# Model assumption:
# Average height = 130 cm with a standard deviation of 20 cm
# Average growth rate = 5 cm annually with a standard deviation of 2 cm
# Sigma between 0 and 5 to control for long tail height of taller than 200 cm
# Mathematical model
# height_i ~ Normal(μ, σ)
# μ_i = α + β*year_i
# α ~ Normal(130, 20)
# β ~ Normal(5, 2)
# σ ~ Uniform(0, 5)
# Simulate the priors chosen
alpha <- rnorm(NUM_POINTS, 130, 20)
beta <- rnorm(NUM_POINTS, 5, 2)
sigma <- runif(NUM_POINTS, 0, 5)
# Simulated distribution of prior at year 0
height_0 <- rnorm(NUM_POINTS, alpha + beta*0, sigma)
dens(height_0)
# Simulate for 50 students starting from year 0 to year 3
plot(NULL,
xlim=range(0:3),
ylim=c(0, 300),
main='Height of 50 Students over 3-Year Period',
xlab='Year',
ylab='Simulated Height (cm)')
for (i in 1:50) curve(alpha[i] + beta[i] * x,
from=0,
to=3,
add = TRUE,
col=col.alpha("black"))
# To make these prior more likely, we can incorporate actual data on age range and average height, which would help us narrow down the prior. Note that the current model simulate a pretty even growth rate across all students.
4-5. (a) Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot. (b) Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
# Part a
# Knowing that every student got taller each year, we can update the beta to be uniform instead of normal, since it will better simulate this new information
# Mathematical model
# height_i ~ Normal(μ, σ)
# μ_i = α + β*year_i
# α ~ Normal(130, 20)
# β ~ Uniform(0, 10)
# σ ~ Uniform(0, 5)
# Simulate the priors chosen
alpha <- rnorm(NUM_POINTS, 130, 20)
beta <- runif(NUM_POINTS, 0, 10)
sigma <- runif(NUM_POINTS, 0, 5)
# Simulated distribution of prior at year 0
height_0 <- rnorm(NUM_POINTS, alpha + beta*0, sigma)
dens(height_0)
# Simulate for 50 students starting from year 0 to year 3
plot(NULL,
xlim=range(0:3),
ylim=c(0, 300),
main='Height of 50 Students over 3-Year Period\nKnown Height Growth Model',
xlab='Year',
ylab='Simulated Height (cm)')
for (i in 1:50) curve(alpha[i] + beta[i] * x,
from=0,
to=3,
add = TRUE,
col=col.alpha("black"))
# Note that the new prior creates a variety of growth rate for students, so grow taller at a faster rate than others.
# Part b
# If the variance is never more than 64 cm, that means, the standard deviation is 8 cm.
# Mathematical model
# height_i ~ Normal(μ, σ)
# μ_i = α + β*year_i
# α ~ Normal(130, 20)
# β ~ Uniform(0, 10)
# σ ~ Uniform(0, 8)
# Simulate the priors chosen
alpha <- rnorm(NUM_POINTS, 130, 20)
beta <- runif(NUM_POINTS, 0, 10)
sigma <- runif(NUM_POINTS, 0, 8)
# Simulated distribution of prior at year 0
height_0 <- rnorm(NUM_POINTS, alpha + beta*0, sigma)
dens(height_0)
# Simulate for 50 students starting from year 0 to year 3
plot(NULL,
xlim=range(0:3),
ylim=c(0, 300),
main='Height of 50 Students over 3-Year Period\nKnown Height Growth and Variance Model',
xlab='Year',
ylab='Simulated Height (cm)')
for (i in 1:50) curve(alpha[i] + beta[i] * x,
from=0,
to=3,
add = TRUE,
col=col.alpha("black"))
# This new prior created a slightly different simulated height than the model with only known height growth. However, they both look quite similar.