This chapter (Chapter 4) introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
4-1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]
library(rstan)
library(rethinking)
library(tidyverse)
# Determination for simulating the observed y values from the prior rather than posterior.
Sample_mu <- rnorm( 1e4, 0, 10 ) # Sampling mean or mu from prior determination.
# Sampling standard deviation from prior determination.
Sample_sigma <- rexp( 1e4, 1) # Exp Rate is equivalent to 1 and N = 1e4 or 10000.
# Simulation of the observed y values by sampling from the prior.
Prior_y <- rnorm( 1e4, Sample_mu, Sample_sigma)
# Plotting density of the simulation of the observed y values from prior.
Plot_y <- dens(Prior_y)
4-2. Translate the model just above into a quap formula.
# Determination of the Quap formula from the translation of the above prior model.
Quap_Formula_List <- alist(
y ~ dnorm(mu , sigma) ,
mu ~ dnorm(0, 10) ,
sigma ~ dexp(1)
)
4-3. Translate the quap model formula below into a mathematical model definition:
y ~ dnorm(mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
# Translation of the Quap model formula into the mathematical model definition:
## y_i ∼ Normal(𝞵i, 𝝈)
## 𝞵_i = a + bxi
## a ~ Normal(0, 10)
## b ~ Normal(0, 1)
## xi ~ Normal(0, 1)
## 𝝈 ∼ Exponential(1)
4-4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?
## Determination of the mathematical model definition for the above scenario's regression.
# Note: In the mathematical model Height_i represent the students height,
# height_i ~ Normal(𝞵i, 𝝈)
## 𝞵i ~ a + bxi
# Note: Selected the normal distribution with mean of 155 cm and standard deviation of 30 cm for prior Intercept "a" and this parameter capture students heights.
# a ~ Normal(155, 30)
# Note: Selected the normal distribution with mean of 4 and standard deviation of 2 for prior slope b for the inclusion of students at high and low extremes and prior b captures the yearly growth rate for various age ranges.
## b ~ Normal(4, 2)
# Note: Selected the Uniform distribution of interval of ) to 50 cm for prior𝝈 for this model for the inclusion of wide range of students of nearly same age group and prior 𝝈 is expected to cover all or full students age ranges.
# 𝝈 ~ Uniform(0, 50)
## The depicted mathematical model definition would be as under for simulation from prior.
## Mathematical Formula definition:
# height_i ~ Normal(𝞵i, 𝝈)
# 𝞵i ~ a + bxi
# a ~ Normal(155, 30)
# b ~ Normal(4, 2)
# 𝝈 ~ Uniform(0, 50)
## Determination for Simulation from prior calculation and above models regression.
set.seed(12345)
N <- 50
a <- rnorm(N, 155, 30)
b <- rnorm(N, 4, 2)
sigma <- runif(N, 0, 50)
## Since per our defined condition the years for sample students heights were measured for period of three years.
Year <- seq(from = 1, to = 3, len = 3)
## Plotting for the 50 linear relationships with height(cm) on the y-axis and year on the x-axis.
plot(NULL, xlim = c(1,3), ylim = c(50, 300),
xlab = "Years", ylab = "Student Heights(cm)", main = "Students Heights Vs Years")
for (i in 1:50)
lines(Year, a[i] +b[i]* Year, lwd =3, col =5)
4-5. (a) Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot. (b) Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
##(a):
# Yes, since as per the additional provided scenario condition, that the students are expected to get taller each year, and also considering the fact that the students heights are in generally expected in majority to increase till the age of 18 years, so per these conditions, my choice of priors will also be influenced and to accommodate comparatively narrower ranges of prior distributions selections for height and growth rate and the revised mathematical model under the new and revised selection of prior would be as below:
## Revised Mathematical model:
# height_i ~ Normal(𝞵i, 𝝈)
# 𝞵i ~ a + bxi
# a ~ Normal(120, 10)
# b ~ Normal(5, 1)
# 𝝈 ~ Uniform(0, 15)
## Determination for Simulation from prior calculation and above models regression.
set.seed(12345)
N <- 50
a <- rnorm(N, 120, 10)
b <- rnorm(N, 5, 1)
sigma <- runif(N, 0, 15)
## Since per condition years sample students heights were measured for period of three years.
Year <- seq(from = 1, to = 3, len = 3)
## Plotting for the 50 linear relationships with height(cm) on the y-axis and year on the x-axis.
plot(NULL, xlim = c(1,3), ylim = c(50, 300),
xlab = "Years", ylab = "Student Heights(cm)", main = "Students Heights Vs Years")
for (i in 1:50)
lines(Year, a[i] +b[i]* Year, lwd =3, col =16)
##(b):
# As per the additional scenario condition that the the students height's variance of the same age is never greater than 64 cm, then as per this condition, the maximum value for our sigma would not exceed 8 cm and per this development, our revised new adjusted mathematical model from prior would be as under:
## Revised Mathematical model:
# height_i ~ Normal(𝞵i, 𝝈)
# 𝞵i ~ a + bxi
# a ~ Normal(120, 10)
# b ~ Normal(5, 1)
# 𝝈 ~ Uniform(0, 8)