This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard deviations can produce an observation. To fit these models to data, the chapter introduced quadratic approximation of the posterior distribution and the tool quap. It also introduced new procedures for visualizing prior and posterior distributions.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
4-1. For the model definition below, simulate observed y values from the prior (not the posterior). Make sure to plot the simulation. \[\begin{align} \ y_i ∼ Normal(μ, σ) \\ \ μ ∼ Normal(0, 10) \\ \ σ ∼ Exponential(1) \\ \end{align}\]
sample_mu <- rnorm(1e4, 0, 10)
sample_sigma <- rexp(1e4, 1)
prior_y <- rnorm(1e4, sample_mu, sample_sigma)
plot(density(prior_y))
4-2. Translate the model just above into a quap formula.
formula = alist(
y ~ dnorm( mu, sig ),
mu ~ dnorm( 0, 10 ),
sig ~ dunif( 0, 10 )
)
4-3. Translate the quap model formula below into a mathematical model definition:
y ~ dnorm( mu , sigma ),
mu <- a + b*x,
a ~ dnorm( 0 , 10 ),
b ~ dunif( 0 , 1 ),
sigma ~ dexp( 1 )
print('yi ~ Normal(μ,σ)
μi = α+βxi
α ~ Normal(0,50)
β ~ Uniform(0,10)
σ ~ Uniform(0,50)')
## [1] "yi ~ Normal(μ,σ) \nμi = α+βxi\nα ~ Normal(0,50) \nβ ~ Uniform(0,10) \nσ ~ Uniform(0,50)"
4-4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors. Simulate from the priors that you chose to see what the model expects before it sees the data. Do this by sampling from the priors. Then consider 50 students, each simulated with a different prior draw. For each student simulate 3 years. Plot the 50 linear relationships with height(cm) on the y-axis and year on the x-axis. What can we do to make these priors more likely?
print('hi ~ Normal(μ,σ)
μi = α+βxi
α ~ Normal(150,25) #β ~ Normal(4,2)
σ ~ Uniform(0,50)')
## [1] "hi ~ Normal(μ,σ) \nμi = α+βxi\nα ~ Normal(150,25) #β ~ Normal(4,2)\nσ ~ Uniform(0,50)"
4-5. (a) Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How? Again, simulate from the priors and plot.
print('Yes. Because we know that an increase in year will always lead to increased height, we know that β will be positive. Therefore, our prior should reflect this by using, for example, a log-normal distribution.
β∼Log-Normal(2,0.5))
This log-normal prior has an 89% highest density interval between about 2.3 and 14.1cm growth per year.')
## [1] "Yes. Because we know that an increase in year will always lead to increased height, we know that β will be positive. Therefore, our prior should reflect this by using, for example, a log-normal distribution.\nβ∼Log-Normal(2,0.5))\nThis log-normal prior has an 89% highest density interval between about 2.3 and 14.1cm growth per year."
library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
ggplot(tibble(x = c(0, 20)), aes(x = x)) +
stat_function(fun = dexp, args = list(rate = 1)) +
labs(x = expression(sigma), y = "Density")
4-6. In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
library(rethinking)
## Loading required package: rstan
## Loading required package: StanHeaders
## rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
##
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
##
## extract
## Loading required package: parallel
## Loading required package: dagitty
## rethinking (Version 2.01)
##
## Attaching package: 'rethinking'
## The following object is masked from 'package:purrr':
##
## map
## The following object is masked from 'package:stats':
##
## rstudent
data(cherry_blossoms)
d<-cherry_blossoms
precis(d)
## mean sd 5.5% 94.5% histogram
## year 1408.000000 350.8845964 867.77000 1948.23000 ▇▇▇▇▇▇▇▇▇▇▇▇▁
## doy 104.540508 6.4070362 94.43000 115.00000 ▁▂▅▇▇▃▁▁
## temp 6.141886 0.6636479 5.15000 7.29470 ▁▃▅▇▃▂▁▁
## temp_upper 7.185151 0.9929206 5.89765 8.90235 ▁▂▅▇▇▅▂▂▁▁▁▁▁▁▁
## temp_lower 5.098941 0.8503496 3.78765 6.37000 ▁▁▁▁▁▁▁▃▅▇▃▂▁▁▁
d4<-d[complete.cases(d$doy),]
numknots<-15
knot_list<-quantile(d4$year, probs=seq(0,1,length.out=numknots))
knot_list
## 0% 7.142857% 14.28571% 21.42857% 28.57143% 35.71429% 42.85714% 50%
## 812 1036 1174 1269 1377 1454 1518 1583
## 57.14286% 64.28571% 71.42857% 78.57143% 85.71429% 92.85714% 100%
## 1650 1714 1774 1833 1893 1956 2015
library(splines)
B<- bs(d4$year,knots=knot_list[-c(1,numknots)],degree=3,intercept = TRUE)
plot(NULL, xlim=range(d4$year),ylim=c(0,1), ylab="basis")
for (i in 1:ncol(B)) lines(d4$year,B[,i])
numknots2<-30
knot_list2<-quantile(d4$year, probs=seq(0,1,length.out=numknots2))
B2<- bs(d4$year,knots=knot_list2[-c(1,numknots2)],degree=3,intercept = TRUE)
plot(NULL, xlim=range(d4$year),ylim=c(0,1), ylab="basis")
for (i in 1:ncol(B2)) lines(d4$year,B2[,i])