Gaussian Models and Linear Regression
February 23, 2024
Discuss: Why does the normal distribution show up so often in many apparently unrelated fields of study?
Definition: The normal distribution arises naturally from the combination of a large number of independent random events or factors.
“A typical example is a person’s height, which is determined by a combination of many independent factors, both genetic and environmental. Each of these factors may tend to increase or decrease a person’s height, just as a ball in Galton’s board may bounce to the right or the left at each level. As Galton’s board shows, when you combine many chance factors, the resulting distribution is binomial. By the Central Limit Theorem, when the number of independent factors is very large, the binomial distribution is approximated by a normal curve.” - Paul Trow
Central Limit Theorem: According to the
central limit theorem , the sum or mean of a large number of measurements randomly sampled from a non-normal population is approximately normally distributed.
Summary
Summary
Definition: The
normal distribution is a continuous probability distribution describing a bell-shaped curve. It is a good approximation to the frequency distributions of many biological variables.
The probability density function \(f(Y)\) for a random normal variable is given by \[ f(Y) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}}, \] where \(\mu\) and \(\sigma\) are mean and standard deviation of \(Y\), respectively.
The probability density function \(f(Y)\) for a random normal variable is given by \[ f(Y) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}}, \] where \(\mu\) and \(\sigma\) are mean and standard deviation of \(Y\), respectively.
Three pieces:
Binomial model
\[ \begin{array}{rl} W\!\!\!\! & \sim \mathrm{Binomial}(N,p) \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) \end{array} \] where
\[ \begin{array}{rl} W & = \textrm{Observed count of water}, \\ N & = \textrm{Observed number of tosses}, \\ p & = \textrm{Unobserved proportion of water on globe}. \end{array} \]
Binomial model
\[ \begin{array}{rl} W\!\!\!\! & \sim \mathrm{Binomial}(N,p) \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) \end{array} \] In words: The count \(W\) is distributed binomially with sample size \(N\) and probability \(p\). The prior for \(p\) is assumed to be uniform between zero and one.
Notation alert: The \(\sim\) means the variable on the left is stochastic, i.e. it is defined by a probability distribution.
Binomial model
\[ \begin{array}{rll} W\!\!\!\! & \sim \mathrm{Binomial}(N,p), & \color{red}{Likelihood} \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) & \color{red}{Prior} \end{array} \]
Posterior distribution
\[ \mathrm{Pr}(p|w,n) = \frac{\mathrm{Binomial}(w|n,p)\mathrm{Uniform}(p|0,1)}{\int \mathrm{Binomial}(w|n,p)\mathrm{Uniform}(p|0,1)\,dp} \]
Binomial model
Posterior distribution
\[ \mathrm{Pr}(p|w,n) = \frac{\mathrm{Binomial}(w|n,p)\mathrm{Uniform}(p|0,1)}{\int \mathrm{Binomial}(w|n,p)\mathrm{Uniform}(p|0,1)\,dp} \]
Grid-approximation using dunif
for prior:
… quadratic approximation
\[ \begin{array}{rl} W\!\!\!\! & \sim \mathrm{Binomial}(N,p) \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) \end{array} \]
… quadratic approximation
\[ \begin{array}{rl} W\!\!\!\! & \sim \mathrm{Binomial}(N,p) \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) \end{array} \]
Plot approximate posterior
\(n=9\)
\(n=18\)
\(n=36\)
Black curve is density estimate of quadratic approximate posterior.
Red curve is grid-approximate posterior.
Raw data model for proportion
We can represent our data as the collection of tosses, W W L W W L W L W, instead of number of water (e.g. \(x=6\)).
In this case we can code water as 1 and land as 0. The Bernoulli distribution gives the probability of water or land for only one toss and is given by:
\[ \mathrm{Pr}(x) = \left\{\begin{array}{rl} 1-p, & x = 0 \ \mathrm{(land)}, \\ p, & x = 1 \ \mathrm{(water)} \end{array} \right. \]
Raw data model for proportion
\[ \begin{array}{rl} t_{i}\!\!\!\! & \sim \mathrm{Bernoulli}(p) \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) \end{array} \]
Posterior distribution
\[ \mathrm{Pr}(p|t) = \frac{\Pi_{i}\mathrm{Bernoulli}(t_i|p)\mathrm{Uniform}(p|0,1)}{\int \Pi_{i}\mathrm{Bernoulli}(t_i|p)\mathrm{Uniform}(p|0,1)\,dp} \]
\(\Pi_i\) means product (since \(t_{i}\) are independent).
Raw data model for proportion
\[ \begin{array}{rl} t_{i}\!\!\!\! & \sim \mathrm{Bernoulli}(p) \\ p\!\!\!\! & \sim \mathrm{Uniform}(0,1) \end{array} \]
Raw data model for proportion
Bernoulli equivalent to binomial
\[ \mathrm{Bernoulli}(t_i|p) = \left\{\begin{array}{rl} 1-p, & t_i = 0 \ \mathrm{(land)}, \\ p, & t_i = 1 \ \mathrm{(water)} \end{array} \right. \]
Suppose data is \(t = (1, 1, 0, 1, 0)\), i.e. W W L W L. Then, likelihood is given by:
\[ \begin{array}{ll} \mathrm{Pr}(t_1, t_2, t_3, t_4, t_5) & = \Pi_{i=1}^{5}Pr(t_i) \\ & = \Pi_{i=1}^{5}\mathrm{Bernoulli}(t_i|p) \\ & = p\times p \times (1-p) \times p \times (1-p) \\ & = p^3(1-p)^2 \end{array} \]
Thus, the likelihood function is the binomial (scaled).
import {aq, op} from "@uwdata/arquero" // JavaScript dplyr
import {vl} from "@vega/vega-lite-api-v5" // JavaScript ggplot2
ss = require('simple-statistics')
viewof gaussian_mu = Inputs.range(
[-5, 5],
{value: 0.0, step: 0.1, label: "mu"}
)
viewof gaussian_sd = Inputs.range(
[0.1, 5],
{value: 1.0, step: 0.1, label: "sd"}
)
N = 1000;
The prior and posterior will be a joint probability distribution \(\mathrm{Pr}(\mu, \sigma)\).
This can be also written as \[ \mathrm{Pr}(\mu \ \mathrm{and} \ \sigma) \]
Since this is a probability distribution, we also have \[ \int\int Pr(\mu,\sigma)\,d\mu\, d\sigma = 1 \]
The prior and posterior will be a joint probability distribution \(\mathrm{Pr}(\mu, \sigma)\).
Surface plot
Contour plot
Contour plot (with marginals)
Definition: Marginal distributions of a bivariate probability distribution are probability distributions of only one of the variables after having “integrated out” the other.
For example:
\[ \begin{array}{ll} \mathrm{Pr}(\mu) & = \int \mathrm{Pr}(\mu,\sigma)\,d\sigma, \\ \mathrm{Pr}(\sigma) & = \int \mathrm{Pr}(\mu,\sigma)\,d\mu \end{array} \]
We’ll look at partial census data for the Dobe area !Kung San (from interviews by Nancy Howell in late 1960s).
The rethinking
package has a summary function that can also be useful called precis
:
mean sd 5.5% 94.5% histogram
height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
weight 35.6106176 14.7191782 9.360721 54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
age 29.3443934 20.7468882 1.000000 66.13500 ▇▅▅▃▅▂▂▁▁
male 0.4724265 0.4996986 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▇
The precis
function calculates mean and standard deviation of the variables for us, as well as 89% percentile intervals. It also shows tiny little histograms of each variable.
We’re going to start by modeling height with a Gaussian distribution.
Hmmmm, super long tail to the left.
To start, we’ll focus only on adults (age is highly correlated with height before adulthood).
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu,\sigma), & \color{red}{Likelihood} \\ \mu\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\mu \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Posterior distribution \[ \small{ \mathrm{Pr}(\mu,\sigma|h) = \frac{\mathrm{Likelihood}(h|\mu,\sigma)\mathrm{Pr}(\mu,\sigma)}{\int\int \mathrm{Likelihood}(h|\mu,\sigma)\mathrm{Pr}(\mu,\sigma)\,d\mu d\sigma} } \]
Usually, the priors will be independent, and thus we can specify their distributions independently: \[ \begin{array}{ll} \mathrm{Pr}(\mu, \sigma) & = \mathrm{Pr}(\mu)\,\mathrm{Pr}(\sigma) \\ & = \mathrm{Normal}(178,20)\mathrm{Uniform}(0,50) \end{array} \]
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu,\sigma), & \color{red}{Likelihood} \\ \mu\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\mu \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Posterior distribution \[ \small{ \mathrm{Pr}(\mu,\sigma|h) = \frac{\mathrm{Likelihood}(h|\mu,\sigma)\mathrm{Normal}(178,20)\mathrm{Uniform}(0,50)}{\int\int \mathrm{Likelihood}(h|\mu,\sigma)\mathrm{Normal}(178,20)\mathrm{Uniform}(0,50)\,d\mu d\sigma} } \]
Similarly, data \(h_i\) are independent, so that \[ \begin{array}{ll} \mathrm{Likelihood}(h|\mu,\sigma) & = \mathrm{Pr}(h_1 \ \mathrm{and} \ h_2 \ \mathrm{and} \ldots h_n|\mu,\sigma) \\ & = \Pi_i\mathrm{Pr}(h_i|\mu,\sigma) \\ & = \Pi_i\mathrm{Normal}(h_i|\mu,\sigma) \end{array} \]
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu,\sigma), & \color{red}{Likelihood} \\ \mu\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\mu \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Posterior distribution \[ \small{ \mathrm{Pr}(\mu,\sigma|h) = \frac{\Pi_{i}\mathrm{Normal}(h_i|\mu,\sigma)\mathrm{Normal}(\mu|178,20)\mathrm{Uniform}(\sigma|0,50)}{\int\int \Pi_{i}\mathrm{Normal}(h_i|\mu,\sigma)\mathrm{Normal}(\mu|178,20)\mathrm{Uniform}(\sigma|0,50)\,d\mu d\sigma} } \]
Sample from prior
Prior predictive distribution
Definition: Since every posterior is also potentially a prior for a subsequent analysis, we can process priors just like posteriors. We thus can define a prior predictive distribution as an average data distribution, averaged over values of the prior.
Prior predictive distributions are important to understand how our parameters affect the data (in this case observable height). This helps you diagnose bad choices for priors.
Prior predictive simulation
Consider a less informative (flatter) prior for \(\mu\).
\[ \mu \sim \mathrm{Normal}(178, 100) \]
We can compute a grid-approximation of the posterior. Won’t show code though as it is uninformative at this stage.
Sample from the posterior
As before, we can sample from the posterior distribution. With two parameters, we end up with a data frame of \((\mu, \sigma)\) pairs:
mu sigma
1 154.6465 7.585859
2 154.8485 7.444444
3 154.2424 7.686869
4 154.6465 7.747475
5 154.9495 7.868687
6 155.4545 7.727273
Sample from the posterior
Here’s a scatterplot of the samples:
\[ \mathrm{Normal}(\mathrm{h_i}|\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(h_i-\mu)^2}{2\sigma^2}}, \]
Parameter distribution
Data distributions
Sample from the posterior
Red is prior; blue is posterior.
That is a very precise posterior!
We’ll be doing quadratic approximations for the rest of Bayesian Inference in this course. \[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu,\sigma), & \color{red}{Likelihood} \\ \mu\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\mu \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Quadratic approximate posterior distribution
Formula:
height ~ dnorm(mu, sigma)
mu ~ dnorm(178, 20)
sigma ~ dunif(0, 50)
Posterior means:
mu sigma
154.606900 7.730484
Log-likelihood: -1219.41
We can use precis
on the resulting posterior to get some summary information:
mean sd 5.5% 94.5%
mu 154.606900 0.4119494 153.948525 155.265274
sigma 7.730484 0.2913060 7.264921 8.196047
Thus, our best estimates for the population parameters \(\mu\) and \(\sigma\) are 154.6068997 and 7.7304838, respectively.
We also have 89% percentile intervals for each parameter estimate.
We can get a 95% PI as well:
mean sd 2.5% 97.5%
mu 154.606900 0.4119494 153.799494 155.414306
sigma 7.730484 0.2913060 7.159534 8.301433
Compare this to a t-test:
Question: Why 89%??
“But I don’t recommend 95% intervals, because readers will have a hard time not viewing them as significance tests. 89 is also a prime number, so if someone asks you to justify it, you can stare at them meaningfully and incant, ‘Because it is prime.’ That’s no worse justification than the conventional justification of 95%.” - Richard McElreath
How does prior affect posterior?
Consider a more informative prior for \(\mu\): \[ \mu \sim \mathrm{Normal}(178, 20) \longrightarrow \mu \sim \mathrm{Normal}(178, 0.1) \]
Consider a more informative prior for \(\mu\): \[ \mu \sim \mathrm{Normal}(178, 20) \longrightarrow \mu \sim \mathrm{Normal}(178, 0.1) \]
mean sd 5.5% 94.5%
mu 154.606900 0.4119494 153.948525 155.265274
sigma 7.730484 0.2913060 7.264921 8.196047
mean sd 5.5% 94.5%
mu 177.86375 0.1002354 177.7036 178.02395
sigma 24.51761 0.9289275 23.0330 26.00221
Sampling from posterior
You can use the rethinking
function extract.samples
to get random samples from a quadratic approximate posterior.
Sampling from posterior
You can use precis
to summarize the samples as well:
mean sd 5.5% 94.5% histogram
mu 154.606772 0.4132838 153.94176 155.261869 ▁▁▅▇▂▁▁
sigma 7.730985 0.2899272 7.26226 8.201302 ▁▁▁▁▂▅▇▇▃▁▁▁▁
Values should be close to MAP estimates from before:
Posterior predictive simulation
Posterior predictive check
hist( d2$height, freq=FALSE, ylim=c(0, 0.06) ) # Trained data distribution
dens( ppd_h, xlab="Height (cm)", add = TRUE) # Density approx for posterior predictive distribution
curve( dnorm(x, precis( post )$mean[1], precis( post )$mean[2]), from=min(ppd_h), to=max(ppd_h), col="red", add=TRUE ) # MAP estimate
Question: How does height in the Kalahari foragers (the outcome variable) covary with weight (the predictor variable)?
Definition: A Linear Model models one parameter as a linear function of other parameters and one or more predictor variables.
Technically, the linear regression equation is
\[\mu_{Y\, |\, X=X^{*}} = \alpha + \beta X^{*},\]
were \(\mu_{Y\, |\, X=X^{*}}\) is the mean of \(Y\) in the sub-population with \(X=X^{*}\).
You are predicting the mean of Y given X.
We go from this… \[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu,\sigma), & \color{red}{Likelihood} \\ \mu\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\mu \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
…to this. \[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
where \(h_i\) is the ith person’s height (in cm), \(w_i\) is the ith person’s weight (in kg), and \(\bar{w}\) is the average weight (in kg).
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Bayesian inference on this model “considers all the lines that relate height to weight and ranks them by plausibility, given the data.”
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Question: What is the data?
Answer: \(h_i\), \(w_i\), and \(\bar{w}\).
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Question: What are the parameters and their units? (Hint: \(h_i\) has units cm and \(w_i\) has units kg)
Answer: \(\alpha\) (cm), \(\beta\) (cm/kg), and \(\sigma\) (cm).
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Question: How can we interpret \(\alpha\) and \(\beta\)?
Answer: \(\alpha\) is the average height for individuals at average weight. \(\beta\) is the change in height with 1 kg change in weight.
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Question: Notice anything about \(\sigma\)?
Answer: It’s constant and doesn’t vary like \(\mu_i\). This is a model assumption!
\[ \begin{array}{rll} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \color{red}{Likelihood} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \color{red}{Linear \ model} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \color{red}{\alpha \ prior} \\ \beta\!\!\!\! & \sim \mathrm{Normal}(0, 10), & \color{red}{\beta \ prior} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \color{red}{\sigma \ prior} \end{array} \]
Question: Does \(\beta=0\) have a natural interpretation?
Answer: Yes! It corresponds to no relationship between height and weight.
Prior Predictive Simulations
We’ll simulate the linear models for various randomly chosen values of \(\alpha\) and \(\beta\).
First, we’ll draw 100 random samples from each prior distribution.
Prior Predictive Simulations
Second, we’ll plot the 100 lines using the data weight values.
plot( NULL, xlim=range(d2$weight), ylim=c(-100, 400), xlab="Weight (kg)", ylab="Height (cm)" )
abline( h=0, lty=2 ) # Zero height
abline( h=262, lty=1, lwd=0.5 ) # Tallest person
wbar <- mean(d2$weight) # Average weight
for ( i in 1:N )
curve( a[i] + b[i]*(x - wbar), from=min(d2$weight), to=max(d2$weight), add=TRUE, col=col.alpha("black", 0.2) )
Prior Predictive Simulations
Prior Predictive Simulations
Prior Predictive Simulations
Let’s assume \(\beta \geq 0\). Easiest way to enforce this is to use a log-normal distribution for \(\beta\).
Prior Predictive Simulations
Draw 100 random samples from \(\alpha\) and \(\beta\) again, but his time with a log-normal for \(\beta\).
Prior Predictive Simulations
Now draw 100 lines again as before.
Much better!
Shorthand model representation
\[ \begin{array}{rlrl} h_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma), & \texttt{height} & \texttt{~ dnorm(mu, sigma)} \\ \mu_i\!\!\!\! & = \alpha + \beta(w_{i}-\bar{w}), & \texttt{mu} & \texttt{<- a + b*(weight-wbar)} \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(178, 20), & \texttt{a} & \texttt{~ dnorm(178,20)} \\ \beta\!\!\!\! & \sim \textrm{Log-Normal}(0, 1), & \texttt{b} & \texttt{~ dlnorm(0,1)} \\ \sigma\!\!\!\! & \sim \mathrm{Uniform}(0, 50) & \texttt{sigma} & \texttt{~ dunif(0,50)} \end{array} \]
Quadratic approximation
Note: Everything that depends on parameters has a posterior distribution!! In particular, mu
will have uncertainty!
Summaries of parameters
mean sd 5.5% 94.5%
a 154.6013671 0.27030766 154.1693633 155.0333710
b 0.9032807 0.04192363 0.8362787 0.9702828
sigma 5.0718809 0.19115478 4.7663786 5.3773831
Discuss: Interpret these results. What can you say about \(\alpha\)?
The estimate of average height for individuals with average weight is 154.6, with 89% of posterior probability between 154.17 and 155.03.
Summaries of parameters
mean sd 5.5% 94.5%
a 154.6013671 0.27030766 154.1693633 155.0333710
b 0.9032807 0.04192363 0.8362787 0.9702828
sigma 5.0718809 0.19115478 4.7663786 5.3773831
Discuss: Interpret these results. What can you say about \(\beta\)?
Summaries of parameters
mean sd 5.5% 94.5%
a 154.6013671 0.27030766 154.1693633 155.0333710
b 0.9032807 0.04192363 0.8362787 0.9702828
sigma 5.0718809 0.19115478 4.7663786 5.3773831
Discuss: Interpret these results. What can you say about \(\sigma\)?
The parameter \(\sigma\) can be used to estimate how variable height is around the conditional means (\(\mu_{h|w_i}\)). 95% of heights will be within -10.2 cm and 10.2 cm of the conditional means.
Posterior predictive plots
First, let’s look at the linear model using mean point estimates.
\[ \mu_i = \alpha + \beta(w_i - \bar{w}) \]
Linear model
First, let’s look at the linear model using mean point estimates:
Remember, though, that a and b are distributions! How can we add uncertainty around this line?
Let’s plot the line for each sample from the posterior.
Adding uncertainty
A nicer plot using shading.
Conditional posterior predictive distribution for height
Remember, \(\mu\) has a posterior distribution since it depends on the parameters.
Conditional posterior predictive distribution for height
Conditional posterior predictive distribution for height
post.samples <- extract.samples( post ) # (1) Sample from posterior
w_grid <- seq( from=25, to=70, by=1 ) # (2) Define grid of weights
# (3) Compute mu posterior distribution for each weight, and
# (4) Estimate mean and PI of distribution for each weight
mu.mean <- rep(0, length(w_grid))
mu.PI <- matrix(rep(0, 2*length(w_grid)), nrow=2)
for (i in 1:length(w_grid) ) {
mu <- post.samples$a + post.samples$b*(w_grid[i] - wbar)
mu.mean[i] <- mean( mu )
mu.PI[,i] <- PI( mu, prob=0.89 )
}
# (5) Plot mean height as line and PI as shaded interval
plot( height ~ weight, data=d2, col=col.alpha(rangi2, 0.7))
lines( w_grid, mu.mean, lwd=2 )
shade( mu.PI, w_grid, col = col.alpha("black", 0.5) )
Conditional posterior predictive distribution for height
We can use apply
functions to do one-line for-loops:
post.samples <- extract.samples( post ) # (1) Sample from posterior
w_grid <- seq( from=25, to=70, by=1 ) # (2) Define grid of weights
# (3) Compute mu posterior distribution for each weight, and
# (4) Estimate mean and PI of distribution for each weight
mu.link <- function(weight) post.samples$a + post.samples$b*( weight - wbar )
mu <- sapply( w_grid, mu.link )
mu.mean <- apply( mu, 2, mean )
mu.PI <- apply( mu, 2, PI, prob=0.89 )
# (5) Plot mean height as line and PI as shaded interval
plot( height ~ weight, data=d2, col=col.alpha(rangi2, 0.7))
lines( w_grid, mu.mean, lwd=2 )
shade( mu.PI, w_grid, col = col.alpha("black", 0.5) )
Conditional posterior predictive distribution for height
Recommended code: use link
function.
w_grid <- seq( from=25, to=70, by=1 ) # (2) Define grid of weights
# (1 and 3) Compute mu posterior distribution for each weight
mu <- link( post, data=data.frame( weight = w_grid ) )
# (4) Estimate mean and PI of distribution for each weight
mu.mean <- apply( mu, 2, mean )
mu.PI <- apply( mu, 2, PI, prob=0.89 )
# (5) Plot mean height as line and PI as shaded interval
plot( height ~ weight, data=d2, col=col.alpha(rangi2, 0.7))
lines( w_grid, mu.mean, lwd=2 )
shade( mu.PI, w_grid, col = col.alpha("black", 0.5) )
Conditional posterior predictive distribution for height
Recommended code: use link
function.
Conditional posterior predictive distribution for height
The link
function output:
Thus, we have 1000 sampled values from the posterior distribution of mu, for 46 given values of weight.
Prediction intervals for height
We now want to generate 89% prediction intervals for actual heights, not just the average height \(\mu\).
\[ h_i \sim \textrm{Normal}(\mu_i, \sigma) \]
The previous code visualized the uncertainty in \(\mu_i\), which is the linear model of the mean. But we didn’t look at a posterior predictive plot of the heights themselves!
The heights should be normally distributed around the mean, with spread governed by \(\sigma\), the only parameter we haven’t looked at yet.
Prediction intervals for height
Here’s how to construct an 89% prediction interval for height.
N <- 1e3 # Number of samples
post.samples <- extract.samples( post, n=N ) # Sample from posterior
w_grid <- 25:70 # Compute grid of weights
sim.height <- sapply( w_grid, function(weight) # Loop over weights
rnorm(N, # Sample from normal distribution (likelihood)
mean = post.samples$a + post.samples$b*(weight - wbar), # Use linear model for mean
sd = post.samples$sigma # Use sampled sigma
)
)
height.PI <- apply( sim.height, 2, PI, prob=0.89 ) # Compute 89% percentile intervals for each height
Prediction intervals for height
Here’s how to construct an 89% prediction interval for height.
Prediction intervals for height
Package rethinking
has a function called sim
that will (1) sample from posterior and (2) loop over weights and sample from normal distribution.
Prediction intervals for height
Construct an 89% prediction interval for height (shorthand version).
The linear in linear regression is playing double duty. It can refer to
Linear with respect to parameters means the model of the mean is of the form:
\[ \mu_i = \alpha + \beta f(w_i) \]
where \(f\) could be nonlinear!
Consider this model, which creates an exponential relationship between a predictor \(x\) and an outcome variable \(y\). \[ \begin{array}{rl} y_i & \sim \textrm{Normal}(\mu_i, 10)\\ \mu_i & = 15 + 3\exp(x_i) \end{array} \]
Here’s some simulated data from this model:
Bayesian model framework \[ \begin{array}{rl} y_{i}\!\!\!\! & \sim \mathrm{Normal}(\mu_i,\sigma) \\ \mu_i\!\!\!\! & = \alpha + \beta\exp(x_i) \\ \alpha\!\!\!\! & \sim \mathrm{Normal}(15, 5) \\ \beta\!\!\!\! & \sim \textrm{Normal}(0, 3) \\ \sigma\!\!\!\! & \sim \textrm{Log-Normal}(0, 5) \end{array} \]
Summarize posterior
\[ \begin{array}{rl} y_i & \sim \textrm{Normal}(\mu_i, 10)\\ \mu_i & = 15 + 3\exp(x_i) \end{array} \]
N <- 1000
x_grid <- seq( from=0, to=4, length.out=N )
# Uncertainty around mean
mu <- link( post, data=data.frame(x = x_grid) )
mu.mean <- apply( mu, 2, mean )
mu.PI <- apply( mu, 2, PI, prob=0.89 )
# Prediction interval for y
sim.y <- sim( post, data=data.frame(x = x_grid) )
y.PI <- apply( sim.y, 2, PI, prob=0.89 )
# PLOT!
plot( y ~ x, data=exp_data, col=rangi2 ) # Data
lines( x_grid, mu.mean ) # Exponential model for mean
shade( mu.PI, x_grid, col=col.alpha("black", 0.5) ) # Uncertainty around mean
shade( y.PI, x_grid ) # Prediction interval for y
Intro to Quantitative Biology, Spring 2024