Geocentric Models

By: Richard McElreath

whole bit

1| SETUP DATA

d <- Howell1
d2<- subset(Howell1, age>=18)

#center predictor
xbar <- mean(d2$weight)

weight.seq <- seq(25, 70, by=1)

2| FIT MODEL

$h$ = Column of Heights | $x$ = Column of Weights

$h_i$ ~ $Normal(\mu_i, \sigma)$ $\rightarrow$ LIKELIHOOD

$\mu_i=\alpha+\beta(x_i-\overline{x})$ $\rightarrow$ $\mu$ LINEAR MODEL

$\alpha$ ~ $Normal(178, 20)$ $\rightarrow$ $\alpha$ PRIOR

$\beta$ ~ $Normal(0, 10)$ $\rightarrow$ $\beta$ PRIOR

$\sigma$ ~ $Uniform(0, 50)$ $\rightarrow$ $\sigma$ PRIOR

m4.3 <- quap(alist(height ~ dnorm(mu, sigma),mu <- a + b*(weight - xbar),a ~ dnorm(178, 20),b ~ dlnorm(0, 1),sigma ~ dunif(0, 50)),data = d2)

3| GET UNCERTAINTY IN MEAN ($\mu$)

mu <- link(m4.3, data=list(weight=weight.seq))
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI, prob=0.89)

4| GET UNCERTAINTY IN OUTCOMES (PREDICTIVE, INCLUDES ($\sigma$))

y <- sim(m4.3, data=list(weight=weight.seq))
y.PI <- apply(y, 2, PI, prob=0.89)

5| PLOT EVERYTHING

Figures. Showing the 89% prediction interval for height as a function of weihgt. The solid line is the average line for the mean height at each weight. The two shaded regions show different 89% plausible regions. The narrow shaded interval around the line is the distribution of mu. The wider shaded region represents the region within which the model expects to find 89% of the actual heights in the population at each weight. Figure from page 109. One with ggplot, the other base R.

🔔 height

## 'data.frame':    544 obs. of  4 variables:
##  $ height: num  152 140 137 157 145 ...
##  $ weight: num  47.8 36.5 31.9 53 41.3 ...
##  $ age   : num  63 63 65 41 51 35 32 27 19 54 ...
##  $ male  : int  1 0 0 1 0 1 0 1 0 1 ...

##               mean         sd      5.5%     94.5%     histogram
## height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
## weight  35.6106176 14.7191782  9.360721  54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
## age     29.3443934 20.7468882  1.000000  66.13500     ▇▅▅▃▅▂▂▁▁
## male     0.4724265  0.4996986  0.000000   1.00000    ▇▁▁▁▁▁▁▁▁▇

##      height          weight           age             male       
##  Min.   :136.5   Min.   :31.07   Min.   :18.00   Min.   :0.0000  
##  1st Qu.:148.6   1st Qu.:40.26   1st Qu.:28.00   1st Qu.:0.0000  
##  Median :154.3   Median :44.79   Median :39.00   Median :0.0000  
##  Mean   :154.6   Mean   :44.99   Mean   :41.14   Mean   :0.4688  
##  3rd Qu.:160.7   3rd Qu.:49.29   3rd Qu.:51.00   3rd Qu.:1.0000  
##  Max.   :179.1   Max.   :62.99   Max.   :88.00   Max.   :1.0000

1a| Decide priors

When you decide these, they should be based on what you know about the data you are looking at and not based on your data. Also, using really large sd can be wonky, see example on page 83 (tallest man alive).

$h_i$ ~ $Normal(\mu, \sigma)$ $\rightarrow$ LIKELIHOOD

$\mu$ ~ $Normal(178, 20)$ $\rightarrow$ $\mu$ PRIOR

$\sigma$ ~ $Uniform(0, 50)$ $\rightarrow$ $\sigma$ PRIOR

2a| Plot priors

MU PRIOR

SIGMA PRIOR

Here we are assuming a flat prior, or a uniform one.

3a| Simulate heights, sample priors

If you just sampled from the priors, you would get a BIG UNREALISTIC CLOUD. You want an updated posterior based on your data.

$Posterior$ $\propto$ $Likelihood*Prior$

P O S T E R I O R

Here you can see examples of brute force grid approximation, and quadratic approximations using the function quap.

🌋 Imagine

Imagine the posterior as a mountain.

Contour plot: You look at the topographic map.

Sampling: You drop 1,000 pebbles onto the mountain. They roll downhill and settle proportional to the mountain’s height.

The pile of pebbles reconstructs the mountain. That’s what your scatterplot is.

Sampling from the prior tells you what you believed before seeing data.
Sampling from the posterior tells you what you believe after seeing data
$Posterior$ $\propto$ $Likelihood*Prior$

Now that we have the distribution of samples, we can describe the distribution of confidence in each combination of mu and sigma by summarizing the samples.

A| BRUTE FORCE

4a| Grid approximation of posterior

We are doing brute force in this example, but in all future examples, we will use the quadratic approximation to estimate the posterior distribution.

BRUTE FORCE GRID APPROX

THIS LINE IS THE BAYESIAN UPDATE

post$prod <- post$LL + dnorm(post$mu, 178, 20, T)+dunif(post$sigma, 0, 50, T)

Figure. Here is the shape of the posterior, showing a deterministic surface (the entire posterior grid surfaced). The posterior thinks the true population mean is about 154, and the true population standard deviation is about 7.8. Notice it looks a bit taller than wide, that means there is a bit more uncertainty in sigma than mu.

5a| Sample posterior

To study the posterior in more detail, sample it! Since there are two parameters that we want to sample together, we will do it a bit different than chapter 3. In this plot, we go from visualizing the posterior probability surface (above) to drawing random parameter combinations from that surface in proportion to their probability.

Figure. Here are the random draws from the shape above. A random approximation of the posterior surface shown in previous figures. Basically throwing darts at your posterior distribution mountain and seeing what you get. This has probability weighted.

6a| Summarize the samples

Summarize mean, median, mode

Then pick the one that makes the most sense given the distribution. See more about these choices on page 59.

## [1] 154.6116

## [1] 154.6465

## [1] 154.6472

## [1] 7.769174

## [1] 7.767677

## [1] 7.783364

Because both distributions above are pretty symmetrical, using the mean to summarize the samples is appropriate.

Summarize the widths

To summarize the widths of the densities with posterior compatibility intervals:

##       5%      94% 
## 153.9394 155.2525

B| `QUAP`

4b| `quap`

6b| Model Fit

##             mean        sd       5.5%      94.5%
## mu    154.607024 0.4119947 153.948577 155.265471
## sigma   7.731333 0.2913860   7.265642   8.197024

##             mean        sd       5.5%      94.5%
## mu    154.607024 0.4119947 153.948576 155.265471
## sigma   7.731333 0.2913860   7.265642   8.197024

##         mu    sigma
## 1 154.6542 7.291286
## 2 154.6334 7.491592
## 3 154.8193 7.420196
## 4 154.8706 7.508450
## 5 154.6737 7.962302
## 6 154.1006 7.477165

##             mean        sd       5.5%     94.5%       histogram
## mu    154.626379 0.4088174 153.991173 155.26087 ▁▁▁▁▃▅▇▇▇▃▁▁▁▁▁
## sigma   7.742592 0.2975328   7.256779   8.22153     ▁▁▁▂▅▇▇▃▂▁▁

THESE VALUES SHOULD BE IDENTICAL TO THE GRID APPROX VALUES IN PART A.

Different priors

The only thing we are changing here between our old priors and our new priors are the following:

$h_i$ ~ $Normal(\mu, \sigma)$ $\rightarrow$ LIKELIHOOD

$\mu$ ~ $Normal(178,$ 0.01 $)$ $\rightarrow$ $\mu$ PRIOR

$\sigma$ ~ $Uniform(0, 50)$ $\rightarrow$ $\sigma$ PRIOR

The 0.01 within the prior for mu ends up changing sigma significantly, because once the golem is certain that the mean is near 178 (as we added to the mu’s prior below) then the golem has to estimate sigma on that fact. This results in a different posterior for sigma even though all we changed is the prior information about the other parameter.

##            mean        sd      5.5%    94.5%
## mu    177.86382 0.1002354 177.70363 178.0240
## sigma  24.51934 0.9290875  23.03448  26.0042

🧍height vs. weight

PREDICTOR weight

RESPONSE height

R E C A L L

$h_i$ ~ $Normal(\mu, \sigma)$ $\rightarrow$ LIKELIHOOD

$\mu$ ~ $Normal(178, 20)$ $\rightarrow$ $\mu$ PRIOR

$\sigma$ ~ $Uniform(0, 50)$ $\rightarrow$ $\sigma$ PRIOR

L I N E A R _ M O D E L _ U P D A T E:

Below, h = Column of Heights; x = Column of Weights

$h_i$ ~ $Normal(\mu_i, \sigma)$ $\rightarrow$ LIKELIHOOD

$\mu_i=\alpha+\beta(x_i-\overline{x})$ $\rightarrow$ $\mu$ LINEAR MODEL

$\alpha$ ~ $Normal(178, 20)$ $\rightarrow$ $\alpha$ PRIOR

$\beta$ ~ $Normal(0, 10)$ $\rightarrow$ $\beta$ PRIOR

$\sigma$ ~ $Uniform(0, 50)$ $\rightarrow$ $\sigma$ PRIOR

B R E A K D O W N

Jointly these 2 parameters below ask the golem to find a line that relates weight (x) to height (y), a line that passes through $\alpha$ when $x_i=\overline{x}$, and has the slope of $\beta$.

$h_i$ ~ $Normal(\mu_i, \sigma)$ $\rightarrow$ LIKELIHOOD
- Read $h_i$ & $\mu_i$ as “each h” or “each mu”.

$\mu_i=\alpha+\beta(x_i-\overline{x})$ $\rightarrow$ $\mu$ LINEAR MODEL
- Here the linear model is not stochastic (there is no ~ in the model) and $\mu$ is no longer being estimated.
- $\alpha$: What is the expected height when $x_i=\overline{x}$? It is $\alpha$. This bit is often called the INTERCEPT.
- $\beta$: Rate of change in expectation, often called SLOPE.
The remaining lines in the model define distributions for the unobserved variables. These values are often called PARAMETERS and their distributions are called PRIORS
- The 3 parameters are $\alpha$, $\beta$, and $\sigma$. In the first part of this tutorial, $\alpha$ was called $\mu$ and $\sigma$ was also present.
- $\beta$ what is up with this distribution: $Normal(0, 10)$?! Why have a normal distribution with a mean of zero?

A Better Beta

$Normal(0, 10)$

Prior predictive simulation 01:

## [1] 31.07105 62.99259

## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): "linetype"
## is not a graphical parameter

Figure. Not exactly within human reason. So we will use log of normal for beta instead, see tab above.

$Log-Normal(0, 1)$

Figure. Showing the rlnorm() function at work.

Prior predictive simulation 02:

## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): "linetype"
## is not a graphical parameter

Figure. Within human reason given our human height and weight data. Log-Normal is sensible.

POSTERIOR

The code needed to approximate the posterior is similar to what we have already used.

$h_i$ ~ $Normal(\mu_i, \sigma)$ $\rightarrow$ LIKELIHOOD $\rightarrow$ height ~ dnorm(mu, sigma)

$\mu_i=\alpha+\beta(x_i-\overline{x})$ $\rightarrow$ $\mu$ LINEAR MODEL $\rightarrow$ mu <- a+b*(weight-xbar)

$\alpha$ ~ $Normal(178, 20)$ $\rightarrow$ $\alpha$ PRIOR $\rightarrow$ a ~ dnorm(178, 20)

$\beta$ ~ $Normal(0, 10)$ $\rightarrow$ $\beta$ PRIOR $\rightarrow$ b ~ dlnorm(0, 1)

$\sigma$ ~ $Uniform(0, 50)$ $\rightarrow$ $\sigma$ PRIOR sigma ~ dunif(0, 50)

POSTERIOR INTERPRETATION

FIRST precis:

##              mean         sd        5.5%       94.5%
## a     154.6020886 0.27030487 154.1700892 155.0340880
## b       0.9032998 0.04192318   0.8362984   0.9703011
## sigma   5.0718271 0.19114972   4.7663330   5.3773213

INTERPRETATION OF THE ABOVE TABLE

Let’s just talk b

Here you can see the mean is 0.90. Since b is the slope, this can be interpreted as: A person that is 1 kg heavier can be expected to be 0.9 cm taller.

## [1] 89

89% of the posterior probability lies between 0.84 & 0.97 for height changes with 1 kg weight change.

SECOND vcovfor the variance-covariance matrix:

##           a     b sigma
## a     0.073 0.000 0.000
## b     0.000 0.002 0.000
## sigma 0.000 0.000 0.037

PLOTS

The following codes help with plotting the posterior and adding uncertainty around the mean:

##          a         b    sigma
## 1 154.8161 0.9074612 5.176614
## 2 154.6814 0.8692587 5.042741
## 3 155.0951 0.9303838 4.796650
## 4 154.5614 0.8461726 5.313094
## 5 154.2892 0.8130106 4.900169
## 6 155.1583 0.9275360 5.122004

Figure. These plots show the samples from teh quadratic approximate posterior distribution for the height/weight model m4.3, with increasing amounts of data (N). In each plot, 20 lines were sampled from teh posterior distribution, showing the uncertainty in the regression relationship.

PREDICTION RECIPE

Use link to generate distributions of posterior values for $\mu$.
Use summary functions like mean or PI to find averages for lower and upper bounds of $\mu$ for each value of the predictor variable.
Use plotting functions like lines and shades to draw lines and intervals. Or plot distributions of the predictions. It is up to you!

Figure. Showing the first 100 values in the distribution of mu at each weight value (plot on left). Plot on right shows all of the height/weight data with an 89% compatibility interval of the mean (shaded region).

Add in $\sigma$

We want uncertainty in both the posterior and the uncertainty in the Guassian distribution of heights. To do this:

This string is VERY similar to mu, but it contains simulated heights – not the averaged height for mu.

SUMMARIZE IT

summary

1| Regression Models the Mean

Regression does not change the likelihood.

We still assume:

\[ h_i \sim \text{Normal}(\mu_i, \sigma) \]

What changes is the mean:

\[ \mu_i = \alpha + \beta x_i \]

Regression is simply a model for how the mean moves across cases.

2| The Likelihood and the Mean Are Separate

The likelihood describes variation around the mean.
The regression equation describes how the mean changes.

Noise and structure are different ideas.

The likelihood controls noise.
The regression controls structure.

3| Two Types of Uncertainty

There are always two layers of uncertainty.

Mean uncertainty (μ): - Uncertainty in α and β
- Produces the narrow interval

Predictive uncertainty: - Includes σ
- Produces the wide interval

Predictive intervals are always wider.

4| The Posterior Is a Distribution

We do not estimate a single line.

We estimate a distribution over possible lines.

Everything — predictions, intervals, plots — comes from propagating that distribution forward.

Regression is not about “the best line.”
It is about the distribution of plausible lines.

5| Centering Changes Interpretation

When we write:

\[ \mu_i = \alpha + \beta (x_i - \overline{x}) \]

we do not change the slope.

We change what α means.

Now α represents the mean outcome at the average predictor value.

Centering improves interpretability and computation.

6| σ Is Often Skewed

Standard deviation parameters:

Cannot be negative
Often have right-skewed posteriors

This is normal.

Skewness is a geometric consequence of variance estimation.

7| Polynomials and Splines

Linear regression assumes a straight line.

Polynomials allow curvature.
Splines allow flexible curvature.

They change the form of μ.

They do not change the logic of Bayesian inference.

8| The Core of Chapter 4

Regression models the mean.
There are two kinds of uncertainty.
The posterior is a distribution, not a point.
Predictive intervals include σ.

Everything else is implementation detail.

Geocentric Models

whole bit

1| SETUP DATA

2| FIT MODEL

3| GET UNCERTAINTY IN MEAN (\(\mu\))

4| GET UNCERTAINTY IN OUTCOMES (PREDICTIVE, INCLUDES (\(\sigma\)))

5| PLOT EVERYTHING

🔔 height

1a| Decide priors

2a| Plot priors

3a| Simulate heights, sample priors

P O S T E R I O R

🌋 Imagine

A| BRUTE FORCE

4a| Grid approximation of posterior

5a| Sample posterior

6a| Summarize the samples

B| `QUAP`

4b| `quap`

6b| Model Fit

Different priors

🧍height vs. weight

A Better Beta

\(Normal(0, 10)\)

\(Log-Normal(0, 1)\)

POSTERIOR

POSTERIOR INTERPRETATION

PLOTS

PREDICTION RECIPE

Add in \(\sigma\)

summary

1| Regression Models the Mean

2| The Likelihood and the Mean Are Separate

3| Two Types of Uncertainty

4| The Posterior Is a Distribution

5| Centering Changes Interpretation

6| σ Is Often Skewed

7| Polynomials and Splines

8| The Core of Chapter 4

Geocentric Models

whole bit

1| SETUP DATA

2| FIT MODEL

3| GET UNCERTAINTY IN MEAN (\(\mu\))

4| GET UNCERTAINTY IN OUTCOMES (PREDICTIVE, INCLUDES (\(\sigma\)))

5| PLOT EVERYTHING

🔔 height

1a| Decide priors

2a| Plot priors

3a| Simulate heights, sample priors

P O S T E R I O R

🌋 Imagine

A| BRUTE FORCE

4a| Grid approximation of posterior

5a| Sample posterior

6a| Summarize the samples

B| QUAP

4b| quap

6b| Model Fit

Different priors

🧍height vs. weight

A Better Beta

\(Normal(0, 10)\)

\(Log-Normal(0, 1)\)

POSTERIOR

POSTERIOR INTERPRETATION

PLOTS

PREDICTION RECIPE

Add in \(\sigma\)

summary

1| Regression Models the Mean

2| The Likelihood and the Mean Are Separate

3| Two Types of Uncertainty

4| The Posterior Is a Distribution

5| Centering Changes Interpretation

6| σ Is Often Skewed

7| Polynomials and Splines

8| The Core of Chapter 4

B| `QUAP`

4b| `quap`