The models so far all assume that a straight line describes the relationship. But there’s nothing special about straight lines, aside from their simplicity.
library(rethinking)
data(Howell1)
d <- Howell1
str(d)
## 'data.frame': 544 obs. of 4 variables:
## $ height: num 152 140 137 157 145 ...
## $ weight: num 47.8 36.5 31.9 53 41.3 ...
## $ age : num 63 63 65 41 51 35 32 27 19 54 ...
## $ male : int 1 0 0 1 0 1 0 1 0 1 ...
Plotting height against weight:
plot(height ~ weight, data = d)
The relationship is visibly curved, now that we’ve included the non-adult individuals (see this notebook for how, after filtering out the non-adults, the relationship becomes linear).
There are many ways to model a curved relationship between two variables. Here we’ll use POLYNOMIAL REGRESSION. In this context, “polynomial” means equations for \(\mu_i\) that add extra terms with squares, cubes, and even higher powers of the predictor variable. There’s still only one predictor variable in the model, so this is still a bivariate regression; the definition of \(\mu_i\) just has more parameters now.
The polynomial we’ll use here is a parabolic (second-order) one: a parabolic model of the mean,
\(\mu_i = \alpha + \beta_1 x_i + \beta_2 x_i^2\)
Fitting these models to data is easy. Interpreting them can be hard.
The first thing to do is to STANDARDIZE the predictor variable: first center the variable, then divide it by its standard deviation. Why do this? Centering alone already improves the interpretation of the intercept, and standardizing helps for two more reasons:
1. Interpretation is easier: a change of one unit in a standardized variable is a change of one standard deviation of the original variable.
2. Fitting is easier: when a predictor takes very large values (and its squares and cubes are larger still), the software can suffer numerical glitches.
So to standardize:
d$weight.s <- (d$weight - mean(d$weight)) / sd(d$weight)  # center, then scale
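As a quick check (a small addition, not in the original notes), we can confirm that the standardized variable really has mean zero and standard deviation one:
round(c(mean = mean(d$weight.s), sd = sd(d$weight.s)), 6)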
No information has been lost in this procedure. Let’s plot to check that the relationship between the variables is unaltered, only now with a different range on the horizontal axis:
plot(height ~ weight.s, data = d)
Let’s fit the model:
\[\begin{align*}
h_i &\sim \mathcal{N}(\mu_i, \sigma) &&\textrm{Likelihood}\\
\mu_i &= \alpha + \beta_1 x_i + \beta_2 x_i^2 &&\textrm{Linear model}\\
\alpha &\sim \mathcal{N}(178, 100) &&\textrm{Prior for } \alpha\\
\beta_1 &\sim \mathcal{N}(0, 10) &&\textrm{Prior for } \beta_1\\
\beta_2 &\sim \mathcal{N}(0, 10) &&\textrm{Prior for } \beta_2\\
\sigma &\sim \mathcal{U}(0, 50) &&\textrm{Prior for } \sigma
\end{align*}\]
In R code:
d$weight.s2 <- d$weight.s^2
m4.5 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1*weight.s + b2*weight.s2,
    a ~ dnorm(178, 100),
    b1 ~ dnorm(0, 10),
    b2 ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)),
  data = d)
Let’s take a look at the summary table:
precis(m4.5, corr = TRUE)
##         Mean StdDev   5.5%  94.5%     a    b1    b2 sigma
## a     146.66   0.37 146.07 147.26  1.00 -0.39 -0.75     0
## b1     21.40   0.29  20.94  21.86 -0.39  1.00  0.53     0
## b2     -8.41   0.28  -8.86  -7.96 -0.75  0.53  1.00     0
## sigma   5.75   0.17   5.47   6.03  0.00  0.00  0.00     1
The intercept \(\alpha\) is still the expected height when weight.s is zero, i.e. at the average weight. But since \(\mu_i\) is no longer a straight line, \(\beta_1\) and \(\beta_2\) are hard to read on their own; the fit is much easier to understand from a plot:
weight.seq <- seq(from = -2.2, to = 2, length.out = 30)
pred_dat <- list(weight.s = weight.seq, weight.s2 = weight.seq^2 )
mu <- link(m4.5 , data = pred_dat)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI, prob = 0.89)
sim.height <- sim(m4.5, data = pred_dat)
height.PI <- apply(sim.height, 2, PI, prob = 0.89)
plot(height ~ weight.s, d, col = col.alpha(rangi2, 0.5))  # raw data
lines(weight.seq, mu.mean)                                # MAP regression curve
shade(mu.PI, weight.seq, col = "red")                     # 89% interval of the mean
shade(height.PI, weight.seq)                              # 89% interval of simulated heights
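Because the mean is a parabola, one way to make the parameters concrete (a small sketch added here, not part of the original text) is to locate the vertex of the fitted curve: \(\mu\) is maximized where weight.s \(= -\beta_1 / (2\beta_2)\). Converting that location back to kilograms:
post <- extract.samples(m4.5)                      # posterior samples of a, b1, b2, sigma
peak.s <- -post$b1 / (2 * post$b2)                 # vertex of the parabola, in standard units
peak.kg <- peak.s * sd(d$weight) + mean(d$weight)  # back-transform to kilograms
mean(peak.kg)
PI(peak.kg, prob = 0.89)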
Let’s try a higher-order polynomial regression, a cubic regression on weight:
d$weight.s2 <- d$weight.s^2
d$weight.s3 <- d$weight.s^3
m4.6 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1*weight.s + b2*weight.s2 + b3*weight.s3,
    a ~ dnorm(178, 100),
    b1 ~ dnorm(0, 10),
    b2 ~ dnorm(0, 10),
    b3 ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)),
  data = d)
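As with m4.5, it’s worth glancing at the marginal posteriors before plotting (this call is an addition for symmetry with the quadratic model; output omitted here):
precis(m4.6, corr = TRUE)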
Plotting it:
weight.seq <- seq(from = -2.2, to = 2, length.out = 30)
pred_dat <- list(weight.s = weight.seq, weight.s2 = weight.seq^2, weight.s3 = weight.seq^3 )
mu <- link(m4.6 , data = pred_dat)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI, prob = 0.89)
sim.height <- sim(m4.6, data = pred_dat)
height.PI <- apply(sim.height, 2, PI, prob = 0.89)
plot(height ~ weight.s, d, col = col.alpha(rangi2, 0.5))
lines(weight.seq, mu.mean)
shade(mu.PI, weight.seq, col = "red")
shade(height.PI, weight.seq)
This cubic curve is even more flexible than the parabola, so it fits the data even better. But it’s not clear that any of these models makes much sense. They are good geocentric descriptions of the sample, yes, but a better fit is not necessarily a better model.
The plots above have standard units on the horizontal axis. These units are sometimes called z-scores. Suppose we fit the model using the standardized variables but want the plot on the original scale. All that’s really needed is to turn off the horizontal axis when plotting the raw data, and then construct it explicitly afterwards:
plot(height ~ weight.s, d, col = col.alpha(rangi2, 0.5), xaxt = "n")  # suppress the x-axis
at <- c(-2, -1, 0, 1, 2)                        # tick positions in standard units
labels <- at*sd(d$weight) + mean(d$weight)      # convert back to kilograms
axis(side = 1, at = at, labels = round(labels, 1))