Introduction

The models so far all assume that a straight line describes the relationship. But there’s nothing special about straight lines, aside from their simplicity.

library(rethinking)
## Loading required package: rstan
## Loading required package: ggplot2
## Loading required package: StanHeaders
## rstan (Version 2.17.3, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## Loading required package: parallel
## rethinking (Version 1.59)
data(Howell1)
d <- Howell1
str(d)
## 'data.frame':    544 obs. of  4 variables:
##  $ height: num  152 140 137 157 145 ...
##  $ weight: num  47.8 36.5 31.9 53 41.3 ...
##  $ age   : num  63 63 65 41 51 35 32 27 19 54 ...
##  $ male  : int  1 0 0 1 0 1 0 1 0 1 ...

Plotting height against weight.

plot(height ~ weight, data = d)

The relationship is visibly curved, now that we’ve included the non-adult individuals; see this notebook for how filtering out the non-adult population leaves an approximately linear relationship.

There are many ways to model a curved relationship between two variables. Here we’ll use POLYNOMIAL REGRESSION. In this context, “polynomial” means equations for \(\mu_i\) that add additional terms with squares, cubes, and even higher powers of the predictor variable. There’s still only one predictor variable in the model, so this is still a bivariate regression. But the definition of \(\mu_i\) has more parameters now.

Here the polynomial will be of second order: a parabolic model of the mean.

\(\mu_i = \alpha + \beta_1 x_i + \beta_2 x_i^2\)

Fitting these models to data is easy. Interpreting them can be hard.

Let’s build the model.

Standardize

The first thing to do is to STANDARDIZE the predictor variable. This means first centering the variable and then dividing it by its standard deviation. Why do this? Centering alone already improves the interpretation of the parameters, but what does dividing by the standard deviation add? Standardizing is helpful for two reasons:

  1. Interpretation might be easier. For a standardized variable, **a change of one unit is equivalent to a change of one standard deviation.** In many contexts, this is more interesting and more revealing than a one-unit change on the natural scale.
  2. When predictor variables have very large values, there are sometimes numerical glitches. Even well-known statistical software can suffer from these glitches, leading to mistaken estimates. These problems are very common in polynomial regression, because the square or cube of a large number can be truly massive. Standardizing largely resolves this issue, as the quick check below illustrates.
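To make the second point concrete, here is a quick look at how large the cubed raw weights get; the exact values depend on the sample, but they dwarf the scale of the outcome variable:

range(d$weight)    # roughly 4 to 63 kg
range(d$weight^3)  # the cubes reach into the hundreds of thousands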

So to standardize:

d$weight.s <- (d$weight - mean(d$weight)) / sd(d$weight)

The new variable weight.s has mean zero and standard deviation 1. No information has been lost in this procedure. Let’s plot to see that the relationship between the variables is unaltered, but now with a different range on the horizontal axis:

plot(height ~ weight.s, data = d)
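We can also verify this numerically (output omitted):

mean(d$weight.s)  # ~0, up to floating-point error
sd(d$weight.s)    # exactly 1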

Fitting the model

Let’s fit the model:

\[\begin{align*}
h_i &\sim \mathcal{N}(\mu_i, \sigma) && \textrm{Likelihood}\\
\mu_i &= \alpha + \beta_1 x_i + \beta_2 x_i^2 && \textrm{Linear model}\\
\alpha &\sim \mathcal{N}(178, 100) && \textrm{Prior for } \alpha\\
\beta_1 &\sim \mathcal{N}(0, 10) && \textrm{Prior for } \beta_1\\
\beta_2 &\sim \mathcal{N}(0, 10) && \textrm{Prior for } \beta_2\\
\sigma &\sim \mathcal{U}(0, 50) && \textrm{Prior for } \sigma
\end{align*}\]

In R code:

d$weight.s2 <- d$weight.s^2
m4.5 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1*weight.s + b2*weight.s2,
    a ~ dnorm(178, 100),
    b1 ~ dnorm(0, 10),
    b2 ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)), 
  data = d)

Let’s take a look at the summary table:

precis(m4.5, corr = TRUE)
##         Mean StdDev   5.5%  94.5%     a    b1    b2 sigma
## a     146.66   0.37 146.07 147.26  1.00 -0.39 -0.75     0
## b1     21.40   0.29  20.94  21.86 -0.39  1.00  0.53     0
## b2     -8.41   0.28  -8.86  -7.96 -0.75  0.53  1.00     0
## sigma   5.75   0.17   5.47   6.03  0.00  0.00  0.00     1
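To help interpret these estimates: weight.s is zero at the average weight, so the intercept a is the expected height there. As a rough hand check using the MAP estimates above:

est <- coef(m4.5)
# Expected height (cm) at the mean weight, where weight.s = 0:
est["a"]
# Expected height one standard deviation above the mean weight:
est["a"] + est["b1"] * 1 + est["b2"] * 1^2  # about 146.7 + 21.4 - 8.4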

Plotting it:

weight.seq <- seq(from = -2.2, to = 2, length.out = 30)  # grid of standardized weights
pred_dat <- list(weight.s = weight.seq, weight.s2 = weight.seq^2)  # link() needs the squared term too
mu <- link(m4.5, data = pred_dat)
mu.mean <- apply(mu, 2, mean)             # posterior mean of mu at each grid point
mu.PI <- apply(mu, 2, PI, prob = 0.89)    # 89% interval for the mean
sim.height <- sim(m4.5, data = pred_dat)  # simulate individual heights, including sigma
height.PI <- apply(sim.height, 2, PI, prob = 0.89)  # 89% prediction interval
plot(height ~ weight.s, d, col = col.alpha(rangi2, 0.5))
lines(weight.seq, mu.mean)             # MAP regression curve
shade(mu.PI, weight.seq, col = "red")  # shaded interval for the mean
shade(height.PI, weight.seq)           # shaded interval for simulated heights

Cubic regression:

Let’s try a higher-order polynomial regression: a cubic regression on weight:

d$weight.s2 <- d$weight.s^2
d$weight.s3 <- d$weight.s^3

m4.6 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1*weight.s + b2*weight.s2 + b3*weight.s3,
    a ~ dnorm(178, 100),
    b1 ~ dnorm(0, 10),
    b2 ~ dnorm(0, 10),
    b3 ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)), 
  data = d)
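We can inspect its summary the same way as before (output omitted here):

precis(m4.6, corr = TRUE)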

Plotting it:

weight.seq <- seq(from = -2.2, to = 2, length.out = 30)
pred_dat <- list(weight.s = weight.seq, weight.s2 = weight.seq^2, weight.s3 = weight.seq^3)
mu <- link(m4.6, data = pred_dat)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI, prob = 0.89)
sim.height <- sim(m4.6, data = pred_dat)
height.PI <- apply(sim.height, 2, PI, prob = 0.89)
plot(height ~ weight.s, d, col = col.alpha(rangi2, 0.5))
lines(weight.seq, mu.mean)
shade(mu.PI, weight.seq, col = "red")
shade(height.PI, weight.seq)

This cubic curve is even more flexible than the parabola, so it fits the data even better. But it’s not clear that any of these models make a lot of sense. They are good geocentric descriptions of the sample, yes, but a better fit is not always a better model.

Converting back to natural scale:

The plots above have standard units on the horizontal axis. These units are sometimes called z-scores. Suppose we want to keep fitting with the standardized variables, but plot the results on the original scale. All that’s really needed is to turn off the horizontal axis when plotting the raw data, and then construct the axis explicitly:

plot(height ~ weight.s, d, col = col.alpha(rangi2, 0.5), xaxt = "n")  # suppress the x-axis
at <- c(-2, -1, 0, 1, 2)                      # tick positions in standardized units
labels <- at * sd(d$weight) + mean(d$weight)  # convert back to kilograms
axis(side = 1, at = at, labels = round(labels, 1))
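As a round-trip check, undoing the standardization recovers the raw weights exactly, which is why the labels above land in the right places:

all.equal(d$weight.s * sd(d$weight) + mean(d$weight), d$weight)  # TRUE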