itle: “Multivariate Linear Models”

uthor: “Sergio Marrero”

ate: “January 25, 2018”

utput:

html_document:

df_print: paged

toc: yes

html_notebook:

number_sections: yes

toc: yes

Introduction

In EEUU, states with many Waffle Houses per person also have some of the highest divorce rate. Could always-available waffles put marriage ar risk? Probably not. This is an example of misleading correlation. No one thinks there is any plausible mechanism by which Waffle House make divorce more likely. Instead, when we see a correlation of this kind, we inmmediately start asking about other variables that are “really” driving the relationship between waffles and divorce. In this case, Waffle House began in Georgia in the year 1955. Over time, the diners spread across the Southern United States, remaining largely bounded within it. So Waffle House is associated with the South. Divorce is noy uniquely Southern institution, but is more common anyplace that pleople marry young, and many communities in the South still frown on youn people “shacking up” and living together out of wedlock. So it’s probably just an accident of history that Waffle House and high divorce rates both occur in the South.

The truth is that correlation is not rare in nature, but rather very common.In large data sets, every pair of variables has a statitically discernible non-zero correlation. So we should never be surprised to find that two variables are correlated.But since most correlations do not indicate causal relationships, we need tools for distinguishing mere association from evidence of causation. That is why so much statistical effort is devoted to MULTIVARIATE REGRESSION, using more than one predictor variable to model an outcome. Reasons often given for mutivariate models include:

Statistical “control” for confounds. A confound is a variable that may be correlated with another variable of interest. A cofounder (also a confounding variable or confounding factor) is a variable that influences both the depent variable and independent variable causing a spurious association.

For example, the spurious waffles and divorce correlation is on possible type of confound. Let Z the variable location, in this case the value of Z is Southerness. Let it X variable which we want use to predict the outcome, in this case this variable is Waffle House density. The oucome Y will be the divorce rate. So Z will be the confound variable. It makes the variable \(Z\) with no real importance appear to be important in order to predict the outcome Y. See the next figure:

Spurious association

But confounds can hide real important variables just as easily as they can produce false ones.

Multiple causation. Even when confounds are absent, dur for example to tight experimental control, a phenomenon may really arise from multiple causes.
Interactions. Even when variables are completely uncorrelated, the importance of eac may still depend upon the other. For example, plants benefit for both light and water. But in the absence of either, the other is no benefit at all. Such interactions occur in a very large number of systems. So effective inference about one variable will usually depend upon consideration of other variables.

We will focus in this notebook in two valuable things multivariate models can help us with:

revealing spurious correlations like the Waffle House correlation with divorce.
revealing important correlations that may be masked by unrevealed correlations with other variables.

But multiple predictor variables can hurt as much as they can help. So the chapter describes some dangers of multivariate models, notably multicolinearity.

Spurious association:

Let’s leave waffles behind, at least for the moment. Let’s try to predict the divorce rate using two predictive variables: 1) marriage rate and Median Age of marriage in each state of EEUU.

On one hand we have the correlation between divorce rate and marriage rate. But does marriage rate cause divorce?. There’s is no reason high marriage rate must be correlated with divorce. It’s easy to imagine high marriage rate indicating high cultural valuation of marriage and therefore being associated with low divorce rate. So something is suspicious here.
On the another hand, we have the correlation between divorce rate and median age at marriage. Age at married is also a good predictor of divorce rate- higher age at marriage predicts less divorce.

Let’s fit a different model (separate model) for both predictive variables:

The model for marriage rate \(\beta_R\):

\[\begin{align*} & D_i \backsim \mathcal{N}(\mu_i, \sigma) \quad \quad \quad \textrm{Likelihood}\\ & \mu_i = \beta_R R_i + \alpha_R \quad \quad \quad \textrm{Linear model}\\ & \beta_R \backsim \mathcal{N}(0, 1) \quad \quad \quad \textrm{ prior for} \beta_R \\ & \alpha_R \backsim \mathcal{N}(10, 10) \quad \quad \quad \textrm{ prior for }\alpha_R \\ & \sigma \backsim \mathcal{U}(0, 1) \quad \quad \quad \sigma \textrm{ prior for} \sigma \end{align*}\]

The model for median age to marriage \(\beta_A\):

\[\begin{align*} & D_i \backsim \mathcal{N}(\mu_i, \sigma) \quad \quad \quad \textrm{Likelihood}\\ & \mu_i = \beta_A A_i + \alpha_A \quad \quad \quad \textrm{Linear model}\\ & \beta_A \backsim \mathcal{N}(0, 1) \quad \quad \quad \textrm{ prior for} \beta_A \\ & \alpha_A \backsim \mathcal{N}(10, 10) \quad \quad \quad \textrm{ prior for } \alpha_A \\ & \sigma \backsim \mathcal{U}(0, 1) \quad \quad \quad \sigma \textrm{ prior for} \sigma \end{align*}\]

Let’s code:

# load data
library(rethinking)
data(WaffleDivorce)
d <- WaffleDivorce
# standarize the predictor 

d$MedianAgeMarriage.s <- (d$MedianAgeMarriage - mean(d$MedianAgeMarriage)) /sd(d$MedianAgeMarriage)
d$Marriage.s <- (d$Marriage - mean(d$Marriage)) /sd(d$Marriage)
colnames(d)

##  [1] "Location"            "Loc"                 "Population"         
##  [4] "MedianAgeMarriage"   "Marriage"            "Marriage.SE"        
##  [7] "Divorce"             "Divorce.SE"          "WaffleHouses"       
## [10] "South"               "Slaves1860"          "Population1860"     
## [13] "PropSlaves1860"      "MedianAgeMarriage.s" "Marriage.s"

Fit the models:

# Fit first model
m5.1 <- map(
  alist(
    Divorce ~ dnorm(mu, sigma),
    mu <- a + bR*Marriage.s ,
    a ~ dnorm(10, 10),
    bR ~ dnorm(0, 1),
    sigma ~ dunif(0, 10)), 
  data = d,
  start = list(a = 8, bR = 0.5, sigma = 2)
)

# Fit first model
m5.2 <- map(
  alist(
    Divorce ~ dnorm(mu, sigma),
    mu <- a + bA*MedianAgeMarriage.s ,
    a ~ dnorm(10, 10),
    bA ~ dnorm(0, 1),
    sigma ~ dunif(0, 10)), 
  data = d,
  start = list(a = 12, bA = -0.5, sigma = 2)
)

# The following code will compute the shaded confidence region. 
Marriage.seq <- seq(from = -3, to = 3.5, length.out = 30)
mu.Marriage <- link(m5.1, data = data.frame(Marriage.s = Marriage.seq))

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

mu.Marriage.PI <- apply(mu.Marriage, 2, PI)

MAM.seq <- seq(from = -3, to = 3.5, length.out = 30)
mu.MAM <- link(m5.2, data = data.frame(MedianAgeMarriage.s = MAM.seq))

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

mu.MAM.PI <- apply(mu.MAM, 2, PI)

Plot it all

par(mfrow = c(1,2))
plot( Divorce ~ Marriage.s, data = d, col = rangi2)
abline(m5.1)
shade(mu.Marriage.PI, Marriage.seq)


plot( Divorce ~ MedianAgeMarriage.s, data = d, col = rangi2)
abline(m5.2)
shade(mu.MAM.PI, MAM.seq)

Let’s see the summary:

precis(m5.1)

##       Mean StdDev 5.5% 94.5%
## a     9.69   0.24 9.31 10.07
## bR    0.64   0.23 0.27  1.02
## sigma 1.67   0.17 1.40  1.94

precis(m5.2)

##        Mean StdDev  5.5% 94.5%
## a      9.69   0.20  9.36 10.01
## bA    -1.04   0.20 -1.37 -0.72
## sigma  1.45   0.14  1.22  1.68

# Multivariate analysis:

The relation of age of marriage is stronger than the other one. But merely comparing parameter means between different bivariate regressions is no way to decide which predictor is better. Both of these predictors could provide independent value, or they could be redundant, or one could eliminate the value of the other. So we’ll build a multivariate model with the goal of measuring the partial value of each predictor. The question we want answering is:

What is the predictive value of a variable, once I already know all of the other predictor variables?

Related to this example the question is:

After I already know marriage rate, what additional value is there in also knowing age at marriage?
After I already know age at marriage, what additional value is there in also knowing marriage rate?

The model’ll looks like:

\[\begin{align*} & D_i \backsim \mathcal{N}(\mu_i, \sigma) \quad \quad \quad \quad \quad \quad \textrm{Likelihood}\\ & \mu_i = \beta_R R_i + \beta_A A_i + \alpha \quad \quad \quad \quad \quad \quad \textrm{Linear model}\\ & \beta_R \backsim \mathcal{N}(0, 1) \quad \quad \quad \quad \quad \quad \textrm{ prior for} \beta_R \\ & \beta_A \backsim \mathcal{N}(0, 1) \quad \quad \quad \quad \quad \quad \textrm{ prior for} \beta_A \\ & \alpha \backsim \mathcal{N}(10, 10) \quad \quad \quad \quad \quad \quad \textrm{ prior for } \alpha \\ & \sigma \backsim \mathcal{U}(0, 1) \quad \quad \quad \quad \quad \quad\sigma \textrm{ prior for} \sigma \end{align*}\]

The map fitting code:

m5.3 <- map(
  alist(
    Divorce ~ dnorm( mu, sigma) ,
    mu <- a + bR*Marriage.s + bA*MedianAgeMarriage.s,
    a ~ dnorm(10, 10) ,
    bR ~ dnorm(0, 1) ,
    bA ~ dnorm(0, 1) ,
    sigma ~ dunif(0, 10)
  ) ,
   data = d)

precis(m5.3)

##        Mean StdDev  5.5% 94.5%
## a      9.69   0.20  9.36 10.01
## bR    -0.13   0.28 -0.58  0.31
## bA    -1.13   0.28 -1.58 -0.69
## sigma  1.44   0.14  1.21  1.67

precis_plot(precis(m5.3))

Once we know median age at marriage for a State, there is little or no additional predictive power in also knowing the rate of marriage in that state. But how did the model achive this result? To answer that question.

Plotting multivariate posterior:

Vizualizing the posterior distribution in simple bivariate regressions, like those the previous chapter, is easy. With multivariate regression, you’’l need more plots. There’s a huge literature detailing a variety of plotting techniques that all attempt to help one understand multiple linear regression.

Let’s see three types of interpretative plots:

Predictor resitual plots: These plots show the outcome against residual predictor values
Counterfactual plots: These show the implied predictions for imaginary experiments in which the different predictor variables can be chaged independently of one another
Posterior predicion plots: These show model-based predictions against raw data, or otherwise display the error in prediction.

Each of these plot type has its advantages and deficiencies, depending upon the context and the question of interest.

Predictor residual plots:

A predictor variable residual is the averate prediction error when we use all of the other predictor variables to model a predictor of interest.

# Marriage.s ~ MedianAgeMarriage.s
m5.41 <- map(
  alist(
    Marriage.s ~ dnorm( mu, sigma),
    mu <- a + b*MedianAgeMarriage.s,
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 1),
    sigma ~ dunif( 0, 10)), 
  data = d)

# compute expected value at MAP, for each State
mu_m5.41 <- coef(m5.41)["a"] + coef(m5.41)["b"] * d$MedianAgeMarriage.s
# Compute residual for each state
d$m.resid_m5.41 <- d$Marriage.s - mu_m5.41



# MedianAgeMarriage.s ~ Marriage.s 
m5.42 <- map(
  alist(
    MedianAgeMarriage.s ~ dnorm( mu, sigma),
    mu <- a + b*Marriage.s,
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 1),
    sigma ~ dunif( 0, 10)), 
  data = d)

# compute expected value at MAP, for each State
mu_m5.42 <- coef(m5.42)["a"] + coef(m5.42)["b"] * d$Marriage.s
# Compute residual for each state
d$m.resid_m5.42 <- d$MedianAgeMarriage.s - mu_m5.42

par(mfrow = c(1,2))
plot( Marriage.s ~ MedianAgeMarriage.s, d , col = rangi2)
abline( m5.41 )
# loop over States
for (i in 1:length(d$m.resid_m5.41)){  
  x <- d$MedianAgeMarriage.s[i] # location of line segment
  y <- d$Marriage.s[i] # observed endpoint of line segment
  # draw the line segment
  lines( c(x,x) , c(mu_m5.41[i], y), lwd = 0.5, col = col.alpha("black", 0.7))
}

plot(MedianAgeMarriage.s ~ Marriage.s, d , col = rangi2)
abline( m5.42 )
# loop over States
for (i in 1:length(d$m.resid_m5.42)){  
  y <- d$MedianAgeMarriage.s[i] # location of line segment
  x <- d$Marriage.s[i] # observed endpoint of line segment
  # draw the line segment
  lines( c(x,x) , c(mu_m5.42[i], y), lwd = 0.5, col = col.alpha("black", 0.7))
}

Left we have \(Marriage.s ~ MedianAgeMarriage.s\) and right side \(MedianAgeMarriage.s ~ Marriage.s\). Let’s name the residuals of the left one as residuals_RM and the right side one residuals_MAM. Let’s see if there is correlation between the Divorce rate and the different residuals.

## 
m5.411 <- map(
  alist(
    Divorce ~ dnorm( mu, sigma),
    mu <- a + b*m.resid_m5.41 ,
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 1),
    sigma ~ dunif( 0, 10)), data = d)

m5.411_seq <- seq( from = -2, to = 2, length.out = 100)
mu.list.m5.411 <- link(m5.411, data = data.frame(m.resid_m5.41 = m5.411_seq ))

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

HPDI_m5.411 <- apply(mu.list.m5.411, 2 , HPDI)

##
m5.422 <- map(
  alist(
    Divorce ~ dnorm( mu, sigma),
    mu <- a + b*m.resid_m5.42 ,
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 1),
    sigma ~ dunif( 0, 10)), data = d)

m5.422_seq <- seq( from = -2, to = 3, length.out = 100)
mu.list.m5.422 <- link(m5.422, data = data.frame(m.resid_m5.42 = m5.422_seq ))

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

HPDI_m5.422 <- apply(mu.list.m5.422, 2 , HPDI)

par(mfrow = c(1,2))
plot(Divorce ~ m.resid_m5.41, data = d,
     xlab = "Marriage rate residuals",
     ylab = "Divorce rate")
abline(m5.411)
shade(HPDI_m5.411, m5.411_seq)
abline( v = 0, lwd = 0.5)

plot(Divorce ~ m.resid_m5.42, data = d,
     xlab = "Age of marriage residuals",
     ylab = "Divorce rate")
abline(m5.422)
shade(HPDI_m5.422, m5.422_seq)
abline( v = 0, lwd = 0.5)

Looking at the left side plot. The vertical line indicates marriage rate that exactly matches the expectation from median age ar marriage. So States to the right of the line marry faster than expected. States to the left, slower. Nevertheless, the slope of the model is almos zero, indicating that in both side the divorce rate is almost equal. Look at the slope of the regression is \(-0.13\) exactly what we found in the multivariate model. The right hand is the same but now for median age at marriage rate. So States to the right of the vertical dashed line have older-than-expected media age at marriage, while those to the left have younger-than-expected median age at marriage. Now we find that the average divorce rate on the right is lower that the rate on the left. States in which people marry older than expected for a given rate of marriage tend to have less divorce. the slope of the regression line here is -1.13, again the same as in the multivariate model.

So what’s the point of all this? There’s direct value in seeing the model-based predictions displayed against the outcome, after subtracting out the influence of other predictors.

Counterfactual plots

The simplest use of a counterfactual plot is to see how the predictions change as you change only one prediction at a time. This means holding the values of all predictiors constant, except for a single predictor of interest. Su

# prepare new counterfactual data for Marriage.s
A.avg1 <- mean(d$MedianAgeMarriage.s)
R.seq1 <- seq(from = -3, to = 3, length.out = 30)
pred.data1 <- data.frame(
    Marriage.s <- R.seq1,
    MedianAgeMarriage.s <- A.avg1 # Stoped at zero
)

# compute counterfactual mean divorce (mu)
mu1 <- link(m5.3, data = pred.data1)

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

mu.mean1 <- apply(mu1, 2, mean)
mu.PI1 <- apply(mu1, 2, PI)

# simulate counterfactual divorce outcomes
R.sim1 <- sim(m5.3, data = pred.data1, n = 1e4)

## [ 1000 / 10000 ]
[ 2000 / 10000 ]
[ 3000 / 10000 ]
[ 4000 / 10000 ]
[ 5000 / 10000 ]
[ 6000 / 10000 ]
[ 7000 / 10000 ]
[ 8000 / 10000 ]
[ 9000 / 10000 ]
[ 10000 / 10000 ]

R.PI1 <- apply(R.sim1, 2 , PI)

# prepare new counterfactual data for MedianAgeMarriage
A.avg2 <- mean(d$Marriage.s)
R.seq2 <- seq(from = -3, to = 3, length.out = 30)
pred.data2 <- data.frame(
    MedianAgeMarriage.s <- R.seq2,
    Marriage.s <- A.avg2 # Stoped at zero
)

# compute counterfactual mean divorce (mu)
mu2 <- link(m5.3, data = pred.data2)

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

mu.mean2 <- apply(mu2, 2, mean)
mu.PI2 <- apply(mu2, 2, PI)

# simulate counterfactual divorce outcomes
R.sim2 <- sim(m5.3, data = pred.data2, n = 1e4)

## [ 1000 / 10000 ]
[ 2000 / 10000 ]
[ 3000 / 10000 ]
[ 4000 / 10000 ]
[ 5000 / 10000 ]
[ 6000 / 10000 ]
[ 7000 / 10000 ]
[ 8000 / 10000 ]
[ 9000 / 10000 ]
[ 10000 / 10000 ]

R.PI2 <- apply(R.sim2, 2 , PI)

par(mfrow = c(1,2))
# display predictions, hiding raw data with type = "n"
plot(Divorce ~ Marriage.s, data = d, type = "n")
mtext("MedianAgeMarriage.s = 0")
lines(R.seq1, mu.mean1)
shade( mu.PI1, R.seq1)
shade( R.PI1, R.seq1)


# display predictions, hiding raw data with type = "n"
plot(Divorce ~ MedianAgeMarriage.s, data = d, type = "n")
mtext("Marriage.s  = 0")
lines(R.seq2, mu.mean2)
shade( mu.PI2, R.seq2)
shade( R.PI2, R.seq2)

Posterior prediction plots

In addition to understanding the estimates, it’s important to check the model fit against the observed data. It means we will compare new simulated samples against observed. These kinds of checks are useful in many ways. We’ll focus on two uses for them:

Did the model fit correctly?
How does the model fail? All models are useful fictions, so they always fail in som way. Inspecting the individual cases where the model makes poor predictions, you might get an idea of how to improve the model.

Let’s begin by simulating predictions, averaging over the posterior:

## call link
mu <- link(m5.3)

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

## summarize samples across cases
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)

## simulate observations
divorce.sim <- sim(m5.3, n = 1e4)

## [ 1000 / 10000 ]
[ 2000 / 10000 ]
[ 3000 / 10000 ]
[ 4000 / 10000 ]
[ 5000 / 10000 ]
[ 6000 / 10000 ]
[ 7000 / 10000 ]
[ 8000 / 10000 ]
[ 9000 / 10000 ]
[ 10000 / 10000 ]

divorce.PI <- apply( divorce.sim, 2, PI)

For multivariate models, there are many different ways to display these simulation. So let’s look at a few different ways. The simplest is to just plot predictions against observed.

plot( mu.mean ~ d$Divorce, col = rangi2, ylim = range(mu.PI))
abline(a = 0, b = 1, lty = 2)
for (i in 1:nrow(d)){ 
    lines(c(d$Divorce[i], d$Divorce[i]) , c(mu.PI[1, i], mu.PI[2, i]),
          col = rangi2)
}

It’s easy to see fromm this arrangement of the simulations that the model under-predicts for States with very high divorce rates while it over-predicts for States with very low divorce rates. Some States are very frustrating to the model, lying very far from the diagonal.

As it could be difficult to read, let’s now to use order to sort the States from lowest prediction error to highest. Let’s do:

# compute residuals
divorce.resid <- d$Divorce - mu.mean
# get ordering by divorce rate
o <- order(divorce.resid)
# make the plot
dotchart(divorce.resid[o], labels = d$Loc[o], xlim = c(-6,5) , cex = 0.6)
abline( v = 0, col = col.alpha("black", 0.2))
for (i in 1:nrow(d) ) {
    j <- o[i] # which State in order
    lines( d$Divorce[j] - c(mu.PI[1,j],mu.PI[2,j]) , rep(i,2) )
    points( d$Divorce[j] - c(divorce.PI[1,j],divorce.PI[2,j]) , rep(i,2), pch = 3 , cex = 0.6 , col = "gray" )
}

This chart shows the average prediction error for each State, with \(89\%\) interval of the mean (black line) and \(89\%\) prediction interval (+ symbol). Here it’s easier to see the large model failures.

When adding variable hurts

There are several good, purely statistical reasons to avoid doing to add lot of variables to predict an outcome. Three reasons:

1.) Multicollinearity 2.) Post-treatment bias 3.) Overfitting

Multicollinearity means very strong correlation between two or more predictor variables. The consequence of it is that the posterior distributionwill say that a very large range. So once you understand multicollinearity, you will better understand multivariate models in general.

To explore multicollinearity, let’s begin with a simple simulation. Then we’ll turn to the primate milk data again and see multicolinearity in a real data set.

Multicolinearity legs:

N <- 100 # Number of individual
height <- rnorm(N, 10, 2) # sim total height of data
leg_prop <- runif(N, 0.4, 0.5) # leg as proportion of height
leg_left <- leg_prop * height + rnorm(N, 0, 0.02) # sim left leg as proportion + error
leg_right <- leg_prop * height + rnorm(N, 0, 0.02) # sim left leg as proportion + error
# Be aware that what error do is make difference between leg_left and leg_right

d <- data.frame(height,  leg_left, leg_right)

m5.8 <- map(
    alist(
        height ~ dnorm(mu, sigma),
        mu <- a + bl*leg_left + br*leg_right ,
        a ~ dnorm(10, 100),
        br ~ dnorm( 2 , 10),
        bl ~ dnorm(2, 10),
        sigma ~ dunif( 0 , 10)
    ), data = d)

precis(m5.8)

##        Mean StdDev  5.5% 94.5%
## a      0.98   0.35  0.42  1.53
## br     4.14   1.94  1.04  7.24
## bl    -2.13   1.94 -5.22  0.97
## sigma  0.67   0.05  0.60  0.75

precis_plot(precis(m5.8))

The results has not any sense !!!

post <- extract.samples(m5.8) # Extract parameters
plot(bl ~ br, data = post, col = col.alpha(rangi2, 0.1), pch = 16)

The posterior distribution forthese two parameters is very highly correlated, with all of the plausible values of bl and br lying along a narrow ridge. but what does it mean? Let’s see. What has happened here is that since both leg variables contain almost exactly the same information, if you insist on including both in a model, the there will be practically infinite number of combinations of bl and br that produce the same predictions. It’s easy to undertand with the next example:

\[y_i \backsim Normal(\mu_i, \sigma)\] \[\mu_i = \alpha + \beta_1 x_i + \beta_2 x_i \quad \rightarrow \quad \mu_i = \alpha + (\beta_1 + \beta_2)x_i \] From the point of view of the model \(\beta_1 + \beta_2 \simeq 2\) and this is what means the correlation showed in the image. When \(\beta_1 \simeq 2\) then \(\beta_2 \simeq 0\) or when \(\beta_1 \simeq 4\) then \(\beta_2 \simeq -2\).

Let’s simulate the sum:

sum_blbr <- post$bl + post$br
dens(sum_blbr, col = rangi2, lwd = 2, xlab = "sum of bl and br")

precis(sum_blbr)

##       Mean StdDev |0.89 0.89|
## model 2.02   0.07   1.9  2.14

let’s do the model for only one predictor and see how matches the same result:

m5.9 <- map(
    alist(
        height ~ dnorm( mu, sigma) ,
        mu <- a + bl*leg_left,
        a ~ dnorm(10, 100),
        bl ~ dnorm( 2, 10),
        sigma ~ dunif( 0, 10)
    ), data = d)
precis(m5.9)

##       Mean StdDev 5.5% 94.5%
## a     0.97   0.36 0.41  1.54
## bl    2.01   0.08 1.89  2.13
## sigma 0.69   0.05 0.61  0.77

The basic lesson remains intact across different simulations: When two predictor variables are very strongly correlated, includingg both in a model may lead to confusion. The posterior distribution isn’t wrong, in such cases. It’s telling you that the question you asked cannot be answered with these data.