Stats 155 Class Notes 2012-10-03

Main Ideas for Today

  1. From Model Terms to Vectors
    1. The intercept term as a vector of 1s
    2. Understanding interaction terms as vectors
  2. The geometry of fitting against two vectors.
  3. Collinearity and why adding model terms changes the coefficients on old terms. (Reference: housing prices versus bedrooms with and without living area)
    1. Simpson's paradox, geometrically.
    2. Extreme collinearity: redundancy

Heads up for the future: collinearity has an important effect on confidence intervals.

Review of Geometry through Arithmetic

From Model Terms to Vectors

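Derive the model vectors for interaction terms. The vector for an interaction term is the elementwise product of the vectors for the terms being interacted.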

Make a small, illustrative data set and a model from it.

require(mosaic)  # assumed: supplies CPS85 and the data-frame version of sample()
small = sample(CPS85, size = 3)[, c("wage", "educ", "sector")]
small
##      wage educ  sector
## 245  8.00   12 service
## 13  25.00   14 service
## 57  13.95   18   manag
mod = lm(wage ~ educ * sector, data = small)

Prediction: the residuals from this model will all be zero. It's a “perfect” fit to the data.

resid(mod)
## 245  13  57 
##   0   0   0 

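Understanding the geometry of the situation makes it easier to see why. With only three cases, the wage vector lives in a three-dimensional space. The model supplies three linearly independent vectors, so some linear combination of them reaches wage exactly and nothing is left over for the residual.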

Write down the vectors as columns of numbers. For example, the response vector and the educ vector:

with(small, wage)
## [1]  8.00 25.00 13.95
with(small, educ)
## [1] 12 14 18
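
The remaining model vectors can be written down the same way. Here is a minimal sketch, assuming the three sampled cases printed above are re-entered by hand so that it runs without CPS85. The names small2, by_hand, from_R and educ_by_service are made up for illustration.

```r
# Re-enter the three sampled cases by hand (values copied from the printout above).
small2 = data.frame(
  wage   = c(8.00, 25.00, 13.95),
  educ   = c(12, 14, 18),
  sector = factor(c("service", "service", "manag"))
)

# Build each model vector by hand.
intercept       = rep(1, nrow(small2))                    # the intercept: a vector of 1s
educ            = small2$educ                             # quantitative term: the values themselves
sectorservice   = as.numeric(small2$sector == "service")  # indicator: 1 for service, 0 otherwise
educ_by_service = educ * sectorservice                    # interaction: elementwise product

by_hand = cbind(intercept, educ, sectorservice, educ_by_service)
by_hand

# Compare with the vectors R constructs for the same model formula.
from_R = model.matrix(wage ~ educ * sector, data = small2)
from_R
```

The two matrices should agree column by column. Only the column names differ.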

Fitting as a linear algebra problem

Show a potential linear combination of the vectors. Just make up the coefficients.
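
A sketch of one such made-up combination, using just the intercept and educ vectors for the three cases above. The coefficients a and b are arbitrary guesses, not fitted values, and the names candidate and leftover are invented for this example.

```r
# Response and two model vectors for the three sampled cases (copied from the printout above).
wage      = c(8.00, 25.00, 13.95)
intercept = c(1, 1, 1)
educ      = c(12, 14, 18)

# Made-up coefficients, chosen arbitrarily for illustration.
a = 2
b = 1

# The linear combination is a candidate vector of fitted values.
candidate = a * intercept + b * educ
candidate

# What is left over is the residual for this particular guess.
leftover = wage - candidate
leftover

# Length of the residual vector: square root of the sum of squares.
sqrt(sum(leftover^2))
```

Different guesses for a and b give different residual lengths. Fitting means finding the pair that makes that length as small as possible.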

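What R does: find the coefficients that bring the linear combination of the model vectors as close as possible to the response vector.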

coef(mod)
##        (Intercept)               educ      sectorservice 
##            -139.05               8.50              45.05 
## educ:sectorservice 
##                 NA 
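
The NA is the "extreme collinearity" case: with three cases there is room for only three independent vectors, so the fourth is redundant. A sketch that checks this by hand, assuming the model vectors are as entered below (copied from the three sampled cases). The names M3, M4 and educ_by_service are made up for illustration.

```r
# Response and model vectors for the three sampled cases.
wage            = c(8.00, 25.00, 13.95)
intercept       = c(1, 1, 1)
educ            = c(12, 14, 18)
sectorservice   = c(1, 1, 0)
educ_by_service = educ * sectorservice

# Three independent vectors in a three-dimensional space:
# solve exactly for the coefficients that reproduce wage.
M3 = cbind(intercept, educ, sectorservice)
coefs = solve(M3, wage)
coefs

# The linear combination hits wage exactly, so the residual is zero (up to rounding).
M3 %*% coefs - wage

# Adding the interaction vector does not raise the rank: it is redundant.
M4 = cbind(M3, educ_by_service)
qr(M4)$rank

# The interaction vector written as a combination of the other three.
solve(M3, educ_by_service)
```

The solved coefficients match those reported by coef(mod), and the last line shows that the interaction vector equals -18 times the intercept plus 1 times educ plus 18 times sectorservice. Since it adds no new direction, R leaves its coefficient as NA.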

Geometry of Fitting with Multiple Vectors

1. Diagram with two explanatory vectors

2. Show that the residual is orthogonal to each and every model vector.
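
The orthogonality can be checked numerically. A sketch, assuming the simpler model wage ~ educ on the same three cases, chosen so that the residual is not all zeros. The names mod2, r and M are made up for illustration.

```r
# The three sampled cases, re-entered by hand.
wage = c(8.00, 25.00, 13.95)
educ = c(12, 14, 18)

# Two model vectors (intercept and educ) in three-dimensional space:
# the fit is no longer perfect, so there is a nonzero residual.
mod2 = lm(wage ~ educ)
r = resid(mod2)
r

# Dot product of the residual with each model vector.
M = model.matrix(mod2)
crossprod(M, r)
```

Both dot products come out as zero up to rounding error, which is the arithmetic way of saying the residual is perpendicular to every model vector.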