MT5762 Lecture 16

C. Donovan

More on linear models

We've seen how the linear model encompasses a number of our previous methods

These model the signal with a design matrix \( \mathbf{X} \) and associated parameters \( \boldsymbol{\beta} \), which are estimated assuming the noise is iid (independent and identically distributed) Normal

We look at extending this model for signal to interactions and more

The linear model form - recap

My favourite equation

Many models are effectively regressions - they are broadly of the form:

\[ y = f(x_1, \dots, x_p) + \text{noise} \]

So we need to estimate the signal \( f \), which is usually contingent on an idea of the distribution of the error (because we have to tease the signal and noise apart)

As we'll see, our estimates for the signal are optimised in light of our model for noise

The linear model form - recap

The term linear can be used differently in different contexts

  • It might refer to some shape of relationship e.g. a straight line relating some variable \( x \) to \( y \)
  • Here we mean the linear form. In short, we model the signal as \( y = f(x_1, \dots, x_p) \) where it is

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} \]

  • Recall this matrix equation means things like:

\[ y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_p x_{p,i} \]
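As a minimal sketch (the data frame dat and its columns x1, x2 and y are invented purely for illustration), model.matrix() shows the design matrix \( \mathbf{X} \) that lm() builds from a formula:

  # Invented toy data, purely for illustration
  dat <- data.frame(x1 = 1:5, x2 = c(2, 4, 1, 3, 5))
  dat$y <- 1 + 2*dat$x1 - 0.5*dat$x2 + rnorm(5)

  # The design matrix X: a column of 1s (intercept), then x1 and x2
  model.matrix(~ x1 + x2, data = dat)

  # lm() estimates the associated beta parameters by least squares
  coef(lm(y ~ x1 + x2, data = dat))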

Testing/interpreting factor (categorical) covariates

We've included categorical covariates in our regression equation by cunning use of the design matrix (via dummy variables).

  • This requires one level of the factor to be set as the baseline
  • The baseline level is picked by the software (e.g. alphabetically first), but can be changed - see the sketch below
  • The baseline parameter (the intercept) is the mean of \( y \) at this level, accounting for other covariates
  • The other factor-level parameters are (usually) the mean \( y \) differences from this baseline
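A brief sketch of the dummy-variable coding, using an invented factor (job is illustrative, not the loan data); relevel() changes which level acts as the baseline:

  # Invented factor, purely for illustration
  job <- factor(c("Mgr", "Office", "Sales", "Office", "Mgr"))

  # Dummy coding: the baseline level ("Mgr", alphabetically first) gets no column
  model.matrix(~ job)

  # Change the baseline to "Sales" and the dummy columns change accordingly
  job2 <- relevel(job, ref = "Sales")
  model.matrix(~ job2)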

Testing/interpreting factor covariates

Returning to the mortgage default data:

  • Fitting a numeric (property value) and factor (job type) variable to predict debt-to-income ratio
  • We interpret…
  factModel <- lm(DEBTINC ~ JOB + VALUE, data = loanData)

  summary(factModel)

Call:
lm(formula = DEBTINC ~ JOB + VALUE, data = loanData)

Residuals:
    Min      1Q  Median      3Q     Max 
-34.734  -4.695   0.953   4.785 165.285 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.282e+01  4.252e-01  77.184  < 2e-16 ***
JOBOffice   -6.126e-01  4.518e-01  -1.356   0.1752    
JOBOther    -5.127e-01  3.956e-01  -1.296   0.1951    
JOBProfExe  -3.273e+00  4.309e-01  -7.596  3.7e-14 ***
JOBSales     2.205e+00  9.872e-01   2.233   0.0256 *  
JOBSelf     -9.053e-01  7.820e-01  -1.158   0.2471    
VALUE        2.267e-05  2.301e-06   9.854  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.234 on 4429 degrees of freedom
  (1524 observations deleted due to missingness)
Multiple R-squared:  0.03583,   Adjusted R-squared:  0.03452 
F-statistic: 27.43 on 6 and 4429 DF,  p-value: < 2.2e-16

Testing/interpreting factor covariates

To test between factor levels that are not the baseline requires:

  • Changing the baseline, or
  • Requesting that contrasts be output, e.g. Tukey-style pairwise contrasts - see the sketch below
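A sketch of both options (assuming JOB is stored as a factor, and that the multcomp package is available for the pairwise contrasts):

  # Option 1: change the baseline level and refit
  loanData2 <- loanData
  loanData2$JOB <- relevel(factor(loanData2$JOB), ref = "ProfExe")
  factModelProfExe <- lm(DEBTINC ~ JOB + VALUE, data = loanData2)

  # Option 2: all pairwise (Tukey-style) contrasts between JOB levels
  library(multcomp)
  summary(glht(factModel, linfct = mcp(JOB = "Tukey")))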

Testing/interpreting factor covariates

To test the significance of the factor variable overall:

 anova(factModel)
Analysis of Variance Table

Response: DEBTINC
            Df Sum Sq Mean Sq F value    Pr(>F)    
JOB          5   4576   915.1  13.499 4.295e-13 ***
VALUE        1   6582  6582.2  97.092 < 2.2e-16 ***
Residuals 4429 300258    67.8                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So this is similar to doing an ANOVA when only the factor covariate is in the model, and it works more generally when other covariates are present.
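For instance (a sketch using the same data), with JOB alone in the model, anova() on the lm reproduces the one-way ANOVA table:

  # With only the factor, this is just a one-way ANOVA of DEBTINC by job type
  jobOnly <- lm(DEBTINC ~ JOB, data = loanData)
  anova(jobOnly)

  # The equivalent aov() formulation gives the same table
  summary(aov(DEBTINC ~ JOB, data = loanData))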

Testing/interpreting factor covariates

Sequential vs non-sequential: observe that the \( p \)-values change depending on the order in which terms enter the model. These changes can be large.

  factModel <- lm(DEBTINC ~ VALUE + JOB, data = loanData)

  anova(factModel)
Analysis of Variance Table

Response: DEBTINC
            Df Sum Sq Mean Sq F value    Pr(>F)    
VALUE        1   4385  4384.8  64.679 1.122e-15 ***
JOB          5   6773  1354.6  19.981 < 2.2e-16 ***
Residuals 4429 300258    67.8                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You're often after non-sequential.

Testing/interpreting factor covariates

A different version of this is non-sequential (observe the “Type II tests” in the output)

  library(car)
  Anova(factModel)
Anova Table (Type II tests)

Response: DEBTINC
          Sum Sq   Df F value    Pr(>F)    
VALUE       6582    1  97.092 < 2.2e-16 ***
JOB         6773    5  19.981 < 2.2e-16 ***
Residuals 300258 4429                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You're often after non-sequential - see later for a common exception where order of testing matters.

Building more complex models

Multiple numeric covariates

Combining numeric covariates conceptually produces a model of signal that is a plane or hyperplane.

  # a grid of covariates
  x_1 <- 1:40
  x_2 <- 1:40
  gridX <- expand.grid(x_1 = x_1, x_2 = x_2)

  # y is a function of these - note this looks like signal for an lm
  y <- 2*gridX$x_1 -2*gridX$x_2 + 100

  # giving this
  persp(matrix(y, ncol = 40), theta = 150, phi = 30, shade = 0.2, col = 'purple',
        xlab = "x1", ylab = "x2", zlab = "y")

Multiple numeric covariates

Combining numeric covariates conceptually produces a model of signal that is a plane or hyperplane.

[Perspective plot: the plane of \( y \) over the grid of x1 and x2]

Mixing covariate types

Adding categorical covariates to this mix gives different intercepts by factor level - so conceptually multiple lines, planes or hyperplanes

[Plots: fitted lines/planes with different intercepts for each factor level]

Interpretation of terms

The interpretation of these terms as presented is straightforward (but not for interactions, later) - see the sketch after this list:

  • Each slope parameter is the (estimated) change in the mean response for each unit increase in that \( x \), even when there are many covariates
  • The factor variables indicate the differences from baseline - so:
    • for lines, the vertical shift from the baseline line for each factor level
    • for planes, the vertical shift from the baseline plane for each factor level
    • for hyperplanes, the vertical shift from the baseline hyperplane for each factor level
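A sketch with factModel from earlier: predictions over VALUE for two job types (level names taken from the summary output) differ only by a vertical shift, i.e. parallel lines, since there is no interaction.

  # Predicted DEBTINC over an illustrative range of property values, for two job types
  newDat <- expand.grid(VALUE = seq(25000, 300000, length.out = 50),
                        JOB = c("Office", "Sales"))
  newDat$fit <- predict(factModel, newdata = newDat)

  # Two fitted lines with the same slope but different intercepts
  plot(fit ~ VALUE, data = subset(newDat, JOB == "Office"), type = "l",
       ylab = "Predicted DEBTINC")
  lines(fit ~ VALUE, data = subset(newDat, JOB == "Sales"), col = "red")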

Interactions

Covariates don't necessarily affect \( y \) in isolation

Interactions

The form we've seen so far has our covariates entering into the equation in a simple additive fashion i.e.

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

This means the effect of \( x_1 \) on \( y \) can be isolated/separated:

  • If numeric \( x \): “for every unit increase of \( x_1 \), \( y \) increases by \( \beta_1 \)”
  • If categorical \( x \) then we're giving the average difference from one factor level to the baseline level

Interactions

The world is rarely this simple.

  • The effect of \( x_1 \) say on \( y \) might be different given the values of other \( x \)
  • In which case, we have an interaction
  • The effect of \( x_1 \) on \( y \) cannot be represented without reference to other \( x \):

“for every unit increase of \( x_s \), \( y \) increases by \( A \) units given \( x_p \) is \( B \)”

Interactions

Thus - the marginal \( x \) terms cannot describe the relationship with \( y \) alone

Interactions

Lots of surfaces can't be described by only their margins:

  • Knowing how it looks from the perspective of \( x_1 \) and \( x_2 \) does not allow reconstruction of the surface
  • The relationship WRT \( x_1 \) changes with the value of \( x_2 \) - it is an interaction

[Plot: a surface over x1 and x2 that cannot be reconstructed from its margins]

Interactions between numeric and categorical covariates

Signal they permit

  • When mixing numeric and categorical/factor covariates previously, we effectively allowed different intercepts for the lines/planes/hyperplanes, depending on the factor-level
  • An interaction here alters the slopes/orientation within the factor levels

Model form, estimates and interpretation

Consider factor \( x_1 \) with two levels \( a \) and \( b \), and a numeric \( x_2 \)

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2) \]

  • Through the magic of dummy variables, \( \beta_0 \) is the mean of \( y \) when \( x_1 \) = baseline (\( a \) say) and \( x_2 \) = 0.
  • \( \beta_2 \) is the increase in mean \( y \) with a unit increase in \( x_2 \) if \( x_1 \) = baseline (\( a \) level)
  • \( \beta_3 \) modifies the slope parameter \( \beta_2 \) if \( x_1 \) = \( b \) level

Model form, estimates and interpretation

In short, our model has two lines with different intercepts and slopes (see the sketch after this list):

  • Intercept for factor \( x_1 = a \) is \( \beta_0 \)
  • Intercept for factor \( x_1 = b \) is \( \beta_0 + \beta_1 \)
  • Slope for factor \( x_1 = a \) is \( \beta_2 \)
  • Slope for factor \( x_1 = b \) is \( \beta_2 +\beta_3 \)
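A self-contained sketch with invented data (the names x1, x2, y and the coefficient values are purely illustrative) showing how these intercepts and slopes are reconstructed from the fitted coefficients:

  # Invented data: factor x1 (levels a, b) and numeric x2, with an interaction
  set.seed(1)
  dat <- data.frame(x1 = rep(c("a", "b"), each = 20), x2 = runif(40, 0, 10))
  dat$y <- ifelse(dat$x1 == "a", 2 + 1.5*dat$x2, 4 + 0.5*dat$x2) + rnorm(40)

  fit <- lm(y ~ x1 * x2, data = dat)
  b <- coef(fit)   # names: (Intercept), x1b, x2, x1b:x2

  # Group-specific intercepts and slopes, as per the list above
  c(interceptA = unname(b["(Intercept)"]),
    interceptB = unname(b["(Intercept)"] + b["x1b"]),
    slopeA     = unname(b["x2"]),
    slopeB     = unname(b["x2"] + b["x1b:x2"]))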

Model form, estimates and interpretation

Let's look at the design matrix

\[ \hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\hat{\beta}} = \begin{bmatrix} 1 & 0 & x_{2,1} & 0\\ 1 & 0 & x_{2,2} & 0\\ 1 & 0 & x_{2,3} & 0\\ 1 & 1 & x_{2,4} & x_{2,4}\\ 1 & 1 & x_{2,5} & x_{2,5}\\ 1 & 1 & x_{2,6} & x_{2,6}\\ \end{bmatrix} \begin{bmatrix} \hat{\beta_0}\\ \hat{\beta_1}\\ \hat{\beta_2}\\ \hat{\beta_3}\\ \end{bmatrix} = \begin{bmatrix} \hat{\beta_0} (+ 0\hat{\beta_1}) + \hat{\beta_2}x_{2,1} (+ 0\hat{\beta_3})\\ \hat{\beta_0} (+ 0\hat{\beta_1}) + \hat{\beta_2}x_{2,2} (+ 0\hat{\beta_3})\\ \hat{\beta_0} (+ 0\hat{\beta_1}) + \hat{\beta_2}x_{2,3} (+ 0\hat{\beta_3})\\ \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2}x_{2,4} + \hat{\beta_3} x_{2,4}\\ \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2}x_{2,5} + \hat{\beta_3} x_{2,5}\\ \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2}x_{2,6} + \hat{\beta_3} x_{2,6}\\ \end{bmatrix} \]

Model form, estimates and interpretation

Let's look at the last line, to see the slope modifier

\[ y_6 = \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2}x_{2,6} + \hat{\beta_3} x_{2,6} = (\hat{\beta_0} + \hat{\beta_1}) + (\hat{\beta_2} + \hat{\beta_3})x_{2,6} \]

  • the intercept if you are factor \( x_1=b \) is \( \hat{\beta_0} + \hat{\beta_1} \)
  • the slope if you are factor \( x_1=b \) is \( \hat{\beta_2} + \hat{\beta}_3 \).
  • \( \beta_1 \) alters the intercept and \( \beta_3 \) alters the slope, when you move from \( x_1=a \) to \( x_1=b \)

Model form, estimates and interpretation

This extends to more factor levels and more covariates generally. The magic of design matrices and matrix algebra.

With a simple interaction between house value and reason for loan:

  interactionModel <- lm(DEBTINC ~ REASON*VALUE, data = loanData)
  summary(interactionModel)

Call:
lm(formula = DEBTINC ~ REASON * VALUE, data = loanData)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.814  -4.654   0.890   4.987 168.448 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          3.233e+01  3.282e-01  98.482  < 2e-16 ***
REASONHomeImp       -1.005e+00  5.194e-01  -1.935   0.0531 .  
VALUE                1.914e-05  2.855e-06   6.704 2.28e-11 ***
REASONHomeImp:VALUE  2.994e-08  4.293e-06   0.007   0.9944    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.146 on 4472 degrees of freedom
  (1484 observations deleted due to missingness)
Multiple R-squared:  0.02057,   Adjusted R-squared:  0.01992 
F-statistic: 31.31 on 3 and 4472 DF,  p-value: < 2.2e-16
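Reading the group-specific intercepts and slopes off this output (a sketch; the coefficient names are as shown in the summary above):

  # Reconstruct intercepts/slopes for each REASON level from the fitted coefficients
  b <- coef(interactionModel)

  interceptBase    <- b["(Intercept)"]                       # baseline REASON level
  interceptHomeImp <- b["(Intercept)"] + b["REASONHomeImp"]

  slopeBase    <- b["VALUE"]                                  # slope of VALUE at baseline
  slopeHomeImp <- b["VALUE"] + b["REASONHomeImp:VALUE"]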

Testing of terms

  • In linear model output, we get tests on all parameters
    • A \( p \)-value implies a test and therefore an \( H_0 \)
    • If not specified, usually this is \( H_0: \theta = 0 \) i.e. is the parameter plausibly zero?
  • The interpretation differs depending on the parameter type

Testing of terms

  • For a slope parameter, is there a linear relationship between mean \( y \) and \( x \)?
  • For a non-baseline factor parameter, is there a significant difference between this level and baseline, in terms of average \( y \)?
  • For an interaction, is the slope significantly different between factor levels? i.e. is the slope modifier plausibly zero?

Testing of terms

  summary(interactionModel)

Call:
lm(formula = DEBTINC ~ REASON * VALUE, data = loanData)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.814  -4.654   0.890   4.987 168.448 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          3.233e+01  3.282e-01  98.482  < 2e-16 ***
REASONHomeImp       -1.005e+00  5.194e-01  -1.935   0.0531 .  
VALUE                1.914e-05  2.855e-06   6.704 2.28e-11 ***
REASONHomeImp:VALUE  2.994e-08  4.293e-06   0.007   0.9944    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.146 on 4472 degrees of freedom
  (1484 observations deleted due to missingness)
Multiple R-squared:  0.02057,   Adjusted R-squared:  0.01992 
F-statistic: 31.31 on 3 and 4472 DF,  p-value: < 2.2e-16

Testing of terms

  Anova(interactionModel)
Anova Table (Type II tests)

Response: DEBTINC
             Sum Sq   Df F value    Pr(>F)    
REASON          951    1  14.328 0.0001556 ***
VALUE          5354    1  80.699 < 2.2e-16 ***
REASON:VALUE      0    1   0.000 0.9944352    
Residuals    296720 4472                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

See here that the interaction is not significant - our model could be simpler. This term isn't distinguishable from noise.
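A sketch of simplifying accordingly: drop the interaction and, if desired, compare the nested models with an F-test (this mirrors the test above).

  # Simpler additive model, dropping the non-significant interaction
  additiveModel <- lm(DEBTINC ~ REASON + VALUE, data = loanData)

  # F-test comparing the nested models (additive vs interaction)
  anova(additiveModel, interactionModel)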

Marginality

Model terms involving just one covariate are termed main effects (or marginal terms). The same covariates may also appear in interactions with other covariates.

As a principle:

  • We don't interpret main effects parameters if they are involved in an interaction (a higher order term)
  • This makes sense, as the relationship can't be isolated - it depends on another covariate
  • Similarly we ignore tests on marginal parameters if involved in an interaction

Interactions between numeric covariates

Signal they permit

  • Similar to the previous example, an interaction between two numeric covariates means a slope parameter becomes conditional on another covariate's value
  • Whereas before the slope changed with the factor level, here it changes continuously
  • We'll only consider the simplest case - but many complex (hyper-)surfaces are interactions

Model form, estimates and interpretation

Consider numeric \( x_1 \) and \( x_2 \); the equation is the same as the previous one

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2) \]

  • We have an intercept and two marginal/main effects \( \beta_1 x_1 \) and \( \beta_2 x_2 \)
  • \( \beta_3 \) is again an interaction that adds to these

In short, we fit a (hyper) plane, plus an interaction surface
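A sketch with the loan data, assuming loanData also contains the numeric outstanding-mortgage amount MORTDUE (an assumed column name): the * syntax fits the interaction, and the slope for VALUE is then conditional on MORTDUE.

  # Numeric-by-numeric interaction (MORTDUE is assumed to be a numeric column)
  numIntModel <- lm(DEBTINC ~ VALUE * MORTDUE, data = loanData)
  summary(numIntModel)

  # The slope with respect to VALUE depends on MORTDUE:
  # beta_VALUE + beta_interaction * MORTDUE, e.g. evaluated at MORTDUE = 50000
  b <- coef(numIntModel)
  b["VALUE"] + b["VALUE:MORTDUE"] * 50000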

Model form, estimates and interpretation

Let's look at the design matrix

\[ \hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\hat{\beta}} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & x_{1,1} \times x_{2,1}\\ 1 & x_{1,2} & x_{2,2} & x_{1,2} \times x_{2,2}\\ 1 & x_{1,3} & x_{2,3} & x_{1,3} \times x_{2,3}\\ 1 & x_{1,4} & x_{2,4} & x_{1,4} \times x_{2,4}\\ 1 & x_{1,5} & x_{2,5} & x_{1,5} \times x_{2,5}\\ 1 & x_{1,6} & x_{2,6} & x_{1,6} \times x_{2,6}\\ \end{bmatrix} \begin{bmatrix} \hat{\beta_0}\\ \hat{\beta_1}\\ \hat{\beta_2}\\ \hat{\beta_3}\\ \end{bmatrix} = \]

Model form, estimates and interpretation

Let's look at the design matrix

\[ \hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\hat{\beta}} = \begin{bmatrix} \hat{\beta_0} + \hat{\beta_1} x_{1,1} + \hat{\beta_2} x_{2,1} + \hat{\beta_3} (x_{1,1} \times x_{2,1})\\ \hat{\beta_0} + \hat{\beta_1} x_{1,2} + \hat{\beta_2} x_{2,2} + \hat{\beta_3} (x_{1,2} \times x_{2,2})\\ \hat{\beta_0} + \hat{\beta_1} x_{1,3} + \hat{\beta_2} x_{2,3} + \hat{\beta_3} (x_{1,3} \times x_{2,3})\\ \hat{\beta_0} + \hat{\beta_1} x_{1,4} + \hat{\beta_2} x_{2,4} + \hat{\beta_3} (x_{1,4} \times x_{2,4})\\ \hat{\beta_0} + \hat{\beta_1} x_{1,5} + \hat{\beta_2} x_{2,5} + \hat{\beta_3} (x_{1,5} \times x_{2,5})\\ \hat{\beta_0} + \hat{\beta_1} x_{1,6} + \hat{\beta_2} x_{2,6} + \hat{\beta_3} (x_{1,6} \times x_{2,6})\\ \end{bmatrix} \]

Model form, estimates and interpretation

Let's look at how this modifies the slope WRT \( x_1 \), keeping only the terms involving \( x_1 \):

\[ \hat{\beta_0} + \hat{\beta_1} x_{1} + \hat{\beta_3} (x_{1} \times x_{2}) \]

with some rearrangement

\[ \hat{\beta_0} + (\hat{\beta_1} + \hat{\beta_3} x_{2})x_{1} \]

So, the slope for \( x_1 \) changes by \( \beta_3 \) for a unit increase in \( x_2 \)

Model form, estimates and interpretation

  • The interaction surface may look like this in 2D - which is added to the marginal components (a hyperplane)
  • Basically it adds types of curvature
  # interaction surface over the covariate grid (gridX columns are x_1 and x_2)
  yInt <- gridX$x_1 * gridX$x_2

  image(matrix(yInt, ncol = 40), col = terrain.colors(100))

[Image plot of the interaction surface x1 × x2 over the grid]

Testing of terms

  • We test as per the previous cases - the test on the interaction parameter indicates whether there is a significant interaction between the covariates

Interactions between categorical covariates

Signal they permit

  • Similar to previous variants, an interaction here means predictions based on one factor covariate are conditional on the levels of another
  • In effect, we are now estimating for all the combinations of factor levels for the interacting covariates

Model form, estimates and interpretation

Consider factor \( x_1 \) with two levels \( A \) and \( B \), and \( x_2 \) with two levels \( C \) and \( D \)

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 = B, x_2 = D) \]

Through the magic of dummy variables:

  • \( \beta_0 \) is the mean of \( y \) at baseline e.g. \( x_1=A \) and \( x_2=C \).
  • \( \beta_1 \) is the change in mean \( y \) shifting from \( x_1=A \) to \( x_1=B \)

Model form, estimates and interpretation

Consider factor \( x_1 \) with two levels \( A \) and \( B \), and \( x_2 \) with two levels \( C \) and \( D \)

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 = B, x_2 = D) \]

  • \( \beta_2 \) is the change in mean \( y \) moving from \( x_2=C \) to \( x_2=D \)
  • \( \beta_3 \) represents the interaction - “additional” changes in mean \( y \) from changing both \( x_1 \) and \( x_2 \) due to an interaction effect

Model form, estimates and interpretation

In short, these estimate different means for all combinations of factor levels:

  • Mean of \( y \) at \( AC \) is \( \beta_0 \)
  • Mean of \( y \) at \( BC \) is \( \beta_0 + \beta_1 \)
  • Mean of \( y \) at \( AD \) is \( \beta_0 + \beta_2 \)
  • Mean of \( y \) at \( BD \) is \( \beta_0 + \beta_1 + \beta_2 + \beta_3 \)

If \( x_1 \) and \( x_2 \) acted simply additively, then \( \beta_3 \) would not be needed.
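A sketch with the loan data: REASON and JOB are both categorical, so REASON*JOB estimates a mean debt-to-income ratio for every job-by-reason combination, and the non-sequential test on the interaction row checks whether those combination means are needed.

  # Factor-by-factor interaction: a mean DEBTINC for each REASON x JOB combination
  factIntModel <- lm(DEBTINC ~ REASON * JOB, data = loanData)

  # Non-sequential (Type II) tests; the REASON:JOB row tests the interaction
  # (car was loaded earlier for Anova())
  Anova(factIntModel)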

Model form, estimates and interpretation

Let's look at the design matrix

\[ \hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\hat{\beta}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix} \begin{bmatrix} \hat{\beta_0}\\ \hat{\beta_1}\\ \hat{\beta_2}\\ \hat{\beta_3}\\ \end{bmatrix} = \begin{bmatrix} \hat{\beta_0} (+ 0\hat{\beta_1} + 0\hat{\beta_2} + 0\hat{\beta_3})\\ \hat{\beta_0} (+ 0\hat{\beta_1} + 0\hat{\beta_2} + 0\hat{\beta_3})\\ \hat{\beta_0} + \hat{\beta_1} + (0\hat{\beta_2} + 0\hat{\beta_3})\\ \hat{\beta_0} + \hat{\beta_1} + (0\hat{\beta_2} + 0\hat{\beta_3})\\ \hat{\beta_0} (+ 0\hat{\beta_1}) + \hat{\beta_2} + (0\hat{\beta_3})\\ \hat{\beta_0} (+ 0\hat{\beta_1}) + \hat{\beta_2} + (0\hat{\beta_3})\\ \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2} + \hat{\beta_3}\\ \hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2} + \hat{\beta_3}\\ \end{bmatrix} \]

Model form, estimates and interpretation

  • If the factor covariates add simply, then knowing \( \beta_1 \) and \( \beta_2 \) would be enough to predict the BD combination
  • In that case there is no interaction and the underlying \( \beta_3 \) is zero
  • So a test on \( \beta_3 \) tests whether some interaction is in play

Testing of terms

  • Testing proceeds as before - noting we can't interpret main effects in the presence of their interactions

Recap and look-forwards

We've covered:

  • Interactions between combinations of numeric and factor covariates
  • How these are represented in our design matrix
  • What they mean

Upcoming in linear models:

  • Model fit and selection
  • Akaike Information Criterion (AIC)
  • More diagnostics
  • Case study