PSY 613, Spring 2014: Lab #5 - Interactions

Interactions in Multiple Regression:

Interactions between continuous predictors
Interactions between categorical predictors
Interactions between continuous and categorical predictors

Research Scenario: For her masters project, a grad stduent wants to see whether test anxiety (Anx) interacts with the time students spend studying (Time) to predict students’ test performance (Perf).

# Read in the data file. Make sure to change your working directory to
# wherever the datafile is saved first.  setwd('Dropbox/613/2014\
# materials')
library(foreign)  # we need to use the foreign package to read spss files
data <- read.spss("613_Lab 05_simple slopes.sav", to.data.frame = TRUE)

## Warning: 613_Lab 05_simple slopes.sav: Unrecognized record type 7, subtype
## 18 encountered in system file

## re-encoding from latin1

What is the criterion?

What are the predictors (X1 and X2)? Is each continuous or categorical?

Should we center before making the interaction term? If so, which variables?

Step 1:

Center continuous predictors and create the interaction term.

# centering
data$Time_c <- data$Time - mean(data$Time)
data$Anx_c <- data$Anx - mean(data$Anx)

# You don't actually have to create the interaction term since R will do it
# for you when you make the model.  If you wanted to create the interaction
# term, though, here it is:
data$Anx_cXTime_c <- data$Anx_c * data$Time_c

Step 2:

Run a hierarchical linear regression where Time and Anx are entered into Model 1 and Int is entered into Model 2. Make sure to get R2 change!

model1 <- lm(data$Perf ~ data$Anx_c + data$Time_c)
model2 <- lm(data$Perf ~ data$Anx_c * data$Time_c)
# note that the * automatically includes the main effects as well as the
# interaction.

# get R-squared change
anova(model1, model2)  # this provides the F-test, but you need to get R-squared change yourself (just subtract one R from the other)

## Analysis of Variance Table
## 
## Model 1: data$Perf ~ data$Anx_c + data$Time_c
## Model 2: data$Perf ~ data$Anx_c * data$Time_c
##   Res.Df  RSS Df Sum of Sq    F  Pr(>F)    
## 1     70 3941                              
## 2     69 2733  1      1209 30.5 5.5e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


# If you want R to get R-squared change for you as well as the F-test, load
# the lmSupport package so you can use lm.deltaR2()
library(lmSupport)

## Loading required package: car
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009

lm.deltaR2(model1, model2)

##             R2     F ndf ddf         p
## Model 1 0.5524 43.20   2  70 6.037e-13
## Model 2 0.6897 51.11   3  69 1.647e-17
## Delta   0.1372 30.51   1  69 5.471e-07

Does the effect of time spent studying on test performance differ as aresult of test anxiety?

Which model should we interpret?

summary(model1)

## 
## Call:
## lm(formula = data$Perf ~ data$Anx_c + data$Time_c)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.310  -5.019   0.273   5.127  18.971 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   77.397      0.878   88.13  < 2e-16 ***
## data$Anx_c    -2.068      0.681   -3.04   0.0033 ** 
## data$Time_c    5.393      0.597    9.03  2.3e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.5 on 70 degrees of freedom
## Multiple R-squared:  0.552,  Adjusted R-squared:  0.54 
## F-statistic: 43.2 on 2 and 70 DF,  p-value: 6.04e-13

summary(model2)

## 
## Call:
## lm(formula = data$Perf ~ data$Anx_c * data$Time_c)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -18.80  -3.80  -0.51   3.58  16.17 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              77.822      0.741  105.08  < 2e-16 ***
## data$Anx_c               -2.730      0.583   -4.68  1.4e-05 ***
## data$Time_c               5.855      0.508   11.53  < 2e-16 ***
## data$Anx_c:data$Time_c   -2.329      0.422   -5.52  5.5e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.29 on 69 degrees of freedom
## Multiple R-squared:  0.69,   Adjusted R-squared:  0.676 
## F-statistic: 51.1 on 3 and 69 DF,  p-value: <2e-16

What's the interpretation of b0?

What's the interpretation of b1?

General Model: Yi = b0 + b1*X1 + b2*X2 + b3(X1*X2) + ei
What is our model?

Step 3:

Interpret the significant interaction. To do this, we must calculate and plot simple slopes.

Example: Let's plot the effect of time spent studying on test performance at low (-1SD), moderate (mean), and high (+1SD) levels of test anxiety. Find the regression equation for each line.

Anx_SD <- sd(data$Anx)

# saving each model coefficient as its own variable, for easier manipulation
b0 <- model2$coeff[1]
b1 <- model2$coeff[2]
b2 <- model2$coeff[3]
b3 <- model2$coeff[4]

# Make sure to double check that b1 and b2 refer to what you think they do.
# Double-check the model summary if you want: summary(model2)
Int_H <- b0 + b1 * Anx_SD
Slope_H <- b2 + b3 * Anx_SD

Int_M <- b0
Slope_M <- b2

Int_L <- b0 - b1 * Anx_SD
Slope_L <- b2 - b3 * Anx_SD

High:
Mean:
Low:

Note: If we had an interaction between a continuous variable and a categorical variable, instead of using SD to get High, Med, and Low lines, we would examine the effect of the continuous variable within each group. For example, if we had an interaction between time spent studying and gender, we would plot Performance on the y-axis, Time on the x-axis, and have two lines: one for gender=0 (male) and one for gender=1 (female).

library(ggplot2)
ggplot(data, aes(x = Time_c, y = Perf)) + geom_point(shape = 1, alpha = 0.5) + 
    geom_abline(aes(color = "low anx"), intercept = Int_L, slope = Slope_L, 
        show_guide = TRUE) + geom_abline(aes(color = "mean anx"), intercept = Int_M, 
    slope = Slope_M, show_guide = TRUE) + geom_abline(aes(color = "high anx"), 
    intercept = Int_H, slope = Slope_H, show_guide = TRUE) + guides(color = guide_legend(title = "Anxiety Level"))

plot of chunk unnamed-chunk-6

Step 4:

Significance testing of simple slopes. Get variance/covariance matrix of regression coefficients.

vcov(model2)

##                        (Intercept) data$Anx_c data$Time_c
## (Intercept)               0.548433  -0.009202    0.006426
## data$Anx_c               -0.009202   0.340188   -0.037204
## data$Time_c               0.006426  -0.037204    0.257741
## data$Anx_c:data$Time_c   -0.032392   0.050518   -0.035281
##                        data$Anx_c:data$Time_c
## (Intercept)                          -0.03239
## data$Anx_c                            0.05052
## data$Time_c                          -0.03528
## data$Anx_c:data$Time_c                0.17784

# Note that you can use matrix notation [rows, columns] to point to
# particular entries in the covariance matrix
SEb1sq <- vcov(model2)[3, 3]
SEb3sq <- vcov(model2)[4, 4]
covb1b3 <- vcov(model2)[3, 4]

# To test whether a given simple slope is different from zero, use a t-test:
# estimate/SE Calculate the SE for low anxiety (Anx = -Anx_SD)
SE_b_at_AnxL <- sqrt(SEb1sq + 2 * (-Anx_SD) * covb1b3 + (-Anx_SD)^2 * SEb3sq)
# Calculate t-test
t <- Slope_L/SE_b_at_AnxL

# get the p value for the t-test
n <- nrow(data)
df <- n - 4  # df = N-K-1
pt(t, df, lower.tail = FALSE)

## data$Time_c 
##   3.801e-17