let’s say we are interested in whether age and/or the number of miles run affects race time.

d <- read.csv("/Users/clairemorrison/Desktop/gradstats2022/run.csv", header = TRUE)
head(d)
##    time age miles
## 1 24.91  29    10
## 2 21.82  25    20
## 3 21.54  27    40
## 4 23.03  25    50
## 5 25.35  37    20
## 6 22.84  31    40

if we run a simple “additive” model, we let age and miles each predict race time. this model assumes that age doesn’t change the effect of miles on race time, and that miles run doesn’t change the effect of age on race time. in other words: add up the separate effects of age and miles and see how they jointly predict race time.

m1<- lm(time~miles+age, data=d)
mcSummary(m1)
## lm(formula = time ~ miles + age, data = d)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model      1469.895  2 734.948 0.665 76.351 0
## Error       741.195 77   9.626               
## Corr Total 2211.091 79  27.988               
## 
##   RMSE AdjEtaSq
##  3.103    0.656
## 
## Coefficients
##                Est StErr       t   SSR(3) EtaSq   tol CI_2.5 CI_97.5 p
## (Intercept) 24.605 1.573  15.641 2354.991 0.761    NA 21.473  27.738 0
## miles       -0.257 0.026 -10.057  973.525 0.568 0.974 -0.308  -0.206 0
## age          0.167 0.031   5.468  287.817 0.280 0.974  0.106   0.228 0

above, we can see that age and miles both significantly predict race time. holding age constant, predicted race time changes by -.257 for each 1-unit increase in miles. holding miles constant, predicted race time changes by .167 for each 1-unit increase in age.

the predictive linear model would be:

\(\text{race time} = 24.605 - .257\,\text{miles} + .167\,\text{age}\)

we can re-arrange this equation to look at the relationship between age and race time at a particular level of miles (or between miles and race time at a particular age).

\(\text{race time} = (24.605 - .257(20)) + .167\,\text{age}\)

above, we are subbing 20 in for miles, to look at the effect of age on race time for someone who ran 20 miles.

\(\text{race time} = 19.465 + .167\,\text{age}\)

the predicted race time for someone who ran 20 miles is 19.465 at age 0 (this is the intercept of the line). as age increases by 1, for someone who ran 20 miles, we expect race time to increase by .167.

we can do this for different amounts of miles:

\(\text{race time} = (24.605 - .257(5)) + .167\,\text{age}\)

\(\text{race time} = (24.605 - .257(40)) + .167\,\text{age}\)

\(\text{race time} = (24.605 - .257(50)) + .167\,\text{age}\)

but what you’ll notice about all of these is that the slope remains the same (.167). only the intercept changes.
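the intercepts for these lines can also be computed in one step rather than by hand. this is a minimal sketch that hard-codes the rounded m1 estimates printed above (the variable names are only for illustration), so it doesn’t need the data file:

```r
# rounded estimates copied from the mcSummary(m1) output
b0 <- 24.605
b_miles <- -0.257
b_age <- 0.167

# values of miles to plug in (chosen for illustration)
miles_vals <- c(5, 20, 40, 50)

# the intercept of each age-time line shifts with miles; the age slope does not
intercepts <- b0 + b_miles * miles_vals

data.frame(miles = miles_vals, intercept = intercepts, age_slope = b_age)
```

with these rounded coefficients the intercepts come out to 23.32, 19.465, 14.325 and 11.755, and the age slope is .167 in every row.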

we can plot individual lines of the age-time relationship for different amounts of miles:

# one line per value of miles, using the m1 estimates:
# intercept = 24.605 - .257 * miles, slope = .167
d %>% ggplot(aes(age, time)) +
  xlim(0, 60) +
  ylim(10, 60) +
  geom_point(size = 3) +
  theme_classic(base_size = 22) +
  geom_vline(xintercept = 0) +
  geom_abline(intercept = 19.465, slope = .167, color = "blue", size = 3) +   # miles = 20
  geom_abline(intercept = 14.325, slope = .167, color = "red", size = 3) +    # miles = 40
  geom_abline(intercept = 23.32, slope = .167, color = "green", size = 3) +   # miles = 5
  geom_abline(intercept = 11.755, slope = .167, color = "yellow", size = 3)   # miles = 50

but again, the slope stays the same. we are assuming the relationship between miles and time does not depend on age and vice versa.

an interaction model, however, allows for that:

m2<- lm(time~miles+age+miles*age, data=d)
mcSummary(m2)
## lm(formula = time ~ miles + age + miles * age, data = d)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model      1513.465  3 504.488 0.684 54.959 0
## Error       697.625 76   9.179               
## Corr Total 2211.091 79  27.988               
## 
##  RMSE AdjEtaSq
##  3.03    0.672
## 
## Coefficients
##                Est StErr      t  SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) 18.899 3.036  6.224 355.634 0.338    NA 12.852  24.947 0.000
## miles       -0.069 0.090 -0.762   5.328 0.008 0.075 -0.248   0.111 0.448
## age          0.308 0.071  4.323 171.528 0.197 0.171  0.166   0.450 0.000
## miles:age   -0.005 0.002 -2.179  43.570 0.059 0.064 -0.009   0.000 0.032

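here, we can see that the interaction and the age coefficient are both significant. now we are letting the slope of age on time depend on miles, and vice versa. note that in this model the miles and age coefficients are simple effects: the effect of miles when age is 0, and the effect of age when miles is 0.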
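
one way to see this is to re-arrange the interaction model the same way as the additive one. this uses the rounded coefficients from the m2 output above, so the numbers are approximate:

\(\text{race time} = 18.899 - .069\,\text{miles} + .308\,\text{age} - .005\,(\text{miles} \times \text{age})\)

\(\text{race time} = (18.899 - .069\,\text{miles}) + (.308 - .005\,\text{miles})\,\text{age}\)

so the slope of age is no longer a single number. it is .308 - .005(miles), which gets smaller as miles increases.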

we can plot that here using sjPlot, plotting the same values of miles we looked at previously:

plot_model(m2, type = "pred", terms = c("age", "miles[5, 20, 40, 50]"))

now we see that the slope of age on race time differs slightly at each level of miles.
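
as a rough check on the plot, the age slope at each value of miles can be computed by hand from the m2 coefficients. this is a minimal sketch that hard-codes the rounded estimates printed above (the variable names are only for illustration), so the results are approximate. with the fitted model in hand you would pull the full-precision values from coef(m2) instead.

```r
# rounded estimates copied from the mcSummary(m2) output (not full precision)
b0 <- 18.899
b_miles <- -0.069
b_age <- 0.308
b_int <- -0.005   # miles:age interaction

miles_vals <- c(5, 20, 40, 50)

# simple intercept and simple slope of age at each value of miles
simple_int <- b0 + b_miles * miles_vals
simple_slope <- b_age + b_int * miles_vals

data.frame(miles = miles_vals, intercept = simple_int, age_slope = simple_slope)
```

with these rounded values the age slope goes from about .283 at 5 miles down to about .058 at 50 miles, which is why the lines in the plot are no longer parallel.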