let’s say we are interested in whether age and/or the number of miles run affects race time.
d <- read.csv("/Users/clairemorrison/Desktop/gradstats2022/run.csv", header = TRUE)
head(d)
## time age miles
## 1 24.91 29 10
## 2 21.82 25 20
## 3 21.54 27 40
## 4 23.03 25 50
## 5 25.35 37 20
## 6 22.84 31 40
if we run a simple “additive” model, we let age and miles both predict race time. this model assumes that age doesn’t change the effect of miles on race time and that miles doesn’t change the effect of age on race time. in other words: let’s add up the separate effects of age and miles and see how well they jointly predict race time.
m1<- lm(time~miles+age, data=d)
mcSummary(m1)
## lm(formula = time ~ miles + age, data = d)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 1469.895 2 734.948 0.665 76.351 0
## Error 741.195 77 9.626
## Corr Total 2211.091 79 27.988
##
## RMSE AdjEtaSq
## 3.103 0.656
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 24.605 1.573 15.641 2354.991 0.761 NA 21.473 27.738 0
## miles -0.257 0.026 -10.057 973.525 0.568 0.974 -0.308 -0.206 0
## age 0.167 0.031 5.468 287.817 0.280 0.974 0.106 0.228 0
above, we can see that age and miles both significantly predict race time. holding age constant, the predicted difference in race time for each 1-unit increase in miles is -.257. holding miles constant, the predicted difference in race time for each 1-unit increase in age is .167.
the predictive linear model would be:
\(\text{race time} = 24.605 - .257\,\text{miles} + .167\,\text{age}\)
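as a quick check, the equation above can be written as a small R function. this is only a sketch: the coefficients are the rounded estimates copied from the m1 output, and `predict_time` is a made-up helper name, not part of any package.

```r
# rounded coefficients copied from the m1 output above
b0      <- 24.605   # intercept
b_miles <- -0.257   # slope for miles
b_age   <- 0.167    # slope for age

# hypothetical helper: predicted race time from the additive model
predict_time <- function(miles, age) {
  b0 + b_miles * miles + b_age * age
}

# predicted race time for a 30 year old who ran 20 miles (about 24.475)
predict_time(miles = 20, age = 30)
```

with the fitted model in hand, `predict(m1, newdata = data.frame(miles = 20, age = 30))` should give the same prediction without the rounding.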
we can rearrange this equation to look at the relationship between age and race time at a fixed level of miles (or between miles and race time at a fixed age).
\(\text{race time} = (24.605 - .257(20)) + .167\,\text{age}\)
above, we are subbing 20 in for miles, to look at the effect of age on race time for someone who ran 20 miles.
\(\text{race time} = 19.465 + .167\,\text{age}\)
for someone who ran 20 miles, the intercept (19.465) is the predicted race time at age 0. as age increases by 1, for someone who ran 20 miles, we expect race time to increase by .167.
we can do this for different amounts of miles:
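\(\text{race time} = (24.605 - .257(10)) + .167\,\text{age}\)
\(\text{race time} = (24.605 - .257(40)) + .167\,\text{age}\)
\(\text{race time} = (24.605 - .257(50)) + .167\,\text{age}\)
but what you’ll notice about all of these is that the slope remains the same (.167). only the intercept changes.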
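to make this concrete, here is a small sketch that works out the intercept and slope of the age line at several values of miles. it uses the rounded coefficients from the m1 output (-.257 for miles, .167 for age); the object names are just for illustration.

```r
# rounded coefficients copied from the m1 output
b0      <- 24.605
b_miles <- -0.257
b_age   <- 0.167

# levels of miles to plug into the additive equation
miles_levels <- c(10, 20, 40, 50)

# one row per level of miles: the intercept shifts, the age slope does not
lines_by_miles <- data.frame(
  miles     = miles_levels,
  intercept = b0 + b_miles * miles_levels,
  slope     = b_age
)
lines_by_miles
```

the intercept column drops as miles goes up, while the slope column is the same number in every row, which is exactly what the additive model assumes.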
we can plot individual lines of the age-time relationship for different amounts of miles:
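library(dplyr)    # for %>% (if not already loaded)
library(ggplot2)  # for ggplot() (if not already loaded)
# each line: intercept = 24.605 - .257 * miles, slope = .167 (the age coefficient from m1)
d %>% ggplot(aes(age, time)) +
xlim(0, 60) +
ylim(10, 60) +
geom_point(size = 3) +
theme_classic(base_size = 22) +
geom_vline(xintercept = 0) +
geom_abline(intercept = 19.465, slope = .167, color = "blue", size = 3) +   # miles = 20
geom_abline(intercept = 14.325, slope = .167, color = "red", size = 3) +    # miles = 40
geom_abline(intercept = 23.32, slope = .167, color = "green", size = 3) +   # miles = 5
geom_abline(intercept = 11.755, slope = .167, color = "yellow", size = 3)   # miles = 50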
but again, the slope stays the same. we are assuming the relationship between miles and time does not depend on age and vice versa.
an interaction model, however, allows for that:
m2<- lm(time~miles+age+miles*age, data=d)
mcSummary(m2)
## lm(formula = time ~ miles + age + miles * age, data = d)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 1513.465 3 504.488 0.684 54.959 0
## Error 697.625 76 9.179
## Corr Total 2211.091 79 27.988
##
## RMSE AdjEtaSq
## 3.03 0.672
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 18.899 3.036 6.224 355.634 0.338 NA 12.852 24.947 0.000
## miles -0.069 0.090 -0.762 5.328 0.008 0.075 -0.248 0.111 0.448
## age 0.308 0.071 4.323 171.528 0.197 0.171 0.166 0.450 0.000
## miles:age -0.005 0.002 -2.179 43.570 0.059 0.064 -0.009 0.000 0.032
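here, we can see that the interaction, as well as the age coefficient, is significant. now we are letting the slope of age on time depend on miles, and vice versa. note that the age coefficient (.308) is now the slope of age when miles is 0, and the miles coefficient (-.069) is the slope of miles when age is 0.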
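one way to see what “the slope depends on miles” means is to write out the interaction model with the rounded m2 estimates and then group the age terms together:
\(\text{race time} = 18.899 - .069\,\text{miles} + .308\,\text{age} - .005\,(\text{miles} \times \text{age})\)
\(\text{race time} = (18.899 - .069\,\text{miles}) + (.308 - .005\,\text{miles})\,\text{age}\)
so both the intercept and the slope of the age line now change with miles. the interaction estimate is rounded to three decimals in the output, so slopes worked out from this equation are only approximate.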
we can plot that here using sjPlot, with the same values of miles we plotted previously:
plot_model(m2, type = "pred", terms = c("age", "miles[5, 20, 40, 50]"))
now we see that the slope of age on race time differs slightly at each level of miles, so the lines are no longer parallel.
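the slopes in that plot can also be approximated by hand. the sketch below uses the rounded age and miles:age estimates from the m2 output, so the numbers are rough (the interaction is only printed as -0.005); the object names are just for illustration.

```r
# rounded coefficients copied from the m2 output
b_age       <- 0.308    # slope of age when miles = 0
b_miles_age <- -0.005   # change in the age slope per additional mile

# same levels of miles used in the plot_model() call
miles_levels <- c(5, 20, 40, 50)

# slope of age on race time at each level of miles
simple_slopes <- data.frame(
  miles     = miles_levels,
  age_slope = b_age + b_miles_age * miles_levels
)
simple_slopes
```

with the fitted model available, `coef(m2)["age"] + coef(m2)["miles:age"] * miles_levels` should give the unrounded version of the same slopes.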