library(alr3)
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data(wblake)
attach(wblake)
Does the relationship between a fish’s age and length depend on the fish’s scale radius? (i.e., do we need an interaction term?)
Interaction t-test: fit the interaction model and check whether the interaction term is significant.
Partial F-test: fit the complete model with the interaction term ("*") and the reduced model without it ("+"), then compare the two with anova().
mod.fish <- lm(Age ~ Length * Scale)
summary(mod.fish)
##
## Call:
## lm(formula = Age ~ Length * Scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.84055 -0.53376 0.06606 0.56913 2.59857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.3584599 0.2591932 -9.099 < 2e-16 ***
## Length 0.0338820 0.0020137 16.826 < 2e-16 ***
## Scale 0.3194946 0.0688244 4.642 4.57e-06 ***
## Length:Scale -0.0014237 0.0002338 -6.089 2.50e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8212 on 435 degrees of freedom
## Multiple R-squared: 0.8309, Adjusted R-squared: 0.8297
## F-statistic: 712.6 on 3 and 435 DF, p-value: < 2.2e-16
The interaction term is significant (p = 2.50e-09), so the relationship between a fish’s age and length depends significantly on its scale radius.
mod.fish <- lm(Age ~ Length * Scale)
mod.fish2 <- lm(Age ~ Length + Scale)
anova(mod.fish2, mod.fish)
## Analysis of Variance Table
##
## Model 1: Age ~ Length + Scale
## Model 2: Age ~ Length * Scale
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 436 318.36
## 2 435 293.36 1 25.001 37.072 2.504e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Same p-value as the t-test (2.504e-09), so the interaction term is significant. Since the interaction term is significant, the relationship between a fish’s age and length depends significantly on its scale radius.
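The two tests agree because, when exactly one coefficient is being tested, the partial F statistic is the square of the t statistic. In the output above, (-6.089)^2 is about 37.08, which matches F = 37.072 up to rounding of the printed t value. A minimal sketch of that check on simulated data (the variables x, z, y and the model names are made up for illustration; this is not the wblake data):

```r
# Simulated data, hypothetical values chosen only for illustration
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
z <- runif(n, 0, 5)
y <- 1 + 0.5 * x + 0.3 * z - 0.05 * x * z + rnorm(n)

full    <- lm(y ~ x * z)   # complete model, with the interaction term
reduced <- lm(y ~ x + z)   # reduced model, without the interaction term

# t statistic and p-value for the interaction term in the complete model
coefs <- coef(summary(full))
t_int <- coefs["x:z", "t value"]
p_t   <- coefs["x:z", "Pr(>|t|)"]

# Partial F-test comparing reduced vs. complete model
tab   <- anova(reduced, full)
F_int <- tab[2, "F"]
p_F   <- tab[2, "Pr(>F)"]

# With a single tested coefficient, t^2 equals F and the p-values coincide
all.equal(t_int^2, F_int)
all.equal(p_t, p_F)
```

The equivalence only holds when the complete and reduced models differ by one coefficient; with more than one extra term, only the partial F-test compares the models as a whole.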
swim.data <- read.csv("http://cknudson.com/data/Swim100M.csv")
head(swim.data)
## year time sex
## 1 1905 65.8 M
## 2 1908 65.6 M
## 3 1910 62.8 M
## 4 1912 61.6 M
## 5 1918 61.4 M
## 6 1920 60.4 M
attach(swim.data)
plot(time ~ year, col=sex)
levels(sex)
## [1] "F" "M"
### females are black dots, males are red
### can change the plotting character too: pch=sex
Do men’s and women’s 100m swim times improve at different rates? Equivalently, does the relationship between swim time and year depend on sex? Ask it the first way because it makes sense to a general audience. (i.e., do we need an interaction term?)
Interaction t-test: fit the interaction model and check whether the interaction term is significant.
Partial F-test: fit the complete model with the interaction term ("*") and the reduced model without it ("+"), then compare the two with anova().
mod.swim <- lm(time ~ year * sex)
summary(mod.swim)
##
## Call:
## lm(formula = time ~ year * sex)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3484 -1.4409 -0.2894 0.5404 15.9783
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 697.30122 39.22143 17.779 < 2e-16 ***
## year -0.32405 0.02010 -16.118 < 2e-16 ***
## sexM -302.46384 56.41163 -5.362 1.49e-06 ***
## year:sexM 0.14992 0.02889 5.189 2.83e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.32 on 58 degrees of freedom
## Multiple R-squared: 0.8935, Adjusted R-squared: 0.8879
## F-statistic: 162.1 on 3 and 58 DF, p-value: < 2.2e-16
The interaction term measures the difference in slopes, which in this case is the difference in improvement rates for men and women. Since the interaction term is significant (p = 2.83e-06), the slopes are significantly different. Therefore, men’s and women’s swim times improve at different rates.
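To make "difference in slopes" concrete, the per-sex lines can be read off the coefficient table: F is the baseline level, so its line uses the (Intercept) and year coefficients directly, and the M line adds sexM to the intercept and year:sexM to the slope. A small sketch using the printed estimates (the variable names are made up for illustration):

```r
# Estimates copied from the summary(mod.swim) output above
b_int       <-  697.30122   # (Intercept): intercept for the baseline level, F
b_year      <-   -0.32405   # year: slope for F
b_sexM      <- -302.46384   # sexM: difference in intercepts, M minus F
b_year_sexM <-    0.14992   # year:sexM: difference in slopes, M minus F

# Fitted line for each sex
slope_F     <- b_year
slope_M     <- b_year + b_year_sexM
intercept_F <- b_int
intercept_M <- b_int + b_sexM

slope_F   # -0.32405
slope_M   # -0.17413
```

Both slopes are negative, so fitted times decrease over the years for both sexes; the women’s fitted line falls faster than the men’s by 0.14992 time units per year.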
mod.swim <- lm(time ~ year * sex)
mod.swim2 <- lm(time ~ year + sex)
anova(mod.swim, mod.swim2)
## Analysis of Variance Table
##
## Model 1: time ~ year * sex
## Model 2: time ~ year + sex
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 58 639.15
## 2 59 935.83 -1 -296.68 26.922 2.826e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, the p-value for the partial F-test is small (2.826e-06), so the slopes are different. Therefore, men’s and women’s swim times improve at different rates.
Cupid.data <- read.csv("http://cknudson.com/data/OKCupid.csv")
attach(Cupid.data)
## The following object is masked from wblake:
##
## Age
head(Cupid.data)
## Sex Height IdealMateHeight Age
## 1 F 59 66 28
## 2 F 60 70 36
## 3 F 60 70 39
## 4 F 60 72 26
## 5 F 60 72 30
## 6 F 61 65 26
Does an OK Cupid dater’s ideal mate height depend on the sex and/or height of the dater?
Complete F-test (is at least one of these variables significant?): look at the last row of the summary output.
mod.cupid.comp <- lm(IdealMateHeight ~ Height + Sex)
summary(mod.cupid.comp)
##
## Call:
## lm(formula = IdealMateHeight ~ Height + Sex)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1770 -1.2642 0.2219 1.2602 5.3020
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.2456 3.9171 10.019 < 2e-16 ***
## Height 0.4930 0.0604 8.162 2.38e-13 ***
## SexM -8.5383 0.5281 -16.168 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.132 on 131 degrees of freedom
## Multiple R-squared: 0.6854, Adjusted R-squared: 0.6806
## F-statistic: 142.7 on 2 and 131 DF, p-value: < 2.2e-16
The p-value is < 2.2e-16, so a dater’s ideal mate height definitely depends on the sex and/or height of the dater.
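As a quick arithmetic check (a sketch, not part of the original analysis), the overall F statistic in the last row can be reproduced from the printed R-squared and degrees of freedom using F = (R^2 / k) / ((1 - R^2) / df_res), where k is the number of predictors. The variable names below are made up for illustration, and the inputs are the rounded values from the summary output, so the result only matches the printed 142.7 to about one decimal place.

```r
# Rounded values copied from the summary(mod.cupid.comp) output above
r2     <- 0.6854   # Multiple R-squared
k      <- 2        # number of predictors (Height, SexM)
df_res <- 131      # residual degrees of freedom

# Overall (complete) F statistic: explained variation per predictor
# divided by unexplained variation per residual degree of freedom
F_stat <- (r2 / k) / ((1 - r2) / df_res)

# Upper-tail probability of the F distribution with (k, df_res) df
p_val <- pf(F_stat, k, df_res, lower.tail = FALSE)

F_stat
p_val
```

R prints "< 2.2e-16" in the summary because the exact p-value is smaller than the threshold it uses for display.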
Now, fit the model lm(IdealMateHeight ~ Height) and interpret the slope. Why does or doesn’t this make sense? Compare it to our previous MLR model.
mod.cupids <- lm(IdealMateHeight ~ Height)
plot(IdealMateHeight ~ Height, col=Sex)
###female = black; male = red
summary(mod.cupids)
##
## Call:
## lm(formula = IdealMateHeight ~ Height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.6262 -2.6511 -0.1796 2.7022 8.9938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82.47144 4.93571 16.709 < 2e-16 ***
## Height -0.20665 0.07266 -2.844 0.00516 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.676 on 132 degrees of freedom
## Multiple R-squared: 0.05774, Adjusted R-squared: 0.0506
## F-statistic: 8.089 on 1 and 132 DF, p-value: 0.005163
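plot(IdealMateHeight ~ Height, col=Sex)
b <- coef(mod.cupid.comp)    ### additive model: (Intercept), Height, SexM
abline(b[1], b[2])           ### female line (baseline)
abline(b[1] + b[3], b[2])    ### male line: intercept shifted by SexM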
Slope = -0.20665. For every one-inch increase in a person’s height, their predicted ideal mate height decreases by 0.20665 inches. Taken at face value, taller people want shorter partners and shorter people want taller partners. This doesn’t really make sense. The model with both sex and height makes more sense: on average men want shorter mates and women want taller mates (SexM = -8.5383), and once sex is accounted for the Height slope is positive (0.4930). So within each sex, short people want shorter mates and tall people want taller mates.
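The sign flip between the two models can be reproduced on simulated data. The sketch below is hypothetical (made-up group means, slopes, and noise, loosely patterned on the estimates above; it is not the OK Cupid data): within each group the response rises with height, but the taller group has a much lower baseline, so ignoring the group pulls the pooled slope negative.

```r
# Simulated data, all values are assumptions chosen for illustration
set.seed(42)
n <- 100
sex    <- factor(rep(c("F", "M"), each = n))
height <- c(rnorm(n, 65, 2.5), rnorm(n, 70, 2.5))   # group M is taller on average

# Within each group the ideal height rises with own height (slope 0.5),
# but group M starts from a baseline that is 8.5 lower
ideal <- 40 + 0.5 * height - 8.5 * (sex == "M") + rnorm(2 * n, sd = 2)

pooled   <- lm(ideal ~ height)         # ignores sex
adjusted <- lm(ideal ~ height + sex)   # accounts for sex

coef(pooled)["height"]     # expected to be negative for this setup
coef(adjusted)["height"]   # expected to be close to the true within-group slope, 0.5
```

Leaving out a variable that is related to both the predictor and the response (here, sex) can reverse the apparent direction of a relationship, which is why the simple regression slope is misleading.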
mod.cupid.comp <- lm(IdealMateHeight ~ Height * Sex)
mod.cupid.red <- lm(IdealMateHeight ~ Height + Sex)
anova(mod.cupid.comp, mod.cupid.red)
## Analysis of Variance Table
##
## Model 1: IdealMateHeight ~ Height * Sex
## Model 2: IdealMateHeight ~ Height + Sex
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 130 595.30
## 2 131 595.59 -1 -0.29097 0.0635 0.8014