Raven Shan
library(car)
data(Davis)
str(Davis)
'data.frame': 200 obs. of 5 variables:
$ sex : Factor w/ 2 levels "F","M": 2 1 1 2 1 2 2 2 2 2 ...
$ weight: int 77 58 53 68 59 76 76 69 71 65 ...
$ height: int 182 161 161 177 157 170 167 186 178 171 ...
$ repwt : int 77 51 54 70 59 76 77 73 71 64 ...
$ repht : int 180 159 158 175 155 165 165 180 175 170 ...
library(ggplot2)
ggplot(Davis, aes(x = height)) + geom_histogram()
m1 <- lm(weight ~ height, data = Davis)
summary(m1)
Call:
lm(formula = weight ~ height, data = Davis)
Residuals:
    Min      1Q  Median      3Q     Max
-23.696  -9.506  -2.818   6.372 127.145
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.26623   14.95042   1.690  0.09260 .
height       0.23841    0.08772   2.718  0.00715 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.86 on 198 degrees of freedom
Multiple R-squared: 0.03597, Adjusted R-squared: 0.0311
F-statistic: 7.387 on 1 and 198 DF, p-value: 0.007152
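The intercept says that a person 0 cm tall would be predicted to weigh 25.266 kg, an extrapolation far outside the observed heights that has no practical meaning. The slope says that each additional centimeter of height is associated with a 0.238 kg increase in predicted weight.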
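The fitted line is easier to read when it is evaluated at heights that actually occur in the data. The following is a minimal sketch with the coefficients copied from the summary(m1) output above; the helper name predict_weight_m1 is hypothetical and not part of the original analysis.

```r
# Coefficients copied from the printed summary(m1) output
b0 <- 25.26623   # intercept (kg)
b1 <- 0.23841    # slope (kg per cm of height)

# Hypothetical helper: predicted weight (kg) for a height given in cm
predict_weight_m1 <- function(height_cm) {
  b0 + b1 * height_cm
}

# Predicted weights at three plausible heights
heights <- c(160, 170, 180)
data.frame(height = heights, predicted_weight = predict_weight_m1(heights))
```

With the fitted object in the session, predict(m1, newdata = data.frame(height = heights)) should return the same predictions up to rounding of the printed coefficients.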
m2 <- lm(weight ~ sex, data = Davis)
summary(m2)
Call:
lm(formula = weight ~ sex, data = Davis)
Residuals:
    Min      1Q  Median      3Q     Max
-21.898  -6.874  -1.866   5.102 108.134
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   57.866      1.150   50.32   <2e-16 ***
sexM          18.032      1.733   10.40   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.17 on 198 degrees of freedom
Multiple R-squared: 0.3534, Adjusted R-squared: 0.3501
F-statistic: 108.2 on 1 and 198 DF, p-value: < 2.2e-16
Notice that model 2 differs from model 1 in that the independent variable, sex, is a factor variable. Here, the intercept represents the mean of the dependent variable for the reference group (females): an intercept of 57.866 means that females weigh 57.866 kg on average. The slope represents how many more or fewer kg the “dummy” group (males) weighs compared with the reference group. A slope of 18.032 means that males weigh 18.032 kg more than females on average.
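Since a regression on a single two-level factor reproduces the group means, the two averages can be recovered directly from the coefficients. This is a small sketch using the values printed by summary(m2); the variable names are illustrative only.

```r
# Coefficients copied from the printed summary(m2) output
intercept_m2 <- 57.866   # mean weight (kg) of the reference group, sex == "F"
sexM_m2      <- 18.032   # male minus female difference in mean weight (kg)

# Group means implied by the model
group_means <- c(F = intercept_m2, M = intercept_m2 + sexM_m2)
group_means
```

With the data loaded, tapply(Davis$weight, Davis$sex, mean) should give the same two numbers up to rounding.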
m3 <- lm(weight ~ sex + height, data = Davis)
summary(m3)
Call:
lm(formula = weight ~ sex + height, data = Davis)
Residuals:
    Min      1Q  Median      3Q     Max
-24.718  -7.159  -1.472   5.487  74.726
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 109.11420   14.20513   7.681 7.22e-13 ***
sexM         22.49801    2.08690  10.781  < 2e-16 ***
height       -0.31298    0.08649  -3.619 0.000376 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 11.81 on 197 degrees of freedom
Multiple R-squared: 0.3937, Adjusted R-squared: 0.3875
F-statistic: 63.95 on 2 and 197 DF, p-value: < 2.2e-16
Controlling for height, males weigh 22.498 kg more than females on average. Controlling for sex, each additional centimeter of height is associated with a 0.313 kg decrease in predicted weight.
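"Controlling for" can be made concrete by comparing predictions that differ in only one variable at a time. The sketch below uses the coefficients printed by summary(m3); the helper predict_weight_m3 and the chosen heights are hypothetical.

```r
# Coefficients copied from the printed summary(m3) output
b_int    <- 109.11420   # intercept: females at height 0 cm
b_sexM   <- 22.49801    # shift for males at any fixed height
b_height <- -0.31298    # change in weight per cm, within either sex

# Hypothetical helper: predicted weight (kg) from sex ("F"/"M") and height (cm)
predict_weight_m3 <- function(sex, height_cm) {
  b_int + b_sexM * (sex == "M") + b_height * height_cm
}

# All combinations of sex and two illustrative heights
grid <- expand.grid(sex = c("F", "M"), height = c(160, 170),
                    stringsAsFactors = FALSE)
grid$predicted_weight <- predict_weight_m3(grid$sex, grid$height)
grid
```

Within each height the male and female predictions differ by the sexM coefficient, and within each sex the 10 cm step changes the prediction by ten times the height coefficient.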
library(texreg)
screenreg(list(m1, m2, m3))
==============================================
             Model 1    Model 2     Model 3
----------------------------------------------
(Intercept)   25.27      57.87 ***  109.11 ***
             (14.95)     (1.15)     (14.21)
height         0.24 **               -0.31 ***
              (0.09)                 (0.09)
sexM                     18.03 ***   22.50 ***
                         (1.73)      (2.09)
----------------------------------------------
R^2            0.04       0.35        0.39
Adj. R^2       0.03       0.35        0.39
Num. obs.    200        200         200
RMSE          14.86      12.17       11.81
==============================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Describing R^2:
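R-squared is the proportion of the total variation in the dependent variable that the model's independent variables explain, so a higher value means the model accounts for more of that variation. Among the three models, Model 3 is preferred because it has the highest R-squared (0.39) and the highest adjusted R-squared (0.39). Height and sex together therefore explain about 39% of the variation in weight. Note, however, that the sex coefficient rises from 18.03 in Model 2 to 22.50 in Model 3 rather than shrinking, so these estimates do not support the claim that males weigh more than females partly because they are typically taller.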
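R-squared never decreases when a predictor is added, so adjusted R-squared is the fairer yardstick when models have different numbers of predictors. As a rough check, the adjusted values in the table can be reproduced from the printed R-squared values with the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1); the function name adj_r2 is an assumption made for this sketch.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# R-squared values copied from the printed summaries; n = 200 in every model
fits <- data.frame(
  model = c("m1", "m2", "m3"),
  r2    = c(0.03597, 0.3534, 0.3937),
  p     = c(1, 1, 2)
)
fits$adj_r2 <- adj_r2(fits$r2, n = 200, p = fits$p)
fits
```

The results should agree with the Adjusted R-squared lines in the three summaries up to rounding of the inputs.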
m4 <- lm(weight ~ sex*height, data = Davis)
summary(m4)
Call:
lm(formula = weight ~ sex * height, data = Davis)
Residuals:
    Min      1Q  Median      3Q     Max
-23.091  -6.331  -0.995   6.207  41.230
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  160.49748   13.45954  11.924  < 2e-16 ***
sexM        -261.82753   32.72161  -8.002 1.05e-13 ***
height        -0.62679    0.08199  -7.644 9.17e-13 ***
sexM:height    1.62239    0.18644   8.702 1.33e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.06 on 196 degrees of freedom
Multiple R-squared: 0.5626, Adjusted R-squared: 0.556
F-statistic: 84.05 on 3 and 196 DF, p-value: < 2.2e-16
The p-value for the interaction term sexM:height (1.33e-15) is far below .05, so the interaction is statistically significant: the effect of height on weight depends on sex.
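In a model with a sex-by-height interaction, each sex gets its own intercept and its own height slope. The sketch below derives both lines from the coefficients printed by summary(m4); the object name lines_by_sex is illustrative only.

```r
# Coefficients copied from the printed summary(m4) output
b_int   <- 160.49748    # intercept for females (reference group)
b_sexM  <- -261.82753   # intercept shift for males
b_hgt   <- -0.62679     # height slope for females
b_inter <- 1.62239      # additional height slope for males (sexM:height)

# Separate regression line for each sex implied by the interaction model
lines_by_sex <- data.frame(
  sex       = c("F", "M"),
  intercept = c(b_int, b_int + b_sexM),
  slope     = c(b_hgt, b_hgt + b_inter)
)
lines_by_sex
```

The two slopes differ by exactly the sexM:height coefficient, which is the quantity the interaction test evaluates.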