To illustrate a regression of measured weight (weight) on reported weight (repwt) for men and women who exercise regularly, we use the Davis data set, which is available through the car package.
library(car)
data(Davis)
head(Davis)
##   sex weight height repwt repht
## 1   M     77    182    77   180
## 2   F     58    161    51   159
## 3   F     53    161    54   158
## 4   M     68    177    70   175
## 5   F     59    157    59   155
## 6   M     76    170    76   165
To model the relationship between the two variables, with repwt as the predictor of weight, we fit a simple linear regression.
davis.mod <- lm(weight ~ repwt, data=Davis)
Before continuing, we plot the relationship to check for outliers.
The scatterplot shows one point far from the rest; labeling the most extreme point identifies it as Observation 12.
# Label the single most extreme point (car >= 3.0 syntax; older versions used id.n=1)
scatterplot(weight ~ repwt, data=Davis, smooth=FALSE, id=list(n=1))
## 12
## 12
We remove this outlier by refitting the model without Observation 12:
davis.mod2 <- update(davis.mod, subset=-12)
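To see why Observation 12 stands out and how much it pulls on the fit, one can print the row and compare the coefficients of the two models. In the Davis data this row records a weight of 166 and a height of 57, which suggests the two measurements were swapped when the data were entered. The following is only a minimal sketch; compareCoefs() is a car function that the text above does not use.

```r
library(car)
data(Davis)

# Refit both models so this block runs on its own
davis.mod  <- lm(weight ~ repwt, data = Davis)
davis.mod2 <- update(davis.mod, subset = -12)

# Inspect the flagged observation
Davis[12, ]

# Show the coefficients of the two fits side by side
compareCoefs(davis.mod, davis.mod2)
```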
Finally, we show the summary of the refitted model:
summary(davis.mod2)
##
## Call:
## lm(formula = weight ~ repwt, data = Davis, subset = -12)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.5296 -1.1010 -0.1322  1.1287  6.3891
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.73380    0.81479   3.355 0.000967 ***
## repwt        0.95837    0.01214  78.926  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.254 on 180 degrees of freedom
## (17 observations deleted due to missingness)
## Multiple R-squared: 0.9719, Adjusted R-squared: 0.9718
## F-statistic: 6229 on 1 and 180 DF, p-value: < 2.2e-16
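The summary gives point estimates and standard errors. Confidence intervals make it easier to judge whether reported weight tracks measured weight one-for-one, which would correspond to an intercept of 0 and a slope of 1. A minimal sketch using base R's confint(), which the text above does not use:

```r
library(car)
data(Davis)

# Refit the model without Observation 12 so this block runs on its own
davis.mod  <- lm(weight ~ repwt, data = Davis)
davis.mod2 <- update(davis.mod, subset = -12)

# 95% confidence intervals for the intercept and the slope
ci <- confint(davis.mod2, level = 0.95)
ci

# Does the slope interval contain 1 (perfectly proportional reporting)?
ci["repwt", 1] <= 1 && 1 <= ci["repwt", 2]
```

If the last expression is FALSE, the slope differs from 1 at the 5% level; whether that difference matters in practice is a separate question given how close the estimate (about 0.96) is to 1.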