data(women)
head(women)
##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129
dim(women)
## [1] 15 2
attach(women)
Response: weight (lbs); predictor: height (inches)
plot(weight~height)
At first glance, there appears to be a linear relationship between the height and weight of women
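As a rough numeric companion to the scatterplot, the strength of the linear association can be summarized with the Pearson correlation. This is only a sketch: it uses the built-in `women` data directly instead of relying on `attach()`, and the variable name `r` is just an illustrative choice.

```r
# Pearson correlation between height and weight in the built-in women data
r <- cor(women$height, women$weight)

r    # values close to 1 indicate a strong positive linear association
r^2  # in simple linear regression this matches the Multiple R-squared
```

A high correlation alone does not prove the relationship is exactly linear, so the residual checks are still worth doing.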
myline <- lm(weight~height)
myline
##
## Call:
## lm(formula = weight ~ height)
##
## Coefficients:
## (Intercept)       height
##      -87.52         3.45
plot(weight~height)
abline(myline)  # draws the fitted line using the full-precision coefficients (-87.52, 3.45)
Intercept: if a person were 0 inches tall, their predicted weight would be -87.52 lbs. This could not actually happen, since it extrapolates far outside the observed heights, but that is the basic idea behind the intercept
Slope: for every one-inch increase in height, the average weight increases by 3.45 lbs
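The slope interpretation can be checked by predicting weight at two heights one inch apart. This is a minimal sketch: the model is refit with `data = women` so the block is self-contained, and the names `fit`, `new_heights`, and `pred` as well as the heights 65 and 66 are illustrative choices, not part of the original analysis.

```r
# Refit the same model without relying on attach()
fit <- lm(weight ~ height, data = women)

# Predicted average weight at two heights that differ by one inch
new_heights <- data.frame(height = c(65, 66))
pred <- predict(fit, newdata = new_heights)
pred

# The difference between the two predictions equals the fitted slope
diff(pred)
coef(fit)[["height"]]
```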
myresid <- myline$residuals
hist(myresid)
qqnorm(myresid)
qqline(myresid)
Since no points fall really far off the line, the residuals look approximately normally distributed. (The Q-Q plot checks normality; constant variance is checked with the residual plot)
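A normal Q-Q plot is a visual check of whether the residuals look normally distributed, and a formal test can complement it. The sketch below uses the Shapiro-Wilk test from base R, which was not part of the original analysis; `fit` and `sw` are illustrative names, and with only 15 observations the test has limited power.

```r
# Refit the model so the block runs on its own
fit <- lm(weight ~ height, data = women)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(fit))
sw

# A small p-value would be evidence against normality
sw$p.value
```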
plot(myline$residuals ~ height)
abline(0,0)
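This plot checks for constant variance (homoscedasticity) around the horizontal line at 0. The spread looks roughly constant, but it is hard to judge because there are only 15 data points. The residuals do sit above the line at the shortest and tallest heights and below it in the middle, which hints at slight curvature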
summary(myline)
##
## Call:
## lm(formula = weight ~ height)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.7333 -1.1333 -0.3833  0.7417  3.1167
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
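The summary tells us that the residual standard error is 1.525 on 13 degrees of freedom. This is the square root of the MSE, which can be checked by hand:
sqrt(sum((myline$residuals)^2)/13)
## [1] 1.525005
The residual standard error is 1.525 lbs, so the MSE is about 1.525^2 = 2.33. We want it to be low because it measures the typical difference between the actual and predicted weights; the lower it is, the more accurate the line. A typical error of about 1.5 lbs tells us that the line is pretty accurate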
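To keep the two quantities apart, the sketch below computes the residual sum of squares, the MSE, and its square root step by step, then compares the result with what R reports. The names `fit`, `sse`, `df_resid`, `mse`, and `rse` are illustrative; `df.residual()` and `sigma()` are standard extractor functions from the `stats` package.

```r
# Refit the model so the block runs on its own
fit <- lm(weight ~ height, data = women)

sse      <- sum(residuals(fit)^2)  # residual sum of squares
df_resid <- df.residual(fit)       # n - 2 = 13 for this model
mse      <- sse / df_resid         # mean squared error
rse      <- sqrt(mse)              # residual standard error

c(SSE = sse, MSE = mse, RSE = rse)

# sigma() extracts the residual standard error shown in summary()
all.equal(rse, sigma(fit))
```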