Ways to look at the data:
data(women)
attach(women)
summary(women)
## height weight
## Min. :58.0 Min. :115.0
## 1st Qu.:61.5 1st Qu.:124.5
## Median :65.0 Median :135.0
## Mean :65.0 Mean :136.7
## 3rd Qu.:68.5 3rd Qu.:148.0
## Max. :72.0 Max. :164.0
names(women)
## [1] "height" "weight"
str(women)
## 'data.frame': 15 obs. of 2 variables:
## $ height: num 58 59 60 61 62 63 64 65 66 67 ...
## $ weight: num 115 117 120 123 126 129 132 135 139 142 ...
dim(women)
## [1] 15 2
datmod <- lm(weight ~ height) # weight is the response variable; height is the explanatory variable
datmod # prints the intercept and the slope, i.e. how much weight changes per inch of height
##
## Call:
## lm(formula = weight ~ height)
##
## Coefficients:
## (Intercept) height
## -87.52 3.45
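As a brief illustration of how to read these coefficients, the sketch below computes a predicted weight by hand and again with predict(). The height of 65 inches and the names `fit` and `new_height` are chosen only for illustration; the model is refit with `data = women` so the snippet runs on its own.

```r
# Refit the same model, passing the data frame explicitly
data(women)
fit <- lm(weight ~ height, data = women)

# Illustrative input (an assumption, not part of the original analysis)
new_height <- 65

# Prediction by hand: intercept + slope * height
b <- coef(fit)
by_hand <- unname(b["(Intercept)"] + b["height"] * new_height)

# The same prediction using predict()
by_predict <- unname(predict(fit, newdata = data.frame(height = new_height)))

by_hand
by_predict
```

Both values should agree. Because 65 inches is the mean height, the prediction equals the mean weight (about 136.7 lbs): a least-squares line always passes through the point of means.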
This plots the data with weight as the response variable (y-axis) and height as the explanatory variable (x-axis). The abline function adds the fitted regression line to the plot.
plot(height, weight, xlab = "Height (inches)",
ylab = "Weight (lbs)",
main = "Women's Weight and Height")
abline(datmod) # draws the fitted line using the unrounded coefficients
This extracts the residuals from the fitted model and then draws a histogram of them.
datresids <- datmod$residuals
datresids
## 1 2 3 4 5 6
## 2.41666667 0.96666667 0.51666667 0.06666667 -0.38333333 -0.83333333
## 7 8 9 10 11 12
## -1.28333333 -1.73333333 -1.18333333 -1.63333333 -1.08333333 -0.53333333
## 13 14 15
## 0.01666667 1.56666667 3.11666667
hist(datresids)
# With only 15 observations, the histogram alone makes it hard to tell whether the residuals are normally distributed
The normal Q-Q plot below compares the quantiles of the residuals with those of a normal distribution. The points follow the reference line fairly closely, so the plot is consistent with normally distributed residuals.
qqnorm(datresids)
qqline(datresids)
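A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test, which is not part of the original analysis, using shapiro.test() from base R's stats package. With only 15 residuals the test has little power, so a large p-value is best read as a lack of evidence against normality rather than proof of it. The names `fit` and `sw` are illustrative.

```r
# Refit the model so the snippet is self-contained
data(women)
fit <- lm(weight ~ height, data = women)

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(resid(fit))

sw$statistic  # W statistic; values near 1 are consistent with normality
sw$p.value    # p-value for the null hypothesis of normality
```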
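Next, we check the vertical spread of the residuals across height. The spread is not constant: the residuals are positive for the shortest and tallest women and negative in between. This gives some evidence of heteroscedasticity, and the curved pattern also suggests that a straight line does not fully capture the relationship.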
plot(datmod$residuals ~ height)
abline(0,0)
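The pattern in the residual plot can also be checked numerically. The sketch below lists the sign of each residual in order of height and counts the runs of consecutive same-sign values. The names `fit` and `resid_signs` are illustrative.

```r
# Refit the model so the snippet is self-contained
data(women)
fit <- lm(weight ~ height, data = women)

# Sign of each residual (+1 or -1); the rows of women are already sorted by height
resid_signs <- sign(unname(resid(fit)))
resid_signs

# Lengths of the runs of consecutive same-sign residuals
rle(resid_signs)$lengths
```

For this data set the runs are 4 positive, 8 negative, then 3 positive residuals, matching the residual values printed earlier. Long same-sign runs like these indicate a systematic curve rather than random scatter around zero.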
Finally, summary() reports the model in more detail, including the residual quantiles, the coefficient estimates with their standard errors and p-values, the residual standard error, and R-squared.
summary(datmod)
##
## Call:
## lm(formula = weight ~ height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7333 -1.1333 -0.3833 0.7417 3.1167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
## height 3.45000 0.09114 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
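The individual numbers in this printout can also be pulled out of the summary object for later use. The sketch below is one way to do that under the same model; the names `fit` and `s` are illustrative, and confint() is an addition that is not part of the original analysis.

```r
# Refit the model so the snippet is self-contained
data(women)
fit <- lm(weight ~ height, data = women)
s <- summary(fit)

s$sigma          # residual standard error
s$sigma^2        # mean squared error (estimate of the residual variance)
s$r.squared      # multiple R-squared
s$adj.r.squared  # adjusted R-squared
coef(s)          # coefficient table as a numeric matrix

# 95% confidence intervals for the intercept and slope
confint(fit, level = 0.95)
```

Squaring the residual standard error gives the mean squared error, which summary() does not print directly.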