The following commands give us a first look at the data we are interested in:
data(women)
attach(women)
head(women) #shows the first six rows of the data
## height weight
## 1 58 115
## 2 59 117
## 3 60 120
## 4 61 123
## 5 62 126
## 6 63 129
summary(women) #summarizes the data
## height weight
## Min. :58.0 Min. :115.0
## 1st Qu.:61.5 1st Qu.:124.5
## Median :65.0 Median :135.0
## Mean :65.0 Mean :136.7
## 3rd Qu.:68.5 3rd Qu.:148.0
## Max. :72.0 Max. :164.0
names(women) #shows the names of the variables in the data
## [1] "height" "weight"
str(women) #shows the structure of the data: dimensions, variable types, and the first few values
## 'data.frame': 15 obs. of 2 variables:
## $ height: num 58 59 60 61 62 63 64 65 66 67 ...
## $ weight: num 115 117 120 123 126 129 132 135 139 142 ...
We now create the linear model of our data with height as the explanatory variable and weight as the response:
mymod <- lm(weight ~ height)
mymod
##
## Call:
## lm(formula = weight ~ height)
##
## Coefficients:
## (Intercept) height
## -87.52 3.45
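To see where these two numbers come from, the following is a minimal sketch that reproduces them with the closed-form least-squares formulas (slope = covariance of height and weight divided by the variance of height; intercept = mean weight minus slope times mean height). The names `x`, `y`, `slope`, `intercept`, and `fit` are ours and are not part of the original analysis; the model is refit with `data = women` so the block runs on its own.

```r
# Closed-form least-squares estimates for simple linear regression,
# using the built-in women data set directly (no attach() needed).
x <- women$height
y <- women$weight

slope <- cov(x, y) / var(x)              # b1 = S_xy / S_xx
intercept <- mean(y) - slope * mean(x)   # b0 = ybar - b1 * xbar

c(intercept = intercept, slope = slope)

# Compare with the estimates from lm(); all.equal() allows for rounding error
fit <- lm(weight ~ height, data = women)
all.equal(unname(coef(fit)), c(intercept, slope))
```

The hand-computed values should agree with the coefficients printed above up to floating-point rounding.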
The following plots the data with height as the explanatory variable and weight as the response variable, and we then add the regression line to the plot:
plot(height, weight, ylab = "Weight in Pounds", xlab = "Height in Inches", main = "Women's Height vs. Weight")
abline(mymod) #draws the fitted line from the model itself rather than from hand-typed, rounded coefficients
We can also print the intercept and slope of the regression line at full precision:
coef(mymod)
## (Intercept) height
## -87.51667 3.45000
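These coefficients can be used to predict a weight for a given height. The sketch below does this by hand and with `predict()`, for a height of 65 inches chosen purely for illustration. The names `fit`, `b`, `manual`, and `auto` are ours, and the model is refit with `data = women` so the example is self-contained.

```r
fit <- lm(weight ~ height, data = women)
b <- coef(fit)

# Manual prediction: intercept + slope * height
manual <- b[["(Intercept)"]] + b[["height"]] * 65

# Same prediction via predict(); the column in newdata must be
# named after the predictor used in the model formula
auto <- predict(fit, newdata = data.frame(height = 65))

c(manual = manual, predict = unname(auto))
```

Both values should come out at about 136.7 pounds. Since 65 inches is the mean height in this data, the prediction coincides with the mean weight shown by `summary(women)`.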
We now create a histogram of the model's residuals:
myresids <- resid(mymod) #extracts the residuals from the fitted model
hist(myresids)
We also draw a normal Q-Q plot, which compares the quantiles of the residuals to those of a normal distribution (the points follow the reference line fairly closely, suggesting the residuals are close to normally distributed):
qqnorm(myresids)
qqline(myresids)
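To make clearer what the Q-Q plot shows, the sketch below rebuilds its points by hand: the sorted residuals are paired with the standard normal quantiles expected for a sample of that size. It assumes the default plotting positions given by `ppoints()`; the names `fit`, `res`, `theoretical`, `observed`, and `qq` are ours.

```r
fit <- lm(weight ~ height, data = women)
res <- resid(fit)
n <- length(res)

theoretical <- qnorm(ppoints(n))   # expected standard normal quantiles
observed <- sort(res)              # residuals in increasing order

plot(theoretical, observed,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
qqline(res)                        # reference line through the quartiles

# Check against the coordinates that qqnorm() computes internally
qq <- qqnorm(res, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
```

If the residuals were exactly normal, the points would fall on the reference line apart from sampling noise.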
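We now plot the residuals against height to check for equal variance (the residuals should scatter evenly around zero across the whole range of heights):
plot(mymod$residuals ~ height)
abline(h = 0) #horizontal reference line at zero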
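Finally, we can print a summary of the model, including the coefficient estimates with their standard errors, R-squared, and the residual standard error (the square root of the mean squared error):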
summary(mymod)
##
## Call:
## lm(formula = weight ~ height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7333 -1.1333 -0.3833 0.7417 3.1167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
## height 3.45000 0.09114 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
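The summary reports the residual standard error rather than the mean squared error itself. As a rough sketch of how the two relate, the block below computes both from the residuals; the names `fit`, `rss`, `df`, `mse`, and `rse` are ours, and the model is refit with `data = women` so the block runs on its own.

```r
fit <- lm(weight ~ height, data = women)

rss <- sum(resid(fit)^2)    # residual sum of squares
df  <- df.residual(fit)     # n - 2 = 13 degrees of freedom
mse <- rss / df             # mean squared error
rse <- sqrt(mse)            # residual standard error

c(mse = mse, rse = rse)

# The residual standard error stored in the summary object should match
all.equal(rse, summary(fit)$sigma)
```

The value of `rse` should match the 1.525 shown in the summary output, and squaring it gives the mean squared error.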