R Markdown

We start by loading the women data set and looking at it in several ways:

data(women)
attach(women)
head(women) #shows the top of the data
##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129
summary(women) #summarizes the data
##      height         weight     
##  Min.   :58.0   Min.   :115.0  
##  1st Qu.:61.5   1st Qu.:124.5  
##  Median :65.0   Median :135.0  
##  Mean   :65.0   Mean   :136.7  
##  3rd Qu.:68.5   3rd Qu.:148.0  
##  Max.   :72.0   Max.   :164.0
names(women) #shows the names of the variables in the data
## [1] "height" "weight"
str(women) #shows the structure of the data: dimensions, column types, and first few values
## 'data.frame':    15 obs. of  2 variables:
##  $ height: num  58 59 60 61 62 63 64 65 66 67 ...
##  $ weight: num  115 117 120 123 126 129 132 135 139 142 ...

We now create the linear model of our data with height as the explanatory variable and weight as the response:

mymod <- lm(weight ~ height) 
mymod 
## 
## Call:
## lm(formula = weight ~ height)
## 
## Coefficients:
## (Intercept)       height  
##      -87.52         3.45
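
The call above relies on attach(women) to make height and weight visible by name. As a minimal sketch, the same model can be fitted by passing the data frame through the data argument instead; the name mymod2 is only illustrative and is not used elsewhere in this document.

```r
# Load the data set and fit the model without relying on attach();
# height and weight are looked up inside the women data frame.
data(women)
mymod2 <- lm(weight ~ height, data = women)

# The coefficients should match those printed for mymod above.
coef(mymod2)
```

Passing data explicitly keeps the model self-contained and avoids surprises if another object named height or weight exists in the workspace.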

Including Plots

Next we plot weight (the response) against height (the explanatory variable) and add the fitted regression line:

plot(height, weight, ylab = "Weight in Pounds", xlab = "Height in Inches", main = "Women's Height vs. Weight")
abline(mymod) #draws the fitted line using the model's unrounded intercept and slope

We can also print the intercept and slope of the regression line:

coef(mymod) 
## (Intercept)      height 
##   -87.51667     3.45000
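
These two numbers define the fitted line: predicted weight = intercept + slope × height. The sketch below works through that arithmetic for a height of 65 inches and checks it against predict(). The names fit, b, by_hand and by_predict are illustrative, and the model is refitted so that the example runs on its own.

```r
# Refit the same model so this example is self-contained
data(women)
fit <- lm(weight ~ height, data = women)

# Prediction by hand: intercept + slope * height
b <- coef(fit)
by_hand <- b[["(Intercept)"]] + b[["height"]] * 65

# The same prediction using predict() with a new data frame
by_predict <- predict(fit, newdata = data.frame(height = 65))

# Both values should agree (roughly 136.73 pounds)
c(by_hand = by_hand, by_predict = unname(by_predict))
```

Because 65 inches is the mean height in this data set, the prediction equals the mean weight, which is what we expect from a least-squares line.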

We now create a histogram of the model's residuals:

myresids <- mymod$residuals
hist(myresids) 

Next we draw a normal Q-Q plot of the residuals and add a reference line. The points fall fairly close to the line, which suggests the residuals are approximately normally distributed:

qqnorm(myresids)
qqline(myresids) 

We now plot the residuals against height to check for equal variance (the spread of the residuals should look roughly constant across heights):

plot(mymod$residuals ~ height)
abline(0,0) 
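
As an alternative sketch, R's built-in plot method for lm objects can produce similar diagnostic plots directly from the model. The name fit is illustrative, and the model is refitted here so the example runs on its own. Note that the first plot uses fitted values on the x-axis rather than height; for a one-predictor model this is just a linear rescaling of height.

```r
# Refit the same model so this example is self-contained
data(women)
fit <- lm(weight ~ height, data = women)

# Show two diagnostic plots side by side, then restore the old settings
op <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs fitted values (equal-variance check)
plot(fit, which = 2)  # normal Q-Q plot of standardized residuals
par(op)
```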

Finally, we print a summary of the model, which includes the residual standard error (the square root of the mean squared error):

summary(mymod) 
## 
## Call:
## lm(formula = weight ~ height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
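
The summary reports the residual standard error rather than the mean squared error itself. As a minimal sketch, the mean squared error can be recovered by dividing the residual sum of squares by the residual degrees of freedom (13 here). The names fit, rss, mse and rse are illustrative.

```r
# Refit the same model so this example is self-contained
data(women)
fit <- lm(weight ~ height, data = women)

# Residual sum of squares divided by residual degrees of freedom (n - 2)
rss <- sum(residuals(fit)^2)
mse <- rss / df.residual(fit)

# The square root of the MSE is the residual standard error from summary()
rse <- sqrt(mse)
c(mse = mse, rse = rse, summary_sigma = summary(fit)$sigma)
```

The value of rse should match the 1.525 shown in the summary output above.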
