R Markdown

Ways to look at the data:

data(women)
attach(women)
summary(women)
##      height         weight     
##  Min.   :58.0   Min.   :115.0  
##  1st Qu.:61.5   1st Qu.:124.5  
##  Median :65.0   Median :135.0  
##  Mean   :65.0   Mean   :136.7  
##  3rd Qu.:68.5   3rd Qu.:148.0  
##  Max.   :72.0   Max.   :164.0
names(women)
## [1] "height" "weight"
str(women) 
## 'data.frame':    15 obs. of  2 variables:
##  $ height: num  58 59 60 61 62 63 64 65 66 67 ...
##  $ weight: num  115 117 120 123 126 129 132 135 139 142 ...
dim(women)
## [1] 15  2
datmod <- lm(weight ~ height)   # weight is the response variable; height is the explanatory variable
datmod   # prints the fitted intercept and the slope for height
## 
## Call:
## lm(formula = weight ~ height)
## 
## Coefficients:
## (Intercept)       height  
##      -87.52         3.45
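
As a hedged alternative to attach(), the same model can be fitted by passing the data frame through the data argument, so the formula does not rely on variables attached to the search path. The name fit is illustrative and does not appear in the original analysis.

```r
# Sketch: fit the same regression without attach(); `fit` is an illustrative name
data(women)
fit <- lm(weight ~ height, data = women)

# coef() returns the named vector of estimates (intercept and slope)
coef(fit)
```

The estimates should match the coefficients printed above, since only the way the variables are looked up has changed.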

Plots

This plots the data with weight as the response variable (y-axis) and height as the explanatory variable (x-axis). The abline function adds the fitted regression line to the plot.

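plot(height, weight, xlab = "Height (inches)",   # height is on the x-axis
     ylab = "Weight (lbs)",                      # weight is on the y-axis
     main = "Women's Weight and Height")
abline(datmod)    # draws the fitted line from the model (intercept -87.52, slope 3.45)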
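
The fitted line can also be used to estimate weight at a given height. The sketch below is an illustration rather than part of the original analysis: it refits the model under the assumed name fit and evaluates it at a height of 65 inches, once with predict() and once by hand with the rounded coefficients printed earlier.

```r
# Sketch: evaluate the regression line at one height (65 inches is an arbitrary choice)
fit <- lm(weight ~ height, data = women)

# predict() expects a data frame whose column name matches the predictor
new_height <- data.frame(height = 65)
pred <- predict(fit, newdata = new_height)

# Same calculation by hand with the rounded coefficients: intercept + slope * height
manual <- -87.52 + 3.45 * 65

pred     # about 136.7 lbs
manual   # 136.73
```

The two values differ only in the third decimal place because the printed coefficients are rounded.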

Residuals

This extracts the residuals from the fitted model and then draws a histogram of them.

datresids <- datmod$residuals  
datresids
##           1           2           3           4           5           6 
##  2.41666667  0.96666667  0.51666667  0.06666667 -0.38333333 -0.83333333 
##           7           8           9          10          11          12 
## -1.28333333 -1.73333333 -1.18333333 -1.63333333 -1.08333333 -0.53333333 
##          13          14          15 
##  0.01666667  1.56666667  3.11666667
hist(datresids)  

# With only 15 observations, it is hard to tell from the histogram whether the residuals are normally distributed
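
To make explicit what a residual is, the sketch below recomputes the residuals as observed weight minus fitted weight and compares them with the values stored in the model. The names fit and res_manual are illustrative assumptions, not objects from the original analysis.

```r
# Sketch: a residual is the observed response minus the fitted value
fit <- lm(weight ~ height, data = women)
res_manual <- women$weight - fitted(fit)

# Compare with the residuals stored in the model (names dropped before comparing)
all.equal(unname(residuals(fit)), unname(res_manual))

# For a least-squares fit with an intercept, the residuals sum to zero
# up to floating-point error
sum(residuals(fit))
```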

The normal Q-Q plot below compares the residuals with the quantiles of a normal distribution. As seen, they follow the reference line fairly closely, so the plot gives no strong evidence against normally distributed residuals.

qqnorm(datresids) 
qqline(datresids) 
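
A numerical check can complement the visual one. The sketch below applies the Shapiro-Wilk test from base R to the residuals. This is an assumed addition, not part of the original analysis, and with only 15 observations the test has little power, so a large p-value should be read as "no evidence against normality" rather than proof of it.

```r
# Sketch: Shapiro-Wilk normality test on the residuals (`fit` and `sw` are illustrative names)
fit <- lm(weight ~ height, data = women)
sw <- shapiro.test(residuals(fit))

sw$statistic   # the W statistic
sw$p.value     # a small p-value would suggest departure from normality
```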

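Next, we plot the residuals against height to check their vertical spread. The spread is not constant across women's heights, so there is some evidence of heteroscedasticity. The residuals are also positive at both ends of the height range and negative in the middle, a curved pattern that suggests the relationship may not be strictly linear.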

plot(datmod$residuals ~ height)
abline(0,0)    
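
The residuals printed earlier are positive for the shortest and tallest women and negative in between. One hedged way to probe that pattern is to compare the straight-line model with one that adds a squared height term. This is an exploratory sketch under assumed names (fit_linear, fit_quad), not the method used in this analysis.

```r
# Sketch: compare the linear fit with a quadratic fit
fit_linear <- lm(weight ~ height, data = women)
fit_quad   <- lm(weight ~ height + I(height^2), data = women)  # I() protects ^ inside a formula

# F-test for whether the squared term improves the fit
anova(fit_linear, fit_quad)

# Residual standard errors of the two models, for a quick side-by-side look
sigma(fit_linear)
sigma(fit_quad)
```

A noticeably smaller residual standard error for the quadratic model would point to curvature rather than unequal variance as the main issue.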

Finally, here is the model summary, which reports the residual quantiles, the coefficient estimates with their standard errors and p-values, the residual standard error, and R-squared.

summary(datmod)
## 
## Call:
## lm(formula = weight ~ height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
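
Individual quantities can be pulled out of the summary object instead of being read off the printout. The sketch below assumes illustrative names (fit, s). It also shows how the mean squared error relates to the reported residual standard error: the summary prints the square root of the MSE, not the MSE itself.

```r
# Sketch: extract pieces of the model summary programmatically
fit <- lm(weight ~ height, data = women)
s <- summary(fit)

s$sigma          # residual standard error (1.525 in the printout above)
s$sigma^2        # mean squared error: residual sum of squares / residual degrees of freedom
s$r.squared      # multiple R-squared (0.991 in the printout above)
s$coefficients   # matrix of estimates, standard errors, t values, and p-values
```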
