I chose to use the dataset titled women. It provides average heights and weights for American women.

data(women)
attach(women)

Next I will make a scatter plot to look at the relationship between women’s height and weight.

plot(height,weight, ylab = "Weight (lbs)", xlab = "Height (in)", main = "American Women's Average Height and Weight")

There is definitely a positive correlation between height and weight. It looks like there is a linear relationship. There are not many data points, so they are not packed too tightly together. It looks like it has homoscedasticity, or equal variance.
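To put a number on the visual impression, one option is the sample correlation coefficient. This is only a minimal sketch using the built-in women data frame; the name r is illustrative and not part of the analysis above.

```r
# Pearson correlation between height and weight in the built-in women data
r <- cor(women$height, women$weight)
r      # a value close to 1 indicates a strong positive linear association
r^2    # with one predictor, this equals the R-squared of lm(weight ~ height)
```

A value near 1 would support the positive linear relationship seen in the scatter plot, although it says nothing about equal variance.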

Next, I will find the regression line. In this case the response variable is weight and the explanatory variable is height.

womenmod <- lm(weight ~ height)
womenmod
## 
## Call:
## lm(formula = weight ~ height)
## 
## Coefficients:
## (Intercept)       height  
##      -87.52         3.45

By looking at our slope, we can say that for each additional inch in a woman’s height, average weight increases by 3.45 pounds. We cannot interpret our intercept because we do not have data points around 0. It is not realistic to think of a woman’s height (the explanatory variable) as 0. Therefore, interpreting our intercept would be considered extrapolation.
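The slope interpretation can be checked directly: the predicted weights at two heights one inch apart should differ by exactly the slope. Here is a small sketch, assuming the model is refit with data = women; fit, b, and p are illustrative names, and the heights 65 and 66 are arbitrary values inside the observed range.

```r
# Refit the model without relying on attach(); `fit` is an illustrative name
fit <- lm(weight ~ height, data = women)
b <- coef(fit)                      # named vector: (Intercept), height

# Predicted weights at two heights one inch apart
p <- predict(fit, newdata = data.frame(height = c(65, 66)))
diff(p)                             # difference in predictions equals the slope
b["height"]

# The observed heights are nowhere near 0, so the intercept is an extrapolation
range(women$height)
```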

Next, I will add the regression line to my scatter plot. I added the specification col=4 in the abline command so the line would be blue!

plot(height,weight, ylab = "Weight (lbs)", xlab = "Height (in)", main = "American Women's Average Height and Weight")
abline(-87.52,3.45, col=4)
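As an alternative to typing the rounded coefficients by hand, abline() also accepts a fitted lm object and reads the intercept and slope from it. A sketch under that approach; fit is an illustrative name, and col = "blue" is used here in place of the numeric code.

```r
# Fit the model and let abline() take the coefficients from it
fit <- lm(weight ~ height, data = women)

plot(women$height, women$weight,
     ylab = "Weight (lbs)", xlab = "Height (in)",
     main = "American Women's Average Height and Weight")
abline(fit, col = "blue")   # uses the unrounded intercept and slope from fit
```

This avoids rounding error and keeps the line in sync if the model is ever refit.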

Next, I want to use my regression line to make a prediction of weight. First, I had to use some indexing to find my own height in the data.

myheight<-women[12,"height"]
wt.prediction<- -87.52 +(3.45*myheight)
wt.prediction
## [1] 150.53

Using the regression line, the prediction is that a woman of my height between the ages of 30 and 39 would, on average, weigh 150.53 lbs.
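The same prediction can be obtained with predict(), which uses the unrounded coefficients, so it may differ from the hand calculation in the later decimal places. A minimal sketch; fit, my_height, and pred are illustrative names.

```r
# Predict weight at the height stored in row 12 of the women data
fit <- lm(weight ~ height, data = women)
my_height <- women[12, "height"]

# newdata must be a data frame whose column name matches the predictor
pred <- predict(fit, newdata = data.frame(height = my_height))
pred
```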

Next I will look at the residual for my height. A residual is the observed value minus the predicted value.

women[12,"weight"]-wt.prediction
## [1] -0.53

The negative residual means that our prediction is 0.53 lbs too high.
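R also stores the residuals in the fitted model. resid() follows the observed minus fitted convention, so a prediction that is too high shows up as a negative residual. A short sketch; fit, res12, and by_hand are illustrative names.

```r
# Residual for row 12, taken from the fitted model
fit <- lm(weight ~ height, data = women)
res12 <- resid(fit)[12]

# Same quantity computed by hand: observed weight minus fitted weight
by_hand <- women$weight[12] - fitted(fit)[12]

res12
by_hand
```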

Next, I will check model assumptions.

  1. Errors will have a normal distribution. Let’s look at this by making a histogram of the residuals. We can also plot the quantiles of the residuals against the quantiles of a normal distribution. If the residuals are normally distributed, they’ll follow a straight line.

womenresid<- womenmod$residuals
hist(womenresid)

qqnorm(womenresid)
qqline(womenresid)

The residuals don’t look too far from normal, but the fit is still not great.

  2. Homoscedasticity

We will check this by looking at the scatter plot of the original data.

plot(height,weight, ylab = "Weight (lbs)", xlab = "Height (in)", main = "American Women's Average Height and Weight")

It is hard to tell with so few data points in this set.
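Another common way to look at equal variance is to plot the residuals against the fitted values, where a roughly even band around zero would be consistent with homoscedasticity. This is a sketch of that alternative view, not the check used above; fit is an illustrative name.

```r
# Residuals vs. fitted values for the simple regression
fit <- lm(weight ~ height, data = women)

plot(fitted(fit), resid(fit),
     xlab = "Fitted weight (lbs)", ylab = "Residual (lbs)",
     main = "Residuals vs. Fitted Values")
abline(h = 0, lty = 2)   # dashed reference line at zero
```

With only 15 points, this plot is still hard to judge, but it makes any pattern in the spread easier to see than the raw scatter plot does.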

Next, we will look at the mean square error.

summary(womenmod)
## 
## Call:
## lm(formula = weight ~ height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

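Above you can see the residual standard error is 1.525. This is the square root of the mean square error, so the mean square error is 1.525^2, or about 2.33. You can also see there are 13 degrees of freedom. This is because there are 15 data points and we are estimating 2 regression coefficients (15-2=13).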
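The relationship between these quantities can be reproduced from the residuals: the mean square error is the residual sum of squares divided by the residual degrees of freedom, and its square root is the residual standard error that summary() reports. A minimal sketch; fit, rss, df, mse, and rse are illustrative names.

```r
# Recompute the error summaries from the residuals
fit <- lm(weight ~ height, data = women)

rss <- sum(resid(fit)^2)   # residual sum of squares
df  <- df.residual(fit)    # n minus number of estimated coefficients
mse <- rss / df            # mean square error
rse <- sqrt(mse)           # residual standard error

df
mse
rse
sigma(fit)                 # built-in extractor for the residual standard error
```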