discussion 11

Modeling

model <- lm(df$Height~df$FatherHeight)
model

## 
## Call:
## lm(formula = df$Height ~ df$FatherHeight)
## 
## Coefficients:
##     (Intercept)  df$FatherHeight  
##         39.1104           0.3994

As we can see, the fitted model is stored under the name “model”, so calling its name on its own prints the call and the estimated coefficients. From this output we can write the regression equation:

\[\text{predicted height} = 39.1104 + 0.3994 \times \text{observed father's height}\]
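To make the equation concrete, here is a minimal sketch that plugs a father's height into it by hand. The coefficients are copied from the printed output; the value 70 is a hypothetical father's height chosen only for illustration, in the same units as the data.

```r
# Coefficients copied from the printed model output
intercept <- 39.1104
slope     <- 0.3994

# Hypothetical father's height (illustrative value, not taken from the data)
father_height <- 70

# Apply the regression equation directly
predicted_height <- intercept + slope * father_height
print(predicted_height)
```

Because the model was fit with `df$Height ~ df$FatherHeight`, `predict()` cannot match new data by column name. Refitting as `lm(Height ~ FatherHeight, data = df)` should let `predict(model, newdata = data.frame(FatherHeight = 70))` do the same calculation.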

summary(model)

## 
## Call:
## lm(formula = df$Height ~ df$FatherHeight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.2683  -2.6689  -0.2092   2.6342  11.9329 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     39.11039    3.22706  12.120   <2e-16 ***
## df$FatherHeight  0.39938    0.04658   8.574   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.446 on 896 degrees of freedom
## Multiple R-squared:  0.07582,    Adjusted R-squared:  0.07479 
## F-statistic: 73.51 on 1 and 896 DF,  p-value: < 2.2e-16

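The model explains only about 7.58% of the variation in height (Multiple R-squared = 0.07582), even though the slope for father's height is highly significant (p < 2e-16).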
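As a rough consistency check, the R-squared figures can be reproduced from other numbers in the same summary. This sketch assumes a simple regression with a single predictor, where R-squared equals F / (F + residual df). All inputs are copied from the output above.

```r
f_stat <- 73.51        # F-statistic from the summary
df_res <- 896          # residual degrees of freedom
n      <- df_res + 2   # observations: residual df plus intercept and slope

# R-squared recovered from the F-statistic (valid for one predictor)
r_squared <- f_stat / (f_stat + df_res)

# Adjusted R-squared penalizes for the number of estimated coefficients
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_res

# Magnitude of the correlation between Height and FatherHeight;
# the sign is positive because the fitted slope is positive
r <- sqrt(r_squared)

print(round(r_squared, 5))
print(round(adj_r_squared, 5))
print(round(r, 3))
```

The implied correlation is modest, which is why the share of explained variation is low even though the slope is clearly nonzero.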

qqnorm(resid(model))
qqline(resid(model))

As we can see from this plot, the residuals follow the straight line reasonably well, so we will treat the normality assumption as met and discuss possible issues. Points that fall off the line suggest either skewed data or, more likely, extreme values that do not fit well into a normal distribution.
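A rough numeric companion to the Q-Q plot can be built from the residual summary printed earlier. This sketch assumes normal residuals would have mean 0 and standard deviation equal to the residual standard error, and compares the observed quartiles and extremes with that reference. It is an informal check, not a formal test.

```r
# Values copied from the "Residuals" block and the residual standard error
res_se  <- 3.446
res_q1  <- -2.6689
res_med <- -0.2092
res_q3  <- 2.6342
res_min <- -10.2683
res_max <- 11.9329

# Quartiles that a normal distribution with sd = res_se would have
expected_q3 <- qnorm(0.75) * res_se
expected_q1 <- -expected_q3

# Extremes expressed in units of the residual standard error
z_min <- res_min / res_se
z_max <- res_max / res_se

print(c(observed_q1 = res_q1, expected_q1 = expected_q1))
print(c(observed_q3 = res_q3, expected_q3 = expected_q3))
print(c(z_min = z_min, z_max = z_max))
```

The observed quartiles are nearly symmetric around a median close to zero, which argues against strong skew. The largest residual sits further from zero than the smallest, which may point to a few extreme values in the upper tail.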

Wei Zhou

11/10/2019
