09/05/2020

Overview

We use Galton’s 1880s data to see whether a child’s height can be predicted from the heights of both parents alone, using a simple linear model.

Raw data plot

We plot the original data as is, using plotly.
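The plotting code is not shown on this slide, so the following is only a minimal sketch of how such a plot could be produced. It assumes that Galton is the data set shipped with the mosaicData package (its columns match the ones used in the model) and that a 3D scatter plot is wanted; both are assumptions, not the original code.

```r
library(plotly)

# Assumption: Galton comes from the mosaicData package, which provides
# the height, mother and father columns used in this deck
data(Galton, package = "mosaicData")

# 3D scatter plot: parents' heights on x and y, child's height on z
p <- plot_ly(
  data = Galton,
  x = ~father, y = ~mother, z = ~height,
  type = "scatter3d", mode = "markers",
  marker = list(size = 3)
)
p
```

A 2D alternative would be to drop the z argument and use type = "scatter" with one parent at a time.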

Linear model

Now we fit a linear model

# Regress the child's height on the mother's and father's heights
mf <- lm(height ~ mother + father, data = Galton)

Model

We take a look at our linear model with summary(mf)

## 
## Call:
## lm(formula = height ~ mother + father, data = Galton)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.136 -2.700 -0.181  2.768 11.689 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 22.30971    4.30690   5.180 2.74e-07 ***
## mother       0.28321    0.04914   5.764 1.13e-08 ***
## father       0.37990    0.04589   8.278 4.52e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.386 on 895 degrees of freedom
## Multiple R-squared:  0.1089, Adjusted R-squared:  0.1069 
## F-statistic: 54.69 on 2 and 895 DF,  p-value: < 2.2e-16
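To make the coefficients concrete, here is a small sketch that applies the fitted equation by hand. The coefficients are copied from the output above; the parents' heights (64 and 69, in the same units as the data) are hypothetical values chosen only for illustration.

```r
# Coefficients copied from the summary output above
coefs <- c(intercept = 22.30971, mother = 0.28321, father = 0.37990)

# Hypothetical parents' heights (assumed values, same units as the data)
mother_height <- 64
father_height <- 69

# Fitted equation: height = intercept + b_mother * mother + b_father * father
predicted_height <- unname(
  coefs["intercept"] +
    coefs["mother"] * mother_height +
    coefs["father"] * father_height
)

print(round(predicted_height, 2))  # 66.65
```

With the fitted model object available, predict(mf, newdata = data.frame(mother = 64, father = 69)) should give the same value up to rounding of the printed coefficients.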

Interpretation and Conclusion

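The father’s and mother’s heights are both statistically significant predictors of the child’s height (p < 0.001 for each coefficient). Holding the other parent’s height fixed, each additional unit of the father’s height is associated with about 0.38 units of the child’s height, and each additional unit of the mother’s height with about 0.28. However, the model explains only about 11% of the variance (R² = 0.109) and has a relatively large residual standard error (3.386, in the units of height), so the parents’ heights alone predict the child’s height poorly. Adding more predictors may improve the fit.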
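The fit statistics behind this conclusion can be cross-checked from the summary output alone. The sketch below uses only numbers printed there; the variable names are illustrative. It recovers the sample size, the adjusted R², and the residual mean square (the squared residual standard error, which is the MSE-type quantity the output reports).

```r
# Values copied from the summary output
r_squared <- 0.1089
rse <- 3.386         # residual standard error
df_resid <- 895      # residual degrees of freedom
n_predictors <- 2    # mother and father

# Observations used in the fit: residual df + predictors + intercept
n <- df_resid + n_predictors + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

# Squaring the residual standard error gives the residual mean square
residual_mean_square <- rse^2

print(n)
print(round(adj_r_squared, 4))
print(round(residual_mean_square, 2))
```

The adjusted R² computed this way agrees with the value in the output, which confirms that penalising for two predictors barely changes the picture: about 89% of the variance remains unexplained.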