Predict son's height using father's height
Guanglai Li
the data
In 1903, Pearson and Lee published the paper "On the Laws of Inheritance in Man: I. Inheritance of Physical Characters".
In this paper, they studied the relationship between father's and son's height.
For this purpose, they measured the heights of 1078 father-son pairs.
goal of the project
In this project, we build a predictive model based on this data set to predict son's height
using father's height.
Pearson's father-son height data set is included in the UsingR package as father.son. The data has two columns, fheight for father's height and sheight for son's height. Father's height ranges from 59.01 to 75.43 inches and son's height from 58.51 to 78.36 inches.
library(UsingR)
data(father.son)
summary(father.son)
##     fheight         sheight
##  Min.   :59.01   Min.   :58.51
##  1st Qu.:65.79   1st Qu.:66.93
##  Median :67.77   Median :68.62
##  Mean   :67.69   Mean   :68.68
##  3rd Qu.:69.60   3rd Qu.:70.47
##  Max.   :75.43   Max.   :78.36
A linear regression model, with son's height as the response and father's height as the predictor, is fit to the father.son data. The model is summarized below.
# son's height is the response because it is the quantity to be predicted
LM <- lm(sheight ~ fheight, data=father.son)
summary(LM)
##
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.8772 -1.5144 -0.0079  1.6285  8.9685
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.88660    1.83235   18.49   <2e-16 ***
## fheight      0.51409    0.02705   19.01   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.437 on 1076 degrees of freedom
## Multiple R-squared: 0.2513, Adjusted R-squared: 0.2506
## F-statistic: 361.2 on 1 and 1076 DF, p-value: < 2.2e-16
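The direction of the formula matters when the goal is prediction. Least squares minimizes errors in the response, so the fit for predicting a son's height is sheight ~ fheight, and it is not the algebraic inverse of fheight ~ sheight. The sketch below fits both directions for comparison. compare_directions is a hypothetical helper, not part of UsingR, and it assumes a data frame with fheight and sheight columns.

```r
# Hypothetical helper: fit the regression in both directions and
# return the two slopes together with the squared correlation.
compare_directions <- function(df) {
  son_on_father <- lm(sheight ~ fheight, data = df)  # predicts son's height
  father_on_son <- lm(fheight ~ sheight, data = df)  # predicts father's height
  c(
    slope_son_on_father = unname(coef(son_on_father)["fheight"]),
    slope_father_on_son = unname(coef(father_on_son)["sheight"]),
    r_squared           = cor(df$fheight, df$sheight)^2
  )
}

# Run on the real data only if the UsingR package is installed.
if (requireNamespace("UsingR", quietly = TRUE)) {
  data(father.son, package = "UsingR")
  print(compare_directions(father.son))
}
```

The two slopes multiply to the squared correlation, so unless the correlation is perfect, inverting one fitted line does not give the other.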
In the following figure, the circles are the measured father-son pairs and the blue line is the fitted regression line.
plot(father.son$fheight, father.son$sheight, xlab="father's height (inch)", ylab="son's height (inch)")
abline(LM, col='blue')
The fitted model is used to predict a son's height. Please visit this interactive Shiny application to make predictions.
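Outside the application, the same kind of prediction can be sketched in plain R with predict(). This is a minimal illustration, not the application's actual code. predict_son_height and son_model are hypothetical names, and the model is assumed to have sheight as the response and fheight as the only predictor.

```r
# Hypothetical helper: predict a son's height (inches) from a father's height.
# `model` is assumed to be an lm fit of sheight on fheight.
predict_son_height <- function(model, father_height, level = 0.95) {
  new_data <- data.frame(fheight = father_height)
  # interval = "prediction" gives a range for an individual son, which is
  # wider than the confidence interval for the mean height.
  predict(model, newdata = new_data, interval = "prediction", level = level)
}

# Run on the real data only if the UsingR package is installed.
if (requireNamespace("UsingR", quietly = TRUE)) {
  data(father.son, package = "UsingR")
  son_model <- lm(sheight ~ fheight, data = father.son)
  print(predict_son_height(son_model, c(65, 70, 75)))
}
```

The result is a matrix with columns fit, lwr and upr, one row per father's height supplied. With an R-squared of about 0.25, the prediction interval for a single son is wide, so the point prediction should be read as a rough estimate.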