d2i2k
April 5, 2015
The GaltonFamilies dataset (HistData package) lists the individual observations for 934 adult children born to 205 pairs of fathers and mothers. These are the data on which Sir Francis Galton (1886) based his account of regression toward the mean. He wrote that “the average regression of the offspring is a constant fraction of their respective mid-parental deviations.” For height, Galton estimated this regression coefficient to be about two-thirds (2/3).
Galton, F. (1886). “Regression towards mediocrity in hereditary stature”. The Journal of the Anthropological Institute of Great Britain and Ireland 15: 246-263.
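As a quick check of the counts quoted above, the following sketch loads the data in R. It assumes the HistData package is installed; the column names `family` and `gender` are taken from the packaged GaltonFamilies data frame, and the variable names `n_children` and `n_families` are introduced here for illustration only.

```r
# Assumption: HistData is installed, e.g. via install.packages("HistData")
library(HistData)
data(GaltonFamilies)

# One row per child, so the row count is the number of children
n_children <- nrow(GaltonFamilies)

# Each family has its own identifier in the `family` column
n_families <- length(unique(GaltonFamilies$family))

n_children
n_families

# Number of daughters and sons
table(GaltonFamilies$gender)
```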
Scatterplot of Galton family data with height of the son or daughter in inches on the ordinate (y-axis) and parental mid-height in inches on the abscissa (x-axis).
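A scatterplot like the one described above could be drawn with base R graphics roughly as follows. This is a minimal sketch, not the code used for the original figure: the colours, plotting symbol, and legend position are arbitrary choices made here, and the HistData package is assumed to be installed.

```r
# Assumption: HistData is installed
library(HistData)
data(GaltonFamilies)

# Illustrative colour choice: one colour per gender
point_col <- ifelse(GaltonFamilies$gender == "male", "steelblue", "tomato")

# Parental mid-height on the x-axis, child height on the y-axis
plot(GaltonFamilies$midparentHeight, GaltonFamilies$childHeight,
     col = point_col, pch = 16,
     xlab = "Mid-parent height (inches)",
     ylab = "Child height (inches)")

legend("topleft", legend = c("female", "male"),
       col = c("tomato", "steelblue"), pch = 16)
```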
Gender-specific linear regression models fitted to Galton family data with height of the son or daughter as the dependent variable and parental mid-height as the independent variable.
lm(formula = childHeight ~ midparentHeight, data = subset(GaltonFamilies,
    gender == "female"))

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     18.33348    3.60497   5.086 5.38e-07 ***
midparentHeight  0.66075    0.05202  12.701  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.024 on 451 degrees of freedom
Multiple R-squared:  0.2634,  Adjusted R-squared:  0.2618
F-statistic: 161.3 on 1 and 451 DF,  p-value: < 2.2e-16
lm(formula = childHeight ~ midparentHeight, data = subset(GaltonFamilies,
    gender == "male"))

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     19.91346    4.08943   4.869 1.52e-06 ***
midparentHeight  0.71327    0.05912  12.064  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.3 on 479 degrees of freedom
Multiple R-squared:  0.2331,  Adjusted R-squared:  0.2314
F-statistic: 145.6 on 1 and 479 DF,  p-value: < 2.2e-16
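The two gender-specific fits shown above can be reproduced along these lines. The `lm()` calls mirror the ones printed in the output; the object names `fit_female`, `fit_male`, and `slopes` are hypothetical names chosen for this sketch, and the HistData package is assumed to be installed.

```r
# Assumption: HistData is installed
library(HistData)
data(GaltonFamilies)

# One simple linear regression per gender, as in the printed calls
fit_female <- lm(childHeight ~ midparentHeight,
                 data = subset(GaltonFamilies, gender == "female"))
fit_male   <- lm(childHeight ~ midparentHeight,
                 data = subset(GaltonFamilies, gender == "male"))

# Full coefficient tables with standard errors and p-values
summary(fit_female)
summary(fit_male)

# The two estimated slopes side by side
slopes <- c(female = coef(fit_female)[["midparentHeight"]],
            male   = coef(fit_male)[["midparentHeight"]])
slopes
```

Both slopes are close to Galton's two-thirds, and they are similar to each other, which is what motivates looking at a model with a single common slope.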
Multiple scatterplots of Galton family data with height of the son or daughter in inches on the ordinate (y-axis) and parental mid-height in inches on the abscissa (x-axis).
Multiple regression model fitted to Galton family data with child height as the dependent variable and parental mid-height and gender of the child as the independent variables. The estimated common slope (regression coefficient) is 0.687, with a 95% confidence interval of (0.61, 0.76) that covers Galton's estimate of two-thirds.
lm(formula = childHeight ~ midparentHeight + gender, data = GaltonFamilies)

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     16.51410    2.73392    6.04 2.22e-09 ***
midparentHeight  0.68702    0.03944   17.42  < 2e-16 ***
gendermale       5.21511    0.14216   36.69  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.17 on 931 degrees of freedom
Multiple R-squared:  0.6332,  Adjusted R-squared:  0.6324
F-statistic: 803.6 on 2 and 931 DF,  p-value: < 2.2e-16
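The 95% confidence interval quoted for the common slope can be checked by hand from the printed estimate, standard error, and residual degrees of freedom. The sketch below uses only base R and the numbers in the output above; the variable names are illustrative.

```r
# Values copied from the multiple regression output
estimate  <- 0.68702   # slope for midparentHeight
std_error <- 0.03944   # its standard error
df_resid  <- 931       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

# Interval rounded to two decimals, as quoted in the text
round(ci, 2)

# Does the interval contain Galton's two-thirds?
ci[1] < 2/3 && 2/3 < ci[2]
```

With the fitted model object at hand, calling `confint()` on it should give the same interval directly.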