1 Question 1

In 1886, Francis Galton presented a data set on a sample of adult British children and their parents. For each child, he had recorded their adult height and the average of their parents’ heights. His analysis of the data set the stage for correlation, regression and the bivariate normal distribution.

We are going to keep the analysis simple by considering just one child randomly selected from each set of parents, and by addressing only the following question: “Is the height of the child determined by the height of the parents?” In particular, we wish to see if the relationship between the two is approximately one to one, i.e., children grow up to have the same height as the average of their parents’ heights, allowing for some error.

The data is stored in the file Galton.csv and contains the variables:

Variable Description
child The height (converted to cm) of the child when adult.
parent The average height (converted to cm) of the parents.

1.1 Question of interest/goal of the study

We wish to investigate the relationship between children’s adult heights and their parents’ average height, and determine whether this relationship is one-to-one.

1.2 Read in and inspect the data:

Galton.df=read.csv("Galton.csv", header=TRUE)
plot(child~parent, main="Child's height versus parents' average height",data=Galton.df)

1.3 Comment on the plots

Looking at the general scatter of the plot, we can see a moderate positive linear association between the parents’ average height on the X axis and the height of the child on the Y axis. As the parents’ average height increases, the height of the child tends to increase.

1.5 Fit an appropriate linear model, including model checks and relevant output.

library(s20x)  # modelcheck() is assumed to come from the s20x package
Galton.lm=lm(child~parent, data=Galton.df)
modelcheck(Galton.lm)

summary(Galton.lm)
## 
## Call:
## lm(formula = child ~ parent, data = Galton.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.9305  -5.8912  -0.6636   6.6326  17.9038 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  66.6189    20.7342   3.213  0.00154 ** 
## parent        0.6115     0.1214   5.036 1.08e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.049 on 195 degrees of freedom
## Multiple R-squared:  0.1151, Adjusted R-squared:  0.1105 
## F-statistic: 25.36 on 1 and 195 DF,  p-value: 1.081e-06
confint(Galton.lm)
##                2.5 %      97.5 %
## (Intercept) 25.72695 107.5109105
## parent       0.37202   0.8510386
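The interval for the slope can be reproduced by hand from the coefficient table as estimate ± t-multiplier × standard error. The following is a minimal sketch that uses the rounded values printed above rather than the fitted object; the variable names are illustrative only.

```r
# Values copied from the summary() output above (rounded as printed)
est <- 0.6115   # slope estimate for parent
se  <- 0.1214   # standard error of the slope
df  <- 195      # residual degrees of freedom

# 95% confidence interval: estimate +/- t-multiplier * standard error
tmult <- qt(0.975, df)
ci <- est + c(-1, 1) * tmult * se
print(round(ci, 3))
```

Because the inputs are rounded, the result should agree with the `parent` row of `confint()` only up to rounding.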

1.6 Create a scatter plot with the fitted line from your model superimposed over it.

plot(child~parent, main="Child's height versus parents' average height", sub="Solid line = fitted model, dashed line = slope 1",data=Galton.df)
abline(Galton.lm)      # solid line: fitted model
abline(0, 1, lty=2)    # dashed line: one-to-one relationship (child = parent)

1.7 Method and Assumption Checks

Since the relationship in the data appears linear, we have fitted a simple linear regression model. We have a sample of families but no information on how it was obtained, so we have to assume the families were randomly selected. However, as this study was conducted long before good statistical practice was well understood, this is unlikely to be the case. There could also be doubts about independence, although the problem of multiple children from the same family was addressed by randomly choosing one child from each family. The residuals show patternless scatter with fairly constant variability, so there are no problems there. The normality checks don’t show any major problems (slightly short tails, if anything) and the Cook’s distance plot doesn’t reveal any unduly influential points. Overall, apart from the doubt about random sampling, the model assumptions appear to be satisfied.
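The checks described above can also be approximated with base R graphics. This is only a sketch: it does not reproduce the exact output of `modelcheck()`, and because Galton.csv is not available here it runs on simulated stand-in data. The names `sim.df` and `sim.lm` and the simulation settings are assumptions loosely based on the summary output in this section.

```r
# Simulated stand-in for Galton.csv (NOT the real data)
set.seed(1)
n <- 197                                           # 195 residual df + 2 coefficients
parent <- rnorm(n, mean = 170, sd = 4.7)           # assumed spread of parent heights
child  <- 66.6 + 0.61 * parent + rnorm(n, sd = 8)  # assumed linear model plus noise
sim.df <- data.frame(child, parent)
sim.lm <- lm(child ~ parent, data = sim.df)

pdf(NULL)                  # discard graphics so the sketch also runs non-interactively
plot(sim.lm, which = 1)    # residuals vs fitted: look for patternless scatter
plot(sim.lm, which = 2)    # normal Q-Q plot of the residuals
plot(sim.lm, which = 4)    # Cook's distance for each observation
invisible(dev.off())

# Largest Cook's distance: a small value suggests no unduly influential point
max.cook <- max(cooks.distance(sim.lm))
print(max.cook)
```

With the real data, `sim.lm` would be replaced by `Galton.lm` and the `pdf(NULL)` and `dev.off()` lines dropped so that the plots are displayed.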

1.7.1 Complete the equation below:

Our model is:

\(child_i=\beta_0 +\beta_1\times parent_i+\epsilon_i\) where \(\epsilon_i \sim iid ~ N(0,\sigma^2)\)
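Substituting the estimates from the summary output gives the fitted line (coefficients rounded to two decimal places):

\(\widehat{child}_i = 66.62 + 0.61\times parent_i\)

For example, for parents averaging 180 cm, the unrounded estimates predict a child’s height of about \(66.6189 + 0.6115\times 180 \approx 176.7\) cm.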

1.7.2 Complete the statement

Our model explains only about 11.5% of the variation in the response variable (Multiple R-squared = 0.1151).
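As a consistency check, the R-squared value can be recovered from other numbers in the summary output. With a single predictor, R-squared equals F / (F + residual df), and the F-statistic equals the square of the slope’s t value. A minimal sketch using the rounded values as printed (variable names are illustrative):

```r
# Values copied from the summary() output (rounded as printed)
f.stat  <- 25.36   # F-statistic
df.res  <- 195     # residual degrees of freedom
t.slope <- 5.036   # t value for the parent coefficient

# With one predictor, R^2 = F / (F + residual df)
r2 <- f.stat / (f.stat + df.res)
print(round(r2, 4))

# With one predictor, the F-statistic is the squared t value of the slope
print(round(t.slope^2, 2))
```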

1.8 Executive Summary

We are interested in evaluating whether the height of the child is determined by the height of the parent. In particular, we wished to see if the relationship between the two is approximately one to one, i.e., children grow up to have the same height as the average of their parents’ height.

We have strong evidence of an increasing relationship between the height of the child and the average of the parents’ heights.

We estimate that for every 1 cm increase in the parents’ average height, the child’s expected height increases by somewhere between 0.37 and 0.85 cm. For example, if the parents’ average height were 5 cm greater, the child’s expected height would be greater by only between 1.85 and 4.25 cm.
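The 5 cm figures come from scaling the two (rounded) limits of the confidence interval for the slope:

\(5\times 0.37 = 1.85\) and \(5\times 0.85 = 4.25\)

Using the unrounded limits from `confint()` instead gives approximately 1.86 and 4.26 cm.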

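This is not consistent with a one-to-one relationship between average parent height and child’s height, since a one-to-one relationship would require a slope of 1, which lies above the whole of this interval.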
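The same conclusion can be reached with a t-test of the hypothesis that the slope equals 1 (rather than the default 0 tested in the summary output). The sketch below is not part of the original analysis; it uses the rounded estimate and standard error as printed, and the variable names are illustrative.

```r
# Values copied from the summary() output (rounded as printed)
est <- 0.6115   # slope estimate for parent
se  <- 0.1214   # standard error of the slope
df  <- 195      # residual degrees of freedom

# Test H0: slope = 1 against a two-sided alternative
t.stat  <- (est - 1) / se
p.value <- 2 * pt(-abs(t.stat), df)

print(round(t.stat, 2))
print(p.value)
```

A small p-value here is evidence against a slope of 1, in line with the confidence interval lying entirely below 1.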