(1) Overview

The data below are the heights of fathers and their sons, based on a study by Karl Pearson in 1900. Heights are rounded to the nearest 0.1 inch. In my experience, height is the most frequently cited example of a normally distributed variable. Out of curiosity, I decided to check whether that is the case here.

HYPOTHESIS: Like the heights themselves, the differences in height between each son and his father should also be normally distributed.

(2) Exploring the Data


Figure 1: Based on the shape of the histogram, the frequencies appear to follow a normal distribution. The mean lies near the center of the histogram, where the frequencies are highest.
Figure 2: Another way to judge the data's normality is a Q-Q plot, which plots the quantiles of the observed data against the quantiles expected under a normal distribution. Visually, most of the data points follow the red reference line, but a few outliers exist.
Figure 3: Using the mean and standard deviation of each collection of heights, a normal density curve is drawn over the density histogram, which shows the relative likelihood of a height falling within a specific range. Compared with the observed data, the probabilities of obtaining any given value appear to follow those of a normal distribution.
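The Q-Q comparison described for Figure 2 can be sketched in a few lines. This is a minimal illustration, not the code used to produce the figures: the heights are simulated stand-ins (not Pearson's data), the function name qq_points is hypothetical, and the plotting positions (i + 0.5) / n are one common convention chosen here as an assumption.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal distribution fitted with the sample mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Simulated stand-in for the heights (NOT Pearson's data); the mean of 68 in
# and SD of 2.7 in are assumptions made only for this example.
rng = random.Random(0)
heights = [round(rng.gauss(68, 2.7), 1) for _ in range(1000)]

points = qq_points(heights)

# Points close to the line y = x indicate agreement with the fitted normal.
for theo, obs in points[::200]:
    print(f"expected {theo:6.2f}   observed {obs:6.2f}")
```

If the data are roughly normal, the observed quantiles stay close to the expected ones in the middle of the range, and any departures tend to show up in the tails, which matches the handful of outliers seen in the plot.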

(3) Results


Figure 4: Sons' heights plotted against their fathers' heights, with the father's height as the X variable and the son's height as the Y variable. Note that the correlation is not strong, and a simple linear relationship on its own is unlikely to explain a son's height, because other variables, such as genes inherited from the son's mother, also contribute.
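The strength of the relationship in Figure 4 can be quantified with the Pearson correlation coefficient and a least-squares line. The sketch below shows the calculation only; the six father and son pairs are hypothetical values invented for illustration, not rows from the data set, and the helper names pearson_r and least_squares are likewise assumptions.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical father/son heights in inches (illustrative only).
fathers = [65.0, 66.5, 68.0, 69.5, 71.0, 72.5]
sons = [67.2, 66.1, 69.4, 68.0, 71.3, 69.9]

r = pearson_r(fathers, sons)
slope, intercept = least_squares(fathers, sons)
print(f"r = {r:.2f}, son = {slope:.2f} * father + {intercept:.2f}")
```

A value of r near 0 would indicate little linear association, while a value near 1 or -1 would indicate a strong one; running the same calculation on the real pairs would show where this data set falls.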

Height differences


Figure 5: When we plot the differences between each father's height and his son's, the density histogram closely resembles a normal distribution.
Figure 6: The height differences follow the Q-Q reference line closely.
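A simple numeric companion to Figures 5 and 6 is to compute the paired differences and compare the share of values within 1, 2, and 3 standard deviations of the mean with what a normal distribution predicts. This is a sketch under stated assumptions: the pairs are simulated (not Pearson's data), every parameter passed to the random generator is made up for the example, and share_within is a hypothetical helper name.

```python
import random
from statistics import NormalDist, mean, stdev

def share_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)

# Simulated stand-in pairs (NOT Pearson's data); all parameters are assumptions.
rng = random.Random(42)
fathers = [round(rng.gauss(67.7, 2.7), 1) for _ in range(1000)]
sons = [round(f + rng.gauss(1.0, 2.8), 1) for f in fathers]

# Difference for each pair: son's height minus his father's height.
differences = [s - f for s, f in zip(sons, fathers)]

standard_normal = NormalDist()
for k in (1, 2, 3):
    expected = standard_normal.cdf(k) - standard_normal.cdf(-k)
    observed = share_within(differences, k)
    print(f"within {k} SD: observed {observed:.3f}, normal {expected:.3f}")
```

Observed shares close to the normal values (roughly 68%, 95%, and 99.7%) are consistent with normality, although, like the plots, this check is informal and does not replace a formal test.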

(4) Conclusion

The heights were plotted as a histogram with a density curve, and they appear to be normally distributed, as expected. Like the unadjusted heights, the height differences appear to follow the bell curve, and plotting them against a Q-Q line supports my hypothesis.