Student Details

Felicity Draper (s3742570)

Problem Statement

The objective of this investigation is to determine whether the variable ‘Height’ from the Body Measurements data set fits a normal distribution. The variable records each respondent’s height in cm and is investigated separately for men and women. This will be assessed by calculating descriptive statistics, plotting a histogram of the empirical data, and visually comparing it with a fitted normal distribution curve.

Load Packages

# This is a chunk to load the necessary packages required for the report
library(readxl)
library(knitr)
library(dplyr)

Data

The Body Measurements data set was downloaded from ‘Exploring Relationships in Body Dimensions’ (Heinz et al., 2003) and imported into RStudio for preprocessing and analysis.

# This is a chunk for importing and pre-processing data 
bdims<- read_excel("bdims.xlsx")
bdims_hgt<- select(bdims, hgt, sex)
bdims_hgt$sex<- bdims$sex %>% factor(levels = c(1, 0), labels = c("Male", "Female"))
colnames(bdims_hgt)<- c("Height", "Sex")
str(bdims_hgt)
tibble [507 × 2] (S3: tbl_df/tbl/data.frame)
 $ Height: num [1:507] 174 175 194 186 187 ...
 $ Sex   : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
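The recoding step in the import chunk relies on `factor()` matching each code listed in `levels` to the label in the same position. The minimal sketch below uses a made-up vector of codes (hypothetical values, with the same 1 = Male, 0 = Female mapping as the import chunk) to show the effect. Because `1` is listed first, "Male" also becomes the first factor level, so males appear first when the data are grouped by sex.

```r
# Hypothetical 0/1 sex codes standing in for bdims$sex
sex_codes <- c(1, 0, 0, 1, 1)

# Each value in `levels` is matched to the label in the same position
sex_factor <- factor(sex_codes, levels = c(1, 0), labels = c("Male", "Female"))
print(sex_factor)

# "Male" is the first level because 1 is listed first in `levels`
print(levels(sex_factor))
```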

Summary Statistics

The summarise() function has been used to calculate descriptive statistics for the Height variable, with the data grouped by sex.

# The following chunk calculates summary statistics for each sex
bdims_summary<- bdims_hgt %>% group_by(Sex) %>% summarise(Mean = mean(Height), 
                                          SD = sd(Height),
                                          IQR = IQR(Height),
                                          Minimum = min(Height),
                                          Q1 = quantile(Height, probs =0.25),
                                          Median = median(Height),
                                          Q3 = quantile(Height, probs =0.75),
                                          Maximum = max(Height),
                                          .groups = 'drop'
                                          )



knitr::kable(bdims_summary, caption = "Height Summary Statistics by Sex")
Height Summary Statistics by Sex

Sex     Mean      SD        IQR   Minimum  Q1     Median  Q3      Maximum
Male    177.7453  7.183629  9.75  157.2    172.9  177.8   182.65  198.1
Female  164.8723  6.544602  9.50  147.2    160.0  164.5   169.50  182.9
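As an optional check that goes beyond the original analysis, the table can be compared with what a normal distribution implies. For any normal distribution, the IQR is about 1.349 times the standard deviation. The sketch below uses only the values in the table above. The male ratio (about 1.36) is close to the normal value. The female ratio (about 1.45) is further from it, which broadly matches the female histogram fitting the normal curve less closely. This is a rough heuristic, not a formal test of normality.

```r
# Values copied from the summary table above (cm)
groups <- c("Male", "Female")
iqr_obs <- c(9.75, 9.50)
sd_obs <- c(7.183629, 6.544602)

# For any normal distribution, IQR = (qnorm(0.75) - qnorm(0.25)) * SD, roughly 1.349 * SD
normal_ratio <- qnorm(0.75) - qnorm(0.25)

# Observed IQR/SD ratio in each group
observed_ratio <- setNames(iqr_obs / sd_obs, groups)

print(round(normal_ratio, 3))
print(round(observed_ratio, 3))
```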

Distribution Fitting

Separate histograms of the empirical height distribution have been plotted for males and females. A normal density curve has been overlaid on each histogram, with its mean and standard deviation taken from that group's sample statistics. This allows a visual assessment of whether height in each group follows a normal distribution.
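The curve drawn by `dnorm()` is the normal probability density function. Its parameters are estimated by the group's sample mean and sample standard deviation:

```latex
f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad \hat{\mu} = \bar{x}, \quad \hat{\sigma} = s
```

The curve peaks at $x = \mu$ with height $1/(\sigma\sqrt{2\pi})$. Using the sample standard deviations, this is about 0.061 for females ($s \approx 6.54$ cm) and about 0.056 for males ($s \approx 7.18$ cm). A y-axis limit of 0.07 therefore leaves room for the whole curve.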

# This chunk filters male and female results into two data frames for easy plotting
bdims_Male<- filter(bdims_hgt, Sex == 'Male')
bdims_Female<- filter(bdims_hgt, Sex == 'Female')


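# This chunk plots the female height histogram with a fitted normal curve
# (x inside curve() is curve()'s own plotting variable, so no x object is needed)
mu_f <- mean(bdims_Female$Height)  # sample mean, approx. 164.87 cm
sd_f <- sd(bdims_Female$Height)    # sample SD, approx. 6.54 cm
female_plot <- hist(bdims_Female$Height, breaks = 12, main = "Female Height Distribution",
     col = "lightblue",
     xlab = "Height (cm)",
     ylim = c(0, 0.07),
     freq = FALSE)  # plot densities so bars share a scale with dnorm()
curve(expr = dnorm(x, mean = mu_f, sd = sd_f),
      xlim = c(mu_f - 4 * sd_f, mu_f + 4 * sd_f), lwd = 2, add = TRUE)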


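# This chunk plots the male height histogram with a fitted normal curve
# (x inside curve() is curve()'s own plotting variable, so no x object is needed)
mu_m <- mean(bdims_Male$Height)  # sample mean, approx. 177.75 cm
sd_m <- sd(bdims_Male$Height)    # sample SD, approx. 7.18 cm
male_plot <- hist(bdims_Male$Height, breaks = 10, main = "Male Height Distribution",
     col = "pink",
     xlab = "Height (cm)",
     ylim = c(0, 0.07),
     freq = FALSE)  # plot densities so bars share a scale with dnorm()
curve(expr = dnorm(x, mean = mu_m, sd = sd_m),
      xlim = c(mu_m - 4 * sd_m, mu_m + 4 * sd_m), lwd = 2, add = TRUE)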
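Both histograms are drawn on the density scale so that they share a y-axis with the `dnorm()` curve. The minimal sketch below uses simulated heights (hypothetical data from `rnorm()`, not the Body Measurements data) to check the two properties this relies on. Each bar's density equals its count divided by n times the bin width, and the bar areas sum to 1, like the area under a normal density.

```r
# Hypothetical simulated heights, used only to illustrate hist()'s density scale
set.seed(1)
sim_heights <- rnorm(200, mean = 165, sd = 6.5)

# Compute the histogram without drawing it
h <- hist(sim_heights, breaks = 12, plot = FALSE)

# Bar areas (density * bin width) sum to 1, matching the total area under dnorm()
total_area <- sum(h$density * diff(h$breaks))
print(total_area)

# Density = count / (n * bin width)
manual_density <- h$counts / (length(sim_heights) * diff(h$breaks))
print(all.equal(manual_density, h$density))
```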

Interpretation

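The plots show that the Male Height Distribution follows the fitted normal curve reasonably closely. The Female Height Distribution does not match the normal curve as well and appears slightly right-skewed. The descriptive statistics are consistent with this visual impression in two ways:

- The female mean (164.9 cm) is slightly greater than the female median (164.5 cm), whereas the male mean (177.7 cm) and median (177.8 cm) are almost identical.
- The female upper quartile lies further above the median (5.0 cm) than the lower quartile lies below it (4.5 cm).

However, the female mean and median differ by only about 0.4 cm, less than one tenth of the standard deviation (6.5 cm). Any skew is therefore mild.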
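The comparison of mean and median can be placed on a common scale. The sketch below uses only the values from the summary table. It expresses the mean-minus-median gap in standard-deviation units and adds Bowley's quartile skewness, (Q3 + Q1 - 2 * Median) / (Q3 - Q1). Bowley's measure was not part of the original analysis. Both measures come out as small positive numbers for females and close to zero for males. This agrees in direction with the interpretation above and suggests that any skew is mild.

```r
# Values copied from the summary table (cm)
stats <- data.frame(
  Sex    = c("Male", "Female"),
  Mean   = c(177.7453, 164.8723),
  SD     = c(7.183629, 6.544602),
  Q1     = c(172.9, 160.0),
  Median = c(177.8, 164.5),
  Q3     = c(182.65, 169.50)
)

# Mean minus median, in standard-deviation units (positive suggests right skew)
stats$MeanMedianGap <- (stats$Mean - stats$Median) / stats$SD

# Bowley (quartile) skewness: lies between -1 and 1, and is 0 when
# Q1 and Q3 sit symmetrically about the median
stats$Bowley <- (stats$Q3 + stats$Q1 - 2 * stats$Median) / (stats$Q3 - stats$Q1)

print(stats[, c("Sex", "MeanMedianGap", "Bowley")], digits = 3)
```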

References

Heinz G, Peterson LJ, Johnson RW, Kerk CJ. 2003. Exploring Relationships in Body Dimensions. Journal of Statistics Education 11(2).
