Problem Statement

From the given dataset, I have chosen to examine the Weight column, which holds the weight (in kilograms) of each individual. I will then group the observations by sex (male and female) and observe the distribution of each group.

Load Libraries and Packages

suppressMessages(library(readxl))
suppressMessages(library(dplyr))
suppressMessages(library(mosaic))
suppressMessages(library(knitr))
suppressMessages(library(rmarkdown))

Import Data

Use the read_excel() function from the readxl package to import the first sheet of the workbook, and store the result as dataset.

dataset <- read_excel('bdims.csv (1).xlsx', sheet = 1)

Tidy Data

The sex column is stored as class num, whereas it is clearly a categorical variable. Here, I use the factor() function to transform the 1 and 0 values into Male and Female values. I also copy the wgt column into a new column named Weight for readability.

dataset$Weight <- dataset$wgt
dataset$Sex <- factor(dataset$sex,
                      levels = c(1,0),
                      labels = c('Male', 'Female'))
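
As a small illustration of this recoding step, the sketch below applies factor() to a made-up vector of 1/0 codes. The vector codes is hypothetical and is not taken from the dataset; it only shows how levels and labels pair up by position.

```r
# Hypothetical 1/0 codes standing in for the sex column
codes <- c(1, 0, 0, 1, 1)

# levels lists the raw values in the desired order;
# labels gives the display name for each level, matched by position
sex_factor <- factor(codes,
                     levels = c(1, 0),
                     labels = c('Male', 'Female'))

print(sex_factor)
print(table(sex_factor))
```

Because 1 is listed first in levels, Male becomes the first factor level, which is why it appears first in grouped summaries.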

Form Dataset

Here I use the select() function to grab the two columns I want to analyse. I then view the str() of my clean data frame.

dataset <- select(dataset, Weight, Sex) %>%  as.data.frame()
str(dataset)
'data.frame':   507 obs. of  2 variables:
 $ Weight: num  65.6 71.8 80.7 72.6 78.8 74.8 86.4 78.4 62 81.6 ...
 $ Sex   : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...

Summarise Data

dataset_sex_summary <-  dataset %>% 
                        group_by(Sex) %>% 
                        summarise(Min = min(Weight, na.rm = TRUE),
                                  Q1 = round(quantile(Weight, probs = 0.25, na.rm = TRUE),1),
                                  Median = median(Weight, na.rm = TRUE),
                                  Q3 = quantile(Weight, probs = 0.75, na.rm = TRUE),
                                  Max = max(Weight, na.rm = TRUE),
                                  IQR = Q3 - Q1,
                                  Range = Max - Min,
                                  Mean = round(mean(Weight, na.rm = TRUE),1),
                                  SD = round(sd(Weight, na.rm = TRUE),2),
                                  n = n()) 
knitr::kable(dataset_sex_summary, caption = "Weight Summary grouped by Sex")
Table: Weight Summary grouped by Sex

Sex      Min    Q1    Median  Q3    Max    IQR   Range  Mean  SD     n
Male     53.9   71.0  77.3    85.5  116.4  14.5  62.5   78.1  10.51  247
Female   42.0   54.5  59.0    65.6  105.2  11.1  63.2   60.6  9.62   260
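
The IQR column is derived from the quartiles computed in the same summarise() call. As a quick sanity check on that arithmetic, the base R sketch below uses a made-up vector (w is hypothetical, not the dataset) and compares Q3 - Q1 with the built-in IQR() function, which uses the same default quantile method.

```r
# Hypothetical weights, chosen so the quartiles fall on observed values
w <- c(50, 60, 70, 80, 90)

q1 <- quantile(w, probs = 0.25)
q3 <- quantile(w, probs = 0.75)

# unname() drops the "25%" / "75%" labels that quantile() attaches
manual_iqr <- unname(q3 - q1)

print(c(Q1 = unname(q1), Q3 = unname(q3), IQR = manual_iqr, builtin = IQR(w)))
```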

Male Histogram with Normal Distribution Overlay

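# Subset the male observations
male <- dataset %>% filter(Sex == 'Male')
x <- male$Weight
# Draw the histogram and keep the returned object for its bin midpoints
h <- hist(x,
          breaks = 30,
          xlab = 'Weight',
          main = 'Male Weight Distribution',
          xlim = c(50,110))
# Evaluate a normal density with the sample mean and SD over the observed range
xfit <- seq(min(x),
            max(x),
            length.out = 100)
yfit <- dnorm(xfit,
              mean = mean(x),
              sd = sd(x))
# Rescale the density to counts: bin width * number of observations
yfit <- yfit * diff(h$mids[1:2]) * length(x)
lines(xfit, yfit, col = "blue", lwd = 2)
# Mark the mean and median with dashed vertical lines
abline(v = mean(x), col = 'purple', lwd = 2, lty = 2)
abline(v = median(x), col = 'orange', lwd = 2, lty = 2)
legend('topright', legend = c('Mean', 'Median'), col = c('purple','orange'), lty = 2)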
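
The overlay code multiplies the normal density by the bin width and the number of observations so that the curve is on the same count scale as the bars. The sketch below checks that idea on simulated values (the seed and the rnorm() parameters are assumptions for illustration, not the dataset): the total area of the bars and the area under the rescaled curve should nearly agree.

```r
set.seed(1)
# Simulated weights; these stand in for real data purely for illustration
w <- rnorm(200, mean = 78, sd = 10)

# plot = FALSE returns the bin information without drawing
h <- hist(w, breaks = 30, plot = FALSE)
bin_width <- diff(h$mids[1:2])

# Histogram area in count units: each bar contributes count * bin width
hist_area <- sum(h$counts) * bin_width

# Normal density rescaled to counts, as in the overlay code
scaled_density <- function(v) {
  dnorm(v, mean = mean(w), sd = sd(w)) * bin_width * length(w)
}

# Integrate over mean +/- 8 SD, which covers essentially all of the curve
curve_area <- integrate(scaled_density,
                        lower = mean(w) - 8 * sd(w),
                        upper = mean(w) + 8 * sd(w))$value

print(c(hist_area = hist_area, curve_area = curve_area))
```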

Female Histogram with Normal Distribution Overlay

# Subset the female observations
female <- dataset %>% filter(Sex == 'Female')
y <- female$Weight
# Draw the histogram and keep the returned object for its bin midpoints
g <- hist(y,
          breaks = 30,
          xlab = 'Weight',
          main = 'Female Weight Distribution',
          xlim = c(40,90))
# Evaluate a normal density with the sample mean and SD over the observed range
xfit <- seq(min(y),
            max(y),
            length.out = 100)
yfit <- dnorm(xfit,
              mean = mean(y),
              sd = sd(y))
# Rescale the density to counts: bin width * number of observations
yfit <- yfit * diff(g$mids[1:2]) * length(y)
lines(xfit, yfit, col = "red", lwd = 2)
# Mark the mean and median with dashed vertical lines
abline(v = mean(y), col = 'purple', lwd = 2, lty = 2)
abline(v = median(y), col = 'orange', lwd = 2, lty = 2)
legend('topright', legend = c('Mean', 'Median'), col = c('purple','orange'), lty = 2)

Discussion

The male weight data shows a bimodal distribution, which became visible once the histogram used a sufficient number of bins. With only 5 bins specified in the breaks argument, the data appear to follow a normal distribution, but as I increased the number of breaks a bimodal shape became apparent. Another indicator of bimodality is that the two most common bins (72-74 and 84-86) sit on either side of both the median (77.3) and the mean (78.1). The peak of the normal distribution overlay is close to the mean and the median, but these two modes are not.
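
To illustrate how the breaks argument changes what a histogram reveals, the sketch below bins a simulated two-group mixture coarsely and finely. The data are hypothetical (the seed, group sizes, means and SDs are assumptions, not the male weights), and plot = FALSE is used so that only the bin counts are compared. Finer bins expose more detail, but also more sampling noise, so a second peak seen this way is suggestive rather than conclusive.

```r
set.seed(42)
# Simulated mixture of two groups; hypothetical values for illustration only
w <- c(rnorm(120, mean = 73, sd = 4),
       rnorm(120, mean = 85, sd = 4))

# breaks is only a suggestion: hist() uses pretty() to pick round cut points
coarse <- hist(w, breaks = 5, plot = FALSE)
fine   <- hist(w, breaks = 30, plot = FALSE)

# Compare how many bins each request produced and the counts per bin
print(length(coarse$counts))
print(coarse$counts)
print(length(fine$counts))
print(fine$counts)
```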

The female weight data is closer to a normal distribution. In this histogram, however, the modal bin (54-56) is well below the median (59.0) and mean (60.6), indicating a positively skewed, approximately normal distribution.

Both groups, male and female, are positively skewed, which is also visible in both histograms: the dashed purple line (the mean) lies to the right of the dashed orange line (the median). These slight positive skews could be explained by the presence of outliers above the median in both groups and the lack of outliers below it.
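
The link between an upper-tail outlier, the mean sitting above the median, and positive skew can be sketched with a small made-up vector. The values in w are hypothetical, and the skewness formula shown is the simple moment-based version with divisor n, which is one of several conventions. A mean above the median is a common rule of thumb for positive skew rather than a guarantee.

```r
# Hypothetical weights with one large value in the upper tail
w <- c(55, 58, 60, 62, 64, 66, 95)

m   <- mean(w)
med <- median(w)

# Moment-based sample skewness: mean cubed deviation divided by
# the 3/2 power of the mean squared deviation (divisor n)
skew <- mean((w - m)^3) / (mean((w - m)^2))^(3/2)

print(c(mean = m, median = med, skewness = skew))
```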

Across the 247 observations of male weight, the interquartile range was 14.5 and the standard deviation was 10.51. Across the 260 observations of female weight, the IQR was 11.1 and the standard deviation was 9.62. Since both female values are lower than the respective male IQR and standard deviation, the male weight distribution has a larger spread for the majority of the data, despite females having a slightly larger range (62.5 for males, 63.2 for females). This in turn indicates a flatter normal curve for male weights than for female weights.
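
One reason the range can disagree with the IQR is that the range depends only on the two most extreme observations. The sketch below uses made-up values (w_base and w_extreme are hypothetical, as is the helper spread()) to show how a single extreme weight inflates the range while the IQR barely moves.

```r
# Hypothetical weights; the second vector adds one extreme value
w_base    <- c(52, 55, 57, 59, 60, 62, 64, 66, 70)
w_extreme <- c(w_base, 105)

# Hypothetical helper returning three measures of spread
spread <- function(v) {
  c(IQR = IQR(v), SD = sd(v), Range = diff(range(v)))
}

print(rbind(base = spread(w_base), extreme = spread(w_extreme)))
```

The standard deviation sits between the two: it uses every observation, so it responds to the extreme value more than the IQR does.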

In conclusion, the male weight data does not quite follow a normal distribution and is better described as a positively skewed bimodal distribution. The female weight data follows a positively skewed, approximately normal distribution.
