Jaimee Lee Lincoln (s3797303)
This assignment investigates whether the variable “che.di” from the Body Measurements Dataset fits a normal distribution. The variable records the chest diameter (measured in centimetres at mid-expiration) of survey respondents. Whether the measurements fit a normal distribution is assessed using summary statistics and a histogram of the data with a normal distribution overlay.
The packages used in this assignment have been loaded below.
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The data used for this assignment has been imported and prepared below.
setwd("C:/Users/Jaimee-Lee/Documents/R/Working Directory/Intro to Statistics/Assignment One")
bdims <- read_excel("bdims.csv.xlsx")
View(bdims)
class(bdims$sex)
## [1] "numeric"
bdims$sex <- factor(bdims$sex,
                    levels = c(0, 1),
                    labels = c("Female", "Male"))
levels(bdims$sex)
## [1] "Female" "Male"
# Keep che.di (column 5) together with age, wgt, hgt and sex (columns 22 to 25)
chest_diameter <- bdims[, c(5, 22:25)]
glimpse(chest_diameter)
## Observations: 507
## Variables: 5
## $ che.di <dbl> 28.0, 30.8, 31.7, 28.2, 29.4, 31.3, 31.7, 28.8, 27.5, 2...
## $ age <dbl> 21, 23, 28, 23, 22, 21, 26, 27, 23, 21, 23, 22, 20, 26,...
## $ wgt <dbl> 65.6, 71.8, 80.7, 72.6, 78.8, 74.8, 86.4, 78.4, 62.0, 8...
## $ hgt <dbl> 174.0, 175.3, 193.5, 186.5, 187.2, 181.5, 184.0, 184.5,...
## $ sex <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, M...
Descriptive statistics for chest diameter (mean, median, standard deviation, first and third quartiles, interquartile range, minimum and maximum) have been calculated by sex below. The mode was also calculated using a custom function, so that the mean, median and mode could be compared; in a normal distribution these three values coincide.
#CREATING A MODE FUNCTION
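# Returns the most frequent value in v (the first one encountered if there is a tie)
getmode <- function(v) {
  uniqv <- unique(v)
  # Count the occurrences of each unique value and pick the largest count
  uniqv[which.max(tabulate(match(v, uniqv)))]
}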
# MALE
chest_diameter_male <- filter(chest_diameter, sex == "Male")
chest_diameter_male %>%
  summarize(Mean = mean(che.di, na.rm = TRUE),
            Median = median(che.di, na.rm = TRUE),
            Mode = getmode(chest_diameter_male$che.di),
            Sd = sd(che.di, na.rm = TRUE),
            Q1 = quantile(che.di, probs = .25, na.rm = TRUE),
            Q3 = quantile(che.di, probs = .75, na.rm = TRUE),
            IQR = IQR(che.di, na.rm = TRUE),
            Min = min(che.di, na.rm = TRUE),
            Max = max(che.di, na.rm = TRUE)) %>%
  round(digits = 3)
#FEMALE
chest_diameter_female <- filter(chest_diameter, sex == "Female")
chest_diameter_female %>%
  summarize(Mean = mean(che.di, na.rm = TRUE),
            Median = median(che.di, na.rm = TRUE),
            Mode = getmode(chest_diameter_female$che.di),
            Sd = sd(che.di, na.rm = TRUE),
            Q1 = quantile(che.di, probs = .25, na.rm = TRUE),
            Q3 = quantile(che.di, probs = .75, na.rm = TRUE),
            IQR = IQR(che.di, na.rm = TRUE),
            Min = min(che.di, na.rm = TRUE),
            Max = max(che.di, na.rm = TRUE)) %>%
  round(digits = 3)
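As a minimal illustration of how the mode helper behaves, the sketch below applies the same function to two small made-up vectors (the values and the names `single_mode` and `tied_modes` are hypothetical and are not taken from the dataset). It suggests that, when two values are tied for most frequent, the function returns whichever appears first, which is worth keeping in mind when comparing the mode with the mean and median.

```r
# Same mode helper as defined above, repeated so the sketch is self-contained
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical measurements, not taken from the bdims data
single_mode <- c(28.0, 29.4, 28.0, 31.3, 30.8)
tied_modes  <- c(31.3, 28.0, 28.0, 31.3, 30.8)

mode_single <- getmode(single_mode)  # 28.0 occurs twice, every other value once
mode_tied   <- getmode(tied_modes)   # 31.3 and 28.0 both occur twice; 31.3 appears first

print(mode_single)
print(mode_tied)
```

For measurements recorded to one decimal place, many values may occur only a handful of times, so the reported mode can depend on such ties.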
The empirical distribution of Chest Diameter has been plotted separately for men and women in histograms below. Additionally, an overlay of a normal distribution has been added to both plots, to enable comparison.
#MALE
hist(chest_diameter_male$che.di, freq=FALSE, xlab="Chest Diameter (cm at mid-expiration)", main="Histogram of Male Chest Diameter", col="lightgreen")
curve(dnorm(x, mean=mean(chest_diameter_male$che.di), sd=sd(chest_diameter_male$che.di)), add=TRUE, col="red", lwd=2)
#FEMALE
hist(chest_diameter_female$che.di, freq=FALSE, xlab="Chest Diameter (cm at mid-expiration)", main="Histogram of Female Chest Diameter", col="mediumpurple1")
curve(dnorm(x, mean=mean(chest_diameter_female$che.di), sd=sd(chest_diameter_female$che.di)), add=TRUE, col="red", lwd=2)
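A normal quantile-quantile plot is a common complement to a histogram with a normal overlay. The sketch below is not part of the original analysis; it uses simulated values (the vector name `simulated_chest` and its mean and standard deviation are assumptions chosen purely for illustration) together with base R's `qqnorm()` and `qqline()`. Points that follow the reference line suggest approximate normality, while systematic curvature suggests skew.

```r
# Simulated stand-in for a chest diameter vector (hypothetical values, not the bdims data)
set.seed(42)
simulated_chest <- rnorm(200, mean = 28, sd = 2.5)

# Plot sample quantiles against theoretical normal quantiles
qq <- qqnorm(simulated_chest, main = "Normal Q-Q Plot (simulated data)")

# Add a reference line through the first and third quartiles
qqline(simulated_chest, col = "red", lwd = 2)
```

With the real data, the same two calls could be applied to `chest_diameter_male$che.di` or `chest_diameter_female$che.di`.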
The preceding investigation of the male chest diameter data indicates that the data does not fit a normal distribution. This can first be seen in the summary statistics: the mean (29.95), median (29.9) and mode (28) are not equal, which is not characteristic of a normal distribution.
Additionally, when the data is viewed as a histogram with a normal distribution overlay, it does not display the symmetrical curve of a normal distribution. In fact, the male data is so asymmetric that the histogram rises again to the right of the peak (at the 31 cm data point) instead of falling steadily. Therefore, given the factors discussed, the male data for chest diameter can be described as not fitting a normal distribution.
The preceding investigation of the female chest diameter data indicates that the data also does not fit a normal distribution. This can first be seen in the summary statistics: the mean (26.1), median (25.9) and mode (25.6) are also not equal, which is not characteristic of a normal distribution.
Additionally, when the data is viewed as a histogram with a normal distribution overlay, it does not display the symmetrical curve of a normal distribution. This asymmetry shows in the way the data slopes more gradually on the right-hand side of the mean than on the left-hand side, breaking the symmetry of the overall curve. Therefore, given the factors discussed, the female data for chest diameter can be described as not fitting a normal distribution.
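The asymmetry described above can also be expressed as a number. The sketch below is a supplementary illustration under stated assumptions rather than part of the original analysis: `sample_skewness` is a hypothetical helper that computes the moment-based skewness coefficient, and the two vectors are simulated, not taken from the dataset. A value near zero is consistent with symmetry, while a positive value points to a longer right-hand tail.

```r
# Hypothetical helper: moment-based sample skewness (third central moment / sd^3)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # population-style standard deviation
  mean((x - m)^3) / s^3
}

# Simulated vectors for illustration only (not the bdims data)
set.seed(1)
symmetric    <- rnorm(500, mean = 28, sd = 2.5)  # drawn from a normal distribution
right_skewed <- 25 + rexp(500, rate = 0.5)       # long right-hand tail

skew_symmetric <- sample_skewness(symmetric)
skew_right     <- sample_skewness(right_skewed)

print(skew_symmetric)
print(skew_right)
```

With the real data, the helper could be applied to `chest_diameter_male$che.di` and `chest_diameter_female$che.di` to check the direction of any skew seen in the histograms.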