Student Details

Seungyeon Lee (s3366440)

Problem Statement

Males and females tend to have different body dimensions. This investigation aims to determine whether a selected body measurement fits a normal distribution in females and in males. The variable of interest is the respondent’s hip girth in centimeters (hip.gi), measured at the level of the bitrochanteric diameter. The investigation involves importing the dataset into R, assigning factor labels, and filtering and subsetting the data. Descriptive statistics are then calculated: mean, median, standard deviation, first and third quartiles, interquartile range, and minimum and maximum values. Finally, histograms of female and male hip girth are plotted for comparison, each with a normal distribution curve overlaid.

Load Packages

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(magrittr)
library(ggplot2)

Data

The working directory is set and the dataset “bdims.csv” is imported. The categorical variable “sex” is coded numerically (0 and 1), so it is converted to a factor with the labels “Female” and “Male”. The filter and select functions are then used to subset the hip girth values for each sex.

bdims<-read.csv("bdims.csv.csv")
bdims$sex <- bdims$sex %>% factor(levels = c(0,1),labels = c("Female","Male"))
Femalehip <- bdims %>% filter(sex=="Female") %>% select(hip.gi)
Malehip <- bdims %>% filter(sex=="Male") %>% select(hip.gi)
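As a minimal sketch of how the numeric codes map to factor labels, the same factor() call can be tried on a small vector. The vector sex_codes is hypothetical and made up for illustration; it is not taken from bdims.

```r
# Hypothetical sex codes for illustration only (0 = Female, 1 = Male),
# matching the level/label order used for bdims$sex above.
sex_codes <- c(0, 1, 1, 0, 1)

# levels gives the raw codes; labels gives the names shown in output.
sex_factor <- factor(sex_codes, levels = c(0, 1), labels = c("Female", "Male"))

levels(sex_factor)   # the two labels, in level order
table(sex_factor)    # counts per label
```

Because the labels replace the codes, later comparisons such as sex=="Female" work on the label text rather than on 0 and 1.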

Summary Statistics

#Summary of Female statistics.

summary(Femalehip)
##      hip.gi      
##  Min.   : 78.80  
##  1st Qu.: 90.75  
##  Median : 94.95  
##  Mean   : 95.65  
##  3rd Qu.: 99.50  
##  Max.   :128.30
#Summary of Male statistics.

summary(Malehip)
##      hip.gi      
##  Min.   : 81.50  
##  1st Qu.: 93.25  
##  Median : 97.40  
##  Mean   : 97.76  
##  3rd Qu.:101.55  
##  Max.   :118.70
#Summary statistics grouped by sex.
bdims %>% group_by(sex) %>% summarise(Mean=mean(hip.gi,na.rm = TRUE)%>% round(3),
                                     Median=median(hip.gi,na.rm = TRUE)%>% round(3),
                                     SD=sd(hip.gi,na.rm=TRUE)%>% round(3),
                                     Q1=quantile(hip.gi,probs=.25,na.rm = TRUE),
                                     Q3=quantile(hip.gi,probs=.75,na.rm = TRUE),
                                     IQR=IQR(hip.gi,na.rm=TRUE),
                                     Min=min(hip.gi,na.rm = TRUE),
                                     Max=max(hip.gi,na.rm = TRUE))

Distribution Fitting

The empirical distribution of hip girth is compared with a normal distribution separately for women and for men. The comparison is made visually by plotting each histogram with a normal distribution curve overlaid, using the sample mean and standard deviation of each group.

#Histogram of female with normal distribution curve. 

hist(Femalehip$hip.gi,xlab="Hip girth(cm)",main ="Histogram of Female Hip Girth With Normal Distribution",col="cadetblue3",prob=TRUE, ylim = c(0,0.08), ylab = 'Density',breaks=20,density=50, font.main=2,cex.main=1)
curve(dnorm(x,mean=mean(Femalehip$hip.gi),sd=sd(Femalehip$hip.gi)),add=TRUE,col="red",lwd=2)

#Histogram of Male with normal distribution curve.

hist(Malehip$hip.gi,xlab="Hip girth(cm)",main = "Histogram of Male Hip Girth With Normal Distribution",col="aquamarine4",prob=TRUE, ylim = c(0,0.08), ylab = 'Density',breaks = 20, density = 50,font.main=2,cex.main=1 )
curve(dnorm(x,mean=mean(Malehip$hip.gi),sd=sd(Malehip$hip.gi)),add=TRUE,col="red",lwd=2)
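The overlay relies on the histogram being drawn on the density scale (prob=TRUE), so that its bars have a total area of 1, like the curve returned by dnorm. A minimal sketch of that property follows. It uses simulated values, and the seed, sample size, mean and standard deviation are assumptions chosen for illustration, not the bdims data.

```r
set.seed(1)

# Simulated measurements for illustration only; not the bdims hip girth data.
x <- rnorm(200, mean = 100, sd = 7)

# Compute the histogram without drawing it.
h <- hist(x, breaks = 20, plot = FALSE)

# On the density scale, bar height times bar width sums to 1,
# which is what makes the bars comparable with a dnorm() curve.
area <- sum(h$density * diff(h$breaks))
area
```

With frequency counts instead of densities, the bars would be far taller than the curve and the two could not be compared on one axis.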

Interpretation

The histograms above show that female hip girth is right-skewed (positively skewed), whereas male hip girth is approximately symmetric.

Therefore, the male distribution fits a normal distribution reasonably well, whereas the female distribution does not. The mean of female hip girth (95.65 cm) is greater than its median (94.95 cm), which is consistent with a long upper tail containing a few extremely high values. The female data also have a wider range than the male data (78.80 to 128.30 cm compared with 81.50 to 118.70 cm), with most of the density on the left side of the histogram. Female hip girth is most likely to fall between about 90 and 98 cm, but values as high as 128.30 cm occur. In contrast, male hip girth values cluster around the mean and median.
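The skew can also be checked roughly from the summary statistics alone. The sketch below copies the values from the summary() output above and compares the length of the upper tail (maximum minus median) with the lower tail (median minus minimum). The data frame and column names are introduced here for illustration, and the tail ratio is an informal indicator, not a formal test of normality.

```r
# Values copied from the summary() output above (cm).
hip_summary <- data.frame(
  sex    = c("Female", "Male"),
  min    = c(78.80, 81.50),
  median = c(94.95, 97.40),
  mean   = c(95.65, 97.76),
  max    = c(128.30, 118.70)
)

# A mean above the median points towards a longer upper tail.
hip_summary$mean_minus_median <- hip_summary$mean - hip_summary$median

# Distance from the median to each extreme.
hip_summary$lower_tail <- hip_summary$median - hip_summary$min
hip_summary$upper_tail <- hip_summary$max - hip_summary$median

# A ratio near 1 suggests symmetry; well above 1 suggests right skew.
hip_summary$tail_ratio <- round(hip_summary$upper_tail / hip_summary$lower_tail, 2)

print(hip_summary)
```

The female upper tail is roughly twice as long as the lower tail, while the male tails are much closer in length, which agrees with the reading of the histograms.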

In reality, empirical data are rarely perfectly symmetric or exactly normally distributed, as both the male and female hip girth histograms show. However, the male distribution is approximately normal. Treating it as normal makes body measurements easier to model: under that assumption, it is easier to predict how the data are distributed and to derive the required information from them.