Student Details

Sunny Kumar Vaishnov (S3822295)

Problem Statement

Of the skeletal measurements available, I chose the “bitrochanteric diameter” because of its importance among the skeletal measurements. The objective of this report is to analyze whether the bitrochanteric diameter of males and of females fits a normal distribution. We will first examine the mean, median, and standard deviation to understand the behaviour of the data, then plot histograms to see how the data are distributed, and finally compare the empirical distribution with a normal distribution to produce an interpretation.

Load Packages

library(magrittr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract

Data

Import the body measurements data and prepare it for analysis. Show your code.

#The bdims.csv file is imported into R, then the factor function is applied to convert the numeric sex codes (0 and 1) into labels. Finally, two new data frames are created to hold the data for each sex
#Import the file using the read.csv function
bdims <- read.csv("bdims.csv")

#Define the variable sex as a factor and define labels for it: 1 if the respondent is male, 0 if female.

#Setting the variable sex as a factor:
bdims$sex <- bdims$sex %>% as.factor()
bdims$sex %>% class
## [1] "factor"
#Defining the labels for it:
bdims$sex <- factor(bdims$sex, levels = c(0, 1), labels = c("Female", "Male"))
bdims$sex %>% levels
## [1] "Female" "Male"
#Creating a separate data frame for each sex, as the analysis is required separately
#female data
bdims_female <- bdims %>% filter(sex =="Female")
View(bdims_female)

#male data
bdims_male <- bdims %>% filter(sex=="Male")
View(bdims_male)

Summary Statistics

Calculate descriptive statistics (i.e., mean, median, standard deviation, first and third quartile, interquartile range, minimum and maximum values) of the selected measurement grouped by sex.

#Quartile, Minimum, Maximum and Mean values of bitrochanteric diameter in Male
bdims_male$bit.di %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   27.50   31.40   32.40   32.53   33.80   38.00
#Quartile, Minimum, Maximum and Mean values of bitrochanteric diameter in Female
bdims_female$bit.di %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.70   30.00   31.50   31.46   32.90   37.80
#Standard deviation in bitrochanteric diameter in Male
sd(bdims_male$bit.di)
## [1] 1.865131
#Standard deviation in bitrochanteric diameter in female
sd(bdims_female$bit.di)
## [1] 2.049179
#Interquartile range of bitrochanteric diameter in Male
IQR(bdims_male$bit.di)
## [1] 2.4
#Interquartile range of bitrochanteric diameter in female
IQR(bdims_female$bit.di)
## [1] 2.9
#range of bitrochanteric diameter in Male
range(bdims_male$bit.di)
## [1] 27.5 38.0
#range of bitrochanteric diameter in Female
range(bdims_female$bit.di)
## [1] 24.7 37.8
#Variance in bitrochanteric diameter in Male
var(bdims_male$bit.di)
## [1] 3.478714
#Variance in bitrochanteric diameter in female
var(bdims_female$bit.di)
## [1] 4.199133
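
The task asks for the statistics grouped by sex. As a minimal sketch, the same figures can be collected in one table with dplyr's group_by() and summarise(). The helper name summarise_by_sex and the small demo data frame are assumptions for illustration only (made-up values, not the bdims data). With the real data the call would be summarise_by_sex(bdims).

```r
library(dplyr)

# Hypothetical helper: one row of descriptive statistics per level of `sex`.
# Assumes the data frame has the columns `sex` and `bit.di`, as bdims does.
summarise_by_sex <- function(data) {
  data %>%
    group_by(sex) %>%
    summarise(
      Min    = min(bit.di, na.rm = TRUE),
      Q1     = unname(quantile(bit.di, 0.25, na.rm = TRUE)),
      Median = median(bit.di, na.rm = TRUE),
      Mean   = mean(bit.di, na.rm = TRUE),
      Q3     = unname(quantile(bit.di, 0.75, na.rm = TRUE)),
      Max    = max(bit.di, na.rm = TRUE),
      SD     = sd(bit.di, na.rm = TRUE),
      IQRange = IQR(bit.di, na.rm = TRUE),
      N      = n()
    )
}

# Made-up demo values, used only so the sketch runs on its own.
demo <- data.frame(
  sex    = factor(c("Female", "Female", "Female", "Male", "Male", "Male")),
  bit.di = c(30.0, 31.5, 33.0, 31.0, 32.5, 34.0)
)
result <- summarise_by_sex(demo)
print(result)
```

Keeping both groups in a single table makes it easier to compare the male and female values side by side than reading the separate outputs above.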

Distribution Fitting

Compare the empirical distribution of selected body measurement to a normal distribution separately in men and in women. You need to do this visually by plotting the histogram with normal distribution overlay. Show your code.

#Mean of bitrochanteric diameter in male, assigned to the meanm variable
meanm <- mean(bdims_male$bit.di)
meanm
## [1] 32.52672
#Mean of bitrochanteric diameter in female, assigned to the meanf variable
meanf <- mean(bdims_female$bit.di)
meanf
## [1] 31.46154
#Standard deviation of bitrochanteric diameter in male, assigned to the msd variable
msd <- sd(bdims_male$bit.di)
msd
## [1] 1.865131
#Standard deviation of bitrochanteric diameter in female, assigned to the fsd variable
fsd <- sd(bdims_female$bit.di)
fsd
## [1] 2.049179
#median for female:
medianf <- median(bdims_female$bit.di)
medianf
## [1] 31.5
#median for bitrochanteric diameter in male:
medianm <- median(bdims_male$bit.di)
medianm
## [1] 32.4
#histogram for bitrochanteric diameter in male:
bdims_male$bit.di %>% hist(col="yellow",breaks = 20,prob=TRUE,xlim = c(25,40),xlab ="bitrochanteric diameter in cm",ylab ="Density",main = "Distribution of bitrochanteric diameter in male")
curve(dnorm(x,mean=meanm,sd=msd),col="green",lwd=2,add =TRUE)
bdims_male$bit.di %>% mean() %>% abline(v=.,col='Blue',lwd=2)
bdims_male$bit.di %>% median() %>% abline(v=.,col='red',lwd=2)

#histogram for bitrochanteric diameter in female:
bdims_female$bit.di %>% hist(col="orange",breaks = 15,prob=TRUE,xlim = c(20,40),xlab ="bitrochanteric diameter in cm",ylab ="Density",main = "Distribution of bitrochanteric diameter in female")
curve(dnorm(x,mean=meanf,sd=fsd),col="green",lwd=2,add =TRUE)
bdims_female$bit.di %>% mean() %>% abline(v=.,col='Blue',lwd=2)
bdims_female$bit.di %>% median() %>% abline(v=.,col='red',lwd=2)

#Empirical distribution in male
plot(ecdf(bdims_male$bit.di), main="Empirical distribution of bitrochanteric diameter in male",
     xlab = "bitrochanteric diameter in cm", ylab = "Cumulative proportion", col="blue")

#Empirical distribution in female
plot(ecdf(bdims_female$bit.di), main="Empirical distribution of bitrochanteric diameter in female",
     xlab = "bitrochanteric diameter in cm", ylab = "Cumulative proportion", col="red")
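
To compare the empirical distribution with a normal distribution directly, one option is to overlay the normal cumulative distribution function on the ECDF plot. The sketch below is an illustration, not part of the original analysis: the helper name plot_ecdf_vs_normal is hypothetical, and the demo values are simulated so the block runs on its own. With the real data it would be called as, for example, plot_ecdf_vs_normal(bdims_male$bit.di).

```r
# Hypothetical helper: draw the ECDF of `values` and overlay the normal CDF
# with the same mean and standard deviation. It returns the largest vertical
# gap between the two curves, evaluated at the observed values only.
plot_ecdf_vs_normal <- function(values, main = "", col = "blue") {
  m  <- mean(values)
  s  <- sd(values)
  Fn <- ecdf(values)
  plot(Fn, main = main, xlab = "bitrochanteric diameter in cm",
       ylab = "Cumulative proportion", col = col)
  # Inside curve(), `x` is the plotting grid, so m and s are computed beforehand.
  curve(pnorm(x, mean = m, sd = s), col = "green", lwd = 2, add = TRUE)
  pts <- sort(unique(values))
  max(abs(Fn(pts) - pnorm(pts, mean = m, sd = s)))
}

# Simulated values (not the bdims data), used only to make the sketch runnable.
set.seed(1)
demo_values <- rnorm(200, mean = 32, sd = 2)
gap <- plot_ecdf_vs_normal(demo_values,
                           main = "ECDF with normal CDF overlay (simulated data)")
gap
```

The closer the ECDF steps stay to the green curve, the better the normal distribution describes the data. The returned gap is only a rough visual aid and not a formal test of normality.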

Interpretation

For Females: The summary statistics show that the mean (31.46 cm) is slightly lower than the median (31.50 cm). This points to a slight negative skew, so the histogram does not fit the normal distribution exactly, although the gap between mean and median is very small compared with the standard deviation (2.05 cm).

For Males: The mean (32.53 cm) is higher than the median (32.40 cm), so the histogram is slightly positively skewed and does not fit the normal distribution exactly. The difference is again small relative to the standard deviation (1.87 cm).
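
The comparison of mean and median can be put on a common scale with Pearson's second skewness coefficient, 3 * (mean - median) / sd. This is an added illustration rather than part of the original analysis. The function name pearson_skew is hypothetical, and the inputs are the values printed earlier in this report.

```r
# Pearson's second skewness coefficient: 3 * (mean - median) / sd.
# A positive value suggests a longer right tail, a negative value a longer left tail.
pearson_skew <- function(mean_value, median_value, sd_value) {
  3 * (mean_value - median_value) / sd_value
}

# Mean, median and standard deviation copied from the output above.
skew_male   <- pearson_skew(32.52672, 32.4, 1.865131)
skew_female <- pearson_skew(31.46154, 31.5, 2.049179)
round(c(Male = skew_male, Female = skew_female), 2)
```

Values close to zero indicate little skew, and the sign gives the direction of the longer tail.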