Modeling Body Measurements

Problem Statement

In general, male and female hip sizes differ. In this problem we analyze the hip size of both genders using the available data. From a data set of 507 observations and 25 variables, we choose the hip girth variable, hip.gi, for both males and females. We investigate the mean, median and standard deviation to understand the behaviour of the data, then plot histograms to understand its distribution. Finally, the fitted normal distribution is compared with the empirical distribution to produce an interpretation.

Load Packages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(rvest)

## Loading required package: xml2

library(readr)

## 
## Attaching package: 'readr'

## The following object is masked from 'package:rvest':
## 
##     guess_encoding

library(foreign)
library(knitr)

Data

The data is imported with the read.csv function and assigned to the variable bmdata508. In #1 we take two columns: column 14, which holds the hip.gi variable, and column 25, which holds the sex of the participants. We then convert sex to a factor and separate female and male hip girth so that the two groups can be compared.

bmdata508 <- read.csv("C:/data/bdims.csv", stringsAsFactors = FALSE)

hipdim <- bmdata508[,c(14,25)] #1
hipdim

gender <- bmdata508$sex
# Convert the sex column itself to a factor (1 = male, 0 = female)
gender <- factor(gender,levels = c("1","0"),ordered = TRUE)

female<-hipdim[248:507, ]
head(female)

male<-hipdim[1:247, ] # R indexes rows from 1, so the male rows are 1 to 247
head(male)
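
Splitting by row position relies on the file being sorted with all males first. As a minimal sketch of an alternative, the split can be made on the value of the sex column instead. The data frame toy and its numbers are invented for illustration; only the column names hip.gi and sex and the coding 1 = male, 0 = female come from the data above.

```r
# Toy stand-in for hipdim; the values are invented for illustration only.
toy <- data.frame(
  hip.gi = c(95.0, 101.5, 88.2, 92.4, 99.1),
  sex    = c(1, 0, 1, 0, 0)
)

# Subset on the value of sex, so the result does not depend on row order.
toy_male   <- toy[toy$sex == 1, ]
toy_female <- toy[toy$sex == 0, ]

# Number of rows in each group
nrow(toy_male)
nrow(toy_female)
```

Applied to the real data, the same pattern would read hipdim[hipdim$sex == 1, ] for males and hipdim[hipdim$sex == 0, ] for females.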

Summary Statistics

This section contains the code that summarises the male and female data. summary gives the minimum, 1st quartile, median, mean, 3rd quartile and maximum; IQR gives the interquartile range; sd gives the standard deviation. The mean is the average of the data and the standard deviation measures how far the data deviates from it. These two values are used later to plot the normal curve.

summary(male) # min, 1st quartile, median, mean, 3rd quartile, max

##      hip.gi            sex   
##  Min.   : 81.50   Min.   :1  
##  1st Qu.: 93.25   1st Qu.:1  
##  Median : 97.40   Median :1  
##  Mean   : 97.76   Mean   :1  
##  3rd Qu.:101.55   3rd Qu.:1  
##  Max.   :118.70   Max.   :1

summary(female)

##      hip.gi            sex   
##  Min.   : 78.80   Min.   :0  
##  1st Qu.: 90.75   1st Qu.:0  
##  Median : 94.95   Median :0  
##  Mean   : 95.65   Mean   :0  
##  3rd Qu.: 99.50   3rd Qu.:0  
##  Max.   :128.30   Max.   :0

IQR(male$hip.gi) #IQR

## [1] 8.3

IQR(female$hip.gi)

## [1] 8.75

sd(male$hip.gi) # standard deviation

## [1] 6.228043

sd(female$hip.gi)

## [1] 6.940728
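
The same statistics can also be collected for both groups in one table. The following is a rough sketch using base R on an invented data frame; toy, its values and the name stats are assumptions made for illustration and are not part of the analysis above.

```r
# Toy data with the same column names as hipdim; values are invented.
toy <- data.frame(
  hip.gi = c(90, 100, 110, 85, 95, 105),
  sex    = c(1, 1, 1, 0, 0, 0)
)

# Split hip.gi by sex and compute the same statistics for each group.
# The result is a matrix with one column per sex code ("0" and "1").
stats <- sapply(
  split(toy$hip.gi, toy$sex),
  function(v) c(mean = mean(v), median = median(v), sd = sd(v), IQR = IQR(v))
)

stats
```

Each column of stats holds the mean, median, standard deviation and IQR for one group, which makes the two genders easier to compare side by side.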

Distribution Fitting

In #Normal, we plot the histograms of male and female hip girth separately. In #1, we first tried to overlay the normal curve on the histogram and empirical distribution plots, but this did not work: the histogram frequencies are much larger than the density values, so the normal curve is too small to be seen. To solve this, the histogram is replaced by a density curve, and the normal and empirical curves are then overlaid separately for each gender for comparison.

#Normal 

hist(male$hip.gi,col="blue",xlim = c(75,125),main="Histogram")

hist(female$hip.gi,col="blue",xlim = c(65,135),main="Histogram")

density(male$hip.gi)

## 
## Call:
##  density.default(x = male$hip.gi)
## 
## Data: male$hip.gi (247 obs.);    Bandwidth 'bw' = 1.852
## 
##        x                y            
##  Min.   : 75.94   Min.   :9.850e-06  
##  1st Qu.: 88.02   1st Qu.:1.727e-03  
##  Median :100.10   Median :9.701e-03  
##  Mean   :100.10   Mean   :2.068e-02  
##  3rd Qu.:112.18   3rd Qu.:4.007e-02  
##  Max.   :124.26   Max.   :6.302e-02

#1

meanm<-mean(male$hip.gi)
meanm

## [1] 97.76316

sdm<-sqrt(var(male$hip.gi))
sdm

## [1] 6.228043

histm<-hist(male$hip.gi,col="grey",xlim = c(75,125),main = "Histogram plot of male hip") # xlim must cover the data range (81.5 to 118.7)

#making empirical distribution for hip.gi in males.
malehipemp <-ecdf(male$hip.gi)
malehipemp

## Empirical CDF 
## Call: ecdf(male$hip.gi)
##  x[1:139] =   81.5,   84.1,   85.5,  ...,  116.5,  118.7

plot(malehipemp,main="Empirical distribution of Male hip data")
curve(dnorm(x, mean=meanm, sd=sdm), 
      col="green", lwd=2, add=TRUE, yaxt="n")

normcurve<-curve(expr = dnorm(x,meanm,sdm),ylim=c(0,0.13), 
      xlim = c(meanm-sdm*4,meanm+sdm*4), 
      main = paste("Male hip size comparison, Mean = ",meanm,", Sigma = ",sdm),
      ylab = "Density")
lines(density(male$hip.gi,adjust = 2),col="red")
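
An empirical CDF is on a cumulative probability scale from 0 to 1, so the matching theoretical curve is the normal CDF, pnorm, rather than the density, dnorm. The sketch below illustrates this on simulated values, not on the real measurements; x_sim, emp and max_gap are names introduced here for illustration, and the mean and standard deviation passed to rnorm are only rough stand-ins for the male values.

```r
set.seed(1)
# Simulated stand-in for male$hip.gi (not the real data)
x_sim <- rnorm(200, mean = 97.8, sd = 6.2)

# Empirical CDF of the simulated values
emp <- ecdf(x_sim)
plot(emp, main = "Empirical CDF with fitted normal CDF",
     xlab = "hip girth", ylab = "Cumulative probability")

# Overlay the normal CDF fitted with the sample mean and sd
curve(pnorm(x, mean = mean(x_sim), sd = sd(x_sim)),
      col = "green", lwd = 2, add = TRUE)

# Largest vertical gap between the two CDFs at the observed points
max_gap <- max(abs(emp(x_sim) - pnorm(x_sim, mean(x_sim), sd(x_sim))))
max_gap
```

A small max_gap means the empirical and fitted normal CDFs stay close across the whole range.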

meanfemale<-mean(female$hip.gi)
meanfemale

## [1] 95.65269

sdfemale<-sqrt(var(female$hip.gi))
sdfemale

## [1] 6.940728

histfemale<-hist(female$hip.gi,col="red",xlim = c(65,135),main = "Histogram plot of female hip girth") # xlim must cover the data range (78.8 to 128.3)

#making empirical distribution for hip.gi in females.
femaleemp<-ecdf(female$hip.gi)
femaleemp

## Empirical CDF 
## Call: ecdf(female$hip.gi)
##  x[1:151] =   78.8,   80.7,   80.9,  ...,    114,  128.3

plot(femaleemp,main="Empirical distribution of female hip.gi data")
curve(dnorm(x, mean=meanfemale, sd=sdfemale), 
      col="green", lwd=2, add=TRUE, yaxt="n")

normcurve<-curve(expr = dnorm(x,meanfemale,sdfemale),ylim=c(0,0.15), 
      xlim = c(meanfemale-sdfemale*4,meanfemale+sdfemale*4), 
      main = paste("Female hip size comparison, Mean = ",meanfemale,", Sigma = ",sdfemale),
      ylab = "Density")
lines(density(female$hip.gi,adjust = 2),col="red")
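
The scale mismatch between histogram frequencies and density values can also be handled by drawing the histogram on the density scale with freq = FALSE, so that the normal curve can be overlaid directly. This is a sketch on simulated values, not on the real measurements; x_sim, h and bar_area are names introduced here for illustration.

```r
set.seed(2)
# Simulated stand-in for female$hip.gi (not the real data)
x_sim <- rnorm(260, mean = 95.7, sd = 6.9)

# freq = FALSE puts the bars on the density scale (total bar area = 1)
h <- hist(x_sim, freq = FALSE, col = "grey",
          main = "Histogram on the density scale", xlab = "hip girth")

# The fitted normal density now shares the same y axis as the bars
curve(dnorm(x, mean = mean(x_sim), sd = sd(x_sim)),
      col = "green", lwd = 2, add = TRUE)

# Kernel density estimate of the same values, for comparison
lines(density(x_sim), col = "red")

# Check that the bar areas add up to 1
bar_area <- sum(h$density * diff(h$breaks))
bar_area
```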

Interpretation

Males

The comparison shows only a slight difference between the theoretical normal curve and the empirical curve for males. Most of the male values lie close to the mean, in the central band where a normal distribution puts 34.13% of its area between the mean and one standard deviation on each side, and the curve is not strongly skewed to the left or right.

Females

The comparison also shows only a slight difference between the theoretical and empirical curves for females. Compared with males, the female data appears closer to the normal distribution, with most values clustering around the mean, in the central band where a normal distribution puts 34.13% of its area between the mean and one standard deviation on each side. The curve is not strongly skewed to the left or right.

General

The female values sit closer to the fitted normal curve around the mean, and their data is more nearly normal than the male data. However, the difference between the two genders, in terms of how well the empirical distribution matches the theoretical one, is not large.
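
The 34.13% figure used in the interpretation can be checked directly, and the share of observations within one standard deviation of the mean can be measured on a sample. The sketch below does both; within_one_sd is a hypothetical helper written for this illustration, and x_sim is simulated data, not the real measurements.

```r
# Area of a normal distribution between the mean and one sd above it
p_one_side <- pnorm(1) - pnorm(0)     # about 0.3413

# Area within one sd on both sides of the mean
p_both <- pnorm(1) - pnorm(-1)        # about 0.6827

# Hypothetical helper: share of values within one sd of their own mean
within_one_sd <- function(v) mean(abs(v - mean(v)) <= sd(v))

set.seed(3)
# Simulated stand-in for a hip girth sample (not the real data)
x_sim <- rnorm(247, mean = 97.8, sd = 6.2)
share <- within_one_sd(x_sim)
share
```

Calling within_one_sd(male$hip.gi) and within_one_sd(female$hip.gi) would give the corresponding shares for the real data, which can then be compared with p_both.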