Problem statement

The dataset (bdims.csv) provides body girth and skeletal diameter measurements, together with age, weight, height, and gender, for 507 physically active individuals: 247 men and 260 women.

From this data, the variable “bia.di” (respondent’s biacromial diameter in centimeters, that is, the width of the shoulders) is examined to check whether it follows a normal distribution.

Load packages

library(readr)
library(readxl)
library(tidyr)
library(dplyr)
library(ggformula)
library(mosaic)
library(car)
library(Hmisc)
library(outliers)

Import data

a<-read_excel("bdims.csv (1).xlsx")
a
## # A tibble: 507 x 25
##    bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
##     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1   42.9   26     31.5   17.7   28     13.1   10.4   18.8   14.1   106.
##  2   43.7   28.5   33.5   16.9   30.8   14     11.8   20.6   15.1   110.
##  3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1   115.
##  4   44.3   29.9   34     18.4   28.2   13.9   11.2   20.9   15     104.
##  5   42.5   29.9   34     21.5   29.4   15.2   11.6   20.7   14.9   108.
##  6   43.3   27     31.5   19.6   31.3   14     11.5   18.8   13.9   120.
##  7   43.5   30     34     21.9   31.7   16.1   12.5   20.8   15.6   124.
##  8   44.4   29.8   33.2   21.8   28.8   15.1   11.9   21     14.6   120.
##  9   43.5   26.5   32.1   15.5   27.5   14.1   11.2   18.9   13.2   111 
## 10   42     28     34     22.5   28     15.6   12     21.1   15     120.
## # ... with 497 more rows, and 15 more variables: che.gi <dbl>,
## #   wai.gi <dbl>, nav.gi <dbl>, hip.gi <dbl>, thi.gi <dbl>, bic.gi <dbl>,
## #   for.gi <dbl>, kne.gi <dbl>, cal.gi <dbl>, ank.gi <dbl>, wri.gi <dbl>,
## #   age <dbl>, wgt <dbl>, hgt <dbl>, sex <dbl>

Create subset

Create a subset by selecting one measurement (“bia.di”: respondent’s biacromial diameter in centimeters) and sex.

ab <- a[, c("bia.di", "sex")]  # columns 1 and 25, selected by name rather than position
ab
## # A tibble: 507 x 2
##    bia.di   sex
##     <dbl> <dbl>
##  1   42.9     1
##  2   43.7     1
##  3   40.1     1
##  4   44.3     1
##  5   42.5     1
##  6   43.3     1
##  7   43.5     1
##  8   44.4     1
##  9   43.5     1
## 10   42       1
## # ... with 497 more rows

Tidy the data by labeling the sex codes (1 = Male, 0 = Female).

ab$sex <- factor(ab$sex, levels = c(1,0), labels = c("Male","Female"))
ab
## # A tibble: 507 x 2
##    bia.di sex  
##     <dbl> <fct>
##  1   42.9 Male 
##  2   43.7 Male 
##  3   40.1 Male 
##  4   44.3 Male 
##  5   42.5 Male 
##  6   43.3 Male 
##  7   43.5 Male 
##  8   44.4 Male 
##  9   43.5 Male 
## 10   42   Male 
## # ... with 497 more rows
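The recoding step can be seen in isolation on a small vector. This is a minimal sketch: `sex_code` is a made-up stand-in for the 0/1 coded column, not values taken from bdims.

```r
# Toy vector standing in for the 0/1 coded sex column (illustrative values only)
sex_code <- c(1, 0, 0, 1, 1)

# levels lists the codes found in the data; labels names them in the same order,
# so 1 becomes "Male" and 0 becomes "Female"
sex_factor <- factor(sex_code, levels = c(1, 0), labels = c("Male", "Female"))

print(sex_factor)
print(table(sex_factor))
```

Because "Male" is listed first, it becomes the first factor level, which is why it appears first in grouped summaries.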

Create a subset of the male respondents, keeping bia.di.

m <- subset(ab,subset=sex =="Male") 
m
## # A tibble: 247 x 2
##    bia.di sex  
##     <dbl> <fct>
##  1   42.9 Male 
##  2   43.7 Male 
##  3   40.1 Male 
##  4   44.3 Male 
##  5   42.5 Male 
##  6   43.3 Male 
##  7   43.5 Male 
##  8   44.4 Male 
##  9   43.5 Male 
## 10   42   Male 
## # ... with 237 more rows

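Check whether the measurement fits a normal distribution using a Q-Q plot. qqPlot() also returns the row indices of the two most extreme observations, which are printed below.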

male<-qqPlot(m$bia.di, dist="norm")

male
## [1] 78 13
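To show what the Q-Q plot compares, the sketch below builds one by hand in base R. It uses simulated values as a stand-in for m$bia.di (drawn with the male mean and standard deviation reported in the summary statistics of this report), and the reference line through the sample mean and SD is one simple choice, not necessarily the line qqPlot() draws.

```r
set.seed(1)
# Simulated stand-in for m$bia.di (assumption: 247 normal draws, not the real data)
x <- rnorm(247, mean = 41.2413, sd = 2.087164)

n <- length(x)
sample_q <- sort(x)            # ordered sample values (vertical axis)
theo_q <- qnorm(ppoints(n))    # standard normal quantiles at matching probabilities

plot(theo_q, sample_q, xlab = "norm quantiles", ylab = "sample quantiles")
abline(a = mean(x), b = sd(x), col = "blue")  # reference line: intercept = mean, slope = SD

# Correlation close to 1 means the points lie near a straight line
qq_cor <- cor(theo_q, sample_q)
print(qq_cor)
```

If the data are approximately normal, the points fall close to the straight line; systematic curvature at the ends would indicate skew or heavy tails.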

Create a subset of the female respondents, keeping bia.di.

f <- subset(ab,subset=sex =="Female")
f
## # A tibble: 260 x 2
##    bia.di sex   
##     <dbl> <fct> 
##  1   37.6 Female
##  2   36.7 Female
##  3   34.8 Female
##  4   36.6 Female
##  5   35.5 Female
##  6   37   Female
##  7   35.5 Female
##  8   37.4 Female
##  9   37.8 Female
## 10   38.6 Female
## # ... with 250 more rows

Check whether the measurement fits a normal distribution for the female subset.

female<-qqPlot(f$bia.di, dist="norm")

female
## [1]  24 233

Summary statistics for bia.di by sex

favstats(~bia.di | sex,data =ab)
##      sex  min     Q1 median   Q3  max     mean       sd   n missing
## 1   Male 34.1 40.000   41.2 42.6 47.4 41.24130 2.087164 247       0
## 2 Female 32.4 35.175   36.4 37.8 42.6 36.50308 1.779221 260       0

Empirical distribution of body measurement

Distribution calculation for male

M=mean(m$bia.di)
S=sd(m$bia.di)

Plotting the histogram

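hist(m$bia.di, breaks = 20, prob = TRUE,
     xlab = "Biacromial diameter (cm)", ylim = c(0, 0.4), col = "lightblue",
     main = "Male bia.di: histogram with normal curve")

# Grid of x values across the observed range
x <- seq(min(m$bia.di), max(m$bia.di), 0.3)

# Normal density using the sample mean and standard deviation
y <- dnorm(x, M, S)

# Overlay the fitted normal curve on the histogram
lines(x, y, col = "orange", lwd = 3)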

Distribution calculation for female

Me=mean(f$bia.di)
Sd=sd(f$bia.di)

Plotting the histogram

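hist(f$bia.di, breaks = 20, prob = TRUE,
     xlab = "Biacromial diameter (cm)", ylim = c(0, 0.4), col = "orange",
     main = "Female bia.di: histogram with normal curve")

# Grid of x values across the observed range
xx <- seq(min(f$bia.di), max(f$bia.di), 0.3)

# Normal density using the sample mean and standard deviation
yy <- dnorm(xx, Me, Sd)

# Overlay the fitted normal curve on the histogram
lines(xx, yy, col = "brown", lwd = 3)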
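The male and female plots repeat the same steps, so they could be wrapped in one function. The sketch below is only an illustration: `hist_with_normal` is a hypothetical helper that is not part of the original analysis, and the demo data are simulated rather than read from bdims.

```r
# Hypothetical helper: density-scaled histogram with a fitted normal curve on top
hist_with_normal <- function(x, fill = "lightblue", line_col = "orange", ...) {
  m <- mean(x)
  s <- sd(x)
  hist(x, breaks = 20, prob = TRUE, col = fill, ...)
  grid <- seq(min(x), max(x), length.out = 200)   # fine grid across the observed range
  lines(grid, dnorm(grid, m, s), col = line_col, lwd = 3)
  invisible(c(mean = m, sd = s))                  # return the fitted parameters quietly
}

set.seed(2)
# Simulated stand-in for f$bia.di (assumption: normal draws with the reported mean and SD)
demo <- rnorm(260, mean = 36.50308, sd = 1.779221)
fit <- hist_with_normal(demo, xlab = "bia.di (cm)",
                        main = "Simulated data with normal curve")
print(fit)
```

With the real data, the two calls would be `hist_with_normal(m$bia.di)` and `hist_with_normal(f$bia.di, fill = "orange", line_col = "brown")`.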

Interpretation

From the histograms plotted above, the variable “bia.di” (shoulder width) appears to be approximately normally distributed for both males and females: in each case the fitted normal curve closely follows the empirical histogram.
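A numeric cross-check of the visual fit is to compare the observed quartiles with the quartiles a normal distribution with the same mean and SD would have. The sketch below uses only the figures from the favstats output in this report; the column names ending in `_normal` and `_gap` are introduced here for illustration.

```r
# Summary statistics copied from the favstats output in this report
stats <- data.frame(
  sex  = c("Male", "Female"),
  mean = c(41.24130, 36.50308),
  sd   = c(2.087164, 1.779221),
  Q1   = c(40.000, 35.175),
  Q3   = c(42.6, 37.8)
)

# Quartiles implied by a normal distribution with each group's mean and SD
stats$Q1_normal <- qnorm(0.25, stats$mean, stats$sd)
stats$Q3_normal <- qnorm(0.75, stats$mean, stats$sd)

# Observed minus implied; values near zero support the normal model
stats$Q1_gap <- stats$Q1 - stats$Q1_normal
stats$Q3_gap <- stats$Q3 - stats$Q3_normal

print(stats)
```

Gaps that are small relative to the standard deviation are consistent with the conclusion drawn from the plots, although quartiles alone say little about the tails.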

Comparing the two groups, males have a higher mean (41.24130) than females (36.50308). The standard deviation is 2.087164 for males and 1.779221 for females. For males the minimum is 34.1 and the maximum is 47.4, whereas for females the minimum is 32.4 and the maximum is 42.6.

Based on these statistics, we can conclude that males have wider shoulders on average than females.
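The size of the gap between the groups can be put on a common scale by dividing the difference in means by a pooled standard deviation (a standardized difference, often called Cohen's d). This calculation is not part of the original analysis; it is a sketch that reuses the means, SDs, and group sizes reported in the favstats output.

```r
# Figures copied from the favstats output in this report
mean_m <- 41.24130; sd_m <- 2.087164; n_m <- 247
mean_f <- 36.50308; sd_f <- 1.779221; n_f <- 260

# Raw difference in mean biacromial diameter (cm)
diff_means <- mean_m - mean_f

# Pooled SD: weighted average of the two group variances
pooled_sd <- sqrt(((n_m - 1) * sd_m^2 + (n_f - 1) * sd_f^2) / (n_m + n_f - 2))

# Standardized difference in units of pooled SD
cohens_d <- diff_means / pooled_sd

print(c(diff_means = diff_means, pooled_sd = pooled_sd, cohens_d = cohens_d))
```

A difference of more than two pooled standard deviations indicates a clear separation between the groups, even though the male and female ranges overlap.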

————————————————————————-END——————————————————————————-