Student Details

Ankit Munot - s3764950

Problem Statement

We need to understand the distribution of one body measurement separately in men and women and compare it with a normal distribution. We use the data set bdims.csv, and the body measurement considered is wri.di (respondent’s wrist diameter in cm, measured as the sum of the two wrists). The assignment involves importing the data set and calculating summary statistics for men and women separately. We then study the distribution of wri.di in men and women by plotting, for each group, a histogram with a normal distribution overlay.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
bdims <- read.csv("G:/Semester -1/Introduction to Statistics/bdims.csv")



bdims$sex <- factor(bdims$sex, levels= c(0,1), labels = c("Female","Male"))
head(bdims)
##   bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## 1   42.9   26.0   31.5   17.7   28.0   13.1   10.4   18.8   14.1  106.2
## 2   43.7   28.5   33.5   16.9   30.8   14.0   11.8   20.6   15.1  110.5
## 3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1  115.1
## 4   44.3   29.9   34.0   18.4   28.2   13.9   11.2   20.9   15.0  104.5
## 5   42.5   29.9   34.0   21.5   29.4   15.2   11.6   20.7   14.9  107.5
## 6   43.3   27.0   31.5   19.6   31.3   14.0   11.5   18.8   13.9  119.8
##   che.gi wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi
## 1   89.5   71.5   74.5   93.5   51.5   32.5   26.0   34.5   36.5   23.5
## 2   97.0   79.0   86.5   94.8   51.5   34.4   28.0   36.5   37.5   24.5
## 3   97.5   83.2   82.9   95.0   57.3   33.4   28.8   37.0   37.3   21.9
## 4   97.0   77.8   78.8   94.0   53.0   31.0   26.2   37.0   34.8   23.0
## 5   97.5   80.0   82.5   98.5   55.4   32.0   28.4   37.7   38.6   24.4
## 6   99.9   82.5   80.1   95.3   57.5   33.0   28.0   36.6   36.1   23.5
##   wri.gi age  wgt   hgt  sex
## 1   16.5  21 65.6 174.0 Male
## 2   17.0  23 71.8 175.3 Male
## 3   16.9  28 80.7 193.5 Male
## 4   16.6  23 72.6 186.5 Male
## 5   18.0  22 78.8 187.2 Male
## 6   16.9  21 74.8 181.5 Male
bdims %>% group_by(sex) %>% summarise(Mean = mean(wri.di, na.rm = TRUE),
                                      Median = median(wri.di, na.rm = TRUE),
                                      SD = sd(wri.di, na.rm = TRUE),
                                      Q1 = quantile(wri.di,probs = .25,na.rm = TRUE),
                                      Q3 = quantile(wri.di,probs = .75,na.rm = TRUE),
                                      IQR= IQR(wri.di, na.rm = TRUE),
                                      Min = min(wri.di,na.rm = TRUE),
                                      Max = max(wri.di,na.rm = TRUE),
                                      Missing = sum(is.na(wri.di)))
## # A tibble: 2 x 10
##   sex     Mean Median    SD    Q1    Q3   IQR   Min   Max Missing
##   <fct>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <int>
## 1 Female  9.87    9.8 0.662   9.4  10.4 1       8.1  12.2       0
## 2 Male   11.2    11.2 0.636  10.8  11.6 0.850   9.8  13.3       0
Male_respond<- bdims %>% filter(sex=="Male")
Female_respond <- bdims %>% filter(sex=="Female")

table(Female_respond$wri.di)
## 
##  8.1  8.3  8.4  8.5  8.6  8.7  8.9    9  9.1  9.2  9.3  9.4  9.5  9.6  9.7 
##    1    1    2    1    3    3    7    6    8   20    4   20    7   22    5 
##  9.8  9.9   10 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9   11 11.2 11.5 
##   23    9   20    4   18    5   19   12   10    4    9    3    9    1    3 
## 12.2 
##    1
table(Male_respond$wri.di)
## 
##  9.8  9.9   10 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9   11 11.1 11.2 
##    1    1    3    3    5    4   13    6   11    8   16    4   22   10   22 
## 11.3 11.4 11.5 11.6 11.7 11.8 11.9   12 12.2 12.3 12.4 12.5 12.6 12.8 12.9 
##   14    9   19   14   10   14    3   18    3    2    3    2    1    3    1 
## 13.2 13.3 
##    1    1

In the above chunk, I calculated the mean, median, standard deviation, first and third quartiles, interquartile range, minimum and maximum of wri.di, grouped by sex.

- The na.rm = TRUE argument excludes missing (NA) values from each calculation. The Missing column shows that wri.di has no missing values for either sex.
- I captured the male respondents in Male_respond and the female respondents in Female_respond.
- I used the variable wri.di because many of its values occur with high frequency.
- Using the table function, I looked at the most frequent value (mode) for each sex: 9.8 for females (23 respondents), while for males 11 and 11.2 are tied (22 respondents each).
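The mode can also be extracted programmatically instead of being read off the table output. The following is a minimal sketch, assuming a hypothetical helper named mode_values (it is not part of the original analysis) and a small made-up vector in place of the bdims data.

```r
# Hypothetical helper (not in the original analysis): returns every value
# that attains the highest frequency, so ties are reported rather than hidden.
mode_values <- function(x) {
  counts <- table(x)  # frequency of each distinct value (NA is dropped by default)
  as.numeric(names(counts)[counts == max(counts)])
}

# Made-up illustrative data, not taken from bdims
example <- c(9.8, 9.8, 10.0, 10.2, 10.2, 11.0)
mode_values(example)
```

With the objects from this report, the call would be mode_values(Female_respond$wri.di) or mode_values(Male_respond$wri.di). Returning all tied values is a deliberate choice here, since reporting only the first maximum can hide a tie.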

# Sample mean and SD define the normal curve to overlay
mean_male <- mean(Male_respond$wri.di)
sd_male <- sd(Male_respond$wri.di)
# prob = TRUE puts the histogram on the density scale, comparable with dnorm()
hist(Male_respond$wri.di, breaks = 30, prob = TRUE, xlab = "Wrist diameter of male population in cm, measured as sum of two wrists", main = "Normal Curve over Histogram for Male Respondents", col = "lightblue")
Male <- seq(min(Male_respond$wri.di), max(Male_respond$wri.di), 0.1)
Male_Normal_Distribution <- dnorm(Male, mean_male, sd_male)
lines(Male, Male_Normal_Distribution, col = "blue", lwd = 3)

# Same steps for the female respondents
mean_female <- mean(Female_respond$wri.di)
sd_female <- sd(Female_respond$wri.di)
hist(Female_respond$wri.di, breaks = 30, prob = TRUE, xlab = "Wrist diameter of female population in cm, measured as sum of two wrists", main = "Normal Curve over Histogram for Female Respondents", col = "lightgreen")
Female <- seq(min(Female_respond$wri.di), max(Female_respond$wri.di), 0.1)
Female_Normal_Distribution <- dnorm(Female, mean_female, sd_female)
lines(Female, Female_Normal_Distribution, col = "hotpink4", lwd = 3)

Interpretation

I plotted a normal curve over the histogram of the empirical data for both males and females. A normal distribution is symmetric and unimodal, its tails approach the horizontal axis asymptotically, and its mean, median and mode are all equal. In the empirical data the mean, median and mode are approximately equal: for females the mean is 9.87 and the median and mode are both 9.8, and for males the mean and median are both 11.2, with 11 and 11.2 tied as the most frequent values. This is consistent with normally distributed data. Both histograms are also roughly symmetric and bell shaped, which further supports approximate normality. In a normal distribution, approximately 99.7% of the data fall within three standard deviations of the mean. The frequency tables agree with this: only one female value (12.2) and two male values (13.2 and 13.3) lie more than three standard deviations from the mean, all in the right tail of the plots. In conclusion, wrist diameter appears to be approximately normally distributed for both males and females, although a graphical check such as a histogram offers no guarantee of normality.
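The three-standard-deviation rule mentioned above can be checked numerically rather than by eye. The following is a minimal sketch, assuming a hypothetical helper named prop_within_sd (not part of the original analysis) and simulated data standing in for wri.di.

```r
# Hypothetical helper (not in the original analysis): proportion of values
# lying within k sample standard deviations of the sample mean.
prop_within_sd <- function(x, k = 3) {
  x <- x[!is.na(x)]  # drop missing values before computing mean and SD
  mean(abs(x - mean(x)) <= k * sd(x))
}

# Simulated stand-in data; the mean and SD are arbitrary choices,
# not estimates taken from bdims
set.seed(1)
simulated <- rnorm(1000, mean = 10, sd = 0.65)

# Proportions within 1, 2 and 3 SDs; for normal data these should be
# close to 0.68, 0.95 and 0.997
sapply(1:3, function(k) prop_within_sd(simulated, k))
```

With the objects from this report, the call would be prop_within_sd(Male_respond$wri.di) or prop_within_sd(Female_respond$wri.di). A normal Q-Q plot drawn with qqnorm() and qqline() is another common visual check that could complement the histograms.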