load("more/bdims.RData")
library("ggplot2")
summary(bdims)
##      bia.di          bii.di          bit.di          che.de     
##  Min.   :32.40   Min.   :18.70   Min.   :24.70   Min.   :14.30  
##  1st Qu.:36.20   1st Qu.:26.50   1st Qu.:30.60   1st Qu.:17.30  
##  Median :38.70   Median :28.00   Median :32.00   Median :19.00  
##  Mean   :38.81   Mean   :27.83   Mean   :31.98   Mean   :19.23  
##  3rd Qu.:41.15   3rd Qu.:29.25   3rd Qu.:33.35   3rd Qu.:20.90  
##  Max.   :47.40   Max.   :34.70   Max.   :38.00   Max.   :27.50  
##      che.di          elb.di          wri.di          kne.di     
##  Min.   :22.20   Min.   : 9.90   Min.   : 8.10   Min.   :15.70  
##  1st Qu.:25.65   1st Qu.:12.40   1st Qu.: 9.80   1st Qu.:17.90  
##  Median :27.80   Median :13.30   Median :10.50   Median :18.70  
##  Mean   :27.97   Mean   :13.39   Mean   :10.54   Mean   :18.81  
##  3rd Qu.:29.95   3rd Qu.:14.40   3rd Qu.:11.20   3rd Qu.:19.60  
##  Max.   :35.60   Max.   :16.70   Max.   :13.30   Max.   :24.30  
##      ank.di          sho.gi           che.gi           wai.gi      
##  Min.   : 9.90   Min.   : 85.90   Min.   : 72.60   Min.   : 57.90  
##  1st Qu.:13.00   1st Qu.: 99.45   1st Qu.: 85.30   1st Qu.: 68.00  
##  Median :13.80   Median :108.20   Median : 91.60   Median : 75.80  
##  Mean   :13.86   Mean   :108.20   Mean   : 93.33   Mean   : 76.98  
##  3rd Qu.:14.80   3rd Qu.:116.55   3rd Qu.:101.15   3rd Qu.: 84.50  
##  Max.   :17.20   Max.   :134.80   Max.   :118.70   Max.   :113.20  
##      nav.gi           hip.gi           thi.gi          bic.gi     
##  Min.   : 64.00   Min.   : 78.80   Min.   :46.30   Min.   :22.40  
##  1st Qu.: 78.85   1st Qu.: 92.00   1st Qu.:53.70   1st Qu.:27.60  
##  Median : 84.60   Median : 96.00   Median :56.30   Median :31.00  
##  Mean   : 85.65   Mean   : 96.68   Mean   :56.86   Mean   :31.17  
##  3rd Qu.: 91.60   3rd Qu.:101.00   3rd Qu.:59.50   3rd Qu.:34.45  
##  Max.   :121.10   Max.   :128.30   Max.   :75.70   Max.   :42.40  
##      for.gi          kne.gi          cal.gi          ank.gi     
##  Min.   :19.60   Min.   :29.00   Min.   :28.40   Min.   :16.40  
##  1st Qu.:23.60   1st Qu.:34.40   1st Qu.:34.10   1st Qu.:21.00  
##  Median :25.80   Median :36.00   Median :36.00   Median :22.00  
##  Mean   :25.94   Mean   :36.20   Mean   :36.08   Mean   :22.16  
##  3rd Qu.:28.40   3rd Qu.:37.95   3rd Qu.:38.00   3rd Qu.:23.30  
##  Max.   :32.50   Max.   :49.00   Max.   :47.70   Max.   :29.30  
##      wri.gi          age             wgt              hgt        sex    
##  Min.   :13.0   Min.   :18.00   Min.   : 42.00   Min.   :147.2   0:260  
##  1st Qu.:15.0   1st Qu.:23.00   1st Qu.: 58.40   1st Qu.:163.8   1:247  
##  Median :16.1   Median :27.00   Median : 68.20   Median :170.3          
##  Mean   :16.1   Mean   :30.18   Mean   : 69.15   Mean   :171.1          
##  3rd Qu.:17.1   3rd Qu.:36.00   3rd Qu.: 78.85   3rd Qu.:177.8          
##  Max.   :19.6   Max.   :67.00   Max.   :116.40   Max.   :198.1

Q1. Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?

# Histogram of height, one panel per sex; columns are looked up in bdims
qplot(hgt, data = bdims, facets = . ~ sex, binwidth = 0.5)

# Sex is coded 0 for female and 1 for male, so the '0' histogram shows the 260 women and the '1' histogram shows the 247 men. Both histograms are roughly bell-shaped, so both distributions look approximately normal.
# The men's distribution is centered at around 178 cm; the women's distribution is centered at around 165 cm.
  1. Based on this plot, does it appear that the data follow a nearly normal distribution?
# Yes. However, the men's height distribution is closer to normal than the women's.
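The plot this question refers to is a histogram on the density scale with a normal curve drawn over it. A minimal sketch of that technique follows. The helper name `hist_with_normal` and the simulated `demo_hgt` vector are assumptions made for illustration; they are not part of the lab or the `bdims` data.

```r
# Hypothetical helper (not part of the lab): density-scale histogram
# with the normal curve fitted from the sample mean and sd drawn on top.
hist_with_normal <- function(x, ...) {
  m <- mean(x)
  s <- sd(x)
  # Grid spanning the observed data, so the curve always lands inside the plot
  grid <- seq(min(x), max(x), length.out = 200)
  dens <- dnorm(grid, mean = m, sd = s)
  # Compute the bars first so the y-axis can fit both the bars and the curve
  h <- hist(x, plot = FALSE)
  hist(x, probability = TRUE, ylim = c(0, max(h$density, dens)), ...)
  lines(grid, dens, col = "blue")
  invisible(list(mean = m, sd = s, grid = grid, density = dens))
}

# Stand-in data: simulated values, NOT the bdims measurements
set.seed(1)
demo_hgt <- rnorm(260, mean = 165, sd = 6.5)
fit <- hist_with_normal(demo_hgt, main = "Simulated heights", xlab = "cm")
```

With the lab data loaded, the same helper could be called as `hist_with_normal(fdims$hgt)`. Building the grid from the data range avoids hard-coding x values that may not match the variable being plotted.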
  1. Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
fdims <- subset(bdims, sex == 0)
fhgtmean <- mean(fdims$hgt)
fhgtsd   <- sd(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
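# Normal probability plot of the simulated sample, as the question asks
qqnorm(sim_norm)
qqline(sim_norm)

# No, not all of the points fall on the line; a few points in both tails deviate from it.
# 'sim_norm' was simulated from a normal distribution with the same mean and standard deviation as fdims$hgt (it is not standard normal), so its plot shows how much scatter to expect from a truly normal sample. The real data are nearly normal, and their plot can be judged against it.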
  1. Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?
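# qqnormsim is the lab's helper, loaded with bdims.RData: it plots the real data next to several simulated normal samples
qqnormsim(fdims$hgt)

# Yes. The simulated samples share the mean and standard deviation of 'fdims$hgt', and the plots look similar.
# Yes, the plots provide evidence that the female heights are nearly normal.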
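The comparison above relies on seeing the real data next to several simulated normal samples. The sketch below shows one way to build such a panel in base R. `qq_sim_panel` is a hypothetical name and this is not the OpenIntro `qqnormsim` implementation; `demo` is simulated stand-in data rather than `fdims$hgt`.

```r
# Hypothetical helper (NOT the OpenIntro qqnormsim code): draw a normal
# probability plot of x, then plots of n_sim normal samples that share
# x's length, mean, and standard deviation.
qq_sim_panel <- function(x, n_sim = 8) {
  op <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))  # 3 x 3 grid fits 1 + 8 plots
  on.exit(par(op))                                 # restore graphics settings
  qqnorm(x, main = "Real data")
  qqline(x)
  # One simulated sample per column
  sims <- replicate(n_sim, rnorm(length(x), mean = mean(x), sd = sd(x)))
  for (i in seq_len(n_sim)) {
    qqnorm(sims[, i], main = paste("Simulation", i))
    qqline(sims[, i])
  }
  invisible(sims)
}

# Stand-in data: simulated values, NOT the bdims measurements
set.seed(42)
demo <- rnorm(260, mean = 165, sd = 6.5)
sims <- qq_sim_panel(demo)
```

If the first panel is hard to tell apart from the simulated ones, the data are consistent with a normal model; a panel that stands out suggests a departure from normality.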
  1. Using the same technique, determine whether or not female weights appear to come from a normal distribution.
fwgtmean <- mean(fdims$wgt)
fwgtsd   <- sd(fdims$wgt)
hist(fdims$wgt, probability = TRUE)
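# Grid over the observed weights (140:190 is a height range and falls outside the weight histogram)
x <- seq(min(fdims$wgt), max(fdims$wgt), length.out = 200)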
y <- dnorm(x = x, mean = fwgtmean, sd = fwgtsd)
lines(x = x, y = y, col = "blue")

qqnorm(fdims$wgt)
qqline(fdims$wgt)

# The histogram shows that female weight is right-skewed, and there are many outliers in both tails. Therefore, female weight does not appear to be normally distributed.
# Empirical proportion of women taller than 182 cm
sum(fdims$hgt > 182) / length(fdims$hgt)
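Right skew can also be checked numerically: in a right-skewed sample the mean tends to sit above the median, and the moment-based skewness coefficient is positive. The sketch below illustrates this with a small made-up vector. `skew_summary` and `demo_wgt` are hypothetical and the numbers are not taken from `bdims`.

```r
# Hypothetical helper: mean, median, and moment-based sample skewness
skew_summary <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # population-style standard deviation
  c(mean = m, median = median(x), skewness = mean((x - m)^3) / s^3)
}

# Made-up values with a long right tail (NOT the bdims weights)
demo_wgt <- c(50, 52, 55, 56, 58, 60, 62, 65, 70, 85, 100)
res <- skew_summary(demo_wgt)
print(res)
```

With the lab data loaded, `skew_summary(fdims$wgt)` would give the corresponding figures for female weight.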
## [1] 0.003846154
  1. Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?
# What is the probability that a woman's height is below 172 cm?
pnorm(q = 172, mean = fhgtmean, sd = fhgtsd)
## [1] 0.861944
sum(fdims$hgt < 172) / length(fdims$hgt)
## [1] 0.8384615
# What is the probability that a woman's weight is above 60 kg?
1 - pnorm(q = 60, mean = fwgtmean, sd = fwgtsd)
## [1] 0.524893
sum(fdims$wgt > 60) / length(fdims$wgt)
## [1] 0.4384615
# Comparing the theoretical normal probabilities with the empirical proportions for the height and weight questions above, height shows the closer agreement between the two methods (a gap of about 0.02, versus about 0.09 for weight).
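The agreement between the two methods can be made explicit by tabulating the four probabilities printed above and taking the absolute difference for each question. The sketch below uses only the values shown in the output; the data frame name `agreement` is chosen here for illustration.

```r
# The four probabilities reported above, entered by hand
agreement <- data.frame(
  question    = c("height below 172", "weight above 60"),
  theoretical = c(0.861944, 0.524893),    # from pnorm()
  empirical   = c(0.8384615, 0.4384615)   # from sum(...) / length(...)
)

# Smaller absolute difference means closer agreement
agreement$abs_diff <- abs(agreement$theoretical - agreement$empirical)
print(agreement)
```

The gap is about 0.023 for height and about 0.086 for weight, which fits the earlier finding that female weight departs further from the normal model than female height does.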

On Your Own

qqnorm(fdims$bii.di)
qqline(fdims$bii.di)

**b.** The histogram for female elbow diameter (`elb.di`) belongs to normal 
probability plot letter __C__.
qqnorm(fdims$elb.di)
qqline(fdims$elb.di)

**c.** The histogram for general age (`age`) belongs to normal 
probability plot letter __D__.

qqnorm(fdims$age)
qqline(fdims$age)

**d.** The histogram for female chest depth (`che.de`) belongs to normal 
probability plot letter __A__.
qqnorm(fdims$che.de)
qqline(fdims$che.de)

# C is right-skewed, and D is nearly normal.
fkne_dimean <- mean(fdims$kne.di)
fkne_disd <- sd(fdims$kne.di)
hist(fdims$kne.di, probability = TRUE)
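# Grid over the observed knee diameters (140:190 is a height range and falls outside this histogram)
x <- seq(min(fdims$kne.di), max(fdims$kne.di), length.out = 200)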
y <- dnorm(x = x, mean = fkne_dimean, sd = fkne_disd)
lines(x = x, y = y, col = "blue")

qqnorm(fdims$kne.di)
qqline(fdims$kne.di)

# The distribution of fdims$kne.di is slightly right-skewed but nearly normal, with a few outliers in both tails.
histQQmatch

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel from a lab written by Mark Hansen of UCLA Statistics.