download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")
head(bdims)
##   bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## 1   42.9   26.0   31.5   17.7   28.0   13.1   10.4   18.8   14.1  106.2
## 2   43.7   28.5   33.5   16.9   30.8   14.0   11.8   20.6   15.1  110.5
## 3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1  115.1
## 4   44.3   29.9   34.0   18.4   28.2   13.9   11.2   20.9   15.0  104.5
## 5   42.5   29.9   34.0   21.5   29.4   15.2   11.6   20.7   14.9  107.5
## 6   43.3   27.0   31.5   19.6   31.3   14.0   11.5   18.8   13.9  119.8
##   che.gi wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi
## 1   89.5   71.5   74.5   93.5   51.5   32.5   26.0   34.5   36.5   23.5
## 2   97.0   79.0   86.5   94.8   51.5   34.4   28.0   36.5   37.5   24.5
## 3   97.5   83.2   82.9   95.0   57.3   33.4   28.8   37.0   37.3   21.9
## 4   97.0   77.8   78.8   94.0   53.0   31.0   26.2   37.0   34.8   23.0
## 5   97.5   80.0   82.5   98.5   55.4   32.0   28.4   37.7   38.6   24.4
## 6   99.9   82.5   80.1   95.3   57.5   33.0   28.0   36.6   36.1   23.5
##   wri.gi age  wgt   hgt sex
## 1   16.5  21 65.6 174.0   1
## 2   17.0  23 71.8 175.3   1
## 3   16.9  28 80.7 193.5   1
## 4   16.6  23 72.6 186.5   1
## 5   18.0  22 78.8 187.2   1
## 6   16.9  21 74.8 181.5   1
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)

Exercise 1

Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?

Histogram of men’s heights

mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
hist(mdims$hgt, xlab="Men's height")

Histogram of women’s heights

mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
hist(fdims$hgt, xlab="Women's height")
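The two histograms above are drawn with their own default axes and bins, which makes them harder to compare by eye. A minimal sketch of one way to put both on a common scale follows; the helper name `hist_pair` and its arguments are made up for illustration and are not part of the lab.

```r
# Hypothetical helper (not from the lab): draw two histograms stacked
# vertically with the same breaks and x-axis so they can be compared directly.
hist_pair <- function(a, b, labels = c("Group A", "Group B"),
                      xlab = "Height (cm)", width = 5) {
  # Common bin edges that cover both samples
  rng <- range(c(a, b), na.rm = TRUE)
  breaks <- seq(floor(rng[1] / width) * width,
                ceiling(rng[2] / width) * width,
                by = width)
  old <- par(mfrow = c(2, 1))
  on.exit(par(old))
  ha <- hist(a, breaks = breaks, main = labels[1], xlab = xlab)
  hb <- hist(b, breaks = breaks, main = labels[2], xlab = xlab)
  invisible(list(breaks = breaks, counts_a = ha$counts, counts_b = hb$counts))
}

# Usage with the lab's data, if it has been loaded as above
if (exists("mdims") && exists("fdims")) {
  hist_pair(mdims$hgt, fdims$hgt, labels = c("Men's height", "Women's height"))
}
```

With shared breaks, differences in center and spread between the two groups show up as a horizontal shift or a change in width rather than as a difference in axis labels.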

fhgtmean<-mean(fdims$hgt)
fhgtsd<-sd(fdims$hgt)
summary(mdims$hgt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   157.2   172.9   177.8   177.7   182.6   198.1
summary(fdims$hgt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   147.2   160.0   164.5   164.9   169.5   182.9
hist(fdims$hgt, ylim = c(0, 0.06),probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")

Exercise 2

Based on this plot, does it appear that the data follow a nearly normal distribution?

Yes. Most of the data are concentrated around the mean (fhgtmean), and the frequencies fall off gradually on both sides, roughly following the normal curve.

qqnorm(fdims$hgt)
qqline(fdims$hgt)

sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)

Exercise 3

Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?

Not all of the points fall exactly on the line, for either the simulated data or the real data. The plot for sim_norm looks close to the probability plot for the real data.
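A normal probability plot of sim_norm can be drawn the same way as for the real data. A minimal sketch, assuming `sim_norm` was created as in the chunk above; the fallback sample uses placeholder values chosen only so the snippet runs on its own.

```r
# If sim_norm is not in the workspace, simulate a stand-in sample.
# The n, mean, and sd below are placeholder values for illustration only.
if (!exists("sim_norm")) {
  sim_norm <- rnorm(n = 260, mean = 165, sd = 6.5)
}

# Normal probability plot of the simulated data, with a reference line
qqnorm(sim_norm, main = "Normal Q-Q Plot (simulated)")
qqline(sim_norm)
```

Because `sim_norm` is random, the plot changes on every run unless `set.seed()` is called before `rnorm()`.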

qqnormsim(fdims$hgt)
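`qqnormsim` is not a base R function; it appears to be supplied by the workspace loaded from `bdims.RData`. For readers without that file, the following is a rough sketch of what such a function might do, assuming it shows the real data next to several simulated normal samples. The name `qq_sim_grid` and the default of eight simulations are assumptions, not the lab's actual implementation.

```r
# Hypothetical stand-in for qqnormsim(): one Q-Q plot of the data followed by
# n_sim Q-Q plots of normal samples with the same size, mean, and sd.
qq_sim_grid <- function(dat, n_sim = 8) {
  dat <- dat[!is.na(dat)]
  side <- ceiling(sqrt(n_sim + 1))   # grid large enough for all panels
  old <- par(mfrow = c(side, side))
  on.exit(par(old))
  qqnorm(dat, main = "Normal Q-Q Plot (Data)")
  qqline(dat)
  for (i in seq_len(n_sim)) {
    sim <- rnorm(n = length(dat), mean = mean(dat), sd = sd(dat))
    qqnorm(sim, main = "Normal Q-Q Plot (Sim)")
    qqline(sim)
  }
  invisible(n_sim + 1)  # number of panels drawn
}

# Usage with the lab's data, if it has been loaded
if (exists("fdims")) qq_sim_grid(fdims$hgt)
```

The simulated panels show how much wobble around the line is expected from truly normal data of the same sample size, which gives a baseline for judging the real data.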

Exercise 4

Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?

Yes, the normal probability plot for fdims$hgt looks similar to the plots for the simulated data, which suggests that the female heights are nearly normal.

Exercise 5

Using the same technique, determine whether or not female weights appear to come from a normal distribution.
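One way to approach the female weights question, as a minimal sketch: it assumes the `wgt` column shown in the `head(bdims)` output, and the names `fwgt`, `fwgtmean`, and `fwgtsd` are introduced here for illustration. The stand-in sample is invented data that only exists so the snippet runs without the lab's file.

```r
# Use the lab's data if loaded; otherwise fall back to invented stand-in values.
fwgt <- if (exists("fdims")) {
  fdims$wgt
} else {
  rlnorm(260, meanlog = log(60), sdlog = 0.15)
}

fwgtmean <- mean(fwgt)
fwgtsd   <- sd(fwgt)

# Histogram on the density scale with the fitted normal curve overlaid
hist(fwgt, probability = TRUE, xlab = "Women's weight")
x <- seq(min(fwgt), max(fwgt), length.out = 200)
lines(x = x, y = dnorm(x = x, mean = fwgtmean, sd = fwgtsd), col = "blue")

# Normal probability plot; compare with simulated plots if qqnormsim() is loaded
qqnorm(fwgt)
qqline(fwgt)
if (exists("qqnormsim")) qqnormsim(fwgt)
```

Points that bend away from the line in the upper tail, by more than the simulated panels do, would indicate right skew rather than a nearly normal shape.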

1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
## [1] 0.004434387
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
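The subtraction from 1 above gives the upper-tail area of the normal curve. As a small illustration, the same probability can be written with `lower.tail = FALSE` or by standardizing to a Z score first. The fallback parameters are placeholder values, used only if the lab objects are not loaded.

```r
# Placeholder parameters if the lab objects are not loaded (illustrative only)
if (!exists("fhgtmean")) fhgtmean <- 165
if (!exists("fhgtsd"))   fhgtsd   <- 6.5

# Three equivalent ways to get P(height > 182) under the normal model
p1 <- 1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
p2 <- pnorm(q = 182, mean = fhgtmean, sd = fhgtsd, lower.tail = FALSE)
z  <- (182 - fhgtmean) / fhgtsd          # Z score of 182
p3 <- pnorm(z, lower.tail = FALSE)       # standard normal upper tail

all.equal(p1, p2)
all.equal(p2, p3)
```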

Exercise 6

Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?

What is the probability that a randomly chosen young adult female is shorter than 150 cm?

theoretical normal distribution:

pnorm(q = 150, mean = fhgtmean, sd = fhgtsd)
## [1] 0.01152955

empirical distribution:

sum(fdims$hgt < 150) / length(fdims$hgt)
## [1] 0.01153846

What is the probability that a randomly chosen female is younger than 25?

theoretical normal distribution:

fagemean <-mean(fdims$age)
fagesd <-sd(fdims$age)
pnorm(q = 25, mean = fagemean, sd = fagesd)
## [1] 0.3351456

empirical distribution:

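sum(fdims$age < 25) / length(fdims$age)
## [1] 0.4038462

Of the two variables examined here, height shows much closer agreement between the two methods (about 0.0115 in both cases) than age (0.335 versus 0.404).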
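The same pair of calculations is repeated for each question, so it can be wrapped in a small helper. This is a sketch only: `compare_prob` is a hypothetical name, and the cutoff of 70 in the last usage line is an arbitrary example value for the weight question that the exercise asks for.

```r
# Hypothetical helper: P(X < q) from a fitted normal and from the raw data
compare_prob <- function(x, q) {
  x <- x[!is.na(x)]
  theoretical <- pnorm(q = q, mean = mean(x), sd = sd(x))
  empirical   <- sum(x < q) / length(x)
  c(theoretical = theoretical,
    empirical   = empirical,
    abs_diff    = abs(theoretical - empirical))
}

# Usage with the lab's data, if it has been loaded
if (exists("fdims")) {
  print(compare_prob(fdims$hgt, 150))
  print(compare_prob(fdims$age, 25))
  print(compare_prob(fdims$wgt, 70))   # 70 is an arbitrary cutoff
}
```

The `abs_diff` entry gives a single number per variable, so the variable with the smaller value is the one where the normal model and the data agree more closely.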