download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")
head(bdims)
## bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi che.gi
## 1 42.9 26.0 31.5 17.7 28.0 13.1 10.4 18.8 14.1 106.2 89.5
## 2 43.7 28.5 33.5 16.9 30.8 14.0 11.8 20.6 15.1 110.5 97.0
## 3 40.1 28.2 33.3 20.9 31.7 13.9 10.9 19.7 14.1 115.1 97.5
## 4 44.3 29.9 34.0 18.4 28.2 13.9 11.2 20.9 15.0 104.5 97.0
## 5 42.5 29.9 34.0 21.5 29.4 15.2 11.6 20.7 14.9 107.5 97.5
## 6 43.3 27.0 31.5 19.6 31.3 14.0 11.5 18.8 13.9 119.8 99.9
## wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi wri.gi age
## 1 71.5 74.5 93.5 51.5 32.5 26.0 34.5 36.5 23.5 16.5 21
## 2 79.0 86.5 94.8 51.5 34.4 28.0 36.5 37.5 24.5 17.0 23
## 3 83.2 82.9 95.0 57.3 33.4 28.8 37.0 37.3 21.9 16.9 28
## 4 77.8 78.8 94.0 53.0 31.0 26.2 37.0 34.8 23.0 16.6 23
## 5 80.0 82.5 98.5 55.4 32.0 28.4 37.7 38.6 24.4 18.0 22
## 6 82.5 80.1 95.3 57.5 33.0 28.0 36.6 36.1 23.5 16.9 21
## wgt hgt sex
## 1 65.6 174.0 1
## 2 71.8 175.3 1
## 3 80.7 193.5 1
## 4 72.6 186.5 1
## 5 78.8 187.2 1
## 6 74.8 181.5 1
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?
The two histograms have a similar shape and both look roughly bell-shaped, but the female distribution is centered at a lower height.
hist(mdims$hgt, main = "Males", xlab = "Height")
hist(fdims$hgt, main = "Females", xlab = "Height")
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
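The overlay only lines up because `probability = TRUE` rescales the bars to densities, so the total bar area is 1, the same scale that `dnorm` uses. A minimal sketch of that idea, using simulated stand-in data rather than `bdims` (the sample size, mean, and standard deviation below are illustrative assumptions):

```r
set.seed(1)
# Stand-in data: hypothetical parameters, not taken from bdims.
sim_hgt <- rnorm(260, mean = 165, sd = 6.5)

h <- hist(sim_hgt, probability = TRUE,
          main = "Simulated heights", xlab = "Height")

# With probability = TRUE, bar height times bar width sums to 1.
total_area <- sum(h$density * diff(h$breaks))

# A finer grid than an integer sequence gives a smoother curve.
x <- seq(min(sim_hgt), max(sim_hgt), length.out = 200)
lines(x, dnorm(x, mean = mean(sim_hgt), sd = sd(sim_hgt)), col = "blue")
```

If the curve is clipped at the top, passing a larger `ylim` to `hist` should make room for it.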
Based on this plot, does it appear that the data follow a nearly normal distribution?
Yes, the histogram tracks the overlaid normal curve reasonably well, so the data appear to follow a nearly normal distribution.
qqnorm(fdims$hgt)
qqline(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
qqnorm(sim_norm)
qqline(sim_norm)
Most of the points fall on the line, though not all of them. The plot is very similar to the probability plot for the real data.
qqnormsim(fdims$hgt)
Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?
Yes, the probability plot for the data looks similar to the plots for the simulated samples, so the plots provide evidence that the women's heights are nearly normal.
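The comparison behind `qqnormsim` can be approximated by hand: draw several normal samples with the data's mean and standard deviation and plot them next to the real data. The sketch below is a rough imitation of that idea, not the lab's implementation; `qq_sim_grid` is a hypothetical helper name, and the demo vector is simulated stand-in data.

```r
# Hypothetical helper: QQ plot of the data beside QQ plots of simulated
# normal samples that share the data's size, mean, and standard deviation.
qq_sim_grid <- function(dat, n_sim = 8) {
  old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))  # restore the plotting layout afterwards

  qqnorm(dat, main = "Data")
  qqline(dat)

  sims <- replicate(n_sim, rnorm(length(dat), mean = mean(dat), sd = sd(dat)))
  for (i in seq_len(n_sim)) {
    qqnorm(sims[, i], main = paste("Sim", i))
    qqline(sims[, i])
  }
  invisible(sims)  # one simulated sample per column
}

set.seed(42)
demo <- rnorm(260, mean = 165, sd = 6.5)  # stand-in for fdims$hgt
sims <- qq_sim_grid(demo)
```

If the data panel cannot be told apart from the simulated panels, the departures from the line are within what sampling variation alone would produce.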
Using the same technique, determine whether or not female weights appear to come from a normal distribution.
qqnormsim(fdims$wgt)
The weights do seem to come from a normal distribution.
1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
## [1] 0.004434387
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
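The two calculations above answer the same question in two ways: one from the fitted normal model, one by counting observations. A small sketch that wraps both in a single function, assuming a hypothetical helper name `compare_upper_tail` and simulated stand-in data:

```r
# Hypothetical helper: P(X > cutoff) from the normal model and from the data.
compare_upper_tail <- function(x, cutoff) {
  c(theoretical = 1 - pnorm(cutoff, mean = mean(x), sd = sd(x)),
    empirical   = mean(x > cutoff))  # same as sum(x > cutoff) / length(x)
}

set.seed(7)
demo <- rnorm(260, mean = 165, sd = 6.5)  # stand-in data, illustrative parameters
res <- compare_upper_tail(demo, cutoff = 182)
print(res)
```

`pnorm(cutoff, ..., lower.tail = FALSE)` is an equivalent way to get the upper tail without subtracting from 1.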
Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?
What percent of women are under 5 feet 3 inches (160.02 cm)?
pnorm(q = 160.02, mean = fhgtmean, sd = fhgtsd)
## [1] 0.229219
sum(fdims$hgt < 160.02) / length(fdims$hgt)
## [1] 0.2692308
What percent of women are under 100 lbs (45.3592 kg)?
pnorm(q = 45.3592, mean = mean(fdims$wgt), sd = sd(fdims$wgt))
## [1] 0.0564796
sum(fdims$wgt < 45.3592) / length(fdims$wgt)
## [1] 0.01538462
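The height variable had closer agreement between the two methods. The absolute gaps are nearly the same (about 0.040 for height and 0.041 for weight), but relative to the size of the probability the weight estimates differ far more.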
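The agreement can be checked directly from the four probabilities reported above. The sketch below only re-uses those printed values; the variable names are illustrative.

```r
# Probabilities copied from the output above.
height <- c(theoretical = 0.229219,  empirical = 0.2692308)
weight <- c(theoretical = 0.0564796, empirical = 0.01538462)

# Absolute gap between the two methods for each variable.
abs_diff <- c(
  height = abs(height[["empirical"]] - height[["theoretical"]]),
  weight = abs(weight[["empirical"]] - weight[["theoretical"]])
)

# Ratio of the larger estimate to the smaller one.
ratio <- c(
  height = max(height) / min(height),
  weight = max(weight) / min(weight)
)

print(round(abs_diff, 4))
print(round(ratio, 2))
```

Looking at both measures is a judgment call: an absolute gap treats all probabilities alike, while a ratio highlights disagreement in small tail probabilities.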
The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter B.
The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter C.
The histogram for general age (age) belongs to normal probability plot letter D.
The histogram for female chest depth (che.de) belongs to normal probability plot letter A.
Note that normal probability plots C and D have a slight stepwise pattern. Why do you think this is the case?
This is probably due to how the variables are recorded. Stepwise patterns appear when a variable is discrete or rounded, because many observations share exactly the same value.
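The effect of rounding can be reproduced with simulated data. In this sketch the measurements and their parameters are hypothetical; the only point is that recording the same values to a coarser precision produces the steps.

```r
set.seed(3)
smooth  <- rnorm(260, mean = 13, sd = 1)  # hypothetical continuous measurements
rounded <- round(smooth)                  # same values recorded to the nearest unit

# Rounding collapses many observations onto a few distinct values.
n_unique <- c(smooth  = length(unique(smooth)),
              rounded = length(unique(rounded)))
print(n_unique)

old_par <- par(mfrow = c(1, 2))
qqnorm(smooth, main = "Continuous")
qqline(smooth)
qqnorm(rounded, main = "Rounded")  # horizontal runs of tied values form the steps
qqline(rounded)
par(old_par)
```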
As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
Based on the probability plot, the variable is most likely right skewed. The histogram confirms this.
hist(fdims$kne.di)
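For reference, this is what right skew looks like on simulated data where the skew is known in advance. The log-normal sample and its parameters are assumptions chosen for illustration, not a model of knee diameter.

```r
set.seed(11)
# Log-normal draws are right skewed by construction.
right_skewed <- rlnorm(260, meanlog = 0, sdlog = 0.5)

# Simple moment-based skewness; positive values indicate right skew.
skew <- mean((right_skewed - mean(right_skewed))^3) / sd(right_skewed)^3
print(skew)

old_par <- par(mfrow = c(1, 2))
qqnorm(right_skewed)  # upper-tail points bend upward, away from the line
qqline(right_skewed)
hist(right_skewed, main = "Right-skewed sample")
par(old_par)
```

A left-skewed sample would show the mirror image, with the lower-tail points falling below the line.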