library(knitr)
load(url("http://www.openintro.org/stat/data/bdims.RData"))
head(bdims)
## bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## 1 42.9 26.0 31.5 17.7 28.0 13.1 10.4 18.8 14.1 106.2
## 2 43.7 28.5 33.5 16.9 30.8 14.0 11.8 20.6 15.1 110.5
## 3 40.1 28.2 33.3 20.9 31.7 13.9 10.9 19.7 14.1 115.1
## 4 44.3 29.9 34.0 18.4 28.2 13.9 11.2 20.9 15.0 104.5
## 5 42.5 29.9 34.0 21.5 29.4 15.2 11.6 20.7 14.9 107.5
## 6 43.3 27.0 31.5 19.6 31.3 14.0 11.5 18.8 13.9 119.8
## che.gi wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi
## 1 89.5 71.5 74.5 93.5 51.5 32.5 26.0 34.5 36.5 23.5
## 2 97.0 79.0 86.5 94.8 51.5 34.4 28.0 36.5 37.5 24.5
## 3 97.5 83.2 82.9 95.0 57.3 33.4 28.8 37.0 37.3 21.9
## 4 97.0 77.8 78.8 94.0 53.0 31.0 26.2 37.0 34.8 23.0
## 5 97.5 80.0 82.5 98.5 55.4 32.0 28.4 37.7 38.6 24.4
## 6 99.9 82.5 80.1 95.3 57.5 33.0 28.0 36.6 36.1 23.5
## wri.gi age wgt hgt sex
## 1 16.5 21 65.6 174.0 1
## 2 17.0 23 71.8 175.3 1
## 3 16.9 28 80.7 193.5 1
## 4 16.6 23 72.6 186.5 1
## 5 18.0 22 78.8 187.2 1
## 6 16.9 21 74.8 181.5 1
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
The men's histogram peaks at a larger height value than the women's, which suggests that men tend to be taller than women.
The distribution is approximately normal.
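The histograms behind these two answers are not shown. A minimal sketch of how a density histogram with a normal curve overlaid could be drawn, assuming `fdims` and `mdims` exist as created above; `hist_with_normal` is a hypothetical helper name, not something the lab provides:

```r
# Hypothetical helper: draws a density histogram of x and overlays the
# normal curve that has the same mean and standard deviation as x.
hist_with_normal <- function(x, label = "x") {
  x <- x[!is.na(x)]
  grid <- seq(min(x), max(x), length.out = 200)
  dens <- dnorm(grid, mean = mean(x), sd = sd(x))
  # Compute the bars first so the y-axis can fit both bars and curve.
  h <- hist(x, plot = FALSE)
  hist(x, probability = TRUE, main = label, xlab = label,
       ylim = c(0, max(h$density, dens)))
  lines(grid, dens, col = "blue")
  invisible(list(grid = grid, density = dens))
}

# Usage in the lab, once the subsets exist:
# hist_with_normal(fdims$hgt, "Women's height (cm)")
# hist_with_normal(mdims$hgt, "Men's height (cm)")
```

Comparing where the two histograms sit along the x-axis, rather than how tall their bars are, is what supports a statement about which group is taller.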
qqnorm(fdims$hgt)
qqline(fdims$hgt)
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
sim_norm
Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
qqnormsim(fdims$hgt)
The simulated plots change on every run, but each time most of the points fall on or very close to the line, more closely than the points for the real data.
Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?
The normal probability plot for fdims$hgt and the plots for the simulated data are very similar, with slight differences at the ends of the line. This is evidence that fdims$hgt is nearly normal.
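`qqnormsim()` is not a base R function, so it has to be available in the session (presumably it comes with the loaded workspace). A rough sketch of what such a helper might do, under the assumption that it plots the real data next to several simulated normal samples; `qqnorm_sim_sketch` is a hypothetical name and the lab's own function may differ in detail:

```r
# Sketch only: Q-Q plot of the data followed by Q-Q plots of normal samples
# simulated with the same size, mean, and standard deviation.
qqnorm_sim_sketch <- function(x, n_sims = 8) {
  x <- x[!is.na(x)]
  old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))
  # Panel 1: the real data.
  qqnorm(x, main = "Data")
  qqline(x)
  # One column per simulated sample.
  sims <- replicate(n_sims, rnorm(length(x), mean = mean(x), sd = sd(x)))
  for (i in seq_len(n_sims)) {
    qqnorm(sims[, i], main = paste("Sim", i))
    qqline(sims[, i])
  }
  invisible(sims)
}

# Usage in the lab:
# qqnorm_sim_sketch(fdims$hgt)
```

If the data panel is hard to tell apart from the simulated panels, the deviations from the line are no larger than what sampling variation alone produces.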
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
sim_normw <- rnorm(n = length(fdims$wgt), mean = fwgtmean, sd = fwgtsd)
qqnorm(sim_normw)
qqline(sim_normw)
qqnormsim(fdims$wgt)
fdims$wgt is close to normal, but less so than fdims$hgt: the points pull away from the line at the tail, which shows that the distribution is right skewed.
1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
## [1] 0.004434387
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
What percent of women are under 150 cm?
## [1] "Theoretical for height: %(Z= -2.27245406145884 )= 1.15 %"
## [1] "Empirical for height: %= 1.15 %"
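The code that produced these two lines is not shown. A minimal sketch of the theoretical part, using only the Z value printed above (the variable names are illustrative):

```r
# Z score for 150 cm, copied from the output above.
z_150 <- -2.27245406145884

# Lower-tail area of the standard normal: P(Z < z_150).
p_under_150 <- pnorm(z_150)

# Express as a percentage rounded to two decimals.
pct_under_150 <- round(100 * p_under_150, 2)
print(pct_under_150)
```

The empirical figure would presumably come from the data directly, for example `100 * mean(fdims$hgt < 150)`, which needs `fdims` to be loaded.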
What percent of women are over 90 kg?
## [1] "Theoretical for weight: %(Z= 3.05746015192308 )= 0.11 %"
## [1] "Empirical for weight: %= 0.77 %"
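Because the question asks for the share of women over 90 kg, the theoretical value is an upper-tail area. A small sketch using the Z value printed above (variable names are illustrative) shows the difference between the two tails:

```r
# Z score for 90 kg, copied from the output above.
z_90 <- 3.05746015192308

# pnorm() returns the lower tail by default: the share BELOW 90 kg.
pct_below_90 <- round(100 * pnorm(z_90), 2)

# "Over 90 kg" is the upper tail, requested with lower.tail = FALSE.
pct_over_90 <- round(100 * pnorm(z_90, lower.tail = FALSE), 2)

print(c(below = pct_below_90, over = pct_over_90))
```

The upper-tail value is the one to set against the empirical 0.77 %.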
height | weight |
---|---|
Theoretical = Empirical | Theoretical != Empirical |
- Height showed closer agreement between the theoretical and empirical values than weight.
Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.
a. The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter ____.
b. The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter ____.
c. The histogram for general age (age) belongs to normal probability plot letter ____.
d. The histogram for female chest depth (che.de) belongs to normal probability plot letter ____.
As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
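One way to approach this exercise is to draw the two plots side by side. The following is a sketch, assuming `fdims` exists as created earlier; `qq_then_hist` is a hypothetical helper name:

```r
# Hypothetical helper: normal probability plot and histogram in one row.
qq_then_hist <- function(x, label = "x") {
  x <- x[!is.na(x)]
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))
  qqnorm(x, main = paste("Normal Q-Q:", label))
  qqline(x)
  hist(x, main = paste("Histogram:", label), xlab = label)
  # Mean above median is a rough numeric hint of right skew.
  invisible(c(mean = mean(x), median = median(x)))
}

# Usage in the lab:
# qq_then_hist(fdims$kne.di, "Female knee diameter")
```

In the normal probability plot, points that curve upward away from the line at the right end suggest a long right tail. The histogram should then show the same tail.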