load("more/bdims.RData")
head(bdims)
## bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## 1 42.9 26.0 31.5 17.7 28.0 13.1 10.4 18.8 14.1 106.2
## 2 43.7 28.5 33.5 16.9 30.8 14.0 11.8 20.6 15.1 110.5
## 3 40.1 28.2 33.3 20.9 31.7 13.9 10.9 19.7 14.1 115.1
## 4 44.3 29.9 34.0 18.4 28.2 13.9 11.2 20.9 15.0 104.5
## 5 42.5 29.9 34.0 21.5 29.4 15.2 11.6 20.7 14.9 107.5
## 6 43.3 27.0 31.5 19.6 31.3 14.0 11.5 18.8 13.9 119.8
## che.gi wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi
## 1 89.5 71.5 74.5 93.5 51.5 32.5 26.0 34.5 36.5 23.5
## 2 97.0 79.0 86.5 94.8 51.5 34.4 28.0 36.5 37.5 24.5
## 3 97.5 83.2 82.9 95.0 57.3 33.4 28.8 37.0 37.3 21.9
## 4 97.0 77.8 78.8 94.0 53.0 31.0 26.2 37.0 34.8 23.0
## 5 97.5 80.0 82.5 98.5 55.4 32.0 28.4 37.7 38.6 24.4
## 6 99.9 82.5 80.1 95.3 57.5 33.0 28.0 36.6 36.1 23.5
## wri.gi age wgt hgt sex
## 1 16.5 21 65.6 174.0 1
## 2 17.0 23 71.8 175.3 1
## 3 16.9 28 80.7 193.5 1
## 4 16.6 23 72.6 186.5 1
## 5 18.0 22 78.8 187.2 1
## 6 16.9 21 74.8 181.5 1
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
Plotting both histograms together, we see that they are very similar in shape.
m <- hist(mdims$hgt, plot = FALSE)
m
## $breaks
## [1] 155 160 165 170 175 180 185 190 195 200
##
## $counts
## [1] 2 5 28 44 76 50 28 12 2
##
## $density
## [1] 0.001619433 0.004048583 0.022672065 0.035627530 0.061538462 0.040485830
## [7] 0.022672065 0.009716599 0.001619433
##
## $mids
## [1] 157.5 162.5 167.5 172.5 177.5 182.5 187.5 192.5 197.5
##
## $xname
## [1] "mdims$hgt"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
f <- hist(fdims$hgt, plot = FALSE)
f
## $breaks
## [1] 145 150 155 160 165 170 175 180 185
##
## $counts
## [1] 3 15 52 63 69 38 18 2
##
## $density
## [1] 0.002307692 0.011538462 0.040000000 0.048461538 0.053076923 0.029230769
## [7] 0.013846154 0.001538462
##
## $mids
## [1] 147.5 152.5 157.5 162.5 167.5 172.5 177.5 182.5
##
## $xname
## [1] "fdims$hgt"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
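# m$density has 9 values and f$density has 8, so rbind() recycles the first
# female value into column 9 (hence the warning below)
h <- rbind(m$density, f$density)
## Warning in rbind(m$density, f$density): number of columns of result is not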
## a multiple of vector length (arg 2)
h
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.001619433 0.004048583 0.02267206 0.03562753 0.06153846 0.04048583
## [2,] 0.002307692 0.011538462 0.04000000 0.04846154 0.05307692 0.02923077
## [,7] [,8] [,9]
## [1,] 0.02267206 0.009716599 0.001619433
## [2,] 0.01384615 0.001538462 0.002307692
barplot(h, beside = TRUE)
mean(mdims$hgt)
## [1] 177.7453
mean(fdims$hgt)
## [1] 164.8723
sd(mdims$hgt)
## [1] 7.183629
sd(fdims$hgt)
## [1] 6.544602
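The means above differ by about 13 cm, but the grouped bar plot does not show this. The two histograms start at different breaks (155 cm for males, 145 cm for females) and have different numbers of bins (9 and 8), so rbind() pairs the bars by position rather than by height. One possible way to align them is to compute both histograms on shared breaks, as in the minimal sketch below. The heights here are simulated stand-ins drawn with rnorm() from the sample sizes implied by the counts (247 and 260) and the means and standard deviations printed above; they are not the bdims data. The names m_hgt, f_hgt, common_breaks, m2, f2 and h2 are illustrative.

```r
set.seed(1)  # reproducible stand-in data; replace with mdims$hgt and fdims$hgt
m_hgt <- rnorm(247, mean = 177.7453, sd = 7.183629)
f_hgt <- rnorm(260, mean = 164.8723, sd = 6.544602)

# One set of 5 cm breaks that covers both samples
all_hgt <- c(m_hgt, f_hgt)
common_breaks <- seq(floor(min(all_hgt) / 5) * 5,
                     ceiling(max(all_hgt) / 5) * 5, by = 5)

m2 <- hist(m_hgt, breaks = common_breaks, plot = FALSE)
f2 <- hist(f_hgt, breaks = common_breaks, plot = FALSE)

# Both density vectors now have the same length, so rbind() does not recycle
h2 <- rbind(male = m2$density, female = f2$density)
barplot(h2, beside = TRUE, names.arg = m2$mids, legend.text = TRUE,
        xlab = "Height (cm)", ylab = "Density")
```

With shared breaks, each pair of bars refers to the same height interval, so a shift between the two distributions shows up as a horizontal offset.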
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE, ylim = c(0, 0.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
Yes, the data appear to follow a nearly normal distribution: the histogram matches the normal density curve very well.
qqnorm(fdims$hgt)
qqline(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
For the normal probability plot of sim_norm: do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
No, not all points fall on the line. As with the real data, points at the extremes stray from the line, while points in the middle follow it.
qqnorm(sim_norm)
qqline(sim_norm)
Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?
We run several simulations and look at the corresponding Q-Q plots. In all of these simulations the points follow the line closely, much as they do for the real data. This supports treating the female heights as nearly normal.
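qqnormsim(), called next, is not a base R function; here it presumably comes from the loaded lab workspace. As a rough idea of what such a helper could do, the sketch below draws the Q-Q plot of the data followed by several Q-Q plots of normal samples simulated with the data's own mean and standard deviation. The name qq_sim_grid is hypothetical, and this is an assumption about the approach, not the lab's actual implementation. The demo data are simulated rather than taken from bdims.

```r
# Hypothetical helper: Q-Q plot of the data plus n_sim simulated normal samples
qq_sim_grid <- function(x, n_sim = 8) {
  x <- x[!is.na(x)]
  old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))  # restore the plotting layout afterwards
  qqnorm(x, main = "Data")
  qqline(x)
  sims <- vector("list", n_sim)
  for (i in seq_len(n_sim)) {
    # Same sample size, mean and sd as the data, but exactly normal
    sims[[i]] <- rnorm(length(x), mean = mean(x), sd = sd(x))
    qqnorm(sims[[i]], main = paste("Simulation", i))
    qqline(sims[[i]])
  }
  invisible(sims)  # return the simulated samples for inspection
}

set.seed(42)
demo_hgt <- rnorm(260, mean = 164.9, sd = 6.5)  # stand-in for fdims$hgt
sims <- qq_sim_grid(demo_hgt)
```

If the panel for the real data cannot be told apart from the simulated panels, the normal model is a plausible description of the data.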
qqnormsim(fdims$hgt)
We can first look at a Q-Q plot of the female weights. It shows results similar to the heights analysis: with some exceptions towards the extremes, the data seem to fit a normal distribution.
qqnorm(fdims$wgt)
qqline(fdims$wgt)
We can also run simulations using the mean and standard deviation of the weights to see how the data compare against a normal distribution. Again, most points follow the straight line, so we conclude that it is also reasonable to assume a normal distribution.
qqnormsim(fdims$wgt)
What is the probability that a randomly chosen young adult female is taller than 6 feet (about 182 cm)?
#Using the theoretical normal distribution
1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
## [1] 0.004434387
#empirical solution
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
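The normal-model calculation above is equivalent to standardizing 182 cm into a Z score and taking the upper tail of the standard normal distribution. The short sketch below uses the mean and standard deviation printed earlier, typed in as rounded constants, so the last digits may differ slightly from the output above. The variable names are illustrative.

```r
fhgtmean_rounded <- 164.8723   # mean(fdims$hgt) as printed above
fhgtsd_rounded   <- 6.544602   # sd(fdims$hgt) as printed above

# Number of standard deviations that 182 cm lies above the mean
z <- (182 - fhgtmean_rounded) / fhgtsd_rounded

# Upper-tail probability; same as 1 - pnorm(z)
p_upper <- pnorm(z, lower.tail = FALSE)

# Expected number of women taller than 182 cm among the 260 in the sample
expected_count <- 260 * p_upper
c(z = z, p_upper = p_upper, expected_count = expected_count)
```

Under the normal model, about one woman in a sample of 260 is expected to exceed 182 cm, which agrees with the empirical proportion of 1/260.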
What is the probability that a randomly chosen young adult female is shorter than 5 feet (152.4 cm)?
#Theoretical Normal Distribution
pnorm(q = 152.4, mean = fhgtmean, sd = fhgtsd)
## [1] 0.028342
#Empirical
length(fdims$hgt[fdims$hgt < 152.4]) / length(fdims$hgt)
## [1] 0.02692308
What is the probability that a randomly chosen young adult female weighs more than 150 lbs (68.0389 kg)?
#Theoretical Normal Distribution
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
1 - pnorm(q = 68.0389, mean = fwgtmean, sd = fwgtsd)
## [1] 0.2195895
#Empirical
length(fdims$wgt[fdims$wgt > 68.0389]) / length(fdims$wgt)
## [1] 0.1923077
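To compare how well the normal model agrees with the data across the three questions, the theoretical and empirical probabilities printed above can be put side by side. The values below are copied from that output. The table layout and column names are illustrative, and the sample size of 260 is inferred from the histogram counts for fdims$hgt.

```r
results <- data.frame(
  question    = c("hgt > 182", "hgt < 152.4", "wgt > 68.0389"),
  theoretical = c(0.004434387, 0.028342, 0.2195895),
  empirical   = c(0.003846154, 0.02692308, 0.1923077)
)

# Absolute gap between the normal model and the observed proportion
results$abs_diff <- abs(results$theoretical - results$empirical)

# Number of women behind each empirical proportion (260 women in the sample)
results$empirical_count <- round(results$empirical * 260)
results
```

The absolute gap is largest for weight (about 0.027, against less than 0.002 for both height questions), which suggests the normal approximation is closer for heights than for weights in this sample.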
Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.
a. The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter B.
qqnorm(fdims$bii.di)
qqline(fdims$bii.di)
b. The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter C.
qqnorm(fdims$elb.di)
qqline(fdims$elb.di)
c. The histogram for general age (age) belongs to normal probability plot letter D.
qqnorm(fdims$age)
qqline(fdims$age)
d. The histogram for female chest depth (che.de) belongs to normal probability plot letter A.
qqnorm(fdims$che.de)
qqline(fdims$che.de)
If we look at the distributions of these two variables (age and che.de), we can see that both are right skewed. This is consistent with the deviation from the straight line seen in their probability plots.
Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
The probability plot shows the data to be right skewed, with more points falling off the line towards the right.
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
The histogram confirms this: it is also right skewed.
hist(fdims$kne.di)
[Figure: histQQmatch]
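A numeric check can complement the visual judgement of skew used in the answers above. The sample skewness (the third standardized moment) is positive for right-skewed data, negative for left-skewed data, and near zero for symmetric data. The sketch below is minimal: sample_skewness is a hypothetical helper, not a base R function, and the example vectors are made up for illustration rather than taken from bdims.

```r
# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)              # deviations from the mean
  mean(d^3) / mean(d^2)^1.5     # third moment scaled by the variance^(3/2)
}

symmetric_x  <- c(1, 2, 3, 4, 5)
right_skew_x <- c(1, 1, 2, 2, 3, 10)   # long tail to the right

sample_skewness(symmetric_x)    # zero for perfectly symmetric data
sample_skewness(right_skew_x)   # positive, indicating right skew
```

With the lab data loaded, the same helper could be applied to fdims$kne.di to check the sign of the skew seen in the histogram.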