Notes

Before we can answer any questions, we load the mosaic and oilabs packages and the body dimensions data set (bdims).

library(mosaic)
library(oilabs)
data(bdims)

Total out of XX points.

Question 1

Now let’s consider some of the other variables in the body dimensions data set. Using the figures linked here, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. While unnecessary for this assignment, if you are uncertain based on these figures, generate the plots in R to check.

  1. The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter ____.
  2. The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter ____.
  3. The histogram for general age (age) belongs to normal probability plot letter ____.
  4. The histogram for female chest depth (che.de) belongs to normal probability plot letter ____.

Solutions: (2 points)

  1. B, the sample quantiles go as low as -4.
  2. C, the histogram is the most normal and the QQ-plot is the closest to a straight diagonal line.
  3. D, the outlying values are less than 4.
  4. A, the outlying value is greater than 4.
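
If you do want to check a match in R, the standardization described in the question can be sketched in base R. The vector below is simulated as a hypothetical stand-in for a column such as fdims$elb.di (the seed, mean, and standard deviation are arbitrary assumptions, not values taken from bdims); swap in the real column to reproduce the lab figures.

```r
# Hypothetical stand-in for a measurement column such as fdims$elb.di
set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- round(rnorm(260, mean = 12.4, sd = 0.8), 1)

# Standardize: first subtract the mean, then divide by the standard deviation
z <- (x - mean(x)) / sd(x)

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding)
mean(z)
sd(z)

# Histogram and normal probability plot of the standardized values
hist(z, breaks = 25)
qqnorm(z)
qqline(z)
```

Standardizing only shifts and rescales the axis, so both plots have the same shape as they would for the raw measurements.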

Question 2

Note that normal probability plots C and D have a slight stepwise pattern. Why do you think this is the case?

(1 point) Because there are many repeated values: the measurements are recorded to one decimal place, so the same value occurs many times. For example, the female elb.di value 12.4 occurs 29 times:

# Keep only the female participants
fdims <- bdims %>%
  filter(sex == "f")
# Count how often each elbow diameter value occurs
tally(~elb.di, data=fdims)
## 
##  9.9 10.1 10.3 10.4 10.6 10.7 10.8 10.9   11 11.1 11.2 11.3 11.4 11.5 11.6 
##    1    1    1    1    3    1    1    2    4    3    8    6    1   13   11 
## 11.7 11.8 11.9   12 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9   13 13.1 
##    3   12    4   17    6   10    6   29    3   15    7   14   13   10    9 
## 13.2 13.3 13.4 13.6 13.7 13.8 13.9   14 14.1 14.2 14.3   15 
##   11    3   14    3    1    4    1    4    1    1    1    1

As an extreme example, let’s consider a data set that contains only the values 1 through 4, each repeated several times:

values <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4)
favstats(values)
##  min Q1 median Q3 max     mean       sd  n missing
##    1  2      3  4   4 2.571429 1.164965 21       0

If we draw the QQ-plot, we see a stepwise pattern because of the repeated values:

qplot(sample=values, stat="qq")
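
To see where the steps come from, one can look at the coordinates behind the plot rather than the plot itself. This minimal sketch uses base R's qqnorm() with plot.it = FALSE (instead of the qplot call above) on the same 21 values; the name `runs` is only illustrative.

```r
# Same 21 values as above: five 1s, five 2s, five 3s, six 4s
values <- c(rep(1, 5), rep(2, 5), rep(3, 5), rep(4, 6))

# Compute the QQ coordinates without drawing anything
qq <- qqnorm(values, plot.it = FALSE)

# For each distinct sample value, the range of theoretical quantiles it covers
runs <- tapply(qq$x, qq$y, range)
runs

# Number of distinct heights on the vertical (sample) axis
length(unique(qq$y))
```

Each sample value is paired with a whole run of theoretical quantiles, which plots as a flat step; the jumps between steps are the gaps between the distinct values.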

Question 3

As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.

(1 point) The variable kne.di is right-skewed: its QQ-plot is convex (both ends curve upward), and the histogram shows a longer right tail.

histogram(~kne.di, data=fdims, nint=25)

qplot(sample=kne.di, data=fdims, stat="qq")

Right-skewed Data qq-plot

x <- rchisq(10000, df=10)
histogram(~x, nint=50)

qplot(sample=x, stat="qq")

Left-skewed Data qq-plot

x <- -rchisq(10000, df=10)
histogram(~x, nint=50)

qplot(sample=x, stat="qq")
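
The "ends point up" description can also be checked numerically. The sketch below, in base R, measures how far each QQ point lies above or below the line through the first and third quartiles (the line that qqline() draws by default) and then looks only at the tails. The helper names `qq_residuals` and `tails_above`, the seed, and the cutoff of 2 on the theoretical axis are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
right <- rchisq(10000, df = 10)  # right-skewed sample
left  <- -right                  # mirror image, left-skewed

# Hypothetical helper: vertical distance of each QQ point from the quartile line
qq_residuals <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  q  <- as.numeric(quantile(x, c(0.25, 0.75)))
  z  <- qnorm(c(0.25, 0.75))
  slope     <- diff(q) / diff(z)
  intercept <- q[1] - slope * z[1]
  data.frame(theoretical = qq$x,
             resid = qq$y - (intercept + slope * qq$x))
}

# Hypothetical helper: share of tail points (|theoretical| > 2) above the line
tails_above <- function(x) {
  r <- qq_residuals(x)
  mean(r$resid[abs(r$theoretical) > 2] > 0)
}

tails_above(right)  # close to 1: both ends sit above the line (convex)
tails_above(left)   # close to 0: both ends sit below the line (concave)
```

A share near 1 corresponds to the convex shape of a right-skewed variable, and a share near 0 to the concave shape of a left-skewed one.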