library(ggplot2)
# Load the body dimensions data set used throughout the lab
load("more/bdims.RData")
# Split by sex: 1 = male, 0 = female
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
  1. Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?
# factor(sex) gives one density per group whether sex is stored as a number or a factor
ggplot(bdims, aes(hgt, fill = factor(sex))) + geom_density(alpha = 0.3)

To compare the heights, I would look at the mean and the spread of the two distributions. On average, men appear to be taller than women, but the spread of heights is about the same in both groups.
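
The visual comparison can be backed up with numbers. Below is a minimal sketch that computes the mean and standard deviation per group. `group_summary` is a hypothetical helper name, not part of the lab, and the commented call assumes `bdims` has been loaded as above.

```r
# Hypothetical helper: mean and standard deviation of `values` within each level of `groups`
group_summary <- function(values, groups) {
  m <- tapply(values, groups, mean)
  s <- tapply(values, groups, sd)
  data.frame(group = names(m), mean = as.numeric(m), sd = as.numeric(s))
}

# With the lab data loaded, this would be called as:
# group_summary(bdims$hgt, bdims$sex)
```

Similar standard deviations together with clearly different means would support the reading of the plot given above.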

  1. Based on this plot, does it appear that the data follow a nearly normal distribution?
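# Mean and standard deviation of female heights
fhgtmean <- mean(fdims$hgt)
fhgtsd   <- sd(fdims$hgt)
# Density-scaled histogram so the normal curve can be overlaid on the same axis
hist(fdims$hgt, probability = TRUE, ylim = c(0, 0.06))
# Normal density with the same mean and sd, evaluated from 140 to 190 cm
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")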

According to the plot, the data appear to follow the normal distribution fairly well. The histogram bars fall close to the overlaid normal density curve.
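
One way to make "fairly well" more concrete is to compare the share of observations within one, two, and three standard deviations of the mean against what a normal model predicts (the 68-95-99.7 rule). This is only a sketch. `share_within_sd` is a hypothetical helper name, and the commented call assumes `fdims` exists as above.

```r
# Hypothetical helper: observed vs. normal-model share of values within k sd of the mean
share_within_sd <- function(x, k = 1:3) {
  z <- abs(x - mean(x)) / sd(x)                      # distance from the mean in sd units
  observed <- sapply(k, function(kk) mean(z <= kk))  # observed share within k sd
  expected <- pnorm(k) - pnorm(-k)                   # share expected under a normal model
  data.frame(k = k, observed = observed, expected = expected)
}

# With the lab data loaded:
# share_within_sd(fdims$hgt)
```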

  1. Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)

qqnorm(fdims$hgt)
qqline(fdims$hgt)

A large proportion of the points from sim_norm fall very close to the line, though not all of them fall exactly on it. Compared to the real data, the sim_norm points are closer to the line. This is expected, because sim_norm is sampled directly from a normal distribution.

  1. Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?
qqnormsim(fdims$hgt)

The normal probability plot for fdims$hgt looks similar to the simulated data. The plot does not exhibit any strong indication of skewness and does not deviate significantly from the line. In other words, the plots provide sufficient evidence that female heights are nearly normal.
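
`qqnormsim` is provided with the lab materials. To illustrate the idea behind the comparison, here is a rough sketch of a similar procedure, not the lab's actual implementation: it plots the observed data next to several samples simulated from a normal distribution with the same mean and standard deviation. The name `qq_sim_sketch` and the default of eight simulations are assumptions made for this example.

```r
# Hypothetical sketch: QQ plot of the data alongside QQ plots of simulated normal samples
qq_sim_sketch <- function(x, n_sim = 8) {
  side <- ceiling(sqrt(n_sim + 1))          # square grid large enough for all panels
  old_par <- par(mfrow = c(side, side))
  on.exit(par(old_par))                     # restore the plotting layout afterwards

  qqnorm(x, main = "Observed data")
  qqline(x)

  # Each column is one simulated sample with the same size, mean, and sd as x
  sims <- replicate(n_sim, rnorm(length(x), mean = mean(x), sd = sd(x)))
  for (i in seq_len(n_sim)) {
    qqnorm(sims[, i], main = paste("Simulation", i))
    qqline(sims[, i])
  }
  invisible(sims)
}

# With the lab data loaded:
# qq_sim_sketch(fdims$hgt)
```

If the panel for the observed data is hard to tell apart from the simulated panels, that is informal evidence that the variable is nearly normal.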

  1. Using the same technique, determine whether or not female weights appear to come from a normal distribution.
qqnorm(fdims$wgt)
qqline(fdims$wgt)

According to the qq plot, female weights do not appear to be normal. The points in the tails, especially the upper tail, fall well away from the line, which indicates skewness in the female weight distribution. There is not enough evidence to conclude that female weight is normally distributed.
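
The skewness seen in the qq plot can also be summarized with a single number. The sketch below computes a moment-based skewness coefficient, where values near 0 suggest symmetry and positive values suggest a right skew. `sample_skewness` is a hypothetical helper written for this example, and the commented calls assume `fdims` exists as above.

```r
# Hypothetical helper: moment-based sample skewness (third central moment / variance^1.5)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]              # drop missing values
  d <- x - mean(x)               # deviations from the mean
  mean(d^3) / mean(d^2)^1.5
}

# With the lab data loaded, compare the two variables:
# sample_skewness(fdims$hgt)
# sample_skewness(fdims$wgt)
```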

  1. Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?

Question 1

What percentage of females are shorter than 160 cm?

# Theoretical normal distribution
pnorm(q = 160, mean = fhgtmean, sd = fhgtsd)
## [1] 0.2282939
# Empirical distribution
sum(fdims$hgt < 160) / length(fdims$hgt)
## [1] 0.1923077

Question 2

What percentage of females are heavier than 90 kg?

# Theoretical normal distribution
1 - pnorm(q = 90, mean = mean(fdims$wgt), sd = sd(fdims$wgt))
## [1] 0.001116107
# Empirical distribution
sum(fdims$wgt > 90) / length(fdims$wgt)
## [1] 0.007692308

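In relative terms, height had a closer agreement between the two methods. For height, the normal model gives 22.8% against an empirical 19.2%, so the model is off by roughly a fifth. For weight, the empirical proportion (0.77%) is nearly seven times the model's (0.11%), even though both values are small. Near the mean, both height and weight track the theoretical normal distribution reasonably well. However, out in the upper tail, weight has a larger proportion of values than the normal distribution predicts. Looking at the histograms of female height and weight, height is mostly normal and weight is skewed to the right.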

par(mfrow = c(1,2))

hist(fdims$hgt, main = "Height Histogram")
hist(fdims$wgt, main = "Weight Histogram")
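
The theoretical and empirical probabilities from the two questions above can be computed side by side with one helper. This is a sketch under the same normal-model assumption used in the answers. `compare_tail_prob` is a hypothetical name, and the commented calls assume `fdims` exists as above.

```r
# Hypothetical helper: normal-model vs. empirical probability of falling below (or above) q
compare_tail_prob <- function(x, q, lower = TRUE) {
  theoretical <- pnorm(q, mean = mean(x), sd = sd(x), lower.tail = lower)
  empirical   <- if (lower) mean(x < q) else mean(x > q)
  c(theoretical = theoretical,
    empirical   = empirical,
    abs_diff    = abs(theoretical - empirical),   # absolute disagreement
    ratio       = empirical / theoretical)        # relative disagreement
}

# With the lab data loaded:
# compare_tail_prob(fdims$hgt, 160, lower = TRUE)    # shorter than 160 cm
# compare_tail_prob(fdims$wgt, 90,  lower = FALSE)   # heavier than 90 kg
```

Reporting both the absolute difference and the ratio matters here, because a tail probability can be close to the model in absolute terms while still being several times larger.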


On Your Own

Stepwise patterns occur in a normal probability plot when the data take only a limited number of distinct values, for example because measurements were recorded to a coarse precision. Many observations then share the same value and line up in flat runs. In a histogram, this lack of granularity shows up as large bins and steep dropoffs rather than a smooth shape. Both female elbow diameter and female age exhibit both large bins and steep dropoffs in their distributions.
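
To see how a lack of granularity produces steps, the sketch below rounds a simulated normal sample and counts how many distinct values remain. The data are simulated stand-ins, not lab data. The sample size, mean, and standard deviation are arbitrary choices, and `count_ties` is a hypothetical helper.

```r
# Hypothetical helper: number of observations vs. number of distinct values
count_ties <- function(x) {
  c(n = length(x), distinct = length(unique(x)))
}

set.seed(42)
smooth  <- rnorm(260, mean = 13, sd = 1)   # simulated stand-in, not the lab data
rounded <- round(smooth)                   # coarse recording, e.g. to whole units

count_ties(smooth)    # nearly every value is distinct
count_ties(rounded)   # only a handful of distinct values remain

# qqnorm(rounded) would show flat runs of points, one run per distinct value
```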

qqnorm(fdims$kne.di)
qqline(fdims$kne.di)

According to the qq plot, the points curve upward relative to the line. Upward curvature indicates that the data are right skewed.

hist(fdims$kne.di)

Female knee diameter is right skewed.

[Figure: histQQmatch]