library(ggplot2)
load("more/bdims.RData")
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
ggplot(bdims, aes(hgt, fill = factor(sex))) + geom_density(alpha = 0.3)
To compare the heights, I would look at the mean and the spread of the two distributions. On average, men are taller than women, but the spread of heights is about the same in both groups.
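The visual comparison of center and spread can also be checked numerically. The sketch below is only an illustration: `toy` is a simulated stand-in with made-up means and standard deviations, not the real bdims measurements, and the names `hgt_mean` and `hgt_sd` are introduced here for the example.

```r
# Simulated stand-in for bdims (hypothetical values, not the real data)
set.seed(1)
toy <- data.frame(
  sex = rep(c(0, 1), each = 200),
  hgt = c(rnorm(200, mean = 165, sd = 6.5),   # assumed female-like heights
          rnorm(200, mean = 178, sd = 7))     # assumed male-like heights
)

# Mean and standard deviation of height within each sex
hgt_mean <- tapply(toy$hgt, toy$sex, mean)
hgt_sd <- tapply(toy$hgt, toy$sex, sd)

print(round(rbind(mean = hgt_mean, sd = hgt_sd), 2))
```

With the real data, the same two `tapply()` calls on `bdims$hgt` and `bdims$sex` should give the group means and standard deviations behind the density plot.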
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE, ylim = c(0,0.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
According to the plot, the data appear to follow the normal distribution fairly well. The histogram bars fall close to the blue normal density curve.
Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
qqnorm(fdims$hgt)
qqline(fdims$hgt)
A large proportion of the points from sim_norm fall very close to the line, though not all of them fall exactly on it. Compared to the real data, the sim_norm points are closer to the line. This is expected, because sim_norm is sampled directly from a normal distribution.
Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?
qqnormsim(fdims$hgt)
The normal probability plot for fdims$hgt looks similar to the plots for the simulated data. It does not exhibit any strong indication of skewness and does not deviate substantially from the line. In other words, the plots provide sufficient evidence that female heights are nearly normal.
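The side-by-side comparison with simulated samples can be summarized with a single number per plot: the correlation between the sample quantiles and the theoretical normal quantiles. This is a rough sketch, not what qqnormsim itself does. The helper `qq_cor` is a name made up for this example, and `heights` is simulated stand-in data rather than fdims$hgt.

```r
# Correlation between the x and y coordinates of a normal probability plot
qq_cor <- function(v) {
  qq <- qqnorm(v, plot.it = FALSE)  # compute the coordinates without drawing
  cor(qq$x, qq$y)
}

set.seed(42)
heights <- rnorm(260, mean = 165, sd = 6.5)  # hypothetical stand-in for fdims$hgt

# The same statistic for 8 fresh normal samples with matching size, mean, and sd
sim_cors <- replicate(
  8,
  qq_cor(rnorm(length(heights), mean = mean(heights), sd = sd(heights)))
)
observed <- qq_cor(heights)

print(range(sim_cors))
print(observed)
```

If the value for the real data falls inside or near the range of the simulated values, that agrees with the visual impression that the data are nearly normal.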
qqnorm(fdims$wgt)
qqline(fdims$wgt)
According to the qq plot, female weights do not appear to be normal. The points in the tails fall farther from the line than they would for normally distributed data. This indicates that there is some skewness present in the female weight distribution. There is not enough evidence to conclude that female weight is normally distributed.
What percentage of females are shorter than 160 cm?
#Theoretical normal distribution
pnorm(q = 160, mean = fhgtmean, sd = fhgtsd)
## [1] 0.2282939
#Empirical distribution
sum(fdims$hgt < 160)/length(fdims$hgt)
## [1] 0.1923077
What percentage of females are heavier than 90 kg?
#Theoretical normal distribution
1 - pnorm(q = 90, mean = mean(fdims$wgt), sd = sd(fdims$wgt))
## [1] 0.001116107
#Empirical distribution
sum(fdims$wgt > 90)/length(fdims$wgt)
## [1] 0.007692308
Height showed closer agreement between the two methods than weight. For height, the theoretical and empirical proportions are 0.228 and 0.192. For weight, they are 0.0011 and 0.0077, so the empirical proportion is roughly seven times the theoretical one. Close to the mean, both height and weight track the theoretical normal distribution. However, out near the tails, weight has a larger proportion of values than the theoretical normal distribution predicts. Looking at the histograms of female height and weight, height is mostly normal and weight is skewed to the right.
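The two calculations used for each question (normal model versus observed proportion) can be wrapped in one small helper so they are always computed from the same vector and cutoff. This is a minimal sketch: `compare_tail` and its arguments are names invented for this example, and it assumes the vector has no missing values.

```r
# Theoretical (normal model) and empirical probability below or above a cutoff
compare_tail <- function(v, cutoff, lower = TRUE) {
  theoretical <- pnorm(cutoff, mean = mean(v), sd = sd(v), lower.tail = lower)
  empirical <- if (lower) mean(v < cutoff) else mean(v > cutoff)
  c(theoretical = theoretical, empirical = empirical)
}

# Small made-up vector, used only to show the call
toy <- c(1, 2, 3, 4, 5)
print(compare_tail(toy, cutoff = 3))
print(compare_tail(toy, cutoff = 3, lower = FALSE))
```

With the lab data loaded, `compare_tail(fdims$hgt, 160)` and `compare_tail(fdims$wgt, 90, lower = FALSE)` should correspond to the two pairs of values computed above.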
par(mfrow = c(1,2))
hist(fdims$hgt, main = "Height Histogram")
hist(fdims$wgt, main = "Weight Histogram")
Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.
a. The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter B.
b. The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter C.
c. The histogram for general age (age) belongs to normal probability plot letter D.
d. The histogram for female chest depth (che.de) belongs to normal probability plot letter A.
Note that normal probability plots C and D have a slight stepwise pattern.
Why do you think this is the case?
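Stepwise patterns occur when a variable takes only a limited number of distinct values, so many observations share the same value and line up as flat runs in the normal probability plot. Age is recorded in whole years, and elbow diameter also appears to be recorded at a coarse precision. In other words, there is not enough granularity in the recorded values to smooth out the distribution, which results in a stepwise pattern. The histograms for both female elbow diameter and general age show the same lack of granularity as large bins and steep dropoffs.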
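One way to probe the stepwise pattern is to reproduce it on simulated data: rounding a continuous variable to whole numbers reduces the number of distinct values and creates ties, and tied values share the same height in a normal probability plot. The sketch below uses made-up values (the mean and standard deviation are arbitrary assumptions), not the bdims variables.

```r
set.seed(3)
continuous <- rnorm(260, mean = 30, sd = 10)  # hypothetical unrounded values
rounded <- round(continuous)                  # the same values recorded as whole numbers

# Fewer distinct values means more ties
print(length(unique(continuous)))
print(length(unique(rounded)))

# Coordinates of the normal probability plot for the rounded values
qq <- qqnorm(rounded, plot.it = FALSE)

# Largest number of points sharing one y value, i.e. the longest flat step
longest_step <- max(table(qq$y))
print(longest_step)
```

Calling `qqnorm(rounded)` and `qqnorm(continuous)` side by side should show the steps in the first plot and a smooth curve in the second.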
Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
According to the qq plot, the points curve upward relative to the line. Upward curvature indicates that the data are right skewed.
hist(fdims$kne.di)
Female knee diameter is right skewed.
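The direction of skew read from the plots can be cross-checked with a numeric summary. The sketch below uses the moment-based sample skewness (third central moment divided by the 1.5 power of the second central moment). `sample_skewness` is a helper name introduced here, not a base R function, and the input vectors are made up for illustration.

```r
# Moment-based sample skewness: positive for right skew, negative for left skew
sample_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

symmetric <- c(1, 2, 3, 4, 5)      # made-up symmetric values
right_skewed <- c(1, 1, 1, 10)     # made-up values with a long right tail

print(sample_skewness(symmetric))
print(sample_skewness(right_skewed))

# A mean above the median is another quick sign of right skew
print(mean(right_skewed) > median(right_skewed))
```

A positive value from `sample_skewness(fdims$kne.di)` would agree with the right skew seen in the qq plot and histogram.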
[Figure: histQQmatch (histograms and normal probability plots for the matching exercise)]