Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
hist(mdims$hgt)
hist(fdims$hgt)
summary(mdims$hgt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 157.2 172.9 177.8 177.7 182.7 198.1
summary(fdims$hgt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 147.2 160.0 164.5 164.9 169.5 182.9
Both histograms show roughly normal distributions. Men's heights have a higher median (177.8 vs. 164.5 cm) and mean (177.7 vs. 164.9 cm) than women's heights, and a slightly wider range (157.2 to 198.1 cm vs. 147.2 to 182.9 cm).
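The comparison above can be made easier to read by putting both groups on the same axis and in one table. The following is a minimal sketch, not part of the original lab: `describe` and `hgt_compare` are hypothetical names, and the simulated stand-in heights are used only when `mdims` and `fdims` are not already loaded.

```r
# Use the lab's subsets when they exist; otherwise fall back to simulated
# stand-in heights (hypothetical values, for illustration only).
set.seed(1)
m_hgt <- if (exists("mdims")) mdims$hgt else rnorm(250, mean = 178, sd = 7)
f_hgt <- if (exists("fdims")) fdims$hgt else rnorm(250, mean = 165, sd = 7)

# One row per group: center, spread, and width of the range.
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x),
    iqr = IQR(x), range_width = diff(range(x)))
}
hgt_compare <- rbind(men = describe(m_hgt), women = describe(f_hgt))
print(round(hgt_compare, 1))

# Shared breaks give both histograms the same bins and x-axis,
# so shifts in center and spread are visible at a glance.
brks <- seq(floor(min(m_hgt, f_hgt)), ceiling(max(m_hgt, f_hgt)) + 5, by = 5)
op <- par(mfrow = c(2, 1))
hist(m_hgt, breaks = brks, main = "Men", xlab = "Height (cm)")
hist(f_hgt, breaks = brks, main = "Women", xlab = "Height (cm)")
par(op)
```

Stacking the two panels with identical breaks avoids the misleading impression that separately scaled histograms can give.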
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE, ylim = c(0,.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
#### Exercise 2
Based on this plot, does it appear that the data follow a nearly normal distribution?
Based on the plot above, the female height data do appear to follow a nearly normal distribution.
qqnorm(fdims$hgt)
qqline(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
Most of the points fall on the line, but a few near the beginning and end of the plot do not. Compared to the probability plot of the real data, the simulated data are more linear. This may be because the real data contain a few points that deviate from the normal trend.
qqnormsim(fdims$hgt)
Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?
The simulated plots do provide evidence that the female heights are nearly normal, because the simulated data resemble the probability plot for female heights. In both the simulated plots and the plot of the real female data, most points fall on the line except for a few near the beginning and end.
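To make the comparison between real and simulated plots concrete, here is a hedged sketch of the idea behind that kind of display. `qq_sim_grid` is a hypothetical helper written for illustration, not the lab's `qqnormsim`; it draws the Q-Q plot of the data next to Q-Q plots of normal samples that share the data's size, mean, and standard deviation.

```r
# Hypothetical helper (an illustration, not the lab's qqnormsim):
# plot the data's normal Q-Q plot, then n_sim Q-Q plots of simulated
# normal samples with the same n, mean, and sd as the data.
qq_sim_grid <- function(x, n_sim = 8) {
  x <- x[!is.na(x)]
  sims <- replicate(n_sim, rnorm(length(x), mean = mean(x), sd = sd(x)))
  side <- ceiling(sqrt(n_sim + 1))            # grid large enough for all panels
  op <- par(mfrow = c(side, side), mar = c(3, 3, 2, 1))
  on.exit(par(op))                            # restore plotting settings
  qqnorm(x, main = "Data")
  qqline(x)
  for (i in seq_len(n_sim)) {
    qqnorm(sims[, i], main = paste("Sim", i))
    qqline(sims[, i])
  }
  invisible(sims)                             # one simulated sample per column
}

# Use the lab's data when loaded; otherwise simulated stand-in heights.
set.seed(42)
hgt <- if (exists("fdims")) fdims$hgt else rnorm(260, mean = 165, sd = 6.5)
sims <- qq_sim_grid(hgt)
```

If the data panel cannot be told apart from the simulated panels, the departures from the line are within what sampling variation alone produces.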
qqnorm(fdims$wgt)
qqline(fdims$wgt)
Using the same technique, determine whether or not female weights appear to come from a normal distribution.
qqnormsim(fdims$wgt)
I would say that the female weight distribution is not normal. In the probability plot for female weight, the tails do not fall on the line, especially the right tail, which suggests a right-skewed distribution. Also, compared to the simulated plots, the actual data points are less linear.
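The visual impression of right skew can be backed up with a quick numeric check. This is a sketch under stated assumptions: `skewness` is a hypothetical helper using the moment-based formula, and the log-normal stand-in weights are used only when `fdims` is not loaded.

```r
# Moment-based sample skewness: about 0 for symmetric data,
# positive when the right tail is longer than the left.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Use the lab's data when loaded; otherwise a right-skewed stand-in
# (hypothetical values, for illustration only).
set.seed(7)
wgt <- if (exists("fdims")) fdims$wgt else rlnorm(260, meanlog = log(60), sdlog = 0.15)

# In right-skewed data the mean tends to sit above the median.
print(c(mean = mean(wgt), median = median(wgt), skewness = skewness(wgt)))
```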
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
Write out two probability questions that you would like to answer: one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution and the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?
pnorm(150, fhgtmean, fhgtsd)
## [1] 0.01152955
pnorm(150, fwgtmean, fwgtsd)
## [1] 1
pnorm(178, fhgtmean, fhgtsd) - pnorm(165, fhgtmean, fhgtsd)
## [1] 0.4697822
pnorm(178, fwgtmean, fwgtsd) - pnorm(165, fwgtmean, fwgtsd)
## [1] 0
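The probability that a female height is 150 cm or less is 0.0115, and the probability that it falls between 165 and 178 cm is 0.4698. The two weight calculations reuse these height cutoffs, which lie far above the female weights in the data, so they return 1 and 0 and say nothing useful about weight. Cutoffs on the weight scale are needed, and the empirical probabilities still have to be computed before the two methods can be compared.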
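The exercise asks for each probability twice, once from the fitted normal curve and once from the observed proportions. A minimal sketch of that comparison follows; `prob_compare` is a hypothetical helper, the weight cutoff of 50 kg is an illustrative choice, and the simulated stand-in data are used only when `fdims` is not loaded.

```r
# Hypothetical helper: P(lower < X <= upper) computed two ways.
#   theoretical: normal curve with the sample's mean and sd
#   empirical:   share of observations that fall in the interval
prob_compare <- function(x, lower = -Inf, upper = Inf) {
  x <- x[!is.na(x)]
  theoretical <- pnorm(upper, mean(x), sd(x)) - pnorm(lower, mean(x), sd(x))
  empirical <- mean(x > lower & x <= upper)
  c(theoretical = theoretical, empirical = empirical)
}

# Use the lab's data when loaded; otherwise simulated stand-ins
# (hypothetical values, for illustration only).
set.seed(3)
hgt <- if (exists("fdims")) fdims$hgt else rnorm(260, mean = 165, sd = 6.5)
wgt <- if (exists("fdims")) fdims$wgt else rlnorm(260, meanlog = log(60), sdlog = 0.15)

print(prob_compare(hgt, upper = 150))               # height of 150 cm or less
print(prob_compare(hgt, lower = 165, upper = 178))  # height between 165 and 178 cm
print(prob_compare(wgt, upper = 50))                # weight of 50 kg or less (illustrative cutoff)
```

The variable whose theoretical and empirical values sit closer together is the one better described by the normal model.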