DATA606hw4

download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")

mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)

Exercise 1 Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?

hist(mdims$hgt)

hist(fdims$hgt)

On average, men are taller than women. Both distributions are roughly bell-shaped.
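A numerical summary can back up the visual comparison. The sketch below uses a hypothetical helper, summarise_dist, and simulated stand-in vectors (men_sim and women_sim, with made-up parameters) rather than the bdims data. With the data loaded, mdims$hgt and fdims$hgt would take their place.

```r
# Hypothetical helper (not part of the lab): center and spread of a numeric vector.
summarise_dist <- function(x) {
  c(n = length(x), mean = mean(x), median = median(x), sd = sd(x), iqr = IQR(x))
}

# Simulated stand-ins with made-up parameters; replace with mdims$hgt and fdims$hgt.
set.seed(606)
men_sim   <- rnorm(250, mean = 178, sd = 7)
women_sim <- rnorm(250, mean = 165, sd = 7)

# One row per group puts center and spread side by side.
print(round(rbind(men = summarise_dist(men_sim), women = summarise_dist(women_sim)), 2))
```

Comparing the mean and median within each row also gives a quick hint about symmetry: the two are close when a distribution is roughly symmetric.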

fhgtmean <- mean(fdims$hgt)
fhgtsd   <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")

Exercise 2 Based on this plot, does it appear that the data follow a nearly normal distribution?

Based on this plot, the data do appear to follow a nearly normal distribution.

Exercise 3 Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?

sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)

No, not all the points fall on the line. Compared with the probability plot for the real data, the sim_norm plot has more points falling on the line.

qqnormsim(fdims$hgt)

Exercise 4 Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?

The probability plot for female heights is similar to the simulated plots. Most of the points fall on the line, although some points at the ends of the data deviate from it. The plots therefore provide evidence that female heights are nearly normal.
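The visual comparison against simulated samples can also be expressed as a number. The sketch below is not the implementation of qqnormsim; it is a separate, assumed approach that measures how straight a normal probability plot is (the correlation between theoretical and sample quantiles) and compares that value with the same statistic for simulated normal samples. The helper qq_cor and the stand-in vector hgt_sim are hypothetical; with the data loaded, fdims$hgt would replace hgt_sim.

```r
# Hypothetical helper: correlation between theoretical and sample quantiles.
# Values near 1 mean the points lie close to a straight line.
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)  # returns the plotted coordinates without drawing
  cor(q$x, q$y)
}

# Simulated stand-in for fdims$hgt (made-up parameters).
set.seed(606)
hgt_sim <- rnorm(250, mean = 165, sd = 6.5)

# Reference values: the same statistic for 200 simulated normal samples of equal size.
ref <- replicate(200, qq_cor(rnorm(length(hgt_sim), mean(hgt_sim), sd(hgt_sim))))

# If the observed value sits inside the bulk of the reference values,
# the data look no less normal than genuinely normal samples do.
obs <- qq_cor(hgt_sim)
print(c(observed = obs,
        ref_low = unname(quantile(ref, 0.05)),
        ref_median = median(ref)))
```

An observed value well below ref_low would suggest a departure from normality larger than sampling variability alone usually produces.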

Exercise 5

Using the same technique, determine whether or not female weights appear to come from a normal distribution.

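fwgtmean <- mean(fdims$wgt)
fwgtsd   <- sd(fdims$wgt)
hist(fdims$wgt, probability = TRUE)
# Use a grid over the observed weights; the height grid 140:190 does not match the weight scale
x <- seq(min(fdims$wgt), max(fdims$wgt), length.out = 200)
y <- dnorm(x = x, mean = fwgtmean, sd = fwgtsd)
lines(x = x, y = y, col = "blue")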

qqnorm(fdims$wgt)
qqline(fdims$wgt)

Female weights do not appear to follow a normal distribution. In the normal probability plot, both tails depart from the line.
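Tails that bend away from the line often go together with skewness, which can be checked numerically. The sketch below defines a hypothetical helper, sample_skewness, using the moment-based formula (base R does not ship a skewness function), and applies it to simulated stand-ins with made-up parameters. What value fdims$wgt gives is for the data to show; it would replace wgt_sim.

```r
# Hypothetical helper: moment-based sample skewness.
# Roughly 0 for symmetric data, positive for a longer right tail.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Simulated stand-ins with made-up parameters; replace wgt_sim with fdims$wgt.
set.seed(606)
sym_sim <- rnorm(250, mean = 60, sd = 9)             # symmetric by construction
wgt_sim <- rlnorm(250, meanlog = 4.1, sdlog = 0.3)   # right skewed by construction

print(c(symmetric = sample_skewness(sym_sim),
        right_skewed = sample_skewness(wgt_sim)))
```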

Exercise 6 Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?

Q1: What is the probability that a female is taller than 160 cm? The theoretical and empirical probabilities are close to each other (0.7717061 and 0.7307692).

1 - pnorm(q = 160, mean = fhgtmean, sd = fhgtsd)

## [1] 0.7717061

sum(fdims$hgt > 160) / length(fdims$hgt)

## [1] 0.7307692

Q2: What is the probability that a female weighs more than 61 kg?

1 - pnorm(q = 61, mean = fwgtmean, sd = fwgtsd)

## [1] 0.4834253

sum(fdims$wgt > 61) / length(fdims$wgt)

## [1] 0.4038462

Height shows closer agreement between the two methods than weight does: the theoretical and empirical probabilities differ by about 0.04 for height and about 0.08 for weight.
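Both questions follow the same pattern, so the two calculations can be wrapped in one function. The helper below, upper_tail_probs, is a hypothetical name introduced for illustration; it is shown on a small made-up vector so the result can be checked by hand. With the data loaded, upper_tail_probs(fdims$hgt, 160) and upper_tail_probs(fdims$wgt, 61) would cover Q1 and Q2.

```r
# Hypothetical helper: P(X > cutoff) computed two ways.
upper_tail_probs <- function(x, cutoff) {
  # Normal model with the sample mean and standard deviation.
  theoretical <- pnorm(cutoff, mean = mean(x), sd = sd(x), lower.tail = FALSE)
  # Proportion of observations strictly above the cutoff.
  empirical <- mean(x > cutoff)
  c(theoretical = theoretical,
    empirical = empirical,
    abs_diff = abs(theoretical - empirical))
}

# Made-up example: the cutoff equals the mean, so the normal model gives 0.5,
# while 2 of the 5 observations lie above it.
print(upper_tail_probs(c(1, 2, 3, 4, 5), cutoff = 3))
```

Using lower.tail = FALSE is equivalent to 1 - pnorm(...), and mean(x > cutoff) is equivalent to sum(x > cutoff) / length(x).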

ON YOUR OWN 1. Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.
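The standardization described in the prompt can be sketched in a few lines. The vector x below holds made-up measurements and is only a stand-in; any numeric column of fdims could be used instead.

```r
set.seed(606)
x <- rnorm(250, mean = 28, sd = 2.5)  # made-up measurements

# Subtract the mean, then divide by the standard deviation.
z <- (x - mean(x)) / sd(x)

# The standardized values have mean 0 and standard deviation 1 whatever the original units.
print(c(mean = round(mean(z), 10), sd = sd(z)))

# scale() performs the same centering and scaling (it returns a one-column matrix).
print(all.equal(z, as.vector(scale(x))))
```

Because this is a linear rescaling, the shape of the histogram and of the normal probability plot is unchanged; only the axis labels differ.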

qqnorm(fdims$bii.di)
qqline(fdims$bii.di)

qqnorm(fdims$elb.di)
qqline(fdims$elb.di)

qqnorm(fdims$age)
qqline(fdims$age)

qqnorm(fdims$che.de)
qqline(fdims$che.de)

The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter B.
The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter C.
The histogram for general age (age) belongs to normal probability plot letter D.
The histogram for female chest depth (che.de) belongs to normal probability plot letter A.

Note that normal probability plots C and D have a slight stepwise pattern. Why do you think this is the case?

The stepwise pattern suggests that many people share exactly the same recorded value, as happens when a measurement is rounded or recorded in whole units.
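The effect of repeated values on a normal probability plot can be reproduced with simulated data. The sketch below uses made-up measurements (raw) and rounds them to one decimal place; it is an illustration of the mechanism, not a statement about how the bdims variables were recorded.

```r
set.seed(606)
raw     <- rnorm(250, mean = 13, sd = 0.8)  # made-up continuous measurements
rounded <- round(raw, 1)                     # recorded to one decimal place

# Rounding collapses many observations onto the same value.
print(c(distinct_raw = length(unique(raw)),
        distinct_rounded = length(unique(rounded))))

# Each tied group shares one sample quantile but spans several theoretical
# quantiles, which shows up as a flat step in the plot.
q <- qqnorm(rounded, plot.it = FALSE)
print(head(table(q$y)))
```

Calling qqnorm(rounded) and qqnorm(raw) with plotting enabled shows the steps in the first plot and a smoother trace in the second.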
As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.

qqnorm(fdims$kne.di)
qqline(fdims$kne.di)

hist(fdims$kne.di)

The distribution of female knee diameter is right skewed.

Tony Mei

9/23/2019