DATA606-Lab3- Muller

Exercise 1

Unimodal symmetric distribution

library(psych)
mdims = subset(bdims, sex==1)
fdims = subset(bdims, sex==0)
hist(mdims$hgt)

hist(fdims$hgt)

describe(fdims$hgt)

##    vars   n   mean   sd median trimmed  mad   min   max range skew
## X1    1 260 164.87 6.54  164.5  164.84 6.67 147.2 182.9  35.7 0.07
##    kurtosis   se
## X1    -0.32 0.41

describe(mdims$hgt)

##    vars   n   mean   sd median trimmed  mad   min   max range skew
## X1    1 247 177.75 7.18  177.8  177.67 7.41 157.2 198.1  40.9  0.1
##    kurtosis   se
## X1    -0.16 0.46

fhgtmean <- mean(fdims$hgt)
fhgtsd   <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")

Exercise 2 Based on the this plot, does it appear that the data follow a nearly normal distribution?

Yes, the more closely the bars fill the blue curve, the more normal the distribution.

Exercise3 Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?

The result vary everytime I run the normal probability plot, however the outcomes are all very close to the line, and compare simarily to the real data.

qqnorm(fdims$hgt)
qqline(fdims$hgt)

sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)

qqnormsim(fdims$hgt)

Exercise 4 Does the normal probability plot for fdims hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?

Yes, all the simulated plots follow very closely to the normal probability plot for fdims$hgt. The plots provide evidence that female heights are near normal distribution.

Exercise 5 Using the same technique, determine whether or not female weights appear to come from a normal distribution.

For the most part, women’s weight seems to follow a normal distribution. However as we go into higher quartiles the variance becomes much greater.

fwgtmean <- mean(fdims$wgt)
fwgtsd   <- sd(fdims$wgt)
hist(fdims$wgt, probability = TRUE)
x <- 40:100
y <- dnorm(x = x, mean = fwgtmean, sd = fwgtsd)
lines(x = x, y = y, col = "blue")

qqnorm(fdims$wgt)
qqline(fdims$wgt)

sim_norm <- rnorm(n = length(fdims$wgt), mean = fwgtmean, sd = fwgtsd)
qqnorm(sim_norm)
qqline(sim_norm)

qqnormsim(fdims$wgt)

1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)

## [1] 0.004434387

sum(fdims$hgt > 182) / length(fdims$hgt)

## [1] 0.003846154

Exercise 6 Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate the those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?

The women’s height had a closer agreement between the two methods.

Question 1: What is the probability that a randomly chosen young adult female is shorter than 5 feet? (about 152cm))

pnorm(q=152,mean= fhgtmean, sd=fhgtsd) # Theoretical prob

## [1] 0.02459975

sum(fdims$hgt < 152 )/length(fdims$hgt) # empirical prob

## [1] 0.01923077

Question 2: What is the probability that a randomly chosen young adult female is weighs more than 60lbs?

1-pnorm(q=60,mean=fwgtmean,sd=fwgtsd) # Theoretical prob

## [1] 0.524893

sum(fdims$wgt > 60)/length(fdims$wgt) # empirical prob

## [1] 0.4384615

1. Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.

The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter “B”

The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter “C”
The histogram for general age (age) belongs to normal probability plot letter “D”
The histogram for female chest depth (che.de) belongs to normal probability plot letter “A”

2. Note that normal probability plots C and D have a slight stepwise pattern. Why do you think this is the case?

This happens when the distribution is not normal. The increased amount of outliers skew the results and distorts the beginning and ends of a QQ plot.

3. As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.

Variable is right skewed. As seen below.

femKneeMean = mean(fdims$kne.di)
femKneeSd = sd(fdims$kne.di)
hist(fdims$kne.di, probability = TRUE)
x <- 10:30
y <- dnorm(x = x, mean = femKneeMean, sd = femKneeSd)
lines(x = x, y = y, col = "red")

qqnorm(fdims$kne.di)
qqline(fdims$kne.di)

sim_norm <- rnorm(n = length(fdims$kne.di), mean = femKneeMean, sd = femKneeSd)
qqnorm(sim_norm)
qqline(sim_norm)

qqnormsim(fdims$kne.di)