download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")
#The Data (bdims)
head(bdims)
## bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi che.gi
## 1 42.9 26.0 31.5 17.7 28.0 13.1 10.4 18.8 14.1 106.2 89.5
## 2 43.7 28.5 33.5 16.9 30.8 14.0 11.8 20.6 15.1 110.5 97.0
## 3 40.1 28.2 33.3 20.9 31.7 13.9 10.9 19.7 14.1 115.1 97.5
## 4 44.3 29.9 34.0 18.4 28.2 13.9 11.2 20.9 15.0 104.5 97.0
## 5 42.5 29.9 34.0 21.5 29.4 15.2 11.6 20.7 14.9 107.5 97.5
## 6 43.3 27.0 31.5 19.6 31.3 14.0 11.5 18.8 13.9 119.8 99.9
## wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi wri.gi age
## 1 71.5 74.5 93.5 51.5 32.5 26.0 34.5 36.5 23.5 16.5 21
## 2 79.0 86.5 94.8 51.5 34.4 28.0 36.5 37.5 24.5 17.0 23
## 3 83.2 82.9 95.0 57.3 33.4 28.8 37.0 37.3 21.9 16.9 28
## 4 77.8 78.8 94.0 53.0 31.0 26.2 37.0 34.8 23.0 16.6 23
## 5 80.0 82.5 98.5 55.4 32.0 28.4 37.7 38.6 24.4 18.0 22
## 6 82.5 80.1 95.3 57.5 33.0 28.0 36.6 36.1 23.5 16.9 21
## wgt hgt sex
## 1 65.6 174.0 1
## 2 71.8 175.3 1
## 3 80.7 193.5 1
## 4 72.6 186.5 1
## 5 78.8 187.2 1
## 6 74.8 181.5 1
##Split Data Set (m/f)
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
##Exercise 1: Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?
Both distributions appear fairly “normal” or “bell-shaped”. The men’s heights follow a normal shape slightly more closely, while the women’s heights appear slightly skewed to the right.
hist(mdims$hgt, main = "Frequency of Men's Heights in bdims Sample", xlab = "Height (cm)")
hist(fdims$hgt, main = "Frequency of Women's Heights in bdims Sample", xlab = "Height (cm)")
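Comparing the two histograms is easier when both use the same bins and the same x-axis. The following is a minimal sketch of that idea, not part of the original lab. The vectors `men_hgt` and `women_hgt` are simulated stand-ins with made-up parameters so the code runs on its own; swap in `mdims$hgt` and `fdims$hgt` once the data are loaded.

```r
# Simulated stand-ins so the sketch runs on its own (hypothetical values);
# replace with mdims$hgt and fdims$hgt when the lab data are loaded.
set.seed(1)
men_hgt   <- rnorm(250, mean = 178, sd = 7)
women_hgt <- rnorm(250, mean = 165, sd = 7)

# Common bin edges, 5 cm wide, covering both samples
rng    <- range(c(men_hgt, women_hgt))
breaks <- seq(floor(rng[1] / 5) * 5, ceiling(rng[2] / 5) * 5, by = 5)

# Stack the two histograms so the x-axes line up
old_par <- par(mfrow = c(2, 1))
hist(men_hgt,   breaks = breaks, main = "Men's heights",   xlab = "Height (cm)")
hist(women_hgt, breaks = breaks, main = "Women's heights", xlab = "Height (cm)")
par(old_par)
```

With shared breaks, differences in center and spread can be read directly off the stacked panels instead of being inferred from two differently scaled axes.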
##Exercise 2: Based on this plot, does it appear that the data follow a nearly normal distribution?
Yes, very much so. I would even revise my comment above about any skew in the female heights, because a roughly equal area of the histogram lies above the curve on the right and left sides. But “how close is close enough?”
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE, ylim = c(0, 0.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
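The hard-coded `ylim = c(0, 0.06)` above keeps the top of the normal curve from being clipped. One alternative, sketched here under the assumption that a data-driven limit is preferred, is to compute the upper limit from the tallest histogram bar and the peak of the fitted density. The vector `hgt` is a simulated stand-in for `fdims$hgt` with made-up parameters.

```r
# Simulated stand-in for fdims$hgt (hypothetical values)
set.seed(2)
hgt <- rnorm(260, mean = 165, sd = 6.5)

hgt_mean <- mean(hgt)
hgt_sd   <- sd(hgt)

# Compute the histogram without drawing it to learn the tallest bar
h <- hist(hgt, plot = FALSE)

# A normal density is highest at its mean
peak <- dnorm(hgt_mean, mean = hgt_mean, sd = hgt_sd)

# Leave 5% headroom above whichever is taller
y_top <- 1.05 * max(h$density, peak)

hist(hgt, probability = TRUE, ylim = c(0, y_top), xlab = "Height (cm)")
curve(dnorm(x, mean = hgt_mean, sd = hgt_sd), add = TRUE, col = "blue")
```

Using `curve()` with `add = TRUE` also avoids having to pick the `x` grid by hand, since it draws over the range of the existing plot.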
How do I increase the y-axis scale? Add ylim = c(0, [value desired for ylim]) to the hist() call.
##Exercise 3: Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
Not all of the points in the normal probability plot for sim_norm fall on the line, though most of the data between -2 SD and +1 SD sit close to it. In the center of the distribution this plot looks very similar to the probability plot for the real data, although the real data show a little more variability and follow a more wave-like pattern around the qqline.
qqnorm(fdims$hgt)
qqline(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
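To make clear what `qqnorm()` and `qqline()` are plotting, the sketch below rebuilds both by hand: the sorted sample is plotted against standard normal quantiles, and the reference line passes through the first and third quartiles. The vector `x` is a simulated stand-in with made-up parameters, not the lab data.

```r
# Simulated stand-in sample (hypothetical values)
set.seed(3)
x <- rnorm(260, mean = 165, sd = 6.5)

n    <- length(x)
theo <- qnorm(ppoints(n))  # theoretical standard normal quantiles
samp <- sort(x)            # ordered sample values

plot(theo, samp, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")

# By default qqline() passes through the first and third quartiles
qx <- qnorm(c(0.25, 0.75))
qy <- quantile(x, c(0.25, 0.75))
slope     <- unname(diff(qy) / diff(qx))
intercept <- unname(qy[1] - slope * qx[1])
abline(a = intercept, b = slope)
```

For nearly normal data the slope of this line is close to the sample standard deviation and the intercept is close to the sample mean, which is why points hugging the line suggest a normal shape.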
##Exercise 4: Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?
Yes, they look nearly identical. The repeated simulations provide evidence that female heights are nearly normal, though there is slightly more variability around the line in the real data than in any of the simulations.
qqnormsim(fdims$hgt)
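`qqnormsim()` is not a base R function; it is used here on the assumption that it comes with the lab materials. For readers without it, the following is a rough sketch of the same idea, assuming the goal is a grid showing the real data next to several simulated normal samples with matching mean and standard deviation. The helper name `qq_sim_grid` is hypothetical and this is not the lab's implementation.

```r
# Hypothetical helper, not the lab's qqnormsim(): a 3 x 3 grid with the
# real data first and eight simulated normal samples after it.
qq_sim_grid <- function(dat, n_sims = 8) {
  old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))
  qqnorm(dat, main = "Data")
  qqline(dat)
  for (i in seq_len(n_sims)) {
    # Same size, mean, and sd as the data, but truly normal
    sim <- rnorm(length(dat), mean = mean(dat), sd = sd(dat))
    qqnorm(sim, main = paste("Sim", i))
    qqline(sim)
  }
  invisible(n_sims + 1)  # number of panels drawn
}

set.seed(4)
hgt <- rnorm(260, mean = 165, sd = 6.5)  # simulated stand-in for fdims$hgt
panels <- qq_sim_grid(hgt)
```

If the panel for the real data cannot be told apart from the simulated ones, that is informal evidence the data are nearly normal.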
##Exercise 5: Using the same technique, determine whether or not female weights appear to come from a normal distribution.
The weights of these women appear slightly right-skewed relative to the normal distribution: the upper end of the sample is heavier than a normal distribution would predict.
qqnorm(fdims$wgt)
qqline(fdims$wgt)
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
qqnormsim(fdims$wgt)
##Exercise 6: Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?
What is the probability that a female will be under 5 ft (152.4 cm) in height? Theoretical probability of about 0.028; empirical probability of about 0.027.
What is the probability that a female will be under 110 lbs (49.889 kg) in weight? Theoretical probability of about 0.133; empirical probability of about 0.104.
Height had closer agreement between the two methods than weight (a difference of about 0.001 versus about 0.029).
"theoretical prob fhgt"
## [1] "theoretical prob fhgt"
pnorm(q = 152.4, mean = fhgtmean, sd = fhgtsd)
## [1] 0.028342
"empirical prob fhgt"
## [1] "empirical prob fhgt"
sum(fdims$hgt < 152.4) / length(fdims$hgt)
## [1] 0.02692308
"theoretical prob fwgt"
## [1] "theoretical prob fwgt"
pnorm(q = 49.889, mean = fwgtmean, sd = fwgtsd)
## [1] 0.1326508
"empirical prob fwgt"
## [1] "empirical prob fwgt"
sum(fdims$wgt < 49.889) / length(fdims$wgt)
## [1] 0.1038462
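The four calculations above follow one pattern: fit a normal distribution using the sample mean and standard deviation, then compare `pnorm()` with the observed proportion. A minimal sketch of that pattern as a reusable function follows. The name `compare_probs` is hypothetical, and `wgt` is a simulated stand-in with made-up parameters, so the printed numbers will not match the lab output.

```r
# Hypothetical helper: P(X < q) under a fitted normal vs. the sample proportion
compare_probs <- function(x, q) {
  theoretical <- pnorm(q, mean = mean(x), sd = sd(x))
  empirical   <- mean(x < q)  # same as sum(x < q) / length(x)
  c(theoretical = theoretical,
    empirical   = empirical,
    abs_diff    = abs(theoretical - empirical))
}

set.seed(5)
wgt <- rnorm(260, mean = 60, sd = 10)  # simulated stand-in (hypothetical values)
res <- compare_probs(wgt, q = 49.889)
print(round(res, 4))
```

Calling `compare_probs(fdims$hgt, 152.4)` and `compare_probs(fdims$wgt, 49.889)` on the real data would put the two `abs_diff` values side by side, which is the comparison the exercise asks for.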
###On Your Own
1. a. Plot B b. Plot C c. Plot D d. Plot A
2. Normal probability plot D, for general age, has a stepwise pattern because age is a discrete rather than continuous variable. The plot for female elbow diameter also shows a stair-step pattern, which is slightly surprising, but it may be related to the notion that joints often continue to grow larger as people age, so these diameters may be somewhat dependent on age.
3. Female kne.di appears left skewed in the probability plot. My histogram confirms this.
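The stair-step pattern described in item 2 can be reproduced with simulated data. The sketch below assumes that values recorded in whole units are what produce the steps; it draws a continuous sample, rounds it, and plots both. The parameters are made up and the vectors `cont` and `disc` are hypothetical, not taken from `bdims`.

```r
set.seed(6)
cont <- rnorm(260, mean = 30, sd = 8)  # continuous, simulated (hypothetical values)
disc <- round(cont)                    # same values recorded in whole units

old_par <- par(mfrow = c(1, 2))
qqnorm(cont, main = "Continuous")
qqline(cont)
qqnorm(disc, main = "Rounded")
qqline(disc)
par(old_par)

# Rounded data repeat values; each repeated value forms a horizontal run
n_unique_cont <- length(unique(cont))
n_unique_disc <- length(unique(disc))
print(c(continuous = n_unique_cont, rounded = n_unique_disc))
```

Each distinct rounded value appears many times, so the points line up in horizontal runs even though the overall shape still follows the reference line.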
fbii.di_mean <- mean(fdims$bii.di)
fbii.di_sd <- sd(fdims$bii.di, na.rm = FALSE)
fbii.di_sdat <- (fdims$bii.di - fbii.di_mean) / fbii.di_sd
hist(fbii.di_sdat)
qqnorm(fdims$bii.di)
qqline(fdims$bii.di)
qqnorm(fdims$elb.di)
qqline(fdims$elb.di)
qqnorm(fdims$age)
qqline(fdims$age)
qqnorm(fdims$che.de)
qqline(fdims$che.de)
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
fkne.di_mean <- mean(fdims$kne.di)
fkne.di_sd <- sd(fdims$kne.di, na.rm = FALSE)
fkne.di_sdat <- (fdims$kne.di - fkne.di_mean) / fkne.di_sd
hist(fkne.di_sdat)
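A histogram of standardized values shows the direction of skew visually. A single number can back up that reading. The sketch below computes a simple sample skewness as the mean of the cubed z-scores, where a positive value indicates a longer right tail and a negative value a longer left tail. The function name `skewness` is defined here for illustration (it is not from base R), and the two vectors are simulated, not lab data.

```r
# Illustrative helper: sample skewness as the mean cubed z-score
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

set.seed(7)
right_skewed <- rexp(260)      # simulated, long right tail
left_skewed  <- -right_skewed  # mirrored, long left tail

skewness(right_skewed)  # positive
skewness(left_skewed)   # negative
```

Running `skewness(fdims$kne.di)` on the real data would give a quick numeric check on the direction read from the probability plot and histogram.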