load("more/bdims.RData")
library("ggplot2")
summary(bdims)
## bia.di bii.di bit.di che.de
## Min. :32.40 Min. :18.70 Min. :24.70 Min. :14.30
## 1st Qu.:36.20 1st Qu.:26.50 1st Qu.:30.60 1st Qu.:17.30
## Median :38.70 Median :28.00 Median :32.00 Median :19.00
## Mean :38.81 Mean :27.83 Mean :31.98 Mean :19.23
## 3rd Qu.:41.15 3rd Qu.:29.25 3rd Qu.:33.35 3rd Qu.:20.90
## Max. :47.40 Max. :34.70 Max. :38.00 Max. :27.50
## che.di elb.di wri.di kne.di
## Min. :22.20 Min. : 9.90 Min. : 8.10 Min. :15.70
## 1st Qu.:25.65 1st Qu.:12.40 1st Qu.: 9.80 1st Qu.:17.90
## Median :27.80 Median :13.30 Median :10.50 Median :18.70
## Mean :27.97 Mean :13.39 Mean :10.54 Mean :18.81
## 3rd Qu.:29.95 3rd Qu.:14.40 3rd Qu.:11.20 3rd Qu.:19.60
## Max. :35.60 Max. :16.70 Max. :13.30 Max. :24.30
## ank.di sho.gi che.gi wai.gi
## Min. : 9.90 Min. : 85.90 Min. : 72.60 Min. : 57.90
## 1st Qu.:13.00 1st Qu.: 99.45 1st Qu.: 85.30 1st Qu.: 68.00
## Median :13.80 Median :108.20 Median : 91.60 Median : 75.80
## Mean :13.86 Mean :108.20 Mean : 93.33 Mean : 76.98
## 3rd Qu.:14.80 3rd Qu.:116.55 3rd Qu.:101.15 3rd Qu.: 84.50
## Max. :17.20 Max. :134.80 Max. :118.70 Max. :113.20
## nav.gi hip.gi thi.gi bic.gi
## Min. : 64.00 Min. : 78.80 Min. :46.30 Min. :22.40
## 1st Qu.: 78.85 1st Qu.: 92.00 1st Qu.:53.70 1st Qu.:27.60
## Median : 84.60 Median : 96.00 Median :56.30 Median :31.00
## Mean : 85.65 Mean : 96.68 Mean :56.86 Mean :31.17
## 3rd Qu.: 91.60 3rd Qu.:101.00 3rd Qu.:59.50 3rd Qu.:34.45
## Max. :121.10 Max. :128.30 Max. :75.70 Max. :42.40
## for.gi kne.gi cal.gi ank.gi
## Min. :19.60 Min. :29.00 Min. :28.40 Min. :16.40
## 1st Qu.:23.60 1st Qu.:34.40 1st Qu.:34.10 1st Qu.:21.00
## Median :25.80 Median :36.00 Median :36.00 Median :22.00
## Mean :25.94 Mean :36.20 Mean :36.08 Mean :22.16
## 3rd Qu.:28.40 3rd Qu.:37.95 3rd Qu.:38.00 3rd Qu.:23.30
## Max. :32.50 Max. :49.00 Max. :47.70 Max. :29.30
## wri.gi age wgt hgt sex
## Min. :13.0 Min. :18.00 Min. : 42.00 Min. :147.2 0:260
## 1st Qu.:15.0 1st Qu.:23.00 1st Qu.: 58.40 1st Qu.:163.8 1:247
## Median :16.1 Median :27.00 Median : 68.20 Median :170.3
## Mean :16.1 Mean :30.18 Mean : 69.15 Mean :171.1
## 3rd Qu.:17.1 3rd Qu.:36.00 3rd Qu.: 78.85 3rd Qu.:177.8
## Max. :19.6 Max. :67.00 Max. :116.40 Max. :198.1
Q1. Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?
# Histogram of height, one panel per sex (0 = female, 1 = male)
qplot(hgt, data = bdims, facets = . ~ sex, binwidth = 0.5)
# The '0' panel shows the 260 females and the '1' panel shows the 247 males (sex == 0 is female in this data set). Both histograms are roughly bell shaped, so both height distributions look approximately normal.
#The male distribution is centered at around 178 cm; the female distribution is centered at around 165 cm.
#Yes. However, the male height distribution is closer to a normal distribution than the female one.
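The visual comparison of the two panels can be backed up with per-group summaries. The following is a minimal sketch on simulated stand-in data: the data frame `fake` and the means and standard deviations used to generate it are assumptions for illustration, not values taken from `bdims`.

```r
set.seed(3)
# Simulated stand-in for bdims: sex coded 0/1 as in the lab, heights in cm.
# The generating means and sds below are assumed values, not estimates.
fake <- data.frame(
  sex = rep(c(0, 1), times = c(260, 247)),
  hgt = c(rnorm(260, mean = 165, sd = 6.5), rnorm(247, mean = 178, sd = 7))
)

# Center and spread of height within each sex group
hgt_mean <- tapply(fake$hgt, fake$sex, mean)
hgt_sd   <- tapply(fake$hgt, fake$sex, sd)
hgt_n    <- tapply(fake$hgt, fake$sex, length)
```

With the real data, replacing `fake` by `bdims` should give the group centers and spreads that the histograms only show approximately.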
Make a normal probability plot of `sim_norm`. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
fdims <- subset(bdims, sex == 0)
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnormsim(sim_norm)
#No, not all of the points fall on the line; a few points in both tails deviate from it.
#'sim_norm' was simulated from a normal distribution with the same mean and standard deviation as the female heights, so its plot resembles the one for the real data, which is nearly normal.
Does the normal probability plot for `fdims$hgt` look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?
qqnorm(sim_norm)
qqline(sim_norm)
#Yes, 'sim_norm' has the same mean and standard deviation as 'fdims$hgt'.
#Yes.
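Judging "how close to the line" by eye can be supplemented with a number: the correlation between the ordered sample and the theoretical normal quantiles it is plotted against. The sketch below is an illustration only; the helper name `qq_cor` is hypothetical, and the heights are simulated with assumed mean and standard deviation rather than taken from `fdims`.

```r
# Hypothetical helper: correlation between the ordered sample and the
# theoretical normal quantiles used for a normal probability plot.
qq_cor <- function(x) {
  x <- sort(x)
  theo <- qnorm(ppoints(length(x)))  # plotting positions as in qqnorm()
  cor(theo, x)
}

set.seed(1)
# Stand-in for fdims$hgt: 260 simulated heights (assumed mean and sd).
fake_hgt <- rnorm(260, mean = 165, sd = 6.5)
# A clearly right-skewed sample for contrast.
skewed <- rexp(260, rate = 1)

r_normal <- qq_cor(fake_hgt)  # close to 1 when points hug the line
r_skewed <- qq_cor(skewed)    # noticeably lower for skewed data
```

A value near 1 agrees with points lying close to the reference line; it does not replace looking at the plot, since it hides where the departures occur.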
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
hist(fdims$wgt, probability = TRUE)
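# Grid over the observed weight range (140:190 is the height range and falls outside this histogram)
x <- seq(min(fdims$wgt), max(fdims$wgt), length.out = 200)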
y <- dnorm(x = x, mean = fwgtmean, sd = fwgtsd)
lines(x = x, y = y, col = "blue")
qqnorm(fdims$wgt)
qqline(fdims$wgt)
#The histogram shows that female weight is right skewed, and the normal probability plot departs from the line in both tails. Therefore, female weight is not normally distributed.
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
#What is the probability that a female height is below 172 cm?
pnorm(q = 172, mean = fhgtmean, sd = fhgtsd)
## [1] 0.861944
sum(fdims$hgt <172) / length(fdims$hgt)
## [1] 0.8384615
#What is the probability that a female weight is above 60 kg?
1-pnorm(q = 60, mean = fwgtmean, sd = fwgtsd)
## [1] 0.524893
sum(fdims$wgt >60) / length(fdims$wgt)
## [1] 0.4384615
#Comparing the theoretical (normal model) probabilities with the empirical proportions for height below 172 and weight above 60, height shows the closer agreement between the two methods (0.862 vs. 0.838, against 0.525 vs. 0.438 for weight).
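The two calculations used above (a normal-model probability from `pnorm` and an empirical proportion from the data) can be wrapped in one small function so they are always computed on the same cutoff. This is a sketch under assumptions: `compare_below` is a hypothetical helper, and the heights are simulated stand-ins rather than `fdims$hgt`.

```r
# Hypothetical helper: normal-model and empirical estimates of P(X < q).
compare_below <- function(x, q) {
  c(theoretical = pnorm(q, mean = mean(x), sd = sd(x)),
    empirical   = mean(x < q))   # proportion of observations below q
}

set.seed(42)
# Stand-in for fdims$hgt (assumed mean and sd, for illustration only).
fake_hgt <- rnorm(260, mean = 165, sd = 6.5)

res <- compare_below(fake_hgt, 172)
# Absolute gap between the two estimates; small when the normal model fits.
gap <- abs(res[["theoretical"]] - res[["empirical"]])
```

For an upper-tail question such as weight above 60, one would take `1 - compare_below(x, 60)`, keeping in mind that the empirical part then counts values at or above the cutoff.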
Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.
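**a.** The histogram for female biiliac (pelvic) diameter (`bii.di`) belongs to normal
probability plot letter __B__.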
qqnorm(fdims$bii.di)
qqline(fdims$bii.di)
**b.** The histogram for female elbow diameter (`elb.di`) belongs to normal
probability plot letter __C__.
qqnorm(fdims$elb.di)
qqline(fdims$elb.di)
**c.** The histogram for general age (`age`) belongs to normal
probability plot letter __D__.
# "General age" covers all participants, so use the full data set here
qqnorm(bdims$age)
qqline(bdims$age)
**d.** The histogram for female chest depth (`che.de`) belongs to normal
probability plot letter __A__.
qqnorm(fdims$che.de)
qqline(fdims$che.de)
#Plot C is right skewed, and plot D is nearly normal.
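The matching exercise notes that every variable was standardized by subtracting the mean and dividing by the standard deviation. A short sketch of that step follows; the vector `x` holds made-up measurements for illustration and is not taken from `bdims`.

```r
# Made-up measurements standing in for one column of the data set.
x <- c(17.9, 18.7, 19.6, 18.1, 20.4, 18.8)

# Standardize by hand: subtract the mean, then divide by the standard deviation.
z <- (x - mean(x)) / sd(x)

# scale() does the same centering and scaling; it returns a one-column matrix,
# so convert it back to a plain numeric vector.
z_scale <- as.numeric(scale(x))

# Standardizing is a linear transformation, so the ordering of the values
# (and therefore the shape of a histogram or probability plot) is unchanged.
same_order <- identical(order(x), order(z))
```

This is why the axis units cannot help with the matching: only the shape of each distribution survives standardization.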
Make a normal probability plot for female knee diameter (`kne.di`). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
fkne_dimean <- mean(fdims$kne.di)
fkne_disd <- sd(fdims$kne.di)
hist(fdims$kne.di, probability = TRUE)
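# Grid over the observed knee diameter range (140:190 is the height range and falls outside this histogram)
x <- seq(min(fdims$kne.di), max(fdims$kne.di), length.out = 200)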
y <- dnorm(x = x, mean = fkne_dimean, sd = fkne_disd)
lines(x = x, y = y, col = "blue")
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
#The distribution of fdims$kne.di is slightly right skewed but nearly normal; a few points in both tails fall off the line.
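The direction of skew read from a histogram or probability plot can also be checked numerically with a moment-based skewness coefficient. The sketch below uses simulated samples only; `skewness` is a hypothetical helper defined here, not a base R function.

```r
# Hypothetical helper: moment-based sample skewness.
# Positive values point to a longer right tail, negative to a longer left tail.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(7)
right_skewed <- rexp(500)   # exponential sample: long right tail
symmetric    <- rnorm(500)  # normal sample: no systematic skew

s_right <- skewness(right_skewed)
s_sym   <- skewness(symmetric)
```

Applied to `fdims$kne.di`, a clearly positive value would support the right-skew reading, while a value near zero would point toward symmetry.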
[Figure: histQQmatch, the histograms and normal probability plots for the matching exercise]
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel from a lab written by Mark Hansen of UCLA Statistics.