Data606 Lab3

The Data

This week we’ll be working with measurements of body dimensions. This data set contains measurements from 247 men and 260 women, most of whom were considered healthy young adults.

load("more/bdims.RData")

Let’s take a quick peek at the first few rows of the data.

head(bdims)

##   bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## 1   42.9   26.0   31.5   17.7   28.0   13.1   10.4   18.8   14.1  106.2
## 2   43.7   28.5   33.5   16.9   30.8   14.0   11.8   20.6   15.1  110.5
## 3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1  115.1
## 4   44.3   29.9   34.0   18.4   28.2   13.9   11.2   20.9   15.0  104.5
## 5   42.5   29.9   34.0   21.5   29.4   15.2   11.6   20.7   14.9  107.5
## 6   43.3   27.0   31.5   19.6   31.3   14.0   11.5   18.8   13.9  119.8
##   che.gi wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi
## 1   89.5   71.5   74.5   93.5   51.5   32.5   26.0   34.5   36.5   23.5
## 2   97.0   79.0   86.5   94.8   51.5   34.4   28.0   36.5   37.5   24.5
## 3   97.5   83.2   82.9   95.0   57.3   33.4   28.8   37.0   37.3   21.9
## 4   97.0   77.8   78.8   94.0   53.0   31.0   26.2   37.0   34.8   23.0
## 5   97.5   80.0   82.5   98.5   55.4   32.0   28.4   37.7   38.6   24.4
## 6   99.9   82.5   80.1   95.3   57.5   33.0   28.0   36.6   36.1   23.5
##   wri.gi age  wgt   hgt sex
## 1   16.5  21 65.6 174.0   1
## 2   17.0  23 71.8 175.3   1
## 3   16.9  28 80.7 193.5   1
## 4   16.6  23 72.6 186.5   1
## 5   18.0  22 78.8 187.2   1
## 6   16.9  21 74.8 181.5   1

mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)

Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?

hist(mdims$hgt, probability = TRUE)

hist(fdims$hgt, probability = TRUE)

summary(mdims$hgt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   157.2   172.9   177.8   177.7   182.7   198.1

summary(fdims$hgt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   147.2   160.0   164.5   164.9   169.5   182.9

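The interquartile range (3rd Qu. minus 1st Qu.) is 182.7 - 172.9 = 9.8 cm for men and 169.5 - 160.0 = 9.5 cm for women.

The two distributions have very similar spreads but different centers: the men's median height (177.8 cm) is about 13 cm greater than the women's (164.5 cm). In both groups the mean is close to the median, which suggests that both distributions are roughly symmetric.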
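The same comparison can be made numerically. The helper below (`spread_center` is a hypothetical name, not part of the lab) returns the median, interquartile range, and standard deviation of a vector. `IQR()` uses the same default (type 7) quantile definition as `summary()`, although `summary()` rounds its printed values, so the two may differ slightly in the last digit.

```r
# Hypothetical helper: center and spread summaries for a numeric vector
spread_center <- function(v) {
  c(median = median(v),  # center, robust to outliers
    IQR    = IQR(v),     # Q3 - Q1 with the default (type 7) quantiles
    sd     = sd(v))      # sample standard deviation
}

# Toy check on a small vector
spread_center(c(1, 2, 3, 4, 5))
```

With the lab data loaded, `rbind(men = spread_center(mdims$hgt), women = spread_center(fdims$hgt))` puts both groups in one table.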

fhgtmean <- mean(fdims$hgt)
fhgtsd   <- sd(fdims$hgt)

hist(fdims$hgt, probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")

Based on this plot, does it appear that the data follow a nearly normal distribution?

Yes. The histogram follows the overlaid normal density curve reasonably well, so the distribution appears nearly normal.
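The histogram-plus-normal-curve pattern is repeated for several variables in this lab. A minimal sketch of a reusable version follows; `hist_with_normal` is a hypothetical helper, and the simulated heights are an assumption for illustration. It evaluates the curve on a fine grid spanning the data rather than a fixed integer range, and it sets the y-axis so that neither the bars nor the curve is clipped.

```r
# Hypothetical helper: density histogram with a fitted normal curve.
hist_with_normal <- function(v, n_grid = 200, ...) {
  # Fine grid over the observed range keeps the curve smooth
  x <- seq(min(v), max(v), length.out = n_grid)
  y <- dnorm(x, mean = mean(v), sd = sd(v))
  h <- hist(v, plot = FALSE)
  # Make the y-axis tall enough for both the bars and the curve
  plot(h, freq = FALSE, ylim = c(0, max(h$density, y)), ...)
  lines(x, y, col = "blue")
  invisible(list(x = x, y = y))
}

set.seed(1)
res <- hist_with_normal(rnorm(100, mean = 165, sd = 6.5),
                        main = "Simulated heights")
```

With the lab data, `hist_with_normal(fdims$hgt)` would replace the four lines used above.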

qqnorm(fdims$hgt)
qqline(fdims$hgt)

sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)

Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?

qqnorm(sim_norm)
qqline(sim_norm)

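Not all of the points fall on the line, even for the simulated normal data. The plot for the simulated data looks very similar to the plot for the real data: in both, the points follow the line closely.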
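Even truly normal data scatter around the reference line. The sketch below reproduces, for default settings, what `qqnorm()` plots (theoretical quantiles `qnorm(ppoints(n))` against the sorted data) and the line `qqline()` draws through the first- and third-quartile pairs. The name `qq_by_hand` and the simulated sample are assumptions for illustration.

```r
# Sketch of what qqnorm() and qqline() compute with default settings
qq_by_hand <- function(v) {
  n <- length(v)
  theoretical <- qnorm(ppoints(n))  # expected standard normal quantiles
  sample <- sort(v)                 # observed values in increasing order
  # qqline(): straight line through the quartile pairs
  probs <- c(0.25, 0.75)
  yq <- quantile(v, probs, names = FALSE)
  xq <- qnorm(probs)
  slope <- diff(yq) / diff(xq)
  intercept <- yq[1] - slope * xq[1]
  list(theoretical = theoretical, sample = sample,
       intercept = intercept, slope = slope)
}

set.seed(42)
v <- rnorm(50, mean = 165, sd = 6.5)
qq <- qq_by_hand(v)
# Largest vertical distance of any point from the reference line
max(abs(qq$sample - (qq$intercept + qq$slope * qq$theoretical)))
```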

qqnormsim(fdims$hgt)

Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?

Ans: Yes. The normal probability plot for the female heights looks similar to the plots for the simulated data, which suggests that female heights are nearly normal.
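The lab's `qqnormsim()` comes with the course materials and its source is not shown here. The sketch below is a hypothetical stand-in (`qqnormsim_sketch` is an invented name) that assumes the function shows the data's normal probability plot next to plots of simulated normal samples with the same mean and standard deviation; the real function may differ in layout and details.

```r
# Hypothetical stand-in for the lab's qqnormsim(); not its actual source.
qqnormsim_sketch <- function(v, n_sim = 8) {
  old <- par(mfrow = c(3, 3))  # 3 x 3 grid: data plus 8 simulations
  on.exit(par(old))
  qqnorm(v, main = "Data")
  qqline(v)
  # Simulated samples with the same size, mean, and sd as the data
  sims <- replicate(n_sim, rnorm(length(v), mean = mean(v), sd = sd(v)),
                    simplify = FALSE)
  for (i in seq_along(sims)) {
    qqnorm(sims[[i]], main = paste("Simulated", i))
    qqline(sims[[i]])
  }
  invisible(sims)
}
```

Calling `qqnormsim_sketch(fdims$hgt)` would give a comparison panel similar in spirit to the one above.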
Using the same technique, determine whether or not female weights appear to come from a normal distribution.
```
fwgtmean <- mean(fdims$wgt)
fwgtsd   <- sd(fdims$wgt)
```

qqnorm(fdims$wgt)
qqline(fdims$wgt)

sim_norm_wgt <- rnorm(n = length(fdims$wgt), mean = fwgtmean, sd = fwgtsd)

qqnorm(sim_norm_wgt)
qqline(sim_norm_wgt)

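Ans: Female weights do not appear to come from a normal distribution: the normal probability plot for the real weights does not resemble the plot for the simulated normal data.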
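The visual check can be complemented with a formal test. This is a swapped-in technique, not part of the lab: the Shapiro-Wilk test in base R (`shapiro.test()`, which accepts between 3 and 5000 observations) gives small p-values when the data depart from normality. The simulated right-skewed sample below is an assumption standing in for real data. Note that with large samples the test can flag departures too small to matter in practice, so it supplements the plots rather than replacing them.

```r
# Shapiro-Wilk normality test (stats::shapiro.test); needs 3 <= n <= 5000
normality_p <- function(v) shapiro.test(v)$p.value

set.seed(3)
right_skewed <- rexp(200)  # simulated stand-in for a right-skewed variable
normality_p(right_skewed)  # very small: strong evidence against normality
```

`normality_p(fdims$wgt)` applies the same test to the female weights.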

“What is the probability that a randomly chosen young adult female is taller than 6 feet (about 182 cm)?”

1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)

## [1] 0.004434387

sum(fdims$hgt > 182) / length(fdims$hgt)

## [1] 0.003846154
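The two calculations above can be wrapped in one function so the comparison is easy to repeat for other variables and thresholds. `tail_probs` is a hypothetical helper name. `pnorm(..., lower.tail = FALSE)` is equivalent to `1 - pnorm(...)`, and `mean(v > q)` equals `sum(v > q) / length(v)` because the mean of a logical vector is the proportion of `TRUE` values.

```r
# Compare normal-model and empirical estimates of P(X > q)
tail_probs <- function(v, q) {
  theoretical <- pnorm(q, mean = mean(v), sd = sd(v), lower.tail = FALSE)
  empirical   <- mean(v > q)  # proportion of observations above q
  c(theoretical = theoretical,
    empirical   = empirical,
    gap         = abs(theoretical - empirical))
}

tail_probs(c(1, 2, 3, 4, 5), q = 3)
```

`tail_probs(fdims$hgt, 182)` should give the two values printed above. Calling it on `fdims$wgt` with a weight threshold of your choice gives the other pair; the variable with the smaller `gap` shows closer agreement between the two methods.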

Write out two probability questions that you would like to answer: one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution and the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?

Ans: What is the probability that a randomly chosen young adult male is taller than 6 feet (about 182 cm)?

mhgtmean <- mean(mdims$hgt)
mhgtsd   <- sd(mdims$hgt)

1 - pnorm(q = 182, mean = mhgtmean, sd = mhgtsd)

## [1] 0.2768345

sum(mdims$hgt > 182) / length(mdims$hgt)

## [1] 0.2631579

The two methods give similar results: 0.277 from the normal model versus 0.263 from the data.

On Your Own

Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.
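The standardization described above (subtract the mean, then divide by the standard deviation) is a one-line function; `standardize` is a hypothetical name. Base R's `scale()` performs the same transformation but returns a matrix with attributes, hence the `as.vector()` in the comparison.

```r
# Standardize: subtract the mean, then divide by the standard deviation
standardize <- function(v) (v - mean(v)) / sd(v)

z <- standardize(c(2, 4, 6))
z                                           # [1] -1  0  1
all.equal(z, as.vector(scale(c(2, 4, 6))))  # TRUE
```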

fbiimean <- mean(fdims$bii.di)
fbiisd   <- sd(fdims$bii.di)

hist(fdims$bii.di, probability = TRUE)
x <- 10:60
y <- dnorm(x = x, mean = fbiimean, sd = fbiisd)
lines(x = x, y = y, col = "blue")

qqnorm(fdims$bii.di)
qqline(fdims$bii.di)

**a.** The histogram for female biiliac (pelvic) diameter (`bii.di`) belongs
to normal probability plot letter __?__. (The lettered figures were not generated in this document, so I could not determine the letter.)

felbmean <- mean(fdims$elb.di)
felbsd   <- sd(fdims$elb.di)

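hist(fdims$elb.di, probability = TRUE)
x <- seq(min(fdims$elb.di), max(fdims$elb.di), length.out = 100)  # fine grid over the observed range
y <- dnorm(x = x, mean = felbmean, sd = felbsd)
lines(x = x, y = y, col = "blue")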

qqnorm(fdims$elb.di)
qqline(fdims$elb.di)

**b.** The histogram for female elbow diameter (`elb.di`) belongs to normal 
probability plot letter __?__. (The lettered figures were not generated in this document, so I could not determine the letter.)

**c.** The histogram for general age (`age`) belongs to normal probability 
plot letter ____. (The lettered figures were not generated in this document, so I could not determine the letter.)

fagemean <- mean(fdims$age)
fagesd   <- sd(fdims$age)

hist(fdims$age, probability = TRUE)
x <- seq(min(fdims$age), max(fdims$age), length.out = 100)  # grid spans the full age range
y <- dnorm(x = x, mean = fagemean, sd = fagesd)
lines(x = x, y = y, col = "blue")

qqnorm(fdims$age)
qqline(fdims$age)

**d.** The histogram for female chest depth (`che.de`) belongs to normal 
probability plot letter __?__.

fchemean <- mean(fdims$che.de)  # chest depth (che.de), as asked in d.
fchesd   <- sd(fdims$che.de)

hist(fdims$che.de, probability = TRUE)
x <- seq(min(fdims$che.de), max(fdims$che.de), length.out = 100)
y <- dnorm(x = x, mean = fchemean, sd = fchesd)
lines(x = x, y = y, col = "blue")

qqnorm(fdims$che.de)
qqline(fdims$che.de)
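To check the matching directly, as the exercise suggests, the variables can be standardized and plotted together. In the sketch below, `plot_pairs` is a hypothetical helper that takes a data frame and a vector of column names and draws, for each variable, a histogram of the standardized values next to its normal probability plot. It does not reproduce the lab's lettered figures, only plots comparable to them.

```r
# Hypothetical helper: standardized histogram + normal probability plot,
# one row of panels per variable
plot_pairs <- function(df, vars) {
  old <- par(mfrow = c(length(vars), 2))
  on.exit(par(old))
  for (v in vars) {
    z <- (df[[v]] - mean(df[[v]])) / sd(df[[v]])  # standardize
    hist(z, main = v, xlab = "standardized value")
    qqnorm(z, main = v)
    qqline(z)
  }
  invisible(vars)
}
```

`plot_pairs(fdims, c("bii.di", "elb.di", "age", "che.de"))` draws all four pairs in a 4 x 2 grid; a large plotting window may be needed to avoid "figure margins too large" errors.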

Note that normal probability plots C and D have a slight stepwise pattern.
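Why do you think this is the case? Most likely because the measurements are recorded with limited precision (age in whole years, body dimensions to the nearest 0.1 cm), so many observations share exactly the same value, and tied values show up as short horizontal runs in the plot.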
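The stepwise pattern can be reproduced by simulation. The sketch below draws normal values, rounds them to whole numbers (an assumption chosen to mimic a variable such as age recorded in whole years), and compares the two normal probability plots; the seed, mean, and standard deviation are arbitrary.

```r
# Simulate a continuous normal variable, then keep only whole units
set.seed(11)
exact   <- rnorm(100, mean = 30, sd = 5)
rounded <- round(exact)

length(unique(exact))    # 100 distinct values
length(unique(rounded))  # far fewer distinct values, so many ties

old <- par(mfrow = c(1, 2))
qqnorm(exact, main = "Exact values")
qqline(exact)
qqnorm(rounded, main = "Rounded values")  # ties form horizontal steps
qqline(rounded)
par(old)
```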
As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
```
fknemean <- mean(fdims$kne.di)
fknesd   <- sd(fdims$kne.di)
```

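hist(fdims$kne.di, probability = TRUE)
x <- seq(min(fdims$kne.di), max(fdims$kne.di), length.out = 100)  # fine grid over the observed range
y <- dnorm(x = x, mean = fknemean, sd = fknesd)
lines(x = x, y = y, col = "blue")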

qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
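Alongside the normal probability plot and histogram, a single number can summarize the direction of skew. The sketch below uses one common moment-based definition of sample skewness, without a small-sample correction; `sample_skewness` is a hypothetical name, not a base R function. Positive values indicate right skew, negative values left skew, and values near zero a roughly symmetric shape.

```r
# Moment-based sample skewness (no small-sample correction)
sample_skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

sample_skewness(c(1, 2, 3))      # 0: symmetric
sample_skewness(c(1, 1, 1, 10))  # positive: long right tail
```

`sample_skewness(fdims$kne.di)` applies it to female knee diameter.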

Yuen Chun Wong

September 17, 2017