The Normal Distribution

The data

Download the data

download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")

Load the data

load("bdims.RData")
head(bdims)
##   bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## 1   42.9   26.0   31.5   17.7   28.0   13.1   10.4   18.8   14.1  106.2
## 2   43.7   28.5   33.5   16.9   30.8   14.0   11.8   20.6   15.1  110.5
## 3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1  115.1
## 4   44.3   29.9   34.0   18.4   28.2   13.9   11.2   20.9   15.0  104.5
## 5   42.5   29.9   34.0   21.5   29.4   15.2   11.6   20.7   14.9  107.5
## 6   43.3   27.0   31.5   19.6   31.3   14.0   11.5   18.8   13.9  119.8
##   che.gi wai.gi nav.gi hip.gi thi.gi bic.gi for.gi kne.gi cal.gi ank.gi
## 1   89.5   71.5   74.5   93.5   51.5   32.5   26.0   34.5   36.5   23.5
## 2   97.0   79.0   86.5   94.8   51.5   34.4   28.0   36.5   37.5   24.5
## 3   97.5   83.2   82.9   95.0   57.3   33.4   28.8   37.0   37.3   21.9
## 4   97.0   77.8   78.8   94.0   53.0   31.0   26.2   37.0   34.8   23.0
## 5   97.5   80.0   82.5   98.5   55.4   32.0   28.4   37.7   38.6   24.4
## 6   99.9   82.5   80.1   95.3   57.5   33.0   28.0   36.6   36.1   23.5
##   wri.gi age  wgt   hgt sex
## 1   16.5  21 65.6 174.0   1
## 2   17.0  23 71.8 175.3   1
## 3   16.9  28 80.7 193.5   1
## 4   16.6  23 72.6 186.5   1
## 5   18.0  22 78.8 187.2   1
## 6   16.9  21 74.8 181.5   1
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)

The normal distribution analysis

Creating density histograms

fhgtmean <- mean(fdims$hgt)
fhgtsd   <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE, ylim = c(0, 0.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
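
The `probability = TRUE` argument matters here because `dnorm()` returns densities, not counts. As a minimal sketch, using simulated heights in place of the `bdims` data (the mean of 165 and standard deviation of 6.5 are illustrative assumptions), the bar areas of a density histogram sum to 1, which is what puts the bars and the curve on the same scale:

```r
# Simulated stand-in for fdims$hgt; the parameters are illustrative assumptions.
set.seed(1)
hgt_sim <- rnorm(n = 260, mean = 165, sd = 6.5)

# Compute the histogram without drawing it.
h <- hist(hgt_sim, plot = FALSE)

# Bar area = density * bin width; the areas of a density histogram sum to 1.
total_area <- sum(h$density * diff(h$breaks))
total_area

# The counts, by contrast, sum to the sample size.
total_count <- sum(h$counts)
total_count
```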

Evaluating the normal distribution

qqnorm(fdims$hgt)
qqline(fdims$hgt)
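
For reference, a normal probability plot pairs the sorted observations with the normal quantiles expected at the same ranks. The sketch below reproduces the coordinates that `qqnorm()` uses, on simulated data (the sample and its parameters are assumptions for illustration):

```r
set.seed(2)
y <- rnorm(n = 50, mean = 165, sd = 6.5)

# Theoretical quantiles: standard normal quantiles at evenly spaced probabilities.
theoretical <- qnorm(ppoints(length(y)))

# Sample quantiles: the observations in increasing order.
sample_q <- sort(y)

# qqnorm() returns its coordinates in the original data order, so sort to compare.
qq <- qqnorm(y, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), sample_q)
```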

sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
sim_norm
##   [1] 189.1323 156.5018 163.8752 165.6349 172.7249 157.8723 164.9861
##   [8] 161.0876 169.5002 169.8463 170.4260 165.5267 161.9198 159.9991
##  [15] 160.5634 164.3173 176.2616 165.0047 171.6235 163.2474 170.2342
##  [22] 162.6041 161.7901 168.5107 172.1545 157.5391 160.8772 165.5507
##  [29] 167.6417 155.1977 166.9333 169.0549 163.5563 156.6540 165.2283
##  [36] 171.8389 167.3599 164.6341 165.7681 163.5555 162.7846 172.2757
##  [43] 162.5045 172.6246 149.3478 165.2738 176.7649 168.1759 164.3292
##  [50] 162.7630 151.7325 166.3579 173.1490 156.7207 169.2578 159.8083
##  [57] 170.9324 168.8489 165.4776 165.2993 162.7761 155.5681 161.6675
##  [64] 165.9991 164.7768 161.3783 161.1708 164.2357 158.2069 165.4356
##  [71] 162.1022 163.6637 173.0466 156.3689 181.5810 177.9489 176.9943
##  [78] 158.9838 168.4945 168.1853 168.7934 164.3525 173.1001 163.3432
##  [85] 159.4191 165.8890 175.1964 161.0112 162.1965 157.6935 160.5827
##  [92] 171.7059 159.5325 164.8785 154.0501 154.0733 170.0363 173.9859
##  [99] 166.4474 163.2905 162.4192 169.2360 157.7405 161.2321 158.7740
## [106] 157.3512 155.8782 167.3956 175.4994 159.1524 172.6303 164.2426
## [113] 169.2814 162.4832 168.5154 155.6665 159.5139 163.6688 167.5375
## [120] 156.1710 173.3485 169.2698 164.1654 167.2228 171.9628 163.7308
## [127] 179.5710 167.3472 172.1412 168.4492 164.5160 162.8242 162.2363
## [134] 166.1870 168.5475 168.3745 162.2660 166.8954 183.3165 172.3608
## [141] 155.9805 175.9597 149.7726 169.5543 163.4958 165.6529 156.2607
## [148] 153.1758 168.3350 161.1056 159.9429 157.8535 164.2179 155.3828
## [155] 166.4704 168.3134 163.9669 165.7151 170.6648 157.6267 175.4763
## [162] 161.8956 159.3019 170.3044 169.1099 167.8998 157.9137 160.0174
## [169] 166.0127 155.8677 171.0175 165.5045 173.3539 167.8566 170.7688
## [176] 158.1649 157.1238 151.0213 163.3929 173.6022 164.2655 165.9949
## [183] 176.8106 157.7541 166.1841 176.4394 166.9071 164.3284 161.5940
## [190] 168.8645 164.8134 158.7466 167.4246 170.9544 169.5290 170.4052
## [197] 157.7821 166.7546 166.0055 179.2079 151.3932 162.4659 163.1181
## [204] 166.9611 167.1074 163.1066 160.7476 155.2695 161.6238 170.9044
## [211] 174.0725 165.9918 167.0419 172.3156 168.6026 165.1978 151.7417
## [218] 163.9566 161.6788 166.5906 157.7484 168.8073 156.1398 170.9307
## [225] 168.1454 160.8264 172.7300 158.6968 165.9381 158.3251 168.3982
## [232] 164.1480 150.5404 155.5576 168.6568 175.7756 172.1615 174.1523
## [239] 161.4143 160.0693 163.1504 153.7953 171.3193 162.4964 175.0859
## [246] 168.8168 161.3637 169.1349 163.2671 164.7506 158.4734 163.2707
## [253] 153.6264 151.0210 169.1209 177.3023 177.0435 161.6792 155.2247
## [260] 158.0277
qqnormsim(fdims$hgt)
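
`qqnormsim()` is not part of base R; it is presumably loaded into the workspace along with the lab data. A rough sketch of what such a helper might do is below. The name `qq_sim_sketch` and the 3-by-3 layout are assumptions for illustration, not the lab's actual definition:

```r
# Hypothetical helper: draws the Q-Q plot of the data next to Q-Q plots of
# normal samples simulated with the same size, mean, and standard deviation.
qq_sim_sketch <- function(dat, n_sim = 8) {
  old_par <- par(mfrow = c(3, 3))
  on.exit(par(old_par))

  qqnorm(dat, main = "Normal Q-Q Plot (Data)")
  qqline(dat)

  for (i in seq_len(n_sim)) {
    sim <- rnorm(n = length(dat), mean = mean(dat), sd = sd(dat))
    qqnorm(sim, main = "Normal Q-Q Plot (Sim)")
    qqline(sim)
  }
  invisible(n_sim + 1)  # number of panels drawn
}

# Usage with simulated stand-in data (parameters are illustrative).
set.seed(3)
panels <- qq_sim_sketch(rnorm(n = 260, mean = 165, sd = 6.5))
```

If the panel for the real data is hard to tell apart from the simulated panels, that supports treating the data as nearly normal.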

Normal probabilities

1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
## [1] 0.004434387
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
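
The two calculations above can be written in a few equivalent ways. The sketch below uses simulated heights, so its numbers will not match the output above; the mean, standard deviation, and seed are illustrative assumptions.

```r
set.seed(4)
hgt_sim <- rnorm(n = 260, mean = 165, sd = 6.5)
m <- mean(hgt_sim)
s <- sd(hgt_sim)

# Theoretical upper-tail probability, three equivalent forms.
p1 <- 1 - pnorm(q = 182, mean = m, sd = s)
p2 <- pnorm(q = 182, mean = m, sd = s, lower.tail = FALSE)
p3 <- pnorm(q = (182 - m) / s, lower.tail = FALSE)  # via the Z score

# Empirical probability: the mean of a logical vector is the proportion of TRUEs.
e1 <- sum(hgt_sim > 182) / length(hgt_sim)
e2 <- mean(hgt_sim > 182)
```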

Exercises

Exercise 1

Make a histogram of men’s heights and a histogram of women’s heights. How would you compare the various aspects of the two distributions?

Answer 1

Men’s Height Histogram

hist(mdims$hgt, main = "Male height", xlab = "cm", ylim = c(0, 80))

Women’s Height Histogram

hist(fdims$hgt, main = "Female height", xlab = "cm", ylim = c(0, 80))

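The tallest bar in the men’s histogram covers 175 cm to 180 cm, with a frequency of about 80, whereas the tallest bar in the women’s histogram covers 165 cm to 170 cm, with a frequency of about 75. The men’s distribution is therefore centered about 10 cm higher than the women’s.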

Exercise 2

Based on this plot, does it appear that the data follow a nearly normal distribution?

Answer 2

Yes, it looks like the data follow a nearly normal distribution.

Exercise 3

Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?

Answer 3

qqnorm(sim_norm)
qqline(sim_norm)

Yes, the simulated data appear to follow a nearly normal distribution. The points lie close to the line, and the plot looks similar to the probability plot for the real data.

Exercise 4

Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do the plots provide evidence that the female heights are nearly normal?

Answer 4

qqnorm(fdims$hgt)
qqline(fdims$hgt)

The plots look similar. There are a few deviations along the y axis, but not enough to suggest that the distribution is not normal.

Exercise 5

Using the same technique, determine whether or not female weights appear to come from a normal distribution.

Answer 5

qqnorm(fdims$wgt)
qqline(fdims$wgt)

The distribution of weight is right skewed. The Q-Q plot for the real data diverges from the line, so the distribution does not appear to be normal.
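
A quick numeric check can back up the visual impression of skew: in a right-skewed sample the mean is typically pulled above the median. The sketch below shows this on simulated exponential data, which stands in for a right-skewed variable and is an assumption for illustration; the same two calls could be applied to `fdims$wgt`.

```r
set.seed(5)
skewed <- rexp(n = 260, rate = 1)   # right-skewed by construction

skew_mean   <- mean(skewed)
skew_median <- median(skewed)

# A mean above the median is consistent with a long right tail.
skew_mean > skew_median
```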

Exercise 6

Write out two probability questions that you would like to answer; one regarding female heights and one regarding female weights. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which variable, height or weight, had a closer agreement between the two methods?

Answer 6

Q: What is the probability that a randomly chosen young adult female is shorter than 170 cm?

pnorm(q = 170, mean = fhgtmean, sd = fhgtsd)
## [1] 0.7833331
sum(fdims$hgt < 170) / length(fdims$hgt)
## [1] 0.7538462

Q: What is the probability that a female weighs less than 50 kg?

fwgtmean <- mean(fdims$wgt)
fwgtsd   <- sd(fdims$wgt)
pnorm(q = 50, mean = fwgtmean, sd = fwgtsd)
## [1] 0.135143
sum(fdims$wgt < 50) / length(fdims$wgt)
## [1] 0.1038462

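Height shows slightly closer agreement between the theoretical and empirical probabilities, although the gap is about 0.03 for both variables.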
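
The comparison can be made explicit by taking the absolute difference between the theoretical and empirical probabilities for each variable. The values in this sketch are copied from the output above:

```r
# Theoretical and empirical probabilities reported above.
hgt_theory <- 0.7833331
hgt_empirical <- 0.7538462
wgt_theory <- 0.135143
wgt_empirical <- 0.1038462

# Absolute gap between the two methods for each variable.
hgt_gap <- abs(hgt_theory - hgt_empirical)
wgt_gap <- abs(wgt_theory - wgt_empirical)

round(c(height = hgt_gap, weight = wgt_gap), 4)
# height 0.0295, weight 0.0313
```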

On Your Own

Ques_1

1: Now let’s consider some of the other variables in the body dimensions data set. Using the figures at the end of the exercises, match the histogram to its normal probability plot. All of the variables have been standardized (first subtract the mean, then divide by the standard deviation), so the units won’t be of any help. If you are uncertain based on these figures, generate the plots in R to check.

Answer 1

  1. The histogram for female biiliac (pelvic) diameter (“bii.di”) belongs to normal probability plot letter:

PLOT B

hist(fdims$bii.di, xlab = "Female Pelvic diameter", main = "Histogram of Female Pelvic Diameter")

qqnorm(fdims$bii.di)
qqline(fdims$bii.di)

  1. The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter ____.

PLOT C

hist(fdims$elb.di, xlab = "Female elbow diameter", main = "Histogram of Female Elbow Diameter")

qqnorm(fdims$elb.di)
qqline(fdims$elb.di)

  1. The histogram for general age (age) belongs to normal probability plot letter ____.

PLOT D

hist(bdims$age, xlab = "Age in years", main = "Histogram of Sample Age in Years")

qqnorm(bdims$age)
qqline(bdims$age)

  1. The histogram for female chest depth (che.de) belongs to normal probability plot letter ____.

PLOT A

hist(fdims$che.de, xlab = "Female chest depth", main = "Histogram of Female Chest Depth")

qqnorm(fdims$che.de)
qqline(fdims$che.de)

Ques_2

  1. Note that normal probability plots C and D have a slight stepwise pattern. Why do you think this is the case?

Answer 2

The stepwise pattern most likely comes from values recorded on a coarse grid. Age is given in whole years, which makes the jumps obvious. Elbow diameter is recorded to one decimal place over a narrow range, so many people share the same recorded value, and these ties show up as steps.
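
The effect of recording values on a coarse grid can be seen on simulated data. In this sketch the ages are drawn from a normal distribution purely for illustration (the mean of 30 and standard deviation of 9 are assumptions, not estimates from `bdims`), then rounded to whole years:

```r
set.seed(6)
age_raw     <- rnorm(n = 500, mean = 30, sd = 9)  # continuous stand-in
age_rounded <- round(age_raw)                     # recorded in whole years

# Rounding collapses many observations onto the same value.
n_unique_raw     <- length(unique(age_raw))
n_unique_rounded <- length(unique(age_rounded))
c(raw = n_unique_raw, rounded = n_unique_rounded)

# Tied values share a sample quantile, so the Q-Q plot shows horizontal steps.
qqnorm(age_rounded)
qqline(age_rounded)
```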

Ques_3

  1. As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for female knee diameter (kne.di). Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.

Answer 3

hist(fdims$kne.di, xlab = "Female knee diameter in cm", main = "Histogram of Female Knee Diameter")

qqnorm(fdims$kne.di)
qqline(fdims$kne.di)

In the probability plot, the values become sparser as the quantiles increase, which suggests a long upper tail. The variable therefore appears to be right skewed.
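
To see what right skew looks like in a normal probability plot, the sketch below compares the extreme points of a simulated right-skewed sample with the reference line that `qqline()` draws through the first and third quartiles. The exponential sample is an assumption chosen for illustration, not the knee diameter data:

```r
set.seed(7)
skewed <- rexp(n = 260, rate = 1)

# Coordinates of the Q-Q plot, sorted from smallest to largest observation.
theoretical <- qnorm(ppoints(length(skewed)))
sample_q    <- sort(skewed)

# Reference line through the quartiles, as qqline() does by default.
q_data    <- unname(quantile(skewed, c(0.25, 0.75)))
q_norm    <- qnorm(c(0.25, 0.75))
slope     <- diff(q_data) / diff(q_norm)
intercept <- q_data[1] - slope * q_norm[1]
line_y    <- intercept + slope * theoretical

# In a right-skewed sample both tails sit above the line, giving a convex curve.
n <- length(sample_q)
low_above  <- sample_q[1] > line_y[1]
high_above <- sample_q[n] > line_y[n]
c(low_above, high_above)
```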