Normal Distribution

4.3.1 If a population is normally distributed, what percentage of the population is

(a) within 1.5 standard deviations of the mean?

pnorm(1.5) - pnorm(-1.5)
## [1] 0.8663856

(b) more than 2.5 standard deviations above the mean?

1 - pnorm(2.5)
## [1] 0.006209665

(c) more than 3.5 standard deviations above or below the mean?

pnorm(-3.5) + (1 - pnorm(3.5))
## [1] 0.0004652582
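
The three answers above are all tail areas of the standard normal curve, so they can be cross-checked with two small helpers. This is only a sketch: the names `within_k_sd` and `beyond_k_sd` are made up for illustration and are not base R functions.

```r
# Hypothetical helpers (illustrative names, not from any package).

# Proportion of a normal population within k standard deviations of the mean
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

# Proportion more than k standard deviations from the mean, in either direction
beyond_k_sd <- function(k) 2 * pnorm(-k)

within_k_sd(1.5)       # part (a)
beyond_k_sd(2.5) / 2   # part (b): one tail only, by symmetry
beyond_k_sd(3.5)       # part (c): both tails
```

Because the normal curve is symmetric, the two-tail area is simply twice the lower-tail area, which is why part (b) halves it.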

4.3.2

(a) The 90th percentile of a normal distribution is how many standard deviations above the mean?

qnorm(.9)
## [1] 1.281552

(b) The 10th percentile of a normal distribution is how many sd below the mean?

qnorm(.1)
## [1] -1.281552

The 10th percentile is about 1.28 standard deviations below the mean.
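
The 90th and 10th percentiles are mirror images because the normal curve is symmetric about its mean. A short check of that symmetry, and of the fact that `pnorm()` undoes `qnorm()` (the variable names are arbitrary):

```r
p <- 0.9

upper <- qnorm(p)       # 90th percentile, in standard deviation units
lower <- qnorm(1 - p)   # 10th percentile, in standard deviation units

# Symmetry: the two percentiles differ only in sign
all.equal(upper, -lower)

# Round trip: the area below the 90th percentile should be 0.9 again
pnorm(upper)
```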

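4.3.9 The serum cholesterol levels of 12- to 14-year-olds are approximately normal with mean 155 mg/dl and standard deviation 27 mg/dl, written N(155, 27).

# Density curve of the N(155, 27) model over a range of about 3 sd either side of the mean
x <- seq(80, 260, length.out = 431)
y <- dnorm(x, mean = 155, sd = 27)
plot(x, y, type = "l", lwd = 2, col = "blue",
     main = "Serum cholesterol, 12 - 14 yr olds (approximated plot)",
     xlab = "serum cholesterol (mg/dl)")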

What percentage of 12 to 14-year-olds have serum cholesterol values
(a) 164 or more?

pnorm(164, 155, 27, lower.tail = FALSE)
## [1] 0.3694413

(b) 137 or less?

pnorm(137, 155, 27)
## [1] 0.2524925

(c) 186 or less?

pnorm(186, 155, 27)
## [1] 0.8745463

(d) 100 or more?

1 - pnorm(100, 155, 27)
## [1] 0.9791768

(e) between 159 and 186?

pnorm(186, 155, 27) - pnorm(159, 155, 27)
## [1] 0.3156592

(f) between 100 and 132?

pnorm(132, 155, 27) - pnorm(100, 155, 27)
## [1] 0.176325

(g) between 132 and 159?

pnorm(159, 155, 27) - pnorm(132, 155, 27)
## [1] 0.3617389
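
Two quick consistency checks on the answers above, written as a sketch with arbitrary variable names. The first shows that standardizing to a z-score gives the same area as passing the mean and sd to `pnorm()`. The second uses the fact that the intervals in (f), (g) and (e) are adjacent, so their areas should add up to the area between 100 and 186.

```r
mu <- 155
sigma <- 27

# Part (a) two ways: directly, and via the z-score (164 - 155) / 27
z <- (164 - mu) / sigma
p_direct <- pnorm(164, mu, sigma, lower.tail = FALSE)
p_z <- pnorm(z, lower.tail = FALSE)
c(p_direct, p_z)

# Areas of the adjacent intervals 100-132, 132-159, 159-186: parts (f), (g), (e)
cuts <- c(100, 132, 159, 186)
pieces <- diff(pnorm(cuts, mu, sigma))
pieces

# The pieces should sum to the area of the whole interval 100-186
sum(pieces)
pnorm(186, mu, sigma) - pnorm(100, mu, sigma)
```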

4.3.10 A 13-year-old is chosen at random, let Y be the person’s serum cholesterol value. Find

(a) Pr{Y >= 159}

pnorm(159, 155, 27, lower.tail = FALSE)
## [1] 0.4411129

(b) Pr{159 < Y < 186}

pnorm(186, 155, 27) - pnorm(159, 155, 27)
## [1] 0.3156592

4.3.11

(a) The 80th percentile of the serum cholesterol distribution N(155,27)

qnorm(.8, 155, 27)
## [1] 177.7238

(b) the 20th percentile

qnorm(.2, 155, 27)
## [1] 132.2762
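
The same percentiles can be built from the standard normal quantile as mean + z × sd, which also makes it clear why the 80th and 20th percentiles sit the same distance from the mean. A small sketch (variable names are arbitrary):

```r
mu <- 155
sigma <- 27

# Percentile = mean + (standard normal quantile) * sd
p80 <- mu + sigma * qnorm(0.8)
p20 <- mu + sigma * qnorm(0.2)
c(p80, p20)

# Distances from the mean are equal because qnorm(0.2) == -qnorm(0.8)
c(p80 - mu, mu - p20)
```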

4.3.17 Rome marathon, runners n = 10,002, min(x) = 129 minutes, mean = 245 minutes, sd = 40 minutes, max(x) ~ 360 minutes

Approximate normal density curve for the finishing times:

x <- seq(129, 360, length.out = 10002)
y <- dnorm(x, mean = 245, sd = 40)
plot(x, y, type = "l", lwd = 2, col = "blue",
     main = "Rome marathon run times (approximated plot)",
     xlab = "Final time (minutes)")

(a) What percentage of times were greater than 200 minutes?

1 - pnorm(200, 245, 40)
## [1] 0.8697055

(b) What is the 60th percentile of the times?

qnorm(.6, 245, 40)
## [1] 255.1339

(c) The normal curve approximation is fairly good except around the 240-minute mark. How can we explain this anomalous behavior of the distribution?

A large number of runners finish between 190 and 240 minutes. The mean is pulled upward by the slower runners, who outnumber the fast runners. A likely additional factor is that 240 minutes is the 4-hour mark, a common goal time, so finishers may cluster just under it. If high and low outliers were eliminated, the curve would probably center closer to 240.

4.4.2 Match normal quantile plots (a), (b), (c) to histograms I, II, III and explain.

(a) - II Right-skewed data: the upward curve of the quantile plot shows a number of high values and possibly a high outlier.

(b) - III Left-skewed data: the downward curve of the quantile plot shows a number of low values.

(c) - I The approximately straight line of the quantile plot shows that the data are approximately normal.
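
The link between skewness and the shape of a normal quantile plot can be explored with simulated data. The sketch below uses made-up samples (not the textbook's histograms); `skew` and `qq_cor` are hypothetical helper names, and the seed and sample size are arbitrary.

```r
set.seed(42)   # arbitrary seed, only so the run is repeatable
n <- 1000

# Simulated stand-ins for the three shapes (assumed, not the textbook data)
samples <- list(
  right_skewed = rexp(n),    # long right tail
  left_skewed  = -rexp(n),   # mirror image: long left tail
  normal       = rnorm(n)
)

# Sample skewness: positive for right skew, negative for left skew
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Straightness of the normal quantile plot: correlation between the
# theoretical quantiles (x) and the sample values (y) that qqnorm() would plot
qq_cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
}

skews <- sapply(samples, skew)
qq_cors <- sapply(samples, qq_cor)
skews
qq_cors
```

To see the plots themselves, call `qqnorm(samples$right_skewed)` followed by `qqline(samples$right_skewed)`, and likewise for the other two samples. The closer the correlation is to 1, the straighter the plot.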

4.4.6 Tour de France times

A normal quantile plot was created from the times that it took 166 bicycle riders to complete the stage 11 time trial in the 2001 Tour de France cycling race.

(a) Are the times of the fastest riders better than, worse than, or roughly equal to the times one would expect the fastest riders to have if the data came from a truly normal distribution?

The times of the three fastest riders are better than a truly normal distribution would show. The values fall above the regression line.

(b) Are the times of the slowest riders better than, worse than, or roughly equal to the times one would expect the slowest riders to have if the data came from a truly normal distribution?

The times of the slowest riders are roughly equal to those expected from a truly normal distribution; the values fall very close to the regression line.

4.S.16 Heart rate change after coffee consumption

Heart rate was measured for a group of subjects at rest and again after drinking coffee. The change in heart rate followed a normal distribution with a mean increase of 7.3 beats per minute and a standard deviation of 11.1. Let Y denote the change in heart rate for a randomly selected person. Find

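(a)

# Upper-tail area computed directly; round() wraps the whole probability
paste0("Pr{Y > 10} = ",
       round(pnorm(10, 7.3, 11.1, lower.tail = FALSE), 3))
## [1] "Pr{Y > 10} = 0.404"

(b)

paste0("Pr{Y > 20} = ",
       round(pnorm(20, 7.3, 11.1, lower.tail = FALSE), 3))
## [1] "Pr{Y > 20} = 0.126"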

(c)

# Both terms use sd = 11.1
pr <- round(pnorm(15, 7.3, 11.1) - pnorm(5, 7.3, 11.1), 3)
paste0("Pr{5 < Y < 15} = ", pr)
## [1] "Pr{5 < Y < 15} = 0.338"

4.S.17 Find the probability that a randomly chosen person’s heart rate will go down after drinking coffee.

paste0("Pr{Y < 0} = ",
       round(pnorm(0, 7.3, 11.1), 3))
## [1] "Pr{Y < 0} = 0.255"
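
As a sanity check, the same probability can be approximated by simulation. This is a sketch on simulated draws from the N(7.3, 11.1) model, not the study's data; the seed and sample size are arbitrary.

```r
set.seed(1)   # arbitrary seed so the run is repeatable

# Simulated heart rate changes from the assumed normal model
y <- rnorm(1e5, mean = 7.3, sd = 11.1)

# Proportion of simulated subjects whose heart rate went down
mean(y < 0)

# Exact value under the model, for comparison
pnorm(0, 7.3, 11.1)
```

The two numbers should agree to about two decimal places; the simulated proportion varies slightly from seed to seed.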

4.S.19 How large would an observation need to be in order to be labeled an outlier on the upper end?

A high outlier would need to be greater than the upper fence, Q3 + 1.5 × IQR, which is about 37.25 beats per minute.

q1 <- qnorm(.25, 7.3, 11.1)
q3 <- qnorm(.75, 7.3, 11.1)
iqr <- q3 - q1
l1 <- c(paste0("upper fence = ", q3 + 1.5 * iqr),
        paste0("Q1 = ", q1),
        paste0("Q3 = ", q3),
        paste0("IQR = ", iqr)
        )
l1
## [1] "upper fence = 37.247344908706" "Q1 = -0.186836227176506"      
## [3] "Q3 = 14.7868362271765"         "IQR = 14.973672454353"
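
For a normal model the fences have a closed form. With z = qnorm(0.75), Q3 = mean + z × sd and IQR = 2 × z × sd, so the upper fence is mean + 4 × z × sd. A sketch of that shortcut (variable names are arbitrary):

```r
mu <- 7.3
sigma <- 11.1

z75 <- qnorm(0.75)   # third quartile of the standard normal

# Q3 + 1.5 * IQR = (mu + z*sigma) + 1.5 * (2*z*sigma) = mu + 4*z*sigma
upper_fence <- mu + 4 * z75 * sigma
lower_fence <- mu - 4 * z75 * sigma
c(lower_fence, upper_fence)

# Share of the normal population that lies above the upper fence
pnorm(upper_fence, mu, sigma, lower.tail = FALSE)
```

The tail area is well under 1%, so under this model only a small share of subjects would be flagged as high outliers.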

4.S.20 Shapiro-Wilk test result for heart rate change

If the heart rates follow a normal distribution, which of the following Shapiro–Wilk’s test P-values for a random sample of 15 subjects are consistent with this claim?

(b) P-value = 0.1345. A P-value of 0.10 or more shows no compelling evidence of non-normality.
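
For reference, this is roughly how such a P-value would be obtained in R. The sketch runs `shapiro.test()` on 15 simulated values drawn from the N(7.3, 11.1) model, not on the study's data, so the P-value it prints is illustrative only; the seed is arbitrary.

```r
set.seed(2024)   # arbitrary seed so the run is repeatable

# Simulated sample of 15 heart rate changes (assumed model, not real data)
change <- rnorm(15, mean = 7.3, sd = 11.1)

# Shapiro-Wilk test of normality
sw <- shapiro.test(change)
sw$p.value

# Apply the 0.10 cutoff used in the answer above
sw$p.value >= 0.10
```

Because the sample is random, a different seed can give a P-value on either side of the cutoff even though the data really are normal.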