What percent of a standard normal distribution \(N(\mu =0,\quad \sigma =1)\) is found in each region? Be sure to draw a graph.
a) Z > -1.13
Zlim <- 3
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- -1.13
xloc <- seq(Z, Zlim, 0.01)
yloc <- dnorm(xloc)
polygon(c(Z, xloc, Zlim), c(0, yloc, 0), col = "grey")
pnorm(Z, lower.tail = FALSE)
## [1] 0.8707619
b) Z < 0.18
Zlim <- 3
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- 0.18
xloc <- seq(-Zlim, Z, 0.01)
yloc <- dnorm(xloc)
polygon(c(-Zlim, xloc, Z), c(0, yloc, 0), col = "grey")
pnorm(Z)
## [1] 0.5714237
c) Z > 8
Zlim <- 8
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- 8
xloc <- seq(Z, Zlim, 0.01)
yloc <- dnorm(xloc)
polygon(c(Z, xloc, Zlim), c(0, yloc, 0), col = "grey")
pnorm(Z, lower.tail = FALSE)
## [1] 6.220961e-16
d) |Z| < 0.5
-0.5 < Z < 0.5
Zlim <- 3
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- 0.5
xloc <- seq(-Z, Z, 0.01)
yloc <- dnorm(xloc)
polygon(c(-Z, xloc, Z), c(0, yloc, 0), col = "grey")
pnorm(Z) - pnorm(-Z)
## [1] 0.3829249
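The four parts above repeat the same plotting and tail-area steps, so they could be folded into one helper. The following is a minimal sketch and not part of the original solution; the function name `shade_normal` and its arguments are hypothetical.

```r
# Hypothetical helper: draw the standard normal curve, shade the region
# between `lower` and `upper`, and return the shaded area.
shade_normal <- function(lower = -Inf, upper = Inf, zlim = 3) {
  x <- seq(-zlim, zlim, by = 0.01)
  plot(x, dnorm(x), type = "l", xlab = "Z", ylab = "Density")

  # Clip infinite bounds to the plotted range before shading
  lo <- max(lower, -zlim)
  hi <- min(upper, zlim)
  if (lo < hi) {
    xs <- seq(lo, hi, length.out = 200)
    polygon(c(lo, xs, hi), c(0, dnorm(xs), 0), col = "grey")
  }

  # Use the upper tail directly when there is no upper bound, so that
  # very small areas are not lost to floating-point rounding
  if (is.infinite(upper)) {
    pnorm(lower, lower.tail = FALSE)
  } else {
    pnorm(upper) - pnorm(lower)
  }
}
```

For example, `shade_normal(lower = -1.13)` corresponds to part (a) and `shade_normal(lower = -0.5, upper = 0.5)` to part (d). For part (c), `shade_normal(lower = 8, zlim = 8)` returns the tail area, but nothing is shaded because the region starts at the edge of the plot.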
(a)
The shorthand for each distribution is: for the men’s group, \(N(\mu = 4313,\quad \sigma = 583)\); for the women’s group, \(N(\mu = 5261,\quad \sigma = 807)\).
(b)
For Leo: \(Z=\frac { x-\mu }{ \sigma } =\frac { 4948-4313 }{ 583 } = 1.0891938\)
For Mary: \(Z=\frac { x-\mu }{ \sigma } =\frac { 5513-5261 }{ 807 } = 0.3122677\)
These Z-scores show how Leo and Mary did relative to their respective groups. Both Z-scores are positive, so both finishing times are above (slower than) their group means. Leo's time is 1.09 standard deviations above his group mean, while Mary's is 0.31 standard deviations above hers.
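The two Z-scores can be checked in R. This is a small sketch using the means, standard deviations, and finishing times quoted above; the helper name `z_score` is hypothetical.

```r
# Standardize an observation: how many standard deviations it lies from the mean
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_leo  <- z_score(4948, mu = 4313, sigma = 583)  # men's group
z_mary <- z_score(5513, mu = 5261, sigma = 807)  # women's group

c(Leo = z_leo, Mary = z_mary)
```

Since a larger time means a slower finish, the smaller of the two Z-scores marks the better performance relative to the group.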
(c)
Mary did better relative to her group. The lower tail area is the proportion of people who finished faster (in less time), and Mary’s lower tail is smaller than Leo’s. Therefore, a smaller share of Mary's group finished ahead of her than of Leo's group finished ahead of him.
(d)
(percent <- pnorm((4948-4313)/583, lower.tail = FALSE))
## [1] 0.1380342
Leo finished faster than 13.8% of people in his group.
(e)
(percent <- pnorm((5513-5261)/807, lower.tail = FALSE))
## [1] 0.3774186
Mary finished faster than 37.74% of people in her group.
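As a cross-check, `pnorm` also accepts the mean and standard deviation directly, so the standardization step can be skipped. The sketch below uses the same numbers as parts (d) and (e); the variable names are illustrative only.

```r
# Upper tail = share of the group with a longer (slower) time than the runner
leo_beat  <- pnorm(4948, mean = 4313, sd = 583, lower.tail = FALSE)
mary_beat <- pnorm(5513, mean = 5261, sd = 807, lower.tail = FALSE)

# Lower tail = share of the group with a shorter (faster) time than the runner
leo_ahead  <- pnorm(4948, mean = 4313, sd = 583)
mary_ahead <- pnorm(5513, mean = 5261, sd = 807)

round(c(leo_beat = leo_beat, mary_beat = mary_beat,
        leo_ahead = leo_ahead, mary_ahead = mary_ahead), 4)
```

The upper tails should match the values in parts (d) and (e), and the lower tails are the quantities compared in part (c).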
(f)
Yes, because the entire analysis rests on the assumption that the finishing times are normally distributed. If the distribution were not nearly normal, the Z-scores could still be computed, but the normal tail probabilities used above would no longer apply.
hght <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
(a)
(hghtmu <- mean(hght))
## [1] 61.52
(hghtsd <- sd(hght))
## [1] 4.583667
For the 68% rule,
limitup <- hghtmu + hghtsd
limitdm <- hghtmu - hghtsd
interval <- hght[hght >= limitdm & hght <= limitup]
length(interval)/length(hght)
## [1] 0.68
For the 95% rule,
limitup <- hghtmu + 2*hghtsd
limitdm <- hghtmu - 2*hghtsd
interval <- hght[hght >= limitdm & hght <= limitup]
length(interval)/length(hght)
## [1] 0.96
For the 99.7% rule,
limitup <- hghtmu + 3*hghtsd
limitdm <- hghtmu - 3*hghtsd
interval <- hght[hght >= limitdm & hght <= limitup]
length(interval)/length(hght)
## [1] 1
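Yes, the heights approximately follow the 68-95-99.7% rule: 68%, 96%, and 100% of the observations fall within 1, 2, and 3 standard deviations of the mean, respectively.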
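The three checks above differ only in the multiplier on the standard deviation, so they can be written once. A minimal sketch, assuming the same `hght` vector; the helper name `within_k_sd` is hypothetical.

```r
hght <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

# Proportion of observations within k sample standard deviations of the sample mean
within_k_sd <- function(x, k) mean(abs(x - mean(x)) <= k * sd(x))

coverage <- sapply(1:3, function(k) within_k_sd(hght, k))
names(coverage) <- c("1 sd", "2 sd", "3 sd")
coverage
```

Taking the mean of a logical vector gives the same proportion as `length(interval)/length(hght)` in the step-by-step version.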
(b)
Yes, the data appear to follow a normal distribution. The histogram is unimodal and bell-shaped, approximately following the normal distribution curve. The Q-Q plot shows that the majority of the observations fall on the normal distribution line, with the exception of a few outliers.
To run qqnormsim, ensure the working directory contains “more/bdims.RData”.
load("more/bdims.RData")
qqnormsim(hght)
qqnorm(hght)
qqline(hght)
Comparing the Q-Q plot of the observed data with the simulated normal Q-Q plots, it is apparent that the observed data are close to normal.
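If “more/bdims.RData” is not available, a similar comparison can be made with base R alone. The sketch below is an assumption about what such a comparison might look like, not the implementation of `qqnormsim`; the function name `qq_compare` and the choice of 8 simulated panels are hypothetical.

```r
# Hypothetical stand-in: Q-Q plot of the data next to Q-Q plots of samples
# simulated from a normal distribution with the same size, mean, and sd
qq_compare <- function(x, n_sim = 8) {
  side <- ceiling(sqrt(n_sim + 1))            # panels per row and column
  old_par <- par(mfrow = c(side, side), mar = c(3, 3, 2, 1))
  on.exit(par(old_par))                       # restore graphics settings

  qqnorm(x, main = "Observed data")
  qqline(x)

  for (i in seq_len(n_sim)) {
    sim <- rnorm(length(x), mean = mean(x), sd = sd(x))
    qqnorm(sim, main = paste("Simulation", i))
    qqline(sim)
  }
  invisible(NULL)
}
```

Calling `qq_compare(hght)` draws the observed plot first, followed by the simulated ones. If the observed panel looks no less straight than the simulated panels, the normal model is plausible for a sample of this size.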
(a)
0.98^9*0.02
## [1] 0.01667496
(b)
0.98^100
## [1] 0.1326196
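Both answers can also be obtained from R's geometric distribution functions, which count the number of failures (good transistors) before the first success (a defective one). A short sketch with illustrative variable names:

```r
p <- 0.02  # probability that a transistor is defective

# (a) 9 good transistors followed by a defective one on the 10th
p_tenth_is_first_defect <- dgeom(9, prob = p)

# (b) more than 99 good transistors before the first defect,
#     i.e. no defect among the first 100
p_no_defect_in_100 <- pgeom(99, prob = p, lower.tail = FALSE)

c(p_tenth_is_first_defect, p_no_defect_in_100)
```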
(c)
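The defective rate is 2%, or 1/50. The wait until the first defective transistor follows a geometric distribution with \(p = 0.02\), so on average \(1/p = 50\) transistors are produced up to and including the first defective one (49 good transistors before it).
The standard deviation is \(\sqrt{(1-p)/p^2}\):
sqrt(0.98)/0.02
## [1] 49.49747
(d)
The defective rate is 5%, or 1/20. So on average, \(1/p = 20\) transistors are produced up to and including the first defective one (19 good transistors before it).
The standard deviation is:
sqrt(0.95)/0.05
## [1] 19.49359
(e)
Increasing the probability of success (here, a defect) reduces both the mean and the standard deviation of the wait time until a success is observed.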
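For a geometric distribution with success probability \(p\), the number of trials up to and including the first success has mean \(1/p\) and standard deviation \(\sqrt{(1-p)/p^2}\). The sketch below evaluates both formulas for the two machines and compares them with a simulation; the helper name `geom_summary`, the seed, and the number of draws are arbitrary choices.

```r
# Mean and sd of the number of trials up to and including the first success
geom_summary <- function(p) c(mean = 1 / p, sd = sqrt(1 - p) / p)

rbind("p = 0.02" = geom_summary(0.02),
      "p = 0.05" = geom_summary(0.05))

# Simulation check: rgeom() counts failures before the first success,
# so add 1 to count the trial on which the success occurs
set.seed(42)
waits <- rgeom(1e5, prob = 0.02) + 1
c(sim_mean = mean(waits), sim_sd = sd(waits))
```

The simulated mean and standard deviation should land close to the formula values for \(p = 0.02\), with some run-to-run variation.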
(a)
choose(3,2)*(0.51^2)*(0.49^1)
## [1] 0.382347
(b)
Three possible orderings: (boy, boy, girl), (boy, girl, boy), (girl, boy, boy).
0.51^2*0.49 + 0.51*0.49*0.51 + 0.49*0.51^2
## [1] 0.382347
Same result.
(c)
Using binomial model:
choose(8,3)*(0.51^3)*(0.49^5)
## [1] 0.2098355
With the method from part (b), we would instead have to list all 56 possible orderings, calculate the probability of each, and sum them.
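The binomial probabilities can be checked with `dbinom`, and the orderings counted with `combn`. A brief sketch with illustrative variable names:

```r
p_boy <- 0.51

# Binomial model: exactly 2 boys among 3 children, and exactly 3 boys among 8
two_of_three   <- dbinom(2, size = 3, prob = p_boy)
three_of_eight <- dbinom(3, size = 8, prob = p_boy)

# Enumeration: each column gives the birth positions of the 3 boys among 8 children
orderings   <- combn(8, 3)
n_orderings <- ncol(orderings)

# Every ordering has the same probability, so the sum over orderings
# is the count times the probability of one ordering
by_enumeration <- n_orderings * p_boy^3 * (1 - p_boy)^5

c(two_of_three, three_of_eight, by_enumeration)
```

The enumeration agrees with the binomial formula because `choose(8,3)` is exactly the number of columns returned by `combn(8, 3)`.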
(a)
choose(9,2)*(0.15^3)*(0.85^7)
## [1] 0.03895012
(b)
If each serve is independent, previous results do not affect the next serve. So the probability is still 15%, regardless of how many successes or failures she had previously.
(c)
Part (a) is the probability, calculated before she makes any serves, of the whole sequence: exactly 2 successes in the first 9 serves and a success on the 10th. In part (b), the outcomes of the first 9 serves are already known, so the only remaining uncertainty is the 10th serve. Its success probability stays at 15% as long as each serve is independent.
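The difference between parts (a) and (b) can be made concrete by splitting the part (a) probability into two factors. This sketch uses `dbinom` for the first 9 serves and `dnbinom` (the negative binomial, which counts failures before a given number of successes) as a cross-check; the variable names are illustrative.

```r
p <- 0.15  # probability of a successful serve

# Exactly 2 successes in the first 9 serves
first_nine <- dbinom(2, size = 9, prob = p)

# Part (a): 2 successes in the first 9 serves AND a success on the 10th
joint <- first_nine * p

# Same event via the negative binomial: 7 failures before the 3rd success
joint_nb <- dnbinom(7, size = 3, prob = p)

# Part (b): condition on the first 9 serves, leaving only the 10th serve
conditional <- joint / first_nine

c(joint = joint, joint_nb = joint_nb, conditional = conditional)
```

Dividing the joint probability by the probability of the already observed serves leaves exactly the single-serve probability, which is why part (b) is so much larger than part (a).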