What percent of a standard normal distribution \(N(\mu=0, \sigma=1)\) is found in each region? Be sure to draw a graph.
x <- seq(-4,4,0.01)
y <- dnorm(x, 0, 1)
df <- data.frame(x,y)
\[Z=\frac{x-\mu}{\sigma}\]
# Finding probability of x > -1.13
1 - pnorm(-1.13, mean = 0, sd = 1)
## [1] 0.8707619
# Probability curve plot
normalPlot(mean = 0, sd = 1, bounds=c(-1.13,4), tails = FALSE)
Answer: The percentage represented by the region is about 87.1%.
\[Z=\frac{x-\mu}{\sigma}\]
# Finding probability of x < 0.18
pnorm(0.18, mean = 0, sd = 1)
## [1] 0.5714237
# Probability curve plot
normalPlot(mean = 0, sd = 1, bounds=c(-4,0.18), tails = FALSE)
Answer: The percentage represented by the region is about 57.1%.
\[Z=\frac{x-\mu}{\sigma}\]
# Finding probability of x > 8
1- pnorm(8, mean = 0, sd = 1)
## [1] 6.661338e-16
# Probability curve plot
normalPlot(mean = 0, sd = 1, bounds = c(8, 10), tails = FALSE)
Answer: The percentage represented by the region is essentially 0%.
# Finding probability of |x| < 0.5, i.e. -0.5 < x < 0.5
x1 <- pnorm(-0.5, mean = 0, sd = 1)
x2 <- pnorm(0.5, mean = 0, sd = 1)
x2 - x1
## [1] 0.3829249
# Probability curve plot
normalPlot(mean = 0, sd = 1, bounds = c(-0.5, 0.5), tails = FALSE)
Answer: The percentage represented by the region is about 38.3%.
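The four regions can also be computed in one pass. This is a minimal sketch, not part of the original solution; the vector name `regions` is only illustrative. It uses `lower.tail = FALSE` for the upper tails, which avoids the floating-point rounding that `1 - pnorm(8)` runs into for very small probabilities.

```r
# Proportion of N(0, 1) in each region (illustrative names)
regions <- c(
  "Z > -1.13" = pnorm(-1.13, lower.tail = FALSE),
  "Z < 0.18"  = pnorm(0.18),
  "Z > 8"     = pnorm(8, lower.tail = FALSE),
  "|Z| < 0.5" = pnorm(0.5) - pnorm(-0.5)
)

# Multiply by 100 to report percentages rather than proportions
round(100 * regions, 1)
```

Multiplying the proportion by 100 before reporting keeps the answer in percent, so 0.871 is reported as 87.1%.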
Leo’s group: Men, Ages 30 - 34.
Leo’s race time: 1:22:28 (4948 seconds).
Men, ages 30 - 34 mean: 4313 seconds.
Men, ages 30 - 34 standard deviation: 583 seconds.
Mary’s Group: Women, Ages 25 - 29.
Mary’s race time: 1:31:53 (5513 seconds).
Women, ages 25 - 29 mean: 5261 seconds.
Women, ages 25 - 29 standard deviation: 807 seconds.
The distributions of finishing times for both groups are approximately Normal.
Remember: a better performance corresponds to a faster finish.
Answer: Group Men, Ages 30 - 34: \(N(\mu = 4313, \sigma = 583)\). Group Women, Ages 25 - 29: \(N(\mu = 5261, \sigma = 807)\).
\[Z_{Person}=\frac{Person_{Time}-\mu}{\sigma}\]
# Leo's Z Score
PersonLTime <- 4948
GroupLMu <- 4313
GroupLSD <- 583
ZLeo <- (PersonLTime - GroupLMu) / GroupLSD
ZLeo
## [1] 1.089194
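# Mary's Z Score
PersonMTime <- 5513
GroupMMu <- 5261
GroupMSD <- 807
ZMary <- (PersonMTime - GroupMMu) / GroupMSD
ZMary
## [1] 0.3122677
Answer: Both finished slower than their group means (both Z scores are positive), but Mary had the better relative performance. Her time was about 0.31 standard deviations above her group's mean, while Leo's was about 1.09 standard deviations above his. Because a faster finish is better, the lower Z score indicates the better performance.
# Leo's Rank: proportion of his group with slower (larger) times
LeoP <- 1 - pnorm(PersonLTime, mean = GroupLMu, sd = GroupLSD)
# Mary's Rank: proportion of her group with slower (larger) times
MaryP <- 1 - pnorm(PersonMTime, mean = GroupMMu, sd = GroupMSD)
Answer: Mary ranked better in her group than Leo did in his. About 38% of Mary's group finished slower than she did, while only about 14% of Leo's group finished slower than he did.
# Finishing faster than others means the others have larger times (upper tail)
1 - pnorm(PersonLTime, mean = GroupLMu, sd = GroupLSD)
Answer: Leo finished faster than about 13.8% of the triathletes in his group.
1 - pnorm(PersonMTime, mean = GroupMMu, sd = GroupMSD)
Answer: Mary finished faster than about 37.7% of the triathletes in her group.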
Answer: If distributions are not nearly normal, then part (b) will remain the same since Z-scores can still be calculated. However, parts (d) and (e) rely on the normal model for calculations, so the results would change.
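The whole comparison can be sketched in one place, using the times, means and standard deviations given in the problem statement. The helper `z_score` and the variable names are illustrative, not from the original solution. Since a larger time is a slower finish, "finished faster than" corresponds to the upper tail of the normal model.

```r
# Z score of a finishing time within its group (hypothetical helper)
z_score <- function(time, mu, sigma) (time - mu) / sigma

leo_z  <- z_score(4948, mu = 4313, sigma = 583)
mary_z <- z_score(5513, mu = 5261, sigma = 807)

# Share of each group with a slower (larger) time than the athlete
leo_faster_than  <- pnorm(leo_z,  lower.tail = FALSE)
mary_faster_than <- pnorm(mary_z, lower.tail = FALSE)

round(c(leo_z = leo_z, mary_z = mary_z), 2)
round(c(leo = leo_faster_than, mary = mary_faster_than), 2)
```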
Read data from GitHub file
height <- read.csv(paste0("https://raw.githubusercontent.com/jbryer/DATA606Spring2019/master/data/os3_data/Ch%203%20Exercise%20Data/fheights.txt"))
mean <- 61.52
sd <- 4.58
height$z <- (height$heighs - mean) / sd
height$nearest_sd <- round(height$z,0)
kable(height)
heighs | z | nearest_sd |
---|---|---|
54 | -1.6419214 | -2 |
55 | -1.4235808 | -1 |
56 | -1.2052402 | -1 |
56 | -1.2052402 | -1 |
57 | -0.9868996 | -1 |
58 | -0.7685590 | -1 |
58 | -0.7685590 | -1 |
59 | -0.5502183 | -1 |
60 | -0.3318777 | 0 |
60 | -0.3318777 | 0 |
60 | -0.3318777 | 0 |
61 | -0.1135371 | 0 |
61 | -0.1135371 | 0 |
62 | 0.1048035 | 0 |
62 | 0.1048035 | 0 |
63 | 0.3231441 | 0 |
63 | 0.3231441 | 0 |
63 | 0.3231441 | 0 |
64 | 0.5414847 | 1 |
65 | 0.7598253 | 1 |
67 | 1.1965066 | 1 |
67 | 1.1965066 | 1 |
69 | 1.6331878 | 2 |
73 | 2.5065502 | 3 |
\[Z = \frac{x - \mu}{\sigma}\]
# 1 Standard deviation
mu <- mean
sd <- sd
Z <- 1
x1 <- Z * sd + mu
x1
## [1] 66.1
# Empirical proportion of heights below x1
sum(height$heighs < x1) / length(height$heighs)
## [1] 0.8333333
# pnorm
pnorm(q = x1, mean = mean, sd = sd)
## [1] 0.8413447
# 2 Standard deviation
mu <- mean
sd <- sd
Z <- 2
x2 <- Z * sd + mu
x2
## [1] 70.68
# Empirical proportion of heights below x2
sum(height$heighs < x2) / length(height$heighs)
## [1] 0.9583333
# pnorm
pnorm(q = x2, mean = mean, sd = sd)
## [1] 0.9772499
# 3 Standard deviation
mu <- mean
sd <- sd
Z <- 3
x3 <- Z * sd + mu
x3
## [1] 75.26
# Empirical proportion of heights below x3
sum(height$heighs < x3) / length(height$heighs)
## [1] 1
# pnorm
pnorm(q = x3, mean = mean, sd = sd)
## [1] 0.9986501
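Answer: Yes, these heights approximately follow the 68-95-99.7% Rule. The proportions computed above are cumulative, and they track the normal model closely:
83.3% of the heights fall below 1 standard deviation above the mean (normal model: 84.1%).
95.8% of the heights fall below 2 standard deviations above the mean (normal model: 97.7%).
100% of the heights fall below 3 standard deviations above the mean (normal model: 99.9%).
Counting from the table, 16 of the 24 heights (67%) lie within 1 standard deviation of the mean, 23 (96%) within 2, and all 24 within 3.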
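The code above computes the share of heights below each cutoff. A two-sided check, counting heights within k standard deviations on either side of the mean, can be sketched as follows. To stay self-contained, it retypes the 24 heights from the table by hand rather than reading the file; the names `heights`, `h_mean`, `h_sd` and `within_k` are illustrative.

```r
# The 24 heights shown in the table above, entered by hand
heights <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61,
             62, 62, 63, 63, 63, 64, 65, 67, 67, 69, 73)
h_mean <- 61.52
h_sd   <- 4.58

# Proportion of heights within k standard deviations of the mean, k = 1, 2, 3
within_k <- sapply(1:3, function(k) mean(abs(heights - h_mean) < k * h_sd))
round(within_k, 3)
```

These proportions are the ones to compare directly against 68%, 95% and 99.7%.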
hist(height$heighs, probability = TRUE, xlab="Heights", ylim = c(0, 0.1))
x <- 50:75
y <- dnorm(x = x, mean = mu, sd = sd)
lines(x = x, y = y, col = "blue")
abline(v=mu,col="red")
qqnorm(height$heighs)
qqline(height$heighs, col = 2)
qqnormsim(height$heighs)
Answer: The distribution is unimodal and roughly symmetric, and the points in the normal probability plot stay close to the line, so we can say that the distribution is nearly normal.
Defective rate = 2%.
The production is considered a random process where each transistor is independent of the others.
# Rate of success and failure definition
pf <- 0.02
ps <- 1 - pf
n <- 10
# Probability: 9 non-defective transistors followed by 1 defective one
ps^(n - 1) * pf
Answer: The probability that the 10th transistor produced is the first with a defect is about 1.67%.
# Rate of success and failure definition
pf <- 0.02
ps <- 1 - pf
n <- 100
# Probability that all 100 transistors are non-defective
round(ps^n, 4)
Answer: The probability that the machine produces no defective transistors in a batch of 100 is about 13.26%.
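Both probabilities can be cross-checked against R's built-in distribution functions. This is a small sketch with illustrative variable names. Note that `dgeom()` counts the failures before the first success, so "first defect on the 10th transistor" corresponds to 9 non-defective transistors first.

```r
p_defect <- 0.02

# Geometric: 9 non-defective transistors, then the first defective one
first_defect_on_10th <- dgeom(9, prob = p_defect)

# Binomial: zero defective transistors in a batch of 100
none_in_100 <- dbinom(0, size = 100, prob = p_defect)

# Report as percentages
round(100 * c(first_defect_on_10th, none_in_100), 2)
```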
# Expected value of a geometric distribution
pf <- 0.02
Ex <- 1/pf
Ex
## [1] 50
# Standard deviation of a geometric distribution
sd <- ((1-pf)/pf^2)^(1/2)
sd
## [1] 49.49747
Answer: On average, I would expect the first defective transistor to be the 50th one produced, with a standard deviation of about 49.5.
# Expected value of a geometric distribution
pf <- 0.05
Ex <- 1/pf
Ex
## [1] 20
# Standard deviation of a geometric distribution
sd <- ((1-pf)/pf^2)^(1/2)
sd
## [1] 19.49359
Answer: On average, I would expect the first defective transistor to be the 20th one produced, with a standard deviation of about 19.49.
Answer: When the probability of a defect is higher, defects occur more often, so both the expected number of transistors produced until the first defect and the standard deviation of that waiting time are smaller.
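The comparison between the two defect rates can be sketched with a small helper. The function name `geom_summary` is illustrative. It applies the same formulas used above, \(E(X) = 1/p\) and \(SD(X) = \sqrt{(1-p)/p^2}\).

```r
# Mean and standard deviation of the number of trials until the first success
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

# Compare the 2% and 5% defect rates side by side
sapply(c("2%" = 0.02, "5%" = 0.05), geom_summary)
```

Both the mean and the standard deviation shrink as the defect rate grows.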
Actual probability of having a boy is slightly higher at 0.51.
Suppose a couple plans to have 3 kids.
n <- 3
k <- 2
pboy <- 0.51
pboy2 <- choose(n, k) * (1 - pboy)^(n - k) * (pboy)^k
pboy2
## [1] 0.382347
Answer: The probability that two of them will be boys is 38.23%
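The same value can be obtained from R's built-in binomial density, which serves as a quick check on the hand-written formula. The variable name `p_two_boys` is illustrative.

```r
# P(exactly 2 boys among 3 children) with P(boy) = 0.51
p_two_boys <- dbinom(2, size = 3, prob = 0.51)
p_two_boys
```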
children <- data.frame(c("BBG","BGB","GBB"))
children$p <- c( pboy * pboy * (1-pboy), pboy * (1- pboy) * pboy, (1-pboy) * pboy * pboy)
names(children) <- c("Kids", "p")
sump <- sum(children$p)
sump
## [1] 0.382347
pboy2 - sump
## [1] 0
kable(children)
Kids | p |
---|---|
BBG | 0.127449 |
BGB | 0.127449 |
GBB | 0.127449 |
Answer: Both results match.
Answer: The second method would be far more tedious, since it requires listing all 56 possible orderings and adding up their probabilities.
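To see where a count like 56 comes from, the orderings can be enumerated directly. The sketch below assumes the count refers to choosing which 3 of 8 children are boys; that scenario is not stated in the text above and is only an assumption here. The name `arrangements` is illustrative.

```r
# Each column is one way to pick the positions of 3 boys among 8 children
# (assumed scenario: 8 children, 3 boys)
arrangements <- combn(8, 3)

ncol(arrangements)  # number of distinct orderings
choose(8, 3)        # same count from the binomial coefficient
```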
15% chance of making the serve.
Suppose that her serves are independent of each other.
# This is a Negative Binomial distribution
p <- 0.15
n <- 10
k <- 3
choose(n-1, k-1) * (1 - p)^(n - k) * p^k
## [1] 0.03895012
Answer: The probability that on the 10th try she will make her 3rd successful serve is 3.9%
Answer: The probability that her 10th serve will be successful is 15% since all her serves are independent of each other.
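Answer: The probabilities differ because they describe different events. The negative binomial probability requires exactly 2 successes in the first 9 serves and a success on the 10th, whereas the second probability concerns only the 10th serve, which succeeds 15% of the time regardless of the earlier serves.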
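The negative binomial probability can be cross-checked with `dnbinom()`. This is a sketch with illustrative variable names. R parameterizes this distribution by the number of failures before the target number of successes, so the 3rd success on the 10th serve corresponds to 7 failures.

```r
p <- 0.15  # chance of making a serve
n <- 10    # serve on which the 3rd success occurs
k <- 3     # number of successful serves

# Formula used above
by_formula <- choose(n - 1, k - 1) * (1 - p)^(n - k) * p^k

# Built-in: x = failures before the k-th success, size = k
by_dnbinom <- dnbinom(n - k, size = k, prob = p)

c(by_formula = by_formula, by_dnbinom = by_dnbinom)
```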