A31a <- round((1 - pnorm(-1.13)), 4)
A31a
## [1] 0.8708
A31b <- round(pnorm(.18), 4)
A31b
## [1] 0.5714
A31c <- 1 - pnorm(8) # Not rounding because extremely small value.
A31c
## [1] 6.661338e-16
A31d <- round((pnorm(-.5) + (1 - pnorm(.5))), 4)
A31d
## [1] 0.6171
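The upper-tail areas above can also be computed with the `lower.tail` argument of `pnorm`, which avoids the `1 - pnorm(...)` subtraction and the floating-point rounding it introduces for extreme values. The following is a minimal sketch; the variable names are illustrative and not part of the original answers.

```r
# Upper-tail area P(Z > -1.13), computed two equivalent ways
upper.sub <- 1 - pnorm(-1.13)
upper.arg <- pnorm(-1.13, lower.tail = FALSE)
round(c(upper.sub, upper.arg), 4)

# For extreme values, subtracting from 1 loses precision;
# lower.tail = FALSE evaluates the tail area directly
tail.direct <- pnorm(8, lower.tail = FALSE)
tail.direct

# Two-sided area P(|Z| > 0.5), using the symmetry of the normal curve
two.sided <- 2 * pnorm(-0.5)
round(two.sided, 4)
```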
Shorthand for the normal distributions: M30_34 ~ N(mu = 4313, sd = 583), W25_29 ~ N(mu = 5261, sd = 807)
Z-scores for Leo and Mary, and what they tell us:
Leo <- 4948
Mary <- 5513
m.avg <- 4313
m.sd <- 583
w.avg <- 5261
w.sd <- 807
Leo.z <- (Leo - m.avg) / m.sd
Mary.z <- (Mary - w.avg) / w.sd
Leo.ptile <- round(pnorm(Leo.z), 4)
Mary.ptile <- round(pnorm(Mary.z), 4)
cat("Leo's Z-score is", round(Leo.z, 3), "and his percentile is", Leo.ptile, "\nMary's Z-score is", round(Mary.z, 3), "and her percentile is", Mary.ptile)
## Leo's Z-score is 1.089 and his percentile is 0.862
## Mary's Z-score is 0.312 and her percentile is 0.6226
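Did Leo or Mary rank better in their respective groups? Explain your reasoning. Assuming the values are finishing times, where a lower time is better, Mary ranked better than Leo relative to her group. Leo’s z-score of ~1.09 means his time was about 1.09 standard deviations above (slower than) the M30-34 average, so about 86% of that division finished faster than he did. Mary’s z-score of ~.31 means her time was about 0.31 standard deviations above the W25-29 average, so about 62% of that division finished faster than she did.
What percentage of the triathletes in their respective groups did Leo and Mary finish faster than?
# Lower times are better, so the share each runner beat is the upper tail (1 - percentile).
round(1 - Leo.ptile, 4)
## [1] 0.138
round(1 - Mary.ptile, 4)
## [1] 0.3774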
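The standardization step can be wrapped in a small helper so both runners are handled the same way. This is only a sketch: `z.score` is a hypothetical helper, not part of the original answer, and it assumes the values are finishing times (lower is better) that are approximately normal within each group.

```r
# Hypothetical helper: standardize an observation against its group
z.score <- function(x, mu, sigma) (x - mu) / sigma

leo.z  <- z.score(4948, mu = 4313, sigma = 583)
mary.z <- z.score(5513, mu = 5261, sigma = 807)

# Share of each group with a longer (slower) time than the runner,
# i.e. the upper-tail area beyond the runner's Z-score
leo.faster.than  <- pnorm(leo.z,  lower.tail = FALSE)
mary.faster.than <- pnorm(mary.z, lower.tail = FALSE)
round(c(Leo = leo.faster.than, Mary = mary.faster.than), 4)
```

Under these assumptions, the runner with the smaller Z-score has the larger upper-tail area and therefore the better relative finish.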
height <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
# quick checks for input error
students.total <- length(height) # Should be 25 students
height.avg <- mean(height) # Should be mean of 61.52
height.sd <- sd(height) # Should be SD of 4.58
cat(students.total, height.avg, height.sd)
## 25 61.52 4.583667
# Let's see how many of the sample fall within one SD
sd1.below <- (height.avg - height.sd)
sd1.above <- (height.avg + height.sd)
sd1.exp <- round(students.total * .68)
sd1.act <- sum(height > sd1.below & height < sd1.above)
sd2.below <- (height.avg - (height.sd * 2))
sd2.above <- (height.avg + (height.sd * 2))
sd2.exp <- round(students.total * .95)
sd2.act <- sum(height > sd2.below & height < sd2.above)
sd3.below <- (height.avg - (height.sd * 3))
sd3.above <- (height.avg + (height.sd * 3))
sd3.exp <- round(students.total * .997)
sd3.act <- sum(height > sd3.below & height < sd3.above)
cat("SD1 Sample vs. Expected:", sd1.act, "/", sd1.exp, "\nSD2 Sample vs. Expected:", sd2.act, "/", sd2.exp, "\nSD3 Sample vs. Expected:", sd3.act, "/", sd3.exp)
## SD1 Sample vs. Expected: 17 / 17
## SD2 Sample vs. Expected: 24 / 24
## SD3 Sample vs. Expected: 25 / 25
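The three checks above repeat the same calculation with a different multiple of the standard deviation, so they can be folded into one vectorized comparison. This is a sketch under the same data; the names `k`, `within.k`, and `expected` are illustrative and not from the original.

```r
height <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62,
            63, 63, 63, 64, 65, 65, 67, 67, 69, 73)

k <- 1:3  # number of standard deviations from the mean

# Observed proportion of heights strictly within k SDs of the mean
within.k <- sapply(k, function(m) {
  mean(abs(height - mean(height)) < m * sd(height))
})

# Proportion a normal distribution places within k SDs
expected <- pnorm(k) - pnorm(-k)

round(rbind(observed = within.k, expected = expected), 3)
```

Comparing proportions rather than rounded counts shows how closely the sample tracks the 68-95-99.7 rule without relying on rounding to whole students.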
# Replicate the histogram and QQplot in the text to check if the distribution appears normal.
par(mfrow=c(1,2))
hist(height, breaks = 8, xlab = NULL, main = paste("Student heights", "\nBins = 8"))
qqnorm(height, pch = 2, frame = F, main = paste("Student heights", "\nQQplot"))
qqline(height, col = "red", lwd = 2)
These data do appear to follow the normal distribution. I detected some right-side skew when expanding the number of bins, but there does not seem to be a dramatic departure from the line in the QQplot.
married <- .471
unmarried <- (1 - married)
thirdwoman <- (unmarried) * (unmarried) * married
A321a <- round(thirdwoman, 4)
A321a
## [1] 0.1318
allwomen <- (married)^3
A321b <- round(allwomen, 4)
A321b
## [1] 0.1045
# Each selection is random and constitutes a separate, independent trial, so we use the geometric distribution. The probability that the first success occurs on trial n is (1 - p)^(n-1) * p, with mu = 1 / p and sd = sqrt((1 - p) / p^2).
married.avg <- 1 / married
married.sd <- sqrt((unmarried) / married^2)
cat("We'd expect to sample around", round(married.avg), "women\nThe standard deviation is", round(married.sd, 2))
## We'd expect to sample around 2 women
## The standard deviation is 1.54
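The hand-written geometric formula can be cross-checked against R's built-in `dgeom`. One detail to keep in mind is that `dgeom` is parameterized by the number of failures before the first success, so the first success on trial n corresponds to n - 1 failures. A minimal sketch, with illustrative variable names:

```r
p <- 0.471  # probability that a randomly sampled woman is married
n <- 3      # trial on which the first success occurs

# Formula used above: (1 - p)^(n - 1) * p
manual <- (1 - p)^(n - 1) * p

# dgeom() counts failures before the first success, hence n - 1
builtin <- dgeom(n - 1, prob = p)

round(c(manual = manual, builtin = builtin), 4)
```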
married.alt <- .3
unmarried.alt <- 1 - married.alt # use the alternative proportion, not the original one
married.avg.alt <- 1 / married.alt
married.sd.alt <- sqrt((unmarried.alt) / married.alt^2)
cat("We'd expect to sample around", round(married.avg.alt), "women\nThe standard deviation is", round(married.sd.alt, 2))
## We'd expect to sample around 3 women
## The standard deviation is 2.79
# As this is a random process, each transistor represents a separate, independent trial. We use the geometric distribution for the probability that the 10th transistor, and none prior, will be defective. The probability that the first defect occurs on trial n is (1 - p)^(n-1) * p, with mu = 1 / p and sd = sqrt((1 - p) / p^2).
defect <- .02
nominal <- 1 - defect
n1 <- 10
A322a <- round((nominal)^(n1 - 1) * defect, 4)
A322a
## [1] 0.0167
n2 <- 101
A322b <- round((nominal)^(n2 - 1) * defect, 4)
A322b
## [1] 0.0027
defect.avg <- 1 / defect
defect.sd <- sqrt((nominal) / defect^2)
cat("We'd expect to sample around", round(defect.avg), "transistors\nThe standard deviation is", round(defect.sd, 2))
## We'd expect to sample around 50 transistors
## The standard deviation is 49.5
defect.alt <- .05
nominal.alt <- 1 - defect.alt # use the alternative defect rate, not the original one
defect.avg.alt <- 1 / defect.alt
defect.sd.alt <- sqrt((nominal.alt) / defect.alt^2)
cat("We'd expect to sample around", round(defect.avg.alt), "transistors\nThe standard deviation is", round(defect.sd.alt, 2))
## We'd expect to sample around 20 transistors
## The standard deviation is 19.49
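Because the mean and standard deviation of a geometric distribution both depend only on p, deriving them from a single argument avoids mixing parameters from different scenarios. The helper below is a hypothetical sketch (`geom.summary` is not part of the original answers).

```r
# Hypothetical helper: mean and SD of the number of trials
# until the first success, for success probability p
geom.summary <- function(p) {
  stopifnot(p > 0, p <= 1)
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

base.case <- geom.summary(0.02)  # 2% defect rate
alt.case  <- geom.summary(0.05)  # 5% defect rate

round(rbind(base.case, alt.case), 2)
```

A higher success probability shortens the expected wait and shrinks its spread, which is why both the mean and the standard deviation fall as p rises from .02 to .05.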
# There are five spots and five people, so the likelihood that an employee picks any one spot is p = .2. Exactly one of the n! equally likely arrangements is alphabetically ordered, so its probability is 1 / n!, where n is 5.
spots <- 5
A337a <- round(1 / factorial(spots), 4)
A337a
## [1] 0.0083
A337b <- factorial(spots)
A337b
## [1] 120
spots.alt <- 8
A337c <- round(factorial(spots.alt), 4)
A337c
## [1] 40320
boy <- .51
girl <- 1 - boy
kids <- 3
# The probability of having exactly two boys is given by dbinom, with n of 3 (children), x of 2 (boys), and a probability of .51.
A338a <- round(dbinom(2, size = kids, prob = boy), 4)
A338a
## [1] 0.3823
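# Listing all possible orderings of boy and girl among the three children means enumerating permutations with repetition.
library(gtools) # permutations() is assumed to come from the gtools package
x <- c("boy", "girl")
genders <- 2
permutations(n = genders, r = kids, v = x, repeats.allowed = T)
## [,1] [,2] [,3]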
## [1,] "boy" "boy" "boy"
## [2,] "boy" "boy" "girl"
## [3,] "boy" "girl" "boy"
## [4,] "boy" "girl" "girl"
## [5,] "girl" "boy" "boy"
## [6,] "girl" "boy" "girl"
## [7,] "girl" "girl" "boy"
## [8,] "girl" "girl" "girl"
Use the scenarios above to calculate the same probability as in part (a), this time with the addition rule for disjoint outcomes. Confirm that the answers from (a) and (b) match.
added <- (boy * boy * girl) + (boy * girl * boy) + (girl * boy * boy)
A338b <- round(added, 4)
A338b
## [1] 0.3823
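The same confirmation can be done without typing each scenario by hand: enumerate every birth order, attach a probability to each, and add up the disjoint outcomes with exactly two boys. The sketch below uses only base R (`expand.grid`); the names `outcomes`, `n.boys`, and `prob` are illustrative.

```r
boy  <- 0.51
kids <- 3

# All 2^3 birth orders, one row per ordering
outcomes <- expand.grid(first  = c("boy", "girl"),
                        second = c("boy", "girl"),
                        third  = c("boy", "girl"),
                        stringsAsFactors = FALSE)

# Number of boys in each ordering, and the probability of that ordering
n.boys <- rowSums(outcomes == "boy")
prob   <- boy^n.boys * (1 - boy)^(kids - n.boys)

# Addition rule: sum the disjoint orderings that contain exactly two boys
added.enum <- sum(prob[n.boys == 2])

round(c(enumerated = added.enum, dbinom = dbinom(2, size = kids, prob = boy)), 4)
```

The enumeration approach scales poorly (8 children would need 256 rows), which is the practical reason to prefer `dbinom` for larger families.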
kids.alt <- 8
A338c <- round(dbinom(3, size = kids.alt, prob = boy), 4)
A338c
## [1] 0.2098
# vs. A338c.alt <- round((boy * boy * boy * girl * girl * girl * girl * girl)...
Each volleyball serve has a 15% chance of success, and serves are independent events.
serve <- .15
miss <- 1 - serve
# We use the negative binomial distribution to determine the probability of observing the 3rd success on the 10th trial.
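x <- 10
success <- 3
# dnbinom() takes the number of failures before the final success, so pass x - success (7 misses) rather than the trial count.
A342a <- round(dnbinom(x - success, size = success, prob = serve, log = F), 4)
A342a
## [1] 0.039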
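The negative binomial probability can be verified from first principles. For the k-th success to land on trial n, the first n - 1 trials must contain exactly k - 1 successes, and trial n must be a success, giving choose(n - 1, k - 1) * p^k * (1 - p)^(n - k). R's `dnbinom` is parameterized by the number of failures (n - k), not the number of trials. A minimal sketch with illustrative names:

```r
p <- 0.15  # probability of a successful serve
n <- 10    # trial on which the final success occurs
k <- 3     # number of successes required

# Direct formula for the k-th success on the n-th trial
manual <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

# dnbinom(): first argument is the number of failures, n - k
builtin <- dnbinom(n - k, size = k, prob = p)

round(c(manual = manual, builtin = builtin), 4)
```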
Suppose there were 2 successes in the first 9 attempts. What is the probability that the 10th serve is a success? The serves are independent events, so each serve has a 15% chance of success. The 10th is no different from any other.
The probabilities in (a) and (b) should be different. Explain the reason for the discrepancy. The probability in (a) covers a whole sequence of events: exactly 2 successes in the first 9 serves, followed by a success on the 10th. The probability in (b) concerns a single independent event that happens to come after a series of other independent events.