Assignment summary

Chapter 5 - Odd-numbered questions between 1 and 9, 11 and 15
Chapter 7 - Even-numbered questions between 2 and 10, 12 and 22
Chapter 9 - Odd-numbered questions between 1 and 21, 23-25

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

Chapter 5

Question 1:

Q: (a) What is the probability of rolling a pair of dice and obtaining a total score of 9 or more? (b) What is the probability of rolling a pair of dice and obtaining a total score of 7?

A: A pair of dice has 6 x 6 = 36 equally likely outcomes, and the probability of an event is the number of favorable outcomes divided by 36. (a) A total of 9 or more arises from 4 + 3 + 2 + 1 = 10 outcomes (totals of 9, 10, 11, and 12), so the probability is 10/36 = 0.2777778. (b) A total of 7 arises from six outcomes (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), so the probability is 6/36 = 0.1666667
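
One way to double-check dice probabilities is to list every outcome and count. The sketch below assumes two fair six-sided dice; the variable names are my own and are not part of the assignment.

```r
# Enumerate all 36 equally likely outcomes for two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
totals <- rolls$die1 + rolls$die2

# The mean of a logical vector is the share of outcomes where it is TRUE
p_nine_or_more <- mean(totals >= 9)
p_seven <- mean(totals == 7)

c(nine_or_more = p_nine_or_more, seven = p_seven)
```

Counting this way avoids adding single-die probabilities together, which does not give the probability of a total.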

Question 3:

Q: A card is drawn at random from a deck. (a) What is the probability that it is an ace or a king? (b) What is the probability that it is either a red card or a black card?

A: For (a), a deck has four aces and four kings, so the number of favorable outcomes is eight. With 52 cards in the deck, the probability is 8/52 or 0.1538462. For (b), half the deck is red (26 cards) and half is black (26 cards), and no card is both, so the probability of drawing a red or a black card is 26/52 + 26/52 = 52/52 = 1. Every card in the deck is one or the other.

Question 5:

Q: A fair coin is flipped 9 times. What is the probability of getting exactly 6 heads?

A: Every time we flip a coin we have a 50/50 chance of getting either heads or tails. To find the probability of exactly six heads in nine flips, we use the binomial distribution formula P(x) = N! / [x!(N-x)!] * Pi^x * (1-Pi)^(N-x), where Pi is the probability of heads on a single flip. In this case we have:

Pi = 1/2
N = 9 trials
x = 6 heads

Therefore we have P(6 heads in 9) = 9! / [6!(9-6)!] * 0.5^6 * (1-0.5)^(9-6) = 84 * (1/512) or 0.1640625
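
As a quick check, the binomial formula can be evaluated in R two ways. This is a minimal sketch with variable names of my own choosing; choose() and dbinom() are base R functions.

```r
# Probability of exactly 6 heads in 9 fair flips, computed two ways
n_flips <- 9
n_heads <- 6
p_heads <- 0.5

# By hand: number of arrangements times the probability of each arrangement
by_formula <- choose(n_flips, n_heads) * p_heads^n_heads * (1 - p_heads)^(n_flips - n_heads)

# With the built-in binomial probability function
by_dbinom <- dbinom(x = n_heads, size = n_flips, prob = p_heads)

c(by_formula = by_formula, by_dbinom = by_dbinom)
```

Both values should agree, and choose(9, 6) makes it harder to drop a factorial than typing the formula out by hand.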

Question 7:

Q: You flip a coin three times. (a) What is the probability of getting heads on only one of your flips? (b) What is the probability of getting heads on at least one flip?

A: Three flips produce 2^3 = 8 equally likely sequences. (a) Exactly one head occurs in three of them (HTT, THT, TTH), so the probability is 3/8 or 0.375. (b) Here we are asking for at least one head, which could mean 1, 2, or 3 heads, that is, every sequence except TTT. The probability is then 1 - P(no heads) = 1 - (1/2)(1/2)(1/2) = 7/8. The total is then 0.875
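
The two coin-flip probabilities can also be checked with the binomial functions in base R. The sketch assumes a fair coin and uses variable names I made up for illustration.

```r
# Three flips of a fair coin
n_flips <- 3

# (a) exactly one head
p_exactly_one <- dbinom(x = 1, size = n_flips, prob = 0.5)

# (b) at least one head is the complement of getting no heads at all
p_at_least_one <- 1 - dbinom(x = 0, size = n_flips, prob = 0.5)

c(exactly_one = p_exactly_one, at_least_one = p_at_least_one)
```

Using the complement for "at least one" avoids having to add up the separate cases of 1, 2, and 3 heads.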

Question 9:

Q: A jar contains 10 blue marbles, 5 red marbles, 4 green marbles, and 1 yellow marble. Two marbles are chosen (without replacement). (a) What is the probability that one will be green and the other red? (b) What is the probability that one will be blue and the other yellow?

A: When I calculate without-replacement probabilities I need to remember to remove the item from my denominator once it is picked. (a) One green and one red can happen in two orders. If my first pick is green, and there are 4 green marbles, then that probability is 4 in 20. My second pick would then be red with probability 5 in 19. Multiplying these together we get P(green then red) = 0.0526316. Doing this in reverse yields P(red then green) = (5/20)(4/19) = 0.0526316. The same. Either order counts, so the answer is the sum, 0.1052632. (b) Using the same logic, P(blue then yellow) = (10/20)(1/19) = 0.0263158 and P(yellow then blue) = (1/20)(10/19) = 0.0263158, so the probability of one blue and one yellow is 0.0526316.
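
Another way to look at drawing two marbles without replacement is to count unordered pairs. The sketch below is my own illustration, assuming the jar described in the question (10 blue, 5 red, 4 green, 1 yellow).

```r
# Number of ways to pick any 2 marbles out of 20, ignoring order
total_pairs <- choose(20, 2)

# Favorable pairs: one marble of each required color
p_green_red <- (4 * 5) / total_pairs     # 4 greens paired with 5 reds
p_blue_yellow <- (10 * 1) / total_pairs  # 10 blues paired with 1 yellow

c(green_red = p_green_red, blue_yellow = p_blue_yellow)
```

Because pairs are unordered here, both pick orders are already included, which is equivalent to doubling the one-order product.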

Question 11:

Q: You win a game if you roll a die and get a 2 or a 5. You play this game 60 times.
a) What is the probability that you win between 5 and 10 times (inclusive)?
b) What is the probability that you will win the game at least 15 times?
c) What is the probability that you will win the game at least 40 times?
d) What is the most likely number of wins.
e) What is the probability of obtaining the number of wins in d?

Answer: The number of wins X is binomial with Pi = 1/3 (two winning faces out of six) and N = 60, so our formula is P(x) = [N! / x!(N-x)!] * Pi^x * (1 - Pi)^(N-x). Parts a) to c) ask about ranges of x, so the point probabilities have to be summed over every value in the range.
a) P(5 <= X <= 10) is the sum of P(x) for x = 5, ..., 10. A single term such as P(6) = 2.1274963 x 10^{-5} is not enough.
b) P(X >= 15) is the sum of P(x) for x = 15, ..., 60. The single term P(15) = 0.0441507 is only the probability of winning exactly 15 times.
c) P(X >= 40) is the sum of P(x) for x = 40, ..., 60. Its largest term is P(40) = 1.0368831 x 10^{-7}, so the total is very small.
d) The most likely number of wins is 20, the value of x that maximizes P(x). Here it equals the mean, N * Pi = 60 * (1/3).
e) The probability of obtaining the number of wins in d) is P(20) = [60! / 20!(60-20)!] * (1/3)^20 * (2/3)^40, or roughly 0.11.
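
Ranges of a binomial count can be handled in R with dbinom() and pbinom(). The following is a sketch under the assumptions of the question (N = 60 games, win probability 1/3); the variable names are mine.

```r
n_games <- 60
p_win <- 1/3

# a) between 5 and 10 wins inclusive: sum the point probabilities
p_5_to_10 <- sum(dbinom(5:10, size = n_games, prob = p_win))

# b) at least 15 wins is the same as more than 14 wins
p_at_least_15 <- pbinom(14, size = n_games, prob = p_win, lower.tail = FALSE)

# c) at least 40 wins is the same as more than 39 wins
p_at_least_40 <- pbinom(39, size = n_games, prob = p_win, lower.tail = FALSE)

# d) most likely number of wins: the x with the largest point probability
wins <- 0:n_games
most_likely <- wins[which.max(dbinom(wins, size = n_games, prob = p_win))]

# e) probability of exactly that many wins
p_most_likely <- dbinom(most_likely, size = n_games, prob = p_win)
```

Note the off-by-one in b) and c): pbinom() with lower.tail = FALSE returns P(X > q), so "at least 15" needs q = 14.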

Question 15:

Q: True/False: You are more likely to get a pattern of HTHHHTHTTH than HHHHHHHHTT when you flip a coin 10 times.

A: False. Every coin flip has a 1/2 probability of landing heads or tails, and the flips are independent, so the only consideration when comparing these two patterns is the number of flips. Since they both have the same number (10) and the probability of each flip is 0.5, each has P(any specific 10-flip pattern) = (1/2)^10 = 9.765625 x 10^{-4}, and neither is more likely than the other.

Chapter 7

Question 2:

Q: (a) What are the mean and standard deviation of the standard normal distribution? (b) What would be the mean and standard deviation of a distribution created by multiplying the standard normal distribution by 8 and then adding 75?

A: a) The standard normal distribution has a mean of 0 and a standard deviation of 1. b) Multiplying (scaling) changes both the mean and the standard deviation, while adding (shifting) changes only the mean. The new mean would be 8 x 0 (the original mean) + 75 = 75. The new standard deviation would be 8 x 1 (the original SD) = 8, because adding 75 moves every value by the same amount and leaves the spread unchanged.
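
The effect of scaling and shifting can be seen on simulated data. This is only an illustrative sketch: the seed and sample size are arbitrary choices of mine, and the simulated mean and SD will be close to, not exactly, the theoretical values.

```r
set.seed(1)          # arbitrary seed so the sketch is reproducible
z <- rnorm(10000)    # draws from the standard normal distribution
x <- 8 * z + 75      # scale by 8, then shift by 75

# The shift moves the mean but leaves the spread alone
c(mean_z = mean(z), sd_z = sd(z), mean_x = mean(x), sd_x = sd(x))
```

The SD of x comes out as 8 times the SD of z, with no contribution from the 75.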

Question 4:

Q: (a) What proportion of a normal distribution is within one standard deviation of the mean? (b) What proportion is more than 2.0 standard deviations from the mean? (c) What proportion is between 1.25 and 2.1 standard deviations above the mean?

A: a) The proportion of a normal distribution within 1 standard deviation of the mean is about 68% (0.6827). b) About 95.45% of the area under the curve is captured within 2 standard deviations, so the proportion more than 2.0 standard deviations from the mean is 1 - 0.9545 = 0.0455. c) The area under the curve of a normal distribution between 1.25 and 2.1 SDs above the mean is 0.0878.
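
These areas can be computed with pnorm(), which returns the area under the standard normal curve to the left of a z value. A minimal sketch, with variable names of my own:

```r
# a) within one SD of the mean
within_1sd <- pnorm(1) - pnorm(-1)

# b) more than two SDs from the mean, counting both tails
beyond_2sd <- 2 * pnorm(-2)

# c) between 1.25 and 2.1 SDs above the mean
between_125_210 <- pnorm(2.1) - pnorm(1.25)

round(c(a = within_1sd, b = beyond_2sd, c = between_125_210), 4)
```

Part b) uses symmetry: the area below -2 equals the area above +2, so one tail is doubled.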

Question 6:

Q: Assume a normal distribution with a mean of 70 and a standard deviation of 12. What limits would include the middle 65% of the cases?

A: If I know the proportion I want and I know my mean and standard deviation, I have all the pieces to calculate the endpoints for the matching area under the curve. The middle 65% leaves 17.5% in each tail, so I want the lower endpoint at p = 0.175 (which is 0.5 - 0.5 * 0.65) and the upper endpoint at p = 0.825 (which is 0.5 + 0.5 * 0.65)

q6l <- qnorm(p = 0.175, mean = 70, sd = 12)
q6u <- qnorm(p = 0.825, mean = 70, sd = 12)

Therefore, the limits that include the middle 65% of the cases are 58.7849285 & 81.2150715

Question 8:

Q: Assume the speed of vehicles along a stretch of I-10 has an approximately normal distribution with a mean of 71 mph and a standard deviation of 8 mph.
a) The current speed limit is 65 mph. What is the proportion of vehicles less than or equal to the speed limit?
b) What proportion of the vehicles would be going less than 50 mph?
c) A new speed limit will be initiated such that approximately 10% of vehicles will be over the speed limit. What is the new speed limit based on this criterion?
d) In what way do you think the actual distribution of speeds differs from a normal distribution?

A:

#a) To find the proportion of vehicles (area under the curve) at or below a given speed we can use the pnorm function
q8a <- pnorm(q = 65, mean = 71, sd = 8)
#b) Similar to the above, but for speeds below 50. The distribution is continuous, so P(X < 50) equals P(X <= 50)
q8b <- pnorm(q = 50, mean = 71, sd = 8)
#c) qnorm will help us find a value when given a particular proportion. In this case 90% of the vehicles will be under the speed limit.
q8c <- qnorm(p = .90, mean = 71, sd = 8)

A:
a) The proportion of vehicles at or below the speed limit is 22.6627352%
b) The proportion of vehicles going less than 50 mph is 0.4332448%
c) The new speed limit, set so that approximately 10% of vehicles will be over it, is 81.2524125 mph
d) The problem with assuming a normal distribution is that it is symmetric and unbounded, while real speeds cannot be negative and human behavior is rarely symmetric. In reality it is much more likely that drivers cluster at or above the limit with a thinner tail of slow vehicles, which would make the actual distribution negatively skewed.

Question 10:

Q: You want to use the normal distribution to approximate the binomial distribution. Explain what you need to do to find the probability of obtaining exactly 7 heads out of 12 flips.

#To find the mean we take the number of flips x the probability of heads
q10mean <- 12 * 0.5
# The variance is calculated using σ^2 = Nπ(1-π)
q10var <- (12) * (0.5) * (0.5)
# The standard deviation is the square root of the variance 
q10sd <- sqrt(q10var)
# the z score for 7 heads in 12 flips is then
z_q10 <- (7 - q10mean)/q10sd

A: “The probability of any one specific point is 0. The problem is that the binomial distribution is a discrete probability distribution, whereas the normal distribution is a continuous distribution.” The solution then is to round off and consider any value within 0.5 on either side of the number you’re evaluating. In this case that would be 6.5 to 7.5, which would then represent an outcome of 7 heads. Using this approach, we figure out the area under a normal curve from 6.5 to 7.5.

# to find the area under the curve up to 6.5 we use 
q10l <- pnorm(q = 6.5, mean = q10mean, sd = q10sd)
# to find the area under the curve up to 7.5 we use 
q10u <- pnorm(q = 7.5, mean = q10mean, sd = q10sd)

The approximation of the binomial probability is then the difference between the two pnorms or 0.1931769
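
For comparison, the normal approximation for exactly 7 heads in 12 flips can be set next to the exact binomial probability. This sketch is self-contained and uses its own variable names (assumed for illustration) rather than the ones in the chunks above.

```r
n_flips <- 12
p_heads <- 0.5
mu <- n_flips * p_heads                          # mean of the binomial
sigma <- sqrt(n_flips * p_heads * (1 - p_heads)) # SD of the binomial

# Normal approximation with the continuity correction (6.5 to 7.5)
approx_7 <- pnorm(7.5, mean = mu, sd = sigma) - pnorm(6.5, mean = mu, sd = sigma)

# Exact binomial probability of exactly 7 heads
exact_7 <- dbinom(7, size = n_flips, prob = p_heads)

c(normal_approx = approx_7, exact = exact_7)
```

The two values differ only in the fourth decimal place, which shows how well the continuity correction works even for a small number of flips.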

Question 12:

Q: Use the normal distribution to approximate the binomial distribution and find the probability of getting 15 to 18 heads out of 25 flips. Compare this to what you get when you calculate the probability using the binomial distribution. Write your answers out to four decimal places.

#To find the mean we take the number of flips x the probability of heads
q12mean <- 25 * 0.5
# The variance is calculated using σ^2 = Nπ(1-π)
q12var <- (25) * (0.5) * (0.5)
# The standard deviation is the square root of the variance 
q12sd <- sqrt(q12var)
# to find the area under the curve up to 14.5 (continuity correction for 15)
q12l <- pnorm(q = 14.5, mean = q12mean, sd = q12sd)
# to find the area under the curve up to 18.5 (continuity correction for 18)
q12u <- pnorm(q = 18.5, mean = q12mean, sd = q12sd)
# Calculate the difference to find the probability and format to only 4 decimal places 
q12bin_form <- format(round((q12u-q12l), 4), nsmall = 4)

The approximation of the binomial probability is then the difference between the two pnorms or 0.2037
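
The question also asks for a comparison with the binomial distribution itself. A minimal sketch of that comparison follows; it recomputes the normal approximation with the continuity correction (14.5 to 18.5) so that it stands on its own, and the variable names are my own.

```r
n_flips <- 25
p_heads <- 0.5
mu <- n_flips * p_heads
sigma <- sqrt(n_flips * p_heads * (1 - p_heads))

# Exact binomial: add up P(15), P(16), P(17), and P(18)
exact_15_18 <- sum(dbinom(15:18, size = n_flips, prob = p_heads))

# Normal approximation over 14.5 to 18.5
approx_15_18 <- pnorm(18.5, mean = mu, sd = sigma) - pnorm(14.5, mean = mu, sd = sigma)

format(round(c(exact = exact_15_18, normal = approx_15_18), 4), nsmall = 4)
```

To four decimal places the exact binomial probability is 0.2049 and the normal approximation is 0.2037, so the two agree to about one part in a thousand.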

Question 22:

Q: The following question uses data from the Angry Moods (AM) case study. For this problem, use the Anger Expression (AE) scores. (a) Compute the mean and standard deviation. (b) Then, compute what the 25th, 50th and 75th percentiles would be if the distribution were normal. (c) Compare the estimates to the actual 25th, 50th, and 75th percentiles.

q22file <- read.csv(file = "angry_moods.csv", header = TRUE)
summary(q22file)
##      Gender          Sports        Anger.Out        Anger.In    
##  Min.   :1.000   Min.   :1.000   Min.   : 9.00   Min.   :10.00  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:13.00   1st Qu.:15.00  
##  Median :2.000   Median :2.000   Median :16.00   Median :18.50  
##  Mean   :1.615   Mean   :1.679   Mean   :16.08   Mean   :18.58  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:18.00   3rd Qu.:22.00  
##  Max.   :2.000   Max.   :2.000   Max.   :27.00   Max.   :31.00  
##   Control.Out      Control.In    Anger_Expression
##  Min.   :14.00   Min.   :11.00   Min.   : 7.00   
##  1st Qu.:21.00   1st Qu.:18.25   1st Qu.:27.00   
##  Median :24.00   Median :22.00   Median :36.00   
##  Mean   :23.69   Mean   :21.96   Mean   :37.00   
##  3rd Qu.:27.00   3rd Qu.:24.75   3rd Qu.:44.75   
##  Max.   :32.00   Max.   :32.00   Max.   :68.00
AE_sd <- sd(q22file$Anger_Expression)
AE_mean <- mean(q22file$Anger_Expression)

A: a.) The mean of the Anger_Expression column is 37 while the standard deviation is 12.9414265
b.) I can compare that to the 25th, 50th, and 75th percentiles if the distribution were normal by running qnorm

quarter <- qnorm(p = 0.25, mean = AE_mean, sd = AE_sd)
half <- qnorm(p = 0.50, mean = AE_mean, sd = AE_sd)
threequarter <- qnorm(p = 0.75, mean = AE_mean, sd = AE_sd)

Comparing the two I can see:
My actual 25th percentile is 27.00 while my 25th percentile under normal distribution assumptions is 28.2711405
My actual 50th percentile (median) is 36.00 while my 50th percentile under normal distribution assumptions is 37
My actual 75th percentile is 44.75 while my 75th percentile under normal distribution assumptions is 45.7288595

What’s interesting is that all three actual percentiles are smaller than the estimates under normal assumptions. The actual median (36) also sits below the mean (37), which points to a positive skew in the actual data.

Chapter 9

Question 1:

Q: A population has a mean of 50 and a standard deviation of 6. (a) What are the mean and standard deviation of the sampling distribution of the mean for N = 16? (b) What are the mean and standard deviation of the sampling distribution of the mean for N = 20?

A: a.) The mean of the sampling distribution of the mean equals the population mean, so our mean is just 50. The standard deviation of the sampling distribution of the mean, however, is the standard deviation of the population divided by the square root of the sample size. In this case it would be 6 / sqrt(16) or 3/2. b.) If we follow that exact same logic, the mean holds at 50 and the standard deviation of the sampling distribution = 6 / sqrt(20) = 1.3416408

Question 3:

Q: What term refers to the standard deviation of the sampling distribution?

A: The term is “standard error”. For example, the standard error of the mean is the standard deviation of the sampling distribution of the mean.

Question 5:

Q: A questionnaire is developed to assess women’s and men’s attitudes toward using animals in research. One question asks whether animal research is wrong and is answered on a 7-point scale. Assume that in the population, the mean for women is 5, the mean for men is 4, and the standard deviation for both groups is 1.5. Assume the scores are normally distributed. If 12 women and 12 men are selected randomly, what is the probability that the mean of the women will be more than 1.5 points higher than the mean of the men?

A: The sampling distribution of the difference between means is centered on the difference between the population means, or 5-4 = 1. Since 1.5 is above that center, the probability has to be below 50%. The standard error of the difference is the square root of the sum of each variance over each sample size, and because we want the probability of a difference greater than 1.5, we take the upper tail.

n_q5 <- 12
sd_q5 <- 1.5
var_q5 <- sd_q5^2
s_error_delta <- sqrt((var_q5/n_q5)+(var_q5/n_q5))
mean_diff <- 5-4
q5_prob <- 1 - pnorm(q = 1.5, mean = mean_diff, sd = s_error_delta)

Therefore the probability that the mean of the women will be more than 1.5 points higher than the mean of the men, with 12 sampled from each group, is 20.7108089%

Question 7:

Q: If numerous samples of N = 15 are taken from a uniform distribution and a relative frequency distribution of the means is drawn, what would be the shape of the frequency distribution?

A: The shape of the frequency distribution would be approximately normal (by the central limit theorem), tightly centered around the mean of the uniform distribution
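
A small simulation can illustrate this. The sketch below assumes a uniform distribution on 0 to 1 and 10,000 repeated samples; both choices, the seed, and the variable names are mine and not part of the question.

```r
set.seed(42)            # arbitrary seed so the sketch is reproducible
n_per_sample <- 15      # N from the question
n_samples <- 10000      # number of repeated samples (assumed)

# Draw many samples of 15 from a uniform(0, 1) and keep each sample mean
sample_means <- replicate(n_samples, mean(runif(n_per_sample)))

# Compare the simulated center and spread with theory:
# a uniform(0, 1) has mean 0.5 and variance 1/12
c(sim_mean = mean(sample_means),
  sim_sd = sd(sample_means),
  theory_se = sqrt(1 / 12) / sqrt(n_per_sample))
```

Calling hist(sample_means) afterwards draws the relative frequency distribution, which should look bell-shaped even though the underlying population is flat.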

Question 9:

Q: What is the shape of the sampling distribution of r? In what way does the shape depend on the size of the population correlation?

A: In the book’s example they discuss a collection of correlation values from random samples of 12 students’ SAT verbal and quantitative scores. Because the value of r cannot exceed 1, a sample correlation has little room to land above a high positive population correlation (ρ) but a much broader range in which to land below it, so the shape of the sampling distribution of r is negatively skewed. The skewness depends on ρ, though. If the population correlation is high, e.g. 0.90, then the skew will be strongly negative; if ρ is 0 the distribution is symmetric, and if ρ is negative the skew is positive.

Question 11:

Q: A variable is normally distributed with a mean of 120 and a standard deviation of 5. Four scores are randomly sampled. What is the probability that the mean of the four scores is above 127?

q11_standard_error <- 5/sqrt(4)
q11_prob <- 1-pnorm(q = 127, mean = 120, sd = q11_standard_error)

A: The probability is only 0.255513%, which represents the small area under the normal distribution above the value of 127

Question 13:

Q: The mean GPA for students in School A is 3.0; the mean GPA for students in School B is 2.8. The standard deviation in both schools is 0.25. The GPAs of both schools are normally distributed. If 9 students are randomly sampled from each school, what is the probability that:
a) the sample mean for School A will exceed that of School B by 0.5 or more?
b) the sample mean for School B will be greater than the sample mean for School A?

n_q13 <- 9
sd_q13 <- 0.25
var_q13 <- sd_q13^2
s_error_delta_q13 <- sqrt((var_q13/n_q13)+(var_q13/n_q13))
mean_diff_q13 <- 3-2.8
# a) is an upper-tail probability: P(A - B >= 0.5)
q13_prob_a <- 1 - pnorm(q = 0.5, mean = mean_diff_q13, sd = s_error_delta_q13)
q13_prob_b <- pnorm(q = 0, mean = mean_diff_q13, sd = s_error_delta_q13)
q13_prob_a
## [1] 0.005454749
q13_prob_b
## [1] 0.04484301

a.) The probability that the mean of School A will exceed that of School B by 0.5 or more is 0.5454749% b.) The probability that the mean of School B will be greater than that of School A is 4.4843011%

Question 15:

Q: When solving problems where you need the sampling distribution of r, what is the reason for converting from r to z’?

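A: Because the sampling distribution of r is skewed whenever the population correlation is not 0, so probabilities cannot be computed from r directly with the normal distribution. By using the z’ conversion we get a statistic that is approximately normally distributed with a known standard error, 1 / sqrt(N - 3), and from that, a probability.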
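
To make the conversion concrete, here is a sketch with made-up numbers: a population correlation of 0.60, samples of N = 12, and the question "how likely is a sample r above 0.75?". None of these values come from the assignment. In base R, atanh() computes Fisher's z’ = 0.5 * ln((1 + r) / (1 - r)).

```r
rho <- 0.60       # hypothetical population correlation
n <- 12           # hypothetical sample size
r_cutoff <- 0.75  # hypothetical sample correlation of interest

# Convert both correlations to Fisher's z'
z_rho <- atanh(rho)
z_cutoff <- atanh(r_cutoff)

# Standard error of z' depends only on the sample size
se_z <- 1 / sqrt(n - 3)

# Probability that a sample r exceeds the cutoff, using the normal curve on the z' scale
p_above <- 1 - pnorm(q = z_cutoff, mean = z_rho, sd = se_z)
p_above
```

The normal calculation happens entirely on the z’ scale, where the distribution is roughly symmetric; doing the same thing directly on r would ignore the skew.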

Question 17:

Q: True/false: The standard error of the mean is smaller when N = 20 than when N = 10.

#Let's test using a population standard deviation of 1 and our two sample sizes 
1/sqrt(20)
## [1] 0.2236068
1/sqrt(10)
## [1] 0.3162278

A: TRUE: We can see from the above that the standard error of the mean (sigma / sqrt(N)) is smaller when using a larger sample size, which is what we should have expected

Question 19:

Q: True/false: You choose 20 students from the population and calculate the mean of their test scores. You repeat this process 100 times and plot the distribution of the means. In this case, the sample size is 100.

A: False: The sample size is 20. The value 100 is the number of samples taken.

Question 21:

Q: True/false: The median has a sampling distribution.

A: TRUE. Any statistic computed from a sample, including the median, has a sampling distribution.

The following questions use data from the Angry Moods (AM) case study.

Question 23:

Q: (a) How many men were sampled? (b) How many women were sampled?

q24file <- read.csv(file = "angry_moods.csv", header = TRUE)
data.frame(table(q24file$Gender))
##   Var1 Freq
## 1    1   30
## 2    2   48

A: We know from the description of the data set that 1 = males, 2 = females, meaning, we have 30 males and 48 females sampled.

Question 24:

Q: What is the mean difference between men and women on the Anger-Out scores?

q24file_men <- q24file[q24file$Gender == 1,]
q24file_women <- q24file[q24file$Gender == 2,]
men_AO <- mean(q24file_men$Anger.Out)
women_AO <- mean(q24file_women$Anger.Out)

A: The mean of the men’s anger out score is 16.5666667. The mean of the women’s anger out score is 15.7708333. The mean difference (men minus women) is 0.7958333

Question 25:

Q: Suppose in the population, the Anger-Out score for men is two points higher than it is for women. The population variances for men and women are both 20. Assume the Anger-Out scores for both genders are normally distributed. Given this information about the population parameters:
a) What is the mean of the sampling distribution of the difference between means?
b) What is the standard error of the difference between means?
c) What is the probability that you would have gotten this mean difference (see #24) or less in your sample?

A: a) The mean of the sampling distribution of the difference between means is simply the difference between the population means. That would be 2 in this case.
b) The standard error of the difference between means is simply the square root of the sum of the variances over each sample size. Which in this case would be sqrt((20/30)+(20/48)) or 1.040833
c) The probability of getting the mean difference (calculated above as 0.7958333) or less can be calculated from below, using the mean from a) and the standard error from b). It comes to approximately 0.124

pnorm(q = 0.7958333, mean = 2, sd = 1.040833)