Data606- Chapter 3 Practice

Chapter 3 - Distributions of Random Variables - Practice

Practice: 3.1 (see normalPlot), 3.3, 3.17 (use qqnormsim from lab 3), 3.21, 3.37, 3.41

3.1 Area under the curve, Part I. What percent of a standard normal distribution N(μ = 0, σ = 1) is found in each region? Be sure to draw a graph.

#(a) Z < -1.35
normalPlot(mean = 0, sd = 1, bounds = c(-5, -1.35))

#(b) Z > 1.48


normalPlot(mean = 0, sd = 1, bounds = c(1.48, 5))

#(c) -0.4 < Z < 1.5

normalPlot(mean = 0, sd = 1, bounds = c(-0.4, 1.5))

#(d) |Z| > 2

normalPlot(mean = 0, sd = 1, bounds = c(2, 4))

normalPlot(mean = 0, sd = 1, bounds = c(-4, -2))
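The plots shade each region, but the question also asks for the percentages. A minimal sketch using base R's pnorm() (the standard normal CDF) is given here; the variable names are illustrative, and part (c) is assumed to be the region -0.4 < Z < 1.5.

```r
# Areas under the standard normal curve, expressed as percentages
area_a <- pnorm(-1.35) * 100                 # (a) P(Z < -1.35)
area_b <- (1 - pnorm(1.48)) * 100            # (b) P(Z > 1.48)
area_c <- (pnorm(1.5) - pnorm(-0.4)) * 100   # (c) P(-0.4 < Z < 1.5), assumed region
area_d <- 2 * pnorm(-2) * 100                # (d) P(|Z| > 2), both tails by symmetry

round(c(a = area_a, b = area_b, c = area_c, d = area_d), 2)
```

This should give roughly 8.85%, 6.94%, 58.86% and 4.55% for parts (a) through (d).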

3.3 GRE scores, Part I.

Sophia who took the Graduate Record Examination (GRE) scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score for Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score for the Quantitative Reasoning was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal.

(a) Write down the short-hand for these two normal distributions.

Verbal Reasoning normal distribution N(μ=151,σ=7)

Quantitative Reasoning normal distribution N(μ=153,σ=7.67)

(b) What is Sophia’s Z-score on the Verbal Reasoning section? On the Quantitative Reasoning section? Draw a standard normal distribution curve and mark these two Z-scores.

Verbal Reasoning Z =(x−μ)/σ=(160−151)/7=1.286

Quantitative Reasoning Z =(x−μ)/σ=(157−153)/7.67=0.5215

normalPlot(mean = 151, sd = 7, bounds = c(160 , 161))

normalPlot(mean = 153, sd = 7.67, bounds = c(157 , 158))
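The question asks for both Z-scores marked on a single standard normal curve. A minimal base-R sketch of one way to draw that is shown here; the names z_verbal and z_quant and the colors are illustrative choices, not part of the original solution.

```r
# Z-scores computed from the values given in the problem
z_verbal <- (160 - 151) / 7
z_quant  <- (157 - 153) / 7.67

# Standard normal density with a dashed vertical line at each Z-score
curve(dnorm(x), from = -4, to = 4, xlab = "Z", ylab = "density",
      main = "Sophia's Z-scores on the standard normal curve")
abline(v = c(z_verbal, z_quant), col = c("darkblue", "darkred"), lty = 2)
legend("topleft", legend = c("Verbal", "Quantitative"),
       col = c("darkblue", "darkred"), lty = 2)
```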

(c) What do these Z-scores tell you?

The Z-score of an observation is the number of standard deviations it falls above or below the mean; on the standardized scale the mean is 0. Her Verbal Reasoning Z-score of 1.286 means she scored 1.286 sd above the mean, and her Quantitative Reasoning Z-score of 0.5215 means she scored 0.5215 sd above the mean.

(d) Relative to others, which section did she do better on?

She did better on Verbal Reasoning: her Z-score there (1.286 sd above the mean) is higher than her Z-score on Quantitative Reasoning (0.5215 sd above the mean).

(e) Find her percentile scores for the two exams.

# percentile for Verbal Reasoning
pnorm(1.286) * 100

## [1] 90.07785

# percentile for Quantitative Reasoning
pnorm(0.5215) * 100

## [1] 69.89907

(f) What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?

#percent of the test takers did better than her on the Verbal Reasoning
(1- pnorm(1.286)) * 100

## [1] 9.922153

#percent of the test takers did better than her on the Quantitative Reasoning section
(1- pnorm(0.5215)) * 100

## [1] 30.10093
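The same percentages can be obtained without rounding the Z-scores first, since pnorm() accepts a mean and standard deviation directly. A small sketch follows, with illustrative variable names; lower.tail = FALSE returns the upper-tail area.

```r
# Percentiles from the raw scores; pnorm() standardizes internally
pct_verbal <- pnorm(160, mean = 151, sd = 7) * 100
pct_quant  <- pnorm(157, mean = 153, sd = 7.67) * 100

# Upper-tail percentages: share of test takers who scored higher than Sophia
above_verbal <- pnorm(160, mean = 151, sd = 7, lower.tail = FALSE) * 100
above_quant  <- pnorm(157, mean = 153, sd = 7.67, lower.tail = FALSE) * 100

round(c(pct_verbal, pct_quant, above_verbal, above_quant), 1)
```

The results should agree with the values computed from the rounded Z-scores to about one decimal place.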

(g) Explain why simply comparing her raw scores from the two sections would lead to the incorrect conclusion that she did better on the Quantitative Reasoning section.

The two distributions have different means and standard deviations, so comparing raw scores would lead to an incorrect conclusion. Converting to Z-scores puts both scores on a common scale where the mean is always 0, and uses the sd to gauge how far each score lies from the mean of its own distribution.

(h) If the distributions of the scores on these exams are not nearly normal, would your answers to parts (b) - (f) change? Explain your reasoning.

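Partly. The Z-scores in part (b) would not change, because a Z-score can be computed for any distribution from its mean and sd. The percentiles in parts (e) and (f) would change, because they were calculated with the normal model (pnorm), which does not apply if the distributions are not nearly normal.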

3.17 Scores on stats final

Below are final exam scores of 20 Introductory Statistics students.

57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94

(a) The mean score is 77.7 points, with a standard deviation of 8.44 points. Use this information to determine if the scores approximately follow the 68-95-99.7% Rule.

As per the Rule, 68% of observations fall within 1 sd of the mean (i.e., between 69.26 and 86.14). Actual percentage = 14/20 * 100 = 70%

As per the Rule, 95% of observations fall within 2 sd of the mean (i.e., between 60.82 and 94.58). Actual percentage = 19/20 * 100 = 95% (only the score of 57 falls outside this range)

As per the Rule, 99.7% of observations fall within 3 sd of the mean (i.e., between 52.38 and 103.02). Actual percentage = 20/20 * 100 = 100%

The scores approximately follow the 68-95-99.7% Rule.
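The counts can be checked directly rather than by eye. A minimal sketch, assuming the mean and sd given in the problem (77.7 and 8.44); the names m, s and within_pct are illustrative.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
m <- 77.7   # mean given in the problem
s <- 8.44   # standard deviation given in the problem

# Percent of scores within k standard deviations of the mean, for k = 1, 2, 3
within_pct <- sapply(1:3, function(k) mean(abs(scores - m) <= k * s) * 100)
names(within_pct) <- c("1 sd", "2 sd", "3 sd")
within_pct
```

For these data this should report 70%, 95% and 100%, which is close to the 68-95-99.7% Rule.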

(b) Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below.

The histogram is slightly left skewed, but based on the Q-Q plot the data look close enough to normal.

normalPlot(mean = 77.7, sd = 8.44)

scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
hist(scores, prob=TRUE, 
     xlab="scores", ylim=c(0, .1),
     main="normal curve over histogram")
curve(dnorm(x, mean=mean(scores), sd=sd(scores)), 
      col="darkblue", lwd=2, add=TRUE, yaxt="n")

qqnormsim(scores)
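As a rough cross-check that does not depend on the lab's qqnormsim helper, base R's qqnorm() and qqline() draw a single normal Q-Q plot with a reference line. This is only a sketch of an alternative view and does not produce the simulated comparison panels.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# qqnorm() plots sample quantiles against theoretical normal quantiles
qq <- qqnorm(scores, main = "Normal Q-Q plot of final exam scores")
# qqline() adds a reference line through the first and third quartiles
qqline(scores, col = "darkblue", lwd = 2)
```

Points that stay near the reference line suggest the normal model is a reasonable approximation.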

3.21 Married women

The 2010 American Community Survey estimates that 47.1% of women ages 15 years and over are married.

(a) We randomly select three women between these ages. What is the probability that the third woman selected is the only one who is married?

P(N, N, M) = P(1st not married) × P(2nd not married) × P(3rd married)

P(N, N, M) = (1 − 0.471) × (1 − 0.471) × 0.471 = (0.529)(0.529)(0.471) = 0.132

Using the geometric distribution, P(first success on the nth trial) = (1 − p)^(n−1) × p, where p is the probability of success on each trial

P(N, N, M) = (1 − 0.471)^2 × 0.471 = (0.529)^2 (0.471) = 0.132 = 13.2%
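The hand calculation can be confirmed with R's built-in geometric density. One detail to watch in this sketch: dgeom() is parameterized by the number of failures before the first success, so the third woman being the first success corresponds to x = 2. Variable names are illustrative.

```r
p <- 0.471                    # P(married), from the problem statement

# Direct product: not married, not married, married
by_hand <- (1 - p)^2 * p

# dgeom() counts failures before the first success, so x = 2 here
with_dgeom <- dgeom(2, prob = p)

round(c(by_hand = by_hand, dgeom = with_dgeom), 3)
```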

(b) What is the probability that all three randomly selected women are married?

P(M, M ,M ) = 0.471 * 0.471 * 0.471 = 0.104

(c) On average, how many women would you expect to sample before selecting a married woman? What is the standard deviation?

mean=1/p=1/0.471=2.12, so just over 2 or by the 3rd woman

sd = sqrt((1 − p)/p^2) = sqrt(0.529/0.471^2) = 1.54

(d) If the proportion of married women was actually 30%, how many women would you expect to sample before selecting a married woman? What is the standard deviation?

mean=1/p=1/0.30=3.33, so over 3 or by the 4th woman

sd = sqrt((1 − p)/p^2) = sqrt(0.70/0.30^2) = 2.79
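Parts (c) and (d) use the same two formulas with different values of p, so they can be wrapped in a small helper. geom_summary is a hypothetical name introduced here for illustration, not a function from any package.

```r
# Mean and standard deviation of the wait time until the first success
# (hypothetical helper; p is the probability of success on each trial)
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

part_c <- geom_summary(0.471)   # 47.1% married
part_d <- geom_summary(0.30)    # 30% married

round(rbind(part_c, part_d), 2)
```

Comparing the two rows shows both the mean and the sd growing as p shrinks.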

(e) Based on your answers to parts (c) and (d), how does decreasing the probability of an event affect the mean and standard deviation of the wait time until success?

If the probability decreases, the mean and sd of the wait time until success both increase.

3.37 Exploring combinations

The formula for the number of ways to arrange n objects is n!=n * (n−1) * ⋅⋅⋅ * 2 * 1 . This exercise walks you through the derivation of this formula for a couple of special cases.

A small company has five employees: Anna, Ben, Carl, Damian, and Eddy. There are five parking spots in a row at the company, none of which are assigned, and each day the employees pull into a random parking spot. That is, all possible orderings of the cars in the row of spots are equally likely.

(a) On a given day, what is the probability that the employees park in alphabetical order?

P(parking in alphabetical order: Anna, Ben, Carl, Damian, Eddy) = 1/5! = 1/(5×4×3×2×1) = 1/120 ≈ 0.0083

(b) If the alphabetical order has an equal chance of occurring relative to all other possible orderings, how many ways must there be to arrange the five cars?

n! = 5! = 120

(c) Now consider a sample of 8 employees instead. How many possible ways are there to order these 8 employees’ cars?

n! = 8! =(8×7×6×5×4×3×2×1)=40,320
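These counts are easy to verify with base R's factorial(). A short sketch with illustrative variable names:

```r
orderings_5 <- factorial(5)          # ways to order five cars
p_alphabetical <- 1 / orderings_5    # every ordering is equally likely
orderings_8 <- factorial(8)          # ways to order eight cars

c(orderings_5 = orderings_5,
  p_alphabetical = round(p_alphabetical, 4),
  orderings_8 = orderings_8)
```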

3.41 Sampling at school

For a sociology class project you are asked to conduct a survey on 20 students at your school. You decide to stand outside of your dorm’s cafeteria and conduct the survey on a random sample of 20 students leaving the cafeteria after dinner one evening. Your dorm is comprised of 45% males and 55% females.

(a) Which probability model is most appropriate for calculating the probability that the 4th person you survey is the 2nd female? Explain.

The negative binomial distribution can be used here. It describes the probability of observing the kth success on the nth trial. Here the nth trial is the 4th person and the kth success is the 2nd female.

The negative binomial distribution must meet these 4 conditions.

The trials are independent. Each trial outcome can be classified as a success or failure. The probability of a success (p) is the same for each trial. The last trial must be a success.

(b) Compute the probability from part (a).

P(kth success on the nth trial) = (n−1 choose k−1) × p^k × (1−p)^(n−k)

P(kth success on the nth trial) = (n−1)! / ((k−1)! (n−k)!) × p^k × (1−p)^(n−k)

P(2nd success on the 4th trial) = 3! / (1! 2!) × (0.55)^2 × (0.45)^2

P(2nd success on the 4th trial) = 3 × 0.3025 × 0.2025 = 0.184
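The result can be cross-checked against R's negative binomial density. In this sketch, note that dnbinom() is parameterized by the number of failures before the size-th success, so the call uses n − k = 2 males and size = k = 2 females. Variable names are illustrative.

```r
p <- 0.55   # P(female), from the problem statement
n <- 4      # trial on which the kth success occurs
k <- 2      # number of successes (females)

# Formula from the text: (n-1 choose k-1) * p^k * (1-p)^(n-k)
by_formula <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

# dnbinom() counts failures before the size-th success
with_dnbinom <- dnbinom(n - k, size = k, prob = p)

round(c(by_formula = by_formula, dnbinom = with_dnbinom), 3)
```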

(c) The three possible scenarios that lead to 4th person you survey being the 2nd female are {M,M,F,F}, {M,F,M,F}, {F,M,M,F}. One common feature among these scenarios is that the last trial is always female. In the first three trials there are 2 males and 1 female. Use the binomial coefficient to confirm that there are 3 ways of ordering 2 males and 1 female.

The binomial coefficient = (n−1)! / ((k−1)! (n−k)!) = (4−1)! / ((2−1)! (4−2)!) = 3! / (1! 2!) = 3
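The three scenarios can also be enumerated rather than counted by formula. A minimal sketch using base R's combn() to choose which of the first three positions holds the one female; the names female_pos and orderings are illustrative.

```r
# Each column of combn(3, 1) is one choice of position for the single female
female_pos <- combn(3, 1)

orderings <- apply(female_pos, 2, function(pos) {
  first_three <- rep("M", 3)
  first_three[pos] <- "F"
  paste(c(first_three, "F"), collapse = ",")   # the 4th person is always female
})

orderings
length(orderings) == choose(3, 1)
```

The enumeration lists the same three scenarios given in the question, and its length matches the binomial coefficient.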

(d) Use the findings presented in part (c) to explain why the formula for the coefficient for the negative binomial is n-1 choose k-1 while the formula for the binomial coefficient is n choose k.

In the binomial formula we count the ways to place k successes among n trials, which gives n choose k. In the negative binomial formula the last trial is fixed as a success, so only the remaining k-1 successes can be arranged among the first n-1 trials, which gives n-1 choose k-1.