## 
## Welcome to CUNY IS606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='IS606') will list the demos that are available.
## 
## Attaching package: 'IS606'
## The following object is masked from 'package:utils':
## 
##     demo

3.1 Area under the curve, Part I

What percent of a standard normal distribution \(N(\mu = 0, \sigma = 1)\) is found in each region? Be sure to draw a graph.

(a) Z < \(-1.35\)

\(Z = (x - \mu)/\sigma ==> x = (Z \times \sigma) + \mu = (-1.35 \times 1) + 0 = -1.35\)

So, when \(\mu = 0\) and \(\sigma = 1\), Z = x

If we “draw” our graph with normalPlot we see the percent from -1.35 and less (to -4) is P(-4 < x < -1.35) = 0.0885 (and we get the same with -5 to -1.35) or 8.85%.

normalPlot(mean = 0, sd = 1, bounds = c(-4, -1.35))

(b) Z > 1.48

The same applies here with x = Z = 1.48. normalPlot says the percent for (1.48 to 4) is 0.0694 or 6.94%

normalPlot(mean = 0, sd = 1, bounds = c(1.48, 4))

(c) \(-0.4\) < Z < 1.5

normalPlot says the percent or P(-0.4 to 1.5) = 0.589, or 58.9%

normalPlot(mean = 0, sd = 1, bounds = c(-0.4, 1.5))

(d) |Z| > 2

I am not sure how to interpret |Z|. One idea is that it means the same as Z > 2, which is P(2 to 4) = 0.0227, or 2.27%. However, I could interpret it to mean P(2 to 4) + P(-2 to -4) = 2 X 2.27% = 4.54%.

normalPlot(mean = 0, sd = 1, bounds = c(2, 4))

normalPlot(mean = 0, sd = 1, bounds = c(-4, -2))

3.3 GRE scores, Part I

Sophia who took the Graduate Record Examination (GRE) scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score for Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score for the Quantitative Reasoning was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal.

(a) Write down the short-hand for these two normal distributions.

Verbal Reasoning normal distribution \(N(\mu = 151, \sigma = 7)\)

Quantitative Reasoning normal distribution \(N(\mu = 153, \sigma = 7.67)\)

(b) What is Sophia’s Z-score on the Verbal Reasoning section? On the Quantitative Reasoning section? Draw a standard normal distribution curve and mark these two Z-scores.

Verbal Reasoning Z \(= (x - \mu)/\sigma = (160 - 151)/7 = 1.286\)

Quantitative Reasoning Z \(= (x - \mu)/\sigma = (157 - 153)/7.67 = 0.5215\)

It’s hard to draw a point with normalplot, but I shaded the part of the distribution from Sophia’s score up. The Verbal Reasoning percent that made her score or higher is 9.93 or ~10%. She was in the upper 10% or 90% scored less than she did. On Quantitative Reasoning the percent was 30.1%.

normalPlot(mean = 151, sd = 7, bounds = c(160, 180))

normalPlot(mean = 153, sd = 7.67, bounds = c(157, 180))

(c) What do these Z-scores tell you?

The Z-score of an observation is the number of standard deviations it falls above or below the mean. So, Sophia’s Verbal Reasoning Z-score of 1.286 means she scores 1.286 standard deviations above the mean (~10%). Her Quantitative Reasoning Z-score of 0.5215 is over half a standard deviation above the mean (~30%). She did better on Verbal Reasoning.

(d) Relative to others, which section did she do better on?

She did better on Verbal Reasoning, because that Z-score is higher (over 2 times).

(e) Find her percentile scores for the two exams.

Verbal Reasoning Percentage = 9.93 or ~10%, which I think we normally call the 90th percentile.

Quantitative Reasoning Percentage = 30.1%, which is close to the 70th percentile

(f) What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?

About 10% did better on the Verbal Reasoning section.

About 30% did better on the Quantitative Reasoning section.

(g) Explain why simply comparing her raw scores from the two sections would lead to the incorrect conclusion that she did better on the Quantitative Reasoning section.

Her raw scores were very close (160 & 157) and the means and SD didn’t seem that different. However, 90th versus 70th percentile is a big difference.

(h) If the distributions of the scores on these exams are not nearly normal, would your answers to parts (b) - (f) change? Explain your reasoning.

Part (b) would work because Z-scores don’t depend on normal distribution, however the other question parts wouldn’t work, because we wouldn’t have an abnormal distribution table to use or abnormalplot.

3.17 Scores on stats final

Below are final exam scores of 20 Introductory Statistics students.

57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94

(a) The mean score is 77.7 points. with a standard deviation of 8.44 points. Use this information to determine if the scores approximately follow the 68-95-99.7% Rule.

77.7 + 8.44 = 86.14, 77.7 - 8.44 = 69.26

Rule Check 1-SD 68% 2-SD 95% 3-SD 99.7%
Range 69 to 86 61 to 95 52 to 103
Count 14/20 19/20 20/20
Percent .70 = 70% .95 = 95% 1 = 100%

The 68% or 1-SD is a little off at 70%, but not much and the other two work.

(b) Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below.

I think it is skewed to the left a little, but the Qtest doesn’t look bad, so maybe it is close enough to normal.

scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
qqnormsim(scores)

3.21 Married women

The 2010 American Community Survey estimates that 47.1% of women ages 15 years and over are married.

(a) We randomly select three women between these ages. What is the probability that the third woman selected is the only one who is married?

\(P(N-N-M) = P(1st woman not married) \times P(2nd woman not married) \times P(3rd woman is married)\)

P(N-N-M) = (0.529) (0.529) (0.471) = 0.132 = 13.2%

Geometric Distribution = \((1 - p)^{n-1}p\) , where “success” is p in n trials

P(N-N-M) = \((1 - 0.471)^{2} \times 0.471 = (0.529)^{2}(0.471) = 0.132 = 13.2%\)

(b) What is the probability that all three randomly selected women are married?

P(M-M-M) = P(1st woman is married) X P(2nd woman is married) X P(3rd woman is married)

P(M-M-M) = (0.471) (0.471) (0.471) = 0.104 = 10.4%

(c) On average, how many women would you expect to sample before selecting a married woman? What is the standard deviation?

\(\mu = 1/p = 1/0.471 = 2.12\), so just over 2 or by the 3rd woman

\(\sigma = \sqrt((1-p)/p^{2}) = 1.54\)

(d) If the proportion of married women was actually 30%, how many women would you expect to sample before selecting a married woman? What is the standard deviation?

\(\mu = 1/p = 1/0.30 = 3.33\), so over 3 or by the 4th woman

\(\sigma = \sqrt((1-p)/p^{2}) = 2.79\)

(e) Based on your answers to parts (c) and (d), how does decreasing the probability of an event affect the mean and standard deviation of the wait time until success?

They increase or go up.

3.37 Exploring combinations

The formula for the number of ways to arrange n objects is \(n! = n \times (n - 1) \times · · · \times 2 \times 1\). This exercise walks you through the derivation of this formula for a couple of special cases.

A small company has five employees: Anna, Ben, Carl, Damian, and Eddy. There are five parking spots in a row at the company, none of which are assigned, and each day the employees pull into a random parking spot. That is, all possible orderings of the cars in the row of spots are equally likely.

(a) On a given day, what is the probability that the employees park in alphabetical order?

P(alpha-parking) \(= P(Anna) \times P(Ben) \times P(Carl) \times P(Damian) \times P(Eddy)\)

\(= 1/5 \times 1/4 \times 1/3 \times 1/2 \times 1/1\)

\(= 1/5! = 1/(5 \times 4 \times 3 \times 2 \times 1) = 1/120 = 0.0083 =\) 0.83%

(b) If the alphabetical order has an equal chance of occurring relative to all other possible orderings, how many ways must there be to arrange the five cars?

n! = 5! = 120

(c) Now consider a sample of 8 employees instead. How many possible ways are there to order these 8 employees’ cars?

n! = 8! \(= (8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1) = 40,320\)

3.41 Sampling at school

For a sociology class project you are asked to conduct a survey on 20 students at your school. You decide to stand outside of your dorm’s cafeteria and conduct the survey on a random sample of 20 students leaving the cafeteria after dinner one evening. Your dorm is comprised of 45% males and 55% females.

(a) Which probability model is most appropriate for calculating the probability that the 4th person you survey is the 2nd female? Explain.

The negative binomial distribution describes the probability of observing the kth success on the nth trial. nth trial = 4th person, kth success = 2nd female

The negative binomial distribution must meet these 4 conditions.

  1. The trials are independent.
  2. Each trial outcome can be classified as a success or failure.
  3. The probability of a success (p) is the same for each trial.
  4. The last trial must be a success.

(b) Compute the probability from part (a).

\(P(the kth success on the nth trial) = {n - 1 \choose k - 1}p^{k}(1-p)^{n-k}\)

\(P(the kth success on the nth trial) = (n - 1)!/(k-1)!(n-k)! p^{k}(1-p)^{n-k}\)

\(P(the kth success on the nth trial) = 3!/1!2! \times (.55)^{2} \times (.45)^{2}\)

\(P(the kth success on the nth trial) = 3 (0.3025) (0.2025) = 0.184\)

MFMF = (.45)(.55)(.45)(.55) MMFF = (.45)(.45)(.55)(.55) FMMF = (.55)(.45)(.45)(.55)

(c) The three possible scenarios that lead to 4th person you survey being the 2nd female are {M,M,F,F}, {M,F,M,F}, {F,M,M,F} One common feature among these scenarios is that the last trial is always female. In the first three trials there are 2 males and 1 female. Use the binomial coefficient to confirm that there are 3 ways of ordering 2 males and 1 female.

The binomial coefficient \(= (n - 1)!/(k-1)!(n-k)! = (4 - 1)!/(2 - 1)!(4 - 2)! = 3!/1!2! = 3\)

(d) Use the findings presented in part (c) to explain why the formula for the coefficient for the negative binomial is n-1 choose k-1 while the formula for the binomial coefficient is n choose k.

In the binomial formula we are looking at k successes out of n combinations, which give you the n and k factorials. The negative binomial formula is when the last trial is fixed as a success, so we are looking at the remaining k-1 successes in n-1 combinations.