STA 111 Lab 5

R Markdown

Question 1

Is our population parameter a population mean or a population proportion?

It is a population proportion. The parameter of interest is the proportion of all M&Ms that are blue, which describes a categorical outcome (blue or not blue) rather than the average of a numerical measurement.

Question 2

There are roughly 2.3 million M&M’s in the world. Based on this, how many M&M’s in the world should be blue? Round your answer to the nearest whole number.

# Create our population: Repeat the phrase "not blue" 23000 times
population <- rep("not blue", 23000)
# Choose 24% of these to be blue M&Ms
set.seed(100)
makeBlue <- sample(1:23000, 23000*.24)
# Record these blue M&Ms in our population
population[makeBlue] <- "blue"
rm(makeBlue)

table(population)

## population
##     blue not blue 
##     5520    17480

prop.table(table(population))

## population
##     blue not blue 
##     0.24     0.76
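
The expected number of blue M&Ms is the population size times the proportion that is blue. The sketch below assumes the 24% blue rate used in the code above and wraps the arithmetic in a small helper, `expected_blue()`, a hypothetical name introduced only for illustration. It covers both the 23,000 candies the code builds and the 2.3 million quoted in the question.

```r
# Hypothetical helper: expected number of blue M&Ms in a population of size N,
# assuming a proportion p of them are blue (24% as in the code above)
expected_blue <- function(N, p = 0.24) {
  round(N * p)  # round to the nearest whole candy
}

expected_blue(23000)  # population size built by the code: returns 5520
expected_blue(2.3e6)  # population size stated in the question: returns 552000
```

The first value matches the 5520 blue candies in the table above.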

Question 3

Based on the tables, have we created our population correctly?

Yes. Of the 23,000 M&Ms in our population, exactly 5,520 (24%) are blue and 17,480 (76%) are not blue. This matches the intended population proportion of 0.24, so the population was created correctly.
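
The same check can be done in code instead of by reading the tables. This sketch rebuilds the population exactly as in the chunk above and uses `stopifnot()`, so the script stops with an error if the size, the count of blue candies, or the proportion is off.

```r
# Rebuild the population exactly as in the chunk above
population <- rep("not blue", 23000)
set.seed(100)
makeBlue <- sample(1:23000, 23000 * .24)
population[makeBlue] <- "blue"
rm(makeBlue)

# Checks: right size, right number of blue candies, right proportion
stopifnot(length(population) == 23000)
stopifnot(sum(population == "blue") == 5520)
stopifnot(isTRUE(all.equal(mean(population == "blue"), 0.24)))
```

Because `sample()` draws indices without replacement, exactly 5,520 distinct candies are relabeled blue whatever the seed.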

Question 4

What must be true for a sample to be a simple random sample (SRS)?

For a sample to be a simple random sample, every possible group of 50 M&Ms must have the same chance of being the sample that is chosen. In particular, every M&M in the population has an equal chance of being selected.
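
The "every possible sample is equally likely" property can be checked by simulation on a tiny population. The sketch below uses a hypothetical toy population of five labeled candies (`toy`, not part of the lab), draws many samples of size 2 with `sample()`, which draws without replacement by default, and counts how often each possible pair appears.

```r
# Hypothetical toy population of 5 labeled candies, for illustration only
toy <- c("A", "B", "C", "D", "E")
set.seed(1)

# Draw 10,000 samples of size 2 (without replacement by default).
# Sorting and pasting turns each sample into a label such as "AC".
pairs <- replicate(10000, paste(sort(sample(toy, size = 2)), collapse = ""))

# There are choose(5, 2) = 10 possible samples; under simple random
# sampling each should appear roughly 10000 / 10 = 1000 times
pair_counts <- table(pairs)
print(pair_counts)
print(length(pair_counts))
```

All ten pairs show up with similar frequencies, which is what equal selection probability for every possible sample predicts.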

sample(population, size = 50)

##  [1] "blue"     "not blue" "not blue" "not blue" "not blue" "not blue"
##  [7] "not blue" "not blue" "not blue" "blue"     "not blue" "not blue"
## [13] "not blue" "not blue" "not blue" "not blue" "not blue" "blue"    
## [19] "not blue" "not blue" "blue"     "not blue" "not blue" "not blue"
## [25] "not blue" "not blue" "not blue" "not blue" "blue"     "not blue"
## [31] "blue"     "not blue" "not blue" "not blue" "blue"     "not blue"
## [37] "not blue" "blue"     "blue"     "blue"     "not blue" "not blue"
## [43] "blue"     "not blue" "not blue" "blue"     "not blue" "not blue"
## [49] "blue"     "not blue"

SRS1 <- sample(population, size = 50)

Question 5

If I wanted to store the sample under the name SimpleRandomSample1, how would I need to change the code in the chunk above?

Change the object name on the left side of the assignment arrow from SRS1 to SimpleRandomSample1. The code would look like this:

SimpleRandomSample1 <- sample(population, size = 50)

prop.table(table(SimpleRandomSample1))

## SimpleRandomSample1
##     blue not blue 
##     0.24     0.76

Question 6

What proportion of M&M’s in this sample are blue? This is our sample statistic for this sample.

In this sample, 13 of the 50 M&Ms are blue, so the sample proportion is 0.26 (26%).
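
The count and the proportion can be computed directly from a sample vector. The sketch below uses a hypothetical vector `samp` built to mirror the reported result (13 blue out of 50) rather than the actual SRS1 object. Comparing to `"blue"` gives `TRUE`/`FALSE` values; `sum()` counts the `TRUE`s and `mean()` gives their proportion.

```r
# Hypothetical sample mirroring the reported result: 13 blue out of 50
samp <- c(rep("blue", 13), rep("not blue", 37))

n_blue <- sum(samp == "blue")   # number of blue candies: 13
p_hat  <- mean(samp == "blue")  # sample proportion: 0.26
n_blue
p_hat
```

`mean(samp == "blue")` gives the same value as the `sum(samp=="blue")/50` expression used in the simulation loop later in the lab.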

Question 7

Suppose we use the sample statistic in Question 6 as an estimate of our population parameter. Do we overestimate or underestimate, and by how much?

If we use the sample proportion of 0.26 to estimate the population proportion of blue M&Ms, we overestimate the true value of 0.24 by 0.02 (2 percentage points).

Question 8

Do we expect our sample statistic and our population parameter to be the same? Explain why or why not.

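No. Each simple random sample contains a different, randomly chosen set of candies, so the sample proportion varies from sample to sample; this is sampling variability. With only 50 M&Ms in each sample, the observed proportion of blue M&Ms will usually differ somewhat from the population proportion of 0.24. Larger samples reduce this variability, and when np and n(1 - p) (sample size times the population proportion, and sample size times one minus that proportion) are both large, the sampling distribution of the sample proportion is less skewed and closer to a normal distribution.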
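
Sampling variability is easy to see by drawing several samples in a row. This sketch rebuilds the population as in the earlier chunk, draws ten independent samples of 50, and prints their sample proportions; it also evaluates the np and n(1 - p) quantities for n = 50 and p = 0.24. The exact proportions depend on the random number stream.

```r
# Rebuild the population as in the earlier chunk
population <- rep("not blue", 23000)
set.seed(100)
population[sample(1:23000, 23000 * .24)] <- "blue"

# Ten independent simple random samples of size 50 and their sample proportions
props <- replicate(10, mean(sample(population, size = 50) == "blue"))
print(props)

# Expected counts of blue and non-blue candies in a sample of 50
n <- 50
p <- 0.24
c(np = n * p, n_1_minus_p = n * (1 - p))  # 12 and 38
```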

SRS2 <- sample(population, size = 50)
prop.table(table(SRS2))

## SRS2
##     blue not blue 
##     0.26     0.74

Question 9

What proportion of M&M’s in this sample are blue? Is this sample proportion the same as what you got in Question 6?

For the second simple random sample, the observed proportion of blue M&Ms was 0.24 (24%, or 12 of 50 M&Ms). This is not the same as the proportion observed in the first sample (0.26, or 26%).

Question 10

Look at the sample proportions you got in Question 6 (from SRS1) and Question 9 (in SRS2). For each of these, compute $p - \hat{p}$, where $p$ is the population proportion and $\hat{p}$ is the sample proportion. This tells us how far off each of our sample statistics was from the population parameter.

$p = 0.24$

SRS1: $p - \hat{p} = 0.24 - 0.26 = -0.02$

SRS2: $p - \hat{p} = 0.24 - 0.24 = 0$

According to these differences, SRS1 overestimated the population proportion by 0.02, while SRS2 matched it exactly.
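
The same differences can be computed for both samples at once. This sketch plugs in the sample proportions reported in Questions 6 and 9; a negative value of p - p̂ means the sample overestimated p, and a positive value means it underestimated p.

```r
p     <- 0.24                          # population proportion
p_hat <- c(SRS1 = 0.26, SRS2 = 0.24)   # sample proportions reported above

# p - p_hat for each sample; negative = overestimate, positive = underestimate
error <- p - p_hat
round(error, 2)
```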

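# Storage for 5000 sample proportions
sample_prop50 <- rep(NA, 5000)

# Draw 5000 simple random samples of size 50 and record the proportion blue in each
for(i in 1:5000){
   samp <- sample(population, size = 50)
   sample_prop50[i] <- sum(samp=="blue")/50
}

# Remove the loop helpers from the environment
rm(i,samp)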

Question 11

Look in your Environment Tab. What proportion of M&M’s in the third simple random sample are blue?

In the third simple random sample, 18% (0.18) of the 50 observed M&Ms are blue.
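
The Environment tab is not the only way to read off one sample proportion: square-bracket indexing pulls out the third one directly. So that it runs on its own, this sketch rebuilds the population and reruns the same loop. The random number stream is not in the same state as in the original session, so the printed values will generally not reproduce the 0.18 reported above.

```r
# Rebuild the population and rerun the 5000-sample simulation
population <- rep("not blue", 23000)
set.seed(100)
population[sample(1:23000, 23000 * .24)] <- "blue"

sample_prop50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(population, size = 50)
  sample_prop50[i] <- sum(samp == "blue") / 50
}

# Third sample proportion, taken directly by index
sample_prop50[3]
# First six proportions, as listed in the Environment tab
head(sample_prop50)
```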

hist(sample_prop50, col="gold", xlab = "Sample Proportion", main = "Figure 1")

Question 12

Describe the sampling distribution. In other words, is the distribution unimodal or multimodal, and is it symmetric, skewed right, or skewed left?

The histogram appears to be unimodal and slightly skewed right.
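
The direction of skew can be checked numerically as well as by eye. This sketch rebuilds the simulation with `replicate()` (equivalent to the for loop above) and computes the moment-based sample skewness with a small hand-written `skewness()` helper (a hypothetical name; base R has no built-in skewness function). Positive values indicate a longer right tail. For reference, a binomial count with n = 50 and p = 0.24 has skewness (1 - 2p) / sqrt(np(1 - p)) ≈ 0.17, which is a mild right skew.

```r
# Rebuild the population and the 5000 sample proportions
population <- rep("not blue", 23000)
set.seed(100)
population[sample(1:23000, 23000 * .24)] <- "blue"
sample_prop50 <- replicate(5000, mean(sample(population, size = 50) == "blue"))

# Moment-based sample skewness: positive means skewed right
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skewness(sample_prop50)

# Theoretical skewness of a binomial count with n = 50, p = 0.24
theory_skew <- (1 - 2 * 0.24) / sqrt(50 * 0.24 * 0.76)
theory_skew
```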

Question 13

What is the average of all 5000 sample proportions? In other words, what is the mean of sample_prop50?

mean(sample_prop50)

## [1] 0.240692

The mean of all 5000 sample proportions is 0.240692, which is very close to the population proportion of 0.24.

Question 14

Is the value you get in Question 13 bigger than, smaller than, or roughly equal to the true population parameter of .24?

Our observed mean of 0.240692 is slightly larger than the true population parameter of 0.24, but the difference (about 0.0007) is so small that the two are roughly equal.
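
The gap between the simulated mean and p can be put on a scale. The sample proportion is an unbiased estimator of p, so the mean of many simulated proportions should differ from 0.24 only by simulation noise, and that noise is roughly the standard deviation of the proportions divided by sqrt(5000). This sketch rebuilds the simulation so it runs on its own, which means its numbers will differ slightly from the output above.

```r
# Rebuild the population and the 5000 sample proportions
population <- rep("not blue", 23000)
set.seed(100)
population[sample(1:23000, 23000 * .24)] <- "blue"
sample_prop50 <- replicate(5000, mean(sample(population, size = 50) == "blue"))

# Difference between the simulated mean and the population proportion
mean(sample_prop50) - 0.24

# Approximate size of simulation noise in that mean
sd(sample_prop50) / sqrt(5000)
```

A difference of about 0.0007, like the one observed above, is comparable to this noise level (roughly 0.0009).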

Question 15

What is the standard error of our sample statistics? In other words, what is the standard deviation (sd) of sample_prop50?

sd(sample_prop50)

## [1] 0.06083306

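The standard deviation of the 5000 sample proportions, which is the standard error of the sample proportion, is about 0.0608. A typical sample proportion therefore falls about 0.06 (6 percentage points) from the center of the sampling distribution.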

Question 16

Look back at your first sample statistic, the sample proportion from SRS1. Based on this, use your sample statistic to create a range of plausible values for the population proportion of M&M’s that are blue. State your range. Is $p = 0.24$ in your range of plausible values?

0.26 ± 2(0.0608) = 0.26 ± 0.1217

Roughly 95% of sample proportions fall within two standard errors of the population proportion, so the sample proportion plus or minus two standard errors gives a reasonable range of plausible values. Based on SRS1, that range is (0.1383, 0.3817). The true population proportion, 0.24, falls within this range.
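
The interval arithmetic can be reproduced in R. This sketch takes the SRS1 sample proportion and the simulated standard error printed in Question 15, then checks whether 0.24 lies inside the interval.

```r
p_hat <- 0.26         # sample proportion from SRS1
se    <- 0.06083306   # simulated standard error from Question 15

# Sample proportion plus or minus two standard errors
interval <- p_hat + c(-2, 2) * se
round(interval, 4)    # 0.1383 0.3817

# Does the interval contain the population proportion?
0.24 >= interval[1] & 0.24 <= interval[2]   # TRUE
```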

Question 17

$$SE = \sqrt{\frac{p(1-p)}{n}}$$

Based on the formula above, what should the standard error of the sample proportion be? Is this value similar to what you got in Question 15?

With p = 0.24 and n = 50, the formula gives (0.24 × 0.76)/50 = 0.003648, whose square root is about 0.060399. This is very close to the simulated standard error of about 0.0608 from Question 15.
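
The formula translates directly into R. The function name `se_prop()` is a hypothetical helper introduced here, not part of the lab. The sketch compares the theoretical value with the simulated standard error printed in Question 15. The formula treats draws as independent and ignores the finite-population correction, which is negligible when sampling 50 candies out of 23,000.

```r
# Hypothetical helper: standard error of a sample proportion, sqrt(p(1 - p) / n)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

se_theory <- se_prop(0.24, 50)
round(se_theory, 6)   # 0.060399

# Ratio of the simulated value from Question 15 to the theoretical value
se_sim <- 0.06083306
se_sim / se_theory    # close to 1
```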

Question 18

Here is the code we used to run our simulation study. We now want to change our sample size to 100. To do this, just change all the 50’s that you see to 100’s. Make the change and then run the code and plot your sampling distribution. Change the title of the sampling distribution to Figure 2.

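# Storage for 5000 sample proportions with n = 100
Figure2 <- rep(NA, 5000)

# Draw 5000 simple random samples of size 100 and record the proportion blue in each
for(i in 1:5000){
   samp <- sample(population, size = 100)
   Figure2[i] <- sum(samp=="blue")/100
}

rm(i,samp)

# Plot the sampling distribution with the requested title
hist(Figure2, col="gold", xlab = "Sample Proportion", main = "Figure 2")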
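
Editing every 50 to 100 by hand is error-prone. Making the sample size a function argument avoids this. The sketch below introduces a hypothetical helper, `simulate_props()`, which returns `reps` sample proportions for samples of size `n`, and then runs it for both sample sizes used in this lab. It rebuilds the population so it runs on its own.

```r
# Rebuild the population as in the earlier chunk
population <- rep("not blue", 23000)
set.seed(100)
population[sample(1:23000, 23000 * .24)] <- "blue"

# Hypothetical helper: reps sample proportions of blue for samples of size n
simulate_props <- function(population, n, reps = 5000) {
  replicate(reps, mean(sample(population, size = n) == "blue"))
}

props50  <- simulate_props(population, n = 50)
props100 <- simulate_props(population, n = 100)

hist(props100, col = "gold", xlab = "Sample Proportion", main = "Figure 2")
c(sd_n50 = sd(props50), sd_n100 = sd(props100))
```

Since the sample size appears only once, as the `n` argument, the size-50 and size-100 runs cannot drift apart through a missed edit.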

Question 19

sd(Figure2)

## [1] 0.04225271

The standard deviation of the 5000 sample proportions from samples of size 100 (stored in Figure2) is about 0.0423. This is considerably smaller than the standard deviation of about 0.0608 from the 5000 samples of size 50.

Question 20

Why do you think that is?

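A larger sample contains more information about the population, so sample proportions vary less from one sample to the next. The standard error formula from Question 17, SE = sqrt(p(1 - p)/n), has the sample size n in the denominator. Doubling n from 50 to 100 therefore divides the standard error by sqrt(2), from about 0.0604 to about 0.0427, in line with the simulated values of 0.0608 and 0.0423. A separate effect is that when np and n(1 - p) are larger (here n is the sample size and p the population proportion), the sampling distribution is closer to normal and less skewed. In short, a larger sample tends to produce a sample proportion closer to the true population proportion.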
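
The effect of sample size can be tabulated directly from the formulas. This sketch computes the theoretical standard error and the np and n(1 - p) quantities for the two sample sizes in this lab, and confirms that doubling n divides the standard error by sqrt(2).

```r
p <- 0.24
n <- c(50, 100)

se <- sqrt(p * (1 - p) / n)   # theoretical standard errors
np <- n * p                   # expected number of blue candies per sample
nq <- n * (1 - p)             # expected number of non-blue candies per sample
data.frame(n, se = round(se, 4), np, nq)

# Ratio of standard errors when n doubles, compared with sqrt(2)
se[1] / se[2]
sqrt(2)
```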