Objectives:

For this lab you should…

Part 1: Are Stormy Marmot’s predictions better than a random 50-50 chance?

Every year in Aurora, Colorado, a yellow-bellied marmot named Stormy makes a prediction about the end of winter. If he comes out of his burrow and sees his shadow, he predicts six more weeks of winter. If he does not see his shadow, he predicts an early spring. In the ten years from 2013 to 2022 he made the following predictions:

| Year | Prediction | March Temperature | Prediction accuracy |
|------|--------------|--------------|-----------|
| 2022 | Early spring | Below normal | Incorrect |
| 2021 | More winter | Below normal | Correct |
| 2020 | More winter | Above normal | Incorrect |
| 2019 | More winter | Below normal | Correct |
| 2018 | Early spring | Above normal | Correct |
| 2017 | Early spring | Above normal | Correct |
| 2016 | Early spring | Above normal | Correct |
| 2015 | Early spring | Above normal | Correct |
| 2014 | More winter | Above normal | Incorrect |
| 2013 | More winter | Below normal | Correct |

TASK 1.1 What are the correct null and alternative hypotheses to answer the research question posed in the section title? Hint – what is p if his predictions are random?

**Response**

\(H_0: p = 0.5\)

\(H_a: p > 0.5\)

TASK 1.2 What is \(\hat{p}\) when considering this example?

**Response** \(\hat{p}\) = 7/10 = 0.7
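The count behind this proportion can be checked in R by entering the accuracy column of the table as a vector. This is a minimal sketch; the names `Accuracy`, `n`, and `p_hat` are assumptions introduced here for illustration and are not part of the lab template.

```r
# Prediction accuracy for 2022 down to 2013, copied from the table above.
# The name `Accuracy` is hypothetical, not part of the lab template.
Accuracy <- c("Incorrect", "Correct", "Incorrect", "Correct", "Correct",
              "Correct", "Correct", "Correct", "Incorrect", "Correct")

n <- length(Accuracy)                      # number of predictions
p_hat <- sum(Accuracy == "Correct") / n    # proportion of correct predictions
p_hat
```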

TASK 1.3. To create a randomization distribution, we must determine what the distribution of p-hat is if his prediction is random. We will use virtual coins, which have a true 50% chance of being heads. Go to justflipacoin.com.

How many times will you need to flip this penny to create one sample statistic for the randomization distribution?

**Response** 10 times

TASK 1.4 Now flip the penny that many times. Pretend that getting a heads with the coin is equivalent to Stormy making a correct prediction. What was your p-hat?

**Response** 4/10 = .4

TASK 1.5 Did your sample make as many correct predictions as Stormy did?

**Response** No
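The ten coin flips from Tasks 1.3 to 1.5 can also be simulated in R instead of on a website. The sketch below draws one randomization sample; the seed and the names `Flips`, `NumHeads`, and `OneProp` are assumptions made for illustration, so the result will not match the hand-flipped 4/10 above.

```r
set.seed(1)  # arbitrary seed so the flips are reproducible (an assumption of this sketch)

# Ten fair coin flips; "H" stands in for a correct prediction
Flips <- sample(c("H", "T"), size = 10, replace = TRUE)

# Number and proportion of heads for this one randomization sample
NumHeads <- sum(Flips == "H")
OneProp  <- NumHeads / 10
OneProp

# TRUE if this sample has at least as many "correct" results as the observed 7 of 10
NumHeads >= 7
```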

TASK 1.6 Now we will go big and have R create many many more statistics for our randomization distribution. Modify and run the code below to correctly generate 2000 randomization samples and save the sample proportion from each.

What do you expect the center of the randomization distribution to be?

**Response** 0.5

# load ggformula for gf_histogram()
library(ggformula)

# save space for the randomization statistics
RandomizationProps <- rep(NA, 2000) # 2000 randomization samples

for(i in 1:2000){ # one pass per randomization sample
  
  # generate a randomization sample assuming the null is true
  TempSample <- sample(c("Correct","Incorrect"), 
                       prob = c(.5, .5), # null proportions of correct/incorrect
                       size = 10, # n, the sample size
                       replace = TRUE) 
  
  # save the proportion of correct from the randomization sample
  RandomizationProps[i] <- sum(TempSample == "Correct")/10 # divide by n
}

# visualize the randomization distribution
gf_histogram(~RandomizationProps)
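The loop above draws 10 outcomes per sample and counts the correct ones. The same kind of randomization distribution can be sketched in a single call to `rbinom()`, which returns the number of successes directly. The names `RandomizationCounts` and `RandomizationPropsAlt` and the seed are assumptions of this sketch; the mean is computed to check the expected center of 0.5.

```r
set.seed(2023)  # arbitrary seed, assumed here only for reproducibility

# 2000 counts of "Correct" out of n = 10 predictions under the null p = 0.5
RandomizationCounts <- rbinom(2000, size = 10, prob = 0.5)
RandomizationPropsAlt <- RandomizationCounts / 10

# The center of the randomization distribution should be close to 0.5
mean(RandomizationPropsAlt)
```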

TASK 1.7 Use your randomization distribution to find the p-value. Remember that the p-value is defined to be the proportion of statistics under the null that are as extreme or more extreme than the observed sample statistic.

**Response** 0.176

# >= is used because the alternative hypothesis is right-tailed (p > 0.5);
# 0.7 is the sample statistic and 2000 is the number of randomization samples

sum(RandomizationProps >= .7)/2000
## [1] 0.162

TASK 1.8 Interpret the p-value in context by completing the sentence below:

**Response** If the marmot is choosing randomly, the chance that he would correctly predict the end of winter at least 7 times out of 10 is 0.176.
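Because the number of correct predictions under the null follows a binomial distribution with n = 10 and p = 0.5, the simulated p-value can be cross-checked against an exact tail probability. This is a side calculation, not part of the lab's randomization method, and `ExactP` is a name introduced here.

```r
# Exact P(X >= 7) for X ~ Binomial(n = 10, p = 0.5).
# pbinom(6, ...) gives P(X <= 6), so the upper tail is its complement.
ExactP <- 1 - pbinom(6, size = 10, prob = 0.5)
ExactP

# Same quantity by summing the individual probabilities for 7, 8, 9 and 10
sum(dbinom(7:10, size = 10, prob = 0.5))
```

The exact value is 176/1024, about 0.172, so p-values from 2000 randomization samples should land near it and vary a little from run to run.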

TASK 1.9 Let’s say that in 2023 Stormy makes a correct prediction, bringing his record to 8 correct of 11. Modify and use the code below to generate a new randomization distribution for this scenario and visualize it:

# save space for the randomization statistics
RandomizationProps2 <- rep(NA, 2000) # 2000 randomization samples

for(i in 1:2000){ # one pass per randomization sample
  
  # generate a randomization sample assuming the null is true
  TempSample <- sample(c("Correct","Incorrect"), 
                       prob = c(.5, .5), # null proportions of correct/incorrect
                       size = 11, # n is now 11 predictions (2013 to 2023)
                       replace = TRUE) 
  
  # save the proportion of correct from the randomization sample
  RandomizationProps2[i] <- sum(TempSample == "Correct")/11 # divide by n = 11
}

# visualize the new randomization distribution

gf_histogram(~RandomizationProps2)

TASK 1.10 Using the new data and the visualization of the randomization distribution, what is our sample \(\hat{p}\) and the approximate p-value for testing the same hypotheses we wrote in Task 1.1?

**Response** \(\hat{p}\) = 8/11 ≈ 0.727. With n = 11 in the simulation, the p-value is approximately 0.11; the exact binomial tail probability for 8 or more correct out of 11 is 232/2048 ≈ 0.113.

8/11
## [1] 0.7272727
sum(RandomizationProps2 >= 8/11)/2000 # varies by run; close to 0.113 when n = 11
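The sample size appears in two places in the loop (the `size` argument and the divisor), so one way to keep them in sync when the record changes is to wrap the simulation in a function that takes n as an argument. The function name `RandomizationPValue`, its arguments, and the seed below are hypothetical and not part of the lab template.

```r
# Right-tail randomization p-value for H0: p = 0.5 vs Ha: p > 0.5.
# Correct = observed number correct, n = number of predictions,
# Reps = number of randomization samples. All names are illustrative.
RandomizationPValue <- function(Correct, n, Reps = 2000) {
  # Count of "Correct" in each randomization sample of size n
  Counts <- rbinom(Reps, size = n, prob = 0.5)
  # Compare integer counts rather than proportions
  mean(Counts >= Correct)
}

set.seed(11)  # arbitrary seed for reproducibility
RandomizationPValue(Correct = 7, n = 10)  # 2013 to 2022 record
RandomizationPValue(Correct = 8, n = 11)  # record after a correct 2023 prediction

# Exact binomial tail for 8 or more correct out of 11, for comparison
1 - pbinom(7, size = 11, prob = 0.5)
```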

Part 2: Beer and Mosquitoes

In a study to investigate the effect of drinking beer on mosquito attraction, 43 men were randomly assigned to drink either a liter of beer or a liter of water, with 25 being assigned to the beer group and 18 to the water group. The number of mosquitoes approaching each man was then recorded.

We wish to see whether this data provides evidence that mosquitoes are attracted to beer drinkers (or, more precisely, that the mean number of mosquitoes approaching a beer drinker is significantly larger than the mean number of mosquitoes approaching a water drinker.) Assume group 1 is beer drinkers and group 2 is water drinkers.

TASK 2.1 Is this an experiment or an observational study? What are the cases? What are the variables?

**Response**

- Experiment or observational study: Experiment, because the men were randomly assigned to drink beer or water
- Cases: each man in the study
- Variables: the liquid consumed (beer or water) and the number of mosquitoes that approached each man

TASK 2.2 State the null and alternative hypotheses for this test. Define any parameters used.

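**Response** \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_a: \mu_1 - \mu_2 > 0\), where \(\mu_1\) is the mean number of mosquitoes approaching a beer drinker and \(\mu_2\) is the mean number of mosquitoes approaching a water drinker.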

TASK 2.3 Use StatKey technology to create a randomization distribution for this test using at least 4,000 samples. (The data on Beer/Mosquitoes is one of the available datasets in StatKey, under Test for a Difference in Means.) Use the randomization distribution to indicate whether each of the following possible differences in means is (i) very likely to occur just by random chance, (ii) relatively unlikely to occur but might occur occasionally, or (iii) very unlikely to ever occur just by random chance:

A. –7 **Response** Very unlikely to occur just by random chance

B. 1 **Response** Very likely to occur just by random chance

C. –3 **Response** Relatively unlikely to occur but might occur occasionally

D. –0.5 **Response** Very likely to occur just by random chance
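A randomization distribution for a difference in means can also be sketched in R by shuffling the observed values between the two groups, which is one common way to simulate the null hypothesis of no difference. The mosquito counts below are made-up placeholder values, not the study data, and the names `Beer`, `Water`, `ObsDiff`, and `RandomizationDiffs` are assumptions of this sketch.

```r
# Placeholder data: NOT the study's measurements, only for illustrating the method
Beer  <- c(27, 20, 21, 26, 27, 31, 24, 21)
Water <- c(21, 22, 15, 12, 21, 16, 19, 15)

Values <- c(Beer, Water)
nBeer  <- length(Beer)

# Observed difference in means (beer minus water)
ObsDiff <- mean(Beer) - mean(Water)

set.seed(43)  # arbitrary seed for reproducibility
RandomizationDiffs <- rep(NA, 4000)
for (i in 1:4000) {
  # Under the null, group labels are arbitrary, so shuffle the values
  Shuffled <- sample(Values)
  RandomizationDiffs[i] <- mean(Shuffled[1:nBeer]) - mean(Shuffled[-(1:nBeer)])
}

# Right-tail p-value: proportion of shuffled differences at least as large as observed
mean(RandomizationDiffs >= ObsDiff)
```

Differences near the center of `RandomizationDiffs` are likely to occur by chance, while values far out in either tail are rarely or never produced by the shuffling.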

TASK 2.4 Report the actual difference in means observed in the study. Using correct notation, provide the value of the sample statistic.

**Response** \(\bar{x}_1 - \bar{x}_2 = 4.38\)

TASK 2.5 Where does the sample statistic lie in the randomization distribution? Is it likely or unlikely to occur just by random chance?

**Response** This lies beyond the 95th percentile in the distribution and is unlikely to occur by random chance.

TASK 2.7 Complete the interpretation for the p-value:

If mosquitoes are equally attracted to beer drinkers and water drinkers, the chance that we see a sample statistic of **4.38** or any statistic **larger than 4.38** is **almost 0**.

TASK 2.8 Use your randomization distribution from 2.3 to match the sample statistics below with the closest of these four p-values: 0.001, 0.15, 0.77 and 0.007. Note that it is possible to do this without performing any calculations.

| Sample statistic | p-value |
|------|--------|
| 1 | 0.248 |
| 1.5 | 0.147 |
| 3.25 | 0.0068 |
| 4.4 | 0 |

Part 3

For each of the settings below, answer three questions.

3.1 To test \(H_0: \mu = 50\) vs \(H_a: \mu < 50\) using sample data with \(\bar{x}\) = 43.7:

A. Where will the randomization distribution be centered?       
  
  **Response** 50

B. Is this a left-tail test, a right-tail test, or a two-tail test?     
  
  **Response** Left tail test

C. How can we find the p-value once we have the randomization distribution? *Example answer: Find the proportion of randomization statistics that are to the _________ of the sample statistic of ______.*

  **Response** Find the proportion of randomization statistics that are to the left of the sample statistic of 43.7

3.2 To test \(H_0: p_1 = p_2\) vs \(H_a: p_1 \ne p_2\) using sample data with \(\hat{p}_1 - \hat{p}_2 = -0.52\):

A. Where will the randomization distribution be centered?       
  
  **Response** 0

B. Is this a left-tail test, a right-tail test, or a two-tail test?     
  
  **Response** Two tail test

C. How can we find the p-value once we have the randomization distribution?     
  
  **Response** Find the proportion of randomization statistics that are at or to the left of the sample statistic of -0.52 (the smaller tail) and double it.

3.3 To test \(H_0: p = 0.75\) vs \(H_a: p < 0.75\) using sample data with \(\hat{p} = 0.58\):

A. Where will the randomization distribution be centered?       
  
  **Response** 0.75

B. Is this a left-tail test, a right-tail test, or a two-tail test?     
  
  **Response** Left tail test

C. How can we find the p-value once we have the randomization distribution?     
  
  **Response** Find the proportion of randomization statistics that are to the left of the sample statistic of 0.58.
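The three answers to question C follow one pattern, which can be sketched as a small R helper. The function name `TailPValue` and its arguments are hypothetical, and because setting 3.3 does not state a sample size, n = 50 is assumed below purely for illustration.

```r
# Proportion of randomization statistics at least as extreme as the observed one.
# RandStats = vector of randomization statistics, Observed = sample statistic,
# Tail = "left", "right", or "two". All names are illustrative.
TailPValue <- function(RandStats, Observed, Tail = c("left", "right", "two")) {
  Tail  <- match.arg(Tail)
  Left  <- mean(RandStats <= Observed)
  Right <- mean(RandStats >= Observed)
  switch(Tail,
         left  = Left,
         right = Right,
         two   = min(1, 2 * min(Left, Right)))  # double the smaller tail
}

# Setting 3.3 as an example: H0: p = 0.75, observed p-hat = 0.58, left-tail test.
# The sample size is not given in the setting, so n = 50 is assumed here.
set.seed(3)  # arbitrary seed for reproducibility
n <- 50
RandStats <- rbinom(5000, size = n, prob = 0.75) / n
mean(RandStats)  # the distribution is centered near the null value 0.75
TailPValue(RandStats, Observed = 0.58, Tail = "left")
```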