This group assignment covers probability and statistical inference. At a minimum, you will need the packages tidyverse and infer.
In the questions that follow, you must use tidyverse code and infer code when applicable to earn full credit.
Suppose the Sixers and the Warriors are playing in the NBA Finals (a best-of-seven series). The first team to win four games wins the series. Assume the Sixers win each game with probability 0.483. If the Sixers lose the first game, what is the probability that they win the series? Perform a Monte Carlo simulation to answer this question.
In a sequence of 100 fair coin flips, what is the average length of the longest consecutive run of either heads or tails? What about 1,000 fair coin flips? Perform a Monte Carlo simulation to answer this question. Hint: look at the function rle().
For example, suppose in 10 coin flips we observe
\[(H, H, H, T, T, H, T, T, T, T),\] in which case the longest run is four (the final four tails).
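To illustrate the hint, here is a minimal sketch that applies base R's rle() to the ten flips in the example above. The variable names `flips`, `runs`, and `longest_run` are illustrative choices, not part of the assignment; the sketch only shows how rle() exposes run lengths and does not perform the requested simulation.

```r
# The ten flips from the example above, stored as a character vector
flips <- c("H", "H", "H", "T", "T", "H", "T", "T", "T", "T")

# rle() collapses the vector into runs: `values` holds each run's symbol
# and `lengths` holds how many times that symbol repeats consecutively
runs <- rle(flips)
runs$lengths
#> [1] 3 2 1 4

# The longest consecutive run of either heads or tails
longest_run <- max(runs$lengths)
longest_run
#> [1] 4
```

Because rle() treats heads and tails symmetrically, taking max() of `lengths` gives the longest run of either outcome in one step.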
Randomly select two cards from a standard 52-card deck. If the two cards are returned to the deck and the deck is shuffled, what is the probability that the two cards end up next to each other somewhere in the deck? Perform a Monte Carlo simulation to answer this question.
Find a poll in which respondents were asked a question with a binary response. Create a 99% bootstrap confidence interval for the population proportion \(p\). Give a non-technical interpretation of your result. Cite the poll in a References section of your R Markdown document.
Read "Statistical inference using bootstrap confidence intervals." Use the data frame cars (built into R) to create a 95% bootstrap confidence interval for the population correlation between the variables speed and dist. Your answer should include a well-labeled plot of the bootstrap distribution.
Researchers interested in lead exposure due to car exhaust sampled the blood of 52 police officers subjected to constant inhalation of automobile exhaust fumes while working traffic enforcement in a primarily urban environment. The blood samples of these officers had an average lead concentration of 124.32 \(\mu g/l\) and a standard deviation of 37.74 \(\mu g/l\); a previous study of individuals from a nearby suburb, with no history of exposure, found an average blood lead concentration of 35 \(\mu g/l\).
Test the hypothesis that the downtown police officers have higher lead exposure than the group in the previous study. Interpret your results in context.
Use the package gifski to create a GIF of anything except the Central Limit Theorem or a Pac-Man, both of which are demonstrated here.
The deadline to submit Homework 3 is 11:59pm on Thursday, March 28. Submit your group’s work by uploading a single Rmd file through Google Classroom. Group assignments are available on Google Classroom. Late work will not be accepted except under certain extraordinary circumstances.
Post your questions in the #hw3 channel on Slack. When posting, explain your error in detail or give a reproducible example that generates the same error, and use Slack's code snippet option.
Work through challenges together as a group, but questions are always welcome and encouraged.
Neither Scott nor I will answer questions within the first 24 hours after this homework is assigned or within 6 hours of the deadline.
This is a group assignment. Discussion with other groups should be limited to ideas on how to approach a problem or debug a small code error. You may not copy and paste another group’s code from this class. As a reminder, below is the policy on sharing and using others' code from outside sources.
Similar reproducible examples (reprex) exist online that will help you answer many of the questions posed in in-class assignments, pre-class assignments, homework assignments, and midterm exams. Use of these resources is allowed unless the assignment explicitly states otherwise. You must always cite any code you copy or use as inspiration. Copied code without citation is plagiarism and will result in a 0 for the assignment.
You must use R Markdown. Formatting is at your discretion but is graded. Use the in-class assignments and resources available online for inspiration. Another useful resource for R Markdown formatting is available at: https://holtzy.github.io/Pimp-my-rmd/
| Topic | Points |
|---|---|
| Questions 1-6 | 72 |
| tidyverse and infer code | 10 |
| R Markdown formatting | 4 |
| Communication of results | 4 |
| Code style | 4 |
| Named code chunks | 3 |
| Knit | 3 |
| Total | 100 |
A bonus of up to 5 points can be earned for Question 7.
All group members will receive the same grade except under certain narrow circumstances.