Flint is the second poorest city of its size in the United States and has spent six of the past 15 years in a state of financial emergency. One of the cost-cutting measures taken by emergency managers was to stop buying water, sourced from Lake Huron, from the Detroit Water and Sewerage Department. Instead, Flint would use the Flint River for its water supply while waiting for a new pipeline to Lake Huron to be opened. The move was expected to save roughly $5 million over a period of two years.
The Flint River supply was switched on in April 2014. Not long after, problems arose. Flint resident and mother of four LeeAnne Walters noticed that the water coming out of her taps was orange. More worryingly, her family’s hair was falling out, her preschool sons had broken out in rashes and one of them had stopped growing.
The orange colour was from iron, but the family’s symptoms pointed to a far more dangerous contaminant: lead. (Langkjaer-Bain 2017)
The data set consists of 271 homes sampled with three water lead contaminant values at designated time points. The lead content is in parts per billion (ppb). Additionally, some location data is given about each home.
To get started, read in the flint.csv file using the function read.csv, as was done in ica-01-17-19. However, you do not need to use the attach function. The data set has five variables:
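As a minimal sketch of reading a CSV file without attach, the example below writes a tiny stand-in file to a temporary location and reads it back. The file contents and the column names (id, zip, draw1) are hypothetical and are not the variables in flint.csv.

```r
# Toy stand-in for a CSV file; the column names here are hypothetical.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("id,zip,draw1",
             "1,48503,0.3",
             "2,48504,8.1"), toy_path)

# Read the file into a data frame; attach() is not needed.
flint_toy <- read.csv(toy_path)

# Inspect the structure, then refer to a column explicitly with $.
str(flint_toy)
mean(flint_toy$draw1)
```

Referring to columns as flint_toy$draw1 keeps it clear which data frame a variable comes from, which is one reason attach can be skipped.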
Before you get started, read The murky tale of Flint’s deceptive water data by Langkjaer-Bain (2017).
How many unique zip codes are in the data set? How many unique wards are in the data set? Does the number of wards in the data set match how many wards Flint has? Suggest a way to handle this issue.
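One possible way to count distinct values is to combine unique with length, and table shows how often each value occurs. The sketch below uses a made-up data frame; the column names zip and ward and all of the values are assumptions for illustration only.

```r
# Hypothetical toy data; not taken from flint.csv.
toy <- data.frame(
  zip  = c(48503, 48503, 48504, 48507, 48504),
  ward = c(1, 1, 2, 3, 2)
)

# Number of distinct values in each column.
n_zip  <- length(unique(toy$zip))
n_ward <- length(unique(toy$ward))

# Frequency of each ward value, useful for spotting unexpected codes.
table(toy$ward)
```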
Compute the median lead value for each draw. Compute the mean lead value for each draw. Create a histogram of the lead values for the first draw and comment on the histogram’s shape.
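A minimal sketch of per-column summaries and a histogram, assuming the three draws are stored in columns named draw1, draw2 and draw3. Those names and the values below are hypothetical.

```r
# Hypothetical toy data with one large value in each column.
toy <- data.frame(
  draw1 = c(1, 2, 3, 4, 40),
  draw2 = c(1, 1, 2, 2, 14),
  draw3 = c(0, 1, 1, 2, 6)
)

# Apply median() and mean() to every column at once.
draw_medians <- sapply(toy, median)
draw_means   <- sapply(toy, mean)

# Histogram of a single column, with a labelled axis.
hist(toy$draw1, main = "Toy first draw", xlab = "Lead (ppb)")
```

In this toy data each mean is well above the corresponding median, which is what a long right tail tends to produce.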
Compute the sample quantile for the 85th percentile of lead values for each draw. Comment on what you observe. Is any draw above the EPA action threshold level?
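The quantile function returns sample quantiles for a given probability. The sketch below applies it to two made-up columns and compares the result with 15 ppb, the EPA action level for lead in drinking water. The column names and values are hypothetical, and quantile is left at its default interpolation method.

```r
# Hypothetical toy data; not taken from flint.csv.
toy <- data.frame(
  draw1 = seq(0, 20, by = 2),
  draw2 = 0:10
)

# 85th percentile of each column (default quantile type).
q85 <- sapply(toy, quantile, probs = 0.85)

# EPA action level for lead, in ppb.
action_level <- 15

# TRUE where the 85th percentile exceeds the action level.
above <- q85 > action_level
```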
Recreate the below plot based on data from zip code 48503.
What is the largest lead value? What draw and zip code does it belong to? Comment on how we should handle this value if further statistical analysis were to be performed. What is the smallest lead value? What draw and zip code does it belong to?
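One way to locate an extreme value across several columns is to treat the columns as a matrix and use which with arr.ind = TRUE, which reports the row and column of each match. The column names and values in this sketch are hypothetical.

```r
# Hypothetical toy data; not taken from flint.csv.
toy <- data.frame(
  zip   = c(48503, 48504, 48507),
  draw1 = c(2, 9, 1),
  draw2 = c(1, 30, 0.5),
  draw3 = c(0.5, 4, 0.2)
)

# Keep only the measurement columns, as a numeric matrix.
draws <- as.matrix(toy[, c("draw1", "draw2", "draw3")])

# Row and column position of the largest value.
pos <- which(draws == max(draws), arr.ind = TRUE)

largest      <- max(draws)
largest_draw <- colnames(draws)[pos[, "col"]]
largest_zip  <- toy$zip[pos[, "row"]]
```

Replacing max with min gives the position of the smallest value in the same way.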
For each draw, compute z-scores for the lead values, using that draw’s own mean and standard deviation. How many z-scores exceed three in absolute value for each draw?
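A z-score is an observation minus the sample mean, divided by the sample standard deviation. The sketch below computes z-scores column by column on made-up data and counts how many exceed three in absolute value. The column names and values are hypothetical.

```r
# Hypothetical toy data: draw1 has one extreme value, draw2 has none.
toy <- data.frame(
  draw1 = c(rep(1, 19), 100),
  draw2 = 1:20
)

# z-score: (value - mean) / standard deviation, within one column.
z_score <- function(x) (x - mean(x)) / sd(x)

# Matrix of z-scores, one column per draw.
z <- sapply(toy, z_score)

# Count of |z| > 3 in each column.
n_extreme <- colSums(abs(z) > 3)
```

The built-in scale function centers and scales in the same way, so as.numeric(scale(toy$draw1)) should agree with z[, "draw1"].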
Based on your analysis in questions 1-6, does it seem that flushing the water decreases the lead content? You may include further code and visualizations.
The deadline to submit Homework 1 is 11:59pm on Thursday, January 31. Submit your work by uploading only your Rmd file through Google Classroom. Late work will not be accepted except under certain extraordinary circumstances.
Post your questions in the #hw1 channel on Slack. If you are trying to get help on a code error, explain your error in detail or give a reproducible example that generates the same error. Make use of the code snippet option available in Slack.
Feel free to visit Scott or me in office hours or make an appointment.
Communicate with your classmates, but do not share large snippets of code.
Neither Scott nor I will answer any questions within the first 24 hours of this homework being assigned or within 6 hours of the deadline.
This is an individual assignment. However, you may discuss ideas, how to debug code, and how to approach a problem with your classmates. You may not copy and paste another individual’s code from this class. As a reminder, below is the policy on sharing and using others’ code.
Similar reproducible examples (reprex) exist online that will help you answer many of the questions posed on in-class assignments, pre-class assignments, homework assignments, and midterm exams. Use of these resources is allowed unless it is written explicitly on the assignment. You must always cite any code you copy or use as inspiration. Copied code without citation is plagiarism and will result in a 0 for the assignment.
You must use R Markdown. Formatting is at your discretion but is graded. Use the in-class assignments and resources available online for inspiration. Another useful resource for R Markdown formatting is available at: https://holtzy.github.io/Pimp-my-rmd/
| Topic | Points |
|---|---|
| Questions 1-7 | 63 |
| R Markdown formatting | 10 |
| Communication of results | 7 |
| Knit | 7 |
| Code style | 7 |
| Named code chunks | 6 |
| Total | 100 |
A bonus of up to 3 points can be earned for implementing format or style that goes beyond the scope of the R Markdown Reference Guide.