location - the name of the town, township, or regional area in Maine
n_wells_tested - the number of wells tested
percent_wells_above_guideline - the percentage of wells that tested above the maximum exposure guideline
median - the median reading, in mg/L for fluoride and ug/L for arsenic
percentile_95 - the 95th percentile reading, in mg/L for fluoride and ug/L for arsenic
maximum - the maximum reading, in mg/L for fluoride and ug/L for arsenic
Prepare a report by editing assign02.Rmd. The report should have an interesting narrative that focuses on a subset of the data, and it must include both arsenic and fluoride data. Upload your report to RPubs, then copy the link to your RPubs report and paste it in the text submission box below. You are required to join the data.
You must create a data frame or tibble that joins both arsenic and fluoride by location. (20 points)
At least one table showing relevant data that is not so long that it overwhelms the report (consider using the head command). (10 points)
At least one chart. For at least one of your charts, the code that created it must not be displayed. (10 points)
A narrative discussing what you find interesting, along with any issues you might have had preparing the data. (10 points)
Published on RPubs. (40 points)
Clickable link to your RPubs report posted in Brightspace. (10 points)
Refer to the detailed instructions for this assignment in Brightspace.
Don’t alter the three code chunks in this section. First we read in the two data sets and delete the rows that have missing values.
library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()
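As a minimal sketch of what drop_na() does in the chunk above: called with no column arguments, it removes every row that contains at least one NA. The tibble `toy` and its town names are hypothetical and exist only for illustration; they are not taken from the real CSV files.

```r
library(tidyr)
library(tibble)

# Hypothetical rows for illustration only; not real well data.
toy <- tibble(
  location = c("Town A", "Town B", "Town C"),
  n_wells_tested = c(60, 12, 31),
  median = c(1.13, NA, 1.29)
)

# With no columns named, drop_na() drops any row that has an NA anywhere.
toy_complete <- drop_na(toy)

nrow(toy)           # 3 rows before
nrow(toy_complete)  # 2 rows after; the "Town B" row is gone
```

Comparing nrow() before and after is a quick way to see how many locations were dropped for missing values.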
Next we display the first few rows of fluoride.
head(fluoride)
## # A tibble: 6 x 6
## location n_wells_tested percent_wells_above_gui… median percentile_95 maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Otis 60 30 1.13 3.2 3.6
## 2 Dedham 102 22.5 0.94 3.27 7
## 3 Denmark 46 19.6 0.45 3.15 3.9
## 4 Surry 175 18.3 0.8 3.52 6.9
## 5 Prospect 57 17.5 0.785 2.5 2.7
## 6 Eastbrook 31 16.1 1.29 2.44 3.3
Then we display the first few rows of arsenic.
head(arsenic)
## # A tibble: 6 x 6
## location n_wells_tested percent_wells_above_g… median percentile_95 maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Manchester 275 58.9 14 93 200
## 2 Gorham 467 50.1 10.5 130 460
## 3 Columbia 42 50 9.8 65.9 200
## 4 Monmouth 277 49.5 10 110 368
## 5 Eliot 73 49.3 9.7 41.4 45
## 6 Columbia F… 25 48 8.1 53.8 71
In the code chunk below, create a new tibble called chemicals that joins fluoride and arsenic. You probably want to do an inner join but the join type is up to you.
chemicals <- fluoride %>% inner_join(arsenic, by = "location")
The next code chunk displays the head of your newly created chemicals tibble. Take a look to verify that your join looks ok.
head(chemicals)
## # A tibble: 6 x 11
## location n_wells_tested.x percent_wells_a… median.x percentile_95.x maximum.x
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Otis 60 30 1.13 3.2 3.6
## 2 Dedham 102 22.5 0.94 3.27 7
## 3 Denmark 46 19.6 0.45 3.15 3.9
## 4 Surry 175 18.3 0.8 3.52 6.9
## 5 Prospect 57 17.5 0.785 2.5 2.7
## 6 Eastbro… 31 16.1 1.29 2.44 3.3
## # … with 5 more variables: n_wells_tested.y <dbl>,
## # percent_wells_above_guideline.y <dbl>, median.y <dbl>,
## # percentile_95.y <dbl>, maximum.y <dbl>
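Because both tibbles share the same column names, the join labels the fluoride columns with .x and the arsenic columns with .y. One possible way to make the result easier to read, sketched here as an assumption and not as part of the assignment template, is the suffix argument of inner_join(). An anti_join() can also list locations that would be dropped by the inner join. The tibbles `fluoride_small` and `arsenic_small` are stand-ins: the Otis and Surry percentages come from this report, while "Example Town A" and "Example Town B" and their values are hypothetical.

```r
library(dplyr)
library(tibble)

# Stand-in data: Otis and Surry values appear in this report;
# the two "Example Town" rows are hypothetical.
fluoride_small <- tibble(
  location = c("Otis", "Surry", "Example Town A"),
  percent_wells_above_guideline = c(30, 18.3, 5)
)
arsenic_small <- tibble(
  location = c("Otis", "Surry", "Example Town B"),
  percent_wells_above_guideline = c(39.6, 40.3, 7)
)

# suffix replaces the default ".x" / ".y" labels on the shared column names.
joined <- inner_join(
  fluoride_small, arsenic_small,
  by = "location",
  suffix = c("_fluoride", "_arsenic")
)
names(joined)

# Locations with fluoride data but no matching arsenic row.
unmatched <- anti_join(fluoride_small, arsenic_small, by = "location")
unmatched$location
```

The rest of this report keeps the default .x and .y names, so this sketch is only an alternative naming scheme.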
In the code chunk below, create an interesting subset of the data. You’ll likely find an interesting subset by filtering for locations that have high or low levels of arsenic, fluoride, or both.
# Keep the 10 locations with the highest combined percentage of wells above
# the guideline for fluoride (.x) and arsenic (.y)
topTen <- chemicals %>%
  slice_max(percent_wells_above_guideline.x + percent_wells_above_guideline.y, n = 10) %>%
  select(location, percent_wells_above_guideline.x, percent_wells_above_guideline.y)
Edit this part to discuss how you selected your interesting subset.
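For each location, I added the percentage of wells above the guideline for fluoride to the percentage for arsenic, then kept the ten locations with the highest combined percentage.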
Display the first few rows of your interesting subset in the code chunk below.
head(topTen)
## # A tibble: 6 x 3
## location percent_wells_above_guideline.x percent_wells_above_guideline.y
## <chr> <dbl> <dbl>
## 1 Otis 30 39.6
## 2 Manchester 3.3 58.9
## 3 Surry 18.3 40.3
## 4 Monmouth 3.1 49.5
## 5 Blue Hill 9.6 42.7
## 6 Mercer 15.6 36.4
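The ranking used for this subset can be made visible by storing the sum in its own column. The sketch below rebuilds the six rows shown above as `top_small` and adds a column named `combined`; both names are assumptions introduced for illustration. It also sets with_ties = FALSE, which is not used in the report's own chunk, to show how to cap the result at exactly n rows when sums are tied.

```r
library(dplyr)
library(tibble)

# The six rows displayed by head(topTen) in this report.
top_small <- tibble(
  location = c("Otis", "Manchester", "Surry", "Monmouth", "Blue Hill", "Mercer"),
  percent_wells_above_guideline.x = c(30, 3.3, 18.3, 3.1, 9.6, 15.6),
  percent_wells_above_guideline.y = c(39.6, 58.9, 40.3, 49.5, 42.7, 36.4)
)

# Store the fluoride + arsenic sum explicitly so the ranking is visible.
ranked <- top_small %>%
  mutate(combined = percent_wells_above_guideline.x + percent_wells_above_guideline.y)

# with_ties = FALSE returns exactly n rows even if sums are tied.
top3 <- ranked %>% slice_max(combined, n = 3, with_ties = FALSE)
top3
```

Note that the sum adds two percentages measured against different guidelines, so it is a rough ranking score and not a percentage of wells.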
In the code chunk below, create a ggplot visualization of your subset that is fairly simple for a viewer to comprehend.
library(ggplot2)
# Bar chart of the arsenic percentage by location; bar color shows the fluoride percentage
ggplot(data = topTen, aes(x = factor(location), y = percent_wells_above_guideline.y, fill = percent_wells_above_guideline.x)) +
  geom_bar(stat = "identity")
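If a stacked bar with one segment per chemical is wanted, one possible approach is to reshape the subset to long form first. This is a sketch under assumptions, not the method used in the chart above: `top_small` rebuilds the six rows shown earlier, and the names `fluoride`, `arsenic`, `chemical`, and `percent_above_guideline` are introduced only for this example.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# The six rows displayed by head(topTen) in this report.
top_small <- tibble(
  location = c("Otis", "Manchester", "Surry", "Monmouth", "Blue Hill", "Mercer"),
  percent_wells_above_guideline.x = c(30, 3.3, 18.3, 3.1, 9.6, 15.6),
  percent_wells_above_guideline.y = c(39.6, 58.9, 40.3, 49.5, 42.7, 36.4)
)

# Reshape to one row per location and chemical so fill can map to the chemical.
top_long <- top_small %>%
  rename(
    fluoride = percent_wells_above_guideline.x,
    arsenic = percent_wells_above_guideline.y
  ) %>%
  pivot_longer(
    c(fluoride, arsenic),
    names_to = "chemical",
    values_to = "percent_above_guideline"
  )

# Stacked bars, ordered by the combined percentage for each location.
p <- ggplot(
  top_long,
  aes(
    x = reorder(location, percent_above_guideline, FUN = sum),
    y = percent_above_guideline,
    fill = chemical
  )
) +
  geom_col() +
  coord_flip() +
  labs(x = "Location", y = "Percent of wells above guideline", fill = "Chemical")
p
```

In an R Markdown document, setting the chunk option echo=FALSE on the plotting chunk hides the code while still showing the chart.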
Once you are done, knit, publish, and then submit your link to your published RPubs document in Brightspace.