Refer to the detailed instructions for this assignment in Brightspace.
Don’t alter the three code chunks in this section. First we read in the two data sets and delete rows with missing values.
library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()
Next we display the first few rows of fluoride.
head(fluoride)
## # A tibble: 6 x 6
## location n_wells_tested percent_wells_above_gui… median percentile_95 maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Otis 60 30 1.13 3.2 3.6
## 2 Dedham 102 22.5 0.94 3.27 7
## 3 Denmark 46 19.6 0.45 3.15 3.9
## 4 Surry 175 18.3 0.8 3.52 6.9
## 5 Prospect 57 17.5 0.785 2.5 2.7
## 6 Eastbrook 31 16.1 1.29 2.44 3.3
Then we display the first few rows of arsenic.
head(arsenic)
## # A tibble: 6 x 6
## location n_wells_tested percent_wells_above_g… median percentile_95 maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Manchester 275 58.9 14 93 200
## 2 Gorham 467 50.1 10.5 130 460
## 3 Columbia 42 50 9.8 65.9 200
## 4 Monmouth 277 49.5 10 110 368
## 5 Eliot 73 49.3 9.7 41.4 45
## 6 Columbia F… 25 48 8.1 53.8 71
In the code chunk below, create a new tibble called chemicals that joins fluoride and arsenic. You probably want to do an inner join but the join type is up to you.
chemicals <- inner_join(arsenic, fluoride, by="location")
The next code chunk displays the head of your newly created chemicals tibble. Take a look to verify that your join looks ok.
head(chemicals)
## # A tibble: 6 x 11
## location n_wells_tested.x percent_wells_a… median.x percentile_95.x maximum.x
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Manches… 275 58.9 14 93 200
## 2 Gorham 467 50.1 10.5 130 460
## 3 Columbia 42 50 9.8 65.9 200
## 4 Monmouth 277 49.5 10 110 368
## 5 Eliot 73 49.3 9.7 41.4 45
## 6 Columbi… 25 48 8.1 53.8 71
## # … with 5 more variables: n_wells_tested.y <dbl>,
## # percent_wells_above_guideline.y <dbl>, median.y <dbl>,
## # percentile_95.y <dbl>, maximum.y <dbl>
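The `.x` and `.y` suffixes in the output above come from columns that share a name in both tibbles. As a hedged sketch of one way to make the joined columns self-describing, `inner_join()` accepts a `suffix` argument. The tibbles `arsenic_toy` and `fluoride_toy` and all of their values are made up for illustration and are not the assignment data.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the two data sets; names and values are hypothetical.
arsenic_toy <- tibble(
  location = c("Town A", "Town B", "Town C"),
  median   = c(14, 10.5, 9.8)
)
fluoride_toy <- tibble(
  location = c("Town A", "Town B", "Town D"),
  median   = c(1.13, 0.94, 0.45)
)

# suffix replaces the default ".x" / ".y" on clashing column names.
# An inner join keeps only locations present in both tibbles.
chemicals_toy <- inner_join(
  arsenic_toy, fluoride_toy,
  by = "location",
  suffix = c("_arsenic", "_fluoride")
)

chemicals_toy
```

With this approach the later `rename()` step would not be needed, at the cost of changing the column names that the rest of the analysis refers to.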
In the code chunk below, create an interesting subset of the data. You’ll likely find an interesting subset by filtering for locations that have high or low levels of arsenic, fluoride, or both.
percent_wells_abv_avg <- chemicals %>%
  rename(
    percent_wells_above_arsenic_guideline = percent_wells_above_guideline.x,
    percent_wells_above_fluoride_guideline = percent_wells_above_guideline.y
  ) %>%
  filter(
    percent_wells_above_arsenic_guideline > mean(percent_wells_above_arsenic_guideline) &
      percent_wells_above_fluoride_guideline > mean(percent_wells_above_fluoride_guideline)
  ) %>%
  select(location, percent_wells_above_fluoride_guideline, percent_wells_above_arsenic_guideline)
arrange(percent_wells_abv_avg, desc(percent_wells_above_fluoride_guideline))
## # A tibble: 40 x 3
## location percent_wells_above_fluoride_g… percent_wells_above_arsenic_g…
## <chr> <dbl> <dbl>
## 1 Otis 30 39.6
## 2 Dedham 22.5 17.5
## 3 Surry 18.3 40.3
## 4 Mercer 15.6 36.4
## 5 Stockton Spri… 14.3 15.9
## 6 Clifton 14 19.4
## 7 Starks 13.6 28.6
## 8 Sedgwick 11.2 37.3
## 9 Franklin 10.3 17.6
## 10 Smithfield 10.1 14.6
## # … with 30 more rows
When I first looked at the data, I recalled that our house has special water filters because of the arsenic levels in the well water (I live right next to Buxton). I was curious whether our area also has high fluoride levels, so I chose to investigate which towns have high levels of both arsenic and fluoride in private well water. I first considered taking the maximum levels of arsenic (ug/L) and fluoride (mg/L) by town and sorting the data in descending order. However, I realized the maximum variable might be unreliable because it includes towns with fewer than 20 wells tested. I decided that the percent of wells above the guideline is a better indication of how prevalent arsenic and fluoride are in well water across Maine towns.
I calculated the mean of “percent of wells above the guideline” for both arsenic and fluoride, then filtered the chemicals tibble to keep only the towns that are above the mean on both measures. Finally, I arranged the data in descending order of the fluoride percentage to see the towns with the highest percentages of wells above the guidelines. While Buxton has a high percentage of wells above the arsenic guideline (43.4), it is not in this subset, meaning its percent of wells above the fluoride guideline was not above the mean. I can now see which towns have a high percentage of wells with both fluoride and arsenic above the guidelines; the towns at the top of the list are mostly south of Bangor toward the coast (Stockton Springs, Surry, Otis, Clifton, Dedham).
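The filtering logic described above can be illustrated with a minimal sketch on made-up numbers. The tibble `toy`, its column names, and every value in it are hypothetical; "Town B" plays the role of a town like Buxton that is high on arsenic but not above the mean on fluoride.

```r
library(dplyr)
library(tibble)

# Hypothetical values, not taken from the Maine data sets.
toy <- tibble(
  location     = c("Town A", "Town B", "Town C", "Town D"),
  pct_arsenic  = c(40, 45, 10, 5),   # mean is 25
  pct_fluoride = c(20, 2, 15, 3)     # mean is 10
)

# Keep only towns above the mean on BOTH measures,
# then sort by the fluoride percentage, highest first.
high_both <- toy %>%
  filter(
    pct_arsenic  > mean(pct_arsenic),
    pct_fluoride > mean(pct_fluoride)
  ) %>%
  arrange(desc(pct_fluoride))

high_both
```

Only "Town A" survives the filter: "Town B" has the highest arsenic percentage but falls below the fluoride mean, so it is dropped, just as a town can be absent from the subset despite a high arsenic figure.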
Display the first few rows of your interesting subset in the code chunk below.
top_n(percent_wells_abv_avg, 6, percent_wells_above_fluoride_guideline)
## # A tibble: 6 x 3
## location percent_wells_above_fluoride_gu… percent_wells_above_arsenic_g…
## <chr> <dbl> <dbl>
## 1 Surry 18.3 40.3
## 2 Otis 30 39.6
## 3 Mercer 15.6 36.4
## 4 Clifton 14 19.4
## 5 Dedham 22.5 17.5
## 6 Stockton Spri… 14.3 15.9
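As a side note, current dplyr documentation marks `top_n()` as superseded in favor of `slice_max()`, which also returns the rows sorted from highest to lowest. The sketch below is an alternative, not the code used above. It rebuilds the ten rows printed earlier in a small tibble named `subset_rows` with shortened, hypothetical column names (`fluoride`, `arsenic`); "Stockton Springs" is spelled out as in the narrative.

```r
library(dplyr)
library(tibble)

# The ten rows printed in the arranged output above, re-entered by hand.
subset_rows <- tribble(
  ~location,          ~fluoride, ~arsenic,
  "Otis",             30,        39.6,
  "Dedham",           22.5,      17.5,
  "Surry",            18.3,      40.3,
  "Mercer",           15.6,      36.4,
  "Stockton Springs", 14.3,      15.9,
  "Clifton",          14,        19.4,
  "Starks",           13.6,      28.6,
  "Sedgwick",         11.2,      37.3,
  "Franklin",         10.3,      17.6,
  "Smithfield",       10.1,      14.6
)

# Six rows with the largest fluoride percentage, highest first.
top6 <- slice_max(subset_rows, fluoride, n = 6)

top6
```

This selects the same six towns as the `top_n()` call, but ordered by the fluoride percentage instead of in their original row order.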
In the code chunk below, create a ggplot visualization of your subset that is fairly simple for a viewer to comprehend.
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
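The plotting code itself is not shown in this document; only the message from loading the scales package remains. What follows is therefore just one possible sketch of a simple visualization, not the original chart. It re-enters the six rows displayed above into a tibble named `top6` with shortened, hypothetical column names, and draws a dodged horizontal bar chart; the title, axis labels, and layout choices are all assumptions.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(scales)

# The six rows displayed above, re-entered by hand.
top6 <- tribble(
  ~location,          ~fluoride, ~arsenic,
  "Surry",            18.3,      40.3,
  "Otis",             30,        39.6,
  "Mercer",           15.6,      36.4,
  "Clifton",          14,        19.4,
  "Dedham",           22.5,      17.5,
  "Stockton Springs", 14.3,      15.9
)

# Reshape to one row per town and chemical so both can share one axis.
plot_data <- top6 %>%
  pivot_longer(c(fluoride, arsenic),
               names_to = "chemical", values_to = "pct_above")

# Horizontal bars, towns ordered by their average percentage.
p <- ggplot(plot_data,
            aes(x = pct_above, y = reorder(location, pct_above), fill = chemical)) +
  geom_col(position = "dodge") +
  scale_x_continuous(labels = label_percent(scale = 1)) +  # values are already percents
  labs(
    title = "Wells above guideline, by town",
    x = "Percent of wells above guideline",
    y = NULL,
    fill = "Chemical"
  )

p
```

A dodged bar chart keeps both chemicals visible per town; a scatter plot of arsenic against fluoride would be a reasonable alternative if the relationship between the two is the main interest.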
Once you are done, knit, publish, and then submit your link to your published RPubs document in Brightspace.