First, we read in the two data sets and drop any rows with missing values.
library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()
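Because `drop_na()` removes rows silently, it can help to count how many rows it discards. The following is a minimal sketch on a made-up tibble, not the real CSV files; the `toy` table and the town name "Town X" are assumptions for illustration only.

```r
library(tidyverse)

# Hypothetical toy data standing in for one of the CSV files.
# "Town X" and its values are invented; it has a missing median.
toy <- tibble(
  location = c("Otis", "Dedham", "Town X"),
  n_wells_tested = c(60, 102, 8),
  median = c(1.13, 0.94, NA)
)

# With no column arguments, drop_na() removes every row that has an NA
# in any column.
toy_complete <- toy %>% drop_na()

# Number of rows removed, so the loss is visible rather than silent.
n_dropped <- nrow(toy) - nrow(toy_complete)
n_dropped
```

The same subtraction could be applied to `fluoride` and `arsenic` before and after the `drop_na()` calls above.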
Next we display the first few rows of fluoride.
head(fluoride)
## # A tibble: 6 x 6
## location n_wells_tested percent_wells_above_gui… median percentile_95 maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Otis 60 30 1.13 3.2 3.6
## 2 Dedham 102 22.5 0.94 3.27 7
## 3 Denmark 46 19.6 0.45 3.15 3.9
## 4 Surry 175 18.3 0.8 3.52 6.9
## 5 Prospect 57 17.5 0.785 2.5 2.7
## 6 Eastbrook 31 16.1 1.29 2.44 3.3
Then we display the first few rows of arsenic.
head(arsenic)
## # A tibble: 6 x 6
## location n_wells_tested percent_wells_above_g… median percentile_95 maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Manchester 275 58.9 14 93 200
## 2 Gorham 467 50.1 10.5 130 460
## 3 Columbia 42 50 9.8 65.9 200
## 4 Monmouth 277 49.5 10 110 368
## 5 Eliot 73 49.3 9.7 41.4 45
## 6 Columbia F… 25 48 8.1 53.8 71
In the code chunk below, we create a new data frame called chemicals that joins fluoride and arsenic on location. Because this is an inner join, only towns that appear in both data sets are kept.
chemicals <- inner_join(arsenic, fluoride, by = "location", suffix = c(" (arsenic)", " (fluoride)"))
Let’s take a look at the join.
head(chemicals)
## # A tibble: 6 x 11
## location `n_wells_tested… `percent_wells_… `median (arseni… `percentile_95 …
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Manches… 275 58.9 14 93
## 2 Gorham 467 50.1 10.5 130
## 3 Columbia 42 50 9.8 65.9
## 4 Monmouth 277 49.5 10 110
## 5 Eliot 73 49.3 9.7 41.4
## 6 Columbi… 25 48 8.1 53.8
## # … with 6 more variables: `maximum (arsenic)` <dbl>, `n_wells_tested
## # (fluoride)` <dbl>, `percent_wells_above_guideline (fluoride)` <dbl>,
## # `median (fluoride)` <dbl>, `percentile_95 (fluoride)` <dbl>, `maximum
## # (fluoride)` <dbl>
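An inner join drops any town that is missing from either table, so it can be worth checking what was left out. The sketch below uses small toy tables rather than the real data: "Town Y" is hypothetical, and the fluoride medians for Manchester and Gorham are made-up placeholders, since those values are not visible in the output above.

```r
library(tidyverse)

# Toy arsenic table; "Town Y" is a hypothetical town with no fluoride record.
arsenic_toy <- tibble(
  location = c("Manchester", "Gorham", "Town Y"),
  median = c(14, 10.5, 3)
)

# Toy fluoride table; the first two medians are placeholders, not real data.
fluoride_toy <- tibble(
  location = c("Manchester", "Gorham", "Otis"),
  median = c(0.2, 0.3, 1.13)
)

# Same join as in the report: shared column names get the given suffixes.
joined <- inner_join(arsenic_toy, fluoride_toy, by = "location",
                     suffix = c(" (arsenic)", " (fluoride)"))

# anti_join() returns the rows of the first table with no match in the second.
only_arsenic <- anti_join(arsenic_toy, fluoride_toy, by = "location")
only_fluoride <- anti_join(fluoride_toy, arsenic_toy, by = "location")

names(joined)
only_arsenic$location
only_fluoride$location
```

Running the two `anti_join()` calls on the real `arsenic` and `fluoride` tables would list the towns that did not make it into `chemicals`.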
## Subset
In the code chunk below, we split the data into low and high subsets for arsenic and for fluoride, using a median of 10 as the cutoff for arsenic and a median of 2 as the cutoff for fluoride.
high_arsenic <- chemicals %>% filter(`median (arsenic)` >= 10)
low_arsenic <- chemicals %>% filter(`median (arsenic)` < 10)
high_fluoride <- chemicals %>% filter(`median (fluoride)` >= 2)
low_fluoride <- chemicals %>% filter(`median (fluoride)` < 2)
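As an alternative to four separate data frames, the same cutoffs can be stored as label columns in a single table, which makes the groups easy to count. This is only a sketch on toy data: the arsenic medians are taken from the output above, while the fluoride medians and the names `chemicals_toy`, `arsenic_level`, and `fluoride_level` are assumptions for illustration.

```r
library(tidyverse)

# Toy stand-in for `chemicals`; fluoride medians are made-up placeholders.
chemicals_toy <- tibble(
  location = c("Manchester", "Gorham", "Columbia"),
  `median (arsenic)` = c(14, 10.5, 9.8),
  `median (fluoride)` = c(0.2, 2.4, 0.1)
)

# Label each town using the same cutoffs as the filter() calls above.
labelled <- chemicals_toy %>%
  mutate(
    arsenic_level = if_else(`median (arsenic)` >= 10, "high", "low"),
    fluoride_level = if_else(`median (fluoride)` >= 2, "high", "low")
  )

# One row per combination of levels, with the number of towns in each.
level_counts <- labelled %>% count(arsenic_level, fluoride_level)
level_counts
```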
For our final analysis, let’s focus on towns with high arsenic levels. We also add the county for each town.
high_arsenic_counties <- high_arsenic %>% mutate(county = c("Kennebec", "Cumberland", "Kennebec"))
head(high_arsenic_counties[, c(1, 4, 12)])
## # A tibble: 3 x 3
## location `median (arsenic)` county
## <chr> <dbl> <chr>
## 1 Manchester 14 Kennebec
## 2 Gorham 10.5 Cumberland
## 3 Monmouth 10 Kennebec
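Assigning counties with a positional vector depends on the rows staying in exactly this order. A less fragile option is to keep the town-to-county mapping in its own table and join it by name. The sketch below assumes a hypothetical `county_lookup` table and a toy copy of the three high-arsenic rows shown above.

```r
library(tidyverse)

# Toy copy of the three high-arsenic towns from the output above.
high_arsenic_toy <- tibble(
  location = c("Manchester", "Gorham", "Monmouth"),
  `median (arsenic)` = c(14, 10.5, 10)
)

# Hypothetical lookup table; the row order here does not matter.
county_lookup <- tibble(
  location = c("Gorham", "Monmouth", "Manchester"),
  county = c("Cumberland", "Kennebec", "Kennebec")
)

# left_join() keeps every town and matches counties by name, not position.
high_arsenic_counties_toy <- high_arsenic_toy %>%
  left_join(county_lookup, by = "location")

high_arsenic_counties_toy
```

A town missing from the lookup table would show up with an `NA` county instead of silently shifting the other assignments.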
Let’s now look at the counties with at least one town or city where the median arsenic level met or exceeded the limit set by Maine’s Maximum Exposure Guideline.
As shown in the map above, 2 of the 16 counties in Maine had at least one town or city where the median arsenic level met or exceeded the limit set by Maine’s Maximum Exposure Guideline. This is especially problematic because high levels of arsenic in water can harm public health. I found elevated arsenic in three towns: Manchester and Monmouth in Kennebec County, and Gorham in Cumberland County. Please note that I didn’t consider towns with median arsenic levels slightly below 10 micrograms per liter (ug/L), so this analysis may miss towns and counties with potentially high arsenic levels. For example, if I set the cutoff for high arsenic at 9 ug/L, two more towns are added to the list. In summary, arsenic levels are low in the majority of towns and counties, but even one town with high arsenic levels is one too many.
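The sensitivity to the cutoff can be checked directly by counting towns at each threshold. The sketch below is restricted to five of the rows visible in the `head(chemicals)` output earlier, so it illustrates the idea rather than reproducing the full analysis; `count_at_or_above()` is a hypothetical helper, not part of the original code.

```r
library(tidyverse)

# Arsenic medians copied from the head(chemicals) output (first five towns).
medians <- tibble(
  location = c("Manchester", "Gorham", "Columbia", "Monmouth", "Eliot"),
  `median (arsenic)` = c(14, 10.5, 9.8, 10, 9.7)
)

# Hypothetical helper: number of towns whose median is at or above a cutoff.
count_at_or_above <- function(data, cutoff) {
  data %>%
    filter(`median (arsenic)` >= cutoff) %>%
    nrow()
}

# Compare the two cutoffs discussed in the text (in ug/L).
cutoffs <- c(9, 10)
towns_flagged <- sapply(cutoffs, function(x) count_at_or_above(medians, x))
tibble(cutoff = cutoffs, towns_flagged = towns_flagged)
```

On these rows, lowering the cutoff from 10 to 9 ug/L adds Columbia and Eliot to the flagged towns.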