The files flouride.csv and arsenic.csv were downloaded from the Maine Tracking Network and contain fluoride and arsenic levels, by town, for private well water samples tested by the State of Maine Health and Environmental Testing Laboratory (HETL) between 1999 and 2013.

For locations with fewer than 20 wells tested, only the number of wells tested and the maximum value are displayed. All test results reported as less than the laboratory’s limit of detection were replaced with a value that is one-half of the detection limit. Unit abbreviations are: mg/L for milligrams per liter, ug/L for micrograms per liter.
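
The substitution rule for results below the detection limit can be sketched as follows. This is only an illustration: the detection limit of 0.5 ug/L, the sample values, and the "<" notation for non-detects are all assumptions made for the example, not details taken from the source files.

```r
# Hypothetical inputs: neither the limit nor the "<" notation comes from the source.
detection_limit <- 0.5                            # assumed limit of detection, in ug/L
raw_results <- c("<0.5", "3.2", "<0.5", "12.0")   # "<" marks a result below the limit

# Flag the non-detects, then strip the "<" so every entry parses as a number
is_non_detect <- startsWith(raw_results, "<")
values <- as.numeric(sub("^<", "", raw_results))

# Replace each non-detect with one-half of the detection limit
values[is_non_detect] <- detection_limit / 2
values
```

With these made-up inputs, the two non-detects become 0.25 while the measured values are left unchanged.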

Maine’s Maximum Exposure Guideline for fluoride is 2 milligrams per liter (mg/L); for arsenic it is 10 micrograms per liter (ug/L).
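
To make the guidelines concrete, here is a minimal sketch of how a "percent of wells above guideline" figure could be computed from individual test results. The function name percent_above and the sample values are hypothetical, and I am assuming that "above" means strictly greater than the guideline; the downloaded files already contain this percentage, so the analysis itself does not need this step.

```r
# Guideline values as stated in the text
fluoride_guideline_mg_l <- 2    # mg/L
arsenic_guideline_ug_l  <- 10   # ug/L

# Hypothetical helper: share of results strictly above a guideline, in percent
percent_above <- function(values, guideline) {
  100 * mean(values > guideline)
}

# Made-up arsenic results (ug/L) for four wells in one town
arsenic_results <- c(0.25, 3.2, 0.25, 12.0)
arsenic_pct <- percent_above(arsenic_results, arsenic_guideline_ug_l)
arsenic_pct
```

Only one of the four made-up results exceeds 10 ug/L, so the sketch reports 25 percent.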

The State of Maine Health and Environmental Testing Laboratory provided these data. The table was prepared by the Maine Environmental Public Health Tracking Program. The complete data set contains water test results from 46,855 private wells in Maine. Revision Date: 08/2015.

Goals

The purpose of this study is to identify the locations in Maine with the highest percentages of wells that tested above the state’s maximum exposure guidelines for both fluoride and arsenic. These results will be shared with policy makers in the hope of remedying this alarming situation with the drinking water in Maine.

Data Cleaning

I started by loading a few libraries and reading the data files arsenic.csv and flouride.csv into data frames named arsenic and flouride, respectively.

library(tidyr)  
## Warning: package 'tidyr' was built under R version 3.3.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.3
library(ggvis)
## Warning: package 'ggvis' was built under R version 3.3.3
arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)  
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)

I used the command “names” to check the tables’ headings for consistency:

names(arsenic)
## [1] "location"                      "n_wells_tested"               
## [3] "percent_wells_above_guideline" "median"                       
## [5] "percentile_95"                 "maximum"
names(flouride)
## [1] "location"                      "n_wells_tested"               
## [3] "percent_wells_above_guideline" "median"                       
## [5] "percentile_95"                 "maximum"
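
The same consistency check can also be done programmatically rather than by eye. The sketch below uses two tiny stand-in data frames with made-up rows (arsenic_demo and fluoride_demo are hypothetical names) so that it runs without the CSV files.

```r
# Stand-in data frames with the same six headings as the real files (values are made up)
arsenic_demo <- data.frame(
  location = "Town A", n_wells_tested = 25, percent_wells_above_guideline = 12.0,
  median = 1.5, percentile_95 = 20.0, maximum = 45.0, stringsAsFactors = FALSE
)
fluoride_demo <- data.frame(
  location = "Town A", n_wells_tested = 30, percent_wells_above_guideline = 3.3,
  median = 0.2, percentile_95 = 1.8, maximum = 2.6, stringsAsFactors = FALSE
)

# TRUE only if both tables have the same headings in the same order
same_headings <- identical(names(arsenic_demo), names(fluoride_demo))
same_headings

# Any headings present in one table but not the other (empty if consistent)
setdiff(names(arsenic_demo), names(fluoride_demo))
```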

The final step in the cleaning process was renaming the columns so that they better reflect the data:

names(arsenic) <- c("Location", "Arsenic_Wells_Tested", "Arsenic_Above_Guidelines", "Arsenic_Median", "Arsenic_Percentile_95", "Arsenic_Maximum")
names(arsenic)
## [1] "Location"                 "Arsenic_Wells_Tested"    
## [3] "Arsenic_Above_Guidelines" "Arsenic_Median"          
## [5] "Arsenic_Percentile_95"    "Arsenic_Maximum"
names(flouride) <- c("Location", "Fluoride_Wells_Tested", "Flouride_Above_Guidelines", "Fluoride_Median", "Fluoride_Percentile_95", "Fluoride_Maximum")
names(flouride)
## [1] "Location"                  "Fluoride_Wells_Tested"    
## [3] "Flouride_Above_Guidelines" "Fluoride_Median"          
## [5] "Fluoride_Percentile_95"    "Fluoride_Maximum"
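
Typing each name by hand makes it easy to introduce inconsistent spellings. As an alternative, the names could be generated from a prefix, as in the sketch below. The helper prefix_names is hypothetical and not part of the original analysis. Note that it would produce Fluoride_Above_Guidelines, whereas the code in this report spells that one column Flouride_Above_Guidelines, so the later chunks would need to use the generated spelling.

```r
# Shared suffixes for the five measurement columns
suffixes <- c("Wells_Tested", "Above_Guidelines", "Median", "Percentile_95", "Maximum")

# Hypothetical helper: "Location" plus the prefixed measurement names
prefix_names <- function(prefix) {
  c("Location", paste(prefix, suffixes, sep = "_"))
}

arsenic_names  <- prefix_names("Arsenic")
fluoride_names <- prefix_names("Fluoride")
arsenic_names
fluoride_names
```

The result could then be assigned with names(arsenic) <- arsenic_names, in the same way as the manual version.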

Analysis

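To address the issue of arsenic contamination, one needs to see which locations have the highest percentage of tested wells with arsenic levels above 10 micrograms per liter (ug/L). Because no ranking column is named, top_n(25) selects by the last column of the table, Arsenic_Above_Guidelines, as the message in the output confirms.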

Most_Arsenic_Wells <- arsenic %>% select(Location, Arsenic_Above_Guidelines) %>% top_n(25)
## Selecting by Arsenic_Above_Guidelines
## Warning: package 'bindrcpp' was built under R version 3.3.3
kable(Most_Arsenic_Wells, digits = 1)
|Location       | Arsenic_Above_Guidelines|
|:--------------|------------------------:|
|Manchester     |                     58.9|
|Gorham         |                     50.1|
|Columbia       |                     50.0|
|Monmouth       |                     49.5|
|Eliot          |                     49.3|
|Columbia Falls |                     48.0|
|Winthrop       |                     44.8|
|Hallowell      |                     44.6|
|Buxton         |                     43.4|
|Blue Hill      |                     42.7|
|Litchfield     |                     42.0|
|Hollis         |                     41.4|
|Orland         |                     40.7|
|Surry          |                     40.3|
|Danforth       |                     40.0|
|Mariaville     |                     40.0|
|Readfield      |                     39.8|
|Otis           |                     39.6|
|Dayton         |                     37.7|
|Sedgwick       |                     37.3|
|Mercer         |                     36.4|
|Scarborough    |                     35.2|
|Saco           |                     34.4|
|Camden         |                     34.0|
|Trenton        |                     33.7|
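
Since top_n() keeps rows without guaranteeing their order, the ranking can be made explicit by naming the column and sorting. The sketch below runs on a small made-up table (the towns and values are hypothetical). I am also assuming that locations with fewer than 20 wells tested carry a missing percentage coded as NA, which I have not verified against the CSV files.

```r
library(dplyr)

# Made-up stand-in for the arsenic table; Town B mimics a location with too few wells
demo <- data.frame(
  Location = c("Town A", "Town B", "Town C", "Town D"),
  Arsenic_Above_Guidelines = c(12.5, NA, 40.0, 33.3),
  stringsAsFactors = FALSE
)

top_demo <- demo %>%
  filter(!is.na(Arsenic_Above_Guidelines)) %>%   # drop locations with no percentage
  arrange(desc(Arsenic_Above_Guidelines)) %>%    # highest percentage first
  head(2)                                        # keep the top 2 (25 in the real analysis)
top_demo
```

One difference from top_n() is that head() cuts off at exactly the requested number of rows, even when several locations tie at the boundary.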

The same is easily done with the fluoride data:

Most_Flouride_Wells <- flouride %>% select(Location, Flouride_Above_Guidelines) %>% top_n(25)
## Selecting by Flouride_Above_Guidelines
kable(Most_Flouride_Wells, digits = 1)
|Location         | Flouride_Above_Guidelines|
|:----------------|-------------------------:|
|Otis             |                      30.0|
|Dedham           |                      22.5|
|Denmark          |                      19.6|
|Surry            |                      18.3|
|Prospect         |                      17.5|
|Eastbrook        |                      16.1|
|Mercer           |                      15.6|
|Fryeburg         |                      15.4|
|Brownfield       |                      15.2|
|Stockton Springs |                      14.3|
|Clifton          |                      14.0|
|Starks           |                      13.6|
|Marshfield       |                      12.9|
|Kennebunk        |                      12.7|
|Charlotte        |                      12.5|
|York             |                      12.4|
|Chesterville     |                      12.3|
|Stoneham         |                      12.0|
|Sedgwick         |                      11.2|
|Mechanic Falls   |                      11.1|
|Swans Island     |                      10.5|
|Franklin         |                      10.3|
|Smithfield       |                      10.1|
|Biddeford        |                       9.7|
|Otisfield        |                       9.7|

Finally, I joined the two top-25 lists on Location to see which locations appear in both, that is, the locations at greatest risk from BOTH chemicals:

Arsenic_Flouride_Wells <- Most_Arsenic_Wells %>% inner_join(Most_Flouride_Wells)
## Joining, by = "Location"
kable(Arsenic_Flouride_Wells)
|Location | Arsenic_Above_Guidelines| Flouride_Above_Guidelines|
|:--------|------------------------:|-------------------------:|
|Surry    |                     40.3|                      18.3|
|Otis     |                     39.6|                      30.0|
|Sedgwick |                     37.3|                      11.2|
|Mercer   |                     36.4|                      15.6|
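
The join keeps only the locations present in both lists. A minimal sketch of that behaviour on made-up data (the towns and values are hypothetical) is shown below, with the join key named explicitly so that it does not depend on which column names the two tables happen to share.

```r
library(dplyr)

# Two made-up "top" lists; only Town A and Town C appear in both
top_arsenic <- data.frame(
  Location = c("Town A", "Town B", "Town C"),
  Arsenic_Above_Guidelines = c(40, 35, 30),
  stringsAsFactors = FALSE
)
top_fluoride <- data.frame(
  Location = c("Town C", "Town D", "Town A"),
  Flouride_Above_Guidelines = c(18, 12, 10),
  stringsAsFactors = FALSE
)

# inner_join keeps rows whose Location occurs in both tables,
# in the row order of the first table
both <- inner_join(top_arsenic, top_fluoride, by = "Location")
both
```

Naming the key with by = "Location" also suppresses the "Joining, by" message that appears in the output above.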

Conclusion

The locations in Maine that rank in the top 25 for both arsenic and fluoride, by percentage of tested wells above the state guidelines, are Surry, Otis, Sedgwick and Mercer. Local governments should take urgent action in these areas to remedy this alarming situation.