The files flouride.csv and arsenic.csv were downloaded from the Maine Tracking Network and contain fluoride and arsenic levels, by town, for private well water samples tested by the State of Maine Health and Environmental Testing Laboratory (HETL) between 1999 and 2013.
For locations with fewer than 20 wells tested, only the number of wells tested and the maximum value are displayed. All test results reported as less than the laboratory’s limit of detection were replaced with a value that is one-half of the detection limit. Unit abbreviations are: mg/L for milligrams per liter, ug/L for micrograms per liter.
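The substitution rule for results below the detection limit can be sketched as follows. This is only an illustration: the sample values, the `below_lod` flag, and the detection limit of 1.0 ug/L are all made up and are not HETL figures.

```r
# Hypothetical arsenic results in ug/L. A result below the laboratory's
# limit of detection is flagged in below_lod and has no numeric value.
samples <- data.frame(
  value     = c(12.4, NA, 3.1, NA, 25.0),
  below_lod = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

detection_limit <- 1.0  # assumed limit, for illustration only

# Replace every flagged result with one-half of the detection limit.
samples$value_used <- ifelse(samples$below_lod,
                             detection_limit / 2,
                             samples$value)

print(samples)
```

The published summaries (median, 95th percentile, percent above guideline) are presumably computed on values treated this way, so very low medians may reflect the substituted value rather than a measured one.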
Maine’s Maximum Exposure Guideline is 2 milligrams per liter (mg/L) for fluoride and 10 micrograms per liter (ug/L) for arsenic.
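To make the "percent of wells above guideline" figure concrete, here is a minimal sketch of how such a percentage could be computed for one town. The well results are invented, and treating "above" as strictly greater than the guideline is an assumption; the source does not say how values exactly at the guideline are counted.

```r
# Guidelines as stated in the text.
arsenic_guideline  <- 10  # ug/L
fluoride_guideline <- 2   # mg/L

# Hypothetical arsenic results (ug/L) for eight wells in one town.
town_arsenic <- c(0.5, 3.2, 11.0, 18.7, 9.9, 10.0, 45.1, 2.4)

# Share of wells strictly above the guideline, as a percentage.
percent_above <- 100 * mean(town_arsenic > arsenic_guideline)
print(percent_above)
```

The same calculation applies to fluoride by swapping in fluoride results in mg/L and `fluoride_guideline`.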
The State of Maine Health and Environmental Testing Laboratory provided these data. The table was prepared by the Maine Environmental Public Health Tracking Program. The complete data set contains water test results from 46,855 private wells in Maine. Revision Date: 08/2015.
The purpose of this study is to identify the locations in Maine with the highest percentages of wells that tested above the state’s maximum exposure guidelines for both fluoride and arsenic. The results will be shared with policy makers in the hope of prompting action on drinking water quality in these locations.
I started by loading a few libraries and reading the data files arsenic.csv and flouride.csv into data frames named arsenic and flouride, respectively.
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.3.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.3
library(ggvis)
## Warning: package 'ggvis' was built under R version 3.3.3
arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)
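A quick sanity check after reading a file can confirm that the numeric columns were parsed as numbers and show how many values are missing. The sketch below uses a small inline CSV with made-up towns and values, and it assumes that suppressed statistics (locations with fewer than 20 wells tested) appear as empty cells; the real files may encode them differently.

```r
# Made-up CSV with the same column layout as the downloaded files.
csv_text <- "location,n_wells_tested,percent_wells_above_guideline,median,percentile_95,maximum
Town A,120,12.5,1.2,30.0,210
Town B,8,,,,15
Town C,45,40.0,6.0,88.0,400"

demo <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Column types: the measurement columns should be numeric, not character.
str(demo)

# Number of missing values per column.
na_counts <- colSums(is.na(demo))
print(na_counts)
```

If a measurement column comes back as character, the file probably uses a text marker for suppressed values, which can be handled with the `na.strings` argument of `read.csv`.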
I used the names() function to check that the column headings of the two tables are consistent:
names(arsenic)
## [1] "location" "n_wells_tested"
## [3] "percent_wells_above_guideline" "median"
## [5] "percentile_95" "maximum"
names(flouride)
## [1] "location" "n_wells_tested"
## [3] "percent_wells_above_guideline" "median"
## [5] "percentile_95" "maximum"
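Instead of comparing the two printed outputs by eye, the check can be done programmatically. The sketch below uses two one-row stand-in data frames with invented values; only the column names match the real files.

```r
# Stand-in data frames with the same column names as the real tables.
arsenic_demo <- data.frame(
  location = "Town A", n_wells_tested = 120,
  percent_wells_above_guideline = 12.5, median = 1.2,
  percentile_95 = 30, maximum = 210,
  stringsAsFactors = FALSE
)
fluoride_demo <- data.frame(
  location = "Town A", n_wells_tested = 95,
  percent_wells_above_guideline = 4.2, median = 0.3,
  percentile_95 = 1.9, maximum = 5.1,
  stringsAsFactors = FALSE
)

# TRUE only if both tables have the same names in the same order.
same_names <- identical(names(arsenic_demo), names(fluoride_demo))
print(same_names)

# Names present in one table but not the other (empty if consistent).
print(setdiff(names(arsenic_demo), names(fluoride_demo)))
```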
The final cleaning step was to rename the variables so that they better reflect the data:
names(arsenic) <- c("Location", "Arsenic_Wells_Tested", "Arsenic_Above_Guidelines", "Arsenic_Median", "Arsenic_Percentile_95", "Arsenic_Maximum")
names(arsenic)
## [1] "Location" "Arsenic_Wells_Tested"
## [3] "Arsenic_Above_Guidelines" "Arsenic_Median"
## [5] "Arsenic_Percentile_95" "Arsenic_Maximum"
names(flouride) <- c("Location", "Fluoride_Wells_Tested", "Flouride_Above_Guidelines", "Fluoride_Median", "Fluoride_Percentile_95", "Fluoride_Maximum")
names(flouride)
## [1] "Location" "Fluoride_Wells_Tested"
## [3] "Flouride_Above_Guidelines" "Fluoride_Median"
## [5] "Fluoride_Percentile_95" "Fluoride_Maximum"
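Assigning a whole vector to `names()` depends on the column order. As an alternative sketch, `dplyr::rename()` maps each new name to an old name explicitly, so a reordered file would not silently mislabel a column. The data frame and its values below are invented for illustration, and only three of the six columns are shown.

```r
library(dplyr)

# Stand-in table with a subset of the original column names.
arsenic_demo <- data.frame(
  location = c("Town A", "Town B"),
  n_wells_tested = c(120, 45),
  percent_wells_above_guideline = c(12.5, 40.0),
  stringsAsFactors = FALSE
)

# rename() takes new_name = old_name pairs.
arsenic_demo <- arsenic_demo %>%
  rename(
    Location = location,
    Arsenic_Wells_Tested = n_wells_tested,
    Arsenic_Above_Guidelines = percent_wells_above_guideline
  )

print(names(arsenic_demo))
```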
To address arsenic contamination, one needs to see which locations have the highest percentage of tested wells with arsenic levels above 10 micrograms per liter (ug/L).
Most_Arsenic_Wells<- arsenic %>% select(Location, Arsenic_Above_Guidelines) %>% top_n(25)
## Selecting by Arsenic_Above_Guidelines
## Warning: package 'bindrcpp' was built under R version 3.3.3
kable(Most_Arsenic_Wells, digits = 1)
| Location | Arsenic_Above_Guidelines |
|---|---|
| Manchester | 58.9 |
| Gorham | 50.1 |
| Columbia | 50.0 |
| Monmouth | 49.5 |
| Eliot | 49.3 |
| Columbia Falls | 48.0 |
| Winthrop | 44.8 |
| Hallowell | 44.6 |
| Buxton | 43.4 |
| Blue Hill | 42.7 |
| Litchfield | 42.0 |
| Hollis | 41.4 |
| Orland | 40.7 |
| Surry | 40.3 |
| Danforth | 40.0 |
| Mariaville | 40.0 |
| Readfield | 39.8 |
| Otis | 39.6 |
| Dayton | 37.7 |
| Sedgwick | 37.3 |
| Mercer | 36.4 |
| Scarborough | 35.2 |
| Saco | 34.4 |
| Camden | 34.0 |
| Trenton | 33.7 |
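The call above relies on `top_n()` choosing the last column as the ranking variable, which is why it prints "Selecting by Arsenic_Above_Guidelines". A slightly more explicit sketch names the ranking column, drops missing percentages first, and sorts the result. The towns and values below are invented; note also that `top_n()` keeps all tied rows, so it can return more than `n` rows, and that newer dplyr versions offer `slice_max()` for the same purpose.

```r
library(dplyr)

# Made-up table; Town C stands for a location with no published percentage.
demo <- data.frame(
  Location = c("Town A", "Town B", "Town C", "Town D", "Town E"),
  Arsenic_Above_Guidelines = c(12.5, 40.0, NA, 35.0, 3.1),
  stringsAsFactors = FALSE
)

top_demo <- demo %>%
  filter(!is.na(Arsenic_Above_Guidelines)) %>%   # drop suppressed rows
  top_n(2, Arsenic_Above_Guidelines) %>%         # rank by a named column
  arrange(desc(Arsenic_Above_Guidelines))        # highest percentage first

print(top_demo)
```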
The same is easily done with the fluoride data:
Most_Flouride_Wells <- flouride %>% select(Location, Flouride_Above_Guidelines) %>% top_n(25)
## Selecting by Flouride_Above_Guidelines
kable(Most_Flouride_Wells, digits = 1)
| Location | Flouride_Above_Guidelines |
|---|---|
| Otis | 30.0 |
| Dedham | 22.5 |
| Denmark | 19.6 |
| Surry | 18.3 |
| Prospect | 17.5 |
| Eastbrook | 16.1 |
| Mercer | 15.6 |
| Fryeburg | 15.4 |
| Brownfield | 15.2 |
| Stockton Springs | 14.3 |
| Clifton | 14.0 |
| Starks | 13.6 |
| Marshfield | 12.9 |
| Kennebunk | 12.7 |
| Charlotte | 12.5 |
| York | 12.4 |
| Chesterville | 12.3 |
| Stoneham | 12.0 |
| Sedgwick | 11.2 |
| Mechanic Falls | 11.1 |
| Swans Island | 10.5 |
| Franklin | 10.3 |
| Smithfield | 10.1 |
| Biddeford | 9.7 |
| Otisfield | 9.7 |
Finally, I joined the two top-25 tables to find the locations that rank among the highest for BOTH chemicals:
Arsenic_Flouride_Wells <- Most_Arsenic_Wells %>% inner_join(Most_Flouride_Wells)
## Joining, by = "Location"
kable(Arsenic_Flouride_Wells)
| Location | Arsenic_Above_Guidelines | Flouride_Above_Guidelines |
|---|---|---|
| Surry | 40.3 | 18.3 |
| Otis | 39.6 | 30.0 |
| Sedgwick | 37.3 | 11.2 |
| Mercer | 36.4 | 15.6 |
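The join step can be written with an explicit key, which avoids the "Joining, by" message and makes the matching column obvious. The sketch below reuses a handful of rows from the two tables above rather than the full top-25 lists; the object names `arsenic_top`, `fluoride_top`, and `both` are chosen for this example only.

```r
library(dplyr)

# A few rows taken from the arsenic and fluoride tables shown earlier.
arsenic_top <- data.frame(
  Location = c("Manchester", "Surry", "Otis"),
  Arsenic_Above_Guidelines = c(58.9, 40.3, 39.6),
  stringsAsFactors = FALSE
)
fluoride_top <- data.frame(
  Location = c("Otis", "Dedham", "Surry"),
  Flouride_Above_Guidelines = c(30.0, 22.5, 18.3),
  stringsAsFactors = FALSE
)

# inner_join keeps only locations present in both tables.
both <- inner_join(arsenic_top, fluoride_top, by = "Location")
print(both)
```

Because an inner join drops every location that is missing from either list, the result depends on the cutoff of 25: a town ranked 26th for one chemical would not appear even if it ranked first for the other.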
The locations in Maine that rank in the top 25 for both arsenic and fluoride, by percentage of tested wells above the guidelines, are Surry, Otis, Sedgwick and Mercer. Local government should take urgent action in these areas to remedy the situation.