This is an analysis of data collected by the Maine Department of Health and Human Services on levels of arsenic and fluoride in private well water between 1999 and 2013.
Over that period of time, nearly 47,000 wells were tested statewide. Roughly half of Mainers are reliant on private well water, the highest share of any state in the nation.
Public water is regulated by the federal government, but well water is unregulated. In Maine, much of this water is contaminated with naturally occurring chemicals, including arsenic and fluoride.
Arsenic, a carcinogen, may be the main culprit: Statewide, 150,000 people may be drinking from wells that test higher than the federal standard for arsenic, according to a Dartmouth College study. Another study, from the University of New Hampshire and Columbia University, suggests that arsenic exposure could lower IQ levels in kids. Excess fluoride levels have also been linked to tooth pitting and brittle bones.
My analysis focused on one question: Which cities and towns in Maine have the biggest problem with elevated levels of both arsenic and fluoride?
First, I loaded the relevant dplyr, knitr and ggvis packages into R.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(ggvis)
Then, I loaded the separate arsenic and fluoride data files into R. They contain well data for 917 Maine locations, including the number of wells tested and how many of them had higher-than-recommended levels of fluoride and arsenic.
arsenic <- read.csv("http://jamessuleiman.com/mba676/assets/units/unit4/arsenic.csv", header=TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("http://jamessuleiman.com/mba676/assets/units/unit4/flouride.csv", header=TRUE, stringsAsFactors = FALSE)
I used the merge function to combine those two data sets, matching rows on the shared location column.
ar_fl <- merge(arsenic, flouride, by = "location")
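To illustrate what this merge step does, here is a minimal sketch on toy data. The column names n_wells_tested and percent_wells_above_guideline are assumptions made for illustration, not the names in the real files, and the handful of values are borrowed from the results table further down. With its default settings, merge keeps only locations present in both inputs and appends .x and .y to column names that clash, which is why the columns get renamed in the next step.

```r
# Toy stand-ins for the arsenic and fluoride files.
# Column names here are hypothetical; only "location" is known from the merge call.
arsenic_toy <- data.frame(
  location = c("Otis", "Surry", "Mercer"),
  n_wells_tested = c(53, 181, 33),
  percent_wells_above_guideline = c(39.6, 40.3, 36.4),
  stringsAsFactors = FALSE
)
fluoride_toy <- data.frame(
  location = c("Otis", "Surry", "Dedham"),
  n_wells_tested = c(60, 175, 102),
  percent_wells_above_guideline = c(30.0, 18.3, 22.5),
  stringsAsFactors = FALSE
)

# Default merge: rows are kept only when the location appears in both inputs.
merged_toy <- merge(arsenic_toy, fluoride_toy, by = "location")

# Clashing column names are disambiguated with .x (first input) and .y (second).
print(names(merged_toy))

# Mercer and Dedham each appear in only one input, so two rows remain.
print(nrow(merged_toy))

# Passing all = TRUE keeps unmatched locations and fills the gaps with NA.
merged_all <- merge(arsenic_toy, fluoride_toy, by = "location", all = TRUE)
print(nrow(merged_all))
```

Checking names() on the merged result before renaming is a cheap way to confirm that the arsenic columns really do come first.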
Then, I renamed all 11 columns in the dataset to denote whether each column contained information about arsenic or fluoride.
names(ar_fl) <- c("Municipality", "Wells_Tested_A", "Percent_above_guidelines_A", "Median_A", "Percentile_95_A", "Max_A", "Wells_Tested_F", "Percent_above_guidelines_F", "Median_F", "Percentile_95_F", "Max_F")
Then, I filtered this dataset to exclude all municipalities where fewer than 20 wells were tested for either chemical. Twenty wells is the threshold for complete data collection.
(Note: This doesn’t imply that other places don’t have issues with groundwater contaminants, just that data is insufficient for the analysis.)
limit_ar_fl <- ar_fl %>% select(Municipality, Wells_Tested_A, Percent_above_guidelines_A, Wells_Tested_F, Percent_above_guidelines_F) %>% filter(Wells_Tested_A >= 20, Wells_Tested_F >= 20)
This narrowed the dataset from 917 locations to 341, getting rid of all “N/A” observations.
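The reason the missing observations disappear is that dplyr's filter keeps only rows where every condition evaluates to TRUE, so a comparison that yields NA drops the row. The sketch below shows this on a made-up four-row data frame; the municipality labels and counts are hypothetical, and it assumes the missing values were read into R as NA.

```r
library(dplyr)

# Hypothetical rows: one complete, one under the threshold, two with missing counts.
wells_toy <- data.frame(
  Municipality = c("A", "B", "C", "D"),
  Wells_Tested_A = c(53, 12, NA, 25),
  Wells_Tested_F = c(60, 30, 40, NA),
  stringsAsFactors = FALSE
)

# Same pattern as the filter above: both counts must be at least 20.
# NA >= 20 evaluates to NA, and filter() drops rows whose condition is NA.
kept <- wells_toy %>%
  filter(Wells_Tested_A >= 20, Wells_Tested_F >= 20)

print(kept$Municipality)
```

Only municipality A survives: B falls below the threshold for arsenic, while C and D are removed because one of their counts is missing.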
Afterward, I filtered the data again to isolate the cities and towns where 10 percent or more of wells tested above the recommended limit for both arsenic and fluoride. That 10 percent share is the standard I used to define the problem in our question.
over10 <- limit_ar_fl %>% filter(Percent_above_guidelines_A >= 10, Percent_above_guidelines_F >= 10) %>% arrange(desc(Percent_above_guidelines_A))
That leaves us with 12 cities and towns above the threshold in our question.
Here they are in a table, ranked by the share of wells testing high for arsenic.
chart <- over10
kable(chart)
Municipality | Wells_Tested_A | Percent_above_guidelines_A | Wells_Tested_F | Percent_above_guidelines_F |
---|---|---|---|---|
Surry | 181 | 40.3 | 175 | 18.3 |
Otis | 53 | 39.6 | 60 | 30.0 |
Sedgwick | 142 | 37.3 | 143 | 11.2 |
Mercer | 33 | 36.4 | 32 | 15.6 |
Starks | 21 | 28.6 | 22 | 13.6 |
Clifton | 31 | 19.4 | 43 | 14.0 |
Franklin | 74 | 17.6 | 107 | 10.3 |
Dedham | 97 | 17.5 | 102 | 22.5 |
Stockton Springs | 63 | 15.9 | 56 | 14.3 |
Smithfield | 82 | 14.6 | 79 | 10.1 |
Kennebunk | 94 | 11.7 | 110 | 12.7 |
Eastbrook | 28 | 10.7 | 31 | 16.1 |
A scatterplot with the share of wells high in arsenic on the X-axis and the share high in fluoride on the Y-axis gives us another view.
chart %>% ggvis(~Percent_above_guidelines_A, ~Percent_above_guidelines_F) %>% layer_points()
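The bare points do not say which town is which, so one possible refinement is to label them and title the axes. The sketch below is an illustration rather than part of the original analysis: it uses three rows copied from the table above so that it runs on its own, and the axis titles and label offsets are arbitrary choices.

```r
library(ggvis)

# Three rows copied from the table above, to keep the sketch self-contained.
towns <- data.frame(
  Municipality = c("Surry", "Otis", "Kennebunk"),
  Percent_above_guidelines_A = c(40.3, 39.6, 11.7),
  Percent_above_guidelines_F = c(18.3, 30.0, 12.7),
  stringsAsFactors = FALSE
)

# Points plus a text layer; dx and dy nudge each label away from its point.
labeled_plot <- towns %>%
  ggvis(~Percent_above_guidelines_A, ~Percent_above_guidelines_F) %>%
  layer_points() %>%
  layer_text(text := ~Municipality, dx := 6, dy := -6) %>%
  add_axis("x", title = "Percent of wells above arsenic guideline") %>%
  add_axis("y", title = "Percent of wells above fluoride guideline")
```

Printing labeled_plot in an interactive session renders it. Swapping towns for the full chart data frame would label all 12 municipalities.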
Of our municipalities, Otis, in eastern Maine, is the town most impacted by both chemicals, with nearly 40 percent of wells testing high for arsenic and 30 percent testing high for fluoride.
With the exception of Kennebunk in southern Maine, the towns in our subset are clustered in the eastern and central parts of the state, making that a good place for the state to focus its well remediation efforts.
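One way to check that reading, offered as a sketch and not as the method used above, is to score each town by the lower of its two percentages, so a town ranks highly only if both chemicals are a problem. The figures are copied from the table above; the Lower_share column is a name introduced here for illustration.

```r
# The 12 municipalities and their percentages, copied from the table above.
top12 <- data.frame(
  Municipality = c("Surry", "Otis", "Sedgwick", "Mercer", "Starks", "Clifton",
                   "Franklin", "Dedham", "Stockton Springs", "Smithfield",
                   "Kennebunk", "Eastbrook"),
  Percent_above_guidelines_A = c(40.3, 39.6, 37.3, 36.4, 28.6, 19.4,
                                 17.6, 17.5, 15.9, 14.6, 11.7, 10.7),
  Percent_above_guidelines_F = c(18.3, 30.0, 11.2, 15.6, 13.6, 14.0,
                                 10.3, 22.5, 14.3, 10.1, 12.7, 16.1),
  stringsAsFactors = FALSE
)

# Hypothetical score: the smaller of the two shares for each town.
top12$Lower_share <- pmin(top12$Percent_above_guidelines_A,
                          top12$Percent_above_guidelines_F)

# Sort from the highest score to the lowest.
ranked <- top12[order(-top12$Lower_share), ]

print(head(ranked[, c("Municipality", "Lower_share")], 3))
```

Under this score Otis comes first at 30.0, followed by Surry at 18.3 and Dedham at 17.5. Other scoring choices, such as the sum of the two percentages, could order the remaining towns differently.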