Exploring Fluoride and Arsenic Levels above Guidelines in Private Wells in Maine

Jaclyn Janis

MPH 676, University of Southern Maine, Fall 2018

Purpose

According to the Maine Tracking Network, 56.7% of homes in Maine used private wells as their water supply in 2014. Well water can contain contaminants such as fluoride and arsenic, which, if ingested at levels above exposure guidelines, can cause adverse health effects.

This assignment uses fluoride and arsenic data collected by the State of Maine Health and Environmental Testing Laboratory (HETL) from 1999 to 2013 across 46,855 private wells in Maine to answer the following questions:

What percent of all wells tested for fluoride had levels above guidelines?
What percent of all wells tested for arsenic had levels above guidelines?
How many towns have both fluoride and arsenic levels above guidelines?

Preparing the Data

The following datasets were downloaded from the Maine Tracking Network via the class website: flouride.csv and arsenic.csv.

library(dplyr)
library(knitr)
fluoride <- read.csv("flouride.csv")
arsenic <- read.csv("arsenic.csv")
kable(head(fluoride))

location	n_wells_tested	percent_wells_above_guideline	median	percentile_95	maximum
Otis	60	30.0	1.130	3.200	3.6
Dedham	102	22.5	0.940	3.270	7.0
Denmark	46	19.6	0.450	3.150	3.9
Surry	175	18.3	0.800	3.525	6.9
Prospect	57	17.5	0.785	2.500	2.7
Eastbrook	31	16.1	1.290	2.445	3.3

kable(head(arsenic))

location	n_wells_tested	percent_wells_above_guideline	median	percentile_95	maximum
Manchester	275	58.9	14.0	93.00	200
Gorham	467	50.1	10.5	130.00	460
Columbia	42	50.0	9.8	65.90	200
Monmouth	277	49.5	10.0	110.00	368
Eliot	73	49.3	9.7	41.35	45
Columbia Falls	25	48.0	8.1	53.75	71

To answer my questions, I renamed columns so that each is specific to fluoride (“f”) or arsenic (“a”), selected the columns I needed (location, n_wells_tested, and percent_wells_above_guideline), joined the datasets by location (town), and kept only the towns that had at least one well tested for each contaminant. I did not remove missing values explicitly at this step, although the filter also drops any town whose number of wells tested is missing for either contaminant.

# Rename columns so each is specific to fluoride ("f"), then keep only the columns needed
fluoride2 <- fluoride %>%
  rename(town = location,
         pct_f_above_guideline = percent_wells_above_guideline,
         n_wells_tested_f = n_wells_tested) %>%
  select(town, n_wells_tested_f, pct_f_above_guideline)
# Same steps for arsenic ("a")
arsenic2 <- arsenic %>%
  rename(town = location,
         pct_a_above_guideline = percent_wells_above_guideline,
         n_wells_tested_a = n_wells_tested) %>%
  select(town, n_wells_tested_a, pct_a_above_guideline)
# Join by town and keep towns with at least one well tested for each contaminant
well_contam <- fluoride2 %>%
  full_join(arsenic2, by = "town") %>%
  filter(n_wells_tested_f > 0 & n_wells_tested_a > 0)
kable(head(well_contam))

town	n_wells_tested_f	pct_f_above_guideline	n_wells_tested_a	pct_a_above_guideline
Otis	60	30.0	53	39.6
Dedham	102	22.5	97	17.5
Denmark	46	19.6	42	0.0
Surry	175	18.3	181	40.3
Prospect	57	17.5	50	4.0
Eastbrook	31	16.1	28	10.7
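To make the effect of the join and filter concrete, here is a minimal sketch on invented data. The towns "A" through "D" and their counts are hypothetical and are not taken from the HETL files; the point is only to show how `full_join()` followed by `filter()` treats towns that appear in one dataset but not the other.

```r
library(dplyr)

# Hypothetical toy data: town "C" appears only in the fluoride data,
# and town "D" appears only in the arsenic data.
f_toy <- data.frame(town = c("A", "B", "C"),
                    n_wells_tested_f = c(10, 0, 5),
                    stringsAsFactors = FALSE)
a_toy <- data.frame(town = c("A", "B", "D"),
                    n_wells_tested_a = c(8, 3, 7),
                    stringsAsFactors = FALSE)

# The full join keeps all four towns; C and D get NA in the column
# that came from the dataset they were missing from.
joined <- full_join(f_toy, a_toy, by = "town")

# filter() keeps only rows where the condition is TRUE, so rows where
# the condition evaluates to NA are dropped along with the FALSE rows.
kept <- joined %>% filter(n_wells_tested_f > 0 & n_wells_tested_a > 0)

kept$town
```

In this sketch only town "A" survives: "B" has no wells tested for fluoride, and "C" and "D" are dropped because a comparison against NA is not TRUE.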

Exploring the Data

What percent of all wells tested for fluoride had levels above guidelines?
What percent of all wells tested for arsenic had levels above guidelines?

To answer these questions, I need the number of wells (not the percent) that had fluoride or arsenic levels above guidelines and divide that by the total number of wells. I made two new columns: n_wells_above_fguidelines and n_wells_above_aguidelines.

# Estimate the number of wells above guideline from the reported percent and the number tested
well_contam2 <- well_contam %>%
  mutate(n_wells_above_fguidelines = round(pct_f_above_guideline * 0.01 * n_wells_tested_f, digits = 0),
         n_wells_above_aguidelines = round(pct_a_above_guideline * 0.01 * n_wells_tested_a, digits = 0))
kable(head(well_contam2))

town	n_wells_tested_f	pct_f_above_guideline	n_wells_tested_a	pct_a_above_guideline	n_wells_above_fguidelines	n_wells_above_aguidelines
Otis	60	30.0	53	39.6	18	21
Dedham	102	22.5	97	17.5	23	17
Denmark	46	19.6	42	0.0	9	0
Surry	175	18.3	181	40.3	32	73
Prospect	57	17.5	50	4.0	10	2
Eastbrook	31	16.1	28	10.7	5	3
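The conversion from percent to count can be checked by hand for the six fluoride rows shown above. This is only a sketch in base R; the vector names `n_tested`, `pct_above`, `exact`, and `rounded` are mine and do not appear in the analysis.

```r
# Values copied from the first six fluoride rows shown above.
n_tested  <- c(60, 102, 46, 175, 57, 31)
pct_above <- c(30.0, 22.5, 19.6, 18.3, 17.5, 16.1)

# Unrounded product, e.g. 22.5% of 102 wells is 22.95 wells.
exact <- pct_above * 0.01 * n_tested

# Rounding to whole wells reproduces the n_wells_above_fguidelines column.
rounded <- round(exact, digits = 0)
rounded   # 18 23 9 32 10 5

# How far each unrounded product sits from a whole number of wells.
abs(exact - rounded)
```

Because the published percentages are themselves rounded to one decimal place, the reconstructed counts are best read as estimates rather than exact tallies.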

Of 34,997 wells tested for fluoride, 2.44% had fluoride levels above guidelines.

sum(well_contam2$n_wells_tested_f, na.rm = TRUE)

## [1] 34997

sum(well_contam2$n_wells_above_fguidelines, na.rm = TRUE) / sum(well_contam2$n_wells_tested_f, na.rm = TRUE) * 100

## [1] 2.440209

Of 31,167 wells tested for arsenic, 15.62% had arsenic levels above guidelines.

sum(well_contam2$n_wells_tested_a, na.rm = TRUE)

## [1] 31167

sum(well_contam2$n_wells_above_aguidelines, na.rm = TRUE) / sum(well_contam2$n_wells_tested_a, na.rm = TRUE) * 100

## [1] 15.61908
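Since the same sum-over-sum calculation is used for both contaminants, it could be wrapped in a small helper. The function name `pooled_pct` is hypothetical, and the data frame below holds only the six rows of well_contam2 shown earlier, so its results will not match the statewide figures.

```r
# Hypothetical helper: percent of wells above guideline, pooled across towns.
pooled_pct <- function(n_above, n_tested) {
  sum(n_above, na.rm = TRUE) / sum(n_tested, na.rm = TRUE) * 100
}

# First six rows of well_contam2 as shown earlier (not the full dataset).
six <- data.frame(
  n_wells_tested_f          = c(60, 102, 46, 175, 57, 31),
  n_wells_above_fguidelines = c(18, 23, 9, 32, 10, 5),
  n_wells_tested_a          = c(53, 97, 42, 181, 50, 28),
  n_wells_above_aguidelines = c(21, 17, 0, 73, 2, 3)
)

f_pct <- pooled_pct(six$n_wells_above_fguidelines, six$n_wells_tested_f)
a_pct <- pooled_pct(six$n_wells_above_aguidelines, six$n_wells_tested_a)
c(fluoride = f_pct, arsenic = a_pct)
```

One caveat of this pattern: `na.rm = TRUE` is applied to the numerator and the denominator separately, so a town with a missing count above guideline but a known number tested would still add to the denominator.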

How many towns have both fluoride and arsenic levels above guidelines?

Of the 556 towns that had wells tested for either fluoride or arsenic, 158 towns had at least one well above the guideline for fluoride and at least one well above the guideline for arsenic. I arranged them by descending number of wells with arsenic levels above guidelines, simply because the health effects of arsenic seem a little more grave to me than those of fluoride, so I am choosing to make those more prominent.

two_contam <- well_contam2 %>%
  select(town, n_wells_above_fguidelines, n_wells_above_aguidelines) %>%
  filter(n_wells_above_fguidelines > 0 & n_wells_above_aguidelines > 0) %>%
  arrange(desc(n_wells_above_aguidelines))

two_contam
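Town counts like the ones quoted above can be read off with `nrow()`. The sketch below applies the same filter and ordering to the six rows of well_contam2 shown earlier, so it illustrates the mechanics only and does not reproduce the statewide totals. The names `six` and `both` are mine.

```r
library(dplyr)

# First six rows of well_contam2 as shown earlier (not the full dataset).
six <- data.frame(
  town = c("Otis", "Dedham", "Denmark", "Surry", "Prospect", "Eastbrook"),
  n_wells_above_fguidelines = c(18, 23, 9, 32, 10, 5),
  n_wells_above_aguidelines = c(21, 17, 0, 73, 2, 3),
  stringsAsFactors = FALSE
)

# Keep towns with at least one estimated well above guideline for both
# contaminants, ordered by the arsenic count.
both <- six %>%
  filter(n_wells_above_fguidelines > 0 & n_wells_above_aguidelines > 0) %>%
  arrange(desc(n_wells_above_aguidelines))

nrow(six)    # towns considered in this sketch
nrow(both)   # towns with both contaminants above guidelines
both$town
```

Here Denmark drops out because its estimated arsenic count is 0, leaving five of the six towns, with Surry listed first.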

Discussion

In this assignment, I attempted to draw out descriptive information about the data in order to gain some understanding of what the contamination problem was and how many towns were affected by contaminated wells to any degree. It made the most sense to me to assess how many wells (possibly as a proxy for households?) had fluoride and/or arsenic levels above guidelines. After determining the number of wells affected by fluoride or arsenic by town, it struck me that the initial presentation of the data in percent of wells above the guideline by town made me mentally overestimate the percent of overall wells that were contaminated. This reminds me how much data presentation matters when attempting to communicate information, particularly the severity or extent of an issue. I look forward to gaining the skills that allow me to examine data like these visually and even geographically. One limitation of these datasets, however, is that all years are combined. I would be curious to see trends over time.
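The gap between town-level percents and the overall percent comes down to weighting. As a sketch, consider two invented towns (these numbers are hypothetical and are not from the HETL data): a small town with a high percent of wells above guideline and a large town with a low percent.

```r
# Hypothetical towns, invented for illustration only.
n_tested  <- c(10, 1000)   # wells tested in each town
pct_above <- c(50, 1)      # percent of tested wells above guideline

# Treating every town equally, regardless of how many wells it tested.
unweighted <- mean(pct_above)

# Weighting each town by its number of wells tested, which is
# equivalent to pooling all wells before taking the percent.
pooled <- weighted.mean(pct_above, w = n_tested)

c(unweighted = unweighted, pooled = pooled)
```

Reading down a column of town percents is closer to the unweighted figure, which in this sketch is far higher than the pooled one.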

My next step in this exercise would be to map the towns that have fluoride- and arsenic-contaminated wells. I am also curious about well testing behavior; additional data on this are offered by the Maine HETL, though not by town. Testing behavior data were gathered from the Behavioral Risk Factor Surveillance System and show that in 2014, 47.9% of respondents who use private wells answered “yes” when asked whether they had tested their wells. It may be that the towns with the highest numbers of contaminated wells simply test more wells rather than carry a heavier burden of contamination.
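One way to start probing that last point is to compare a ranking by count with a ranking by percent. The sketch below uses only the six arsenic rows printed in "Preparing the Data", so it is illustrative rather than a statewide result. The names `arsenic_six`, `by_count`, and `by_percent` are mine.

```r
library(dplyr)

# First six arsenic rows as printed earlier (not the full dataset).
arsenic_six <- data.frame(
  town = c("Manchester", "Gorham", "Columbia", "Monmouth", "Eliot", "Columbia Falls"),
  n_wells_tested_a = c(275, 467, 42, 277, 73, 25),
  pct_a_above_guideline = c(58.9, 50.1, 50.0, 49.5, 49.3, 48.0),
  stringsAsFactors = FALSE
) %>%
  # Same count estimate used in the analysis above.
  mutate(n_wells_above_aguidelines =
           round(pct_a_above_guideline * 0.01 * n_wells_tested_a, digits = 0))

by_count   <- arsenic_six %>% arrange(desc(n_wells_above_aguidelines))
by_percent <- arsenic_six %>% arrange(desc(pct_a_above_guideline))

by_count$town[1]     # town with the most wells above guideline
by_percent$town[1]   # town with the highest percent above guideline
```

Among these six towns, Gorham ranks first by count because it tested the most wells, while Manchester ranks first by percent, which is consistent with the idea that counts partly reflect testing volume.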