Exploring Fluoride and Arsenic Levels above Guidelines in Private Wells in Maine

Jaclyn Janis

MPH 676, University of Southern Maine, Fall 2018

Purpose

According to the Maine Tracking Network, in 2014, 56.7% of homes in Maine used private wells as their water supply. Well water is at risk for contaminants such as fluoride and arsenic, which, if ingested at levels above exposure guidelines, can result in adverse health effects.

This assignment uses fluoride and arsenic data collected by the State of Maine Health and Environmental Testing Laboratory (HETL) from 1999 to 2013 across 46,855 private wells in Maine to answer three questions:

  1. What percent of all wells tested for fluoride had levels above guidelines?
  2. What percent of all wells tested for arsenic had levels above guidelines?
  3. How many towns have both fluoride and arsenic levels above guidelines?

Preparing the Data

The following datasets were downloaded from the Maine Tracking Network via the class website: flouride.csv and arsenic.csv.

library(dplyr)
library(knitr)
fluoride <- read.csv("flouride.csv")
arsenic <- read.csv("arsenic.csv")
kable(head(fluoride))
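| location  | n_wells_tested | percent_wells_above_guideline | median | percentile_95 | maximum |
|:----------|---------------:|------------------------------:|-------:|--------------:|--------:|
| Otis      |             60 |                          30.0 |  1.130 |         3.200 |     3.6 |
| Dedham    |            102 |                          22.5 |  0.940 |         3.270 |     7.0 |
| Denmark   |             46 |                          19.6 |  0.450 |         3.150 |     3.9 |
| Surry     |            175 |                          18.3 |  0.800 |         3.525 |     6.9 |
| Prospect  |             57 |                          17.5 |  0.785 |         2.500 |     2.7 |
| Eastbrook |             31 |                          16.1 |  1.290 |         2.445 |     3.3 |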
kable(head(arsenic))
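| location       | n_wells_tested | percent_wells_above_guideline | median | percentile_95 | maximum |
|:---------------|---------------:|------------------------------:|-------:|--------------:|--------:|
| Manchester     |            275 |                          58.9 |   14.0 |         93.00 |     200 |
| Gorham         |            467 |                          50.1 |   10.5 |        130.00 |     460 |
| Columbia       |             42 |                          50.0 |    9.8 |         65.90 |     200 |
| Monmouth       |            277 |                          49.5 |   10.0 |        110.00 |     368 |
| Eliot          |             73 |                          49.3 |    9.7 |         41.35 |      45 |
| Columbia Falls |             25 |                          48.0 |    8.1 |         53.75 |      71 |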

To answer my questions, I organized the data in four steps: renaming columns so that each is specific to fluoride (“f”) or arsenic (“a”), selecting the columns I wanted (location, n_wells_tested, and percent_wells_above_guideline), joining the datasets by location (town), and keeping only towns with at least one well tested for each contaminant. I did not explicitly remove missing values at this stage, although filter() drops any row for which the condition evaluates to NA.

# Rename columns so each contaminant's fields stay identifiable after the join
fluoride2 <- fluoride %>%
  rename(town = location,
         pct_f_above_guideline = percent_wells_above_guideline,
         n_wells_tested_f = n_wells_tested) %>%
  select(town, n_wells_tested_f, pct_f_above_guideline)
arsenic2 <- arsenic %>%
  rename(town = location,
         pct_a_above_guideline = percent_wells_above_guideline,
         n_wells_tested_a = n_wells_tested) %>%
  select(town, n_wells_tested_a, pct_a_above_guideline)
# Join on town, then keep towns with at least one well tested for each contaminant
well_contam <- fluoride2 %>%
  full_join(arsenic2, by = "town") %>%
  filter(n_wells_tested_f > 0 & n_wells_tested_a > 0)
kable(head(well_contam))
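| town      | n_wells_tested_f | pct_f_above_guideline | n_wells_tested_a | pct_a_above_guideline |
|:----------|-----------------:|----------------------:|-----------------:|----------------------:|
| Otis      |               60 |                  30.0 |               53 |                  39.6 |
| Dedham    |              102 |                  22.5 |               97 |                  17.5 |
| Denmark   |               46 |                  19.6 |               42 |                   0.0 |
| Surry     |              175 |                  18.3 |              181 |                  40.3 |
| Prospect  |               57 |                  17.5 |               50 |                   4.0 |
| Eastbrook |               31 |                  16.1 |               28 |                  10.7 |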
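
How the join and filter treat a town that appears in only one dataset can be checked on a toy example. The sketch below uses made-up towns (Alpha, Beta, Gamma) and counts, not values from the HETL data. It assumes the same column names as above and shows that full_join() fills the missing side with NA, and that filter() then drops those rows because a condition that evaluates to NA is treated as not met.

```r
library(dplyr)

# Hypothetical data: Beta has no arsenic record, Gamma has no fluoride record
f_toy <- data.frame(town = c("Alpha", "Beta"),
                    n_wells_tested_f = c(12, 5),
                    stringsAsFactors = FALSE)
a_toy <- data.frame(town = c("Alpha", "Gamma"),
                    n_wells_tested_a = c(10, 7),
                    stringsAsFactors = FALSE)

# full_join keeps every town and fills the missing side with NA
joined <- full_join(f_toy, a_toy, by = "town")
joined

# The comparison is NA for Beta and Gamma, so filter() drops both rows
kept <- joined %>% filter(n_wells_tested_f > 0 & n_wells_tested_a > 0)
kept
```

In this toy case only Alpha survives the filter, so the result matches what an inner join restricted to towns with nonzero counts would give.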

Exploring the Data

  1. What percent of all wells tested for fluoride had levels above guidelines?
  2. What percent of all wells tested for arsenic had levels above guidelines?

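To answer these questions, I need the number of wells (not the percent) in each town that had fluoride or arsenic levels above guidelines, which I can then sum and divide by the total number of wells tested. I made two new columns, n_wells_above_fguidelines and n_wells_above_aguidelines, by multiplying each town's percent by its number of wells tested and rounding to the nearest whole well.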

well_contam2 <- well_contam %>%
  mutate(n_wells_above_fguidelines = round(pct_f_above_guideline * 0.01 * n_wells_tested_f, digits = 0),
         n_wells_above_aguidelines = round(pct_a_above_guideline * 0.01 * n_wells_tested_a, digits = 0))
kable(head(well_contam2))
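| town      | n_wells_tested_f | pct_f_above_guideline | n_wells_tested_a | pct_a_above_guideline | n_wells_above_fguidelines | n_wells_above_aguidelines |
|:----------|-----------------:|----------------------:|-----------------:|----------------------:|--------------------------:|--------------------------:|
| Otis      |               60 |                  30.0 |               53 |                  39.6 |                        18 |                        21 |
| Dedham    |              102 |                  22.5 |               97 |                  17.5 |                        23 |                        17 |
| Denmark   |               46 |                  19.6 |               42 |                   0.0 |                         9 |                         0 |
| Surry     |              175 |                  18.3 |              181 |                  40.3 |                        32 |                        73 |
| Prospect  |               57 |                  17.5 |               50 |                   4.0 |                        10 |                         2 |
| Eastbrook |               31 |                  16.1 |               28 |                  10.7 |                         5 |                         3 |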
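
As a quick check of the count reconstruction, the sketch below recomputes the fluoride counts for the six towns shown above. The numbers are copied from the table; the vector names (n_tested, pct_above, n_above) are my own and are not part of the analysis.

```r
# Fluoride values for the first six towns, copied from the table above
n_tested  <- c(Otis = 60, Dedham = 102, Denmark = 46,
               Surry = 175, Prospect = 57, Eastbrook = 31)
pct_above <- c(30.0, 22.5, 19.6, 18.3, 17.5, 16.1)

# Convert each percent back to a whole number of wells
n_above <- round(pct_above * 0.01 * n_tested, digits = 0)
n_above
```

The results agree with the n_wells_above_fguidelines column (18, 23, 9, 32, 10, 5). Because the published percents are rounded to one decimal place, the reconstructed counts are approximations; for Dedham, for example, 22.5% of 102 wells is 22.95, which rounds to 23.
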
  • Of 34,997 wells tested for fluoride, 2.44% had fluoride levels above guidelines.
sum(well_contam2$n_wells_tested_f, na.rm = TRUE)
## [1] 34997
sum(well_contam2$n_wells_above_fguidelines, na.rm=TRUE)/sum(well_contam2$n_wells_tested_f, na.rm = TRUE) *100
## [1] 2.440209
  • Of 31,167 wells tested for arsenic, 15.62% had arsenic levels above guidelines.
sum(well_contam2$n_wells_tested_a, na.rm = TRUE)
## [1] 31167
sum(well_contam2$n_wells_above_aguidelines, na.rm=TRUE)/sum(well_contam2$n_wells_tested_a, na.rm = TRUE) *100
## [1] 15.61908
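
The pooled percentages above weight every well equally, which is not the same as averaging the town-level percents. A minimal sketch with two made-up towns (hypothetical names and values, not from the HETL data) shows how far apart the two summaries can be when a small town has a high percent:

```r
# Hypothetical towns: one small with a high percent, one large with a low percent
toy <- data.frame(
  town = c("SmallTown", "BigTown"),
  n_wells_tested = c(10, 990),
  pct_above_guideline = c(50.0, 1.0),
  stringsAsFactors = FALSE
)

# Reconstruct counts per town, then pool across towns
toy$n_above <- round(toy$pct_above_guideline * 0.01 * toy$n_wells_tested)
pooled_pct <- sum(toy$n_above) / sum(toy$n_wells_tested) * 100

# Unweighted mean of the town-level percents, for comparison
mean_of_pcts <- mean(toy$pct_above_guideline)

pooled_pct    # 1.5
mean_of_pcts  # 25.5
```

Here 15 of 1,000 wells are above the guideline, so the pooled figure is 1.5%, while the average of the two town percents is 25.5%.
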
  3. How many towns have both fluoride and arsenic levels above guidelines?
  • Of the 556 towns that had wells tested for either fluoride or arsenic, 158 towns had at least one well above the fluoride guideline and at least one well above the arsenic guideline. I arranged them in descending order of the number of wells with arsenic levels above guidelines because the health effects of arsenic seem more serious to me than those of fluoride, so I chose to make those more prominent.
two_contam <- well_contam2 %>%
  select(town, n_wells_above_fguidelines, n_wells_above_aguidelines) %>%
  filter(n_wells_above_fguidelines > 0 & n_wells_above_aguidelines > 0) %>%
  arrange(desc(n_wells_above_aguidelines))
two_contam
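
One way to get town counts like those reported above is to summarise the data frame rather than read the number of rows off the printed table. The sketch below is an illustration on a made-up stand-in for well_contam2 (hypothetical towns and counts); only the two count column names are taken from the analysis.

```r
library(dplyr)

# Hypothetical stand-in for well_contam2
toy_contam <- data.frame(
  town = c("Alpha", "Beta", "Gamma", "Delta"),
  n_wells_above_fguidelines = c(3, 0, 1, 0),
  n_wells_above_aguidelines = c(5, 2, 0, 0),
  stringsAsFactors = FALSE
)

# Count all towns, and towns with at least one well above each guideline
town_counts <- toy_contam %>%
  summarise(
    n_towns = n(),
    n_both = sum(n_wells_above_fguidelines > 0 & n_wells_above_aguidelines > 0)
  )
town_counts
```

In the toy data only Alpha has wells above both guidelines, so n_both is 1 out of 4 towns. Applied to well_contam2, the same summarise() call would give the total and the two-contaminant count in one step.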

Discussion

In this assignment, I attempted to draw out descriptive information about the data in order to understand the extent of the contamination problem and how many towns were affected by contaminated wells to any degree. It made the most sense to me to assess how many wells (possibly as a proxy for households?) had fluoride and/or arsenic levels above guidelines. After determining the number of wells affected by fluoride or arsenic in each town, it struck me that the initial presentation of the data, as the percent of wells above the guideline by town, had led me to overestimate the percent of all wells that were contaminated. This reminds me how much data presentation matters when communicating information, particularly the severity or extent of an issue. I look forward to gaining the skills that allow me to examine data like these visually and even geographically. One limitation of these datasets is that all years are combined; I would be curious to see trends over time.

My next step in this exercise would be to map the towns that have fluoride- and arsenic-contaminated wells. I am also curious about well testing behavior; additional data on this are offered by the Maine HETL, though not by town. Testing behavior data were gathered from the Behavioral Risk Factor Surveillance System and show that in 2014, 47.9% of private well-using respondents responded “yes” to having tested their wells. Maybe the towns with the highest numbers of contaminated wells simply test more wells rather than have a heavier burden of contamination.
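
A first look at the testing-volume question could compare how many wells a town tested with how many were above the guideline. The sketch below uses only the arsenic values for the six towns shown in the joined table earlier, so it illustrates the mechanics rather than giving a result: six towns taken from the top of a table sorted by fluoride percent are not representative, and the object names are my own.

```r
# Arsenic values for the six towns shown in the joined table
arsenic_head <- data.frame(
  town = c("Otis", "Dedham", "Denmark", "Surry", "Prospect", "Eastbrook"),
  n_wells_tested_a = c(53, 97, 42, 181, 50, 28),
  n_wells_above_aguidelines = c(21, 17, 0, 73, 2, 3),
  stringsAsFactors = FALSE
)

# Rank-based association between wells tested and wells above the guideline
rho <- cor(arsenic_head$n_wells_tested_a,
           arsenic_head$n_wells_above_aguidelines,
           method = "spearman")
rho

# Dividing by wells tested removes the effect of testing volume
arsenic_head$pct_above <- arsenic_head$n_wells_above_aguidelines /
  arsenic_head$n_wells_tested_a * 100
arsenic_head[order(-arsenic_head$pct_above), c("town", "pct_above")]
```

A strong positive association in the full dataset would be consistent with the idea that towns with many contaminated wells are partly just towns that test more, which is why the percent column is the fairer basis for comparing burden between towns.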