According to the Maine Tracking Network, in 2014, 56.7% of homes in Maine used private wells as their water supply. Well water is at risk of contamination with substances such as fluoride and arsenic, which, if ingested at levels above exposure guidelines, can result in adverse health effects.
The following assignment uses fluoride and arsenic data collected by the State of Maine Health and Environmental Testing Laboratory (HETL) from 1999-2013 across 46,855 private wells in Maine to answer the following questions:
The following datasets were downloaded from the Maine Tracking Network via the class website: flouride.csv and arsenic.csv.
library(dplyr)
library(knitr)
fluoride <- read.csv("flouride.csv")
arsenic <- read.csv("arsenic.csv")
kable(head(fluoride))
| location | n_wells_tested | percent_wells_above_guideline | median | percentile_95 | maximum |
|---|---|---|---|---|---|
| Otis | 60 | 30.0 | 1.130 | 3.200 | 3.6 |
| Dedham | 102 | 22.5 | 0.940 | 3.270 | 7.0 |
| Denmark | 46 | 19.6 | 0.450 | 3.150 | 3.9 |
| Surry | 175 | 18.3 | 0.800 | 3.525 | 6.9 |
| Prospect | 57 | 17.5 | 0.785 | 2.500 | 2.7 |
| Eastbrook | 31 | 16.1 | 1.290 | 2.445 | 3.3 |
kable(head(arsenic))
| location | n_wells_tested | percent_wells_above_guideline | median | percentile_95 | maximum |
|---|---|---|---|---|---|
| Manchester | 275 | 58.9 | 14.0 | 93.00 | 200 |
| Gorham | 467 | 50.1 | 10.5 | 130.00 | 460 |
| Columbia | 42 | 50.0 | 9.8 | 65.90 | 200 |
| Monmouth | 277 | 49.5 | 10.0 | 110.00 | 368 |
| Eliot | 73 | 49.3 | 9.7 | 41.35 | 45 |
| Columbia Falls | 25 | 48.0 | 8.1 | 53.75 | 71 |
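To answer my questions, I organized the data by renaming columns so that each name is specific to fluoride (“f”) or arsenic (“a”), selecting the columns I needed (location, n_wells_tested, and percent_wells_above_guideline), joining the datasets by location (town), and filtering out the towns that had no wells tested. Note that filter() also drops any town whose well count is missing (NA) in either dataset, such as a town that appears in only one file. I decided not to remove other missing values at this time; the sums computed later use na.rm = TRUE instead.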
fluoride2 <- fluoride %>%
  rename(town = location,
         pct_f_above_guideline = percent_wells_above_guideline,
         n_wells_tested_f = n_wells_tested) %>%
  select(town, n_wells_tested_f, pct_f_above_guideline)
arsenic2 <- arsenic %>%
  rename(town = location,
         pct_a_above_guideline = percent_wells_above_guideline,
         n_wells_tested_a = n_wells_tested) %>%
  select(town, n_wells_tested_a, pct_a_above_guideline)
# Join on the shared key explicitly rather than relying on the implicit match
well_contam <- fluoride2 %>%
  full_join(arsenic2, by = "town") %>%
  filter(n_wells_tested_f > 0 & n_wells_tested_a > 0)
kable(head(well_contam))
| town | n_wells_tested_f | pct_f_above_guideline | n_wells_tested_a | pct_a_above_guideline |
|---|---|---|---|---|
| Otis | 60 | 30.0 | 53 | 39.6 |
| Dedham | 102 | 22.5 | 97 | 17.5 |
| Denmark | 46 | 19.6 | 42 | 0.0 |
| Surry | 175 | 18.3 | 181 | 40.3 |
| Prospect | 57 | 17.5 | 50 | 4.0 |
| Eastbrook | 31 | 16.1 | 28 | 10.7 |
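
The join-and-filter step above can be checked on a small example. The sketch below uses made-up towns and counts (the names `f_toy`, `a_toy`, `joined_toy`, and `kept_toy` are hypothetical and not part of the assignment data) to show which rows survive when a town is missing from one of the two tables.

```r
library(dplyr)

# Hypothetical toy data: "Town C" appears only in the fluoride table,
# "Town D" only in the arsenic table, and "Town B" has zero fluoride tests.
f_toy <- data.frame(town = c("Town A", "Town B", "Town C"),
                    n_wells_tested_f = c(10, 0, 5),
                    stringsAsFactors = FALSE)
a_toy <- data.frame(town = c("Town A", "Town B", "Town D"),
                    n_wells_tested_a = c(8, 3, 7),
                    stringsAsFactors = FALSE)

# full_join() keeps every town from both tables, filling gaps with NA.
joined_toy <- full_join(f_toy, a_toy, by = "town")

# filter() keeps only rows where the condition is TRUE, so rows where the
# condition is FALSE (Town B) or NA (Town C, Town D) are all dropped.
kept_toy <- joined_toy %>%
  filter(n_wells_tested_f > 0 & n_wells_tested_a > 0)

kept_toy$town
```

In this toy case only the town tested in both tables remains, so the full join followed by this filter behaves much like an inner join on towns with at least one well tested for each contaminant.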
To answer these questions, I need to compute the number of wells (not the percent) that had fluoride or arsenic levels above guidelines and then divide that number by the total number of wells tested. I made two new columns: n_wells_above_fguidelines and n_wells_above_aguidelines.
well_contam2 <- well_contam %>%
  mutate(n_wells_above_fguidelines = round(pct_f_above_guideline * 0.01 * n_wells_tested_f, digits = 0),
         n_wells_above_aguidelines = round(pct_a_above_guideline * 0.01 * n_wells_tested_a, digits = 0))
kable(head(well_contam2))
| town | n_wells_tested_f | pct_f_above_guideline | n_wells_tested_a | pct_a_above_guideline | n_wells_above_fguidelines | n_wells_above_aguidelines |
|---|---|---|---|---|---|---|
| Otis | 60 | 30.0 | 53 | 39.6 | 18 | 21 |
| Dedham | 102 | 22.5 | 97 | 17.5 | 23 | 17 |
| Denmark | 46 | 19.6 | 42 | 0.0 | 9 | 0 |
| Surry | 175 | 18.3 | 181 | 40.3 | 32 | 73 |
| Prospect | 57 | 17.5 | 50 | 4.0 | 10 | 2 |
| Eastbrook | 31 | 16.1 | 28 | 10.7 | 5 | 3 |
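
Because the counts are back-calculated from rounded percents, it is worth checking that the rounding does not distort them. The sketch below assumes the published percents were produced as `round(100 * k / n, 1)` for a true count `k` (an assumption; the source does not document how the percents were rounded). The helper `back_calc` is a hypothetical name that mirrors the mutate() step above.

```r
# Back-calculate a count from a percent, as in the mutate() step above.
back_calc <- function(pct, n) round(pct * 0.01 * n, digits = 0)

# Assumption: a published percent is round(100 * k / n, 1) for a true count k.
n <- 60                                   # wells tested for fluoride in Otis
k_true <- 0:n                             # every count that could have occurred
pct_published <- round(100 * k_true / n, digits = 1)
k_recovered <- back_calc(pct_published, n)

# TRUE if every possible count is recovered exactly for this n.
all(k_recovered == k_true)

# The Otis fluoride row: 30.0% of 60 wells.
back_calc(30.0, 60)
```

Under that assumption, a percent rounded to one decimal is off by at most 0.05 percentage points, so the back-calculated count is off by at most 0.0005 × n wells. That is less than half a well whenever fewer than 1,000 wells were tested, so the rounded count should match the true count for towns of that size.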
sum(well_contam2$n_wells_tested_f, na.rm = TRUE)
## [1] 34997
sum(well_contam2$n_wells_above_fguidelines, na.rm = TRUE) / sum(well_contam2$n_wells_tested_f, na.rm = TRUE) * 100
## [1] 2.440209
sum(well_contam2$n_wells_tested_a, na.rm = TRUE)
## [1] 31167
sum(well_contam2$n_wells_above_aguidelines, na.rm = TRUE) / sum(well_contam2$n_wells_tested_a, na.rm = TRUE) * 100
## [1] 15.61908
two_contam <- well_contam2 %>%
  select(town, n_wells_above_fguidelines, n_wells_above_aguidelines) %>%
  filter(n_wells_above_fguidelines > 0 & n_wells_above_aguidelines > 0) %>%
  arrange(desc(n_wells_above_aguidelines))
two_contam
In this assignment, I attempted to draw descriptive information from the data in order to understand the extent of the contamination problem and how many towns were affected by contaminated wells to any degree. It made the most sense to me to assess how many wells (possibly as a proxy for households?) had fluoride and/or arsenic levels above guidelines. After determining the number of wells affected by fluoride or arsenic in each town, it struck me that the initial presentation of the data, as percent of wells above the guideline by town, had led me to overestimate the percent of wells overall that were contaminated. This reminds me how much data presentation matters when attempting to communicate information, particularly the severity or extent of an issue. I look forward to gaining the skills that allow me to examine data like these visually and even geographically. One limitation of these datasets, however, is that all years are combined. I would be curious to see trends over time.
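The overestimation described above can be made concrete with the rows already displayed. The sketch below rebuilds the six fluoride rows printed by head() (which appear to be sorted from highest to lowest percent) and compares their pooled percent with the overall 2.44% computed earlier. The object names `six_towns`, `pooled_pct`, `mean_pct`, and `overall_pct` are illustrative only.

```r
# The six fluoride rows shown by kable(head(well_contam)) above.
six_towns <- data.frame(
  town = c("Otis", "Dedham", "Denmark", "Surry", "Prospect", "Eastbrook"),
  n_wells_tested_f = c(60, 102, 46, 175, 57, 31),
  pct_f_above_guideline = c(30.0, 22.5, 19.6, 18.3, 17.5, 16.1),
  stringsAsFactors = FALSE
)

# Back-calculate counts the same way as in the mutate() step.
six_towns$n_above <- round(six_towns$pct_f_above_guideline * 0.01 *
                             six_towns$n_wells_tested_f, digits = 0)

# Pooled percent: total wells above guideline over total wells tested.
pooled_pct <- 100 * sum(six_towns$n_above) / sum(six_towns$n_wells_tested_f)

# Unweighted mean of the town-level percents, for comparison.
mean_pct <- mean(six_towns$pct_f_above_guideline)

# Overall fluoride percent across all towns, as reported earlier.
overall_pct <- 2.440209

c(pooled = pooled_pct, unweighted_mean = mean_pct, overall = overall_pct)
```

For these six towns the pooled and unweighted figures are both close to 20%, roughly eight times the overall figure, which suggests that reading only the top of a table sorted by percent gives a skewed impression of the statewide picture.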
My next step in this exercise would be to map the towns that have fluoride- and arsenic-contaminated wells. I am also curious about well testing behavior; additional data on this are offered by the Maine HETL, though not by town. Testing behavior data were gathered from the Behavioral Risk Factor Surveillance System and show that in 2014, 47.9% of respondents with private wells answered “yes” when asked whether they had tested their wells. It may be that the towns with the highest numbers of contaminated wells simply test more wells rather than bearing a heavier burden of contamination.
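One rough way to probe the testing-volume question is to ask whether towns that test more wells also rank higher on contaminated-well counts. The sketch below uses only the six arsenic rows printed earlier, which are far too few (and not a random sample of towns) to support a conclusion; it is meant to show the shape of the check, which could then be run on the full well_contam2 table. The names `arsenic_six`, `rho_count`, and `rho_pct` are hypothetical.

```r
# The six arsenic rows shown by kable(head(well_contam2)) above.
arsenic_six <- data.frame(
  town = c("Otis", "Dedham", "Denmark", "Surry", "Prospect", "Eastbrook"),
  n_wells_tested_a = c(53, 97, 42, 181, 50, 28),
  pct_a_above_guideline = c(39.6, 17.5, 0.0, 40.3, 4.0, 10.7),
  n_wells_above_aguidelines = c(21, 17, 0, 73, 2, 3),
  stringsAsFactors = FALSE
)

# Spearman rank correlation between wells tested and wells above guideline.
rho_count <- cor(arsenic_six$n_wells_tested_a,
                 arsenic_six$n_wells_above_aguidelines,
                 method = "spearman")

# The same correlation against the percent above guideline, which does not
# grow automatically with the number of wells tested.
rho_pct <- cor(arsenic_six$n_wells_tested_a,
               arsenic_six$pct_a_above_guideline,
               method = "spearman")

c(count = rho_count, percent = rho_pct)
```

If counts track testing volume much more strongly than percents do on the full data, that would be consistent with the idea that high counts partly reflect more testing; ranking towns by percent above guideline would then be the fairer comparison.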