This report focuses on the towns where a large share of tested wells exceeded the maximum exposure guidelines for arsenic, fluoride, or both.
For this analysis, data was compiled from the Maine Tracking Network. The source data can be found by following this link: https://data.mainepublichealth.gov/tracking/home
Both the fluoride and arsenic data sets were used. The data was read into RStudio as follows:
#Read files
arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)
To organize this data and differentiate between the arsenic and fluoride data, the columns were renamed to denote which data set each variable belongs to:
#Rename variables
names(arsenic) <- c("Town", "N_Wells_Tested_AR", "Percent_Above_Guidelines_AR", "Median_AR", "Percentile_95_AR", "Maximum_AR")
names(flouride) <- c("Town", "N_Wells_Tested_FL", "Percent_Above_Guidelines_FL", "Median_FL", "Percentile_95_FL", "Maximum_FL")
#List variables
names(arsenic)
## [1] "Town" "N_Wells_Tested_AR"
## [3] "Percent_Above_Guidelines_AR" "Median_AR"
## [5] "Percentile_95_AR" "Maximum_AR"
names(flouride)
## [1] "Town" "N_Wells_Tested_FL"
## [3] "Percent_Above_Guidelines_FL" "Median_FL"
## [5] "Percentile_95_FL" "Maximum_FL"
While this data set supports a range of interesting and valid observations, I wanted to put the principal focus on towns where a large share of wells tested above the guidelines. Focusing on these towns gives those responsible for corrective action a clear indication of where to start.
For this, I assumed that a reasonable sample size of wells is needed before deciding where to focus. I therefore filtered both data sets to include only towns where at least 20 wells were tested. With 20 or more wells, a single well moves a town's percentage by at most 5 points, so a couple of bad measurements cannot dominate the statistic for the town.
A more cautious analysis could raise this threshold to obtain more robust figures, but at the cost of dropping some towns that may have serious, widespread issues with their water.
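The trade-off described above can be made concrete by counting how many towns survive each candidate threshold. The following is a minimal sketch in base R; the data frame `toy_arsenic` and its values are invented for illustration and are not taken from the Maine Tracking Network files, and only the column names follow this report.

```r
# Hypothetical toy data: column names follow the report, values are invented
toy_arsenic <- data.frame(
  Town = c("A", "B", "C", "D", "E"),
  N_Wells_Tested_AR = c(8, 20, 35, 60, 150),
  Percent_Above_Guidelines_AR = c(62.5, 40.0, 31.4, 25.0, 12.0),
  stringsAsFactors = FALSE
)

# Candidate minimum sample sizes to compare (20 is the value used in this report)
thresholds <- c(10, 20, 50, 100)

# For each threshold, count the towns that would remain after filtering
towns_kept <- sapply(thresholds, function(min_wells) {
  sum(toy_arsenic$N_Wells_Tested_AR >= min_wells)
})

sensitivity <- data.frame(Min_Wells = thresholds, Towns_Kept = towns_kept)
print(sensitivity)
```

Running the same count on the real `arsenic` and `flouride` data frames would show how many towns each stricter threshold gives up.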
I then calculated, for each town, the simple average of the percent of wells above the maximum exposure guideline for fluoride and the percent above the guideline for arsenic. This weights the two contaminants equally, so a single number summarizes how widespread the problem is, whether it comes from high arsenic, high fluoride, or a combination of the two. Because the two data sets are combined with an inner join, only towns with at least 20 wells tested for both contaminants appear in the combined ranking.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Towns with at least 20 wells tested for arsenic, sorted by percent above guidelines
arsenic_20wells <- arsenic %>% select(Town, N_Wells_Tested_AR, Percent_Above_Guidelines_AR) %>% filter(N_Wells_Tested_AR >= 20) %>% arrange(desc(Percent_Above_Guidelines_AR))
#Towns with at least 20 wells tested for fluoride, sorted by percent above guidelines
flouride_20wells <- flouride %>% select(Town, N_Wells_Tested_FL, Percent_Above_Guidelines_FL) %>% filter(N_Wells_Tested_FL >= 20) %>% arrange(desc(Percent_Above_Guidelines_FL))
#Drop number of wells tested
arsenic_above <- arsenic_20wells %>% select(Town, Percent_Above_Guidelines_AR)
flouride_above <- flouride_20wells %>% select(Town, Percent_Above_Guidelines_FL)
#Top 5 towns for AR and FL
top5_AR <- arsenic_above %>% arrange(desc(Percent_Above_Guidelines_AR)) %>% top_n(5)
## Selecting by Percent_Above_Guidelines_AR
top5_FL <- flouride_above %>% arrange(desc(Percent_Above_Guidelines_FL)) %>% top_n(5)
## Selecting by Percent_Above_Guidelines_FL
#Join data frames
arsenic_flouride_above <- flouride_above %>% inner_join(arsenic_above) %>% mutate(avg = (Percent_Above_Guidelines_FL + Percent_Above_Guidelines_AR) / 2) %>% arrange(desc(avg)) %>% top_n(20)
## Joining, by = "Town"
## Selecting by avg
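Since the join keeps only towns present in both filtered tables, it can be useful to see which towns drop out. The sketch below uses base R so it runs without extra packages; `toy_fl` and `toy_ar` are hypothetical stand-ins for `flouride_above` and `arsenic_above`, and their values are invented.

```r
# Hypothetical toy data standing in for flouride_above and arsenic_above
toy_fl <- data.frame(
  Town = c("A", "B", "C"),
  Percent_Above_Guidelines_FL = c(10.0, 5.0, 0.0),
  stringsAsFactors = FALSE
)
toy_ar <- data.frame(
  Town = c("B", "C", "D"),
  Percent_Above_Guidelines_AR = c(40.0, 20.0, 55.0),
  stringsAsFactors = FALSE
)

# Towns that appear in only one table are dropped by an inner join
only_fl <- setdiff(toy_fl$Town, toy_ar$Town)
only_ar <- setdiff(toy_ar$Town, toy_fl$Town)

# Base R equivalent of the inner join, with the key column stated explicitly
joined <- merge(toy_fl, toy_ar, by = "Town")

# Equal-weight average of the two percentages, as in the report
joined$avg <- (joined$Percent_Above_Guidelines_FL +
               joined$Percent_Above_Guidelines_AR) / 2

print(only_fl)
print(only_ar)
print(joined)
```

In this toy case, town D has the highest arsenic percentage but is missing from the combined table because it has no fluoride row, which is the kind of omission worth checking for in the real data.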
Looking at each contaminant on its own, the tables below list the 5 towns with the highest percentage of wells above the set guidelines, first for arsenic and then for fluoride.
Side note: I live in Gorham, and my house had twice the acceptable limit of arsenic when I moved in. My measurement would certainly count toward the percentage of wells testing above the guidelines, and this report is consistent with what I found.
library(knitr)
#List top 5 towns for AR and FL
kable(top5_AR, digits = 1)
| Town | Percent_Above_Guidelines_AR |
|---|---|
| Manchester | 58.9 |
| Gorham | 50.1 |
| Columbia | 50.0 |
| Monmouth | 49.5 |
| Eliot | 49.3 |
kable(top5_FL, digits = 1)
| Town | Percent_Above_Guidelines_FL |
|---|---|
| Otis | 30.0 |
| Dedham | 22.5 |
| Denmark | 19.6 |
| Surry | 18.3 |
| Prospect | 17.5 |
Looking at the two contaminants together, we can see the overall top offenders with regard to widespread issues, whether due to high arsenic levels, high fluoride levels, or a combination of the two. Below is a table of the 20 towns with the highest average percentage of wells above the guidelines.
library(pander)
#Pander table for top 20 towns for both FL and AR
panderOptions('round', 1)
set.caption("Top Towns Above Guidelines")
pander(arsenic_flouride_above)
| Town | Percent_Above_Guidelines_FL | Percent_Above_Guidelines_AR | avg |
|---|---|---|---|
| Otis | 30 | 39.6 | 34.8 |
| Manchester | 3.3 | 58.9 | 31.1 |
| Surry | 18.3 | 40.3 | 29.3 |
| Monmouth | 3.1 | 49.5 | 26.3 |
| Blue Hill | 9.6 | 42.7 | 26.2 |
| Mercer | 15.6 | 36.4 | 26 |
| Columbia | 1.9 | 50 | 25.9 |
| Gorham | 0 | 50.1 | 25.1 |
| Orland | 8.6 | 40.7 | 24.7 |
| Eliot | 0 | 49.3 | 24.6 |
| Sedgwick | 11.2 | 37.3 | 24.2 |
| Columbia Falls | 0 | 48 | 24 |
| Winthrop | 3.1 | 44.8 | 23.9 |
| Mariaville | 7.5 | 40 | 23.8 |
| Hollis | 3.5 | 41.4 | 22.4 |
| Hallowell | 0 | 44.6 | 22.3 |
| Buxton | 1 | 43.4 | 22.2 |
| Litchfield | 1.9 | 42 | 21.9 |
| Readfield | 2.8 | 39.8 | 21.3 |
| Starks | 13.6 | 28.6 | 21.1 |
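The `avg` column can be checked by hand from the two percentage columns. The sketch below recomputes it for three rows copied from the table above; the short column names `FL` and `AR` are introduced here only for brevity.

```r
# Values copied from the table above (already rounded to one decimal place)
check <- data.frame(
  Town = c("Otis", "Manchester", "Surry"),
  FL = c(30.0, 3.3, 18.3),
  AR = c(39.6, 58.9, 40.3),
  stringsAsFactors = FALSE
)

# Equal-weight average of the two percentages
check$avg <- (check$FL + check$AR) / 2
print(check)
```

For a few other rows the hand calculation lands on a half-tenth (for example, Gorham gives 25.05 against a printed 25.1), which is presumably because the table rounds averages computed from unrounded inputs.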
Lastly, the following scatter plot shows these top 20 towns, with the percentage of wells above the fluoride guideline on the horizontal axis and the percentage above the arsenic guideline on the vertical axis.
library(ggvis)
#Scatter top 20
arsenic_flouride_above %>% ggvis(~Percent_Above_Guidelines_FL, ~Percent_Above_Guidelines_AR) %>% layer_points()
Using this information, corrective action can be taken to address the top offending towns, where the issues seem to be the most systemic.
Consideration should also be given to the relative weight of the two contaminants. My understanding is that high concentrations of arsenic are far more dangerous than elevated levels of fluoride. If this is the case, priority may be placed on those towns where the percentage of wells above the arsenic guideline is highest.
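One way to express such a priority is to replace the equal-weight average with a weighted one. The sketch below is only an illustration: the weight `w_AR = 0.75` is a hypothetical value chosen to show the effect, not a health-based figure, and the three rows are copied from the top 20 table.

```r
# Three rows copied from the top 20 table in this report
towns <- data.frame(
  Town = c("Otis", "Manchester", "Starks"),
  Percent_Above_Guidelines_FL = c(30.0, 3.3, 13.6),
  Percent_Above_Guidelines_AR = c(39.6, 58.9, 28.6),
  stringsAsFactors = FALSE
)

# Hypothetical weight on arsenic; 0.5 would reproduce the equal-weight average
w_AR <- 0.75

# Weighted score: arsenic counts w_AR, fluoride counts the remainder
towns$weighted <- w_AR * towns$Percent_Above_Guidelines_AR +
  (1 - w_AR) * towns$Percent_Above_Guidelines_FL

# Rank towns from highest to lowest weighted score
ranked <- towns[order(-towns$weighted), ]
print(ranked)
```

With this illustrative weight, Manchester moves ahead of Otis, the reverse of their order under the equal-weight average.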
Another extension would be to take the results in this report and weight them by the number of wells tested, or by a similar variable such as town population. This would help direct the authorities to the areas where the most wells or people are affected, and aid them in addressing the water contaminants in those towns.
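A minimal sketch of the wells-tested weighting follows, assuming the percentage is multiplied by the number of wells tested to estimate how many tested wells exceeded the guideline. The data frame `toy` and its values are invented, and `Est_Wells_Above_AR` is a hypothetical column name; the number of wells tested is only a rough proxy for how many households are affected.

```r
# Hypothetical toy data: column names follow the report, values are invented
toy <- data.frame(
  Town = c("A", "B", "C"),
  N_Wells_Tested_AR = c(25, 200, 80),
  Percent_Above_Guidelines_AR = c(60.0, 30.0, 45.0),
  stringsAsFactors = FALSE
)

# Estimated count of tested wells above the arsenic guideline
toy$Est_Wells_Above_AR <- toy$N_Wells_Tested_AR *
  toy$Percent_Above_Guidelines_AR / 100

# Rank towns by the estimated count rather than by the percentage
ranked_by_count <- toy[order(-toy$Est_Wells_Above_AR), ]
print(ranked_by_count)
```

Here town A has the highest percentage but the fewest affected wells, so it falls to the bottom once the sample size is taken into account.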