Objective

This report focuses on the towns where a large share of tested wells exceeded the maximum exposure guidelines for arsenic, fluoride, or both.

Data Source

For this analysis, data was compiled from the Maine Tracking Network. The source data can be found by following this link: https://data.mainepublichealth.gov/tracking/home

Both the fluoride and arsenic data sets were used. The data was read into RStudio as follows:

#Read files
arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)

Cleaning the Data

To organize the data and distinguish the arsenic data from the fluoride data, the columns were renamed to denote which data set each variable belongs to:

#Rename variables
names(arsenic) <- c("Town", "N_Wells_Tested_AR", "Percent_Above_Guidelines_AR", "Median_AR", "Percentile_95_AR", "Maximum_AR")

names(flouride) <- c("Town", "N_Wells_Tested_FL", "Percent_Above_Guidelines_FL", "Median_FL", "Percentile_95_FL", "Maximum_FL")

#List variables
names(arsenic)
## [1] "Town"                        "N_Wells_Tested_AR"          
## [3] "Percent_Above_Guidelines_AR" "Median_AR"                  
## [5] "Percentile_95_AR"            "Maximum_AR"
names(flouride)
## [1] "Town"                        "N_Wells_Tested_FL"          
## [3] "Percent_Above_Guidelines_FL" "Median_FL"                  
## [5] "Percentile_95_FL"            "Maximum_FL"

While this data set presents a range of interesting and valid observations, I wanted to put the principal focus on towns where a large share of wells tested above the guidelines. Focusing on these towns gives those responsible for corrective action a clear direction on where to start their remediation efforts.

For this, I assumed that a reasonable sample of wells is needed before deciding where to focus. I therefore filtered both data sets to include only towns where at least 20 wells were tested. I judged this sample large enough to be relevant, and large enough that a couple of bad measurements would not skew the statistic for the town.

To be more cautious, this threshold could certainly be raised to obtain more robust data, but at the cost of dropping some towns that may have serious, widespread issues with their water.
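The trade-off described above can be illustrated with a small sketch. The towns and values below are hypothetical and are not taken from the Maine Tracking Network data; only the column names follow the ones used in this report. The sketch counts how many towns survive each candidate threshold and shows how far a single well can move a town's percentage.

```r
# Hypothetical toy data: towns and values are made up for illustration only.
toy <- data.frame(
  Town = c("A", "B", "C", "D", "E"),
  N_Wells_Tested_AR = c(5, 18, 20, 45, 120),
  Percent_Above_Guidelines_AR = c(60.0, 33.3, 25.0, 40.0, 12.5),
  stringsAsFactors = FALSE
)

# Candidate minimum sample sizes (20 is the threshold used in this report).
thresholds <- c(10, 20, 50)

# Number of towns that remain after filtering at each threshold.
towns_kept <- sapply(thresholds, function(min_wells) {
  sum(toy$N_Wells_Tested_AR >= min_wells)
})
names(towns_kept) <- thresholds
print(towns_kept)

# Percentage points by which one additional failing well shifts the statistic.
swing_per_well <- 100 / toy$N_Wells_Tested_AR
print(round(swing_per_well, 1))
```

With 20 wells tested, one well accounts for 5 percentage points, which is one way to see why a much smaller sample would be easy to distort.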

I then calculated the average of the two percentages of wells above the maximum exposure guidelines, one for fluoride and one for arsenic. This gives the two contaminants equal weight so we can see the bigger picture all at once. Because the two data sets are combined with an inner join, only towns with at least 20 wells tested for both contaminants are included. The result highlights the most widespread dangers in wells, whether they come from high arsenic, high fluoride, or a combination of the two.
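As a worked example of the averaging step, the sketch below uses base R's merge() rather than dplyr so that it runs without extra packages. The percentages for Otis, Manchester, and Surry are the rounded values shown in this report's results; "Exampleville" is a hypothetical town added only to show that a town missing from one data set drops out of the join.

```r
# Rounded fluoride percentages from this report, plus one hypothetical town.
fl <- data.frame(
  Town = c("Otis", "Manchester", "Surry", "Exampleville"),
  Percent_Above_Guidelines_FL = c(30.0, 3.3, 18.3, 12.0),
  stringsAsFactors = FALSE
)

# Rounded arsenic percentages from this report (no row for Exampleville).
ar <- data.frame(
  Town = c("Otis", "Manchester", "Surry"),
  Percent_Above_Guidelines_AR = c(39.6, 58.9, 40.3),
  stringsAsFactors = FALSE
)

# merge() performs an inner join by default: only towns in both frames remain.
both <- merge(fl, ar, by = "Town")

# Equal-weight average of the two percentages.
both$avg <- (both$Percent_Above_Guidelines_FL +
               both$Percent_Above_Guidelines_AR) / 2

# Sort from highest to lowest average.
both <- both[order(-both$avg), ]
print(both)
```

Because the inputs here are already rounded to one decimal place, averages recomputed this way can differ slightly in the last digit from those computed on the unrounded source data.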

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#Towns with at least 20 wells tested for AR, sorted by percent above guidelines
arsenic_20wells <- arsenic %>% select(Town, N_Wells_Tested_AR, Percent_Above_Guidelines_AR) %>% filter(N_Wells_Tested_AR >= 20) %>% arrange(desc(Percent_Above_Guidelines_AR))

#Towns with at least 20 wells tested for FL, sorted by percent above guidelines
flouride_20wells <- flouride %>% select(Town, N_Wells_Tested_FL, Percent_Above_Guidelines_FL) %>% filter(N_Wells_Tested_FL >= 20) %>% arrange(desc(Percent_Above_Guidelines_FL))

#Drop number of wells tested
arsenic_above <- arsenic_20wells %>% select(Town, Percent_Above_Guidelines_AR)
flouride_above <- flouride_20wells %>% select(Town, Percent_Above_Guidelines_FL)

#Top 5 towns for AR and FL
top5_AR <- arsenic_above %>% arrange(desc(Percent_Above_Guidelines_AR)) %>% top_n(5)
## Selecting by Percent_Above_Guidelines_AR
top5_FL <- flouride_above %>% arrange(desc(Percent_Above_Guidelines_FL)) %>% top_n(5)
## Selecting by Percent_Above_Guidelines_FL
#Join data frames
arsenic_flouride_above <- flouride_above %>% inner_join(arsenic_above) %>% mutate(avg = (Percent_Above_Guidelines_FL + Percent_Above_Guidelines_AR) / 2) %>% arrange(desc(avg)) %>% top_n(20)
## Joining, by = "Town"
## Selecting by avg

Results

Looking at the contaminants individually, the tables below show the top 5 towns with the largest share of wells above the set guidelines, first for arsenic and then for fluoride.

Side note: I live in Gorham, and the water at my house tested at twice the acceptable limit of arsenic when I moved in. My measurements would certainly add to the percentage of wells testing above the guidelines. This report is consistent with what I found.

library(knitr)
#List top 5 towns for AR and FL
kable(top5_AR, digits = 1)
|Town       | Percent_Above_Guidelines_AR|
|:----------|---------------------------:|
|Manchester |                        58.9|
|Gorham     |                        50.1|
|Columbia   |                        50.0|
|Monmouth   |                        49.5|
|Eliot      |                        49.3|
kable(top5_FL, digits = 1)
|Town     | Percent_Above_Guidelines_FL|
|:--------|---------------------------:|
|Otis     |                        30.0|
|Dedham   |                        22.5|
|Denmark  |                        19.6|
|Surry    |                        18.3|
|Prospect |                        17.5|

Top Offenders

Looking at the two data sets together, we can see the overall top offenders with regard to widespread issues, whether due to high arsenic levels, high fluoride levels, or a combination of the two. Below is a table showing the top 20 towns that have the most widespread issues with contaminants in their water.

Sorted by Average Percent Above Guideline
library(pander)
#Pander table for top 20 towns for both FL and AR
panderOptions('round', 1)
set.caption("Top Towns Above Guidelines")
pander(arsenic_flouride_above)
Top Towns Above Guidelines
|Town           | Percent_Above_Guidelines_FL| Percent_Above_Guidelines_AR|  avg|
|:--------------|---------------------------:|---------------------------:|----:|
|Otis           |                          30|                        39.6| 34.8|
|Manchester     |                         3.3|                        58.9| 31.1|
|Surry          |                        18.3|                        40.3| 29.3|
|Monmouth       |                         3.1|                        49.5| 26.3|
|Blue Hill      |                         9.6|                        42.7| 26.2|
|Mercer         |                        15.6|                        36.4|   26|
|Columbia       |                         1.9|                          50| 25.9|
|Gorham         |                           0|                        50.1| 25.1|
|Orland         |                         8.6|                        40.7| 24.7|
|Eliot          |                           0|                        49.3| 24.6|
|Sedgwick       |                        11.2|                        37.3| 24.2|
|Columbia Falls |                           0|                          48|   24|
|Winthrop       |                         3.1|                        44.8| 23.9|
|Mariaville     |                         7.5|                          40| 23.8|
|Hollis         |                         3.5|                        41.4| 22.4|
|Hallowell      |                           0|                        44.6| 22.3|
|Buxton         |                           1|                        43.4| 22.2|
|Litchfield     |                         1.9|                          42| 21.9|
|Readfield      |                         2.8|                        39.8| 21.3|
|Starks         |                        13.6|                        28.6| 21.1|

Percent Above Guidelines

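Lastly, the following scatter plot shows these top 20 towns, with the percent of wells above the fluoride guideline on the x-axis and the percent above the arsenic guideline on the y-axis. It gives an idea of how widespread each town's arsenic and fluoride violations are relative to the maximum exposure guidelines.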

library(ggvis)
#Scatter top 20
arsenic_flouride_above %>% ggvis(~Percent_Above_Guidelines_FL, ~Percent_Above_Guidelines_AR) %>% layer_points()

Next steps

Using this information, corrective action can be taken to address the top offending towns, where the issues seem to be the most systemic.

Consideration should also be given to the relative weight of the two contaminants. My understanding is that high concentrations of arsenic are far more dangerous than elevated levels of fluoride. If this is the case, priority may be placed on those towns where arsenic levels are widely above the guidelines.
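One way to explore this idea is a weighted average in place of the equal-weight average used in this report. The sketch below is only an illustration: the helper weighted_score() and the 0.75 arsenic weight are hypothetical choices, not health-based values, and the percentages are the rounded figures from the top 20 table.

```r
# Rounded percentages for three towns, taken from this report's top 20 table.
towns <- data.frame(
  Town = c("Otis", "Manchester", "Gorham"),
  Percent_Above_Guidelines_FL = c(30.0, 3.3, 0.0),
  Percent_Above_Guidelines_AR = c(39.6, 58.9, 50.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: weighted average of the two percentages.
# w_ar is the weight on arsenic; fluoride receives the remainder (1 - w_ar).
weighted_score <- function(fl, ar, w_ar) {
  stopifnot(w_ar >= 0, w_ar <= 1)
  w_ar * ar + (1 - w_ar) * fl
}

# Assumed weight for illustration only; 0.5 reproduces the equal-weight average.
towns$score <- weighted_score(
  towns$Percent_Above_Guidelines_FL,
  towns$Percent_Above_Guidelines_AR,
  w_ar = 0.75
)

# Sort from highest to lowest weighted score.
towns <- towns[order(-towns$score), ]
print(towns)
```

With this assumed weight, Manchester moves to the top and Gorham ranks above Otis, the reverse of their order under the equal-weight average. A real weighting would need to come from the relevant health guidance rather than from this example.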

Another extension would be to take the results in this report and weight them by the number of wells tested, or by a similar variable such as town population. This would help direct the authorities to the densest areas where water contaminants are a problem and help them address the contaminants in those towns.
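A minimal sketch of this extension follows. It estimates the number of tested wells above the guideline as the number of wells tested times the percentage above the guideline. The towns and all values below are hypothetical, since the well counts for the top towns are not listed in this report.

```r
# Hypothetical towns and values, for illustration only.
wells <- data.frame(
  Town = c("Town A", "Town B", "Town C"),
  N_Wells_Tested_AR = c(25, 400, 80),
  Percent_Above_Guidelines_AR = c(50, 30, 45),
  stringsAsFactors = FALSE
)

# Estimated count of tested wells above the arsenic guideline.
wells$Est_Wells_Above_AR <- wells$N_Wells_Tested_AR *
  wells$Percent_Above_Guidelines_AR / 100

# Ranking by percentage alone versus ranking by estimated count.
by_percent <- wells$Town[order(-wells$Percent_Above_Guidelines_AR)]
by_count <- wells$Town[order(-wells$Est_Wells_Above_AR)]

print(wells)
print(by_percent)
print(by_count)
```

In this made-up example the town with the lowest percentage has the most affected wells, so the two rankings are exact opposites. The number of wells tested is only a proxy for how many wells a town has, so population or total well counts may be a better weight if they are available.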