Flourine and Arsenic Levels in Maine Wells

To clean up the data and join the two tables I used a full join. While this allowed N/A (null) values to be present in the combined table (where one table contained data for one test but not for the other) I preferred to be working with all datapoints and not lose anything in the process.

library(dplyr)
library(tidyr)
library(scales)
library(pander)
arsenic <- read.csv("arsenic.csv")
flouride <- read.csv("flouride.csv")
arsenic <- arsenic %>% mutate(Test = "Arsenic")
flouride <- flouride %>% mutate(Test = "Flouride")
#Full Join
wells_cmb <- arsenic %>% full_join(flouride)

Cleaning Up and Removing Values To Improve Analysis

I also wanted to make sure the data was comparing apples to apples so towns who had less than 20 wells were removed as they did not display anything other than number of wells tested and the maximum value.

It was also stated that if the level was under the lab’s limit of detection they replaced valued the town at one-half the detection limit. As these towns introducced an almost categorical variable into the mix (a consistently low value that represented a floor of the data values) I removed these towns as well. After reviewing the data and identifying the lowest values it was determined that arsenic had a detection limit of .5 (lowest value was .25 or half the detection limit) and flouride was .2

#Remove towns with less than 20 wells and observations with measurements below limit of detection (.25 : Arsenic; .1 : Flouride) and created perc95_to_max_exp which divides the 95th percentile of each towns measurements by the maximum exposure limit for each chemical
wells_cmb_20 <- wells_cmb %>% filter(n_wells_tested >= 20, median > if_else(Test == "Arsenic", .25, .1)) %>% mutate( perc95_to_max_exp = if_else(Test == "Arsenic", (percentile_95 / 10), (percentile_95 / 2)))

The group I drew attention to related to how their 95th percentile measurement compared to the maximum allowed limits under state law. Many of the observed towns had values greatly over the limit when it came to their upper percentiles although the median remained below the threshold. I wanted to see if extremes for one chemical might correlate to extremes in the other.

I took the 95th Percentil measurement for each town and divided it by the maximum allowable level. (10 for arsenic, 2 for flouride) To keep the sample size manageable I then only took the top 50 from each test, the results can be seen below.

Top 50 Towns By Test

Multiple By Which 95th Percentile Exceeded Maximum Level of Allowable Exposure

Arsenic

#Took top 50 towns based on the value of their 95 percentile measurement divided by the max level of allowable exposure
slice_tbl1 <- wells_cmb_20 %>% filter(Test == "Arsenic") %>% arrange(desc(perc95_to_max_exp)) %>% top_n(50) %>% mutate(Rank = min_rank(desc(perc95_to_max_exp)))
slice_tbl1 <- slice_tbl1 %>% select(location, perc95_to_max_exp,Test, Rank)
pander(slice_tbl1)
location perc95_to_max_exp Test Rank
Danforth 37.25 Arsenic 1
Matinicus Isle Plt 25.65 Arsenic 2
Sedgwick 23.66 Arsenic 3
Blue Hill 22.9 Arsenic 4
Standish 15.4 Arsenic 5
Surry 14.57 Arsenic 6
Gorham 13 Arsenic 7
Wales 12.9 Arsenic 8
Camden 12.5 Arsenic 9
Otis 12.14 Arsenic 10
Ellsworth 11.96 Arsenic 11
Northport 11.37 Arsenic 12
Monmouth 11 Arsenic 13
Buxton 11 Arsenic 13
Hollis 11 Arsenic 13
Buckfield 10.4 Arsenic 16
Hallowell 10 Arsenic 17
Manchester 9.3 Arsenic 18
Northfield 9.16 Arsenic 19
Rangeley 8.89 Arsenic 20
Litchfield 8.48 Arsenic 21
Saco 8.22 Arsenic 22
Exeter 7.95 Arsenic 23
Biddeford 6.9 Arsenic 24
Unity 6.74 Arsenic 25
Columbia 6.59 Arsenic 26
Winthrop 6.5 Arsenic 27
Penobscot 6.33 Arsenic 28
Dayton 6.275 Arsenic 29
Stockton Springs 6.235 Arsenic 30
Robbinston 6.14 Arsenic 31
Anson 6.1 Arsenic 32
West Gardiner 5.925 Arsenic 33
Mariaville 5.55 Arsenic 34
Columbia Falls 5.375 Arsenic 35
Skowhegan 5.35 Arsenic 36
Readfield 5.3 Arsenic 37
Augusta 5.3 Arsenic 37
Oakland 4.9 Arsenic 39
Baldwin 4.885 Arsenic 40
Starks 4.86 Arsenic 41
Rockport 4.83 Arsenic 42
Swanville 4.78 Arsenic 43
Belgrade 4.7 Arsenic 44
Farmingdale 4.62 Arsenic 45
Orland 4.455 Arsenic 46
Auburn 4.425 Arsenic 47
Greene 4.31 Arsenic 48
Weld 4.27 Arsenic 49
Sidney 4.265 Arsenic 50

Fluoride

slice_tbl2 <- (wells_cmb_20 %>% filter(Test == "Flouride") %>% arrange(desc(perc95_to_max_exp)) %>% top_n(50)) %>% mutate(Rank = min_rank(desc(perc95_to_max_exp)))
slice_tbl2 <- slice_tbl2 %>% select(location, perc95_to_max_exp,Test, Rank)
pander(slice_tbl2)
location perc95_to_max_exp Test Rank
Charlotte 2.22 Flouride 1
Mercer 2.09 Flouride 2
Marshfield 1.785 Flouride 3
Surry 1.762 Flouride 4
York 1.71 Flouride 5
Otisfield 1.65 Flouride 6
Dedham 1.635 Flouride 7
Otis 1.6 Flouride 8
Denmark 1.575 Flouride 9
Fryeburg 1.56 Flouride 10
Sedgwick 1.435 Flouride 11
Blue Hill 1.427 Flouride 12
Stockton Springs 1.42 Flouride 13
Norridgewock 1.4 Flouride 14
Chesterville 1.345 Flouride 15
Stoneham 1.337 Flouride 16
Smithfield 1.308 Flouride 17
Arundel 1.293 Flouride 18
New Sweden 1.288 Flouride 19
Clifton 1.285 Flouride 20
Swans Island 1.255 Flouride 21
Prospect 1.25 Flouride 22
Mechanic Falls 1.248 Flouride 23
Eastbrook 1.222 Flouride 24
Brownfield 1.218 Flouride 25
Hartland 1.188 Flouride 26
Ellsworth 1.15 Flouride 27
Orland 1.15 Flouride 27
Mariaville 1.15 Flouride 27
Bridgton 1.15 Flouride 27
Gouldsboro 1.14 Flouride 31
Franklin 1.133 Flouride 32
Millinocket 1.11 Flouride 33
Farmington 1.1 Flouride 34
Casco 1.1 Flouride 34
Jefferson 1.052 Flouride 36
Hiram 1.05 Flouride 37
Penobscot 1.05 Flouride 37
Belgrade 1.05 Flouride 37
Cornish 1.048 Flouride 40
Lovell 1.042 Flouride 41
Mount Desert 1.029 Flouride 42
Lebanon 1.025 Flouride 43
Sebago 1.002 Flouride 44
Naples 1 Flouride 45
Porter 1 Flouride 45
Searsport 0.9835 Flouride 47
Harrison 0.963 Flouride 48
Dayton 0.96 Flouride 49
Waterford 0.953 Flouride 50

Towns That Were In Both Top 50 Lists

These towns appeared in both the Top 50 results for both arsenic and flouride. Below a table shows the values as a percentage of 95th percentile measurement to maximum allowable limit. You can see that many towns in Maine were well over a 100% in both tests, and some are scarier still.

Why I chose to draw attention to this group of towns is because they not only have extreme values related to one chemical, but two. The 95th percentile can be skewed by a few bad apples (not nearly to the degree that the maximum score would be) but having wells in the town who wound up on both lists is not a good sign. It may be worthwhile to dig a little deeper into what factors would cause a town to have higher extreme values of both chemicals to help lay better well maps in the future.

#joined the two tables together
slice_tbl <- slice_tbl1 %>% full_join(slice_tbl2)
#found towns that were in top 50 for both flouride and arsenic
dist_slice_tbl <- slice_tbl %>% group_by(location) %>% filter(n()>1)
dist_slice_tbl <- dist_slice_tbl %>% arrange(location)
#filtered out other fields to show just percentage table
not_good_list <- dist_slice_tbl %>% select(location, Test,perc95_to_max_exp) %>% spread(Test, perc95_to_max_exp) %>% 
  arrange(desc(Arsenic), desc(Flouride)) %>% mutate(Arsenic = percent(Arsenic), Flouride = percent(Flouride)) %>% rename("Town" = location, "Arsenic 95th Percentile: % Max Exposure" = Arsenic, "Flouride 95th Percentile: % Max Exposure" = Flouride)
set.caption("Towns Where 95 Percentile Measurements for Both Arsenic and Flouride Came In the Top 50")
pander(not_good_list)
Towns Where 95 Percentile Measurements for Both Arsenic and Flouride Came In the Top 50
Town Arsenic 95th Percentile: % Max Exposure Flouride 95th Percentile: % Max Exposure
Sedgwick 2,370% 144%
Blue Hill 2,290% 143%
Surry 1,460% 176%
Otis 1,210% 160%
Ellsworth 1,200% 115%
Penobscot 633% 105%
Dayton 628% 96%
Stockton Springs 624% 142%
Mariaville 555% 115%
Belgrade 470% 105%
Orland 446% 115%