To clean up the data and join the two tables I used a full join. While this allowed N/A (null) values to be present in the combined table (where one table contained data for one test but not for the other) I preferred to be working with all datapoints and not lose anything in the process.
library(dplyr)
library(tidyr)
library(scales)
library(pander)
arsenic <- read.csv("arsenic.csv")
flouride <- read.csv("flouride.csv")
arsenic <- arsenic %>% mutate(Test = "Arsenic")
flouride <- flouride %>% mutate(Test = "Flouride")
#Full Join
wells_cmb <- arsenic %>% full_join(flouride)
I also wanted to make sure the data was comparing apples to apples so towns who had less than 20 wells were removed as they did not display anything other than number of wells tested and the maximum value.
It was also stated that if the level was under the lab’s limit of detection they replaced valued the town at one-half the detection limit. As these towns introducced an almost categorical variable into the mix (a consistently low value that represented a floor of the data values) I removed these towns as well. After reviewing the data and identifying the lowest values it was determined that arsenic had a detection limit of .5 (lowest value was .25 or half the detection limit) and flouride was .2
#Remove towns with less than 20 wells and observations with measurements below limit of detection (.25 : Arsenic; .1 : Flouride) and created perc95_to_max_exp which divides the 95th percentile of each towns measurements by the maximum exposure limit for each chemical
wells_cmb_20 <- wells_cmb %>% filter(n_wells_tested >= 20, median > if_else(Test == "Arsenic", .25, .1)) %>% mutate( perc95_to_max_exp = if_else(Test == "Arsenic", (percentile_95 / 10), (percentile_95 / 2)))
The group I drew attention to related to how their 95th percentile measurement compared to the maximum allowed limits under state law. Many of the observed towns had values greatly over the limit when it came to their upper percentiles although the median remained below the threshold. I wanted to see if extremes for one chemical might correlate to extremes in the other.
I took the 95th Percentil measurement for each town and divided it by the maximum allowable level. (10 for arsenic, 2 for flouride) To keep the sample size manageable I then only took the top 50 from each test, the results can be seen below.
Multiple By Which 95th Percentile Exceeded Maximum Level of Allowable Exposure
#Took top 50 towns based on the value of their 95 percentile measurement divided by the max level of allowable exposure
slice_tbl1 <- wells_cmb_20 %>% filter(Test == "Arsenic") %>% arrange(desc(perc95_to_max_exp)) %>% top_n(50) %>% mutate(Rank = min_rank(desc(perc95_to_max_exp)))
slice_tbl1 <- slice_tbl1 %>% select(location, perc95_to_max_exp,Test, Rank)
pander(slice_tbl1)
| location | perc95_to_max_exp | Test | Rank |
|---|---|---|---|
| Danforth | 37.25 | Arsenic | 1 |
| Matinicus Isle Plt | 25.65 | Arsenic | 2 |
| Sedgwick | 23.66 | Arsenic | 3 |
| Blue Hill | 22.9 | Arsenic | 4 |
| Standish | 15.4 | Arsenic | 5 |
| Surry | 14.57 | Arsenic | 6 |
| Gorham | 13 | Arsenic | 7 |
| Wales | 12.9 | Arsenic | 8 |
| Camden | 12.5 | Arsenic | 9 |
| Otis | 12.14 | Arsenic | 10 |
| Ellsworth | 11.96 | Arsenic | 11 |
| Northport | 11.37 | Arsenic | 12 |
| Monmouth | 11 | Arsenic | 13 |
| Buxton | 11 | Arsenic | 13 |
| Hollis | 11 | Arsenic | 13 |
| Buckfield | 10.4 | Arsenic | 16 |
| Hallowell | 10 | Arsenic | 17 |
| Manchester | 9.3 | Arsenic | 18 |
| Northfield | 9.16 | Arsenic | 19 |
| Rangeley | 8.89 | Arsenic | 20 |
| Litchfield | 8.48 | Arsenic | 21 |
| Saco | 8.22 | Arsenic | 22 |
| Exeter | 7.95 | Arsenic | 23 |
| Biddeford | 6.9 | Arsenic | 24 |
| Unity | 6.74 | Arsenic | 25 |
| Columbia | 6.59 | Arsenic | 26 |
| Winthrop | 6.5 | Arsenic | 27 |
| Penobscot | 6.33 | Arsenic | 28 |
| Dayton | 6.275 | Arsenic | 29 |
| Stockton Springs | 6.235 | Arsenic | 30 |
| Robbinston | 6.14 | Arsenic | 31 |
| Anson | 6.1 | Arsenic | 32 |
| West Gardiner | 5.925 | Arsenic | 33 |
| Mariaville | 5.55 | Arsenic | 34 |
| Columbia Falls | 5.375 | Arsenic | 35 |
| Skowhegan | 5.35 | Arsenic | 36 |
| Readfield | 5.3 | Arsenic | 37 |
| Augusta | 5.3 | Arsenic | 37 |
| Oakland | 4.9 | Arsenic | 39 |
| Baldwin | 4.885 | Arsenic | 40 |
| Starks | 4.86 | Arsenic | 41 |
| Rockport | 4.83 | Arsenic | 42 |
| Swanville | 4.78 | Arsenic | 43 |
| Belgrade | 4.7 | Arsenic | 44 |
| Farmingdale | 4.62 | Arsenic | 45 |
| Orland | 4.455 | Arsenic | 46 |
| Auburn | 4.425 | Arsenic | 47 |
| Greene | 4.31 | Arsenic | 48 |
| Weld | 4.27 | Arsenic | 49 |
| Sidney | 4.265 | Arsenic | 50 |
slice_tbl2 <- (wells_cmb_20 %>% filter(Test == "Flouride") %>% arrange(desc(perc95_to_max_exp)) %>% top_n(50)) %>% mutate(Rank = min_rank(desc(perc95_to_max_exp)))
slice_tbl2 <- slice_tbl2 %>% select(location, perc95_to_max_exp,Test, Rank)
pander(slice_tbl2)
| location | perc95_to_max_exp | Test | Rank |
|---|---|---|---|
| Charlotte | 2.22 | Flouride | 1 |
| Mercer | 2.09 | Flouride | 2 |
| Marshfield | 1.785 | Flouride | 3 |
| Surry | 1.762 | Flouride | 4 |
| York | 1.71 | Flouride | 5 |
| Otisfield | 1.65 | Flouride | 6 |
| Dedham | 1.635 | Flouride | 7 |
| Otis | 1.6 | Flouride | 8 |
| Denmark | 1.575 | Flouride | 9 |
| Fryeburg | 1.56 | Flouride | 10 |
| Sedgwick | 1.435 | Flouride | 11 |
| Blue Hill | 1.427 | Flouride | 12 |
| Stockton Springs | 1.42 | Flouride | 13 |
| Norridgewock | 1.4 | Flouride | 14 |
| Chesterville | 1.345 | Flouride | 15 |
| Stoneham | 1.337 | Flouride | 16 |
| Smithfield | 1.308 | Flouride | 17 |
| Arundel | 1.293 | Flouride | 18 |
| New Sweden | 1.288 | Flouride | 19 |
| Clifton | 1.285 | Flouride | 20 |
| Swans Island | 1.255 | Flouride | 21 |
| Prospect | 1.25 | Flouride | 22 |
| Mechanic Falls | 1.248 | Flouride | 23 |
| Eastbrook | 1.222 | Flouride | 24 |
| Brownfield | 1.218 | Flouride | 25 |
| Hartland | 1.188 | Flouride | 26 |
| Ellsworth | 1.15 | Flouride | 27 |
| Orland | 1.15 | Flouride | 27 |
| Mariaville | 1.15 | Flouride | 27 |
| Bridgton | 1.15 | Flouride | 27 |
| Gouldsboro | 1.14 | Flouride | 31 |
| Franklin | 1.133 | Flouride | 32 |
| Millinocket | 1.11 | Flouride | 33 |
| Farmington | 1.1 | Flouride | 34 |
| Casco | 1.1 | Flouride | 34 |
| Jefferson | 1.052 | Flouride | 36 |
| Hiram | 1.05 | Flouride | 37 |
| Penobscot | 1.05 | Flouride | 37 |
| Belgrade | 1.05 | Flouride | 37 |
| Cornish | 1.048 | Flouride | 40 |
| Lovell | 1.042 | Flouride | 41 |
| Mount Desert | 1.029 | Flouride | 42 |
| Lebanon | 1.025 | Flouride | 43 |
| Sebago | 1.002 | Flouride | 44 |
| Naples | 1 | Flouride | 45 |
| Porter | 1 | Flouride | 45 |
| Searsport | 0.9835 | Flouride | 47 |
| Harrison | 0.963 | Flouride | 48 |
| Dayton | 0.96 | Flouride | 49 |
| Waterford | 0.953 | Flouride | 50 |
These towns appeared in both the Top 50 results for both arsenic and flouride. Below a table shows the values as a percentage of 95th percentile measurement to maximum allowable limit. You can see that many towns in Maine were well over a 100% in both tests, and some are scarier still.
Why I chose to draw attention to this group of towns is because they not only have extreme values related to one chemical, but two. The 95th percentile can be skewed by a few bad apples (not nearly to the degree that the maximum score would be) but having wells in the town who wound up on both lists is not a good sign. It may be worthwhile to dig a little deeper into what factors would cause a town to have higher extreme values of both chemicals to help lay better well maps in the future.
#joined the two tables together
slice_tbl <- slice_tbl1 %>% full_join(slice_tbl2)
#found towns that were in top 50 for both flouride and arsenic
dist_slice_tbl <- slice_tbl %>% group_by(location) %>% filter(n()>1)
dist_slice_tbl <- dist_slice_tbl %>% arrange(location)
#filtered out other fields to show just percentage table
not_good_list <- dist_slice_tbl %>% select(location, Test,perc95_to_max_exp) %>% spread(Test, perc95_to_max_exp) %>%
arrange(desc(Arsenic), desc(Flouride)) %>% mutate(Arsenic = percent(Arsenic), Flouride = percent(Flouride)) %>% rename("Town" = location, "Arsenic 95th Percentile: % Max Exposure" = Arsenic, "Flouride 95th Percentile: % Max Exposure" = Flouride)
set.caption("Towns Where 95 Percentile Measurements for Both Arsenic and Flouride Came In the Top 50")
pander(not_good_list)
| Town | Arsenic 95th Percentile: % Max Exposure | Flouride 95th Percentile: % Max Exposure |
|---|---|---|
| Sedgwick | 2,370% | 144% |
| Blue Hill | 2,290% | 143% |
| Surry | 1,460% | 176% |
| Otis | 1,210% | 160% |
| Ellsworth | 1,200% | 115% |
| Penobscot | 633% | 105% |
| Dayton | 628% | 96% |
| Stockton Springs | 624% | 142% |
| Mariaville | 555% | 115% |
| Belgrade | 470% | 105% |
| Orland | 446% | 115% |