Loading all necessary libraries.
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggvis)
library(knitr)
library(pander)
Only a “glimpse” of the arsenic dataset is shown, to avoid printing a table with all 917 rows.
glimpse(arsenic)
## Observations: 917
## Variables: 6
## $ location <chr> "Manchester", "Gorham", "Columbi...
## $ n_wells_tested <int> 275, 467, 42, 277, 73, 25, 424, ...
## $ percent_wells_above_guideline <dbl> 58.9, 50.1, 50.0, 49.5, 49.3, 48...
## $ median <dbl> 14.00, 10.50, 9.80, 10.00, 9.70,...
## $ percentile_95 <dbl> 93.00, 130.00, 65.90, 110.00, 41...
## $ maximum <dbl> 200, 460, 200, 368, 45, 71, 240,...
dim(arsenic)
## [1] 917 6
Only a “glimpse” of the fluoride dataset is shown, to avoid printing a table with all 917 rows.
glimpse(flouride)
## Observations: 917
## Variables: 6
## $ location <chr> "Otis", "Dedham", "Denmark", "Su...
## $ n_wells_tested <int> 60, 102, 46, 175, 57, 31, 32, 52...
## $ percent_wells_above_guideline <dbl> 30.0, 22.5, 19.6, 18.3, 17.5, 16...
## $ median <dbl> 1.130, 0.940, 0.450, 0.800, 0.78...
## $ percentile_95 <dbl> 3.200, 3.270, 3.150, 3.525, 2.50...
## $ maximum <dbl> 3.6, 7.0, 3.9, 6.9, 2.7, 3.3, 6....
dim(flouride)
## [1] 917 6
After merging the fluoride and arsenic datasets, the data was still a little unclear. Once the datasets were merged correctly, the percent of wells above the guideline (variable: “percent_wells_above_guideline”) for both x and y (x represents fluoride and y represents arsenic) and the 95th percentile (variable: “percentile_95”) were removed.
Min_Max_Only was created from the merged data on fluoride and arsenic levels. It shows the location, number of wells tested, median, and maximum values for each set of observations.
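The merge and column-removal steps are not shown in this report, so the following is only a minimal sketch of how they could look, assuming a base R `merge()` on `location` (which would explain the `.x` and `.y` suffixes). The data frames `flouride_toy` and `arsenic_toy`, and all of their values, are made up for illustration.

```r
# Toy stand-ins for the two datasets; all values are invented for illustration.
flouride_toy <- data.frame(
  location = c("A", "B", "C"),
  n_wells_tested = c(10L, 20L, 30L),
  percent_wells_above_guideline = c(0, 5, 10),
  median = c(0.1, 0.2, 0.3),
  percentile_95 = c(1.0, 2.0, 3.0),
  maximum = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)
arsenic_toy <- data.frame(
  location = c("A", "B", "C"),
  n_wells_tested = c(12L, 18L, 33L),
  percent_wells_above_guideline = c(20, 0, 50),
  median = c(1, 2, 11),
  percentile_95 = c(30, 40, 90),
  maximum = c(50, 60, 200),
  stringsAsFactors = FALSE
)

# merge() suffixes duplicated column names with .x (first argument)
# and .y (second argument), so fluoride becomes x and arsenic becomes y.
merged_toy <- merge(flouride_toy, arsenic_toy, by = "location")

# Drop the percent-above-guideline and 95th-percentile columns for both x and y.
drop_cols <- grepl("^(percent_wells_above_guideline|percentile_95)\\.", names(merged_toy))
min_max_only_toy <- merged_toy[, !drop_cols]

print(names(min_max_only_toy))
```

The remaining columns are the location plus the number of wells tested, median, and maximum for each contaminant, which matches the description of Min_Max_Only above.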
Looking at the median values for each location by fluoride and arsenic, it becomes clear that there are quite a few outliers with very high median arsenic levels.
merged.data %>% ggvis(~median.x, ~median.y) %>% layer_points()
merged.data %>% ggvis(~maximum.x, ~maximum.y) %>% layer_points()
Table 1 shows the top 10 locations by highest maximum arsenic level. Danforth has the highest reported maximum arsenic level (3100 ug/L), from a sample of 35 wells. This is much higher than even the next location, Northport (1700 ug/L), which was already clear from the Figure 2 scatterplot of maximum fluoride against maximum arsenic. However, Danforth’s maximum fluoride level is not as high as in other locations where high arsenic levels were also reported.
(Also showing fluoride information on location, median and maximum levels)
kable(merged.data %>% arrange(desc(maximum.y)) %>% top_n(10) %>% select(location, n_wells_tested.y, median.y, maximum.y, n_wells_tested.x, median.x, maximum.x),digits=1)
## Selecting by maximum.y
| location | n_wells_tested.y | median.y | maximum.y | n_wells_tested.x | median.x | maximum.x |
|---|---|---|---|---|---|---|
| Danforth | 35 | 5.0 | 3100 | 35 | 0.2 | 1.9 |
| Northport | 157 | 2.1 | 1700 | 87 | 0.3 | 4.1 |
| Blue Hill | 241 | 7.0 | 930 | 209 | 0.4 | 4.5 |
| Sedgwick | 142 | 4.2 | 840 | 143 | 0.4 | 4.2 |
| Buxton | 334 | 6.0 | 670 | 383 | 0.1 | 3.2 |
| Standish | 632 | 2.0 | 550 | 290 | 0.1 | 4.7 |
| Ellsworth | 428 | 3.6 | 470 | 503 | 0.5 | 7.0 |
| Surry | 181 | 6.0 | 470 | 175 | 0.8 | 6.9 |
| Gorham | 467 | 10.5 | 460 | 452 | 0.1 | 2.0 |
| Matinicus Isle Plt | 27 | 0.6 | 460 | 7 | NA | 1.2 |
(Also showing fluoride information on location, median and maximum levels)
kable(merged.data %>% arrange(desc(median.y)) %>% slice(1:10) %>% select(location, n_wells_tested.y, median.y, maximum.y, n_wells_tested.x, median.x, maximum.x),digits=1)
| location | n_wells_tested.y | median.y | maximum.y | n_wells_tested.x | median.x | maximum.x |
|---|---|---|---|---|---|---|
| Manchester | 275 | 14.0 | 200 | 276 | 0.3 | 3.6 |
| Gorham | 467 | 10.5 | 460 | 452 | 0.1 | 2.0 |
| Monmouth | 277 | 10.0 | 368 | 288 | 0.3 | 3.4 |
| Columbia | 42 | 9.8 | 200 | 54 | 0.3 | 4.3 |
| Eliot | 73 | 9.7 | 45 | 84 | 0.2 | 1.5 |
| Hallowell | 65 | 8.6 | 431 | 59 | 0.1 | 1.6 |
| Winthrop | 424 | 8.2 | 240 | 453 | 0.3 | 3.7 |
| Columbia Falls | 25 | 8.1 | 71 | 38 | 0.2 | 0.9 |
| Mariaville | 30 | 7.2 | 57 | 40 | 0.4 | 4.9 |
| Readfield | 344 | 7.2 | 280 | 351 | 0.3 | 2.7 |
Danforth may have the highest maximum arsenic level, but as seen in Table 2, Manchester has the highest median arsenic level. With a sample size of 275 wells, that high median suggests the well water in that location may not be safe to drink. Maine’s Maximum Exposure Guideline for arsenic is 10 ug/L, and with a median of 14.0 ug/L in Manchester, at least half of the wells sampled there are over the recommended guideline. This becomes even more evident later in the analysis: see Table 9 for Manchester’s percentage of wells above the guideline (58.9%).
Table 3 shows that, in addition to Manchester, two other locations in Maine have median arsenic levels at or over Maine’s Maximum Exposure Guideline of 10 ug/L.
Arsenic_higherthanadvised <- merged.data %>% select(location, n_wells_tested.y, median.y, maximum.y) %>% arrange(desc(median.y)) %>% filter(median.y >=10)
Arsenic_higherthanadvised
## location n_wells_tested.y median.y maximum.y
## 1 Manchester 275 14.0 200
## 2 Gorham 467 10.5 460
## 3 Monmouth 277 10.0 368
Furthermore, there were 370 locations that had maximum levels of arsenic that were above the exposure guidelines.
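The code behind this count is not shown, so here is a minimal sketch of one way to get it, assuming “above” means strictly greater than the 10 ug/L guideline and that locations with no reported value are skipped. The helper `count_above` and the toy values are hypothetical.

```r
# Hypothetical helper: count how many values exceed a guideline.
# NA values (locations with no reported value) are ignored.
count_above <- function(values, guideline) {
  sum(values > guideline, na.rm = TRUE)
}

# Made-up maximum arsenic levels for five locations, in ug/L.
toy_max_arsenic <- c(3100, 6.9, 25, NA, 10)

# A value exactly at the guideline (10) is not counted as above it.
n_above <- count_above(toy_max_arsenic, guideline = 10)
print(n_above)
```

On the real data, the equivalent call would be `count_above(merged.data$maximum.y, 10)`.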
Looking at the top 10 locations based on maximum fluoride levels, we can see that Anson has the highest maximum fluoride level (14.0 mg/L), reported from a sample of 40 wells. Anson’s median arsenic level (2.0 ug/L) is also relatively high compared to the other locations with high fluoride levels.
kable(merged.data %>% arrange(desc(maximum.x)) %>% slice(1:10) %>% select(location, n_wells_tested.x, median.x, maximum.x, n_wells_tested.y, median.y, maximum.y), digits=1)
| location | n_wells_tested.x | median.x | maximum.x | n_wells_tested.y | median.y | maximum.y |
|---|---|---|---|---|---|---|
| Anson | 40 | 0.1 | 14.0 | 36 | 2.0 | 110.0 |
| Ashland | 27 | 0.1 | 10.0 | 20 | 0.5 | 6.9 |
| Frenchville | 12 | NA | 10.0 | 11 | NA | 6.9 |
| Peru | 58 | 0.1 | 9.9 | 54 | 0.5 | 25.0 |
| Kennebunk | 110 | 0.1 | 9.6 | 94 | 1.8 | 31.0 |
| Frenchboro | 6 | NA | 9.1 | 4 | NA | 162.0 |
| Raymond | 181 | 0.3 | 9.1 | 173 | 0.5 | 20.0 |
| Limington | 108 | 0.1 | 8.1 | 104 | 1.0 | 85.0 |
| Falmouth | 167 | 0.1 | 7.1 | 134 | 2.3 | 25.0 |
| Dedham | 102 | 0.9 | 7.0 | 97 | 1.0 | 43.0 |
(Also showing arsenic information on location, median and maximum levels)
kable(merged.data %>% arrange(desc(median.x)) %>% slice(1:10) %>% select(location, n_wells_tested.x, median.x, maximum.x, n_wells_tested.y, median.y, maximum.y), digits=1)
| location | n_wells_tested.x | median.x | maximum.x | n_wells_tested.y | median.y | maximum.y |
|---|---|---|---|---|---|---|
| Eastbrook | 31 | 1.3 | 3.3 | 28 | 1.5 | 41 |
| Otis | 60 | 1.1 | 3.6 | 53 | 4.8 | 200 |
| Marshfield | 31 | 1.0 | 4.4 | 26 | 1.0 | 64 |
| Dedham | 102 | 0.9 | 7.0 | 97 | 1.0 | 43 |
| Surry | 175 | 0.8 | 6.9 | 181 | 6.0 | 470 |
| Prospect | 57 | 0.8 | 2.7 | 50 | 1.0 | 16 |
| Fryeburg | 52 | 0.8 | 4.1 | 37 | 0.5 | 7 |
| Mercer | 32 | 0.6 | 6.1 | 33 | 4.5 | 40 |
| Rome | 82 | 0.6 | 4.5 | 79 | 5.5 | 55 |
| Stockton Springs | 56 | 0.6 | 3.3 | 63 | 0.8 | 250 |
Anson may have the highest maximum fluoride level, but Eastbrook has the highest median fluoride level. However, no location showed a median fluoride level above Maine’s Maximum Exposure Guideline of 2 mg/L, although both Anson and Eastbrook reported having wells above the guideline for fluoride (see Table 6 below).
kable(merged.data %>% filter(location %in% c("Anson", "Eastbrook")) %>% select(location, n_wells_tested.x, percent_wells_above_guideline.x, median.x, maximum.x))
| location | n_wells_tested.x | percent_wells_above_guideline.x | median.x | maximum.x |
|---|---|---|---|---|
| Anson | 40 | 5.0 | 0.10 | 14.0 |
| Eastbrook | 31 | 16.1 | 1.29 | 3.3 |
Additionally, there were 211 locations with maximum fluoride values higher than the exposure guideline, but none with medians over it.
Towns with Wells that have Median Fluoride Levels Above Maine’s Maximum Exposure Guidelines (2 mg/L)
There were 34 locations where at least 200 wells were tested for fluoride.
kable(merged.data %>% arrange(desc(median.x)) %>% select(location, n_wells_tested.x, median.x, maximum.x, n_wells_tested.y, median.y, maximum.y) %>% filter(n_wells_tested.x >= 200), digits=1)
| location | n_wells_tested.x | median.x | maximum.x | n_wells_tested.y | median.y | maximum.y |
|---|---|---|---|---|---|---|
| Ellsworth | 503 | 0.5 | 7.0 | 428 | 3.6 | 470 |
| Blue Hill | 209 | 0.4 | 4.5 | 241 | 7.0 | 930 |
| Belgrade | 417 | 0.3 | 3.8 | 401 | 5.2 | 220 |
| Winthrop | 453 | 0.3 | 3.7 | 424 | 8.2 | 240 |
| Manchester | 276 | 0.3 | 3.6 | 275 | 14.0 | 200 |
| Monmouth | 288 | 0.3 | 3.4 | 277 | 10.0 | 368 |
| Pittston | 249 | 0.3 | 3.9 | 251 | 0.5 | 41 |
| Readfield | 351 | 0.3 | 2.7 | 344 | 7.2 | 280 |
| Jefferson | 205 | 0.3 | 4.2 | 206 | 0.5 | 69 |
| Windsor | 227 | 0.2 | 3.1 | 213 | 1.0 | 43 |
| Augusta | 479 | 0.2 | 4.4 | 454 | 4.0 | 320 |
| Gardiner | 299 | 0.2 | 4.0 | 279 | 2.0 | 110 |
| Mount Vernon | 219 | 0.2 | 6.6 | 217 | 4.0 | 160 |
| Sidney | 312 | 0.2 | 6.8 | 287 | 3.8 | 150 |
| Wells | 207 | 0.2 | 3.5 | 184 | 0.5 | 34 |
| Brunswick | 299 | 0.1 | 5.9 | 255 | 0.5 | 250 |
| Buxton | 383 | 0.1 | 3.2 | 334 | 6.0 | 670 |
| Chelsea | 262 | 0.1 | 3.7 | 237 | 1.0 | 220 |
| China | 251 | 0.1 | 3.0 | 214 | 2.8 | 92 |
| Cumberland | 216 | 0.1 | 1.4 | 186 | 1.0 | 87 |
| Durham | 274 | 0.1 | 2.9 | 244 | 0.5 | 35 |
| Freeport | 278 | 0.1 | 2.8 | 239 | 0.5 | 110 |
| Gorham | 452 | 0.1 | 2.0 | 467 | 10.5 | 460 |
| Gray | 273 | 0.1 | 4.9 | 228 | 3.2 | 270 |
| Harpswell | 318 | 0.1 | 3.7 | 300 | 0.5 | 47 |
| Hermon | 202 | 0.1 | 0.4 | 160 | 0.9 | 68 |
| Litchfield | 266 | 0.1 | 4.2 | 262 | 7.0 | 400 |
| New Gloucester | 265 | 0.1 | 2.4 | 251 | 1.0 | 66 |
| Orrington | 216 | 0.1 | 2.0 | 200 | 0.5 | 29 |
| Saco | 236 | 0.1 | 4.3 | 218 | 2.4 | 380 |
| Scarborough | 212 | 0.1 | 4.4 | 182 | 5.2 | 220 |
| Standish | 290 | 0.1 | 4.7 | 632 | 2.0 | 550 |
| Whitefield | 249 | 0.1 | 4.8 | 236 | 0.5 | 84 |
| Windham | 282 | 0.1 | 3.0 | 248 | 2.2 | 140 |
There were 30 locations where at least 200 wells were tested for arsenic.
kable(merged.data %>% arrange(desc(median.y)) %>% select(location, n_wells_tested.y, median.y, maximum.y, n_wells_tested.x, median.x, maximum.x) %>% filter(n_wells_tested.y >= 200), digits=1)
| location | n_wells_tested.y | median.y | maximum.y | n_wells_tested.x | median.x | maximum.x |
|---|---|---|---|---|---|---|
| Manchester | 275 | 14.0 | 200 | 276 | 0.3 | 3.6 |
| Gorham | 467 | 10.5 | 460 | 452 | 0.1 | 2.0 |
| Monmouth | 277 | 10.0 | 368 | 288 | 0.3 | 3.4 |
| Winthrop | 424 | 8.2 | 240 | 453 | 0.3 | 3.7 |
| Readfield | 344 | 7.2 | 280 | 351 | 0.3 | 2.7 |
| Blue Hill | 241 | 7.0 | 930 | 209 | 0.4 | 4.5 |
| Litchfield | 262 | 7.0 | 400 | 266 | 0.1 | 4.2 |
| Buxton | 334 | 6.0 | 670 | 383 | 0.1 | 3.2 |
| Belgrade | 401 | 5.2 | 220 | 417 | 0.3 | 3.8 |
| Augusta | 454 | 4.0 | 320 | 479 | 0.2 | 4.4 |
| Mount Vernon | 217 | 4.0 | 160 | 219 | 0.2 | 6.6 |
| Sidney | 287 | 3.8 | 150 | 312 | 0.2 | 6.8 |
| Ellsworth | 428 | 3.6 | 470 | 503 | 0.5 | 7.0 |
| Gray | 228 | 3.2 | 270 | 273 | 0.1 | 4.9 |
| China | 214 | 2.8 | 92 | 251 | 0.1 | 3.0 |
| Saco | 218 | 2.4 | 380 | 236 | 0.1 | 4.3 |
| Windham | 248 | 2.2 | 140 | 282 | 0.1 | 3.0 |
| Gardiner | 279 | 2.0 | 110 | 299 | 0.2 | 4.0 |
| Standish | 632 | 2.0 | 550 | 290 | 0.1 | 4.7 |
| Chelsea | 237 | 1.0 | 220 | 262 | 0.1 | 3.7 |
| New Gloucester | 251 | 1.0 | 66 | 265 | 0.1 | 2.4 |
| Windsor | 213 | 1.0 | 43 | 227 | 0.2 | 3.1 |
| Brunswick | 255 | 0.5 | 250 | 299 | 0.1 | 5.9 |
| Durham | 244 | 0.5 | 35 | 274 | 0.1 | 2.9 |
| Freeport | 239 | 0.5 | 110 | 278 | 0.1 | 2.8 |
| Harpswell | 300 | 0.5 | 47 | 318 | 0.1 | 3.7 |
| Jefferson | 206 | 0.5 | 69 | 205 | 0.3 | 4.2 |
| Orrington | 200 | 0.5 | 29 | 216 | 0.1 | 2.0 |
| Pittston | 251 | 0.5 | 41 | 249 | 0.3 | 3.9 |
| Whitefield | 236 | 0.5 | 84 | 249 | 0.1 | 4.8 |
I find it interesting that there were only 34 locations with 200 or more wells tested for fluoride and 30 locations with 200 or more wells tested for arsenic. I also find it interesting that there was no information on what sample size is adequately representative of each location, or on how to project the median found in the sampled wells across that location. For example, I would expect less populated locations to have fewer wells to test, but an explanation of how the data should be weighted, or guidelines on when the number of wells is representative, would have been helpful. Even including the population of each location would have been helpful.
I didn’t spend much time looking at the percent of wells above the guidelines because it wasn’t clear to me what the cutoff would be, on a percent basis, for deciding whether the well water was suitable to consume or use. For example, is there a margin where a level above the guideline is still acceptable to consume? Is the cutoff 5% above? 10% above? Does it differ for arsenic and fluoride? Or is there no level above the guidelines that is safe to consume or use? Is the threshold the same for bathing, dishes, and laundry as for consumption? Can the well water be used for different activities at different levels of “contamination”, and do the maximum exposure guidelines for arsenic and/or fluoride therefore change depending on the activity?
I did look at the actual median levels of both arsenic and fluoride in the well samples. Based on the median, the wells sampled were below the maximum exposure guideline for fluoride in every location. However, there were three locations where the median met or exceeded Maine’s Maximum Exposure Guideline for arsenic, and I found that very interesting. I’d be interested in understanding how these locations are dealing with arsenic contamination in wells, or whether they are in fact doing anything about it at the town, city, or location level. Many locations and households in Maine rely on well water as their water source, so it would be interesting to know how these levels have changed since 2015.
It wasn’t surprising to see the three locations with the highest median arsenic levels, which are also the only locations at or above the maximum exposure guideline (again based on the median), in Table 9, which shows the top 10 locations by percentage of wells above the guideline.
If we are curious about which locations in Maine have the highest percentage of wells that tested above the maximum exposure guideline for arsenic, the data shows that 292 locations had some share of wells (from more than 0% to almost 59%) above the guideline. This number drops to 223 locations when counting only those with more than 5% of wells above the guideline, and drops by almost half again, to 113 locations, when the cutoff is raised to 15%. See Table 9 for the 10 locations with the highest percentage of wells above the guideline for arsenic.
merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >0)
merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >5)
merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >15)
kable(merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >0) %>% top_n(10))
## Selecting by percent_wells_above_guideline.y
| location | n_wells_tested.y | median.y | percent_wells_above_guideline.y |
|---|---|---|---|
| Manchester | 275 | 14.0 | 58.9 |
| Gorham | 467 | 10.5 | 50.1 |
| Columbia | 42 | 9.8 | 50.0 |
| Monmouth | 277 | 10.0 | 49.5 |
| Eliot | 73 | 9.7 | 49.3 |
| Columbia Falls | 25 | 8.1 | 48.0 |
| Winthrop | 424 | 8.2 | 44.8 |
| Hallowell | 65 | 8.6 | 44.6 |
| Buxton | 334 | 6.0 | 43.4 |
| Blue Hill | 241 | 7.0 | 42.7 |
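The location counts quoted for the 0%, 5%, and 15% cutoffs can also be computed in one pass instead of printing each filtered table and reading off the number of rows. The following is a rough sketch on made-up percentages; `toy_pct_above` is a hypothetical stand-in for a column such as `merged.data$percent_wells_above_guideline.y`.

```r
# Made-up percentages of wells above the guideline for six locations.
toy_pct_above <- c(58.9, 16.0, 5.0, 4.2, 0, NA)

# Cutoffs used in the text: more than 0%, 5%, and 15% of wells above the guideline.
cutoffs <- c(0, 5, 15)

# For each cutoff, count the locations strictly above it (NA is skipped).
counts <- vapply(
  cutoffs,
  function(cut) sum(toy_pct_above > cut, na.rm = TRUE),
  integer(1)
)
names(counts) <- paste0(">", cutoffs, "%")

print(counts)
```

Because the comparison is strict, a location sitting exactly at a cutoff (5.0 in the toy data) is not counted for that cutoff, which mirrors the `>` used in the `filter()` calls.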
Similarly, if we want to look at locations in Maine with the highest percentage of wells that tested above the maximum exposure guideline for fluoride, the data shows that 186 locations had some share of wells (from more than 0% to 30%) above the guideline. That number drops to 54 locations when counting only those with more than 5% of wells above the guideline, and to 9 locations when the cutoff is raised to 15%. Table 10 shows the 10 locations with the highest percentage of wells above the guideline for fluoride.
merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >0)
merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >5)
merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >15)
kable(merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >0) %>% top_n(10))
## Selecting by percent_wells_above_guideline.x
| location | n_wells_tested.x | median.x | percent_wells_above_guideline.x |
|---|---|---|---|
| Otis | 60 | 1.130 | 30.0 |
| Dedham | 102 | 0.940 | 22.5 |
| Denmark | 46 | 0.450 | 19.6 |
| Surry | 175 | 0.800 | 18.3 |
| Prospect | 57 | 0.785 | 17.5 |
| Eastbrook | 31 | 1.290 | 16.1 |
| Mercer | 32 | 0.600 | 15.6 |
| Fryeburg | 52 | 0.760 | 15.4 |
| Brownfield | 33 | 0.265 | 15.2 |
| Stockton Springs | 56 | 0.600 | 14.3 |
In conclusion, before making a strong argument for or against the use of well water in particular locations, I have more questions about how stringent the guidelines are and about the safety concerns around well water use: which activities the water is being used for, and whether there is any room to go over the guidelines from an exposure perspective. If the guidelines are in fact the maximum exposure for any use of well water, it is concerning that so many locations have any percentage of wells testing above them, let alone almost 59% of wells above the guideline for arsenic, as we saw in Manchester.