Assignment 1: Fluoride and Arsenic Levels in Wells in Maine

Loading all necessary libraries.

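# plyr is loaded before dplyr so that dplyr's versions of the shared
# functions (arrange, mutate, summarise, ...) mask plyr's, as the messages below show
library(plyr)
library(dplyr)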
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggvis)
library(knitr)
library(pander)

Importing Dataset “Arsenic”

Only the “glimpse” of this dataset is shown, to avoid printing a table with all 917 rows.

glimpse(arsenic)
## Observations: 917
## Variables: 6
## $ location                      <chr> "Manchester", "Gorham", "Columbi...
## $ n_wells_tested                <int> 275, 467, 42, 277, 73, 25, 424, ...
## $ percent_wells_above_guideline <dbl> 58.9, 50.1, 50.0, 49.5, 49.3, 48...
## $ median                        <dbl> 14.00, 10.50, 9.80, 10.00, 9.70,...
## $ percentile_95                 <dbl> 93.00, 130.00, 65.90, 110.00, 41...
## $ maximum                       <dbl> 200, 460, 200, 368, 45, 71, 240,...
dim(arsenic)
## [1] 917   6

Importing Dataset “Fluoride”

Only the “glimpse” of this dataset is shown, to avoid printing a table with all 917 rows.

glimpse(flouride)
## Observations: 917
## Variables: 6
## $ location                      <chr> "Otis", "Dedham", "Denmark", "Su...
## $ n_wells_tested                <int> 60, 102, 46, 175, 57, 31, 32, 52...
## $ percent_wells_above_guideline <dbl> 30.0, 22.5, 19.6, 18.3, 17.5, 16...
## $ median                        <dbl> 1.130, 0.940, 0.450, 0.800, 0.78...
## $ percentile_95                 <dbl> 3.200, 3.270, 3.150, 3.525, 2.50...
## $ maximum                       <dbl> 3.6, 7.0, 3.9, 6.9, 2.7, 3.3, 6....
dim(flouride)
## [1] 917   6

Merging Arsenic and Fluoride Datasets into One

After merging the fluoride and arsenic datasets, the combined data was still a little cluttered. Because the two datasets share the same column names, the merged columns end in x for fluoride and y for arsenic. For a simpler view, the percent of wells above the guideline (variable: “percent_wells_above_guideline”) and the 95th percentile (variable: “percentile_95”) were removed for both x and y.

Min_Max_Only was created from the merged fluoride and arsenic data; it keeps the location, number of wells tested, median, and maximum values for each contaminant.
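The merge code itself is not shown above. Below is a minimal sketch of how `merged.data` and `Min_Max_Only` could be built. It assumes a dplyr `inner_join()` on `location` with `flouride` as the first argument, since that is what yields the `.x` = fluoride, `.y` = arsenic suffixes used throughout. The three-row data frames are hypothetical stand-ins with invented values; only their column names come from the `glimpse()` output above.

```r
library(dplyr)

# Hypothetical stand-ins for the real datasets: values are invented,
# only the column names match the glimpse() output.
flouride <- data.frame(
  location = c("Town A", "Town B", "Town C"),
  n_wells_tested = c(60L, 100L, 40L),
  percent_wells_above_guideline = c(10.0, 5.0, 0.0),
  median = c(1.0, 0.5, 0.2),
  percentile_95 = c(3.0, 2.5, 1.0),
  maximum = c(4.0, 3.0, 1.5)
)
arsenic <- data.frame(
  location = c("Town A", "Town B", "Town D"),
  n_wells_tested = c(55L, 90L, 30L),
  percent_wells_above_guideline = c(40.0, 20.0, 0.0),
  median = c(12.0, 4.0, 0.5),
  percentile_95 = c(90.0, 40.0, 5.0),
  maximum = c(200, 60, 8)
)

# Non-key columns present in both tables get ".x" from the first argument
# (fluoride) and ".y" from the second (arsenic).
# inner_join() keeps only locations that appear in both datasets.
merged.data <- inner_join(flouride, arsenic, by = "location")

# Simplified view: drop the percent-above-guideline and 95th-percentile
# columns for both contaminants.
Min_Max_Only <- merged.data %>%
  select(-starts_with("percent_wells_above_guideline"),
         -starts_with("percentile_95"))

names(Min_Max_Only)
```

If locations found in only one dataset should be kept, `full_join()` works the same way but fills the missing side with `NA`. Base R's `merge(flouride, arsenic, by = "location")` produces the same `.x`/`.y` suffixes.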

Looking at the median values for each location by fluoride and arsenic, it becomes clear that there are quite a few outliers with very high median arsenic levels.

Figure 1: Median Fluoride (x) and Arsenic (y) Levels

merged.data %>% ggvis(~median.x, ~median.y) %>% layer_points()

Figure 2: Maximum Fluoride (x) and Arsenic (y) Levels

merged.data %>% ggvis(~maximum.x, ~maximum.y) %>% layer_points()

Arsenic Levels in Maine Wells

Table 1 lists the top 10 locations by highest maximum arsenic level. Danforth has the highest reported maximum arsenic level (3100 ug/L), from a sample of 35 wells. This is much higher than even the next location, Northport (1700 ug/L), which was already clear from the Figure 2 scatterplot of maximum fluoride against maximum arsenic. However, Danforth’s maximum fluoride level (1.9 mg/L) is lower than in most of the other locations in Table 1 where high arsenic levels were also reported.

Table 1: Top 10 Locations by highest Maximum Arsenic Levels

(Also showing fluoride information: number of wells tested, median and maximum levels)

kable(merged.data %>% arrange(desc(maximum.y)) %>% top_n(10, maximum.y) %>%
        select(location, n_wells_tested.y, median.y, maximum.y, n_wells_tested.x, median.x, maximum.x), digits = 1)
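| location | n_wells_tested.y | median.y | maximum.y | n_wells_tested.x | median.x | maximum.x |
|:---|---:|---:|---:|---:|---:|---:|
| Danforth | 35 | 5.0 | 3100 | 35 | 0.2 | 1.9 |
| Northport | 157 | 2.1 | 1700 | 87 | 0.3 | 4.1 |
| Blue Hill | 241 | 7.0 | 930 | 209 | 0.4 | 4.5 |
| Sedgwick | 142 | 4.2 | 840 | 143 | 0.4 | 4.2 |
| Buxton | 334 | 6.0 | 670 | 383 | 0.1 | 3.2 |
| Standish | 632 | 2.0 | 550 | 290 | 0.1 | 4.7 |
| Ellsworth | 428 | 3.6 | 470 | 503 | 0.5 | 7.0 |
| Surry | 181 | 6.0 | 470 | 175 | 0.8 | 6.9 |
| Gorham | 467 | 10.5 | 460 | 452 | 0.1 | 2.0 |
| Matinicus Isle Plt | 27 | 0.6 | 460 | 7 | NA | 1.2 |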

Table 2: Top 10 Locations by highest Median Arsenic Levels

(Also showing fluoride information: number of wells tested, median and maximum levels)

kable(merged.data %>% arrange(desc(median.y)) %>% slice(1:10) %>% select(location, n_wells_tested.y, median.y, maximum.y, n_wells_tested.x, median.x, maximum.x),digits=1)
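| location | n_wells_tested.y | median.y | maximum.y | n_wells_tested.x | median.x | maximum.x |
|:---|---:|---:|---:|---:|---:|---:|
| Manchester | 275 | 14.0 | 200 | 276 | 0.3 | 3.6 |
| Gorham | 467 | 10.5 | 460 | 452 | 0.1 | 2.0 |
| Monmouth | 277 | 10.0 | 368 | 288 | 0.3 | 3.4 |
| Columbia | 42 | 9.8 | 200 | 54 | 0.3 | 4.3 |
| Eliot | 73 | 9.7 | 45 | 84 | 0.2 | 1.5 |
| Hallowell | 65 | 8.6 | 431 | 59 | 0.1 | 1.6 |
| Winthrop | 424 | 8.2 | 240 | 453 | 0.3 | 3.7 |
| Columbia Falls | 25 | 8.1 | 71 | 38 | 0.2 | 0.9 |
| Mariaville | 30 | 7.2 | 57 | 40 | 0.4 | 4.9 |
| Readfield | 344 | 7.2 | 280 | 351 | 0.3 | 2.7 |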

Danforth may have the highest maximum arsenic level, but as Table 2 shows, Manchester has the highest median arsenic level. With a sample size of 275 wells, that high median suggests the water in that location may not be safe to drink. Maine’s Maximum Exposure Guideline for arsenic is 10 ug/L, and Manchester’s median of 14.0 ug/L means that at least half of the wells tested there are above the guideline. This becomes even more evident later in the analysis: Table 9 shows that 58.9% of Manchester’s wells tested above the guideline.
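The step from “median of 14.0 ug/L” to “at least half of the wells are above 10 ug/L” follows from the definition of the median: at least half of the readings are at or above it, so if the median is above the guideline, at least half of the readings are too. A quick check with nine hypothetical well readings (invented values, not from the dataset):

```r
# Nine hypothetical arsenic readings (ug/L) from one town's wells.
wells <- c(3, 8, 11, 12, 14, 15, 22, 40, 200)
guideline <- 10  # Maine's Maximum Exposure Guideline for arsenic (ug/L)

median(wells)            # 14: the 5th of the 9 sorted values
mean(wells > guideline)  # share of wells above the guideline: 7 of 9
```

The reverse is weaker: a median below the guideline only tells us that at most half of the wells exceed it.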

Table 3 shows that, in addition to Manchester, two other locations in Maine have median arsenic levels at or above Maine’s Maximum Exposure Guideline of 10 ug/L.

Table 3: Towns with Wells that have Median Arsenic Levels At or Above Maine’s Maximum Exposure Guideline (10 ug/L)

Arsenic_higherthanadvised <- merged.data %>% select(location, n_wells_tested.y, median.y, maximum.y) %>% arrange(desc(median.y)) %>% filter(median.y >=10)
Arsenic_higherthanadvised
##     location n_wells_tested.y median.y maximum.y
## 1 Manchester              275     14.0       200
## 2     Gorham              467     10.5       460
## 3   Monmouth              277     10.0       368

Furthermore, there were 370 locations with maximum arsenic levels above the exposure guideline (10 ug/L).
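The code behind this count is not shown. One way to compute such counts is a small hypothetical helper like `count_above()` below, shown on a toy data frame with invented values. On the real data it could be called as `count_above(merged.data, "maximum.y", 10)`. It counts values strictly above the limit and skips locations with no reported value (`NA`).

```r
# Hypothetical helper: number of rows where `column` is strictly above
# `limit`; NA values (no reported measurement) are not counted.
count_above <- function(df, column, limit) {
  sum(df[[column]] > limit, na.rm = TRUE)
}

# Toy data with invented values.
toy <- data.frame(
  location  = c("Town A", "Town B", "Town C", "Town D"),
  maximum.y = c(200, 9.5, NA, 12),  # maximum arsenic, ug/L
  maximum.x = c(3.6, 1.2, 2.5, NA)  # maximum fluoride, mg/L
)

count_above(toy, "maximum.y", 10)  # arsenic guideline 10 ug/L -> 2
count_above(toy, "maximum.x", 2)   # fluoride guideline 2 mg/L -> 2
```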

Fluoride Levels in Maine Wells

Looking at the top 10 locations by maximum fluoride level, Anson has the highest maximum fluoride level (14.0 mg/L) in its sample of 40 wells. Anson’s median arsenic level (2.0 ug/L) is also relatively high compared with most of the other locations with high fluoride levels.

Table 4: Top 10 Locations by highest Maximum Fluoride Levels

kable(merged.data %>% arrange(desc(maximum.x)) %>% slice(1:10) %>% select(location, n_wells_tested.x, median.x, maximum.x, n_wells_tested.y, median.y, maximum.y), digits=1)
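| location | n_wells_tested.x | median.x | maximum.x | n_wells_tested.y | median.y | maximum.y |
|:---|---:|---:|---:|---:|---:|---:|
| Anson | 40 | 0.1 | 14.0 | 36 | 2.0 | 110.0 |
| Ashland | 27 | 0.1 | 10.0 | 20 | 0.5 | 6.9 |
| Frenchville | 12 | NA | 10.0 | 11 | NA | 6.9 |
| Peru | 58 | 0.1 | 9.9 | 54 | 0.5 | 25.0 |
| Kennebunk | 110 | 0.1 | 9.6 | 94 | 1.8 | 31.0 |
| Frenchboro | 6 | NA | 9.1 | 4 | NA | 162.0 |
| Raymond | 181 | 0.3 | 9.1 | 173 | 0.5 | 20.0 |
| Limington | 108 | 0.1 | 8.1 | 104 | 1.0 | 85.0 |
| Falmouth | 167 | 0.1 | 7.1 | 134 | 2.3 | 25.0 |
| Dedham | 102 | 0.9 | 7.0 | 97 | 1.0 | 43.0 |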

Table 5: Top 10 Locations by highest Median Fluoride Levels

(Also showing arsenic information: number of wells tested, median and maximum levels)

kable(merged.data %>% arrange(desc(median.x)) %>% slice(1:10) %>% select(location, n_wells_tested.x, median.x, maximum.x, n_wells_tested.y, median.y, maximum.y), digits=1)
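| location | n_wells_tested.x | median.x | maximum.x | n_wells_tested.y | median.y | maximum.y |
|:---|---:|---:|---:|---:|---:|---:|
| Eastbrook | 31 | 1.3 | 3.3 | 28 | 1.5 | 41 |
| Otis | 60 | 1.1 | 3.6 | 53 | 4.8 | 200 |
| Marshfield | 31 | 1.0 | 4.4 | 26 | 1.0 | 64 |
| Dedham | 102 | 0.9 | 7.0 | 97 | 1.0 | 43 |
| Surry | 175 | 0.8 | 6.9 | 181 | 6.0 | 470 |
| Prospect | 57 | 0.8 | 2.7 | 50 | 1.0 | 16 |
| Fryeburg | 52 | 0.8 | 4.1 | 37 | 0.5 | 7 |
| Mercer | 32 | 0.6 | 6.1 | 33 | 4.5 | 40 |
| Rome | 82 | 0.6 | 4.5 | 79 | 5.5 | 55 |
| Stockton Springs | 56 | 0.6 | 3.3 | 63 | 0.8 | 250 |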

Anson may have the highest maximum fluoride level, but Eastbrook has the highest median fluoride level (1.3 mg/L). However, no location had a median fluoride level above Maine’s Maximum Exposure Guideline of 2 mg/L. Even so, both Anson and Eastbrook had some wells above the fluoride guideline, as Table 6 shows.

Table 6: Looking at Anson and Eastbrook for Percent of Wells Above Fluoride Guidelines

kable(merged.data %>% filter(location %in% c("Anson", "Eastbrook")) %>% select(location, n_wells_tested.x, percent_wells_above_guideline.x, median.x, maximum.x))
| location | n_wells_tested.x | percent_wells_above_guideline.x | median.x | maximum.x |
|:---|---:|---:|---:|---:|
| Anson | 40 | 5.0 | 0.10 | 14.0 |
| Eastbrook | 31 | 16.1 | 1.29 | 3.3 |

Additionally, there were 211 locations with maximum fluoride values higher than the exposure guideline, but none with medians above it.

Towns with Wells that have Median Fluoride Levels Above Maine’s Maximum Exposure Guideline (2 mg/L): none

There were 34 locations where 200 or more wells were tested for fluoride.

Table 7: Locations Where at Least 200 Sample Wells Were Tested for Fluoride

kable(merged.data %>% arrange(desc(median.x)) %>% select(location, n_wells_tested.x, median.x, maximum.x, n_wells_tested.y, median.y, maximum.y) %>% filter(n_wells_tested.x >= 200), digits=1)
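| location | n_wells_tested.x | median.x | maximum.x | n_wells_tested.y | median.y | maximum.y |
|:---|---:|---:|---:|---:|---:|---:|
| Ellsworth | 503 | 0.5 | 7.0 | 428 | 3.6 | 470 |
| Blue Hill | 209 | 0.4 | 4.5 | 241 | 7.0 | 930 |
| Belgrade | 417 | 0.3 | 3.8 | 401 | 5.2 | 220 |
| Winthrop | 453 | 0.3 | 3.7 | 424 | 8.2 | 240 |
| Manchester | 276 | 0.3 | 3.6 | 275 | 14.0 | 200 |
| Monmouth | 288 | 0.3 | 3.4 | 277 | 10.0 | 368 |
| Pittston | 249 | 0.3 | 3.9 | 251 | 0.5 | 41 |
| Readfield | 351 | 0.3 | 2.7 | 344 | 7.2 | 280 |
| Jefferson | 205 | 0.3 | 4.2 | 206 | 0.5 | 69 |
| Windsor | 227 | 0.2 | 3.1 | 213 | 1.0 | 43 |
| Augusta | 479 | 0.2 | 4.4 | 454 | 4.0 | 320 |
| Gardiner | 299 | 0.2 | 4.0 | 279 | 2.0 | 110 |
| Mount Vernon | 219 | 0.2 | 6.6 | 217 | 4.0 | 160 |
| Sidney | 312 | 0.2 | 6.8 | 287 | 3.8 | 150 |
| Wells | 207 | 0.2 | 3.5 | 184 | 0.5 | 34 |
| Brunswick | 299 | 0.1 | 5.9 | 255 | 0.5 | 250 |
| Buxton | 383 | 0.1 | 3.2 | 334 | 6.0 | 670 |
| Chelsea | 262 | 0.1 | 3.7 | 237 | 1.0 | 220 |
| China | 251 | 0.1 | 3.0 | 214 | 2.8 | 92 |
| Cumberland | 216 | 0.1 | 1.4 | 186 | 1.0 | 87 |
| Durham | 274 | 0.1 | 2.9 | 244 | 0.5 | 35 |
| Freeport | 278 | 0.1 | 2.8 | 239 | 0.5 | 110 |
| Gorham | 452 | 0.1 | 2.0 | 467 | 10.5 | 460 |
| Gray | 273 | 0.1 | 4.9 | 228 | 3.2 | 270 |
| Harpswell | 318 | 0.1 | 3.7 | 300 | 0.5 | 47 |
| Hermon | 202 | 0.1 | 0.4 | 160 | 0.9 | 68 |
| Litchfield | 266 | 0.1 | 4.2 | 262 | 7.0 | 400 |
| New Gloucester | 265 | 0.1 | 2.4 | 251 | 1.0 | 66 |
| Orrington | 216 | 0.1 | 2.0 | 200 | 0.5 | 29 |
| Saco | 236 | 0.1 | 4.3 | 218 | 2.4 | 380 |
| Scarborough | 212 | 0.1 | 4.4 | 182 | 5.2 | 220 |
| Standish | 290 | 0.1 | 4.7 | 632 | 2.0 | 550 |
| Whitefield | 249 | 0.1 | 4.8 | 236 | 0.5 | 84 |
| Windham | 282 | 0.1 | 3.0 | 248 | 2.2 | 140 |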

There were 30 locations where 200 or more wells were tested for arsenic.

Table 8: Locations Where at Least 200 Sample Wells Were Tested for Arsenic

kable(merged.data %>% arrange(desc(median.y)) %>% select(location, n_wells_tested.y, median.y, maximum.y, n_wells_tested.x, median.x, maximum.x) %>% filter(n_wells_tested.y >= 200), digits=1)
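| location | n_wells_tested.y | median.y | maximum.y | n_wells_tested.x | median.x | maximum.x |
|:---|---:|---:|---:|---:|---:|---:|
| Manchester | 275 | 14.0 | 200 | 276 | 0.3 | 3.6 |
| Gorham | 467 | 10.5 | 460 | 452 | 0.1 | 2.0 |
| Monmouth | 277 | 10.0 | 368 | 288 | 0.3 | 3.4 |
| Winthrop | 424 | 8.2 | 240 | 453 | 0.3 | 3.7 |
| Readfield | 344 | 7.2 | 280 | 351 | 0.3 | 2.7 |
| Blue Hill | 241 | 7.0 | 930 | 209 | 0.4 | 4.5 |
| Litchfield | 262 | 7.0 | 400 | 266 | 0.1 | 4.2 |
| Buxton | 334 | 6.0 | 670 | 383 | 0.1 | 3.2 |
| Belgrade | 401 | 5.2 | 220 | 417 | 0.3 | 3.8 |
| Augusta | 454 | 4.0 | 320 | 479 | 0.2 | 4.4 |
| Mount Vernon | 217 | 4.0 | 160 | 219 | 0.2 | 6.6 |
| Sidney | 287 | 3.8 | 150 | 312 | 0.2 | 6.8 |
| Ellsworth | 428 | 3.6 | 470 | 503 | 0.5 | 7.0 |
| Gray | 228 | 3.2 | 270 | 273 | 0.1 | 4.9 |
| China | 214 | 2.8 | 92 | 251 | 0.1 | 3.0 |
| Saco | 218 | 2.4 | 380 | 236 | 0.1 | 4.3 |
| Windham | 248 | 2.2 | 140 | 282 | 0.1 | 3.0 |
| Gardiner | 279 | 2.0 | 110 | 299 | 0.2 | 4.0 |
| Standish | 632 | 2.0 | 550 | 290 | 0.1 | 4.7 |
| Chelsea | 237 | 1.0 | 220 | 262 | 0.1 | 3.7 |
| New Gloucester | 251 | 1.0 | 66 | 265 | 0.1 | 2.4 |
| Windsor | 213 | 1.0 | 43 | 227 | 0.2 | 3.1 |
| Brunswick | 255 | 0.5 | 250 | 299 | 0.1 | 5.9 |
| Durham | 244 | 0.5 | 35 | 274 | 0.1 | 2.9 |
| Freeport | 239 | 0.5 | 110 | 278 | 0.1 | 2.8 |
| Harpswell | 300 | 0.5 | 47 | 318 | 0.1 | 3.7 |
| Jefferson | 206 | 0.5 | 69 | 205 | 0.3 | 4.2 |
| Orrington | 200 | 0.5 | 29 | 216 | 0.1 | 2.0 |
| Pittston | 251 | 0.5 | 41 | 249 | 0.3 | 3.9 |
| Whitefield | 236 | 0.5 | 84 | 249 | 0.1 | 4.8 |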

I find it interesting that only 34 locations had 200 or more wells tested for fluoride and only 30 locations had 200 or more wells tested for arsenic. I also noticed that the data gives no information on what sample size is adequately representative of each location, or on how a median based on the sampled wells should be projected across the whole location. For example, I would expect less populated locations to have fewer wells to test, but an explanation of how the data should be weighted, and/or guidelines on when the number of wells tested is representative, would have been helpful. Even including the population of each location would have helped.
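The data does not say how locations should be weighted, so the following is only one simple option. It assumes each location's tested wells are a reasonable sample of that location and weights each location's percentage of wells above the guideline by its number of wells tested. The sketch contrasts that with an unweighted average on three hypothetical locations (invented values).

```r
# Three hypothetical locations with very different sample sizes.
toy <- data.frame(
  location = c("Town A", "Town B", "Town C"),
  n_wells_tested = c(400, 50, 10),
  percent_wells_above_guideline = c(10, 20, 60)
)

# Unweighted: each location counts equally, however few wells it tested.
unweighted <- mean(toy$percent_wells_above_guideline)   # 30

# Weighted by wells tested: roughly the share of all tested wells that are
# above the guideline, (400*10 + 50*20 + 10*60) / 460.
weighted <- weighted.mean(toy$percent_wells_above_guideline,
                          toy$n_wells_tested)           # about 12.2
```

Town C's 60% rests on only 10 wells, so it pulls the unweighted average up sharply but barely moves the weighted one. Neither figure says whether 10 wells is enough to represent Town C; that remains the open question raised above.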

I didn’t spend much time looking at the percent of wells above the guidelines because it wasn’t clear to me what the cutoff would be, on a percent basis, for deciding whether the well water was suitable to consume or use. For example, is there a margin above the guideline that is still acceptable to consume? Is the cutoff 5% above? 10% above? Does it differ for arsenic and fluoride? Or is no level above the guidelines safe to consume or use? Is the threshold the same for bathing, dishes, and laundry as for drinking? Can well water be used for different activities at different levels of “contamination”, and do the maximum exposure guidelines for arsenic and/or fluoride therefore change depending on the activity?

I did look at the actual median levels of both arsenic and fluoride in the well samples. For fluoride, no location had a median above the maximum exposure guideline. For arsenic, however, three locations had a median at or above Maine’s Maximum Exposure Guideline, which I found very interesting. I’d be interested in understanding how these locations are dealing with arsenic contamination in wells, or whether they are doing anything about it at the town/city level. Many locations and households in Maine rely on well water as their water source, so it would be interesting to know how these levels have changed since 2015.

It wasn’t surprising to see the three locations with the highest median arsenic levels (the only locations at or above the maximum exposure guideline, again based on the median) in Table 9, which shows the top 10 locations by highest percentage of wells above the guidelines.

Looking at Locations in Maine Where a Percentage of Wells Had Arsenic and/or Fluoride Levels Above Maximum Exposure Guidelines

To find the locations in Maine with the highest percentage of wells testing above the maximum exposure guideline for arsenic, the data shows 292 locations where more than 0% (and up to 58.9%) of wells tested above the arsenic guideline. This drops to 223 locations when the cutoff is raised to more than 5% of wells, and almost halves again, to 113 locations, at more than 15% of wells. See Table 9 for the 10 locations with the highest percentage of wells above the arsenic guideline.

merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >0)
merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >5)
merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>% filter(percent_wells_above_guideline.y >15)
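The three queries above print the filtered tables; the counts quoted in the text are presumably the number of rows each one returns. A compact way to get all three counts at once is a small hypothetical helper such as `count_by_threshold()`, which uses the same strict `>` comparisons. It could be applied to `merged.data$percent_wells_above_guideline.y` for arsenic, or the `.x` column for fluoride. Shown here on invented values:

```r
# Hypothetical helper: for each threshold, count the locations whose
# percentage of wells above the guideline is strictly greater than it.
# NA percentages are skipped.
count_by_threshold <- function(pct, thresholds = c(0, 5, 15)) {
  counts <- vapply(thresholds,
                   function(t) sum(pct > t, na.rm = TRUE),
                   integer(1))
  names(counts) <- paste0(">", thresholds, "%")
  counts
}

# Invented percentages for six locations.
pct <- c(45, 30, 12, 4, 0, NA)
count_by_threshold(pct)  # counts of 4, 3 and 2 locations
```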

Table 9: Top 10 Locations in Maine with the Highest Percentage of Wells Above the Guidelines for Arsenic

kable(merged.data %>% arrange(desc(percent_wells_above_guideline.y)) %>% select(location, n_wells_tested.y, median.y, percent_wells_above_guideline.y) %>%
        filter(percent_wells_above_guideline.y > 0) %>% top_n(10, percent_wells_above_guideline.y))
| location | n_wells_tested.y | median.y | percent_wells_above_guideline.y |
|:---|---:|---:|---:|
| Manchester | 275 | 14.0 | 58.9 |
| Gorham | 467 | 10.5 | 50.1 |
| Columbia | 42 | 9.8 | 50.0 |
| Monmouth | 277 | 10.0 | 49.5 |
| Eliot | 73 | 9.7 | 49.3 |
| Columbia Falls | 25 | 8.1 | 48.0 |
| Winthrop | 424 | 8.2 | 44.8 |
| Hallowell | 65 | 8.6 | 44.6 |
| Buxton | 334 | 6.0 | 43.4 |
| Blue Hill | 241 | 7.0 | 42.7 |

Similarly, for fluoride, the data shows 186 locations where more than 0% (and up to 30%) of wells tested above the fluoride guideline. Raising the cutoff to more than 5% of wells drops this to 54 locations, and at more than 15% of wells only 9 locations remain. Table 10 shows the 10 locations with the highest percentage of wells above the fluoride guideline.

merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >0)
merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >5)
merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>% filter(percent_wells_above_guideline.x >15)

Table 10: Top 10 Locations in Maine with the Highest Percentage of Wells Above the Guidelines for Fluoride

kable(merged.data %>% arrange(desc(percent_wells_above_guideline.x)) %>% select(location, n_wells_tested.x, median.x, percent_wells_above_guideline.x) %>%
        filter(percent_wells_above_guideline.x > 0) %>% top_n(10, percent_wells_above_guideline.x))
| location | n_wells_tested.x | median.x | percent_wells_above_guideline.x |
|:---|---:|---:|---:|
| Otis | 60 | 1.130 | 30.0 |
| Dedham | 102 | 0.940 | 22.5 |
| Denmark | 46 | 0.450 | 19.6 |
| Surry | 175 | 0.800 | 18.3 |
| Prospect | 57 | 0.785 | 17.5 |
| Eastbrook | 31 | 1.290 | 16.1 |
| Mercer | 32 | 0.600 | 15.6 |
| Fryeburg | 52 | 0.760 | 15.4 |
| Brownfield | 33 | 0.265 | 15.2 |
| Stockton Springs | 56 | 0.600 | 14.3 |

In conclusion, before I would make a strong argument for or against using well water in particular locations, I have more questions about how stringent the guidelines are and about the safety concerns around well water use: which activities the water is used for, and whether there is any room to exceed the guidelines from an exposure perspective. If the guidelines are in fact the maximum exposure for any use of well water, it is concerning that so many locations have any percentage of wells testing above the guidelines, let alone almost 59% of wells above the arsenic guideline, as we saw in Manchester.