Assignment 1

Arsenic and Fluoride Levels in Maine Wells

For Assignment 1, we looked at data from the Maine Tracking Network on the levels of arsenic and fluoride in wells throughout Maine.

My approach was to find a way to present the data in a positive light. To do this, I set out to determine two things:

The number and overall percentage of towns that were below the arsenic and fluoride levels found to be unsafe
Which towns make up this safe area

Retrieving the Data

To begin, I loaded the packages I needed and read in the arsenic and fluoride data downloaded from the Maine Tracking Network.

library(tidyr)
library(knitr)
library(dplyr)
library(ggvis)
library(DT)
# Read the arsenic and fluoride summaries downloaded from the Maine Tracking Network
arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)

Organizing the Information

I began by viewing the names of the variables in each data set.

names(arsenic)

## [1] "location"                      "n_wells_tested"               
## [3] "percent_wells_above_guideline" "median"                       
## [5] "percentile_95"                 "maximum"

names(flouride)

## [1] "location"                      "n_wells_tested"               
## [3] "percent_wells_above_guideline" "median"                       
## [5] "percentile_95"                 "maximum"

After pulling in the data, I wanted to be able to tell the arsenic and fluoride variables apart in my comparisons, so I renamed the variables in each data set with an Ars or Flo prefix. The shared location column became Town in both data sets, without a prefix, because it identifies the same towns in each and is the key used to join them.

names(arsenic) <- c("Town", "ArsWells_Tested", "ArsPercent_Above", "ArsMedian", "ArsPercentile_95", "ArsMaximum")

names(flouride) <- c("Town", "FloWells_Tested", "FloPercent_Above_Guidelines", "FloMedian", "FloPercentile_95", "FloMaximum")
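Typing every new name by hand works, but it is easy to introduce inconsistencies (the two sets above already differ in style, for example ArsPercent_Above versus FloPercent_Above_Guidelines). As a minimal sketch, a hypothetical helper, prefix_columns(), could rename location to Town and prefix every other column programmatically. The helper and the one-row toy data frame are illustrative assumptions, and the names it produces differ from the ones used in the rest of this analysis.

```r
# Hypothetical helper (not part of the original analysis): rename the key
# column and prefix every other column so the two data sets stay distinguishable.
prefix_columns <- function(df, prefix, key = "location", new_key = "Town") {
  cols <- names(df)
  is_key <- cols == key
  cols[!is_key] <- paste0(prefix, cols[!is_key])  # prefix the measurement columns
  cols[is_key] <- new_key                         # shared join key, no prefix
  names(df) <- cols
  df
}

# Toy one-row data frame using column names from the downloaded files (values invented)
toy <- data.frame(location = "Exampletown", n_wells_tested = 25,
                  percent_wells_above_guideline = 4)
renamed <- prefix_columns(toy, "Ars_")
names(renamed)
# "Town" "Ars_n_wells_tested" "Ars_percent_wells_above_guideline"
```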

Next, I cleaned up the data by removing rows that would not have sufficient information. The information provided on our course website for Assignment 1 states:

For locations with fewer than 20 wells tested, only the number of wells tested and the maximum value are displayed. All test results reported as less than the laboratory’s limit of detection were replaced with a value that is one-half of the detection limit.

This indicates that towns with fewer than 20 wells tested lack the percentage and median values I need to identify towns with good rates of safe well water, so I excluded those towns from my data set. I noted that results below the detection limit were recorded as one-half of the limit, but I did not think this would limit the usefulness of those data points. I also wanted each data set sorted in ascending order of percent above guidelines, so that the towns with the lowest percentages would be at the top.
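According to the quoted note, the half-detection-limit substitution was already applied to the data before it was summarized. The sketch below only illustrates the rule itself, assuming raw results arrive as text with a leading "<" marking non-detects; the detection limit and the four results are invented.

```r
# Hypothetical raw results and detection limit (not real Maine data)
detection_limit <- 1.0
results <- c("<1.0", "3.2", "<1.0", "12.5")

# Non-detects are marked with a leading "<"
below_lod <- startsWith(results, "<")

# Replace non-detects with one-half of the detection limit; keep measured values.
# as.numeric() turns "<1.0" into NA with a warning, but ifelse() never uses those
# positions, so the warning is suppressed.
values <- ifelse(below_lod, detection_limit / 2, suppressWarnings(as.numeric(results)))
values
# 0.5 3.2 0.5 12.5
```

Because the substituted value is always below the detection limit, it only changes whether a result counts as above a guideline if the guideline itself is below the detection limit.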

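# Keep towns with at least 20 wells tested, sorted by lowest percent above guidelines
arsenic_testingover20 <- arsenic %>% select(Town, ArsPercent_Above, ArsWells_Tested) %>% filter(ArsWells_Tested >= 20) %>% arrange(ArsPercent_Above)

flouride_testingover20 <- flouride %>% select(Town, FloPercent_Above_Guidelines, FloWells_Tested) %>% filter(FloWells_Tested >= 20) %>% arrange(FloPercent_Above_Guidelines)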
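The same filter-then-sort step can be sketched in base R on a made-up four-town table, where logical subsetting plays the role of dplyr's filter() and order() plays the role of arrange(). The town names and values are invented for illustration.

```r
# Toy stand-in for the renamed arsenic data (values are invented)
toy_arsenic <- data.frame(
  Town = c("Town A", "Town B", "Town C", "Town D"),
  ArsPercent_Above = c(12, 0, 5, 0),
  ArsWells_Tested = c(40, 15, 22, 60),
  stringsAsFactors = FALSE
)

# Keep towns with at least 20 wells tested (Town B, with 15, is dropped)
kept <- toy_arsenic[toy_arsenic$ArsWells_Tested >= 20, ]

# Sort ascending by percent above guidelines, lowest first
kept <- kept[order(kept$ArsPercent_Above), ]
kept$Town
# "Town D" "Town C" "Town A"
```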

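I then joined the two filtered data sets by Town so that the variables I wanted to focus on were together. Because this is an inner join, only towns with at least 20 wells tested for both arsenic and fluoride are kept.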

PercentageAboveGuidelines <- arsenic_testingover20 %>% inner_join(flouride_testingover20, by = "Town")
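The effect of the inner join can be seen on a tiny example with invented towns. Here base R's merge() is used as a stand-in for dplyr's inner_join(), since with its default all = FALSE it keeps only keys present in both tables.

```r
# Invented towns: Town A passed the 20-well filter for arsenic only,
# Town E for fluoride only
toy_ars <- data.frame(Town = c("Town A", "Town C", "Town D"),
                      ArsPercent_Above = c(12, 5, 0),
                      stringsAsFactors = FALSE)
toy_flo <- data.frame(Town = c("Town C", "Town D", "Town E"),
                      FloPercent_Above_Guidelines = c(0, 0, 3),
                      stringsAsFactors = FALSE)

# Inner join on Town: only towns in both tables survive
joined <- merge(toy_ars, toy_flo, by = "Town")
joined$Town
# "Town C" "Town D"
```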

Illustrating the Results

To illustrate the data I have now organized and cleaned up, I used two methods: a scatterplot and a table.

PercentageAboveGuidelines %>% ggvis(~ArsPercent_Above, ~FloPercent_Above_Guidelines) %>% layer_points()

The scatterplot is useful because it shows how many towns sit on the 0% lines: arsenic at 0% above the guideline along the vertical axis, and fluoride at 0% along the horizontal axis. There are points throughout the grid, but most are clustered in the bottom-left corner, near 0% above the guidelines.
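One limitation of the scatterplot is that towns with identical values, such as every town at exactly 0% for both, are drawn on top of each other as a single dot. A tally of the four cases complements the plot. The sketch below uses five invented towns; on the real data the same logic would be applied to the columns of PercentageAboveGuidelines.

```r
# Invented example rows with the same column names as PercentageAboveGuidelines
toy <- data.frame(
  Town = c("Town A", "Town B", "Town C", "Town D", "Town E"),
  ArsPercent_Above = c(0, 0, 7, 15, 0),
  FloPercent_Above_Guidelines = c(0, 4, 0, 2, 0),
  stringsAsFactors = FALSE
)

ars_zero <- toy$ArsPercent_Above == 0
flo_zero <- toy$FloPercent_Above_Guidelines == 0

counts <- c(
  both = sum(ars_zero & flo_zero),           # at the origin
  arsenic_only = sum(ars_zero & !flo_zero),  # on the vertical axis, above the origin
  fluoride_only = sum(!ars_zero & flo_zero), # on the horizontal axis, right of the origin
  neither = sum(!ars_zero & !flo_zero)       # off both axes
)
counts
# both = 2, arsenic_only = 1, fluoride_only = 1, neither = 1
```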

I also wanted to break the information down to show which towns are at 0% above the safe guidelines for both fluoride and arsenic.

SafeArsenicLevelTowns <- arsenic_testingover20 %>% select(Town, ArsPercent_Above) %>% filter(ArsPercent_Above == 0)
SafeFlourideLevelTowns <- flouride_testingover20 %>% select(Town, FloPercent_Above_Guidelines) %>% filter(FloPercent_Above_Guidelines == 0)

SafeWaterTowns <- SafeArsenicLevelTowns %>% inner_join(SafeFlourideLevelTowns, by = "Town")

kable(SafeWaterTowns)

Town	ArsPercent_Above	FloPercent_Above_Guidelines
Andover	0	0
Arrowsic	0	0
Bethel	0	0
Boothbay Harbor	0	0
Canton	0	0
Caribou	0	0
Carthage	0	0
Cranberry Isles	0	0
Damariscotta	0	0
Dixfield	0	0
Eagle Lake	0	0
Edgecomb	0	0
Fort Fairfield	0	0
Hudson	0	0
Industry	0	0
Lubec	0	0
Madawaska	0	0
Mapleton	0	0
Mexico	0	0
Newburgh	0	0
Pittsfield	0	0
Presque Isle	0	0
Sangerville	0	0
Southport	0	0
Wallagrass	0	0
Westport Island	0	0
Woodville	0	0

To find what percentage of towns this represents, I divided the number of towns with safe levels of both fluoride and arsenic by the number of towns in the joined data set. Approximately 8% of towns with 20 or more wells sampled for both (27 of 341) have safe levels of both.

nrow(SafeWaterTowns)

## [1] 27

nrow(SafeWaterTowns)/nrow(PercentageAboveGuidelines)

## [1] 0.07917889
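The size of the joined data set is not printed above, but it can be recovered from the two printed values, since the ratio is 27 divided by nrow(PercentageAboveGuidelines). A quick check of that arithmetic, using only the numbers shown in the output:

```r
n_safe_both <- 27     # nrow(SafeWaterTowns), printed above
share <- 0.07917889   # printed ratio

# Recover nrow(PercentageAboveGuidelines), then express the share as a percentage
n_joined <- round(n_safe_both / share)
pct <- round(100 * n_safe_both / n_joined, 1)
n_joined  # 341
pct       # 7.9
```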

Next, I found the percentage of towns with safe levels of at least one of the two. This works out to approximately 58% of towns with 20 or more wells sampled, as the calculation below shows. This result is a much larger set of towns, so I used the DT datatable function to make it easier to view and search.

SafeArs_or_Flo_Towns <- SafeArsenicLevelTowns %>% full_join(SafeFlourideLevelTowns, by = "Town")

nrow(SafeArs_or_Flo_Towns)/nrow(PercentageAboveGuidelines)

## [1] 0.5777126

Note: I used full_join(), dplyr's full outer join, so that every town with 0 percent above guidelines for arsenic, fluoride, or both is kept.

DT::datatable(SafeArs_or_Flo_Towns)
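One detail worth checking in the "either" percentage: SafeArsenicLevelTowns and SafeFlourideLevelTowns come from the separately filtered data sets, while the denominator, PercentageAboveGuidelines, holds only towns that met the 20-well threshold for both. A town with enough arsenic tests but too few fluoride tests could therefore be counted in the numerator but not in the denominator. The toy sketch below (invented towns, base R) contrasts the union-based count with a count taken only from the joined set; on the real data the two figures may or may not differ.

```r
# Invented joined set: towns with at least 20 wells tested for both contaminants
toy_joined <- data.frame(
  Town = c("Town A", "Town B", "Town C"),
  ArsPercent_Above = c(0, 6, 3),
  FloPercent_Above_Guidelines = c(2, 0, 1),
  stringsAsFactors = FALSE
)

# Town D passed the 20-well filter for arsenic only, so it is not in toy_joined
safe_ars_towns <- c("Town A", "Town D")
safe_flo_towns <- c("Town B")

# Union of the two safe lists (the towns a full join on Town would return)
union_share <- length(union(safe_ars_towns, safe_flo_towns)) / nrow(toy_joined)

# Alternative: count only towns in the joined set at 0% for at least one contaminant
either_zero <- toy_joined$ArsPercent_Above == 0 | toy_joined$FloPercent_Above_Guidelines == 0
consistent_share <- sum(either_zero) / nrow(toy_joined)

c(union_share = union_share, consistent_share = consistent_share)
# union_share = 1, consistent_share = 0.667 (2 of 3)
```

In dplyr, the second version corresponds to filtering PercentageAboveGuidelines with ArsPercent_Above == 0 | FloPercent_Above_Guidelines == 0 and dividing the resulting row count by nrow(PercentageAboveGuidelines).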

Summary

The data show that although few towns have safe levels of both arsenic and fluoride in well water (only 8%), most have a safe level of at least one of the two. I set out hoping to illustrate the scale of this issue, which I think I have accomplished. Unfortunately, the results show that the issue is widespread: only a small percentage of towns have no tested wells above the guidelines for both arsenic and fluoride. The scatterplot, by contrast, seemed to suggest a more promising outcome, with the data appearing heavily weighted toward 0% for one test or the other. This shows how the same data can be represented to make different points, depending on how the visualization is set up.

To conclude, here are my answers to the questions I set out as goals at the start:

What is the number and overall percentage of towns that were below the arsenic and fluoride levels found to be unsafe? Twenty-seven towns have safe levels of both arsenic and fluoride, which is 8% of all towns with 20 or more wells sampled for both.
Which towns make up this safe area? The towns are listed in the table above.