Testing for Arsenic and Fluoride in Maine Drinking Water

By town, 1999-2013

Jake Lowell

September 25, 2016

Load the required libraries

library(tidyr)
library(knitr)
library(rmarkdown)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggvis)  
Input data
setwd("C:/Users/Jake/Desktop/RWD")

flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)

library(reshape)
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
## 
##     rename
## The following object is masked from 'package:tidyr':
## 
##     expand
# reshape::rename takes a named vector of the form c(old_name = "new_name")
flouride <- rename(flouride, c(n_wells_tested = "wells_tested_fl",
                               maximum = "maximum_fl",
                               median = "median_fl"))

arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
# Same renaming for the arsenic table, with an _ar suffix
arsenic <- rename(arsenic, c(n_wells_tested = "wells_tested_ar",
                             maximum = "maximum_ar",
                             median = "median_ar"))
# Merge the two data sets on location
# (merge() defaults to an inner join: only locations present in both files are kept)

water <- merge(flouride, arsenic, by = "location")
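Because `merge()` performs an inner join by default, any location that appears in only one of the two files is silently dropped from `water`. The following is a minimal sketch of how one might check for that. It uses two small made-up data frames (`fl_toy` and `ar_toy`, with invented town names and counts), not the actual Maine data.

```r
# Toy stand-ins for the two tables; all values are invented for illustration.
fl_toy <- data.frame(location = c("Augusta", "Bangor", "Camden"),
                     wells_tested_fl = c(12, 40, 3),
                     stringsAsFactors = FALSE)
ar_toy <- data.frame(location = c("Augusta", "Bangor", "Dexter"),
                     wells_tested_ar = c(10, 35, 7),
                     stringsAsFactors = FALSE)

# Default merge(): inner join, keeps only locations found in both tables.
inner <- merge(fl_toy, ar_toy, by = "location")

# all = TRUE: full outer join, unmatched locations are kept and filled with NA.
outer <- merge(fl_toy, ar_toy, by = "location", all = TRUE)

# Locations that appear in only one of the two tables.
only_fl <- setdiff(fl_toy$location, ar_toy$location)
only_ar <- setdiff(ar_toy$location, fl_toy$location)

nrow(inner)
nrow(outer)
```

Running the same `setdiff()` calls on `flouride$location` and `arsenic$location` would show whether the real merge loses any towns.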
Summary statistics for the number of wells tested for arsenic and fluoride
summary(water$wells_tested_ar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    5.00   33.99   41.00  632.00
summary(water$wells_tested_fl)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    6.00   38.17   49.00  503.00
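In both summaries the first quartile is 0 and the mean sits far above the median, which suggests that roughly a quarter or more of the towns had no wells tested and that a few towns with many tests pull the mean upward. A minimal sketch of how one might quantify this, using an invented vector `wells_toy` rather than the real counts:

```r
# Invented well counts for eight hypothetical towns.
wells_toy <- c(0, 0, 0, 5, 6, 41, 49, 503)

# Proportion of towns with no wells tested.
share_zero <- mean(wells_toy == 0)

# A mean well above the median signals a right-skewed distribution.
stats_toy <- c(median = median(wells_toy),
               mean = mean(wells_toy),
               share_zero = share_zero)
stats_toy
```

The same expressions could be applied to `water$wells_tested_ar` and `water$wells_tested_fl`.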
Plot of wells tested for arsenic vs. wells tested for fluoride
Across towns, testing volume for arsenic and fluoride is highly correlated, with a couple of outliers.
library(ggvis)
library(dplyr)
water %>% ggvis(~wells_tested_ar, ~wells_tested_fl) %>% layer_points()
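The scatterplot shows the relationship visually. To put a number on it, one option is base R's `cor()`. The sketch below is an assumption about how that check might look, not part of the original analysis. It uses an invented data frame `toy` whose last row plays the role of an outlier.

```r
# Invented counts for six hypothetical towns; the last row is an outlier.
toy <- data.frame(
  wells_tested_ar = c(0, 5, 20, 41, 120, 632),
  wells_tested_fl = c(0, 6, 25, 49, 130, 150)
)

# Pearson measures linear association and is sensitive to outliers.
pearson <- cor(toy$wells_tested_ar, toy$wells_tested_fl)

# Spearman works on ranks, so a single extreme town has less influence.
spearman <- cor(toy$wells_tested_ar, toy$wells_tested_fl, method = "spearman")

c(pearson = pearson, spearman = spearman)
```

Comparing the two coefficients gives a rough sense of how much the outliers affect the linear fit. On the real `water` table, `use = "complete.obs"` may be needed if any counts are missing.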