Assignment 1

The files flouride.csv and arsenic.csv were downloaded from the Maine Tracking Network and contain fluoride and arsenic levels, by town, for private well water samples tested by the State of Maine Health and Environmental Testing Laboratory (HETL) between 1999 and 2013.

For locations with fewer than 20 wells tested, only the number of wells tested and the maximum value are displayed. All test results reported as less than the laboratory’s limit of detection were replaced with a value that is one-half of the detection limit. Unit abbreviations are: mg/L for milligrams per liter, ug/L for micrograms per liter.
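The substitution rule for results below the detection limit can be sketched in R as follows. This is only an illustration: the helper name `substitute_nondetects`, the example values, and the detection limit of 0.5 ug/L are hypothetical and are not taken from the HETL data.

```r
# Hypothetical helper: replace non-detects with half the detection limit.
# `values` holds measured concentrations, `below_limit` flags results the
# lab reported as below the detection limit, and `detection_limit` is
# assumed to be known.
substitute_nondetects <- function(values, below_limit, detection_limit) {
  ifelse(below_limit, detection_limit / 2, values)
}

# Made-up example values (ug/L), not taken from the real data set
raw_results  <- c(12.0, NA, 3.4, NA)
is_nondetect <- c(FALSE, TRUE, FALSE, TRUE)

cleaned <- substitute_nondetects(raw_results, is_nondetect,
                                 detection_limit = 0.5)
print(cleaned)
```

The downloaded files already contain the substituted values, so this step is not repeated in the analysis.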

Maine’s Maximum Exposure Guideline is 2 milligrams per liter (mg/L) for fluoride and 10 micrograms per liter (ug/L) for arsenic.

The State of Maine Health and Environmental Testing Laboratory provided these data. The table was prepared by the Maine Environmental Public Health Tracking Program. The complete data set contains water test results from 46,855 private wells in Maine. Revision Date: 08/2015.

Goals

The purpose of this study is to identify the locations in Maine with the highest percentages of wells that tested above the state’s maximum exposure guidelines for both fluoride and arsenic. These results will be shared with policy makers in the hope of remedying this alarming situation with drinking water in Maine.

Data Cleaning

I started by loading a few libraries and reading the data files arsenic.csv and flouride.csv into data frames named arsenic and flouride, respectively.

library(tidyr)

## Warning: package 'tidyr' was built under R version 3.3.3

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.3.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(knitr)

## Warning: package 'knitr' was built under R version 3.3.3

library(ggvis)

## Warning: package 'ggvis' was built under R version 3.3.3

arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)  
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)

I used the command “names” to check the tables’ headings for consistency:

names(arsenic)

## [1] "location"                      "n_wells_tested"               
## [3] "percent_wells_above_guideline" "median"                       
## [5] "percentile_95"                 "maximum"

names(flouride)

## [1] "location"                      "n_wells_tested"               
## [3] "percent_wells_above_guideline" "median"                       
## [5] "percentile_95"                 "maximum"

The final step in the cleaning process was renaming the variables so that they better reflect the data:

names(arsenic) <- c("Location", "Arsenic_Wells_Tested", "Arsenic_Above_Guidelines", "Arsenic_Median", "Arsenic_Percentile_95", "Arsenic_Maximum")

names(arsenic)

## [1] "Location"                 "Arsenic_Wells_Tested"    
## [3] "Arsenic_Above_Guidelines" "Arsenic_Median"          
## [5] "Arsenic_Percentile_95"    "Arsenic_Maximum"

names(flouride) <- c("Location", "Fluoride_Wells_Tested", "Flouride_Above_Guidelines", "Fluoride_Median", "Fluoride_Percentile_95", "Fluoride_Maximum")

names(flouride)

## [1] "Location"                  "Fluoride_Wells_Tested"    
## [3] "Flouride_Above_Guidelines" "Fluoride_Median"          
## [5] "Fluoride_Percentile_95"    "Fluoride_Maximum"
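Typing every column name by hand makes it easy for spellings to drift between columns. As an alternative, the names could be built from a single prefix. The sketch below is not the method used in this report (the analysis keeps the names assigned above); the helper `rename_with_prefix` and the one-row table `toy` are hypothetical, and the values in `toy` are made up.

```r
# Hypothetical helper: build the new column names from a single prefix
# instead of typing each one, so the spelling stays consistent.
rename_with_prefix <- function(df, prefix) {
  suffixes <- c("Wells_Tested", "Above_Guidelines", "Median",
                "Percentile_95", "Maximum")
  # The raw tables have one location column followed by five measures
  stopifnot(ncol(df) == length(suffixes) + 1)
  names(df) <- c("Location", paste(prefix, suffixes, sep = "_"))
  df
}

# Made-up one-row table with the same headings as the raw CSV files
toy <- data.frame(
  location = "Sampletown",
  n_wells_tested = 25,
  percent_wells_above_guideline = 12.0,
  median = 1.5,
  percentile_95 = 20.0,
  maximum = 45.0,
  stringsAsFactors = FALSE
)

toy_renamed <- rename_with_prefix(toy, "Arsenic")
print(names(toy_renamed))
```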

Analysis

To address arsenic contamination, one needs to see which locations have the highest percentage of wells with arsenic levels above 10 micrograms per liter (ug/L).

Most_Arsenic_Wells <- arsenic %>% select(Location, Arsenic_Above_Guidelines) %>% top_n(25)

## Selecting by Arsenic_Above_Guidelines

## Warning: package 'bindrcpp' was built under R version 3.3.3

kable(Most_Arsenic_Wells, digits = 1)

Location	Arsenic_Above_Guidelines
Manchester	58.9
Gorham	50.1
Columbia	50.0
Monmouth	49.5
Eliot	49.3
Columbia Falls	48.0
Winthrop	44.8
Hallowell	44.6
Buxton	43.4
Blue Hill	42.7
Litchfield	42.0
Hollis	41.4
Orland	40.7
Surry	40.3
Danforth	40.0
Mariaville	40.0
Readfield	39.8
Otis	39.6
Dayton	37.7
Sedgwick	37.3
Mercer	36.4
Scarborough	35.2
Saco	34.4
Camden	34.0
Trenton	33.7
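The call to top_n above relies on dplyr picking the last column as the ranking variable and does not itself sort the result. A more explicit version of the same selection might look like the sketch below. It runs on a made-up table `toy`, and it assumes that towns with fewer than 20 wells tested appear with a missing (NA) percentage after reading the CSV, which has not been verified against the real files.

```r
library(dplyr)

# Made-up toy data; NA mimics a town with fewer than 20 wells tested
# (an assumption about how the missing percentages are read in)
toy <- data.frame(
  Location = c("A", "B", "C", "D", "E"),
  Arsenic_Above_Guidelines = c(10.5, NA, 42.0, 3.1, 27.8),
  stringsAsFactors = FALSE
)

top_locations <- toy %>%
  select(Location, Arsenic_Above_Guidelines) %>%
  filter(!is.na(Arsenic_Above_Guidelines)) %>%   # drop towns with no percentage
  arrange(desc(Arsenic_Above_Guidelines)) %>%    # highest percentage first
  head(3)                                        # keep the top 3 rows

print(top_locations)
```

Unlike top_n, head keeps exactly the requested number of rows, so towns tied at the cutoff are not all retained.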

The same is easily done with the fluoride data:

Most_Flouride_Wells <- flouride %>% select(Location, Flouride_Above_Guidelines) %>% top_n(25)

## Selecting by Flouride_Above_Guidelines

kable(Most_Flouride_Wells, digits = 1)

Location	Flouride_Above_Guidelines
Otis	30.0
Dedham	22.5
Denmark	19.6
Surry	18.3
Prospect	17.5
Eastbrook	16.1
Mercer	15.6
Fryeburg	15.4
Brownfield	15.2
Stockton Springs	14.3
Clifton	14.0
Starks	13.6
Marshfield	12.9
Kennebunk	12.7
Charlotte	12.5
York	12.4
Chesterville	12.3
Stoneham	12.0
Sedgwick	11.2
Mechanic Falls	11.1
Swans Island	10.5
Franklin	10.3
Smithfield	10.1
Biddeford	9.7
Otisfield	9.7

Finally, I joined both tables to see the locations that are in the gravest danger from BOTH chemicals:

Arsenic_Flouride_Wells <- Most_Arsenic_Wells %>% inner_join(Most_Flouride_Wells)

## Joining, by = "Location"

kable(Arsenic_Flouride_Wells)

Location	Arsenic_Above_Guidelines	Flouride_Above_Guidelines
Surry	40.3	18.3
Otis	39.6	30.0
Sedgwick	37.3	11.2
Mercer	36.4	15.6
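An inner join keeps only the locations that appear in both top-25 tables, which is why just four towns remain. The sketch below shows the same idea on two made-up tables, with the join key named explicitly rather than inferred. The table names `top_arsenic` and `top_fluoride` and all values in them are hypothetical.

```r
library(dplyr)

# Made-up toy tables standing in for the two top-25 lists
top_arsenic <- data.frame(
  Location = c("A", "B", "C"),
  Arsenic_Above_Guidelines = c(40, 35, 30),
  stringsAsFactors = FALSE
)
top_fluoride <- data.frame(
  Location = c("C", "D", "A"),
  Fluoride_Above_Guidelines = c(20, 15, 10),
  stringsAsFactors = FALSE
)

# Only locations present in BOTH tables survive an inner join
both <- inner_join(top_arsenic, top_fluoride, by = "Location")
print(both)
```

Naming the key with `by` also suppresses the "Joining, by" message and guards against an accidental join on any other shared column.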

Conclusion

The locations in Maine that rank in the top 25 for the percentage of wells above the guideline for both arsenic and fluoride are Surry, Otis, Sedgwick and Mercer. The local government should take urgent action in these areas to remedy this alarming situation.

Viacheslav Tomenko

September 22, 2017
