Importing Datasets

First, I imported the two datasets, arsenic.csv and flouride.csv, and saved them as the data frames arsenic and flouride, respectively.

arsenic <- read.csv(file = "arsenic.csv", header = TRUE, stringsAsFactors = FALSE)

flouride <- read.csv(file = "flouride.csv", header = TRUE, stringsAsFactors = FALSE)
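
As a quick check after importing, it can help to confirm that the columns used later in the report are present. The sketch below passes a few made-up rows through the text argument of read.csv() in place of the real files. The column names are the ones used in the rest of this report; csv_text, sample_data, and all of the values are placeholders.

```r
# Made-up rows standing in for arsenic.csv; only the column names come from the report.
csv_text <- "location,n_wells_tested,percent_wells_above_guideline,maximum
Town A,25,40.0,120
Town B,12,10.5,15
Town C,60,5.2,8"

sample_data <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Columns that the later sections rely on
needed <- c("location", "n_wells_tested", "percent_wells_above_guideline", "maximum")
missing_cols <- setdiff(needed, names(sample_data))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Show column types so that numeric columns read in as text are easy to spot
str(sample_data)
```

Running the same check on arsenic and flouride should catch a renamed or missing column before any of the filtering steps run.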

Report Outline

I decided to look at three things in these data:

  1. The top 5 and bottom 5 locations for maximum levels of either arsenic or fluoride
  2. The top 30 locations in terms of percent of wells above guidelines for arsenic or fluoride
  3. The locations that are in the top 30 for both arsenic and fluoride

Top and Bottom 5 Maximum Levels

In this section I wanted to see which 5 locations had the highest maximum levels of arsenic and of fluoride, as well as which 5 locations had the lowest maximum levels. I included only locations where at least 20 wells had been tested.

Arsenic

These are the 5 locations with the highest maximum arsenic levels in wells.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
# Keep locations with at least 20 tested wells, then take the 5 highest maximums
arsenic_max1 <- arsenic %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(desc(maximum)) %>%
  top_n(5)
## Selecting by maximum
kable(arsenic_max1)
|location  | maximum|
|:---------|-------:|
|Danforth  |    3100|
|Northport |    1700|
|Blue Hill |     930|
|Sedgwick  |     840|
|Buxton    |     670|

These are the 5 locations with the lowest maximum arsenic levels in wells.

# Same filter as above, but take the 5 lowest maximums
arsenic_min1 <- arsenic %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(maximum) %>%
  top_n(-5)
## Selecting by maximum
kable(arsenic_min1)
|location     | maximum|
|:------------|-------:|
|Waterford    |     1.0|
|Mexico       |     1.0|
|Presque Isle |     1.0|
|Andover      |     1.1|
|Carthage     |     1.3|

Fluoride

These are the 5 locations with the highest maximum fluoride levels in wells.

# Keep locations with at least 20 tested wells, then take the 5 highest maximums
flouride_max1 <- flouride %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(desc(maximum)) %>%
  top_n(5)
## Selecting by maximum
kable(flouride_max1)
|location  | maximum|
|:---------|-------:|
|Anson     |    14.0|
|Ashland   |    10.0|
|Peru      |     9.9|
|Kennebunk |     9.6|
|Raymond   |     9.1|

These are the locations with the lowest maximum fluoride levels in wells. The table has 8 rows rather than 5 because top_n() keeps ties, and every location shown has a maximum of 0.1.

# Same filter as above, but take the 5 lowest maximums (ties are kept)
flouride_min1 <- flouride %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(maximum) %>%
  top_n(-5)
## Selecting by maximum
kable(flouride_min1)
|location        | maximum|
|:---------------|-------:|
|Hodgdon         |     0.1|
|Boothbay Harbor |     0.1|
|Wallagrass      |     0.1|
|Sangerville     |     0.1|
|Garland         |     0.1|
|Sherman         |     0.1|
|Etna            |     0.1|
|Newburgh        |     0.1|
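
The table above has more than 5 rows because top_n() returns every row tied at the cutoff. In dplyr 1.0.0 and later, top_n() is superseded by slice_min() and slice_max(), which make the tie handling explicit through the with_ties argument. The sketch below illustrates the difference on made-up data; the toy data frame and its values are hypothetical and not taken from the fluoride dataset.

```r
library(dplyr)

# Hypothetical data: four locations tie for the lowest maximum.
toy <- data.frame(
  location = c("A", "B", "C", "D", "E", "F"),
  maximum  = c(0.1, 0.1, 0.1, 0.1, 0.5, 2.0),
  stringsAsFactors = FALSE
)

# Default (with_ties = TRUE): asking for 3 rows returns all 4 tied rows
with_ties <- toy %>% slice_min(maximum, n = 3)

# with_ties = FALSE: exactly 3 rows are returned
exactly_n <- toy %>% slice_min(maximum, n = 3, with_ties = FALSE)

nrow(with_ties)
nrow(exactly_n)
```

Keeping ties is arguably the fairer choice for this report, since dropping some of the tied locations would be arbitrary, but it does mean a "bottom 5" table can be longer than 5 rows.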

Top 30 locations for percent of wells above guideline

In this section I looked at only two columns: the location and the percent of wells above the guideline. I created one table with the top 30 locations for arsenic and a separate table with the top 30 for fluoride. Again, I included only locations where at least 20 wells had been tested.
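
Because the same filter, select, and rank steps are applied to both datasets, they could be wrapped in a small helper. This is only a sketch: top_by_percent() and its arguments are hypothetical names introduced here, and it uses slice_max() in place of top_n(), so it assumes dplyr 1.0.0 or later. The wells data frame holds made-up values.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): keep locations with
# at least `min_wells` tested wells and return the `n` highest percentages,
# with the percent column renamed to `new_name`.
top_by_percent <- function(data, new_name, n = 30, min_wells = 20) {
  out <- data %>%
    filter(n_wells_tested >= min_wells) %>%
    select(location, percent_wells_above_guideline) %>%
    slice_max(percent_wells_above_guideline, n = n) %>%
    arrange(desc(percent_wells_above_guideline))
  names(out)[names(out) == "percent_wells_above_guideline"] <- new_name
  out
}

# Made-up data to exercise the helper
wells <- data.frame(
  location = c("A", "B", "C", "D"),
  n_wells_tested = c(25, 10, 40, 30),
  percent_wells_above_guideline = c(12.5, 90, 33.3, 5),
  stringsAsFactors = FALSE
)

res <- top_by_percent(wells, "ar_percent_above", n = 2)
res
```

Location B has the highest percentage in the made-up data but is dropped because only 10 wells were tested there, which is the effect the 20-well minimum is meant to have.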

Arsenic

These are the top 30 locations for arsenic. The table has 31 rows because top_n() keeps ties, and the last two locations tie at 32.5.

# Keep locations with at least 20 tested wells, rename the percent column, take the top 30
arsenic2 <- arsenic %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, ar_percent_above = percent_wells_above_guideline) %>%
  arrange(desc(ar_percent_above)) %>%
  top_n(30)
## Selecting by ar_percent_above
kable(arsenic2, digits = 1)
|location            | ar_percent_above|
|:-------------------|----------------:|
|Manchester          |             58.9|
|Gorham              |             50.1|
|Columbia            |             50.0|
|Monmouth            |             49.5|
|Eliot               |             49.3|
|Columbia Falls      |             48.0|
|Winthrop            |             44.8|
|Hallowell           |             44.6|
|Buxton              |             43.4|
|Blue Hill           |             42.7|
|Litchfield          |             42.0|
|Hollis              |             41.4|
|Orland              |             40.7|
|Surry               |             40.3|
|Mariaville          |             40.0|
|Danforth            |             40.0|
|Readfield           |             39.8|
|Otis                |             39.6|
|Dayton              |             37.7|
|Sedgwick            |             37.3|
|Mercer              |             36.4|
|Scarborough         |             35.2|
|Saco                |             34.4|
|Camden              |             34.0|
|Trenton             |             33.7|
|Anson               |             33.3|
|Wales               |             33.3|
|Rangeley            |             33.1|
|Oakland             |             33.0|
|Carrabassett Valley |             32.5|
|Minot               |             32.5|

Fluoride

These are the top 30 locations for fluoride.

# Keep locations with at least 20 tested wells, rename the percent column, take the top 30
flouride2 <- flouride %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, fl_percent_above = percent_wells_above_guideline) %>%
  arrange(desc(fl_percent_above)) %>%
  top_n(30)
## Selecting by fl_percent_above
kable(flouride2, digits = 1)
|location         | fl_percent_above|
|:----------------|----------------:|
|Otis             |             30.0|
|Dedham           |             22.5|
|Denmark          |             19.6|
|Surry            |             18.3|
|Prospect         |             17.5|
|Eastbrook        |             16.1|
|Mercer           |             15.6|
|Fryeburg         |             15.4|
|Brownfield       |             15.2|
|Stockton Springs |             14.3|
|Clifton          |             14.0|
|Starks           |             13.6|
|Marshfield       |             12.9|
|Kennebunk        |             12.7|
|Charlotte        |             12.5|
|York             |             12.4|
|Chesterville     |             12.3|
|Stoneham         |             12.0|
|Sedgwick         |             11.2|
|Mechanic Falls   |             11.1|
|Swans Island     |             10.5|
|Franklin         |             10.3|
|Smithfield       |             10.1|
|Otisfield        |              9.7|
|Biddeford        |              9.7|
|Blue Hill        |              9.6|
|Arundel          |              9.5|
|Ellsworth        |              9.3|
|Hiram            |              8.9|
|Norridgewock     |              8.9|

Locations in the Top 30 for arsenic and fluoride

In this section I joined the data frames from the previous section to see which locations appeared in the top 30 for percent of wells above the guideline for both arsenic and fluoride. I then graphed the results. Only 5 locations appeared in both top 30 lists, so the scatterplot is not very informative.

Table and Scatterplot for joined data

library(ggvis)
arsenic_and_flouride <- arsenic2 %>% inner_join(flouride2)
## Joining, by = "location"
kable(arsenic_and_flouride, digits=1)
|location  | ar_percent_above| fl_percent_above|
|:---------|----------------:|----------------:|
|Blue Hill |             42.7|              9.6|
|Surry     |             40.3|             18.3|
|Otis      |             39.6|             30.0|
|Sedgwick  |             37.3|             11.2|
|Mercer    |             36.4|             15.6|
arsenic_and_flouride %>% ggvis(~ar_percent_above, ~fl_percent_above) %>% layer_points()
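
The join above relies on dplyr guessing the key, which is why it prints a "Joining, by" message. A minimal sketch of the same idea with the key stated explicitly is shown below, along with anti_join() to list the locations that appear in only one of the two lists. The ar_top and fl_top data frames are made-up stand-ins for arsenic2 and flouride2, and all names introduced here are hypothetical.

```r
library(dplyr)

# Made-up stand-ins for the two top-30 tables
ar_top <- data.frame(
  location = c("A", "B", "C"),
  ar_percent_above = c(50, 40, 30),
  stringsAsFactors = FALSE
)
fl_top <- data.frame(
  location = c("B", "C", "D"),
  fl_percent_above = c(20, 10, 5),
  stringsAsFactors = FALSE
)

# Locations present in both tables, with the join key named explicitly
both <- inner_join(ar_top, fl_top, by = "location")

# Locations in the arsenic table that have no match in the fluoride table
only_arsenic <- anti_join(ar_top, fl_top, by = "location")

both
only_arsenic
```

Naming the key with by = "location" keeps the join from silently picking up any other column the two tables might happen to share.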