Lily Parenteau Assignment 1

Importing Datasets

First I imported the two datasets, arsenic.csv and flouride.csv, and saved them as arsenic and flouride, respectively.

arsenic <- read.csv(file = "arsenic.csv", header = TRUE, stringsAsFactors = FALSE)

flouride <- read.csv(file = "flouride.csv", header = TRUE, stringsAsFactors = FALSE)
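
Before going further, it can help to confirm that an imported file has the columns the rest of the report uses. The following is a minimal sketch, not part of the original analysis: it writes a tiny made-up CSV to a temporary file so it runs on its own, and the names `required`, `tmp`, and `demo` are hypothetical. The four column names are the ones used by the code in later sections.

```r
# Columns this report relies on (taken from the code in later sections).
required <- c("location", "n_wells_tested",
              "percent_wells_above_guideline", "maximum")

# Write a tiny made-up CSV to a temporary file so the sketch is self-contained.
# These values are for illustration only and are not from arsenic.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "location,n_wells_tested,percent_wells_above_guideline,maximum",
  "A,25,12.5,40",
  "B,10,90.0,300"
), tmp)

demo <- read.csv(file = tmp, header = TRUE, stringsAsFactors = FALSE)

# Stop early if any required column is missing.
missing_cols <- setdiff(required, names(demo))
stopifnot(length(missing_cols) == 0)

# Show the column types that read.csv() inferred.
str(demo)
```

The same `setdiff()` check could be run on `arsenic` and `flouride` right after they are read.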

Report Outline

I decided to look at three different things with these data:

The top 5 and bottom 5 locations for maximum levels of arsenic and of fluoride
The top 30 locations by percent of wells above the guideline, for arsenic and for fluoride
The locations that are in the top 30 for both arsenic and fluoride

Top and Bottom 5 Maximum Levels

In this section I wanted to see which 5 locations had the highest maximum levels of arsenic and of fluoride, as well as which 5 locations had the lowest maximum levels. For these I included only locations that had tested at least 20 wells.

Arsenic

These are the 5 locations with the highest maximum levels of arsenic in wells.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(knitr)
arsenic_max1 <- arsenic %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(desc(maximum)) %>%
  top_n(5)

## Selecting by maximum

kable(arsenic_max1)

location	maximum
Danforth	3100
Northport	1700
Blue Hill	930
Sedgwick	840
Buxton	670

These are the 5 locations with the lowest maximum levels of arsenic in wells.

arsenic_min1 <- arsenic %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(maximum) %>%
  top_n(-5)

## Selecting by maximum

kable(arsenic_min1)

location	maximum
Waterford	1.0
Mexico	1.0
Presque Isle	1.0
Andover	1.1
Carthage	1.3

Fluoride

These are the 5 locations with the highest maximum levels of fluoride in wells.

flouride_max1 <- flouride %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(desc(maximum)) %>%
  top_n(5)

## Selecting by maximum

kable(flouride_max1)

location	maximum
Anson	14.0
Ashland	10.0
Peru	9.9
Kennebunk	9.6
Raymond	9.1

These are the locations with the lowest maximum levels of fluoride in wells. The table has 8 rows rather than 5 because top_n() keeps ties, and 8 locations share the lowest maximum of 0.1.

flouride_min1 <- flouride %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, maximum) %>%
  arrange(maximum) %>%
  top_n(-5)

## Selecting by maximum

kable(flouride_min1)

location	maximum
Hodgdon	0.1
Boothbay Harbor	0.1
Wallagrass	0.1
Sangerville	0.1
Garland	0.1
Sherman	0.1
Etna	0.1
Newburgh	0.1
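
If exactly 5 rows were wanted even when values tie, one option is dplyr's slice_min() with with_ties = FALSE. This is a hedged sketch of that alternative, not what the report does. It assumes dplyr 1.0.0 or later, and the data frame `demo` and the result names are made up for illustration.

```r
library(dplyr)

# Made-up data for illustration: four locations tie for the lowest maximum.
demo <- data.frame(
  location = c("A", "B", "C", "D", "E"),
  maximum = c(0.1, 0.1, 0.1, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# Ties kept (the default, which matches top_n()): asking for 2 rows
# returns all four rows tied at 0.1.
kept_ties <- demo %>% slice_min(maximum, n = 2)

# Ties dropped: exactly 2 rows come back. Which of the tied rows are
# returned depends on row order, so the choice is arbitrary.
dropped_ties <- demo %>% slice_min(maximum, n = 2, with_ties = FALSE)

print(nrow(kept_ties))
print(nrow(dropped_ties))
```

Dropping ties gives a fixed table length, but it hides the fact that several locations are equally low, so keeping the ties as the report does is a reasonable choice.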

Top 30 locations for percent of wells above guideline

In this section I looked at only two columns, the location and the percent of wells above the guideline. I created a table of the top 30 locations for arsenic and a separate table of the top 30 for fluoride. Again, I only included locations that had at least 20 wells tested.
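
The same filter, select, and rank steps recur for every table in this report. As a hedged sketch, they could be wrapped in one helper function. The name `top_locations()` and its arguments are hypothetical, the data frame `demo` is made up, and the sketch assumes dplyr 1.0.0 or later for slice_max() and the `{{ }}` operator.

```r
library(dplyr)

# Hypothetical helper: keep locations with at least `min_wells` tested wells,
# then return the `n` rows with the largest values of `column`.
# Ties are kept, which matches the behaviour of top_n().
top_locations <- function(data, column, n, min_wells = 20) {
  data %>%
    filter(n_wells_tested >= min_wells) %>%
    select(location, {{ column }}) %>%
    slice_max({{ column }}, n = n, with_ties = TRUE)
}

# Made-up data for illustration only (not values from arsenic.csv).
demo <- data.frame(
  location = c("A", "B", "C", "D"),
  n_wells_tested = c(25, 10, 40, 30),
  percent_wells_above_guideline = c(12.5, 90.0, 30.0, 5.0),
  stringsAsFactors = FALSE
)

# "B" has the highest percentage but is dropped for having fewer than 20 wells.
result <- top_locations(demo, percent_wells_above_guideline, n = 2)
print(result)
```

With such a helper, a call like `top_locations(arsenic, percent_wells_above_guideline, n = 30)` would stand in for one of the longer pipelines, although the column would keep its original name rather than being renamed.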

Arsenic

These are the top 30 locations for arsenic. The table has 31 rows because top_n() keeps ties, and the last two locations both show 32.5.

arsenic2 <- arsenic %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, ar_percent_above = percent_wells_above_guideline) %>%
  arrange(desc(ar_percent_above)) %>%
  top_n(30)

## Selecting by ar_percent_above

kable(arsenic2, digits = 1)

location	ar_percent_above
Manchester	58.9
Gorham	50.1
Columbia	50.0
Monmouth	49.5
Eliot	49.3
Columbia Falls	48.0
Winthrop	44.8
Hallowell	44.6
Buxton	43.4
Blue Hill	42.7
Litchfield	42.0
Hollis	41.4
Orland	40.7
Surry	40.3
Mariaville	40.0
Danforth	40.0
Readfield	39.8
Otis	39.6
Dayton	37.7
Sedgwick	37.3
Mercer	36.4
Scarborough	35.2
Saco	34.4
Camden	34.0
Trenton	33.7
Anson	33.3
Wales	33.3
Rangeley	33.1
Oakland	33.0
Carrabassett Valley	32.5
Minot	32.5

Fluoride

These are the top 30 locations for fluoride.

flouride2 <- flouride %>%
  arrange(n_wells_tested) %>%
  filter(n_wells_tested >= 20) %>%
  select(location, fl_percent_above = percent_wells_above_guideline) %>%
  arrange(desc(fl_percent_above)) %>%
  top_n(30)

## Selecting by fl_percent_above

kable(flouride2, digits = 1)

location	fl_percent_above
Otis	30.0
Dedham	22.5
Denmark	19.6
Surry	18.3
Prospect	17.5
Eastbrook	16.1
Mercer	15.6
Fryeburg	15.4
Brownfield	15.2
Stockton Springs	14.3
Clifton	14.0
Starks	13.6
Marshfield	12.9
Kennebunk	12.7
Charlotte	12.5
York	12.4
Chesterville	12.3
Stoneham	12.0
Sedgwick	11.2
Mechanic Falls	11.1
Swans Island	10.5
Franklin	10.3
Smithfield	10.1
Otisfield	9.7
Biddeford	9.7
Blue Hill	9.6
Arundel	9.5
Ellsworth	9.3
Hiram	8.9
Norridgewock	8.9

Locations in the Top 30 for arsenic and fluoride

In this section I joined the data frames from the previous section to see which locations showed up in the top 30 for both arsenic and fluoride by percent of wells above the guideline. I then created a scatterplot of the results. Only 5 locations showed up in both top 30 lists, so the scatterplot is not very informative.

Table and Scatterplot for joined data

library(ggvis)
arsenic_and_flouride <- arsenic2 %>% inner_join(flouride2)

## Joining, by = "location"

kable(arsenic_and_flouride, digits=1)

location	ar_percent_above	fl_percent_above
Blue Hill	42.7	9.6
Surry	40.3	18.3
Otis	39.6	30.0
Sedgwick	37.3	11.2
Mercer	36.4	15.6
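
The join above relies on dplyr picking the shared column by itself, which is why the "Joining, by" message appears. A minimal sketch of naming the key explicitly follows. The data frames `arsenic_demo` and `flouride_demo` are hypothetical and hold only a few rows copied from the tables above, with values as displayed after rounding to one digit.

```r
library(dplyr)

# A few rows copied from the top 30 tables (displayed, rounded values).
arsenic_demo <- data.frame(
  location = c("Manchester", "Blue Hill", "Surry"),
  ar_percent_above = c(58.9, 42.7, 40.3),
  stringsAsFactors = FALSE
)
flouride_demo <- data.frame(
  location = c("Otis", "Surry", "Blue Hill"),
  fl_percent_above = c(30.0, 18.3, 9.6),
  stringsAsFactors = FALSE
)

# Naming the key avoids the "Joining, by" message and guards against an
# accidental join on any other column the two tables might share.
both <- arsenic_demo %>% inner_join(flouride_demo, by = "location")
print(both)
```

Only locations present in both inputs survive an inner join, so Manchester and Otis drop out of this small example.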

arsenic_and_flouride %>% ggvis(~ar_percent_above, ~fl_percent_above) %>% layer_points()
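
With only five points, labelling each one by location may make the plot easier to read. The following is a sketch in ggplot2, which is a swapped-in package and not the one the report uses. The data frame is rebuilt here from the joined table above, with values as displayed, so that the example runs on its own. The axis labels are my own wording.

```r
library(ggplot2)

# The five locations from the joined table above (values as displayed).
arsenic_and_flouride <- data.frame(
  location = c("Blue Hill", "Surry", "Otis", "Sedgwick", "Mercer"),
  ar_percent_above = c(42.7, 40.3, 39.6, 37.3, 36.4),
  fl_percent_above = c(9.6, 18.3, 30.0, 11.2, 15.6),
  stringsAsFactors = FALSE
)

# Scatterplot with one text label per point, nudged above the marker.
p <- ggplot(arsenic_and_flouride,
            aes(x = ar_percent_above, y = fl_percent_above)) +
  geom_point() +
  geom_text(aes(label = location), vjust = -0.8) +
  labs(x = "% of wells above arsenic guideline",
       y = "% of wells above fluoride guideline")

print(p)
```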

Lily

September 25, 2016
