For this assignment, I wanted to determine whether any towns were in the top thirty for percentage of wells above the guidelines for both arsenic and fluoride, and to see whether those towns had anything in common based on what I know about them.
I started by importing the arsenic and fluoride data from their CSV files and selecting the top 30 towns in each by percentage of wells above the Maine guideline. I kept only the “location” and “percent_wells_above_guideline” columns, since I wasn’t concerned with the other variables. Because top_n() keeps ties, the arsenic list actually contains 31 towns: Carrabassett Valley and Minot tie for 30th place at 32.5%. I then joined the two tables to find the towns that are in the top 30 for both arsenic and fluoride.
library(dplyr)
library(knitr)
arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)
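# Keep the town name and percentage, sort descending, and take the top 30.
# Naming the weight column explicitly avoids the "Selecting by" message;
# top_n() keeps ties, so more than 30 rows can come back.
arsenicf <- arsenic %>%
  select(location, percent_wells_above_guideline) %>%
  arrange(desc(percent_wells_above_guideline)) %>%
  top_n(30, percent_wells_above_guideline)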
arsenicf
## location percent_wells_above_guideline
## 1 Manchester 58.9
## 2 Gorham 50.1
## 3 Columbia 50.0
## 4 Monmouth 49.5
## 5 Eliot 49.3
## 6 Columbia Falls 48.0
## 7 Winthrop 44.8
## 8 Hallowell 44.6
## 9 Buxton 43.4
## 10 Blue Hill 42.7
## 11 Litchfield 42.0
## 12 Hollis 41.4
## 13 Orland 40.7
## 14 Surry 40.3
## 15 Danforth 40.0
## 16 Mariaville 40.0
## 17 Readfield 39.8
## 18 Otis 39.6
## 19 Dayton 37.7
## 20 Sedgwick 37.3
## 21 Mercer 36.4
## 22 Scarborough 35.2
## 23 Saco 34.4
## 24 Camden 34.0
## 25 Trenton 33.7
## 26 Anson 33.3
## 27 Wales 33.3
## 28 Rangeley 33.1
## 29 Oakland 33.0
## 30 Carrabassett Valley 32.5
## 31 Minot 32.5
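The arsenic output has 31 rows rather than 30 because top_n() keeps ties. If exactly 30 rows are wanted, dplyr's slice_max() (dplyr 1.0.0 or later, assumed available here) has a with_ties argument. The minimal sketch below uses only the last four rows copied from the printed arsenic output, asking for 3 rows so the tie at 32.5 is visible; it is an illustration, not the method used above.

```r
library(dplyr)

# Last four rows copied from the printed arsenic output above
tail_rows <- data.frame(
  location = c("Rangeley", "Oakland", "Carrabassett Valley", "Minot"),
  percent_wells_above_guideline = c(33.1, 33.0, 32.5, 32.5),
  stringsAsFactors = FALSE
)

# Default behaviour keeps ties: asking for 3 rows returns 4,
# because the 3rd and 4th towns both sit at 32.5
with_ties <- tail_rows %>%
  slice_max(percent_wells_above_guideline, n = 3)

# with_ties = FALSE returns exactly n rows; which tied town is dropped
# depends on row order
no_ties <- tail_rows %>%
  slice_max(percent_wells_above_guideline, n = 3, with_ties = FALSE)

nrow(with_ties)  # 4
nrow(no_ties)    # 3
```

On the full data this would be `slice_max(percent_wells_above_guideline, n = 30, with_ties = FALSE)` in place of `top_n(30)`. Neither Carrabassett Valley nor Minot appears in the fluoride list, so the joined result would not change.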
# Same steps for fluoride: select, sort descending, keep the top 30 (ties kept)
flouridef <- flouride %>%
  select(location, percent_wells_above_guideline) %>%
  arrange(desc(percent_wells_above_guideline)) %>%
  top_n(30, percent_wells_above_guideline)
flouridef
## location percent_wells_above_guideline
## 1 Otis 30.0
## 2 Dedham 22.5
## 3 Denmark 19.6
## 4 Surry 18.3
## 5 Prospect 17.5
## 6 Eastbrook 16.1
## 7 Mercer 15.6
## 8 Fryeburg 15.4
## 9 Brownfield 15.2
## 10 Stockton Springs 14.3
## 11 Clifton 14.0
## 12 Starks 13.6
## 13 Marshfield 12.9
## 14 Kennebunk 12.7
## 15 Charlotte 12.5
## 16 York 12.4
## 17 Chesterville 12.3
## 18 Stoneham 12.0
## 19 Sedgwick 11.2
## 20 Mechanic Falls 11.1
## 21 Swans Island 10.5
## 22 Franklin 10.3
## 23 Smithfield 10.1
## 24 Biddeford 9.7
## 25 Otisfield 9.7
## 26 Blue Hill 9.6
## 27 Arundel 9.5
## 28 Ellsworth 9.3
## 29 Hiram 8.9
## 30 Norridgewock 8.9
af <- inner_join(arsenicf, flouridef, by = "location")
colnames(af) <- c("town", "% wells above guideline (arsenic)", "% wells above guideline (fluoride)")
kable(af)
| town | % wells above guideline (arsenic) | % wells above guideline (fluoride) |
|---|---|---|
| Blue Hill | 42.7 | 9.6 |
| Surry | 40.3 | 18.3 |
| Otis | 39.6 | 30.0 |
| Sedgwick | 37.3 | 11.2 |
| Mercer | 36.4 | 15.6 |
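An inner join keeps only the rows whose key appears in both tables. As a cross-check, the same idea can be expressed in base R with intersect() for the town names and merge() for the full rows. This is a minimal sketch on a few rows copied from the printed arsenic and fluoride outputs, not a replacement for the dplyr code above.

```r
# A few rows copied from the printed arsenic and fluoride outputs above
arsenic_rows <- data.frame(
  location = c("Manchester", "Blue Hill", "Surry"),
  percent_wells_above_guideline = c(58.9, 42.7, 40.3),
  stringsAsFactors = FALSE
)
fluoride_rows <- data.frame(
  location = c("Otis", "Surry", "Blue Hill"),
  percent_wells_above_guideline = c(30.0, 18.3, 9.6),
  stringsAsFactors = FALSE
)

# Town names present in both lists, in the order of the first argument
common <- intersect(arsenic_rows$location, fluoride_rows$location)
common  # "Blue Hill" "Surry"

# merge() with its default all = FALSE behaves like an inner join;
# the suffixes label the two percentage columns, and rows are sorted by location
both <- merge(arsenic_rows, fluoride_rows, by = "location",
              suffixes = c("_arsenic", "_fluoride"))
both
```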
After finding the five towns that appear in both lists, I located each one on a map. Surry and Sedgwick both border Blue Hill, and Otis is very close to those three. All four are in the Downeast area, near Mount Desert Island. The only town that is not close to the others is Mercer. Correlation does not imply causation, but it was still interesting that 4 of the 5 towns are in the same geographic area.