Katrina Pooler
library(ggvis)
library(dplyr)
library(knitr)
arsenic = read.csv(url("http://jamessuleiman.com/mba676/assets/units/unit4/arsenic.csv"), stringsAsFactors = FALSE)
names(arsenic) = c("location", "tested", "percent_above_guideline", "median", "percentile", "maximum")
flouride = read.csv(url("http://jamessuleiman.com/mba676/assets/units/unit4/flouride.csv"), stringsAsFactors = FALSE)
names(flouride) = c("location", "tested", "percent_above_guideline", "median", "percentile", "maximum")
I was interested to see whether locations with high arsenic levels also tended to have high fluoride levels. First, I looked at the data from all locations and found the highest percentage above guideline for both arsenic and fluoride, so that I would have a reference point when looking at the percentages above guideline for the 10 random locations.
Highest Arsenic Level (x):
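# Location with the highest percent_above_guideline for arsenic (weighting column named explicitly)
highest_arsenic = arsenic %>% select(location, percent_above_guideline) %>% arrange(desc(percent_above_guideline)) %>% top_n(1, percent_above_guideline)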
highest_arsenic
## location percent_above_guideline
## 1 Manchester 58.9
Highest Fluoride Level (y):
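# Location with the highest percent_above_guideline for fluoride (weighting column named explicitly)
highest_flouride = flouride %>% select(location, percent_above_guideline) %>% arrange(desc(percent_above_guideline)) %>% top_n(1, percent_above_guideline)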
highest_flouride
## location percent_above_guideline
## 1 Otis 30
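The two lookups above follow the same pattern, so they could be wrapped in one small function. The sketch below is only an illustration: `highest_percent` is a hypothetical helper name, `slice_max()` assumes dplyr 1.0.0 or later, and the three-row data frame is a stand-in built from values in the comparison table later in this report rather than the full arsenic file.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): return the row(s)
# with the largest percent_above_guideline. Ties are kept by default.
highest_percent = function(df) {
  df %>%
    select(location, percent_above_guideline) %>%
    slice_max(percent_above_guideline, n = 1)
}

# Stand-in data: three rows copied from the comparison table in this report
arsenic_sample = data.frame(
  location = c("Holden", "Rangeley", "Saco"),
  percent_above_guideline = c(5.9, 33.1, 34.4),
  stringsAsFactors = FALSE
)

top_row = highest_percent(arsenic_sample)
print(top_row)
```

With the real data, the same call would presumably be `highest_percent(arsenic)` and `highest_percent(flouride)`.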
Then, I chose 10 random locations to look at:
# The 10 locations to compare, listed once so both data sets use the same set
locations = c("Addison", "Greene", "Belmont", "South Berwick", "Rangeley", "Eastport", "Holden", "Madison", "Saco", "China")
arsenic2 = arsenic %>% select(location, percent_above_guideline) %>% filter(location %in% locations)
flouride2 = flouride %>% select(location, percent_above_guideline) %>% filter(location %in% locations)
arsenic_and_flouride_random_locations = merge(arsenic2, flouride2, by = "location")
arsenic_and_flouride_random_locations
## location percent_above_guideline.x percent_above_guideline.y
## 1 Addison 9.6 1.2
## 2 Belmont 15.4 0.0
## 3 China 15.9 1.2
## 4 Eastport 18.5 3.2
## 5 Greene 30.8 1.4
## 6 Holden 5.9 3.2
## 7 Madison 13.2 0.0
## 8 Rangeley 33.1 2.3
## 9 Saco 34.4 1.7
## 10 South Berwick 12.2 2.1
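The `.x` and `.y` endings in the column names above are the default suffixes that `merge()` adds when both inputs share a column name. As a possible alternative, the `suffixes` argument can label the columns directly. This is a sketch only: the two-row data frames are stand-ins that reuse values from the table above, and the suffix strings are my own choice.

```r
# Stand-ins for arsenic2 and flouride2, using two rows from the table above
arsenic_demo = data.frame(
  location = c("Addison", "Belmont"),
  percent_above_guideline = c(9.6, 15.4),
  stringsAsFactors = FALSE
)
fluoride_demo = data.frame(
  location = c("Addison", "Belmont"),
  percent_above_guideline = c(1.2, 0.0),
  stringsAsFactors = FALSE
)

# suffixes replaces the default ".x" / ".y" on the shared column name
combined = merge(arsenic_demo, fluoride_demo, by = "location",
                 suffixes = c("_arsenic", "_fluoride"))
print(names(combined))
```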
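Based on this sample of 10 locations, there’s no evidence that locations with a high percentage above guideline for arsenic also tend to have a high percentage above guideline for fluoride. For example, Saco had the highest arsenic percentage (34.4) but a fluoride percentage of only 1.7, while Holden had the lowest arsenic percentage (5.9) and tied for the highest fluoride percentage (3.2).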
*Percentage “x” refers to arsenic; percentage “y” refers to fluoride.
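One rough way to put a number on the comparison is a correlation coefficient for the two columns. The sketch below retypes the 10 rows from the table so that it runs on its own; the column names `arsenic` and `fluoride` are shortened labels I chose for the example. A coefficient near zero would be consistent with the conclusion above, though 10 locations is a small sample for this kind of check.

```r
# The 10 sampled locations, copied from the merged table in this report
comparison = data.frame(
  location = c("Addison", "Belmont", "China", "Eastport", "Greene",
               "Holden", "Madison", "Rangeley", "Saco", "South Berwick"),
  arsenic  = c(9.6, 15.4, 15.9, 18.5, 30.8, 5.9, 13.2, 33.1, 34.4, 12.2),
  fluoride = c(1.2, 0.0, 1.2, 3.2, 1.4, 3.2, 0.0, 2.3, 1.7, 2.1),
  stringsAsFactors = FALSE
)

# Pearson correlation between the two percent-above-guideline columns
r = cor(comparison$arsenic, comparison$fluoride)
print(round(r, 2))
```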