Assignment #1

Arsenic Data Tidy

library(dplyr)  # provides %>%, select(), arrange() and filter() used below
arsenic = read.csv(url("http://jamessuleiman.com/mba676/assets/units/unit4/arsenic.csv"), stringsAsFactors = FALSE)
names(arsenic) = c("location", "tested", "percent_above_guideline", "median", "percentile", "maximum")

Fluoride Data Tidy

flouride = read.csv(url("http://jamessuleiman.com/mba676/assets/units/unit4/flouride.csv"), stringsAsFactors = FALSE)
names(flouride) = c("location", "tested", "percent_above_guideline", "median", "percentile", "maximum")
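
Because both files get the same six column names, the two load steps above could share one helper. The following is a minimal sketch, not the code used for this assignment: `read_water_data` and `water_columns` are hypothetical names, and the inline CSV text is made-up sample data so that the example runs without a network connection.

```r
# Column names applied to both data sets (taken from the code above).
water_columns = c("location", "tested", "percent_above_guideline",
                  "median", "percentile", "maximum")

# Hypothetical helper: read a CSV from a file path, URL, or connection
# and apply the shared column names.
read_water_data = function(file_or_url) {
  data = read.csv(file_or_url, stringsAsFactors = FALSE)
  # Guard against a file whose layout differs from the expected six columns.
  stopifnot(ncol(data) == length(water_columns))
  names(data) = water_columns
  data
}

# Made-up two-row sample standing in for one of the real files.
sample_csv = "a,b,c,d,e,f\nTown A,100,12.5,3.1,20.4,150\nTown B,50,0,0.2,1.1,4"
sample_data = read_water_data(textConnection(sample_csv))
sample_data
```

With the real data, the same helper would presumably be called once per file, passing each CSV address in place of the text connection.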

Do high arsenic levels tend to mean high fluoride levels?

I wanted to see whether locations with high arsenic levels also tend to have high fluoride levels. First, using the data from all locations, I found the highest percentage above guideline for both arsenic and fluoride, so that I would have a reference point when looking at the percentages above guideline for the 10 random locations.

Highest Arsenic Level (x):

highest_arsenic = arsenic %>% select(location, percent_above_guideline) %>% slice_max(percent_above_guideline, n = 1)
highest_arsenic

##     location percent_above_guideline
## 1 Manchester                    58.9

Highest Fluoride Level (y):

highest_flouride = flouride %>% select(location, percent_above_guideline) %>% slice_max(percent_above_guideline, n = 1)
highest_flouride

##   location percent_above_guideline
## 1     Otis                      30

Then, I chose 10 random locations to look at:

# The ten locations to compare
chosen_locations = c("Addison", "Greene", "Belmont", "South Berwick", "Rangeley", "Eastport", "Holden", "Madison", "Saco", "China")

arsenic2 = select(arsenic, location, percent_above_guideline)
arsenic2 = filter(arsenic2, location %in% chosen_locations)

flouride2 = select(flouride, location, percent_above_guideline)
flouride2 = filter(flouride2, location %in% chosen_locations)

arsenic_and_flouride_random_locations = merge(arsenic2, flouride2, by = "location")
arsenic_and_flouride_random_locations

##         location percent_above_guideline.x percent_above_guideline.y
## 1        Addison                       9.6                       1.2
## 2        Belmont                      15.4                       0.0
## 3          China                      15.9                       1.2
## 4       Eastport                      18.5                       3.2
## 5         Greene                      30.8                       1.4
## 6         Holden                       5.9                       3.2
## 7        Madison                      13.2                       0.0
## 8       Rangeley                      33.1                       2.3
## 9           Saco                      34.4                       1.7
## 10 South Berwick                      12.2                       2.1
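
The ten locations above are listed by name in the code. If a reproducible random draw is wanted instead, base R's `set.seed()` and `sample()` can produce one. This is a minimal sketch under assumptions: the seed value is arbitrary, and `all_locations` is a made-up stand-in for `arsenic$location`.

```r
# Stand-in for arsenic$location (hypothetical values, so the example is self-contained).
all_locations = paste("Town", 1:50)

set.seed(676)  # arbitrary seed; any fixed value makes the draw repeatable
random_locations = sample(all_locations, size = 10)  # 10 distinct locations, drawn without replacement
random_locations
```

The resulting vector could then be used to filter both data sets, for example with `filter(arsenic2, location %in% random_locations)`.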

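Based on the table, there is no evidence in this sample of 10 locations that water with high levels of arsenic also tends to have high levels of fluoride. For example, Saco has the highest arsenic percentage (34.4) but only a mid-range fluoride percentage (1.7), while Holden has the lowest arsenic percentage (5.9) and ties with Eastport for the highest fluoride percentage (3.2).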

*Percentage “x” refers to arsenic; percentage “y” refers to fluoride.
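
A correlation coefficient gives a quick numeric check on the comparison above. The sketch below retypes the ten rows from the table and computes Pearson's r with base R's `cor()`. The names `sample_table`, `arsenic_pct` and `fluoride_pct` are chosen here for readability and stand for the `.x` and `.y` columns. With only ten locations this is a rough indication, not a formal test.

```r
# The ten rows from the merged table above, retyped by hand.
sample_table = data.frame(
  location = c("Addison", "Belmont", "China", "Eastport", "Greene",
               "Holden", "Madison", "Rangeley", "Saco", "South Berwick"),
  arsenic_pct = c(9.6, 15.4, 15.9, 18.5, 30.8, 5.9, 13.2, 33.1, 34.4, 12.2),
  fluoride_pct = c(1.2, 0.0, 1.2, 3.2, 1.4, 3.2, 0.0, 2.3, 1.7, 2.1)
)

# Pearson correlation: values near 0 indicate little linear relationship,
# values near 1 or -1 indicate a strong one.
r = cor(sample_table$arsenic_pct, sample_table$fluoride_pct)
r
```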