Refer to the detailed instructions for this assignment in Brightspace.
Don’t alter the three code chunks in this section. First we read in the two data sets and delete rows with missing values.
library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()
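To see what `drop_na()` does before relying on it, here is a minimal sketch on a made-up tibble. The `toy` data and its values are hypothetical and are not taken from the assignment files. When no columns are named, `drop_na()` removes every row that has a missing value in any column.

```r
library(tidyr)   # drop_na()
library(tibble)  # tibble()

# Hypothetical toy data: the second row has a missing median
toy <- tibble(
  location = c("A", "B", "C"),
  median   = c(1.2, NA, 0.4)
)

# Keep only the rows with no missing values in any column
toy_complete <- drop_na(toy)
nrow(toy_complete)
```

Comparing `nrow()` before and after is a quick way to check how many rows a data set loses to missing values.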
Next we display the first few rows of fluoride.
head(fluoride)
## # A tibble: 6 × 6
## location n_wells_tested percent_wells_above_guideline median percen…¹ maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Otis 60 30 1.13 3.2 3.6
## 2 Dedham 102 22.5 0.94 3.27 7
## 3 Denmark 46 19.6 0.45 3.15 3.9
## 4 Surry 175 18.3 0.8 3.52 6.9
## 5 Prospect 57 17.5 0.785 2.5 2.7
## 6 Eastbrook 31 16.1 1.29 2.44 3.3
## # … with abbreviated variable name ¹percentile_95
Then we display the first few rows of arsenic.
head(arsenic)
## # A tibble: 6 × 6
## location n_wells_tested percent_wells_above_gui…¹ median perce…² maximum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Manchester 275 58.9 14 93 200
## 2 Gorham 467 50.1 10.5 130 460
## 3 Columbia 42 50 9.8 65.9 200
## 4 Monmouth 277 49.5 10 110 368
## 5 Eliot 73 49.3 9.7 41.4 45
## 6 Columbia Falls 25 48 8.1 53.8 71
## # … with abbreviated variable names ¹percent_wells_above_guideline,
## # ²percentile_95
In the code chunk below, create a new tibble called
chemicals that joins fluoride and arsenic. You probably
want to do an inner join but the join type is up to you.
chemicals <- fluoride %>% inner_join(arsenic, by = "location")
The next code chunk displays the head of your newly created
chemicals tibble. Take a look to verify that your join
looks ok.
head(chemicals)
## # A tibble: 6 × 11
## location n_wells_te…¹ perce…² media…³ perce…⁴ maxim…⁵ n_wel…⁶ perce…⁷ media…⁸
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Otis 60 30 1.13 3.2 3.6 53 39.6 4.8
## 2 Dedham 102 22.5 0.94 3.27 7 97 17.5 1
## 3 Denmark 46 19.6 0.45 3.15 3.9 42 0 0.25
## 4 Surry 175 18.3 0.8 3.52 6.9 181 40.3 6
## 5 Prospect 57 17.5 0.785 2.5 2.7 50 4 1
## 6 Eastbrook 31 16.1 1.29 2.44 3.3 28 10.7 1.5
## # … with 2 more variables: percentile_95.y <dbl>, maximum.y <dbl>, and
## # abbreviated variable names ¹n_wells_tested.x,
## # ²percent_wells_above_guideline.x, ³median.x, ⁴percentile_95.x, ⁵maximum.x,
## # ⁶n_wells_tested.y, ⁷percent_wells_above_guideline.y, ⁸median.y
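The `.x` and `.y` suffixes in the output above appear because both tibbles share column names other than `location`. As a sketch of one way to make the joined columns easier to read, `inner_join()` accepts a `suffix` argument. The two-row tibbles below reuse a few values from the printed output; the names `fl`, `ar`, and `joined` and the suffix strings are my own choices for illustration, not part of the assignment.

```r
library(dplyr)   # inner_join(), %>%
library(tibble)  # tribble()

# Median values for two towns, copied from the head(chemicals) output
fl <- tribble(
  ~location, ~median,
  "Otis",    1.13,
  "Dedham",  0.94
)
ar <- tribble(
  ~location, ~median,
  "Otis",    4.8,
  "Dedham",  1
)

# suffix replaces the default ".x" / ".y" on the clashing column names
joined <- fl %>%
  inner_join(ar, by = "location", suffix = c("_fluoride", "_arsenic"))
names(joined)
```

With descriptive suffixes, later `filter()` or `select()` calls show which chemical a column refers to without checking the join order.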
In the code chunk below, create an interesting subset of the data. You’ll likely find an interesting subset by filtering for locations that have high or low levels of arsenic, fluoride, or both.
topfive <- fluoride %>%
  slice_max(percent_wells_above_guideline, n = 5) %>%
  select(location, percent_wells_above_guideline)
Edit this part to discuss how you selected your interesting subset.
I selected the five towns with the highest percentage of wells above the fluoride guideline, keeping only the location and percentage columns.
Display the first few rows of your interesting subset in the code chunk below.
head(topfive)
## # A tibble: 5 × 2
## location percent_wells_above_guideline
## <chr> <dbl>
## 1 Otis 30
## 2 Dedham 22.5
## 3 Denmark 19.6
## 4 Surry 18.3
## 5 Prospect 17.5
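The prompt also suggests looking at locations that are high in both chemicals. A minimal sketch of that alternative is shown here, using the six rows printed by `head(chemicals)` earlier. The name `chem_sample` and the cutoff of 15 percent are assumptions made for illustration; the cutoff is arbitrary and is not a regulatory threshold.

```r
library(dplyr)   # filter(), %>%
library(tibble)  # tribble()

# Six rows copied from the head(chemicals) output:
# .x is the fluoride percentage, .y is the arsenic percentage
chem_sample <- tribble(
  ~location,   ~percent_wells_above_guideline.x, ~percent_wells_above_guideline.y,
  "Otis",      30,   39.6,
  "Dedham",    22.5, 17.5,
  "Denmark",   19.6, 0,
  "Surry",     18.3, 40.3,
  "Prospect",  17.5, 4,
  "Eastbrook", 16.1, 10.7
)

cutoff <- 15  # hypothetical cutoff, chosen only for this example

# Keep towns where both percentages exceed the cutoff
both_high <- chem_sample %>%
  filter(percent_wells_above_guideline.x > cutoff,
         percent_wells_above_guideline.y > cutoff)
both_high$location
```

On these six rows the filter keeps Otis, Dedham, and Surry. Running the same filter on the full `chemicals` tibble may return more towns.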
In the code chunk below, create a ggplot visualization of your subset that is fairly simple for a viewer to comprehend.
library(ggplot2)
ggplot(data = topfive, aes(x = percent_wells_above_guideline, y = location)) + geom_col(color = "black", fill = "black")
One thing I found interesting about my subset is that all of the
towns except Denmark are clustered in the same small area. This led me
to conclude that the high fluoride levels are related to the
region around these towns rather than being an issue specific to
each individual town. One issue I ran into was trying to sort my data in
descending order. After a lot of research I still was not able to
achieve it. However, I am proud that I completed this project,
since I have no coding experience apart from this class, and that was
very rewarding.
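If the goal of the sort was to order the bars in the plot, one possible approach is sketched here. ggplot2 orders a text axis alphabetically unless the variable is a factor, so the sketch uses base R's `reorder()` to set the factor levels by percentage. The `topfive` tibble is rebuilt from the values printed earlier so the example runs on its own, and `geom_col()` is used as the equivalent of `geom_bar(stat = "identity")`.

```r
library(ggplot2)
library(tibble)  # tribble()

# Values copied from the head(topfive) output
topfive <- tribble(
  ~location,  ~percent_wells_above_guideline,
  "Otis",     30,
  "Dedham",   22.5,
  "Denmark",  19.6,
  "Surry",    18.3,
  "Prospect", 17.5
)

# reorder() makes location a factor whose levels run from the lowest
# percentage to the highest; ggplot draws the first level at the bottom
# of the y axis, so the largest bar ends up at the top
topfive$location <- reorder(topfive$location,
                            topfive$percent_wells_above_guideline)

p <- ggplot(topfive, aes(x = percent_wells_above_guideline, y = location)) +
  geom_col(color = "black", fill = "black")
p
```

Read from top to bottom, the bars then run from Otis down to Prospect. Wrapping the second argument of `reorder()` in a minus sign would flip the order.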
Once you are done, knit, publish, and then submit your link to your published RPubs document in Brightspace.