Data Import

Don’t alter the three code chunks in this section. First we read in the two data sets and delete rows with missing values.

library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()
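
As a small illustration of what drop_na() does in the chunk above, here is a minimal sketch on a made-up tibble. The tibble `toy` and its values are hypothetical and are not taken from the fluoride or arsenic files.

```r
library(tidyverse)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  location = c("A", "B", "C"),
  median   = c(1.2, NA, 0.4),
  maximum  = c(3.6, 2.0, NA)
)

# With no column arguments, drop_na() removes every row that has an NA
# in any column, which is how it is used on fluoride and arsenic above
toy_complete <- toy %>% drop_na()

# Naming a column restricts the check to that column only
toy_median <- toy %>% drop_na(median)

nrow(toy_complete)
nrow(toy_median)
```

Because the import chunks call drop_na() without arguments, a town is dropped if any one of its columns is missing.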

Next we display the first few rows of fluoride.

head(fluoride)

## # A tibble: 6 x 6
##   location  n_wells_tested percent_wells_above_gui… median percentile_95 maximum
##   <chr>              <dbl>                    <dbl>  <dbl>         <dbl>   <dbl>
## 1 Otis                  60                     30    1.13           3.2      3.6
## 2 Dedham               102                     22.5  0.94           3.27     7  
## 3 Denmark               46                     19.6  0.45           3.15     3.9
## 4 Surry                175                     18.3  0.8            3.52     6.9
## 5 Prospect              57                     17.5  0.785          2.5      2.7
## 6 Eastbrook             31                     16.1  1.29           2.44     3.3

Then we display the first few rows of arsenic.

head(arsenic)

## # A tibble: 6 x 6
##   location    n_wells_tested percent_wells_above_g… median percentile_95 maximum
##   <chr>                <dbl>                  <dbl>  <dbl>         <dbl>   <dbl>
## 1 Manchester             275                   58.9   14            93       200
## 2 Gorham                 467                   50.1   10.5         130       460
## 3 Columbia                42                   50      9.8          65.9     200
## 4 Monmouth               277                   49.5   10           110       368
## 5 Eliot                   73                   49.3    9.7          41.4      45
## 6 Columbia F…             25                   48      8.1          53.8      71

Join data

In the code chunk below, create a new tibble called chemicals that joins fluoride and arsenic. You probably want to do an inner join but the join type is up to you.

chemicals <- fluoride %>% inner_join(arsenic, by = "location")

The next code chunk displays the head of your newly created chemicals tibble. Take a look to verify that your join looks ok.

## # A tibble: 6 x 11
##   location n_wells_tested.x percent_wells_a… median.x percentile_95.x maximum.x
##   <chr>               <dbl>            <dbl>    <dbl>           <dbl>     <dbl>
## 1 Otis                   60             30      1.13             3.2        3.6
## 2 Dedham                102             22.5    0.94             3.27       7  
## 3 Denmark                46             19.6    0.45             3.15       3.9
## 4 Surry                 175             18.3    0.8              3.52       6.9
## 5 Prospect               57             17.5    0.785            2.5        2.7
## 6 Eastbro…               31             16.1    1.29             2.44       3.3
## # … with 5 more variables: n_wells_tested.y <dbl>,
## #   percent_wells_above_guideline.y <dbl>, median.y <dbl>,
## #   percentile_95.y <dbl>, maximum.y <dbl>
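
In the output above, the .x columns come from fluoride and the .y columns come from arsenic, which is easy to mix up later. One optional way to make this clearer is the suffix argument of inner_join(). The sketch below uses two made-up tibbles (`fluoride_toy` and `arsenic_toy` are hypothetical names with invented values), so it is only an illustration and not the join used in this assignment.

```r
library(tidyverse)

# Hypothetical miniature tibbles, invented for illustration only
fluoride_toy <- tibble(location = c("A", "B"), median = c(1.1, 0.9))
arsenic_toy  <- tibble(location = c("B", "C"), median = c(10, 14))

# suffix replaces the default ".x" / ".y" endings on clashing column names
chemicals_toy <- fluoride_toy %>%
  inner_join(arsenic_toy, by = "location", suffix = c("_fluoride", "_arsenic"))

names(chemicals_toy)
```

Only location "B" appears in both toy tibbles, so the inner join keeps that single row, and the two median columns are named median_fluoride and median_arsenic.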

Interesting subset

In the code chunk below, create an interesting subset of the data. You’ll likely find an interesting subset by filtering for locations that have high or low levels of arsenic, fluoride, or both.

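# After the join, the .x columns come from fluoride and the .y columns from arsenic.
# Keep only towns where at least 40% of tested wells exceed the arsenic guideline.
interesting <- chemicals %>%
  select(location, percent_wells_above_guideline.y, percent_wells_above_guideline.x) %>%
  filter(percent_wells_above_guideline.y >= 40)

# Give the columns readable names and sort from highest to lowest arsenic percentage.
interesting1 <- interesting %>%
  rename(Flouride = percent_wells_above_guideline.x,
         Arsenic = percent_wells_above_guideline.y,
         Location = location) %>%
  arrange(desc(Arsenic))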

interesting1

## # A tibble: 16 x 3
##    Location       Arsenic Flouride
##    <chr>            <dbl>    <dbl>
##  1 Manchester        58.9      3.3
##  2 Gorham            50.1      0  
##  3 Columbia          50        1.9
##  4 Monmouth          49.5      3.1
##  5 Eliot             49.3      0  
##  6 Columbia Falls    48        0  
##  7 Winthrop          44.8      3.1
##  8 Hallowell         44.6      0  
##  9 Buxton            43.4      1  
## 10 Blue Hill         42.7      9.6
## 11 Litchfield        42        1.9
## 12 Hollis            41.4      3.5
## 13 Orland            40.7      8.6
## 14 Surry             40.3     18.3
## 15 Mariaville        40        7.5
## 16 Danforth          40        0

Edit this part to discuss how you selected your interesting subset.

Given the data I had, I wanted to see whether levels of arsenic were a determining factor in levels of fluoride. I was hoping to show that bad water is generally really bad, but I failed to prove it. This project was a little harder than I expected: I had gotten everything correct, only to reach the graphing section and have to rethink my data in order to create an easy-to-read graph. I ended up not being able to figure out how to incorporate both fluoride and arsenic data on one graph, despite trying different coding techniques for hours. I also had an issue, as you can see, with a tibble displaying in the ggplot section. To sort and manage my data I created a few new variables, which somehow resulted in a tibble of my data being displayed.

Display the first few rows of your interesting subset in the code chunk below.

## # A tibble: 5 x 3
##   Location   Arsenic Flouride
##   <chr>        <dbl>    <dbl>
## 1 Manchester    58.9      3.3
## 2 Gorham        50.1      0  
## 3 Columbia      50        1.9
## 4 Monmouth      49.5      3.1
## 5 Eliot         49.3      0

Visualize your subset

In the code chunk below, create a ggplot visualization of your subset that is fairly simple for a viewer to comprehend.

## # A tibble: 16 x 3
##    Location       Arsenic Flouride
##    <chr>            <dbl>    <dbl>
##  1 Manchester        58.9      3.3
##  2 Gorham            50.1      0  
##  3 Columbia          50        1.9
##  4 Monmouth          49.5      3.1
##  5 Eliot             49.3      0  
##  6 Columbia Falls    48        0  
##  7 Winthrop          44.8      3.1
##  8 Hallowell         44.6      0  
##  9 Buxton            43.4      1  
## 10 Blue Hill         42.7      9.6
## 11 Litchfield        42        1.9
## 12 Hollis            41.4      3.5
## 13 Orland            40.7      8.6
## 14 Surry             40.3     18.3
## 15 Mariaville        40        7.5
## 16 Danforth          40        0
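
One possible way to show both chemicals on a single graph is to reshape the subset into long form with pivot_longer() and then draw side-by-side bars. The sketch below is an assumption about how this could be done, not the plot submitted for this assignment. It copies four rows from the interesting1 output above into a stand-in tibble (`subset_toy`, a hypothetical name) so that it runs on its own, and the names `long`, `chemical`, and `percent_above_guideline` are likewise made up for illustration.

```r
library(tidyverse)

# Four rows copied from the interesting1 output above;
# the column name "Flouride" is spelled as it appears in that tibble
subset_toy <- tibble(
  Location = c("Manchester", "Gorham", "Blue Hill", "Surry"),
  Arsenic  = c(58.9, 50.1, 42.7, 40.3),
  Flouride = c(3.3, 0, 9.6, 18.3)
)

# Reshape to one row per location and chemical
long <- subset_toy %>%
  pivot_longer(cols = c(Arsenic, Flouride),
               names_to = "chemical",
               values_to = "percent_above_guideline")

# Dodged bars place the two chemicals next to each other for each town
p <- ggplot(long, aes(x = Location, y = percent_above_guideline, fill = chemical)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = "Location",
       y = "Percent of wells above guideline",
       fill = "Chemical")

p
```

Assigning the reshaped data to a name, rather than leaving the pipeline as a bare expression in the chunk, also keeps the tibble itself from being printed in the knitted output.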

Once you are done, knit, publish, and then submit your link to your published RPubs document in Brightspace.

Assignment 2

Evan Libby
