Instructions

Refer to the detailed instructions for this assignment in Brightspace.

Data Import

Don’t alter the three code chunks in this section. First we read in the two data sets and delete rows with missing values.

library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()

Next we display the first few rows of fluoride.

head(fluoride)
## # A tibble: 6 x 6
##   location  n_wells_tested percent_wells_above_gui… median percentile_95 maximum
##   <chr>              <dbl>                    <dbl>  <dbl>         <dbl>   <dbl>
## 1 Otis                  60                     30    1.13           3.2      3.6
## 2 Dedham               102                     22.5  0.94           3.27     7  
## 3 Denmark               46                     19.6  0.45           3.15     3.9
## 4 Surry                175                     18.3  0.8            3.52     6.9
## 5 Prospect              57                     17.5  0.785          2.5      2.7
## 6 Eastbrook             31                     16.1  1.29           2.44     3.3

Then we display the first few rows of arsenic.

head(arsenic)
## # A tibble: 6 x 6
##   location    n_wells_tested percent_wells_above_g… median percentile_95 maximum
##   <chr>                <dbl>                  <dbl>  <dbl>         <dbl>   <dbl>
## 1 Manchester             275                   58.9   14            93       200
## 2 Gorham                 467                   50.1   10.5         130       460
## 3 Columbia                42                   50      9.8          65.9     200
## 4 Monmouth               277                   49.5   10           110       368
## 5 Eliot                   73                   49.3    9.7          41.4      45
## 6 Columbia F…             25                   48      8.1          53.8      71

Join data

In the code chunk below, create a new tibble called chemicals that joins fluoride and arsenic. You probably want to do an inner join but the join type is up to you.

chemicals <- inner_join(arsenic, fluoride, by="location")

The next code chunk displays the head of your newly created chemicals tibble. Take a look to verify that your join looks ok.

head(chemicals)
## # A tibble: 6 x 11
##   location n_wells_tested.x percent_wells_a… median.x percentile_95.x maximum.x
##   <chr>               <dbl>            <dbl>    <dbl>           <dbl>     <dbl>
## 1 Manches…              275             58.9     14              93         200
## 2 Gorham                467             50.1     10.5           130         460
## 3 Columbia               42             50        9.8            65.9       200
## 4 Monmouth              277             49.5     10             110         368
## 5 Eliot                  73             49.3      9.7            41.4        45
## 6 Columbi…               25             48        8.1            53.8        71
## # … with 5 more variables: n_wells_tested.y <dbl>,
## #   percent_wells_above_guideline.y <dbl>, median.y <dbl>,
## #   percentile_95.y <dbl>, maximum.y <dbl>
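
The `.x` and `.y` suffixes in the output above come from columns that share a name in both tibbles (`.x` for arsenic, `.y` for fluoride, given the argument order). As a minimal sketch, assuming more descriptive names are wanted at join time, the `suffix` argument of `inner_join()` can label them directly. The `_toy` tibbles and the suffix strings are illustrative names, not part of the assignment; the values are copied from the tables in this document.

```r
library(dplyr)

# Toy rows copied from the tables in this document
# (percent of wells above the guideline, by town).
arsenic_toy <- tibble(
  location = c("Manchester", "Surry", "Otis"),
  percent_wells_above_guideline = c(58.9, 40.3, 39.6)
)
fluoride_toy <- tibble(
  location = c("Otis", "Denmark", "Surry"),
  percent_wells_above_guideline = c(30, 19.6, 18.3)
)

# inner_join() keeps only towns present in both tibbles;
# suffix replaces the default ".x" / ".y" on clashing column names.
chemicals_toy <- inner_join(
  arsenic_toy, fluoride_toy,
  by = "location",
  suffix = c("_arsenic", "_fluoride")
)
chemicals_toy
```

Only Surry and Otis appear in both toy tibbles, so the joined result has two rows. Manchester and Denmark are dropped, which is the behavior that distinguishes an inner join from a left or full join.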

Interesting subset

In the code chunk below, create an interesting subset of the data. You’ll likely find an interesting subset by filtering for locations that have high or low levels of arsenic, fluoride, or both.

percent_wells_abv_avg <- chemicals %>%
  # .x columns come from arsenic, .y columns from fluoride
  rename(
    percent_wells_above_arsenic_guideline = percent_wells_above_guideline.x,
    percent_wells_above_fluoride_guideline = percent_wells_above_guideline.y
  ) %>%
  # keep towns above the mean on both measures
  filter(
    percent_wells_above_arsenic_guideline > mean(percent_wells_above_arsenic_guideline) &
      percent_wells_above_fluoride_guideline > mean(percent_wells_above_fluoride_guideline)
  ) %>%
  select(location, percent_wells_above_fluoride_guideline, percent_wells_above_arsenic_guideline)
arrange(percent_wells_abv_avg, desc(percent_wells_above_fluoride_guideline))
## # A tibble: 40 x 3
##    location       percent_wells_above_fluoride_g… percent_wells_above_arsenic_g…
##    <chr>                                    <dbl>                          <dbl>
##  1 Otis                                      30                             39.6
##  2 Dedham                                    22.5                           17.5
##  3 Surry                                     18.3                           40.3
##  4 Mercer                                    15.6                           36.4
##  5 Stockton Spri…                            14.3                           15.9
##  6 Clifton                                   14                             19.4
##  7 Starks                                    13.6                           28.6
##  8 Sedgwick                                  11.2                           37.3
##  9 Franklin                                  10.3                           17.6
## 10 Smithfield                                10.1                           14.6
## # … with 30 more rows

Looking at the data, I recalled that our house has special water filters because of the arsenic levels in our well water (I live right next to Buxton). I was curious whether our area also has high fluoride levels, so I chose to investigate which towns have high levels of both arsenic and fluoride in private well water. I first considered sorting towns in descending order by the maximum levels of arsenic (ug/L) and fluoride (mg/L). However, I realized the maximum variable might be unreliable because it included towns with fewer than 20 wells tested. I decided that the percent of wells above the guideline was a better indication of how prevalent arsenic and fluoride are in well water across Maine towns.

I calculated the mean of “percent of wells above the guideline” for both arsenic and fluoride, then filtered the chemicals tibble to keep only the towns above the mean on both measures. I arranged the result in descending order of the fluoride percentage to see the towns with the highest percentages of wells above the guidelines. While Buxton has a high percentage of wells above the arsenic guideline (43.4), it is not in this subset, which means its percent of wells above the fluoride guideline was not above the mean. The towns with high percentages for both fluoride and arsenic are mostly south of Bangor toward the coast (Stockton Springs, Surry, Otis, Clifton, Dedham).

Display the first few rows of your interesting subset in the code chunk below.

top_n(percent_wells_abv_avg, 6, percent_wells_above_fluoride_guideline)
## # A tibble: 6 x 3
##   location       percent_wells_above_fluoride_gu… percent_wells_above_arsenic_g…
##   <chr>                                     <dbl>                          <dbl>
## 1 Surry                                      18.3                           40.3
## 2 Otis                                       30                             39.6
## 3 Mercer                                     15.6                           36.4
## 4 Clifton                                    14                             19.4
## 5 Dedham                                     22.5                           17.5
## 6 Stockton Spri…                             14.3                           15.9
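
The rows above are not sorted by the fluoride percentage, because `top_n()` keeps the selected rows in their original order. In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`, which selects the largest values and returns them in descending order. The following is a minimal sketch of that alternative, not the chunk used in this document. `subset_toy` is an illustrative name, and its rows are copied from the subset tables above.

```r
library(dplyr)

# A few rows copied from the subset shown above.
subset_toy <- tibble(
  location = c("Surry", "Otis", "Mercer", "Sedgwick", "Starks", "Clifton", "Dedham"),
  percent_wells_above_fluoride_guideline = c(18.3, 30, 15.6, 11.2, 13.6, 14, 22.5),
  percent_wells_above_arsenic_guideline = c(40.3, 39.6, 36.4, 37.3, 28.6, 19.4, 17.5)
)

# Keep the three towns with the highest fluoride percentage.
top_fluoride <- slice_max(
  subset_toy,
  order_by = percent_wells_above_fluoride_guideline,
  n = 3
)
top_fluoride
```

With these toy rows the result contains Otis, Dedham, and Surry, in that order.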

Visualize your subset

In the code chunk below, create a ggplot visualization of your subset that is fairly simple for a viewer to comprehend.

## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
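
The plotting code itself is not shown in this rendered output. Only the messages from attaching `scales` survive. As a rough sketch of what a simple visualization of such a subset could look like, and not a reconstruction of the original chunk, the example below plots the fluoride percentage against the arsenic percentage for a handful of towns. The data frame `plot_data`, its short column names, and the axis labels are assumptions made for illustration; the values are copied from the subset tables above.

```r
library(ggplot2)

# Illustrative data: five towns copied from the subset shown earlier.
plot_data <- data.frame(
  location = c("Surry", "Otis", "Mercer", "Clifton", "Dedham"),
  fluoride = c(18.3, 30, 15.6, 14, 22.5),   # % of wells above fluoride guideline
  arsenic  = c(40.3, 39.6, 36.4, 19.4, 17.5) # % of wells above arsenic guideline
)

# One labeled point per town: arsenic on x, fluoride on y.
p <- ggplot(plot_data, aes(x = arsenic, y = fluoride, label = location)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  labs(
    title = "Wells above guideline, by town",
    x = "Wells above arsenic guideline (%)",
    y = "Wells above fluoride guideline (%)"
  )
p
```

A scatter plot keeps both measures visible at once, so towns that are high on both appear toward the upper right. A bar chart of a single measure would be an equally reasonable choice if only one chemical is of interest.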

Once you are done, knit, publish, and then submit your link to your published RPubs document in Brightspace.