Assignment 2

Instructions

Refer to the detailed instructions for this assignment in Brightspace.

Data Import

Don’t alter the three code chunks in this section. First we read in the two data sets and drop the rows with missing values.

library(tidyverse)
fluoride <- read_csv("http://jamessuleiman.com/teaching/datasets/fluoride.csv")
fluoride <- fluoride %>% drop_na()
arsenic <- read_csv("http://jamessuleiman.com/teaching/datasets/arsenic.csv")
arsenic <- arsenic %>% drop_na()

Next we display the first few rows of fluoride.

head(fluoride)

## # A tibble: 6 x 6
##   location  n_wells_tested percent_wells_above_gui… median percentile_95 maximum
##   <chr>              <dbl>                    <dbl>  <dbl>         <dbl>   <dbl>
## 1 Otis                  60                     30    1.13           3.2      3.6
## 2 Dedham               102                     22.5  0.94           3.27     7  
## 3 Denmark               46                     19.6  0.45           3.15     3.9
## 4 Surry                175                     18.3  0.8            3.52     6.9
## 5 Prospect              57                     17.5  0.785          2.5      2.7
## 6 Eastbrook             31                     16.1  1.29           2.44     3.3

Then we display the first few rows of arsenic.

head(arsenic)

## # A tibble: 6 x 6
##   location    n_wells_tested percent_wells_above_g… median percentile_95 maximum
##   <chr>                <dbl>                  <dbl>  <dbl>         <dbl>   <dbl>
## 1 Manchester             275                   58.9   14            93       200
## 2 Gorham                 467                   50.1   10.5         130       460
## 3 Columbia                42                   50      9.8          65.9     200
## 4 Monmouth               277                   49.5   10           110       368
## 5 Eliot                   73                   49.3    9.7          41.4      45
## 6 Columbia F…             25                   48      8.1          53.8      71

Join data

In the code chunk below, create a new tibble called chemicals that joins fluoride and arsenic. You probably want to do an inner join but the join type is up to you.

chemicals <- arsenic %>%
  inner_join(fluoride, by = "location")

The next code chunk displays the head of your newly created chemicals tibble. Take a look to verify that your join looks ok.

head(chemicals)

## # A tibble: 6 x 11
##   location n_wells_tested.x percent_wells_a… median.x percentile_95.x maximum.x
##   <chr>               <dbl>            <dbl>    <dbl>           <dbl>     <dbl>
## 1 Manches…              275             58.9     14              93         200
## 2 Gorham                467             50.1     10.5           130         460
## 3 Columbia               42             50        9.8            65.9       200
## 4 Monmouth              277             49.5     10             110         368
## 5 Eliot                  73             49.3      9.7            41.4        45
## 6 Columbi…               25             48        8.1            53.8        71
## # … with 5 more variables: n_wells_tested.y <dbl>,
## #   percent_wells_above_guideline.y <dbl>, median.y <dbl>,
## #   percentile_95.y <dbl>, maximum.y <dbl>
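
The `.x` and `.y` suffixes in the output above appear because both tables share every column name except the join key; `.x` marks the left table (arsenic) and `.y` the right table (fluoride). As a minimal sketch, assuming more descriptive names are wanted, `inner_join()` accepts a `suffix` argument. The toy tibbles and their values below are hypothetical stand-ins, not the assignment data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables standing in for arsenic and fluoride
arsenic_toy <- tibble(location = c("A", "B", "C"), median = c(14, 10.5, 9.8))
fluoride_toy <- tibble(location = c("A", "C", "D"), median = c(1.1, 0.4, 0.8))

# The inner join keeps only locations present in both tables;
# suffix replaces the default ".x"/".y" on the clashing column names
joined_toy <- arsenic_toy %>%
  inner_join(fluoride_toy, by = "location", suffix = c("_arsenic", "_fluoride"))

names(joined_toy)
```

Using this in the assignment itself would mean updating the `.x` and `.y` column names in the later chunks to match.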

Interesting subset

In the code chunk below, create an interesting subset of the data. You’ll likely find an interesting subset by filtering for locations that have high or low levels of arsenic, fluoride, or both.

# Flag towns with > 30% of wells above the arsenic MEG (.x) and > 3% above the fluoride MEG (.y)
chemicals %>%
  mutate(double_exposure = percent_wells_above_guideline.x > 30 &
           percent_wells_above_guideline.y > 3) %>%
  select(location, percent_wells_above_guideline.x,
         percent_wells_above_guideline.y, double_exposure)

## # A tibble: 341 x 4
##    location     percent_wells_above_gui… percent_wells_above_gu… double_exposure
##    <chr>                           <dbl>                   <dbl> <lgl>          
##  1 Manchester                       58.9                     3.3 TRUE           
##  2 Gorham                           50.1                     0   FALSE          
##  3 Columbia                         50                       1.9 FALSE          
##  4 Monmouth                         49.5                     3.1 TRUE           
##  5 Eliot                            49.3                     0   FALSE          
##  6 Columbia Fa…                     48                       0   FALSE          
##  7 Winthrop                         44.8                     3.1 TRUE           
##  8 Hallowell                        44.6                     0   FALSE          
##  9 Buxton                           43.4                     1   FALSE          
## 10 Blue Hill                        42.7                     9.6 TRUE           
## # … with 331 more rows

This dataset documents the percentage of wells that tested above the maximum exposure guideline (MEG) for arsenic and fluoride in Maine towns. I chose to look at towns where a high percentage of wells exceeded the MEG for arsenic (more than 30% of wells) and, at the same time, a comparatively high percentage exceeded the MEG for fluoride (more than 3% of wells). I chose these locations because, while exposure to either contaminant above the suggested guidelines is well documented and studied, the impacts of co-exposure are studied less often. Areas with higher percentages of contaminated wells for both arsenic and fluoride should receive monitoring that considers the possibility of co-exposure and its potential health impacts. Fourteen locations met the chosen criteria. I filtered the data frame by the double-exposure criterion while keeping the percentage of wells above the guideline for both contaminants. Sorting the data by the percentage of wells above the arsenic MEG lets us pinpoint the areas where further research can be done.
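
A count of locations meeting both criteria can be checked by summing the logical flag, since `TRUE` counts as 1. The following is a minimal sketch on hypothetical toy values; the column names `pct_arsenic` and `pct_fluoride` are illustrative stand-ins for `percent_wells_above_guideline.x` and `percent_wells_above_guideline.y`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, not the assignment data
toy <- tibble(
  location     = c("A", "B", "C", "D"),
  pct_arsenic  = c(55, 50, 28, 42),
  pct_fluoride = c(3.5, 0, 5, 9)
)

# Same thresholds as the write-up: > 30% for arsenic and > 3% for fluoride
flagged <- toy %>%
  mutate(double_exposure = pct_arsenic > 30 & pct_fluoride > 3)

# Summing a logical vector counts the TRUE values
n_flagged <- sum(flagged$double_exposure)
n_flagged
```

Applied to `chemicals`, the same `sum()` over the `double_exposure` column gives the number of towns that meet both thresholds.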

Display the first few rows of your interesting subset in the code chunk below.

# Keep only the flagged towns, then show the five with the highest arsenic percentages
chemicals %>%
  mutate(double_exposure = percent_wells_above_guideline.x > 30 &
           percent_wells_above_guideline.y > 3) %>%
  select(location, percent_wells_above_guideline.x,
         percent_wells_above_guideline.y, double_exposure) %>%
  filter(double_exposure) %>%
  slice_max(percent_wells_above_guideline.x, n = 5)

## # A tibble: 5 x 4
##   location   percent_wells_above_guid… percent_wells_above_guid… double_exposure
##   <chr>                          <dbl>                     <dbl> <lgl>          
## 1 Manchester                      58.9                       3.3 TRUE           
## 2 Monmouth                        49.5                       3.1 TRUE           
## 3 Winthrop                        44.8                       3.1 TRUE           
## 4 Blue Hill                       42.7                       9.6 TRUE           
## 5 Hollis                          41.4                       3.5 TRUE

Visualize your subset

In the code chunk below, create a ggplot visualization of your subset that is fairly simple for a viewer to comprehend.

# Plot the percentage of wells above the arsenic MEG for each flagged town
chemicals %>%
  mutate(double_exposure = percent_wells_above_guideline.x > 30 &
           percent_wells_above_guideline.y > 3) %>%
  select(location, percent_wells_above_guideline.x,
         percent_wells_above_guideline.y, double_exposure) %>%
  filter(double_exposure) %>%
  ggplot(aes(location, percent_wells_above_guideline.x)) +
  geom_point() +
  ggtitle("Arsenic Detection in Wells in Towns with Higher Levels of Fluoride Detection") +
  xlab("Town") + ylab("Wells Tested Above MEG of Arsenic (%)") +
  theme(axis.text.x = element_text(angle = 40))
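
By default, ggplot2 orders a character x-axis alphabetically, so the towns do not appear in order of arsenic percentage. One possible refinement, sketched here on hypothetical toy data with an illustrative column name `pct_arsenic`, is to wrap the x variable in base R's `reorder()` so the towns run from highest to lowest.

```r
library(ggplot2)
library(tibble)

# Hypothetical toy data, not the assignment data
toy <- tibble(
  location    = c("A", "B", "C"),
  pct_arsenic = c(42, 55, 48)
)

# reorder() sorts the factor levels by the second argument;
# negating it puts the largest value first
p <- ggplot(toy, aes(reorder(location, -pct_arsenic), pct_arsenic)) +
  geom_point() +
  xlab("Town") +
  ylab("Wells Tested Above MEG of Arsenic (%)")

# Level order that the x-axis will follow
town_order <- levels(reorder(toy$location, -toy$pct_arsenic))
town_order
```

Whether a sorted axis is easier to read than an alphabetical one depends on the audience. Alphabetical order helps a viewer look up a particular town, while sorted order highlights the ranking.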