In this report I will be analyzing a subset of two data sets that contain flouride and arsenic levels, by town, for private well water samples tested by the State of Maine Health and Environmental Testing Laboratory (HETL) between the years 1999 and 2013. The files flouride.csv and arsenic.csv were downloaded from the Maine Tracking Network.

To begin, I read in the two csv files, arsenic.csv and flouride.csv, into the respective data frames, arsenic and flouride.

arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
flouride <- read.csv("flouride.csv", header = TRUE, stringsAsFactors = FALSE)

Since I needed to tidy and sort the data before it could be analyzed, I used two additional packages tidyr and dplyr.

library(tidyr)
library(dplyr)

Next, I used the head() command to look at the variable names of the two dataframes I just created.

arsenic
head(arsenic)
##         location n_wells_tested percent_wells_above_guideline median
## 1     Manchester            275                          58.9   14.0
## 2         Gorham            467                          50.1   10.5
## 3       Columbia             42                          50.0    9.8
## 4       Monmouth            277                          49.5   10.0
## 5          Eliot             73                          49.3    9.7
## 6 Columbia Falls             25                          48.0    8.1
##   percentile_95 maximum
## 1         93.00     200
## 2        130.00     460
## 3         65.90     200
## 4        110.00     368
## 5         41.35      45
## 6         53.75      71
flouride
head(flouride)
##    location n_wells_tested percent_wells_above_guideline median
## 1      Otis             60                          30.0  1.130
## 2    Dedham            102                          22.5  0.940
## 3   Denmark             46                          19.6  0.450
## 4     Surry            175                          18.3  0.800
## 5  Prospect             57                          17.5  0.785
## 6 Eastbrook             31                          16.1  1.290
##   percentile_95 maximum
## 1         3.200     3.6
## 2         3.270     7.0
## 3         3.150     3.9
## 4         3.525     6.9
## 5         2.500     2.7
## 6         2.445     3.3

For this analysis, I have decided to narrow my focus to the towns and cities with the 50 highest percentages of wells above Maine’s Maximum Exposure Guideline for arsenic (10 micrograms per liter) and flouride (2 milligrams per liter), respectively. Within that subset, I would like to know which towns and cities place in both categories. These towns and cities could then be recommended as the first target areas for arsenic and flouride rememediation.

For this reason, I renamed some of the variables in each dataset to make for easier analysis. The variables I am looking at are “location” and “percent_wells_above_quideline” but I need to be able to distinguish “percent_wells_above_guideline” for arsenic and flouride. I also wanted to rename “location” “city_town” to make the nature of the variable more specific. I do this using the names() command.

names(arsenic) <- c("town_city", "n_wells_tested", "arsenic_wells_above", "median", "percentile_95", "maximum")
names(arsenic)
## [1] "town_city"           "n_wells_tested"      "arsenic_wells_above"
## [4] "median"              "percentile_95"       "maximum"
names(flouride) <- c("town_city", "n_wells_tested", "flouride_wells_above", "median", "percentile_95", "maximum")
names(flouride)
## [1] "town_city"            "n_wells_tested"       "flouride_wells_above"
## [4] "median"               "percentile_95"        "maximum"

Next, I created two new dataframes that just contain the data I want to analyze and excluding “n_wells_test”, “median”, “percentile_95”, and “maximum”.

arsenic
arsenic_wells <- arsenic %>% select(town_city, arsenic_wells_above) %>% top_n(50)
## Selecting by arsenic_wells_above
arsenic_wells
##              town_city arsenic_wells_above
## 1           Manchester                58.9
## 2               Gorham                50.1
## 3             Columbia                50.0
## 4             Monmouth                49.5
## 5                Eliot                49.3
## 6       Columbia Falls                48.0
## 7             Winthrop                44.8
## 8            Hallowell                44.6
## 9               Buxton                43.4
## 10           Blue Hill                42.7
## 11          Litchfield                42.0
## 12              Hollis                41.4
## 13              Orland                40.7
## 14               Surry                40.3
## 15            Danforth                40.0
## 16          Mariaville                40.0
## 17           Readfield                39.8
## 18                Otis                39.6
## 19              Dayton                37.7
## 20            Sedgwick                37.3
## 21              Mercer                36.4
## 22         Scarborough                35.2
## 23                Saco                34.4
## 24              Camden                34.0
## 25             Trenton                33.7
## 26               Anson                33.3
## 27               Wales                33.3
## 28            Rangeley                33.1
## 29             Oakland                33.0
## 30 Carrabassett Valley                32.5
## 31               Minot                32.5
## 32           Kingfield                32.1
## 33            Belgrade                31.2
## 34         Millinocket                31.0
## 35            Rockport                31.0
## 36              Greene                30.8
## 37           Ellsworth                29.7
## 38           Fairfield                29.7
## 39           Owls Head                29.7
## 40        Mount Vernon                29.5
## 41              Starks                28.6
## 42           Biddeford                28.3
## 43        Island Falls                27.6
## 44              Auburn                27.2
## 45             Arundel                27.1
## 46           Penobscot                27.1
## 47            Standish                26.9
## 48              Sidney                26.8
## 49                Rome                26.6
## 50             Augusta                26.4
flouride
flouride_wells <- flouride %>% select(town_city, flouride_wells_above) %>% top_n(50)
## Selecting by flouride_wells_above
flouride_wells
##           town_city flouride_wells_above
## 1              Otis                 30.0
## 2            Dedham                 22.5
## 3           Denmark                 19.6
## 4             Surry                 18.3
## 5          Prospect                 17.5
## 6         Eastbrook                 16.1
## 7            Mercer                 15.6
## 8          Fryeburg                 15.4
## 9        Brownfield                 15.2
## 10 Stockton Springs                 14.3
## 11          Clifton                 14.0
## 12           Starks                 13.6
## 13       Marshfield                 12.9
## 14        Kennebunk                 12.7
## 15        Charlotte                 12.5
## 16             York                 12.4
## 17     Chesterville                 12.3
## 18         Stoneham                 12.0
## 19         Sedgwick                 11.2
## 20   Mechanic Falls                 11.1
## 21     Swans Island                 10.5
## 22         Franklin                 10.3
## 23       Smithfield                 10.1
## 24        Biddeford                  9.7
## 25        Otisfield                  9.7
## 26        Blue Hill                  9.6
## 27          Arundel                  9.5
## 28        Ellsworth                  9.3
## 29            Hiram                  8.9
## 30     Norridgewock                  8.9
## 31           Orland                  8.6
## 32          Jackman                  8.3
## 33 Southwest Harbor                  8.1
## 34        Greenwood                  8.0
## 35         Hartland                  8.0
## 36       New Sweden                  8.0
## 37       Farmington                  7.9
## 38            Casco                  7.6
## 39       Mariaville                  7.5
## 40       Gouldsboro                  7.3
## 41           Lovell                  7.0
## 42          Cornish                  6.9
## 43      Millinocket                  6.9
## 44         Bridgton                  6.6
## 45        Penobscot                  5.9
## 46           Sumner                  5.9
## 47          Machias                  5.8
## 48         Belgrade                  5.5
## 49     Mount Desert                  5.5
## 50        Jefferson                  5.4

Finally, I joined the two subset dataframes to generate the list of towns and cities that have the top 50 highest percentage of wells with levels above guideline for arsenic AND flouride. I use the inner_join() command to do this.

arsenic_flouride_wells <- arsenic_wells %>% inner_join(flouride_wells)
## Joining, by = "town_city"
arsenic_flouride_wells
##      town_city arsenic_wells_above flouride_wells_above
## 1    Blue Hill                42.7                  9.6
## 2       Orland                40.7                  8.6
## 3        Surry                40.3                 18.3
## 4   Mariaville                40.0                  7.5
## 5         Otis                39.6                 30.0
## 6     Sedgwick                37.3                 11.2
## 7       Mercer                36.4                 15.6
## 8     Belgrade                31.2                  5.5
## 9  Millinocket                31.0                  6.9
## 10   Ellsworth                29.7                  9.3
## 11      Starks                28.6                 13.6
## 12   Biddeford                28.3                  9.7
## 13     Arundel                27.1                  9.5
## 14   Penobscot                27.1                  5.9