Introduction

Between 1999 and 2013, the Maine Tracking Network collected fluoride and arsenic levels from wells across the state of Maine. For locations with fewer than 20 wells tested, only limited data are given. For all other locations we are given…

location; number of wells tested; percentage of wells tested above the maximum exposure guideline; median; 95th percentile (in mg/L or ug/L); maximum

Each of these fields tells us something different and helps us analyze the data more fully. Each of the two data sets contains 917 observations. To make the data more concise, I decided to focus on the locations in the 95th percentile for arsenic. A table is included that shows the locations in the 95th percentile for fluoride and arsenic.

Preparations

## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.5'
## (as 'lib' is unspecified)
## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.5'
## (as 'lib' is unspecified)

Data

The data used can be found at…

fluoride <- read.csv(url("http://jamessuleiman.com/teaching/datasets/fluoride.csv"),
                    stringsAsFactors = FALSE)
arsenic <- read.csv(url("http://jamessuleiman.com/teaching/datasets/arsenic.csv"),
                    stringsAsFactors = FALSE)
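
A quick check after loading can confirm the shape of each data set and make the percentage column numeric. The sketch below uses a small stand-in data frame rather than the real download. It assumes that locations with fewer than 20 wells carry a non-numeric placeholder in `percent_wells_above_guideline`; that placeholder, the `Smalltown` row, and the `percent_num` column are assumptions for illustration, not something confirmed by the data.

```r
# Stand-in for the downloaded arsenic data. The first two rows use values
# from this report; "Smalltown" and its "-" placeholder are hypothetical.
arsenic_demo <- data.frame(
  location = c("Columbia", "Eliot", "Smalltown"),
  n_wells_tested = c(42, 73, 12),
  percent_wells_above_guideline = c("50", "49.3", "-"),
  stringsAsFactors = FALSE
)

# Check the size and column types before doing anything else.
dim(arsenic_demo)
str(arsenic_demo)

# Convert the percentage column to numeric; anything non-numeric becomes NA.
arsenic_demo$percent_num <- suppressWarnings(
  as.numeric(arsenic_demo$percent_wells_above_guideline)
)
```

Running `dim()` on the real `arsenic` and `fluoride` data frames would be one way to confirm the 917 observations mentioned in the introduction.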

Tables

##     location n_wells_tested percent_wells_above_guideline
## 1   Columbia             42                            50
## 2      Eliot             73                          49.3
## 3     Gorham            467                          50.1
## 4 Manchester            275                          58.9
## 5   Monmouth            277                          49.5

This first table shows the number of wells tested for arsenic and the percentage of those that were above the guideline.
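
One way a subset like this might be produced is sketched below. It is not the code used for this report: `top_share()` is a hypothetical helper, the toy values are invented, and the sketch assumes the 95th percentile cutoff is applied to a percentage column that is already numeric.

```r
# Hypothetical helper: keep rows whose value in `col` is at or above the
# p-th quantile of that column (missing values are ignored).
top_share <- function(df, col, p = 0.95) {
  cutoff <- quantile(df[[col]], probs = p, na.rm = TRUE)
  df[!is.na(df[[col]]) & df[[col]] >= cutoff, , drop = FALSE]
}

# Toy data, invented for illustration only.
toy <- data.frame(
  location = paste("Town", 1:20),
  percent_wells_above_guideline = c(1:19, NA),
  stringsAsFactors = FALSE
)

top_locations <- top_share(toy, "percent_wells_above_guideline")
top_locations
```

The same helper could be pointed at a different column, such as the 95th percentile concentration, if that is the measure of interest.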

##     location n_wells_tested percent_wells_above_guideline
## 1   Columbia             54                           1.9
## 2      Eliot             84                             0
## 3     Gorham            452                             0
## 4 Manchester            276                           3.3
## 5   Monmouth            288                           3.1

Much like the first table, table two takes the same locations and shows the number of wells tested for fluoride and the percentage of those that were above the guideline. Both tables are easy to read, but going forward they will look a bit nicer!

##     location n_wells_tested.x percent_wells_above_guideline.x
## 1   Columbia               42                              50
## 2      Eliot               73                            49.3
## 3     Gorham              467                            50.1
## 4 Manchester              275                            58.9
## 5   Monmouth              277                            49.5
##   n_wells_tested.y percent_wells_above_guideline.y
## 1               54                             1.9
## 2               84                               0
## 3              452                               0
## 4              276                             3.3
## 5              288                             3.1

The merged data frame above shows the arsenic (x) and fluoride (y) results. You may need to scroll to see the fluoride results. The merged data frame is a bit complicated to read, so the table below makes it easier to view the number of wells tested and the percentage of wells above guidelines, and the relation between the two.
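
The `.x` and `.y` suffixes come from base R's `merge()`, which adds them when both inputs share a column name that is not used as the key. A minimal sketch of the same join with more descriptive suffixes follows. The values are copied from the two tables above; the names `arsenic_top`, `fluoride_top`, and `merged_demo` are placeholders, not the objects used in this report.

```r
# Values copied from the arsenic and fluoride tables above.
towns <- c("Columbia", "Eliot", "Gorham", "Manchester", "Monmouth")

arsenic_top <- data.frame(
  location = towns,
  n_wells_tested = c(42, 73, 467, 275, 277),
  percent_wells_above_guideline = c(50, 49.3, 50.1, 58.9, 49.5),
  stringsAsFactors = FALSE
)
fluoride_top <- data.frame(
  location = towns,
  n_wells_tested = c(54, 84, 452, 276, 288),
  percent_wells_above_guideline = c(1.9, 0, 0, 3.3, 3.1),
  stringsAsFactors = FALSE
)

# Join on location; `suffixes` replaces the default ".x" and ".y".
merged_demo <- merge(arsenic_top, fluoride_top, by = "location",
                     suffixes = c(".arsenic", ".fluoride"))
names(merged_demo)
```

Naming the suffixes at merge time means a reader does not have to remember which data set was passed first.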

library(knitr)

kable(merged.df[1:5, ], caption="Number of Wells Tested & Percentage Above Guidelines by Location")
Number of Wells Tested & Percentage Above Guidelines by Location

location    n_wells_tested.x  percent_wells_above_guideline.x  n_wells_tested.y  percent_wells_above_guideline.y
Columbia    42                50                               54                1.9
Eliot       73                49.3                             84                0
Gorham      467               50.1                             452               0
Manchester  275               58.9                             276               3.3
Monmouth    277               49.5                             288               3.1
  colnames <- c("location", "n_wells_tested.x", "percent_wells_above_guideline.x", "n_wells_tested.y", "percent_wells_above_guideline.y")
colnames <- c("Towns", "wells tested for arsenic", "percentage above guidelines","wells tested for fluoride", "percentage above guidelines")




library(knitr)
library(kableExtra)
kable(merged.df) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
location    n_wells_tested.x  percent_wells_above_guideline.x  n_wells_tested.y  percent_wells_above_guideline.y
Columbia    42                50                               54                1.9
Eliot       73                49.3                             84                0
Gorham      467               50.1                             452               0
Manchester  275               58.9                             276               3.3
Monmouth    277               49.5                             288               3.1
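
The two `colnames <- c(...)` assignments shown earlier create an ordinary variable named `colnames`; they do not touch `merged.df`, which is why the headers in the tables stay the same. A minimal sketch of two ways the headers could be changed follows. `merged_demo` is a two-row stand-in for `merged.df`, and the labels in `new_names` are illustrative choices, not the report's final wording.

```r
# Small stand-in for merged.df (first two rows from the table above).
merged_demo <- data.frame(
  location = c("Columbia", "Eliot"),
  n_wells_tested.x = c(42, 73),
  percent_wells_above_guideline.x = c(50, 49.3),
  n_wells_tested.y = c(54, 84),
  percent_wells_above_guideline.y = c(1.9, 0),
  stringsAsFactors = FALSE
)

# Illustrative labels; each one is distinct so the columns stay unambiguous.
new_names <- c("Town",
               "Wells tested for arsenic",
               "% above arsenic guideline",
               "Wells tested for fluoride",
               "% above fluoride guideline")

# Option 1: rename the columns of a copy of the data frame.
renamed <- merged_demo
colnames(renamed) <- new_names

# Option 2: keep the data frame as it is and relabel only the rendered table.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(merged_demo, col.names = new_names))
}
```

Option 2 keeps the short, code-friendly column names for later analysis while still giving readers friendlier headers.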

Plots

For this first plot, I looked at the locations in the 95th percentile for arsenic that tested a substantial number of wells, and compared the number of wells tested with the percentage of wells above guidelines. I was curious to see whether there was a connection between the two. In this case, Eliot and Columbia had the fewest wells tested, but their percentages of wells above guidelines were very similar to those of the other locations.
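
A comparison like this can be drawn with base R graphics. The sketch below uses the five arsenic rows from the tables above; the data frame name `arsenic_top`, the axis labels, the title, and the axis limits are illustrative assumptions rather than a reproduction of the original figure.

```r
# Arsenic values copied from the tables above.
arsenic_top <- data.frame(
  location = c("Columbia", "Eliot", "Gorham", "Manchester", "Monmouth"),
  n_wells_tested = c(42, 73, 467, 275, 277),
  percent_wells_above_guideline = c(50, 49.3, 50.1, 58.9, 49.5),
  stringsAsFactors = FALSE
)

# Scatter plot: wells tested on the x-axis, percent above guideline on the y-axis.
plot(arsenic_top$n_wells_tested,
     arsenic_top$percent_wells_above_guideline,
     xlab = "Wells tested for arsenic",
     ylab = "Percent of wells above guideline",
     main = "Arsenic: wells tested vs. percent above guideline",
     xlim = c(0, 500), ylim = c(45, 65), pch = 19)

# Label each point with its town name, placed just above the point.
text(arsenic_top$n_wells_tested,
     arsenic_top$percent_wells_above_guideline,
     labels = arsenic_top$location, pos = 3, cex = 0.8)
```

With only five points, the plot works as a visual check rather than as evidence of a relationship.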

Conclusion

What I found interesting about this assignment was the amount of data that was collected. At first it was an overwhelming amount of data, but once you look into it, it’s extremely interesting to see which locations were higher in arsenic or fluoride. I am currently living in California, where we aren’t even able to drink the tap water, and not many people want to. I miss Bangor, which had 95 wells tested for arsenic, with only 3.2% above guidelines. I found the data really interesting and would like to see how the levels compare to other states depending on climate, rainfall, etc.

This assignment was extremely challenging for me, but I feel that I learned a lot from it and have learned the value of the software. The biggest issue I had was getting started. On DataCamp, if I was stuck I could easily go back to the videos, which I did a few times for this assignment, and I could get help on the questions; I think I have been relying on that too much. I have been taking notes and watching the videos all semester, but when it was time to apply what I had learned, I drew a blank at first and was completely lost.

Once I got going, the major thing that tripped me up was creating the two different tables and making sure they did not contain an overwhelming amount of data. My final challenge is one that I am still trying to figure out: I tried to change the column names in my table multiple times, in a few different ways, but haven’t had any luck yet. I am going to keep trying! I found this assignment stressful and challenging, but it was one I really needed in order to fully understand what I have been learning. I also learned that I am going to start assignment number 2 a lot sooner.