Between 1999 and 2013, the Maine Tracking Network collected fluoride and arsenic levels from wells across the state of Maine. For locations with fewer than 20 wells tested, only limited data are given. For all other locations, the data include:
- location
- number of wells tested
- percentage of wells tested above the maximum exposure guideline
- median
- 95th percentile (in mg/L or µg/L)
- maximum
Each aspect of the data gives us different information and helps us build a better analysis. When you first start looking at the data, each of the two data sets has 917 observations. To make the data more concise, I decided to focus on the locations that were in the 95th percentile for arsenic. The tables below show those locations' results for both arsenic and fluoride.
The data used can be found at…
fluoride <- read.csv(url("http://jamessuleiman.com/teaching/datasets/fluoride.csv"),
stringsAsFactors = FALSE)
arsenic <- read.csv(url("http://jamessuleiman.com/teaching/datasets/arsenic.csv"),
stringsAsFactors = FALSE)
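One way to narrow each data set down to its highest-risk locations is to keep only the rows at or above the 95th percentile of `percent_wells_above_guideline`. The sketch below is an assumption about how that filter could be written, not necessarily the code that produced the tables that follow. The helper name `top_percentile()` is hypothetical, and it assumes that locations with limited data (fewer than 20 wells) have a non-numeric or missing percentage that should be dropped.

```r
# Hypothetical helper: keep rows whose value in `col` is at or above the
# p-th quantile of that column. Non-numeric entries (assumed to mark
# locations with limited data) become NA and are dropped.
top_percentile <- function(df, col, p = 0.95) {
  values <- suppressWarnings(as.numeric(df[[col]]))
  cutoff <- quantile(values, probs = p, na.rm = TRUE)
  keep <- !is.na(values) & values >= cutoff
  df[keep, c("location", "n_wells_tested", col)]
}

# Toy example with made-up values, for illustration only
demo <- data.frame(
  location = c("A", "B", "C", "D"),
  n_wells_tested = c(10, 42, 73, 275),
  percent_wells_above_guideline = c("*", "12", "30", "58.9"),
  stringsAsFactors = FALSE
)
top <- top_percentile(demo, "percent_wells_above_guideline")
print(top)
```

On the real data this would be called as `top_percentile(arsenic, "percent_wells_above_guideline")`, assuming the CSV uses the same column names that appear in the printed tables.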
## location n_wells_tested percent_wells_above_guideline
## 1 Columbia 42 50
## 2 Eliot 73 49.3
## 3 Gorham 467 50.1
## 4 Manchester 275 58.9
## 5 Monmouth 277 49.5
This first table shows the number of wells tested for arsenic in each location and the percentage of those wells that were above the guideline.
## location n_wells_tested percent_wells_above_guideline
## 1 Columbia 54 1.9
## 2 Eliot 84 0
## 3 Gorham 452 0
## 4 Manchester 276 3.3
## 5 Monmouth 288 3.1
Table two is very similar to the first: it takes the same locations and shows the number of wells tested for fluoride and the percentage of those wells that were above the guideline. Both tables are easy to read, but going forward they will look a bit nicer!
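The two tables can be combined by town with base R's `merge()`. This is a minimal sketch using the five rows printed above; the object names `arsenic_top` and `fluoride_top` are hypothetical. Because both inputs share the column names `n_wells_tested` and `percent_wells_above_guideline`, `merge()` appends its default suffixes `.x` (first argument, arsenic) and `.y` (second argument, fluoride), which is where the column names in the merged output come from.

```r
# Arsenic and fluoride rows copied from the two tables above
arsenic_top <- data.frame(
  location = c("Columbia", "Eliot", "Gorham", "Manchester", "Monmouth"),
  n_wells_tested = c(42, 73, 467, 275, 277),
  percent_wells_above_guideline = c(50, 49.3, 50.1, 58.9, 49.5),
  stringsAsFactors = FALSE
)
fluoride_top <- data.frame(
  location = c("Columbia", "Eliot", "Gorham", "Manchester", "Monmouth"),
  n_wells_tested = c(54, 84, 452, 276, 288),
  percent_wells_above_guideline = c(1.9, 0, 0, 3.3, 3.1),
  stringsAsFactors = FALSE
)

# Join on the shared "location" column; overlapping column names get
# the default suffixes ".x" (arsenic) and ".y" (fluoride)
merged.df <- merge(arsenic_top, fluoride_top, by = "location")
print(merged.df)
```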
## location n_wells_tested.x percent_wells_above_guideline.x
## 1 Columbia 42 50
## 2 Eliot 73 49.3
## 3 Gorham 467 50.1
## 4 Manchester 275 58.9
## 5 Monmouth 277 49.5
## n_wells_tested.y percent_wells_above_guideline.y
## 1 54 1.9
## 2 84 0
## 3 452 0
## 4 276 3.3
## 5 288 3.1
The merged data frame above shows the arsenic (.x) and fluoride (.y) results; because the output is wide, the fluoride columns wrap onto a second block. This merged data frame is a bit complicated to read, so the table below makes it easier to view the number of wells tested and the percentage of wells above guidelines, and how the two relate.
library(knitr)
kable(merged.df[1:5, ], caption="Number of Wells Tested & Percentage Above Guidelines by Location")
| location | n_wells_tested.x | percent_wells_above_guideline.x | n_wells_tested.y | percent_wells_above_guideline.y |
|---|---|---|---|---|
| Columbia | 42 | 50 | 54 | 1.9 |
| Eliot | 73 | 49.3 | 84 | 0 |
| Gorham | 467 | 50.1 | 452 | 0 |
| Manchester | 275 | 58.9 | 276 | 3.3 |
| Monmouth | 277 | 49.5 | 288 | 3.1 |
colnames <- c("location", "n_wells_tested.x", "percent_wells_above_guideline.x", "n_wells_tested.y", "percent_wells_above_guideline.y")
colnames <- c("Towns", "wells tested for arsenic", "percentage above guidelines","wells tested for fluoride", "percentage above guidelines")
library(knitr)
library(kableExtra)
kable(merged.df) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| location | n_wells_tested.x | percent_wells_above_guideline.x | n_wells_tested.y | percent_wells_above_guideline.y |
|---|---|---|---|---|
| Columbia | 42 | 50 | 54 | 1.9 |
| Eliot | 73 | 49.3 | 84 | 0 |
| Gorham | 467 | 50.1 | 452 | 0 |
| Manchester | 275 | 58.9 | 276 | 3.3 |
| Monmouth | 277 | 49.5 | 288 | 3.1 |
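The two `colnames <- c(...)` lines above create a standalone character vector named `colnames` rather than changing the data frame, which is why the table still shows the original headers. A sketch of one fix, assuming `merged.df` has the five columns shown: use the replacement form `colnames(merged.df) <- ...`. Alternatively, the labels can be passed to `kable()` through its `col.names` argument, which leaves the data frame itself unchanged. The labels below are illustrative, and they are made distinct so the arsenic and fluoride percentages can be told apart.

```r
# Small stand-in for merged.df with the same five column names
merged.df <- data.frame(
  location = c("Columbia", "Eliot"),
  n_wells_tested.x = c(42, 73),
  percent_wells_above_guideline.x = c(50, 49.3),
  n_wells_tested.y = c(54, 84),
  percent_wells_above_guideline.y = c(1.9, 0),
  stringsAsFactors = FALSE
)

new_names <- c("Town",
               "Wells tested for arsenic",
               "% above arsenic guideline",
               "Wells tested for fluoride",
               "% above fluoride guideline")

# `colnames(x) <- value` renames the columns of x; a plain
# `colnames <- value` only creates a new variable called colnames
colnames(merged.df) <- new_names
print(merged.df)
```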
For this first plot, I looked at the arsenic results for the locations in the 95th percentile, comparing the number of wells tested in each location with the percentage of its wells above the guideline. I was curious whether there was a connection between the number of wells tested and the percentage above the guideline. In this case, Eliot and Columbia had the fewest wells tested, but their percentages of wells above the guideline were very similar to those of the other locations.
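To put a number on that question, one option is a scatter plot plus a correlation coefficient. This is a sketch in base R using the five arsenic rows from the tables above; the variable names are illustrative, and this is not necessarily how the original plot was made. With only five towns, any correlation is a rough description rather than evidence of a relationship.

```r
# Arsenic results for the five towns in the tables above
towns <- c("Columbia", "Eliot", "Gorham", "Manchester", "Monmouth")
n_wells <- c(42, 73, 467, 275, 277)
pct_above <- c(50, 49.3, 50.1, 58.9, 49.5)

# Scatter plot of wells tested vs. percent above guideline, labelled by town
plot(n_wells, pct_above,
     xlab = "Wells tested for arsenic",
     ylab = "% of wells above guideline",
     pch = 19)
text(n_wells, pct_above, labels = towns, pos = 3, cex = 0.8)

# Pearson correlation between the two measures
r <- cor(n_wells, pct_above)
print(r)
```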
What I found interesting about this assignment was the amount of data involved. When I was just starting out, it was overwhelming, but once you look at it, it is extremely interesting to see which locations were higher in arsenic or fluoride. I am currently living in California, where we aren't able to drink the tap water (and not many people would want to), so I am missing Bangor, which had 95 wells tested for arsenic with only 3.2% above guidelines. I would be interested to see how these levels compare to other states with different climates, rainfall, and so on.

This assignment was extremely challenging for me, but I learned a lot from it and came to see the value in the software. The biggest issue I had was getting started. On DataCamp, if I was stuck, I could easily go back to the videos (which I did a few times for this assignment) and get help on the questions, and I think I have been relying on that too much. I have been taking notes and watching the videos all semester, but when it was time to apply what I had learned, I drew a blank at first and was completely lost. Once I got going, the main thing that tripped me up was creating the two different tables and making sure they didn't contain an overwhelming amount of data.

My final challenge is one I am still trying to figure out: I tried to change the column names in my table several times, in a few different ways, but haven't had any luck yet. I am going to keep trying! I found this assignment stressful and challenging, but it was one I really needed in order to fully understand what I have been learning. I also learned that I am going to start assignment number 2 a lot sooner.