Analysis
I looked through the dataset for missing values, and it seems that every row with n_wells_tested below 20 has missing values in percent_wells_above_guideline, median and percentile_95.
I will subset each dataset and keep only the variables used in this analysis: location, n_wells_tested, percent_wells_above_guideline and median. I will then filter the data to keep only the locations where n_wells_tested is above 20, which eliminates the missing values. Next, I will rename the variables so that they identify the dataset they come from, adding a fluoride (flr) or arsenic (ars) suffix to the variable names. I will check the data again to make sure no NA values remain and omit any that do. Finally, I will merge the two datasets into a single dataset for this analysis.
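The preparation steps above could be sketched in base R roughly as follows. This is a minimal sketch, not the code used for the analysis: the `prepare` helper, the `SampleTown` row, and all `n_wells_tested` and `percentile_95` values are made up for illustration. Only the Manchester and Monmouth medians and percentages come from the tables in this write-up.

```r
# Hypothetical stand-ins for the two raw tables.
# n_wells_tested, percentile_95 and the SampleTown row are made up.
fluoride <- data.frame(
  location = c("Manchester", "Monmouth", "SampleTown"),
  n_wells_tested = c(90, 120, 12),
  percent_wells_above_guideline = c(3.3, 3.1, NA),
  median = c(0.30, 0.30, NA),
  percentile_95 = c(1.5, 1.4, NA),
  stringsAsFactors = FALSE
)
arsenic <- data.frame(
  location = c("Manchester", "Monmouth", "SampleTown"),
  n_wells_tested = c(95, 110, 12),
  percent_wells_above_guideline = c(58.9, 49.5, NA),
  median = c(14.0, 10.0, NA),
  percentile_95 = c(60, 45, NA),
  stringsAsFactors = FALSE
)

# Subset, filter, drop NAs and rename in one helper (hypothetical name).
prepare <- function(df, suffix) {
  keep <- c("location", "n_wells_tested",
            "percent_wells_above_guideline", "median")
  out <- df[df$n_wells_tested > 20, keep]  # keep towns with enough wells
  out <- na.omit(out)                      # drop any remaining NA rows
  # Rename every column except the key so it names its dataset.
  names(out)[-1] <- c(paste0("n_wells_tested_", suffix),
                      paste0("percent_wells_above_", suffix),
                      paste0("median_", suffix))
  out
}

flr <- prepare(fluoride, "flr")
ars <- prepare(arsenic, "ars")

# One row per location, with fluoride and arsenic columns side by side.
wells <- merge(flr, ars, by = "location")
print(wells)
```

The same steps map directly onto dplyr verbs (`select`, `filter`, `rename`, `inner_join`) if that package is preferred.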
I will create two separate datasets, one for the medians and one for the percentage of wells above the guideline, each covering both fluoride and arsenic. I will display only the first 5 observations of each dataset.
| location | median_flr | median_ars |
|---|---|---|
| Manchester | 0.30 | 14.0 |
| Monmouth | 0.30 | 10.0 |
| Columbia | 0.31 | 9.8 |
| Eliot | 0.20 | 9.7 |
| Gorham | 0.10 | 10.5 |

| location | percent_wells_above_flr | percent_wells_above_ars |
|---|---|---|
| Manchester | 3.3 | 58.9 |
| Monmouth | 3.1 | 49.5 |
| Columbia | 1.9 | 50.0 |
| Eliot | 0.0 | 49.3 |
| Gorham | 0.0 | 50.1 |
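The two tables above could be produced from the merged data along these lines. This is a sketch in base R: the data frame name `wells` is an assumption, and the values are just the five rows shown in the tables, not the full dataset.

```r
# Values copied from the two tables above (first five rows only).
wells <- data.frame(
  location = c("Manchester", "Monmouth", "Columbia", "Eliot", "Gorham"),
  median_flr = c(0.30, 0.30, 0.31, 0.20, 0.10),
  median_ars = c(14.0, 10.0, 9.8, 9.7, 10.5),
  percent_wells_above_flr = c(3.3, 3.1, 1.9, 0.0, 0.0),
  percent_wells_above_ars = c(58.9, 49.5, 50.0, 49.3, 50.1),
  stringsAsFactors = FALSE
)

# One dataset for the medians, one for the percentages above the guideline.
median_df  <- wells[, c("location", "median_flr", "median_ars")]
percent_df <- wells[, c("location", "percent_wells_above_flr",
                        "percent_wells_above_ars")]

# Show only the first 5 observations of each.
print(head(median_df, 5))
print(head(percent_df, 5))
```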
I will create side-by-side histograms of the medians to see how the levels of contamination are distributed. For arsenic, the red line marks the national water quality standard of 10 µg/L. The histograms show that most of the data are right-skewed: the values cluster at the low end of the histogram, with a long tail extending to the right.
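A side-by-side histogram with a reference line can be sketched in base R graphics as below. The vectors hold only the five medians listed in the table earlier, so the shape will not match the full-data plot; the variable names are assumptions.

```r
# Medians for the five towns listed in the table (not the full dataset).
median_flr <- c(0.30, 0.30, 0.31, 0.20, 0.10)
median_ars <- c(14.0, 10.0, 9.8, 9.7, 10.5)
ars_standard <- 10  # arsenic standard quoted in the text, in ug/L

# Two panels next to each other; remember the old settings to restore them.
op <- par(mfrow = c(1, 2))
hist(median_flr, main = "Fluoride", xlab = "Median fluoride", col = "grey80")
hist(median_ars, main = "Arsenic", xlab = "Median arsenic (ug/L)", col = "grey80")
abline(v = ars_standard, col = "red", lwd = 2)  # drawn on the arsenic panel
par(op)

# How many of these towns have a median above the standard?
n_above <- sum(median_ars > ars_standard)
print(n_above)
```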
Histogram and density at the same time
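One way to draw a histogram and a density curve on the same axes is to plot the histogram on the density scale and add a kernel density estimate on top. This is a sketch using the five arsenic medians from the table, not the full data:

```r
# Arsenic medians for the five towns listed in the table.
median_ars <- c(14.0, 10.0, 9.8, 9.7, 10.5)

# Compute both pieces first so the y axis can fit the taller of the two.
h    <- hist(median_ars, plot = FALSE)
dens <- density(median_ars)

# freq = FALSE puts the bars on the density scale, matching the curve.
hist(median_ars, freq = FALSE,
     ylim = c(0, max(h$density, dens$y)),
     main = "Arsenic medians", xlab = "Median arsenic (ug/L)", col = "grey80")
lines(dens, col = "blue", lwd = 2)
abline(v = 10, col = "red", lwd = 2)  # standard quoted in the text
```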
I will rearrange the dataset again to focus on the problem. I will drop the observations we don't need by keeping only the locations where more than 10 percent of wells are above the guideline for both arsenic and fluoride, then sort the result in descending order by percent_wells_above_flr.
We repeat the same process, this time sorting the result by percent_wells_above_ars, and we can see that the result is similar.
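The filter and the two sort orders could look like this in base R. The towns and percentages below are entirely hypothetical, since none of the five rows shown earlier exceeds 10 percent for fluoride.

```r
# Hypothetical data, for illustration only.
wells <- data.frame(
  location = c("Town A", "Town B", "Town C", "Town D"),
  percent_wells_above_flr = c(12.5, 3.0, 25.0, 11.0),
  percent_wells_above_ars = c(30.0, 45.0, 15.0, 8.0),
  stringsAsFactors = FALSE
)

# Keep only towns above 10 percent for BOTH fluoride and arsenic.
high_both <- wells[wells$percent_wells_above_flr > 10 &
                   wells$percent_wells_above_ars > 10, ]

# Same rows, two descending sort orders.
by_flr <- high_both[order(-high_both$percent_wells_above_flr), ]
by_ars <- high_both[order(-high_both$percent_wells_above_ars), ]

print(by_flr)
print(by_ars)
```

Both results contain the same towns; only the row order differs.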
Now that I have the dataset sorted by fluoride and by arsenic, I will plot the result using ggvis.
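The text does not say which kind of ggvis plot is used, so the following is only one possibility: a scatter plot of the two percentages, coloured by town. The data are hypothetical, and the plot is built only if the ggvis package is installed.

```r
# Hypothetical filtered data, for illustration only.
top <- data.frame(
  location = c("Town A", "Town C"),
  percent_wells_above_flr = c(12.5, 25.0),
  percent_wells_above_ars = c(30.0, 15.0),
  stringsAsFactors = FALSE
)

# Build the plot only when ggvis is available.
if (requireNamespace("ggvis", quietly = TRUE)) {
  library(ggvis)
  p <- top %>%
    ggvis(~percent_wells_above_flr, ~percent_wells_above_ars) %>%
    layer_points(fill = ~location) %>%
    add_axis("x", title = "Percent of wells above fluoride guideline") %>%
    add_axis("y", title = "Percent of wells above arsenic guideline")
  # Rendering opens the viewer or a browser, so only do it interactively.
  if (interactive()) print(p)
}
```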
I will sort both datasets in descending order of their percent_wells_above variable and find the towns where the numbers are highest. I will then get the coordinates of all of these towns and put them on a map to see whether we can conclude something from their locations.
I will first plot the dataset in a graph. The towns are close to one another, except Kennebunk, which is farther away than the others. We can conclude that, for the purposes of its analysis, the State of Maine could focus more on testing water in the eastern region, since the towns there have higher concentrations of fluoride and arsenic.
Here we geocode the town names to get the longitude and latitude of each one, then create a new dataset to merge with the one we already have and plot it on the map.
I will create a new dataset holding the geocode of each town in my data, merge its two new variables, longitude and latitude, into my dataset, and use the result to plot the towns on the map.
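Attaching coordinates to the towns can be sketched as a lookup table joined on the town name. In this sketch the coordinates are typed in by hand rather than returned by a geocoding service, and the towns, percentages, and coordinates are all hypothetical placeholders.

```r
# Hypothetical filtered data, for illustration only.
wells <- data.frame(
  location = c("Town A", "Town B", "Town C"),
  percent_wells_above_flr = c(12.5, 25.0, 11.0),
  percent_wells_above_ars = c(30.0, 15.0, 22.0),
  stringsAsFactors = FALSE
)

# Lookup table of coordinates (made-up points, not real geocoding output).
coords <- data.frame(
  location = c("Town A", "Town B", "Town C"),
  lon = c(-68.5, -68.4, -70.5),
  lat = c(44.5, 44.7, 43.4),
  stringsAsFactors = FALSE
)

# Left join: keep every town in wells, even if its coordinates are missing.
wells_geo <- merge(wells, coords, by = "location", all.x = TRUE)
print(wells_geo)
```

Using `all.x = TRUE` means a town that failed to geocode shows up with NA coordinates instead of silently disappearing from the data.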
I will use the mapview library to display the map in the viewer. Clicking a point on the map opens a popup with the information the dataset holds for that town.
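A minimal mapview sketch, assuming the data already carry `lon` and `lat` columns (hypothetical names and made-up values here), converts the data frame to an sf point layer and passes it to `mapview()`. The map is built only if both sf and mapview are installed.

```r
# Hypothetical data with made-up coordinates, for illustration only.
wells_geo <- data.frame(
  location = c("Town A", "Town B", "Town C"),
  percent_wells_above_flr = c(12.5, 25.0, 11.0),
  percent_wells_above_ars = c(30.0, 15.0, 22.0),
  lon = c(-68.5, -68.4, -70.5),
  lat = c(44.5, 44.7, 43.4),
  stringsAsFactors = FALSE
)

if (requireNamespace("sf", quietly = TRUE) &&
    requireNamespace("mapview", quietly = TRUE)) {
  # Turn the lon/lat columns into point geometries (WGS84, EPSG:4326).
  pts <- sf::st_as_sf(wells_geo, coords = c("lon", "lat"), crs = 4326)
  # Colour points by the arsenic percentage; hovering shows the town name.
  m <- mapview::mapview(pts, zcol = "percent_wells_above_ars",
                        label = pts$location)
  # Display in the viewer only in an interactive session.
  if (interactive()) print(m)
}
```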
Based on the map, the towns most affected by arsenic and fluoride appear to be Surry and Otis, which have high concentrations of both chemicals, and other towns around Graham Lake are also among those with high concentrations. Hancock County has the largest number of towns with high concentrations of arsenic and fluoride. From this result we can conclude that the Downeast region of Maine is the part of the state that most needs groundwater remediation efforts.