The Beginning:

The Maine Tracking Network offers a plethora of public data on Maine Environmental Public Health. The data sets provided for MBA 676 Assignment 1 were extracted from this information available on this portal.

I started the assignment by downloading the data sets from our class website. I the data into R markdown after I installed and loaded the packages necessary to generate a report.

library(knitr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(pander)

Read in arsenic and fluoride as dafaframes and rename the variables.

arsenic <- read.csv("arsenic.csv", header = TRUE, stringsAsFactors = FALSE)
fluoride <- read.csv("fluoride.csv", header = TRUE, stringsAsFactors = FALSE)

I renamed the variables and used knitr formatting to display the tables for aesthetic purposes.

kable(head(arsenic %>% rename(Location=location, N_WellsTested_As= n_wells_tested, As_PctAboveGuideline = percent_wells_above_guideline, MedianLevelAs=median, As95Percentile=percentile_95,MaxAs=maximum)) %>% filter( N_WellsTested_As>=20))
Location N_WellsTested_As As_PctAboveGuideline MedianLevelAs As95Percentile MaxAs
Manchester 275 58.9 14.0 93.00 200
Gorham 467 50.1 10.5 130.00 460
Columbia 42 50.0 9.8 65.90 200
Monmouth 277 49.5 10.0 110.00 368
Eliot 73 49.3 9.7 41.35 45
Columbia Falls 25 48.0 8.1 53.75 71
kable(head(fluoride %>% rename(Location=location, N_WellsTestedF= n_wells_tested, F_PctAboveGuideline = percent_wells_above_guideline, MedianLevelF=median, F95Percentile=percentile_95,MaxF=maximum)))
Location N_WellsTestedF F_PctAboveGuideline MedianLevelF F95Percentile MaxF
Otis 60 30.0 1.130 3.200 3.6
Dedham 102 22.5 0.940 3.270 7.0
Denmark 46 19.6 0.450 3.150 3.9
Surry 175 18.3 0.800 3.525 6.9
Prospect 57 17.5 0.785 2.500 2.7
Eastbrook 31 16.1 1.290 2.445 3.3
kable(str(arsenic))
## 'data.frame':    917 obs. of  6 variables:
##  $ location                     : chr  "Manchester" "Gorham" "Columbia" "Monmouth" ...
##  $ n_wells_tested               : int  275 467 42 277 73 25 424 65 334 241 ...
##  $ percent_wells_above_guideline: num  58.9 50.1 50 49.5 49.3 48 44.8 44.6 43.4 42.7 ...
##  $ median                       : num  14 10.5 9.8 10 9.7 8.1 8.2 8.6 6 7 ...
##  $ percentile_95                : num  93 130 65.9 110 41.4 ...
##  $ maximum                      : num  200 460 200 368 45 71 240 431 670 930 ...

Although this is redundant, I created new dataframes with the renamed Variables.

arsenic1 <- arsenic %>% rename(Location=location, N_WellsTested_As= n_wells_tested, As_PctExceeds = percent_wells_above_guideline, MedianLevelAs=median, As95Percentile=percentile_95,MaxAs=maximum) %>% filter(N_WellsTested_As>=20) 
fluoride1 <- fluoride %>% rename(Location=location, N_WellsTestedF= n_wells_tested, F_PctExceeds= percent_wells_above_guideline, MedianLevelF=median, F95Percentile=percentile_95,MaxF=maximum) %>% filter(N_WellsTestedF>=20) 

Arsenic data renamed variables

kable(head(arsenic1))
Location N_WellsTested_As As_PctExceeds MedianLevelAs As95Percentile MaxAs
Manchester 275 58.9 14.0 93.00 200
Gorham 467 50.1 10.5 130.00 460
Columbia 42 50.0 9.8 65.90 200
Monmouth 277 49.5 10.0 110.00 368
Eliot 73 49.3 9.7 41.35 45
Columbia Falls 25 48.0 8.1 53.75 71

fluoride data with renamed variables

kable(head(fluoride1))
Location N_WellsTestedF F_PctExceeds MedianLevelF F95Percentile MaxF
Otis 60 30.0 1.130 3.200 3.6
Dedham 102 22.5 0.940 3.270 7.0
Denmark 46 19.6 0.450 3.150 3.9
Surry 175 18.3 0.800 3.525 6.9
Prospect 57 17.5 0.785 2.500 2.7
Eastbrook 31 16.1 1.290 2.445 3.3

For space sake, I only ran the head() function. Although only 6 rows are available for view, with the filter function, I was able to eliminate the NA values by limiting water tests to 20 and above for the towns. I learned this from a classmate’s method when viewing on RPubs.

Inner Join to combine the two dataframes into a new dataframe called AsF. The entire dataset includes well water testing information on arsenic and fluoride levels for over 300 towns in Maine. To verify that the tables joined correctly, I inspected the first few rows and ran a summary on the variables.

AsF <- inner_join(arsenic1,fluoride1)
## Joining, by = "Location"

These tables offer a quick glance of the AsF table join.

kable(summary(AsF))
Location N_WellsTested_As As_PctExceeds MedianLevelAs As95Percentile MaxAs N_WellsTestedF F_PctExceeds MedianLevelF F95Percentile MaxF
Length:341 Min. : 20.00 Min. : 0.00 Min. : 0.25 Min. : 0.50 Min. : 1.00 Min. : 21.00 Min. : 0.000 Min. :0.1000 Min. :0.100 Min. : 0.100
Class :character 1st Qu.: 36.00 1st Qu.: 3.20 1st Qu.: 0.50 1st Qu.: 6.22 1st Qu.: 18.00 1st Qu.: 43.00 1st Qu.: 0.000 1st Qu.:0.1000 1st Qu.:0.522 1st Qu.: 1.200
Mode :character Median : 57.00 Median : 8.30 Median : 1.00 Median : 13.55 Median : 39.00 Median : 69.00 Median : 0.700 Median :0.1000 Median :1.003 Median : 2.200
NA Mean : 86.69 Mean :12.37 Mean : 1.62 Mean : 24.87 Mean : 95.03 Mean : 97.21 Mean : 2.465 Mean :0.1765 Mean :1.153 Mean : 2.546
NA 3rd Qu.:109.00 3rd Qu.:18.30 3rd Qu.: 1.90 3rd Qu.: 28.20 3rd Qu.: 92.00 3rd Qu.:126.00 3rd Qu.: 3.100 3rd Qu.:0.2000 3rd Qu.:1.601 3rd Qu.: 3.400
NA Max. :632.00 Max. :58.90 Max. :14.00 Max. :372.50 Max. :3100.00 Max. :503.00 Max. :30.000 Max. :1.2900 Max. :4.180 Max. :14.000
kable(head(AsF))
Location N_WellsTested_As As_PctExceeds MedianLevelAs As95Percentile MaxAs N_WellsTestedF F_PctExceeds MedianLevelF F95Percentile MaxF
Manchester 275 58.9 14.0 93.00 200 276 3.3 0.30 1.700 3.60
Gorham 467 50.1 10.5 130.00 460 452 0.0 0.10 0.682 2.00
Columbia 42 50.0 9.8 65.90 200 54 1.9 0.31 1.329 4.30
Monmouth 277 49.5 10.0 110.00 368 288 3.1 0.30 1.676 3.40
Eliot 73 49.3 9.7 41.35 45 84 0.0 0.20 0.658 1.54
Columbia Falls 25 48.0 8.1 53.75 71 38 0.0 0.21 0.641 0.90
str(AsF)
## 'data.frame':    341 obs. of  11 variables:
##  $ Location        : chr  "Manchester" "Gorham" "Columbia" "Monmouth" ...
##  $ N_WellsTested_As: int  275 467 42 277 73 25 424 65 334 241 ...
##  $ As_PctExceeds   : num  58.9 50.1 50 49.5 49.3 48 44.8 44.6 43.4 42.7 ...
##  $ MedianLevelAs   : num  14 10.5 9.8 10 9.7 8.1 8.2 8.6 6 7 ...
##  $ As95Percentile  : num  93 130 65.9 110 41.4 ...
##  $ MaxAs           : num  200 460 200 368 45 71 240 431 670 930 ...
##  $ N_WellsTestedF  : int  276 452 54 288 84 38 453 59 383 209 ...
##  $ F_PctExceeds    : num  3.3 0 1.9 3.1 0 0 3.1 0 1 9.6 ...
##  $ MedianLevelF    : num  0.3 0.1 0.31 0.3 0.2 0.21 0.31 0.1 0.1 0.43 ...
##  $ F95Percentile   : num  1.7 0.682 1.329 1.676 0.658 ...
##  $ MaxF            : num  3.6 2 4.3 3.4 1.54 0.9 3.7 1.63 3.2 4.5 ...

I wanted to compare the percentage of wells tested that have excess levels of arsenic and fluoride in the water samples. I did this by using the subset function.

Subsets1<- subset(AsF, select = c("As_PctExceeds", "F_PctExceeds"))
summary(Subsets1)
##  As_PctExceeds    F_PctExceeds   
##  Min.   : 0.00   Min.   : 0.000  
##  1st Qu.: 3.20   1st Qu.: 0.000  
##  Median : 8.30   Median : 0.700  
##  Mean   :12.37   Mean   : 2.465  
##  3rd Qu.:18.30   3rd Qu.: 3.100  
##  Max.   :58.90   Max.   :30.000

I also wanted to look for any correlations in the data findings.

cor(Subsets1)
##               As_PctExceeds F_PctExceeds
## As_PctExceeds     1.0000000    0.1215215
## F_PctExceeds      0.1215215    1.0000000

I obviously coded it wrong or there is no correlation.

I also plotted the subset.

plot(Subsets1)

To calculate the number of wells in ME that have been tested for arsenic…

kable(sum(AsF$N_WellsTested_As))
29560

To calculate the number of wells in ME that have been tested for fluoride…

kable(sum(AsF$N_WellsTestedF))
33148

Between 1999-2013, there were more well water samples tested for fluoride.

I also had a little fun looking for correlations with the cor().Note the correlation between As testing and excess As findings is much stronger than that of the fluoride relationship. This is no doubt due to the fact that there is very little naturally occurring fluoride in groundwater.

pander(cor(AsF$N_WellsTested_As, AsF$As_PctExceeds))

0.3575

pander(cor(AsF$N_WellsTestedF, AsF$F_PctExceeds))

0.02335

Population Table

I wanted to bring in another data set that I could join with the water sample data. I found town population data and converted it into to csv from this webpage

Pop <- read.csv("NewPopTowns.csv", header = TRUE, stringsAsFactors = FALSE)

Population Table Variable Names

kable(names(Pop))
City
Population

Rename Pop so there’s a common variable name for a new join

PopTable <- Pop %>% rename(Location = City)
kable(str(PopTable))
## 'data.frame':    452 obs. of  2 variables:
##  $ Location  : chr  "Portland" "Lewiston" "Bangor" "South Portland" ...
##  $ Population: int  66881 36202 32391 25556 22871 21282 20893 20495 19691 19078 ...

Inner Join

PopAsF <- inner_join(AsF, PopTable) %>%  select(Location,Population,N_WellsTested_As,As_PctExceeds,N_WellsTestedF, F_PctExceeds,  MedianLevelAs,MedianLevelF)
## Joining, by = "Location"

The joined table showing the population of the towns in Maine and the arsenic and fluoride information from their well water.

kable(head(PopAsF))
Location Population N_WellsTested_As As_PctExceeds N_WellsTestedF F_PctExceeds MedianLevelAs MedianLevelF
Manchester 2528 275 58.9 276 3.3 14.0 0.30
Gorham 17181 467 50.1 452 0.0 10.5 0.10
Columbia 466 42 50.0 54 1.9 9.8 0.31
Monmouth 4025 277 49.5 288 3.1 10.0 0.30
Eliot 6335 73 49.3 84 0.0 9.7 0.20
Winthrop 5921 424 44.8 453 3.1 8.2 0.31

I previously ran into trouble because some variables were classified as “chr” when they should have been “num” or “int” so I ran the str() function to verify they were classed correctly.

kable(str(PopAsF))
## 'data.frame':    299 obs. of  8 variables:
##  $ Location        : chr  "Manchester" "Gorham" "Columbia" "Monmouth" ...
##  $ Population      : int  2528 17181 466 4025 6335 5921 2315 8149 2664 4454 ...
##  $ N_WellsTested_As: int  275 467 42 277 73 424 65 334 241 133 ...
##  $ As_PctExceeds   : num  58.9 50.1 50 49.5 49.3 44.8 44.6 43.4 42.7 41.4 ...
##  $ N_WellsTestedF  : int  276 452 54 288 84 453 59 383 209 143 ...
##  $ F_PctExceeds    : num  3.3 0 1.9 3.1 0 3.1 0 1 9.6 3.5 ...
##  $ MedianLevelAs   : num  14 10.5 9.8 10 9.7 8.2 8.6 6 7 5.25 ...
##  $ MedianLevelF    : num  0.3 0.1 0.31 0.3 0.2 0.31 0.1 0.1 0.43 0.2 ...

The percentage of wells that contain water with arsenic exceeding guidelines and their median arsenic levels.

This table also makes note if the median level of arsenic in each town exceeds the guidelines. To do this, I used the mutate function. I limited the table to 20 towns to save space.

The majority of ME towns did not have median levels of As that were in excess of the guidelines from the sampling between 1999-2013.

kable(PopAsF %>% mutate(AsAboveG.L.= MedianLevelAs >= 10) %>% select(Location, Population, As_PctExceeds, MedianLevelAs,AsAboveG.L.) %>% arrange(desc(MedianLevelAs)) %>% top_n(20,MedianLevelAs),digits=1)
Location Population As_PctExceeds MedianLevelAs AsAboveG.L.
Manchester 2528 58.9 14.0 TRUE
Gorham 17181 50.1 10.5 TRUE
Monmouth 4025 49.5 10.0 TRUE
Columbia 466 50.0 9.8 FALSE
Eliot 6335 49.3 9.7 FALSE
Hallowell 2315 44.6 8.6 FALSE
Winthrop 5921 44.8 8.2 FALSE
Mariaville 524 40.0 7.2 FALSE
Readfield 2520 39.8 7.2 FALSE
Blue Hill 2664 42.7 7.0 FALSE
Millinocket 4359 31.0 6.8 FALSE
Buxton 8149 43.4 6.0 FALSE
Surry 1459 40.3 6.0 FALSE
Rome 991 26.6 5.5 FALSE
Orland 2205 40.7 5.4 FALSE
Hollis 4454 41.4 5.2 FALSE
Belgrade 3115 31.2 5.2 FALSE
Scarborough 19691 35.2 5.2 FALSE
Vassalboro 4247 26.3 5.2 FALSE
Oakland 6138 33.0 5.0 FALSE

Therapeutic fluoride

**According to the American Academy of Pediatric Dentists, “since fluoride from water supplies is now one of several sources of fluoride, the Department of Health and Human Services recently has proposed to not have a fluoride range, but rather to limit the recommendation to the lower limit of 0.7 ppm F.”

Using these parameters above, Maine well water does not offer dental cavity protection for the most part; this is illustrated in the table below. Again, I’ve limited the data to 20 rows to save space. The vast majority of samples show that excess flouoride is not a problem, in fact, there in insufficient fluoride to offer dental cavity prophylaxis. In the column SubTherapeuticF, FALSE indicates that the median level in meets the AAPD recommendations.

I have also run the str() function to verify the data is classified correctly.

TherapeuticF <- PopAsF %>% mutate(SubTherapeuticF = MedianLevelF<=0.7) %>% select(Location, Population, N_WellsTestedF, SubTherapeuticF, MedianLevelF) %>% arrange(desc(MedianLevelF)) %>% top_n(20, MedianLevelF) 
str(TherapeuticF)
## 'data.frame':    20 obs. of  5 variables:
##  $ Location       : chr  "Eastbrook" "Otis" "Marshfield" "Dedham" ...
##  $ Population     : int  421 668 508 1683 1459 3395 991 1608 7857 4359 ...
##  $ N_WellsTestedF : int  31 60 31 102 175 52 82 56 503 29 ...
##  $ SubTherapeuticF: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ MedianLevelF   : num  1.29 1.13 1 0.94 0.8 0.76 0.6 0.6 0.5 0.495 ...
kable(TherapeuticF)
Location Population N_WellsTestedF SubTherapeuticF MedianLevelF
Eastbrook 421 31 FALSE 1.290
Otis 668 60 FALSE 1.130
Marshfield 508 31 FALSE 1.000
Dedham 1683 102 FALSE 0.940
Surry 1459 175 FALSE 0.800
Fryeburg 3395 52 FALSE 0.760
Rome 991 82 TRUE 0.600
Stockton Springs 1608 56 TRUE 0.600
Ellsworth 7857 503 TRUE 0.500
Millinocket 4359 29 TRUE 0.495
Stoneham 236 25 TRUE 0.490
Smithfield 999 79 TRUE 0.450
Denmark 1143 46 TRUE 0.450
Casco 3880 119 TRUE 0.445
Chesterville 1330 81 TRUE 0.440
Otisfield 1762 62 TRUE 0.440
Lovell 1133 43 TRUE 0.435
Blue Hill 2664 209 TRUE 0.430
Sedgwick 1182 143 TRUE 0.425
Porter 1490 41 TRUE 0.425

Rather than sort through the whole table to determine if any towns’ wells have median fluoride levels in excess of 2mg/ml, the maximum exposure guideline, R can do a quick calculation. The ‘NA’ below means no values were returned.

I’ve loaded the library(pander) function for formatting. Like kable(), it offers a cleaner presentation. For some reason, kable() wouldn’t work for this code.

ExcessF <- sum(PopAsF$MedianLevelF>=2)
pander(sum(ExcessF>=2))

0

I thought I would look to see if there is a correlation between the towns that have the most water testing per capita and findings of excessive levels of either compound. To do this, I calculated water tests per capita, using the higher of the two N_WellsTested. I did this in two steps, first calculating the higher test figure to use as a denominator for ‘tests per capita’.

MaxWells <- PopAsF %>% mutate(MaxWell = pmax(N_WellsTested_As,N_WellsTestedF)) %>% select(Location, Population, MaxWell, As_PctExceeds, F_PctExceeds)
kable(head(MaxWells))
Location Population MaxWell As_PctExceeds F_PctExceeds
Manchester 2528 276 58.9 3.3
Gorham 17181 467 50.1 0.0
Columbia 466 54 50.0 1.9
Monmouth 4025 288 49.5 3.1
Eliot 6335 84 49.3 0.0
Winthrop 5921 453 44.8 3.1

And just to make sure the new MaxWells variable is classified as an ‘int’…

kable(str(MaxWells))
## 'data.frame':    299 obs. of  5 variables:
##  $ Location     : chr  "Manchester" "Gorham" "Columbia" "Monmouth" ...
##  $ Population   : int  2528 17181 466 4025 6335 5921 2315 8149 2664 4454 ...
##  $ MaxWell      : int  276 467 54 288 84 453 65 383 241 143 ...
##  $ As_PctExceeds: num  58.9 50.1 50 49.5 49.3 44.8 44.6 43.4 42.7 41.4 ...
##  $ F_PctExceeds : num  3.3 0 1.9 3.1 0 3.1 0 1 9.6 3.5 ...

Next, I create a new dataframe with the MaxWells variable to get the tests per capita. Again, I use the mutate function. I know I could have done this all in one code instead of two, but I kept getting error codes. I do it successfully later.

MaxWells1 <- MaxWells %>% mutate(TestsPerPop = Population/MaxWell) %>% select(Location, TestsPerPop,  As_PctExceeds, F_PctExceeds) %>%  arrange(desc(As_PctExceeds))
kable(head(MaxWells1, digits = 2))
Location TestsPerPop As_PctExceeds F_PctExceeds
Manchester 9.15942 58.9 3.3
Gorham 36.79015 50.1 0.0
Columbia 8.62963 50.0 1.9
Monmouth 13.97569 49.5 3.1
Eliot 75.41667 49.3 0.0
Winthrop 13.07064 44.8 3.1
str(MaxWells1)
## 'data.frame':    299 obs. of  4 variables:
##  $ Location     : chr  "Manchester" "Gorham" "Columbia" "Monmouth" ...
##  $ TestsPerPop  : num  9.16 36.79 8.63 13.98 75.42 ...
##  $ As_PctExceeds: num  58.9 50.1 50 49.5 49.3 44.8 44.6 43.4 42.7 41.4 ...
##  $ F_PctExceeds : num  3.3 0 1.9 3.1 0 3.1 0 1 9.6 3.5 ...

Plot looking for a correlation between tests per capita and the percentage of wells that have excessive arsenic

library(ggvis) 
MaxWells1 %>% ggvis(~TestsPerPop, ~As_PctExceeds ) %>% layer_points(size := 30, shape := "circle", stroke := "blue", fill := NA)

Is there a correlation between tests performed per capita and elevated arsenic levels?

MaxWells1.lm = lm(As_PctExceeds ~TestsPerPop, data = MaxWells1)
pander(summary(MaxWells1.lm)$r.squared)

8.727e-07

Out of desperation, I reversed the X, Y variables, for naught.

MaxWells1.lm = lm(TestsPerPop ~ As_PctExceeds, data = MaxWells1)
pander(summary(MaxWells1.lm)$r.squared)

8.727e-07

pander(cor(MaxWells1$TestsPerPop, MaxWells1$As_PctExceeds))

-0.0009342

Initially, I thought the problem here was that that I couldnt’t exclude the NA values or that I need to set values for the x-axis. But then I was able to filter the NA values out. The problem could also be that the TestsPerPop is a calculated field, and I’ve yet to learn to deal with them. I’m also not sure if I was able to get the regression summary to work.

One final set I decided to generate, after the deadline, was inspired by a classmate’s work. She created a clean water table. I decided to see where the best well water is; my definition of best is low arsenic and a therapeutic level of fluoride.

#Create a variable for Maine's Population
MEPop <- 1329000
kable(MEPop)
1329000
#filter safe arsenic water
Good_WaterAs <- PopAsF %>% filter(MedianLevelAs<10) %>% select(Location, Population, MedianLevelAs)
Good_WaterF <- PopAsF %>% filter(MedianLevelF<=1.2) %>% select(Location, Population, MedianLevelF)
#get rid of too high
#get rid of too low fluoride
BetterF <- Good_WaterF %>% filter(MedianLevelF>=.7) %>% select (Location, Population, MedianLevelF)
#This is theraputic
#Join tables to get optimal levels of both
BestH2O <- inner_join(BetterF,Good_WaterAs) %>% mutate(PctOfState=Population/MEPop*100) %>% select(Location, Population, PctOfState, MedianLevelF, MedianLevelAs)
## Joining, by = c("Location", "Population")

Based on the data given to us, these 5 towns offer a good balanced well water.

#The code above is how to change font color in R Markdown. 
kable(BestH2O)
Location Population PctOfState MedianLevelF MedianLevelAs
Surry 1459 0.1097818 0.80 6.0
Otis 668 0.0502634 1.13 4.8
Dedham 1683 0.1266366 0.94 1.0
Marshfield 508 0.0382242 1.00 1.0
Fryeburg 3395 0.2554552 0.76 0.5

Onward

I decided to try another route and explore if there were more dental practices in the counties where there is a higher population of patients who don’t get adequate fluoridationin their water. I am ignoring whether patients use fluoride supplements and any dental health trends that might be present. My route is circuitous because this is a learning experience.

What percent of the homes rely on private wells for their water by county?

To help explore if the dentists where the fluoride is not, I use household size from the latest Census data.. Wells use by percentage is given for households. The Census gives mean household size by county.

WellsInCounties <- read.csv("HomesWithWellsbyPercent.csv", header=TRUE, stringsAsFactors=FALSE)
head(WellsInCounties)
##         County PctOfHomeWells Population CapitaHousehold
## 1 Androscoggin           0.44     107233            2.35
## 2    Aroostook           0.59      68628            2.24
## 3   Cumberland           0.36     289977            2.35
## 4     Franklin           0.61      29991            2.46
## 5      Hancock           0.76      54659            2.18
## 6     Kennebec           0.56     119980            2.30
str(WellsInCounties)
## 'data.frame':    17 obs. of  4 variables:
##  $ County         : chr  "Androscoggin" "Aroostook" "Cumberland" "Franklin" ...
##  $ PctOfHomeWells : num  0.44 0.59 0.36 0.61 0.76 0.56 0.56 0.82 0.65 0.54 ...
##  $ Population     : int  107233 68628 289977 29991 54659 119980 39855 33969 57202 152692 ...
##  $ CapitaHousehold: num  2.35 2.24 2.35 2.46 2.18 2.3 2.26 2.26 2.53 2.36 ...

fluoride Levels by County. The State of Maine Health and Environmental Testing Laboratory provided these data. 1999-2013,

This information is much the same as that reviewed earlier, however now it is by county, not town. This is because I could only find data on practicing Dentists by county.

Countyfluoride <- read.csv("fluorideByCounty.csv", header = TRUE, stringsAsFactors = FALSE) 
kable(str(Countyfluoride))
## 'data.frame':    37 obs. of  6 variables:
##  $ County        : chr  "Hancock" "Oxford" "Franklin" "York" ...
##  $ N_WellsTestedF: int  3744 1611 1087 2642 1291 5890 1843 1569 4297 1745 ...
##  $ F_PctExceeds  : num  0.07 0.04 0.03 0.03 0.03 0.03 0.02 0.02 0.02 0.02 ...
##  $ MedianLevelF  : num  0.28 0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.1 0.1 ...
##  $ F95Percentile : num  0.02 0.02 0.02 0.02 0.01 0.02 0.01 0.01 0.01 0.01 ...
##  $ MaxF          : num  9.1 9.9 5.6 9.6 14 6.8 5.3 5.4 9.1 4.1 ...
kable(summary(Countyfluoride))
County N_WellsTestedF F_PctExceeds MedianLevelF F95Percentile MaxF
Length:37 Min. : 581 Min. :0.0000 Min. :0.1000 Min. :0.0000 Min. : 1.600
Class :character 1st Qu.:1206 1st Qu.:0.0100 1st Qu.:0.1000 1st Qu.:0.0100 1st Qu.: 4.000
Mode :character Median :1678 Median :0.0200 Median :0.1000 Median :0.0100 Median : 5.500
NA Mean :2190 Mean :0.0225 Mean :0.1175 Mean :0.0125 Mean : 6.594
NA 3rd Qu.:2832 3rd Qu.:0.0300 3rd Qu.:0.1000 3rd Qu.:0.0200 3rd Qu.: 9.225
NA Max. :5890 Max. :0.0700 Max. :0.2800 Max. :0.0200 Max. :14.000
NA NA’s :21 NA’s :21 NA’s :21 NA’s :21 NA’s :21

The total number of well water tests done to evaluate fluoride levels between 1999-2013 was…

library(pander)
pander(sum(Countyfluoride$N_WellsTestedF, na.rm=TRUE))

35044

I used the pander() to get nicer formatting for the output.

Practicing Dentists Actively Practicing Dentists to 10,000 Population in Maine by County, November, 2011

CountyDentists <- read.csv("PracticingDentists.csv", header = TRUE, stringsAsFactors = FALSE)
names(CountyDentists)
## [1] "County"        "DentistPer10K"
str(CountyDentists)
## 'data.frame':    16 obs. of  2 variables:
##  $ County       : chr  "Androscoggin" "Aroostook" "Cumberland" "Franklin" ...
##  $ DentistPer10K: num  4.18 3.34 7.35 3.9 4.59 6.47 6.29 3.48 2.25 5.33 ...

I do an inner join of the table that shows what percent of homes have wells (by county) and the fluoride data by county.

fluorideCombo <- inner_join(WellsInCounties, Countyfluoride)
## Joining, by = "County"
str(fluorideCombo)
## 'data.frame':    16 obs. of  9 variables:
##  $ County         : chr  "Androscoggin" "Aroostook" "Cumberland" "Franklin" ...
##  $ PctOfHomeWells : num  0.44 0.59 0.36 0.61 0.76 0.56 0.56 0.82 0.65 0.54 ...
##  $ Population     : int  107233 68628 289977 29991 54659 119980 39855 33969 57202 152692 ...
##  $ CapitaHousehold: num  2.35 2.24 2.35 2.46 2.18 2.3 2.26 2.26 2.53 2.36 ...
##  $ N_WellsTestedF : int  1569 1048 4297 1087 3744 5890 940 2106 1611 3404 ...
##  $ F_PctExceeds   : num  0.02 0.01 0.02 0.03 0.07 0.03 0.01 0.01 0.04 0.01 ...
##  $ MedianLevelF   : num  0.1 0.1 0.1 0.1 0.28 0.2 0.1 0.1 0.1 0.1 ...
##  $ F95Percentile  : num  0.01 0.01 0.01 0.02 0.02 0.02 0.01 0.01 0.02 0.01 ...
##  $ MaxF           : num  5.4 10 9.1 5.6 9.1 6.8 3.2 4.8 9.9 3.7 ...

Join Dentists by county data to the new county fluoride dataset.

DentistAndfluoride <- inner_join(fluorideCombo, CountyDentists)
## Joining, by = "County"
head(DentistAndfluoride)
##         County PctOfHomeWells Population CapitaHousehold N_WellsTestedF
## 1 Androscoggin           0.44     107233            2.35           1569
## 2    Aroostook           0.59      68628            2.24           1048
## 3   Cumberland           0.36     289977            2.35           4297
## 4     Franklin           0.61      29991            2.46           1087
## 5      Hancock           0.76      54659            2.18           3744
## 6     Kennebec           0.56     119980            2.30           5890
##   F_PctExceeds MedianLevelF F95Percentile MaxF DentistPer10K
## 1         0.02         0.10          0.01  5.4          4.18
## 2         0.01         0.10          0.01 10.0          3.34
## 3         0.02         0.10          0.01  9.1          7.35
## 4         0.03         0.10          0.02  5.6          3.90
## 5         0.07         0.28          0.02  9.1          4.59
## 6         0.03         0.20          0.02  6.8          6.47
str(DentistAndfluoride)
## 'data.frame':    16 obs. of  10 variables:
##  $ County         : chr  "Androscoggin" "Aroostook" "Cumberland" "Franklin" ...
##  $ PctOfHomeWells : num  0.44 0.59 0.36 0.61 0.76 0.56 0.56 0.82 0.65 0.54 ...
##  $ Population     : int  107233 68628 289977 29991 54659 119980 39855 33969 57202 152692 ...
##  $ CapitaHousehold: num  2.35 2.24 2.35 2.46 2.18 2.3 2.26 2.26 2.53 2.36 ...
##  $ N_WellsTestedF : int  1569 1048 4297 1087 3744 5890 940 2106 1611 3404 ...
##  $ F_PctExceeds   : num  0.02 0.01 0.02 0.03 0.07 0.03 0.01 0.01 0.04 0.01 ...
##  $ MedianLevelF   : num  0.1 0.1 0.1 0.1 0.28 0.2 0.1 0.1 0.1 0.1 ...
##  $ F95Percentile  : num  0.01 0.01 0.01 0.02 0.02 0.02 0.01 0.01 0.02 0.01 ...
##  $ MaxF           : num  5.4 10 9.1 5.6 9.1 6.8 3.2 4.8 9.9 3.7 ...
##  $ DentistPer10K  : num  4.18 3.34 7.35 3.9 4.59 6.47 6.29 3.48 2.25 5.33 ...
kable(DentistAndfluoride)
County PctOfHomeWells Population CapitaHousehold N_WellsTestedF F_PctExceeds MedianLevelF F95Percentile MaxF DentistPer10K
Androscoggin 0.44 107233 2.35 1569 0.02 0.10 0.01 5.4 4.18
Aroostook 0.59 68628 2.24 1048 0.01 0.10 0.01 10.0 3.34
Cumberland 0.36 289977 2.35 4297 0.02 0.10 0.01 9.1 7.35
Franklin 0.61 29991 2.46 1087 0.03 0.10 0.02 5.6 3.90
Hancock 0.76 54659 2.18 3744 0.07 0.28 0.02 9.1 4.59
Kennebec 0.56 119980 2.30 5890 0.03 0.20 0.02 6.8 6.47
Knox 0.56 39855 2.26 940 0.01 0.10 0.01 3.2 6.29
Lincoln 0.82 33969 2.26 2106 0.01 0.10 0.01 4.8 3.48
Oxford 0.65 57202 2.53 1611 0.04 0.10 0.02 9.9 2.25
Penobscot 0.54 152692 2.36 3404 0.01 0.10 0.01 3.7 5.33
Piscataquis 0.73 16931 2.22 581 0.00 0.10 0.00 1.6 3.42
Sagadahoc 0.58 35149 2.31 1246 0.01 0.10 0.01 3.3 6.80
Somerset 0.72 51113 2.37 1291 0.03 0.10 0.01 14.0 2.49
Waldo 0.78 39155 2.31 1745 0.02 0.10 0.01 4.1 2.58
Washington 0.75 31625 2.20 1843 0.02 0.10 0.01 5.3 3.96
York 0.50 201169 2.40 2642 0.03 0.10 0.02 9.6 3.96

To calculate the number of people with subTherapeutic fluoride:

kable(DentistAndfluoride %>% mutate(WellDrinkers=Population/CapitaHousehold*PctOfHomeWells) %>% mutate(SubTherapeutic=MedianLevelF<=0.7) %>% select(County, WellDrinkers,SubTherapeutic))
County WellDrinkers SubTherapeutic
Androscoggin 20077.668 TRUE
Aroostook 18076.125 TRUE
Cumberland 44422.009 TRUE
Franklin 7436.793 TRUE
Hancock 19055.431 TRUE
Kennebec 29212.522 TRUE
Knox 9875.575 TRUE
Lincoln 12325.035 TRUE
Oxford 14696.166 TRUE
Penobscot 34938.000 TRUE
Piscataquis 5567.401 TRUE
Sagadahoc 8825.290 TRUE
Somerset 15528.000 TRUE
Waldo 13221.169 TRUE
Washington 10781.250 TRUE
York 41910.208 TRUE

Dentists per 10,000 population and number of people who may not get enough fluoride. Is there a correlation? Do the Dentists go to where there might be needy teeth?

Chart <- DentistAndfluoride %>% mutate(WellDrinkers=Population/CapitaHousehold*PctOfHomeWells) %>% mutate(Wellper10K=WellDrinkers/10000) %>% mutate(Ratio=Wellper10K/DentistPer10K) %>%  select(County, WellDrinkers, Wellper10K,DentistPer10K,Ratio)
kable(Chart, digits = 2)
County WellDrinkers Wellper10K DentistPer10K Ratio
Androscoggin 20077.67 2.01 4.18 0.48
Aroostook 18076.12 1.81 3.34 0.54
Cumberland 44422.01 4.44 7.35 0.60
Franklin 7436.79 0.74 3.90 0.19
Hancock 19055.43 1.91 4.59 0.42
Kennebec 29212.52 2.92 6.47 0.45
Knox 9875.58 0.99 6.29 0.16
Lincoln 12325.04 1.23 3.48 0.35
Oxford 14696.17 1.47 2.25 0.65
Penobscot 34938.00 3.49 5.33 0.66
Piscataquis 5567.40 0.56 3.42 0.16
Sagadahoc 8825.29 0.88 6.80 0.13
Somerset 15528.00 1.55 2.49 0.62
Waldo 13221.17 1.32 2.58 0.51
Washington 10781.25 1.08 3.96 0.27
York 41910.21 4.19 3.96 1.06

Plot There does not appear to be a pattern between potential fluoride deprivation and Dental practices. The visualization showns the data points for the number of wellwater drinkers who are likely not getting adequate fluoride from it and the number of County dentists, More non-flouridated drinkers does not bring more dentists. This linear model illustrates a weak correlation.

library(ggvis)
Chart %>% ggvis(~WellDrinkers,~DentistPer10K) %>% layer_points(size := 50, shape := "circle", stroke := "blue") %>% layer_model_predictions(model = "lm", se = TRUE)
## Guessing formula = DentistPer10K ~ WellDrinkers

Linear regression and a low R-Sq. will confirm this. Below, I try three different codes to calculate the R.Sq.

Chart.lm = lm(WellDrinkers ~ DentistPer10K, data=Chart)
pander(summary(Chart.lm)$r.squared)

0.1561

pander(cor(Chart$WellDrinkers, Chart$DentistPer10K))

0.3951

Patients <- Chart$WellDrinkers
Dentist <- Chart$DentistPer10K
pander(cor(Patients, Dentist))

0.3951

Two for Three.

Certainly, there is more that could be brought in for more sophisticated analysis. I just have to learn the code.

Lastly, I want to save the joined table I made.

write.csv(DentistAndfluoride,"DentistAndfluorideAndPop.csv",row.names = TRUE)