This analysis explores inorganic (TDS) groundwater contamination in the Tulare Basin, California. Data was generously provided by CV-SALTS and CDM Smith, and aggregated from:
Data ranges from 1902 - 2014. Later years (1980 - 2014) are much more data rich than early years (1902 - 1940).
All data cleaning, analysis, and mapping was carried out within R version 1.0.143.

Set working directory, load requried packages, and load data

#setwd("E:/rpauloo/Documents/GitHub/Contaminant-Mapping")
setwd("/Users/richpauloo/Documents/GitHub/Contaminant Mapping/Contaminant-Mapping")

required.packages <- c("sp", "XML", "rgdal", "rgeos", "raster", "ggplot2", 
                       "dismo", "leaflet", "RColorBrewer", "classInt")
suppressMessages(lapply(required.packages, require, character.only = TRUE))

dat = read.csv(file = "TDS_Features_Tulare_Clip_TableToExcel.csv", stringsAsFactors = FALSE)

dat = dat[, c("Well_ID", "IAZ", "Database", "Samp_Date", 
              "Year", "Time_perio", "Result", "Latitude",
              "Longitude", "Depth_ft", "Well_Type", "DepthClass")]

Clean Data

library(plyr)
str(dat)

# create logical arrays to define sampling time intervals
time1 = ifelse(dat$Year <= 1940, 1, NA) 
time2 = ifelse(dat$Year > 1940 & dat$Year <= 1980, 1, NA) 
time3 = ifelse(dat$Year > 1980, 1, NA) 

record.length = lapply(list(time1, time2, time3), sum, na.rm = TRUE)
# 160 records from 1902-1940
# 19,894 records from 1940-1980
# 34,117 records from 1980-2014

# substitute logical values from time1, time2, time3 with character vectors for plyr
time1 = as.numeric(time1)
time2 = as.numeric(sub(1, 2, time2))
time3 = as.numeric(sub(1, 3, time3))
tframe = cbind(time1, time2, time3)
tframe = rowSums(tframe[,1:3], na.rm = TRUE)

# ensure all values are in tframe
unique(tframe)
table(tframe)

# add tframe to dat
dat$tframe = tframe

# filter for Shallow and Deep depth class
unique(dat$DepthClass) # need to get rid of unknown values
fildat = subset(dat, DepthClass == "Deep" | DepthClass == "Shallow") # subset deep/shallow
unique(fildat$DepthClass) # shows that Deep and Shallow are the only depth classes now
records.trimmed = nrow(dat) - nrow(fildat) # trimmed 23,209 observations without a depth class

# ddply by well_ID, year, depthclass, and tframe and take Result means 
columns = c("Well_ID", "DepthClass", "tframe", "Latitude", "Longitude", "Database")
plydat = ddply(fildat, columns, summarize, Result = round(mean(Result), digits=2))

Barplot of Data Sources Used

ggplot(dat, aes(Database)) + 
  geom_histogram(stat="count") +
  theme_bw() + 
  ggtitle("Database Count Considered for Analysis")

ggplot(plydat, aes(Database)) + 
  geom_histogram(stat="count") +
  theme_bw() + 
  ggtitle("Database Count Used in Analysis after Subsetting") # lost "Unknown" depth class and averaged many data for the same well

Histograms of TDS measurements

Data is heavily skewed to the right. This skew blurs interesting data in the freshwater 0 - 1,000 mg/L range. Bin sizes in the final map should reflect this distribution of data and not bias the graph towards the outliers. A lognormal plot is insufficient because TDS is typically not reported in log scale.
ggplot(plydat, aes(Result)) + 
  geom_histogram(bins = 50) + 
  xlab("TDS (mg/L)") +
  ggtitle("Histogram of TDS observations in the Tulare Basin (1902-2014)") +
  theme_bw()

plydat$logResult = log10(plydat$Result)
ggplot(plydat, aes(logResult)) + 
  geom_histogram(bins = 50) + 
  xlab("log TDS (mg/L)") +
  ggtitle("Log Scale Histogram of TDS observations in the Tulare Basin (1902-2014)") +
  theme_bw()

Create Spatial Points and Color Palette

well_coords = cbind(plydat$Longitude, plydat$Latitude)
pts = SpatialPoints(well_coords)
# deinfe color palette to be used. blue values correspond to fresh water, red to salty water
co = c("blue","steelblue2","steelblue1","seashell1","orangered2","red")

Find Map Center for setView

latCenter = mean(subset(plydat, plydat$DepthClass == "Deep" & plydat$tframe == 3)$Latitude) - .23 # .23 is not arbitrary. it centers the data
lonCenter = mean(subset(plydat, plydat$DepthClass == "Deep" & plydat$tframe == 3)$Longitude) 

Define Map Function

# filter plydat for only observations in time frame t and depth class d
map.TDS = function(t, d){

# create time and depth class (t.d) subset for the t and d arguments  
t.d = subset(plydat, DepthClass == d & 
                    tframe == t) # remove lat and long from df, and filter for desired results

# create logical vector for all t and d indexes to extract coordinates of these observations
t.d.pts = ifelse(plydat$DepthClass == d & 
                  plydat$tframe == t, TRUE, FALSE)

# use logical vector to subset well_coords, and create spatial points data frame for time frame t and depth d
ptsdf = SpatialPointsDataFrame(pts[t.d.pts , ], data = t.d)

# define palette based on time frame t and depth class d observations
# set bins with a vector of breaks, so they are fine from 0 - 1,000 and coarse from 1,000 - 50,000
pal = colorBin(palette = co,
         domain = t.d$Result, bins = c(0,200,400,600,800,1000,
                                        5000,10000,50000))

# overwite t argument inside the function for the legend to display time frame
if(t == 3){t = "(1980-2014)"}
if(t == 2){t = "(1940-1980)"}
if(t == 1){t = "(1902-1940)"}

# generate map assocaited with time frame t and depth d
map = leaflet(data = ptsdf) 
map = addTiles(map)
map %>%
  addCircleMarkers(map,
    lng = well_coords[t.d.pts,1], # longitude
    lat = well_coords[t.d.pts,2], # latitude
    radius = 4, # fixed radius size
    color = ~pal(Result),
    stroke = FALSE, 
    fillOpacity = 0.8,
    popup = paste(ptsdf$Result, " mg/L TDS", "<br>",
                  "Database: ", ptsdf$Database, "<br>",
                  "Well ID: ", ptsdf$Well_ID, "<br>",
                  "Latitude: ", ptsdf$Latitude, "<br>",
                  "Longitude: ", ptsdf$Longitude)
  ) %>%
  
  addLegend("topright", pal = pal, # use custom palette
            values = ~Result,
            title = paste(d, t, sep = " "), # legend displays time frame and depth class as title
            labFormat = labelFormat(suffix = " mg/L"),
            opacity = 1
  ) %>%
  
  addProviderTiles(providers$Esri.WorldTerrain # override default map with ESRI world terrain map

  ) %>%
  
  setView(lng = lonCenter, lat = latCenter, zoom = 8) # iteratively changed zoom until it centered on the data

}

Results


(1902 - 1940)

Shallow Aquifer

Blue points correspond to fresh water (< 1,000 mg/L TDS), and red points correspond to salty water (> 1,000 mg/L TDS). Click on points to view more information.

Deep Aquifer

Blue points correspond to fresh water (< 1,000 mg/L TDS), and red points correspond to salty water (> 1,000 mg/L TDS). Click on points to view more information.


(1940 - 1980)

Shallow Aquifer

Blue points correspond to fresh water (< 1,000 mg/L TDS), and red points correspond to salty water (> 1,000 mg/L TDS). Click on points to view more information.

Deep Aquifer

Blue points correspond to fresh water (< 1,000 mg/L TDS), and red points correspond to salty water (> 1,000 mg/L TDS). Click on points to view more information.


(1980 - 2014)

Shallow Aquifer

Blue points correspond to fresh water (< 1,000 mg/L TDS), and red points correspond to salty water (> 1,000 mg/L TDS). Click on points to view more information.

Deep Aquifer

Blue points correspond to fresh water (< 1,000 mg/L TDS), and red points correspond to salty water (> 1,000 mg/L TDS). Click on points to view more information.

Appendix: Depth Class

Exerpt from “Initial Conceptual Model (ICM) Technical Services Tasks 7 and 8 – Salt and Nitrate Analysis for the Central Valley Floor and a Focused Analysis of Modesto and Kings Subregions Final Report.” (Larry Walker Associates 2013):

Wells were classified into three depth classes (Shallow, Deep, and Unknown) based on information provided by the original source, as shown in Table 3-2. Most wells in the database did not contain quantitative information on well depth or screened interval; however, other information such as well type was used when available to infer the depth from which a well was sampled. Only the USGS database contained quantitative information regarding well depth. For wells lacking a specified value of well depth, the well type was used to infer the depth (see Table 3-2 for examples). Wells in the DWR, GeotrackerGAMA, and RWQCB WDR Dairy Data databases sometimes contain a description of the well type which enabled categorization of the well into a depth class. All wells from the DPH database were assumed to be drinking water supply wells. Irrigation/agricultural38, industrial, and municipal supply wells were classified as “Deep” whereas domestic wells and monitoring wells were classified as “Shallow”. All DPH wells were therefore classified as “Deep” as these were all assumed to be drinking water supply wells. All other well types were classified as “Unknown”. A large number of USGS wells provided numerical values for well depth; therefore, these were used when provided. USGS wells were assigned a depth class based on the 20-year travel depth for a particular CVHM cell that it was located within. Wells with a depth less than the 20-year travel depth were classified as “Shallow,” and those below the 20-year travel depth were classified as “Deep”. Wells without depth information or a well type were classified as “Unknown”.

References

Larry Walker Associates. 2013. “Initial Conceptual Model (ICM) Technical Services Tasks 7 and 8 – Salt and Nitrate Analysis for the Central Valley Floor and a Focused Analysis of Modesto and Kings Subregions Final Report.”


  1. United States Geologic Survey

  2. California Department of Public Health

  3. Groundwater Ambient Monitoring and Assessment Program

  4. Environmental Defense Fund

  5. California Department of Water Resources