This analysis explores inorganic groundwater contamination from Total Dissolved Solids (TDS) in the Tulare Basin, California. Data was generously provided by CV-SALTS and CDM Smith, and aggregated from:

United States Geologic Survey¹
California Department of Public Health²
Groundwater Ambient Monitoring and Assessment Program³
Environmental Defense Fund⁴
California Department of Water Resources⁵
independent dairy farm monitoring wells

All data cleaning, analysis, and mapping was carried out within R version 1.0.143.

All scripts, versions, data input files, and documentation for this project can be found in this Github repository.

Summary

Groundwater quality data in the Tulare Basin is well-distributed spatially, with more data spatially concentrated near the urban centers of Fresno and Bakersfield.
Temporally, groundwater quality data ranges from 1902-2014. Data is sparse and uneven from 1902-1983, and generally more robust from 1980-2014.
Temporally continuous individual well data is much less common than discontinuous data, which may come from a multitide of wells collected at uneven intervals. Continuous and discontinuous data were examined separately to isolate increases in TDS in individual wells over time, and together to increase regional spatial coverage.
Continuous and discountinuous data from shallow and deep wells show increases in measured TDS, which suggests depreciation of regional groundwater quality in the Tulare Basin.
The lack of a regionally-extensive, long-term groundwater quality monitoring network demonstrates the importance of monitoring water quality on a regional level.

Code

Define Useful Functions

# multiplot function from Winston Chang's "R Cookbook"
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

 if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

Load Data and Required Packages

#setwd("E:/rpauloo/Documents/GitHub/Contaminant-Mapping")
setwd("/Users/richpauloo/Documents/GitHub/Regional_GW_Quality_Stats")

required.packages <- c("sp", "XML", "rgdal", "rgeos", "raster", "ggplot2", 
                       "dismo", "leaflet", "RColorBrewer", "classInt", "dplyr", 
                       "EnvStats", "maps", "mapdata", "cowplot", "scales")
suppressMessages(lapply(required.packages, require, character.only = TRUE))

dat = read.csv(file = "TDS_Features_Tulare_Clip_TableToExcel.csv", stringsAsFactors = FALSE) # 54,171 total observations

# add dates to dat
dat <- dat %>% 
  mutate(Date = as.Date(Samp_Date))

Data Wrangling

# of the shallow and deep wells where we know depth class, how many don't have depth information?
t <- dat %>% 
  filter(DepthClass == "Shallow" | DepthClass == "Deep") %>% 
  transmute(hasdepth = ifelse(!is.na(Depth_ft), TRUE, FALSE))
sum(t$hasdepth)
sum(!t$hasdepth)

# filter observations with depth info, add time_period
dat_with_depth <- dat %>% 
  filter(Depth_ft != "NA" | 0) %>% 
  mutate(time_period = 
           ifelse(Year %in% 1923:1959, "1923-1959",
           ifelse(Year %in% 1960:1969, "1960-1969",
           ifelse(Year %in% 1970:1979, "1970-1979",
           ifelse(Year %in% 1980:1989, "1980-1989",                  
           ifelse(Year %in% 1990:1999, "1990-1999",
           ifelse(Year %in% 2000:2014, "2000-2014", NA))))))
         )

unique(dat_with_depth$Database) # all wells with depth information come from the USGS databse
unique(dat_with_depth$time_period) # all time periods are represented

# visualize
dat_with_depth %>% 
  ggplot() +
  geom_histogram(aes(x = Depth_ft)) + 
  xlab('depth (ft)') + 
  ggtitle('USGS Wells by depth') +
  theme_light()

# most wells are shallow (< 250 feet)

# create depth classes
dat_with_depth$Depth_ft <- round(dat_with_depth$Depth_ft) # round Depth_ft to avoid creating NAs in depth_class

dat_with_depth <- dat_with_depth %>% 
  mutate(depth_class = 
           ifelse(Depth_ft %in% 0:249, "0-249 ft",
           ifelse(Depth_ft %in% 250:749, "250-749 ft",
           ifelse(Depth_ft %in% 750: 3000, "750-3000 ft", NA))))

# visualize
dat_with_depth %>% 
  ggplot() +
  geom_bar(aes(x = depth_class, fill = time_period)) + 
  ggtitle('USGS wells by depth and time period') +
  scale_x_discrete(limits = c('0-249 ft', '250-749 ft' , '750-3000 ft')) +
  theme_light()

Visualize USGS data - Develop Map Function for Time Period

# Is there sampling bias in the wells that have depth information to them? Let's look at the spatial distribution of these USGS wells which we have depth information for.

# define coordinates
well_coords = cbind(dat_with_depth$Longitude, dat_with_depth$Latitude)
pts = SpatialPoints(well_coords)

# deinfe color palette to be used. blue values correspond to fresh water, red to salty water
co = c("blue","steelblue2","steelblue1","seashell1","orangered1","red3")

# define map center for all maps and plug into setView
center_coords <- dat_with_depth %>% 
  filter(depth_class == "750-3000 ft" & time_period == "2000-2014") %>% 
  summarise(latCenter = mean(Latitude) + .23,
            lonCenter = mean(Longitude))

# define map function
map.TDS.period = function(t, d){
  # create time and depth class (t.d) subset for the t and d arguments  
  t.d <-  dat_with_depth %>% 
    filter(time_period == t & depth_class == d) 
  
  # create logical vector for all t and d indexes to extract coordinates of these observations
  t.d.pts = ifelse(dat_with_depth$depth_class == d & 
                   dat_with_depth$time_period == t, TRUE, FALSE)
  
  # use logical vector to subset well_coords, and create spatial points data frame for time frame t and depth d
  ptsdf = SpatialPointsDataFrame(pts[t.d.pts , ], data = t.d)
  
  # define palette based on time frame t and depth class d observations
  # set bins with a vector of breaks, so they are fine from 0 - 1,000 and coarse from 1,000 - 50,000
  pal = colorBin(palette = co,
           domain = t.d$Result, bins = c(0,200,400,600,800,1000,
                                          5000,10000,50000))
  
  # generate map assocaited with time frame t and depth d
  map <- leaflet(data = ptsdf) 
  map <- addTiles(map)
  map %>%
    addCircleMarkers(
      lng = well_coords[t.d.pts,1], # longitude
      lat = well_coords[t.d.pts,2], # latitude
      radius = 4, # fixed radius size
      color = ~pal(Result),
      stroke = FALSE, 
      fillOpacity = 0.8,
      popup = paste(ptsdf$Result, " mg/L TDS", "<br>",
                    "Database: ", ptsdf$Database, "<br>",
                    "Well ID: ", ptsdf$Well_ID, "<br>",
                    "Latitude: ", ptsdf$Latitude, "<br>",
                    "Longitude: ", ptsdf$Longitude)
    ) %>%
    
    addLegend("topright", pal = pal, # use custom palette
              values = ~Result,
              title = paste(t, d, sep = ", "), # legend displays time frame and depth class as title
              labFormat = labelFormat(suffix = " mg/L"),
              opacity = 1
    ) %>%
    
    addProviderTiles(providers$Esri.WorldTerrain # override default map with ESRI world terrain map
  
    ) %>% 
    
    setView(lng = center_coords$lonCenter, lat = center_coords$latCenter, zoom = 8) # iteratively changed zoom until it centered on the data
}

# run map function for all decades and depth classes
t <- c("1923-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1999", "2000-2014")
d <- c('0-249 ft', '250-749 ft', '750-3000 ft')

for(i in t){
  for(j in d){
    print(list(map.TDS.period(i,j)))
  }
}

# spatial coverage of data depends on depth_class and time_frame. Deep and intermediate wells show high density around the urban centers of Fresno and Bakersfield. Shallow well data from 1980-1989, especially in the Eastern SJ Valley is very dense, likely the result of the emergence of non-point source contamination in the region and an effort to assess the magnitude and distribution of these patterns in groundwater quality. This widespread monitoring effort does not continue as time goes on.

Locate Possible Duplicate Wells between USGS and other Databases

dat_USGS <- dat_with_depth
dat_other <- dat %>% 
  filter(Database != "USGS")

merge(dat_USGS, dat_other, by = c("Latitude", "Longitude"))

# no well lat/lng coordiantes within the USGS database match well coordinates within the other databases. This indicates that USGS has never taken over wells from another database, or passed on responsibility for monitoring its own wells. This means we cannot infer the depth of wells from other databases based on the Depth_ft information we have in the USGS database.

Depth and TDS initial condition (1960)

# for the earliest decade:
boundary_condition <- dat_with_depth %>% 
  filter(Year <= 1960, Result <= 2000) %>% # start near the beginning of 1960 to set broundary conditions of model, filter out excessively contaminated wells to see the actaul pattern in groundwater TDS
  ggplot() +
  geom_point(aes(x = -Depth_ft, y = Result, color = depth_class)) +
  geom_smooth(aes(x = -Depth_ft, y = Result), color = "black") +
  ylab('TDS (mg/L)') +
  xlab('Depth (ft)') + 
  labs(color = 'Depth Class') + 
  coord_flip() +
  theme_light()

# view plot
boundary_condition

# extract regression model
boundary_dat <- ggplot_build(boundary_condition)$data[[2]]

ggplot(boundary_dat) +
  geom_point(aes(x = y, y = x))

# we will use this data to set the boundary conditions for the mixing cell model

Prepare Basemaps Maps for Next Sections

# create dataframe with coordinates for Fresno and Bakersfield, to act as markers
cities <- data.frame(city = c('Fresno', 'Bakersfield', 'Visalia'),
                     longitude = c(-119.7726, -119.0187, -119.2921), 
                     latitude = c(36.7468, 35.3733, 36.3302)
                     )

# load Bulletin 118 GW basin shapefiles
library(rgdal)
library(maptools)
filename <- "/Users/richpauloo/Documents/GitHub/Regional_GW_Quality_Stats/B118_GW_Basins/I08_B118_CA_GroundwaterBasins.shp"
s <- shapefile(filename)

# get tulare basin and other gw basins
library(spdplyr) # load spdplyr for operating on spatial objects
s %>% 
  filter(Basin_ID == '5-22') %>% 
  arrange(Basin_Subb) -> tulare_alluvial
tulare_alluvial <- tulare_alluvial[8:14, ] # subset for Tulare Basin subbasins

# string of subbasin names for subsetting for non-tulare basins
sbs <- c(paste('5', paste('0', 1:9, sep=""), sep="-"), paste('5', 10:22, sep="-"))

other_cv_basins <- s %>% 
  filter(Basin_ID %in% sbs & !is.na(Subbasin_N)) %>% 
  filter(Basin_Subb != '5-12.01' & Basin_Subb != '5-12.02' & Basin_Subb != '5-02.01' & Basin_Subb != '5-02.02' & Basin_Subb != '5-01.01' & Basin_Subb != '5-01.02')

# define projections as NAD83
tulare_alluvial <- spTransform(tulare_alluvial, CRS("+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs +towgs84=0,0,0"))
other_cv_basins <- spTransform(other_cv_basins, CRS("+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs +towgs84=0,0,0"))

# get California polygon from maps library
library(maps)
library(mapdata)
states <- map_data("state")
ca_df <- subset(states, region == "california")

# first plot the california base map
ca_base <- ggplot(data = ca_df, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "white") +
  geom_polygon(data = other_cv_basins , aes(x = long, y = lat, group = group), color = 'black', fill = 'white') +
  geom_polygon(data = tulare_alluvial, fill = "cornsilk2", color = 'black') +
  theme_void()

# now create the tulare base map to use in later maps
tulare_base <- ggplot() + 
  coord_fixed(1.3) + 
  geom_polygon(data = tulare_alluvial, aes(x = long, y = lat, group = group), color = "black", fill = "cornsilk2") +
  theme_void()

# create another tulare base map with cities for displaying now. later the cities will be layered after the points for readability.
tulare_base_cities <- ggplot() + 
  coord_fixed(1.3) + 
  geom_polygon(data = tulare_alluvial, aes(x = long, y = lat, group = group), color = "black", fill = "cornsilk2") +
  geom_point(data = cities, aes(x = longitude, y = latitude)) +
  geom_text(data = cities, aes(x = longitude, y = latitude, label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) +
  theme_void()

# extract coords of tulare_alluvial for annotations
str(tulare_alluvial)
tulare_coords <- tulare_alluvial@polygons[[1]]@Polygons[[1]]@coords
min_x <- min(tulare_coords[,1])
min_y <- min(tulare_coords[,2])

# plot California and Tulare together
multiplot(ca_base, tulare_base_cities, cols = 2)

# with cowplot
map_base_inset <- ggdraw() +
  draw_plot(tulare_base_cities) +
  draw_plot(ca_base, 0.05, 0.1, 0.4, 0.4)

Examine All Shallow and Deep Well Data

library(cowplot)
# all shallow well observations
all_shallow <- dat %>% 
  filter(DepthClass == "Shallow") %>% 
  mutate(has_depth_data = ifelse(!is.na(Depth_ft), TRUE, FALSE)) 

# unique shallow well observations
unique_shallow <- all_shallow[!duplicated(all_shallow$Well_ID), ] # subsets for all unique Well_ID

# map of unique shallow wells
map_unique_shallow <- tulare_base +
  geom_point(data = unique_shallow, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) + 
  labs(caption = "(a)") + 
  annotate("text", 
           x = min_x -.2, y = min_y - .45, 
           label = paste("n = ", length(unique_shallow$Well_ID))
           ) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position = c(.8,.8))

# stacked bar plot of unique shallow wells with fill = has_depth_data
p_unique_shallow <- unique_shallow %>% 
  ggplot(aes(x = Year)) + 
  geom_bar(color = "black", aes(fill = Database)) +
  xlab('Year') +
  ylab('Observations') +
  xlim(1940, 2020) +
  guides(fill = FALSE) +
  theme(plot.margin = unit(c(0,0,.5,.5), "cm"))


# unique shallow with depth observations
unique_shallow_with_depth <- unique_shallow %>% 
  filter(!is.na(Depth_ft))

# calculate mean and sd from lognormal distribution using Cox method (Land, 1971)
library(EnvStats)
lognorm_shallow <- elnormAlt(unique_shallow_with_depth$Depth_ft, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "land", conf.level = 0.95)

shallow_mean <- round(lognorm_shallow$parameters[1], 2)
shallow_sd <- round(lognorm_shallow$parameters[1] * lognorm_shallow$parameters[2], 2)


# plot histogram of unique shallow wells with depth information
p_unique_shallow_with_depth <- ggplot(unique_shallow_with_depth) +
  geom_histogram(aes(x = Depth_ft), bins = 100, color = "black", fill = '#00BFC4') + 
  annotate("text", 
           y = 150, x = 400, 
           label = paste("n = ", length(unique_shallow_with_depth$Well_ID), "\n",
                         "μ = ", shallow_mean, "\n",
                         "σ = ", shallow_sd)) +
  xlab('Well Depth (ft)') +
  theme(plot.margin = unit(c(.5,0,0,.5), "cm"))


# Now let's repeat these analyses for deep wells:

# all deep well observations
all_deep <- dat %>% 
  filter(DepthClass == "Deep") %>% 
  mutate(has_depth_data = ifelse(!is.na(Depth_ft), TRUE, FALSE)) 

# unique shallow well observations
unique_deep <- all_deep[!duplicated(all_deep$Well_ID), ] # subsets for all unique Well_ID

# map of unique deep wells
map_unique_deep <- tulare_base +
  geom_point(data = unique_deep, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) + 
  labs(caption = "(a)") + 
  annotate("text", 
           x = min_x -.2, y = min_y - .45, 
           label = paste("n = ", length(unique_deep$Well_ID))
           ) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position= c(0.8, 0.8))

# stacked bar plot of unique shallow wells with fill = has_depth_data
p_unique_deep <- unique_deep %>% 
  ggplot(aes(x = Year)) + 
  geom_bar(color = "black", aes(fill = Database)) +
  xlab('Year') +
  ylab('Observations') +
  labs(caption = "(b)") +
  xlim(1940, 2020) +
  guides(fill = FALSE) +
  theme(plot.margin = unit(c(0,0,.5,.5), "cm"))

# unique shallow with depth observations
unique_deep_with_depth <- unique_deep %>% 
  filter(!is.na(Depth_ft))

# calculate mean and sd from lognormal distribution using Cox method (Land, 1971)
library(EnvStats)
lognorm_deep <- elnormAlt(unique_deep_with_depth$Depth_ft, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "land", conf.level = 0.95)

deep_mean <- round(lognorm_deep$parameters[1], 2)
deep_sd <- round(lognorm_deep$parameters[1] * lognorm_deep$parameters[2], 2)

# plot histogram 
p_unique_deep_with_depth <- ggplot(unique_deep_with_depth, aes(x = Depth_ft)) +
  geom_histogram(bins = 50, color = "black", fill = '#00BFC4') + 
  xlab('Well Depth (ft)') +
  labs(caption = "(c)") +
  annotate("text", 
           y = 35, x = 1500, 
           label = paste("n = ", length(unique_deep_with_depth$Well_ID), "\n",
                         "μ = ", deep_mean, "\n",
                         "σ = ", deep_sd)) +
  theme(plot.margin = unit(c(.5,0,0,.5), "cm"))


# draw inset maps
map_unique_shallow_inset <- ggdraw() +
  draw_plot(map_unique_shallow) +
  draw_plot(ca_base, 0.05, 0.05, 0.35, 0.35)

map_unique_deep_inset <- ggdraw() +
  draw_plot(map_unique_deep) +
  draw_plot(ca_base, 0.05, 0.05, 0.35, 0.35)

GW Quality Changes in individual wells (Shallow)

# Shallow wells with Continuous (n >= 25) data
shallow_continuous_early <- dat[ave(dat$Year, dat$Well_ID, FUN = length) >= 25, ] %>% 
  filter(DepthClass == "Shallow" & Year < 2005) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

shallow_continuous_late <- dat[ave(dat$Year, dat$Well_ID, FUN = length) >= 25, ] %>% 
  filter(DepthClass == "Shallow" & Year >= 2005) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))
  
p_shallow_continuous <- 
  ggplot() + 
  geom_line(data = shallow_continuous_early, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = shallow_continuous_early, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  geom_line(data = shallow_continuous_late, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = shallow_continuous_late, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  geom_smooth(data = shallow_continuous_late, color = 'black', se = FALSE, aes(x = Date, y = Result_mean)) +
  xlab('Time') +
  ylab('TDS (mg/L)') + 
  xlim(as.Date("1940", "%Y"), as.Date("2020", "%Y")) +
  scale_y_log10(breaks = c(100,1000,10000, 100000), labels = comma(c(100,1000,10000, 100000)), limits = c(100,100000)) +
  guides(color = FALSE) +
  background_grid(major = "xy", minor = "none") +
  theme(plot.margin = unit(c(0,0,0,.5), "cm"))

# map for continuous shallow wells
map_shallow_continuous <- tulare_base +
  geom_point(data = shallow_continuous_early, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = shallow_continuous_late, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) +
  annotate("text", 
           x = min_x -.2, y = min_y - .45, 
           label = paste("n = ", as.character(
             length(unique(shallow_continuous_early$Well_ID)) + 
             length(unique(shallow_continuous_late$Well_ID))), sep = "")) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position = c(.8,.8))

## Now let's look at discontinuous shallow well data

# Shallow wells with Discontinous (n >= 0) data
shallow_discontinuous_late <- dat %>% 
  filter(DepthClass == "Shallow" & Year >= 1993) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result)) 

shallow_discontinuous_middle <- dat %>% 
  filter(DepthClass == "Shallow" & Year < 1993 & Year >= 1984) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

shallow_discontinuous_early <- dat %>% 
  filter(DepthClass == "Shallow" & Year < 1984) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

p_shallow_discontinuous <- 
  ggplot() + 
  geom_line(data = shallow_discontinuous_late, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = shallow_discontinuous_late, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  geom_smooth(data = shallow_discontinuous_late, color = 'black', se = FALSE, aes(x = Date, y = Result_mean)) +
  geom_line(data = shallow_discontinuous_middle, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = shallow_discontinuous_middle, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  geom_smooth(data = shallow_discontinuous_middle, color = 'black', se = FALSE, aes(x = Date, y = Result_mean)) +
  geom_line(data = shallow_discontinuous_early, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = shallow_discontinuous_early, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  xlab('Time') +
  ylab('TDS (mg/L)') + 
  scale_y_log10(breaks = c(100,1000,10000, 100000), labels = comma(c(100,1000,10000, 100000)), limits = c(100,100000)) +
  xlim(as.Date("1940", "%Y"), as.Date("2020", "%Y")) +
  guides(color = FALSE) +
  background_grid(major = "xy", minor = "none") +
  theme(plot.margin = unit(c(0,0,0,.5), "cm"))

# map for discontinuous shallow wells
map_shallow_discontinuous <- tulare_base +
  geom_point(data = shallow_discontinuous_early, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = shallow_discontinuous_middle, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = shallow_discontinuous_late, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) +
  annotate("text", 
           x = min_x -.2, y = min_y - .45, 
           label = paste("n = ", as.character(
             length(unique(shallow_discontinuous_early$Well_ID)) +
             length(unique(shallow_discontinuous_middle$Well_ID)) +
             length(unique(shallow_discontinuous_late$Well_ID))), sep = "")) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position = c(.8,.8))

# draw inset maps
map_shallow_discontinuous_inset <- ggdraw() +
  draw_plot(map_shallow_discontinuous) +
  draw_plot(ca_base, 0.05, 0.05, 0.35, 0.35)

map_shallow_continuous_inset <- ggdraw() +
  draw_plot(map_shallow_continuous) +
  draw_plot(ca_base, 0.05, 0.05, 0.35, 0.35)

GW Quality Changes in individual wells (Deep)

# start working with complete data (dat) rather than data with explicit depth (dat_with_depth)
  
# Deep wells with Continuous (n >= 25) data
deep_continuous_early <- dat[ave(dat$Year, dat$Well_ID, FUN = length) >= 25, ] %>% 
  filter(DepthClass == "Deep" & Year < 1980) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

deep_continuous_late <- dat[ave(dat$Year, dat$Well_ID, FUN = length) >= 25, ] %>% 
  filter(DepthClass == "Deep" & Year >= 1980) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

p_deep_continuous <-
  ggplot() + 
  geom_line(data = deep_continuous_late, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = deep_continuous_late, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  geom_smooth(data = deep_continuous_late, color = 'black', se = FALSE, aes(x = Date, y = Result_mean)) +
  geom_line(data = deep_continuous_early, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = deep_continuous_early, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  xlab('Time') +
  ylab('TDS (mg/L)') + 
  xlim(as.Date("1940", "%Y"), as.Date("2020", "%Y")) +
  scale_y_log10(breaks = c(100,1000,10000, 100000), labels = comma(c(100,1000,10000, 100000)), limits = c(100,100000)) +
  guides(color = FALSE) +
  background_grid(major = "xy", minor = "none") +
  theme(plot.margin = unit(c(0,0,0,.5), "cm"))

# map for continuous deep wells
map_deep_continuous <- tulare_base +
  geom_point(data = deep_continuous_early, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = deep_continuous_late, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) +
  annotate("text", 
           x = min_x + .2, y = min_y - .45,
           label = paste("n = ", as.character(
             length(unique(deep_continuous_early$Well_ID)) +
             length(unique(deep_continuous_late$Well_ID))), sep = "")) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position = c(.8,.8))

           
## Now let's look at deep discontinuous well data

# Deep wells with Discontinous (n >= 0) data
deep_discontinuous_late <- dat %>% 
  filter(DepthClass == "Deep" & Year >= 1970) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

deep_discontinuous_early <- dat %>% 
  filter(DepthClass == "Deep" & Year < 1970 & Year > 1930) %>% 
  group_by(Date, Well_ID, Latitude, Longitude, Database) %>% 
  dplyr::summarise(Result_mean = mean(Result))

p_deep_discontinuous <-
  ggplot() + 
  geom_line(data = deep_discontinuous_late, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = deep_discontinuous_late, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  geom_smooth(data = deep_discontinuous_late, color = 'black', se = FALSE, aes(x = Date, y = Result_mean)) +
  geom_line(data = deep_discontinuous_early, aes(x = Date, y = Result_mean, group = Well_ID), alpha = 1/5) + 
  geom_point(data = deep_discontinuous_early, aes(x = Date, y = Result_mean, group = Well_ID, color = Database), alpha = 1/3) + 
  xlab('Time') +
  ylab('TDS (mg/L)') + 
  xlim(as.Date("1940", "%Y"), as.Date("2020", "%Y")) +
  scale_y_log10(breaks = c(100,1000,10000, 100000), labels = comma(c(100,1000,10000, 100000)), limits = c(100,100000)) +
  guides(color = FALSE) +
  background_grid(major = "xy", minor = "none") +
  theme(plot.margin = unit(c(0,0,0,.5), "cm"))

# map for discontinuous deep wells
map_deep_discontinuous <- tulare_base +
  geom_point(data = deep_discontinuous_early, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5, alpha = 1/3) +
  geom_point(data = deep_discontinuous_late, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5, alpha = 1/3) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) +
  annotate("text", 
           x = min_x + .2, y = min_y - .45,
           label = paste("n = ", as.character(
             length(unique(deep_discontinuous_late$Well_ID)) +
             length(unique(deep_discontinuous_early$Well_ID))), sep = "")) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position = c(.8,.8))

# facet map for discontinuous deep wells
# get count data for each facet
deep_facet_count <- dat %>% 
  filter(DepthClass == "Deep" & Year >= 1970) %>% 
  group_by(Database) %>% 
  dplyr::summarise(n = paste("n = ", n(), sep =""))

# make the facet map
map_deep_discontinuous_facet <- tulare_base +
  geom_point(data = deep_discontinuous_early, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5, alpha = 1/3) +
  geom_point(data = deep_discontinuous_late, aes(x = Longitude, y = Latitude, group = Well_ID, color = Database), size = .5, alpha = 1/3) +
  geom_point(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city))) +
  geom_text(data = cities, mapping = aes(x = longitude, y = latitude, group = factor(city), label = city), vjust = 0, nudge_y = -0.1, nudge_x = -0.1) +
  theme(plot.margin = unit(c(0,.5,0,0), "cm"), legend.position = c(.8,.3)) + 
  facet_wrap(~Database) + 
  geom_text(data = deep_facet_count, aes(min_x + .2, min_y - 1.1, label = n), parse = FALSE)

# draw inset maps
map_deep_discontinuous_inset <- ggdraw() +
  draw_plot(map_deep_discontinuous) +
  draw_plot(ca_base, 0.05, 0.05, 0.35, 0.35)

map_deep_continuous_inset <- ggdraw() +
  draw_plot(map_deep_continuous) +
  draw_plot(ca_base, 0.05, 0.05, 0.35, 0.35)

Results & Discussion

Study Area

The Tulare basin comprises Fresno, Kings, Kern, and Tualre county and covers 502,5110 acres (roughly 200,000 km\(^2\)) of the southern portion of California’s Central Valley. The Tulare basin is an arid, agriculturally-dominated landscape, and heavily dependent on groundwater. Notable centers of high population density include the cities of Fresno, Visalia, and Bakersfield. Data were stratified by depth to represent: (1) a shallow depth class generally representing domestic drinking-water supply wells, and (2) a deep depth class generally representing municipal drinking-water supply wells and agricultural production wells.

Figure 1: Study Area

Shallow Wells

When depth information was not present, Domestic and Monitoring wells received a classification of “Shallow” (for a full description of well classification, see the Appendix). Not all wells came with a recorded depth. Of the 998 shallow wells in this study, roughly half (468) contained explicit depth information, and all of these wells with depth information came from the USGS database. These wells are shown in blue in Figure 2.

Shallow wells with depth data are well distributed spatially. They were mostly sampled during the period from 1980-2000, and have a mean depth of 67.67 ft., which agrees with the estimate of 60.95 ft., calculated with the Land Method (Land 1971) from a lognormal distribution fitted to these data. This analysis uses all 998 wells classified as “Shallow”; the separation of wells based on depth data served to estimate the mean depth of the wells for which depth data was present.

Figure 2: All Shallow Wells

(A) All unique shallow well observations, including wells with continuous and discontinuous data. (B) All unique shallow wells with depth data are shown in blue, whereas shallow wells without explicit depth data are shown in red. (C) All unique shallow wells with depth data follow a lognormal distribution. Parameters were calculated using the Land Method (Land 1971).

Shallow Wells with Continuous Data

Continuous data was defined as a well having >= 25 observations, independent of time. That is to say, a well with 25 observations on a single day counted as a well with “continuous data”. Only 52 wells offered continuous data, and these data offer poor spatial coverage. Within these wells, continuous data begins around 1983. From the 10-year time period between 2005-2014, where the data is clustered, average groundwater TDS in these wells increases by roughly 4,000 mg/L, although much greater changes in TDS are observed in some wells rather than others. This indicates that in some areas, groundwater quality deterioration proceeds at a higher rate than in others.

Figure 3: Shallow Wells with Continuous Data

(a) Unique shallow well observations with continuous (>= 25 observations) data are well-dispersed spatially. (b) A smoothed regression through the continuous data in the time from 2005-2014, shows an increasing trend in regional GW TDS.

Shallow Wells with Continuous and Discontinuous Data

Wells with discontinuous (< 25 observations) data make up the vast majority of the 998 Shallow Wells. They are well distributed spatially, and cluster between the dates of 1984-1993 and 1993-2014. Smoothed averages over these clusters of data show increases in TDS by roughly 5,000 and 2,000 mg/L TDS respectively. This increasing trend agrees with the data observed in continuous shallow well data.

Figure 4: Shallow Wells with Continuous and Discontinuous Data

(A) Unique shallow well observations with continuous (>= 25 observations) and discontnuous data are well-dispersed spatially. 18 outlying wells were removed. (B) A smoothed regression line in two time regions, 1984-1993 and 1993-2014, where data is clustered, shows increasing trends in regional GW quality in shallow wells.

Deep Wells

In the absence of depth data, Irrigation, Industrial, and Municipal Supply Wells were given a classification of “Deep” (for more detail on depth classification, see the Appendix). 616 of the 2,776 Deep Wells came from the USGS database, and thus contained depth information. These deep wells are well-distributed and their mean depth is calculated as 653.81 ft, which agrees with the estimate from a lognormal distribution model of 671.61 feet.

Note the scale of the y axis between the shallow and deep wells varies by an order of magnitude. Inorganic water quality is generally better in deep wells, although the continuous and discontinuous deep well data suggests it declines in inorganic water quality.

Figure 5: All Deep Wells

(A) All unique deep well observations, including wells with continuous and discontinuous data. (B) All unique deep wells with depth data are shown in blue, whereas deep wells without explicit depth data are shown in red. (C) All unique deep wells with depth data follow a lognormal distribution. Parameters were calculated using the Land Method (Land 1971).

Deep Wells with Continuous Data

Like shallow wells, deep wells with continuous data are not very well-distributed, highlighting the lack of robust water quality data in the region. 118 deep wells offer continuous data. Nonetheless, these continuous data offer greater temporal coverage than shallow well data (the result of municipal supply well sampling protocol) and show a clear increasing signal from the period between 1980-2014 (Figure 6).

Figure 6: Deep Wells with Continuous Data

(A) Deep wells with continuous (>= 25 observations) data are well-dispersed spatially. (B) A smoothed regression line from 1980-2014, where data is clustered, shows an increasing trend in regional GW TDS in deep wells.

Deep Wells with Continuous and Discontinuous Data

When discontinuous data is considered, spatial coverage of deep wells vastly increases. Deep wells are clustered around the urban centers of Fresno, Visalia and Bakersfield which is unsurprising, as industrial, agricultural, and notably, municipal supply water use converges on these urban loci The data temporally clusters around sampling events, and average groundwater TDS tends to increase from 1970-2014.

Figure 6: Deep Wells with Continuous Data

(A) Unique deep well observations with continuous (>= 25 observations) and discontnuous data are well-dispersed spatially. 20 outlying wells were removed. (B) A smoothed regression line from 1970-2014, where data is clustered, shows an increasing trend in regional GW TDS in deep wells. (C) Deep groundwater quality data by Database. Counts refer to the number of unique observations.

Appendix

Depth Class

Exerpt from “Initial Conceptual Model (ICM) Technical Services Tasks 7 and 8 – Salt and Nitrate Analysis for the Central Valley Floor and a Focused Analysis of Modesto and Kings Subregions Final Report.” (Larry Walker Associates 2013):

Wells were classified into three depth classes (Shallow, Deep, and Unknown) based on information provided by the original source, as shown in Table 3-2. Most wells in the database did not contain quantitative information on well depth or screened interval; however, other information such as well type was used when available to infer the depth from which a well was sampled. Only the USGS database contained quantitative information regarding well depth. For wells lacking a specified value of well depth, the well type was used to infer the depth (see Table 3-2 for examples). Wells in the DWR, GeotrackerGAMA, and RWQCB WDR Dairy Data databases sometimes contain a description of the well type which enabled categorization of the well into a depth class. All wells from the DPH database were assumed to be drinking water supply wells. Irrigation/agricultural, industrial, and municipal supply wells were classified as “Deep” whereas domestic wells and monitoring wells were classified as “Shallow”. All DPH wells were therefore classified as “Deep” as these were all assumed to be drinking water supply wells. All other well types were classified as “Unknown”. A large number of USGS wells provided numerical values for well depth; therefore, these were used when provided. USGS wells were assigned a depth class based on the 20-year travel depth for a particular CVHM cell that it was located within. Wells with a depth less than the 20-year travel depth were classified as “Shallow,” and those below the 20-year travel depth were classified as “Deep”. Wells without depth information or a well type were classified as “Unknown”.

References

Land, Charles E. 1971. “Confidence Intervals for Linear Functions of the Normal Mean and Variance.” The Annals of Mathematical Statistics. doi:10.1214/aoms/1177693235.

Larry Walker Associates. 2013. “Initial Conceptual Model (ICM) Technical Services Tasks 7 and 8 – Salt and Nitrate Analysis for the Central Valley Floor and a Focused Analysis of Modesto and Kings Subregions Final Report.”

Regional Inorganic Groundwater Quality Trends in California’s Tulare Basin

Rich Pauloo

August 22, 2017

This analysis explores inorganic groundwater contamination from Total Dissolved Solids (TDS) in the Tulare Basin, California. Data was generously provided by CV-SALTS and CDM Smith, and aggregated from:

All data cleaning, analysis, and mapping was carried out within R version 1.0.143.

All scripts, versions, data input files, and documentation for this project can be found in this Github repository.

Summary

Code

Define Useful Functions

Load Data and Required Packages

Data Wrangling

Visualize USGS data - Develop Map Function for Time Period

Locate Possible Duplicate Wells between USGS and other Databases

Depth and TDS initial condition (1960)

Prepare Basemaps Maps for Next Sections

Examine All Shallow and Deep Well Data

GW Quality Changes in individual wells (Shallow)

GW Quality Changes in individual wells (Deep)

Results & Discussion

Study Area

Figure 1: Study Area

Shallow Wells

Figure 2: All Shallow Wells

Shallow Wells with Continuous Data

Figure 3: Shallow Wells with Continuous Data

(a) Unique shallow well observations with continuous (>= 25 observations) data are well-dispersed spatially. (b) A smoothed regression through the continuous data in the time from 2005-2014, shows an increasing trend in regional GW TDS.

Shallow Wells with Continuous and Discontinuous Data

Figure 4: Shallow Wells with Continuous and Discontinuous Data

Deep Wells

Note the scale of the y axis between the shallow and deep wells varies by an order of magnitude. Inorganic water quality is generally better in deep wells, although the continuous and discontinuous deep well data suggests it declines in inorganic water quality.

Figure 5: All Deep Wells

Deep Wells with Continuous Data

Figure 6: Deep Wells with Continuous Data

(A) Deep wells with continuous (>= 25 observations) data are well-dispersed spatially. (B) A smoothed regression line from 1980-2014, where data is clustered, shows an increasing trend in regional GW TDS in deep wells.

Deep Wells with Continuous and Discontinuous Data

Figure 6: Deep Wells with Continuous Data

Appendix

Depth Class

Exerpt from “Initial Conceptual Model (ICM) Technical Services Tasks 7 and 8 – Salt and Nitrate Analysis for the Central Valley Floor and a Focused Analysis of Modesto and Kings Subregions Final Report.” (Larry Walker Associates 2013):

References