Working Title: An R workflow to process, analyze, and visualize select water quality data.


Import and prepare the original Excel dataset so that we can extract the necessary tibbles for analysis and visualization.

setwd("~/Documents/Hunter College/DataVisR/Week_11-15/Data")

# Import starting Excel data
df <- read_excel("AF210121 MAIN DATABASE WQ & PUMP_WY2023.xlsx", sheet = "Data", col_types = "text")
## New names:
## • `` -> `...1`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...44`
## • `` -> `...45`
## • `WLs` -> `WLs...46`
## • `WLs` -> `WLs...47`
## • `WLs` -> `WLs...48`
## • `WLs` -> `WLs...49`
## • `WLs` -> `WLs...50`
## • `CHL` -> `CHL...52`
## • `CHL` -> `CHL...53`
## • `CHL` -> `CHL...54`
## • `CHL` -> `CHL...55`
## • `TDS` -> `TDS...56`
## • `TDS` -> `TDS...57`
## • `TDS` -> `TDS...58`
## • `TDS` -> `TDS...59`
## • `SO4` -> `SO4...60`
## • `SO4` -> `SO4...61`
## • `SO4` -> `SO4...62`
## • `SO4` -> `SO4...63`
# Edit the column names so that they occupy one row 
names(df) <- paste(names(df), df[1, ], sep = "_") #merge row 1 and 2, separated by "_"
df <- df[-1,] #remove the 2nd, now empty, row
colnames(df)[1] <- "Date" #rename the first column "Date"


Recreating Figure 3.1 “Period of Record - Monitor Well 4 (4S)”


To do so, we need to isolate two tibbles:
1. Water Level vs. Date, i.e. columns named “WLs…46_Date” and “WLs…47_MW4”
2. Chloride Concentration vs. Date

Water Level

# Make tibble for Monitoring Well (MW) #4
waterlevel_MW4 <- df %>% 
  select(WLs...46_Date, WLs...47_MW4) %>% #pull only columns with date and MW #4 water level
  rename(Date = "WLs...46_Date", waterlevel = "WLs...47_MW4") %>% 
  na.omit() %>%
  mutate(waterlevel = as.numeric(waterlevel)) %>% 
  mutate(Date = as.numeric(Date)) %>% #R is currently reading the columns as characters. Convert values to num.
  filter(Date > 0 ) %>% #keep only the rows with valid dates
  mutate(Date = as.Date(Date, origin = "1899-12-30")) #convert dates to legible format

# Make plot
waterlevel_colors <- c("Water Level" = "cyan2", "LOESS Water Level" = "black") #set line colors for plot

wl_MW4_plot <- ggplot(waterlevel_MW4, aes(x = Date)) +
  geom_line(aes(y = waterlevel, color = "Water Level"), #add a line plotting water level over time
            linewidth = 0.5) +
  geom_smooth(aes(y = waterlevel, color = "LOESS Water Level"), 
              method = "loess", #apply LOWESS aka LOESS smoothing line
              span = .15, #the span, or smoothness, matched to the original figure
              se = FALSE,
              linewidth = 1,
              linetype = "dashed",
              alpha = 0.6) +
  labs(y = "Water Level (ft)",
       x = "Date",
       title = "Figure 3.1  Period of Record - Monitor Well 4 (4S)") +
  scale_color_manual(values = waterlevel_colors)+
  theme(
    legend.title = element_blank(),
    legend.position = c(.35, .95),
    legend.justification = c("right", "top"),
    legend.box.just = "right",
    legend.margin = margin(6, 6, 6, 6),
    legend.box.background = element_rect(color="black", linewidth=0.5)
  )
## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
wl_MW4_plot
## `geom_smooth()` using formula = 'y ~ x'