Time to knit: 1

Updated: 2021 April 13

Introduction

This document and its code scrape data from the scattered well logger files and combine them into a single analyzable file. In addition to the screening and merging of over 179 files, the code conducts basic exploratory analyses of the well logger data, primarily for further QA/QC.

The inherited state of organization for the data logger files was a mess. There are dozens of partially overlapping and inconsistently named files to munge. The DCC data (the data that was added to the DCC library repository) are assumed to be reliable.

The code scrapes data from raw logger files discovered while searching for data in the different shared dropbox/drive folders and DK hard drives.

DCC URL: https://mountainscholar.org/handle/10217/195563

The DCC data were processed by D. Kotter to yield relative water table elevations, while newer files need correction for stickup/hang depth. These data are presumed to exist somewhere and are needed to evaluate possible errors in the barometric corrections made from the sole barotroll (Tower) found for the post-DCC period. As indicated, these newer data are raw and need connection to field measurements (e.g., hang height, well stickup). Combined data retain file_name and serial number as attributes to aid the necessary detective work using field notes.
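
For orientation, here is a minimal sketch of the intended correction, with hypothetical function and argument names (hang heights and stickups must come from field notes; the psi-to-head factor matches the one used in the code below):

## hypothetical helper, not from the original files
rwte_from_logger <- function(psi_logger, psi_baro, hang_cm, stickup_cm = 0) {
  head_cm <- (psi_logger - psi_baro) * 0.70324961490205 * 100  # psi -> cm of water head
  hang_cm - head_cm - stickup_cm  # relative water table elevation (cm)
}

## example with made-up values
rwte_from_logger(psi_logger = 12.5, psi_baro = 11.9, hang_cm = 150, stickup_cm = 30)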

Loggers vary in their start dates and logging intervals, so to synchronize them, data were reduced to a daily time step for export.

Time to knit: 6

Time to knit: 0

2012-2015 (DCC Collection)

Data import and initial inspection

# import of tsv 
dcc.raw <- readr::read_tsv("data/NSF_DataArchive20180208/Well_Logger_Data_2012-2015/Yell_Well_Data_Logger2012_2015.txt")

dcc <- dcc.raw %>% 
  clean_names() %>% 
  tibble::rownames_to_column(var = "RecordID") 

Time to knit: 1

There are various issues apparent on inspection. For example:

There are corrupt data in the DCC file: no usable time stamps; everything is character rather than date format.
“date_and_time” is not formatted as dttm, just a simple date, and is redundant with ‘date’.
The ‘time’ column is not, in fact, time data; we may wish to find the original logger data. Daily aggregation may work despite this; otherwise, it is not clear what to do with the 6 per-day records.
The ‘seconds’ column ranges from 0 to about 1.05 × 10^8. I guess this is cumulative time elapsed, but from when to when?
I’m still of the mind that daily aggregates are the way to go forward…

## data type changes
dcc <- dcc %>% 
  mutate(date = mdy(date))

## change case to ease merging with non-DCC data
dcc <- dcc %>% 
  mutate(site = tolower(site)) %>% 
  mutate(wat_id2 = tolower(wat_id2)) %>% 
  mutate(year = as.integer(year)) %>% 
  rename(serial_number = serial_num)

Time to knit: 0

## make lower case
dcc <- dcc %>% 
  mutate_if(.predicate = is.character,.funs = tolower)

## modify "site" to be consistent with usage elsewhere
dcc <- dcc %>%
  rename(site.dcc = site) %>% 
  mutate(site = case_when(
    grepl("eb1", site.dcc) ~ "eb1",
    grepl("eb2", site.dcc) ~ "eb2",
    grepl("elk1", site.dcc) ~ "elk1",
    grepl("elk4", site.dcc) ~ "elk4",
    grepl("elkdx", site.dcc) ~ "elk",
    grepl("elkcc", site.dcc) ~ "elk",
    grepl("wbcc", site.dcc) ~ "wb",
    grepl("wbdx", site.dcc) ~ "wb",
    TRUE ~ site.dcc))

## add "plot" consistent with usage elsewhere
dcc <- dcc %>% 
  mutate(plot = case_when(
    grepl("cc", site.dcc) ~ "cc",
    grepl("dx", site.dcc) ~ "dx",
    grepl("dc", site.dcc) ~ "dc",
    grepl("cx", site.dcc) ~ "cx",
    TRUE ~ "obs")) %>% 
  mutate(site2 = paste0(site,"-",plot)) %>% 
  select(-site.dcc)

Time to knit: 1

# create a lookup (lu) of serial_number to wat_id2
lu.serialNum.watId2 <- dcc %>%
  distinct(serial_number, wat_id2)

Time to knit: 0

visdat::vis_dat(dcc, warn_large_data = FALSE)

dcc %>% 
  distinct(time)

Time to knit: 0

## what are the distinct wells?
# dcc %>% 
#   distinct(site_logger) %>% 
#   gt() %>% 
#   tab_header(title = "DCC logger data", subtitle = "Distinct 'site_logger'")

# note there are 19 distinct "site_logger" values and 19 distinct "serial_num"
dcc %>% 
  distinct(site, serial_number) %>% 
  gt() %>% 
  tab_header(title = "DCC logger data", subtitle = "Distinct 'site_logger'")
DCC logger data
Distinct 'site_logger'
serial_number site
316609 eb1
316644 eb1
316744 eb2
316695 eb2
316622 eb2
316651 eb2
316605 elk1
316625 elk4
316614 elk
316604 elk
316603 ghole
316724 lava
316640 lb2
316602 lostc
316745 oxbow
316642 rose
316712 wb1
316706 wb
316747 wb

Time to knit: 0

### summarize at a daily time step
dcc.dly.mean <- dcc %>% 
  group_by(site, plot, site2, wat_id2, serial_number, date) %>% 
  summarise(mean.dly.abslevel = mean(abslevel, na.rm = TRUE),
            mean.water_column_m = mean(water_column_m, na.rm = TRUE),
            psi_logger = mean(press_psi, na.rm = TRUE),
            psi_baro = mean(bar_press_psi, na.rm = TRUE))

# create plot var by parsing site2.
dcc.dly.mean <- dcc.dly.mean %>% 
  mutate(plot = case_when(str_detect(site2, "cc") ~ "cc",
                          str_detect(site2, "dc") ~ "dc",
                          str_detect(site2, "dx") ~ "dx",
                          str_detect(site2, "cx") ~ "cx",
                          TRUE ~ "obs"))

## add in year and month columns
dcc.dly.mean <- dcc.dly.mean %>% 
  mutate(yr = lubridate::year(date)) %>% 
  mutate(month = lubridate::month(date)) %>% 
  mutate(doy = lubridate::yday(date)) %>% 
  mutate(doy.d = format(date, format="%m-%d"))

Time to knit: 1

## some plots for QA
dcc.dly.mean %>% 
  mutate(mean.water_column_m = mean.water_column_m*-1) %>%
  # distinct(plot)
  filter(plot != "obs") %>% 
  ggplot(aes(x=date,y=mean.water_column_m)) +
  geom_line(aes(color=serial_number)) +
  geom_hline(yintercept = 0, color = "red", lty = "dashed") +
  labs(title = "DCC: Mean daily 'water_column_m' values") +
  facet_wrap(~site, ncol=2) +
  theme_minimal() +
  labs(x = "DOY", y = "mean.dly.abslevel", caption = "mean.dly.abslevel = mean(abslevel,na.rm = TRUE)")

Time to knit: 0

dcc.dly.mean %>% 
  filter(plot != "obs") %>% 
  ggplot(aes(x=date,y=mean.dly.abslevel)) +
  geom_line(color="blue") +
  labs(title = "DCC: Mean daily 'abslevel' values") +
  facet_wrap(~serial_number, scales = "free_y") +
  theme_minimal() +
  labs(x = "Date", y = "Water table elevation (m)",caption = "mean.dly.abslevel = mean(abslevel,na.rm = TRUE)")

# ggsave("waterlevel_loggers_exp.png", width = 8.5, height = 6)

Time to knit: 1

## plot daily traces
dcc.dly.mean %>% # names()
  filter(plot != "obs") %>%
  filter(month > 2 & month < 10) %>% 
  mutate(yr = as.factor(yr)) %>% 
  # ggplot(aes(x=doy,y=mean.water_column_m)) +
  ggplot(aes(x=doy,y=mean.dly.abslevel)) +
  geom_line(aes(color=yr)) +
  labs(x = "Day of year", y = "Water table elevation (m)") +
  # labs(title = "Mean daily 'water_column_m' values") +
  facet_wrap(~wat_id2, scales = 'free_y', ncol = 3) +
  theme_minimal()

# ggsave("waterlevel_loggers_exp_alt2.png", width = 8.5, height = 6)

Time to knit: 1

DCC daily traces

There are clearly out-of-range values evident in the DCC data, so these do not represent fully cleaned data. Minimal filtering is done here so users of the data can perform cleaning as fits their needs.

## plot daily traces

dcc.dly.mean %>%
  filter(plot != "obs") %>%
  filter(month > 2 & month < 10) %>% 
  mutate(yr = as.factor(yr)) %>% 
  ggplot(aes(x=doy,y=mean.dly.abslevel)) +
  geom_line(aes(color=yr)) +
  labs(x = "Day of year", y = "Water table elevation (m)") +
  facet_wrap(~site2, scales = 'free_y', ncol = 2) +
  theme_minimal()

# ggsave("waterlevel_loggers_exp_alt2.png", width = 8.5, height = 6)

Time to knit: 1

### Export munged DCC data
dcc.dly.mean %>% 
  clean_names() %>% 
  write_csv("./data/processed/data_loggers/dcc_logger_20201205.csv")

Time to knit: 0

2016-2019 (Non-DCC files)

We inherited a mess of files and directories. The following code scrapes data from the various places data were found, merges them, and eliminates duplicates. The general strategy is to work backwards from 2019, as there are innumerable ‘appended’ files, in the hope that these capture data from earlier periods.

Time to knit: 0

Tower baro data

The only baro data found are from Tower; they are provisionally used for correcting data from the other sites.

Source file: TowerBarow_2011+_2019-09-10_11-01-36-258.csv. Based on the date range, this file may extend over the full time series.

tow.baro <- read_csv("data/raw/WinSituData/Logger_Data_2019/TowerBarow_2011+_2019-09-10_11-01-36-258.csv", skip=84) %>% 
  clean_names() %>% 
  select(-x5) %>%
  mutate(date = anytime::anydate(date_and_time))

## the start of the trace looks a little dodgy, trim
tow.baro <- tow.baro %>% 
  filter(date > as.Date("2011-10-02"))

## extract date and calc mean daily pressure (entered in mm Hg) and temp
tow.baro.dly <- tow.baro %>% 
  group_by(date) %>% 
  dplyr::summarize(mmHg_mean = mean(pressure_mm_hg, na.rm = TRUE), temp_c_mean = mean(temperature_c)) %>% 
  mutate(yr = year(date))

# convert the mm Hg to psi (units the loggers are in)
tow.baro.dly <- tow.baro.dly %>% 
  select(-yr) %>%
  mutate(baroTow_psi = 0.0193368*mmHg_mean) %>% 
  rename(baroTow_tempC = temp_c_mean) %>% 
  rename(baroTow_mmHg = mmHg_mean)

Time to knit: 13
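
As a sanity check on the 0.0193368 factor above (my arithmetic from SI unit definitions, not from the files):

## 1 mm Hg = 133.322387415 Pa; 1 psi = 6894.757293168 Pa
pa_per_mmHg <- 133.322387415
pa_per_psi  <- 6894.757293168
pa_per_mmHg / pa_per_psi  # ~0.0193368 psi per mm Hg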

Exploratory plot of Tower barotroll psi

## plot baro data
tow.baro.dly %>% 
  # filter(mmHg.mean < 620) %>% 
  ggplot(aes(date, baroTow_psi)) +
  geom_line(color = 'blue') +
  theme_minimal() +
  labs(title = "Tower Baro daily trace", subtitle = "mean daily psi", caption  = "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")

Time to knit: 0

# tow.baro.dly %>%
#   mutate(yr = year(date), doy = lubridate::yday(date)) %>%
#   mutate(yr = as.factor(yr)) %>% 
#   ggplot(aes(doy, baroTow_tempC)) +
#   geom_line(aes(color = yr)) +
#   theme_minimal() +
#   labs(x = "Date", y = "Temperature (C)", title = "Tower Baro daily trace", subtitle = "mean daily temp_c", caption  = "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")

tow.baro.dly %>%
  mutate(yr = year(date), doy = lubridate::yday(date)) %>%
  mutate(yr = as.factor(yr)) %>% 
  ggplot(aes(doy, baroTow_tempC)) +
  geom_line(color="grey80") +
  # geom_line(aes(color = yr)) +
  theme_minimal() +
  facet_wrap(~yr) +
  labs(x = "Date", y = "Temperature (C)", title = "Tower Baro daily trace", subtitle = "mean daily temp_c", caption  = "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")

Time to knit: 1

Temps below -30 C seem questionable…
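
A minimal flag for those questionable temperatures; the -30 C cutoff is my assumption, not from field notes:

tow.baro.dly %>% 
  filter(baroTow_tempC < -30) %>%  # suspiciously cold daily means
  mutate(yr = lubridate::year(date)) %>% 
  count(yr)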

2019 Data

Data import and initial inspection

Time to knit: 0

# read directory for all exported logger files 
# there's a crazy number of inconsistently entered fields. Nothing is clear.
files2019csv <- fs::dir_ls("./data/raw/WinSituData/Logger_Data_2019", recurse = TRUE, glob = "*.csv")

## make a tibble, rename
files2019csv <- files2019csv %>% 
  enframe() %>% 
  rename(path_full = value) %>% 
  select(-name)

## add naked file name and naked path
files2019csv <- files2019csv %>% 
  mutate(path = fs::path_dir(path_full)) %>% 
  mutate(file_name = fs::path_file(path_full)) 

Time to knit: 0

For 2019, there are 20 distinct csv files in the logger directory. Examination reveals that they are inconsistently structured. These appear to be mainly raw export files from WinSitu. Depending on the particular notes entered in the logger, the data start at a different row in each *.csv, which complicates parsing. Data are uncorrected. The only baro data appear to be from Tower and are used to process the data.
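
A more robust alternative to the hard-coded skip = 100 used below would be to locate the first data row per file; this sketch assumes each export's column-header line contains a recognizable token such as "Date and Time" (unverified against these WinSitu exports):

## hypothetical helper: find the header row instead of hard-coding skip
find_skip <- function(path, token = "Date and Time") {
  lines <- readLines(path, n = 200, warn = FALSE)
  hit <- grep(token, lines, fixed = TRUE)[1]
  if (is.na(hit)) stop("header token not found in: ", path)
  hit  # e.g., read_csv(path, skip = hit) would start at the first data row
}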

## table of distinct file names
files2019csv %>% 
  distinct(file_name) %>%
  gt::gt() %>% 
  gt::tab_header(title = "Distinct csv files in the 2019 directory")
Distinct csv files in the 2019 directory
file_name
Crystal3_obs_Append_2019-08-15_12-59-18-257.csv
Crystal_Well20160819_Append_2019-07-17_12-36-02-157.csv
EB1_cc_2012+_Append_2019-08-19_13-23-14-621.csv
EB1_dx_2012+_Append_2019-08-19_12-42-23-473.csv
EB2_cc_2012+_2019-05-26_16-47-18-414.csv
EB2_dc_2012+_2019-05-26_14-34-16-494.csv
EB2_dx_2012+_Append_2019-08-19_15-20-14-836.csv
Elk1_2012+_2019-05-29_16-08-41-816.csv
Elk4_2012+_2019-05-29_10-18-47-766.csv
Elk5_SG_2019-05-29_11-17-23-283.csv
ElkCC2016_E14_2019-05-25_11-09-54-540.csv
ElkDX2016_E10_2019-05-24_14-28-09-495.csv
LB2_obs_15min_20170601_2019-05-28_14-50-38-590.csv
LB4_Well_15min_20170601_2019-05-28_17-46-32-683.csv
LostC_2012+_2019-06-02_12-28-57-572.csv
Oxb_2012+_2019-06-02_14-28-23-429.csv
TowerBarow_2011+_2019-09-10_11-01-36-258.csv
WB4_2017_Append_2019-08-21_12-16-13-003.csv
WB_CC_Upper_Slough_Append_2019-08-12_17-30-18-921.csv
WB_DX_2016_Append_2019-08-12_13-21-36-660.csv

Time to knit: 0

## pull out what appears to be the only baro file and exclude it from the remaining logger files
files2019csv <- files2019csv %>% 
  filter(file_name != "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")

Time to knit: 0

## all of the csv exports lack depth; uncorrected except...

## add some header info 
# files2019csv <- files2019csv %>% 
#   mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c"))) %>% 
#   mutate(header = map(path_full, read_csv, skip = 0,
#   n_max = 10))

# Fri Mar 26 11:37:35 2021 ------------------------------
## depth is present in some but not all.
files2019csv <- files2019csv %>% 
  mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c","depth_ft"))) %>% 
  mutate(header = map(path_full, read_csv, skip = 0,
  n_max = 10))

Time to knit: 2

Time to knit: 0

## Munge: extract out info from file header
files2019csv <- files2019csv %>% 
  mutate(hdr_info = map(path_full, read_csv, skip =13,
  n_max = 13, col_names = c("hdr_var","hdr_val"))) %>% 
  unnest(hdr_info) %>% 
  filter(!is.na(hdr_val) & !is.na(hdr_var)) %>%
  # distinct(hdr_var)
  filter(hdr_var == "Site" | hdr_var == "Serial Number") %>% 
  pivot_wider(names_from = hdr_var, values_from = hdr_val) %>% 
  clean_names() %>% 
  mutate(site_hdr = site)

Time to knit: 0

Note: ‘site_hdr’ refers to the ‘site’ attribute recorded in the export file (i.e., it was entered in the logger).

files2019csv %>%
  pluck(5,3) %>% 
  # head() %>% 
  View()

Time to knit: 0

Attribute with the serial number and site from the header. Note: the “site” is as it exists in the logger; it does not match the format used elsewhere, so more munging…

## extract out the actual data
logger_raw2019 <- files2019csv %>%
  unnest(import)

Time to knit: 1

# select and attribute with the serial number and site from the header
logger_raw2019 <- logger_raw2019 %>%
  select(file_name,date_time, psi, temp_c, serial_number, site_hdr) %>% 
  distinct() %>% 
  mutate(date_time = anytime::anytime(date_time)) %>% 
  mutate(date = date(date_time)) %>% 
  mutate(yr = year(date_time))

Time to knit: 8

Distill to daily time step

## create daily
logger_raw2019.dly <- logger_raw2019 %>% 
  group_by(file_name,date, serial_number, site_hdr) %>% 
  dplyr::summarise(psi.dly.mean = mean(psi, na.rm=TRUE), tempC.dly.mean = mean(temp_c, na.rm=TRUE)) %>%
  distinct()

Time to knit: 1

## qa
logger_raw2019.dly %>% 
  ggplot(aes(date,psi.dly.mean)) +
  geom_line(aes(color = serial_number)) +
  facet_wrap(~site_hdr)

Time to knit: 0

## join in the baro

logger_raw2019.dly.baro <- left_join(logger_raw2019.dly, tow.baro.dly, by = "date") 

Time to knit: 0

logger_raw2019.dly.baro %>% 
  visdat::vis_dat()

logger_raw2019.dly.baro %>% 
  View()

Time to knit: 0

2018 Data

Data import and initial inspection

Notes on data collected during the 2018 field season (Lewis Messner, 10/24/2018):

Data file labeled Slough Creek is WBCC. The pressure transducer was re-labeled for Slough Creek in 2017 but never deployed; it was redeployed in spring to WBCC and not renamed.

Crystal well was removed in either May or June 2018 and replaced in July 2018. Check timing of removal in May/June. The logger was likely removed from the well, not paused, and returned in July after a reset.

All data need to be post-processed. All stick-up, total depth, and hanging distances are recorded in the well measurement field notebooks, which will eventually be copied to a spreadsheet.

# read directory for all exported logger files 
# there's a crazy number of inconsistently entered fields. Nothing is clear.
files2018csv <- fs::dir_ls("./data/raw/WinSituData/Logger_Data_2018", recurse = TRUE, glob = "*.csv")

## find WSL files
files2018wsl <- fs::dir_ls("./data/raw/WinSituData/Logger_Data_2018", recurse = TRUE, glob = "*.wsl") %>% 
  enframe() %>% 
  rename(path_full = value) %>% 
  select(-name) %>% 
  mutate(path = fs::path_dir(path_full)) %>% 
  mutate(file_name = fs::path_file(path_full))

## make a tibble, rename
files2018csv <- files2018csv %>% 
  enframe() %>% 
  rename(path_full = value) %>% 
  select(-name)

## add naked file name and naked path
files2018csv <- files2018csv %>% 
  mutate(path = fs::path_dir(path_full)) %>% 
  mutate(file_name = fs::path_file(path_full)) 

# files2018csv %>% 
#   distinct(file_name) %>% datatable()

Time to knit: 0

For 2018 alone, there are 22 distinct csv files in the logger directory. Examination of these files reveals that they are inconsistently structured.

## table of distinct file names
files2018csv %>% 
  distinct(file_name) %>%
  gt::gt() %>% 
  gt::tab_header(title = "Distinct csv files in the 2018 directory")
Distinct csv files in the 2018 directory
file_name
2011+_2018-08-22_08-59-44-597.csv
2016_Append_2018-08-10_16-00-00-768.csv
Crystal3_obs_2018-08-04_14-03-35-424.csv
Crystal_Well20160819_2018-05-18_10-00-40-581.csv
Crystal_Well20160819_Append_2018-08-04_14-50-00-730.csv
EB1_cc_2012+_2018-05-04_10-30-37-980.csv
EB1_cc_2012+_Append_2018-08-07_09-54-29-346.csv
EB1_dx_2012+_Append_2018-08-07_08-46-02-308.csv
EB2_2012+_2018-05-07_13-17-49-386.csv
EB2_cc_2012+_Append_2018-08-10_07-39-49-362.csv
EB2_dc_2012+_Append_2018-08-07_15-00-09-695.csv
EB2_dx_2012+_Append_2018-08-07_15-04-34-299.csv
Elk1_2012+_Append_2018-08-20_12-46-02-732.csv
Elk4_2012+_Append_2018-08-18_08-45-36-450.csv
Elk5_SG_Append_2018-08-18_09-42-06-702.csv
ElkCC2016_E14_Append_2018-08-19_11-11-24-857.csv
ElkDX2016_E10_Append_2018-08-19_10-22-35-714.csv
LB2_obs_15min_20170601_2018-05-22_13-37-17-009.csv
LB4_Well_15min_20170601_2018-05-22_12-13-47-292.csv
Oxb_2012+_Append_2018-08-24_13-44-53-228.csv
Upper_Slough_Append_2018-08-11_12-49-23-948.csv
WB4_2017_Append_2018-08-28_11-38-37-460.csv

Time to knit: 0

## all of the csv exports lack depth; uncorrected.
files2018csv <- files2018csv %>% 
  mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c"))) %>% 
  mutate(header = map(path_full, read_csv, skip = 0,
  n_max = 40))

## Munge: extract out info from file header
files2018csv <- files2018csv %>% 
  mutate(hdr_info = map(path_full, read_csv, skip =13,
  n_max = 13, col_names = c("hdr_var","hdr_val"))) %>% 
  unnest(hdr_info) %>% 
  filter(!is.na(hdr_val) & !is.na(hdr_var)) %>%
  # distinct(hdr_var)
  filter(hdr_var == "Site" | hdr_var == "Serial Number") %>% 
  pivot_wider(names_from = hdr_var, values_from = hdr_val) %>% 
  clean_names() %>% 
  rename(site_hdr = site) 

## remove baro
files2018csv <- files2018csv %>%
  filter(file_name != "2011+_2018-08-22_08-59-44-597.csv") # remove the tower baro data. I've previously pulled that out (and it's in mm Hg)

Time to knit: 3

files2018csv %>%
  pluck(5,2) %>% 
  head() %>% View()

files2018csv %>% 
  pluck(4,2) %>% datatable()

## get names
files2018csv %>% 
  mutate(names = map(import, names)) %>% 
  unnest(names) %>% 
  datatable()

Time to knit: 0

## explore parsing
t <- read_csv("./data/raw/WinSituData/Logger_Data_2018/Exported Data/Exported_20181024/EB2_dc_2012+_Append_2018-08-07_15-00-09-695.csv", skip = 69)
t
t <- read_csv("./data/raw/WinSituData/Logger_Data_2018/Exported Data/Exported_20181024/EB2_dc_2012+_Append_2018-08-07_15-00-09-695.csv")
t

Time to knit: 0

## extract out the actual data
logger_raw2018 <- files2018csv %>%
  unnest(import)

# logger_raw2018 %>% 
#   distinct(file_name)

Time to knit: 0

# select and attribute with the serial number and site from the header. Takes a long time to process...
logger_raw2018 <- logger_raw2018 %>%
  select(file_name,date_time, psi, temp_c, serial_number, site_hdr) %>% 
  distinct() %>% 
  mutate(date_time = anytime::anytime(date_time)) %>% 
  mutate(date = date(date_time)) %>% 
  mutate(yr = year(date_time))

Time to knit: 5

Distill to daily time step

## create daily
logger_raw2018.dly <- logger_raw2018 %>% 
  group_by(file_name,date, serial_number, site_hdr) %>% 
  dplyr::summarise(psi.dly.mean = mean(psi, na.rm=TRUE), tempC.dly.mean = mean(temp_c, na.rm=TRUE)) %>%
  ungroup() %>% 
  distinct()

Time to knit: 1

## qa
logger_raw2018.dly %>% 
  ggplot(aes(date,psi.dly.mean)) +
  geom_line(aes(color = serial_number)) +
  facet_wrap(~site_hdr)

Time to knit: 0

## join in the baro

logger_raw2018.dly.baro <- left_join(logger_raw2018.dly, tow.baro.dly, by = "date") 

Time to knit: 0

logger_raw2018.dly.baro %>% 
  visdat::vis_dat()

logger_raw2018.dly.baro %>% 
  View()

Time to knit: 0

# combine and distill down the 2018 and 2019 batches
logger_raw2018_19.dly.baro <- bind_rows(logger_raw2018.dly.baro, logger_raw2019.dly.baro) %>% 
  distinct()

## add yr
logger_raw2018_19.dly.baro <- logger_raw2018_19.dly.baro %>% 
  mutate(yr = as.integer(year(date)))

Time to knit: 0

## qa
# logger_raw2018_19.dly.baro %>% 
#   head()

logger_raw2018_19.dly.baro %>% 
  tabyl(site_hdr, yr) %>% 
  datatable(caption = "Count of records by yr and site_hdr")
logger_raw2018_19.dly.baro %>% 
  tabyl(serial_number, yr) %>% 
  datatable(caption = "Count of records by yr and serial number")

Time to knit: 0

## munge site
logger_raw2018_19.dly.baro %>% 
  distinct(site_hdr)
## # A tibble: 20 x 1
##    site_hdr             
##    <chr>                
##  1 WestBlacktail_exp dx 
##  2 Crystal_Well         
##  3 Crystal3             
##  4 EastBlacktail1_exp cc
##  5 EastBlacktail1_exp dx
##  6 EastBlacktail2_obs   
##  7 EastBlacktail2_exp cc
##  8 EastBlacktail2_exp dc
##  9 EastBlacktail2_exp dx
## 10 Elk1_obs             
## 11 Elk4_obs             
## 12 Elk5                 
## 13 Elk_exp cc           
## 14 Elk_exp dx           
## 15 LowerBlacktail2_obs  
## 16 LB4_WELL             
## 17 Oxbow_obs            
## 18 Slough               
## 19 WB4                  
## 20 LostCreek_obs
## do my best to guess site and plot from the site_hdr field derived from logger header
logger_raw2018_19.dly.baro <- logger_raw2018_19.dly.baro %>% 
  mutate(site2 = case_when(site_hdr == "WestBlacktail_exp dx" ~ "wb-dx",
                           site_hdr == "Crystal_Well" ~ "crystal-obs",
                           site_hdr == "Crystal3" ~ "crystal-obs",
                           site_hdr == "EastBlacktail1_exp cc" ~ "eb1-cc",
                           site_hdr == "EastBlacktail1_exp dx" ~ "eb1-dx",
                           site_hdr == "EastBlacktail2_obs" ~ "eb2-obs",
                           site_hdr == "EastBlacktail2_exp cc" ~ "eb2-cc",
                           site_hdr == "EastBlacktail2_exp dc" ~ "eb2-dc",
                           site_hdr == "EastBlacktail2_exp dx" ~ "eb2-dx",
                           site_hdr == "Elk1_obs" ~ "elk1-obs",
                           site_hdr == "Elk4_obs" ~ "elk4-obs",
                           site_hdr == "Elk5" ~ "elk5-obs",
                           site_hdr == "Elk_exp cc" ~ "elk-cc",
                           site_hdr == "Elk_exp dx" ~ "elk-dx",
                           site_hdr == "LowerBlacktail2_obs" ~ "lb2-obs",
                           site_hdr == "LB4_WELL" ~ "lb4-obs",
                           site_hdr == "Oxbow_obs" ~ "oxbow-obs",
                           site_hdr == "Slough" ~ "wb-cc",
                           site_hdr == "WB4" ~ "wb4-obs",
                           site_hdr == "LostCreek_obs" ~ "lostc-obs",
                           TRUE ~ site_hdr)) %>% 
  separate(site2, c("site","plot"),sep = "-", remove = FALSE)

# Notes on data collected during the 2018 field season: Lewis Messner - 10/24/2018
# - **Data file labeled Slough Creek is WBCC.

Time to knit: 1

logger_raw2018_19.dly.baro %>% 
  select(serial_number,site_hdr,site2,site,plot) %>% 
  distinct() %>% 
  gt() %>% 
  tab_header(title = "site info attributed from logger site (site_hdr)", subtitle = "LEWIS please check")
site info attributed from logger site (site_hdr)
LEWIS please check
serial_number site_hdr site2 site plot
316747 WestBlacktail_exp dx wb-dx wb dx
316603 Crystal_Well crystal-obs crystal obs
316744 Crystal3 crystal-obs crystal obs
316609 EastBlacktail1_exp cc eb1-cc eb1 cc
316644 EastBlacktail1_exp dx eb1-dx eb1 dx
316744 EastBlacktail2_obs eb2-obs eb2 obs
316695 EastBlacktail2_exp cc eb2-cc eb2 cc
316622 EastBlacktail2_exp dc eb2-dc eb2 dc
316651 EastBlacktail2_exp dx eb2-dx eb2 dx
316605 Elk1_obs elk1-obs elk1 obs
316625 Elk4_obs elk4-obs elk4 obs
302228 Elk5 elk5-obs elk5 obs
316614 Elk_exp cc elk-cc elk cc
316604 Elk_exp dx elk-dx elk dx
316640 LowerBlacktail2_obs lb2-obs lb2 obs
316706 LB4_WELL lb4-obs lb4 obs
316745 Oxbow_obs oxbow-obs oxbow obs
302174 Slough wb-cc wb cc
519943 WB4 wb4-obs wb4 obs
316602 LostCreek_obs lostc-obs lostc obs
## write to csv to share with LEWIS
# logger_raw2018_19.dly.baro %>% 
#   select(serial_number,site_hdr,site2,site,plot) %>% 
#   distinct() %>%
#   write_csv("./data/processed/logger_site_attribution_4QC.csv")

Time to knit: 0

## first attempt at baro correction
logger_raw2018_19.dly.baro <- logger_raw2018_19.dly.baro %>% 
  mutate(psi.dif = psi.dly.mean-baroTow_psi) %>%
  mutate(head_cm = psi.dif*0.70324961490205*100) 

Time to knit: 0
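
A sanity check on the psi-to-head factor (my arithmetic; the small difference from 0.70324961 reflects the assumed water density):

pa_per_psi <- 6894.757293168   # Pa per psi
rho_g      <- 1000 * 9.80665   # Pa per metre of water column (rho = 1000 kg/m^3)
pa_per_psi / rho_g             # ~0.7031 m of head per psi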

logger_raw2018_19.dly.baro %>% 
  summarytools::descr(head_cm) %>% 
  summarytools::tb()
## # A tibble: 1 x 16
##   variable  mean    sd   min    q1   med    q3   max   mad   iqr    cv skewness
##   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 head_cm   34.7  33.6 -31.6  11.6  30.1  57.8  161.  32.5  46.2 0.967    0.506
## # ... with 4 more variables: se.skewness <dbl>, kurtosis <dbl>, n.valid <dbl>,
## #   pct.valid <dbl>
logger_raw2018_19.dly.baro %>% 
  ggplot(aes(date,head_cm)) +
  geom_line(aes(color = plot)) +
  facet_wrap(~site) +
  labs(title = "QA plot", subtitle = "logger_psi - baro_psi converted to cm of head")

Time to knit: 2

It is not clear where the missing site data should come from. Also, info on the specific well id and location is lacking, so it is not possible to compare to manual well measurements. There are no notes on stick-up or hang depth, which are needed to further process the data.

Misc other files

There are several additional directories of files. These are scraped for any data from 2016 on; earlier data are presumed to be folded into the DCC.

Data import and initial inspection

# the lack of data management on this project is ridiculous.

# read directory for all exported logger files 
# there's a crazy number of inconsistently entered fields. Nothing is clear.
filesOther_csv <- fs::dir_ls("./data/raw/WinSituData/Exported Data", recurse = TRUE, glob = "*.csv")

## make a tibble, rename
filesOther_csv <- filesOther_csv %>% 
  enframe() %>% 
  rename(path_full = value) %>% 
  select(-name)

## add naked file name and naked path
filesOther_csv <- filesOther_csv %>% 
  mutate(path = fs::path_dir(path_full)) %>% 
  mutate(file_name = fs::path_file(path_full)) 

filesOther_csv %>%
  distinct(file_name) %>% datatable()

Time to knit: 0

In this directory alone, there are 117 distinct csv files in the logger directory. Examination of these files reveals that they’re inconsistently structured.

## table of distinct file names
filesOther_csv %>% 
  distinct(file_name) %>%
  gt::gt() %>% 
  gt::tab_header(title = "Distinct csv files in the 'Exported Data' folder", subtitle = "A bunch of S$#T. All older than DCC. There are even files from Sara Bisbing's AK project.")
Distinct csv files in the 'Exported Data' folder
A bunch of S$#T. All older than DCC. There are even files from Sara Bisbing's AK project.
file_name
Well 10_2010-08-03_13-58-57-312-BaroMerge.csv
Well 11_Append_2010-08-02_15-12-59-062-BaroMerge.csv
Well 12_2010-08-03_09-47-04-125-BaroMerge.csv
Well 1_2010-08-02_11-45-02-703-BaroMerge.csv
Well 2_2010-08-02_09-20-16-687-BaroMerge.csv
Well 3_2010-08-02_10-09-55-437-BaroMerge.csv
Well 4_2010-08-03_11-35-21-109-BaroMerge.csv
Well 5_Append_2010-08-02_16-42-14-062-BaroMerge.csv
Well 6_Append_2010-08-02_12-46-16-234-BaroMerge.csv
Well 7_Append_2010-08-02_16-08-02-484-BaroMerge.csv
Well 8_2010-08-02_11-28-45-468-BaroMerge.csv
Well 9_Append_2010-08-02_14-35-51-375-BaroMerge.csv
Well 10_Append_2011-08-05_12-15-22-500-BaroMerge.csv
Well 11_Append_2011-08-03_14-22-34-953-BaroMerge.csv
Well 12_Append_2011-08-03_10-51-19-453-BaroMerge.csv
Well 1_Append_2011-08-02_11-25-03-984-BaroMerge.csv
Well 2_Append_2011-07-30_11-13-58-921-BaroMerge.csv
Well 3_Append_2011-07-30_12-45-34-359-BaroMerge.csv
Well 4_Append_2011-08-01_12-25-31-687-BaroMerge.csv
Well 5_Append_2011-07-31_14-16-01-734-BaroMerge.csv
Well 6_Append_2011-08-01_13-54-02-250-BaroMerge.csv
Well 7_Append_2011-07-31_12-20-02-453-BaroMerge.csv
Well 8_Append_2011-08-02_10-20-35-062-BaroMerge.csv
Well 9_Append_2011-07-31_15-37-16-328-BaroMerge.csv
Akaska 9_Append_2010-06-17_08-38-49-828-BaroMerge.csv
Alaska 1_Append_2010-06-22_10-09-13-109-BaroMerge.csv
Alaska 2 - WT PEM_2010-06-17_09-45-30-046-BaroMerge.csv
alaska 2_Append_2010-06-17_13-28-21-468-BaroMerge.csv
alaska 4_2010-06-08_13-37-17-546-BaroMerge.csv
Alaska 5_Append_2010-06-16_09-11-32-015-BaroMerge.csv
Alaska 6_Append_2010-06-22_13-49-31-953-BaroMerge.csv
Alaska 7_Append_2010-06-16_12-01-37-296-BaroMerge.csv
Alaska 8_Append_2010-06-21_09-59-18-203-BaroMerge.csv
well 11_2010-06-07_12-19-43-109-BaroMerge.csv
well 12_2010-06-08_09-13-51-218-BaroMerge.csv
Well 10_Append_2011-06-02_17-59-39-218-BaroMerge.csv
Well 11_Append_2011-06-03_11-49-08-015-BaroMerge.csv
Well 12_Append_2011-06-29_15-20-19-062-BaroMerge.csv
Well 1_Append_2011-06-08_14-49-44-921-BaroMerge.csv
Well 2_Append_2011-06-26_19-11-44-859-BaroMerge.csv
Well 3_Append_2011-06-07_12-39-10-234-BaroMerge.csv
Well 4_Append_2011-06-27_10-12-35-531-BaroMerge.csv
Well 5_Append_2011-06-02_12-07-40-625-BaroMerge.csv
Well 6_Append_2011-06-02_15-45-12-328-BaroMerge.csv
Well 7_Append_2011-06-02_11-35-56-468-BaroMerge.csv
Well 8_Append_2011-06-08_13-01-01-328-BaroMerge.csv
Well 9_Append_2011-06-28_13-46-56-062-BaroMerge.csv
2011+_Append_2002-01-02_10-49-46-531.csv
Alaska 10_2010-06-30_09-19-15-421-BaroMerge.csv
EB1_cc_2012+_2013-08-12_14-40-29-031.csv
EB1_cc_2012+_Append_2002-01-01_21-10-19-890.csv
EB1_dx_2012+_2013-08-12_13-09-07-515.csv
EB1_dx_2012+_Append_2002-01-01_20-42-56-265.csv
EB2_cc_2012+_2013-08-15_10-00-57-156.csv
EB2_dc_2012+_2013-08-15_10-29-51-843.csv
EB2_dc_2012+_Append_2002-01-01_23-52-51-953.csv
EB2_dx_2012+_2013-08-15_10-40-34-078.csv
EB2_dx_2012+_Append_2002-01-01_23-30-37-984.csv
EB2_2012+_2013-08-15_09-43-28-921.csv
EB2_2012+_Append_2002-01-02_21-30-07-796.csv
Elk1_2012+_2013-08-11_17-45-49-640.csv
Elk1_2012+_Append_2002-01-03_03-13-33-875.csv
Elk4_2012+_2013-08-11_10-22-35-281.csv
Elk4_2012+_Append_2002-01-05_23-47-41-500.csv
Elk5_SG_Append_2002-01-01_21-01-09-656.csv
Elk5_Well_Append_2002-01-01_03-10-40-796.csv
Elk_cc_2012+_2013-08-11_18-02-57-078.csv
Elk_cc_2012+_Append_2002-01-03_01-55-51-250.csv
Elk_dx_2012+_2013-08-11_18-04-54-625.csv
Elk_dx_2012+_Append_2002-01-03_01-48-13-703.csv
GHole_2012+_2013-08-21_13-20-44-156.csv
GHole_2012+_Append_2002-01-01_00-04-32-625.csv
Lava_2012+_2013-08-16_17-39-39-765.csv
Lava_2012+_Append_2002-01-06_02-07-45-359.csv
LostC_2012+_2013-08-22_15-18-11-921.csv
LostC_2012+_Append_2002-01-01_20-35-59-296.csv
LB2_2012+_2013-08-19_12-30-52-421.csv
2011+_2002-01-22_05-21-15-375.csv
Baro 1 - WT PSS_Append_2012-07-24_11-09-02-812.csv
Baro 2 - Eagle PSS_Append_2002-01-30_22-28-04-218.csv
Jumbo Scirpus Well_Append_2012-08-07_13-04-19-093-BaroMerge.csv
Jumbo Scirpus Well_Append_2012-08-07_13-04-19-093.csv
Sundown Well_2012-08-04_13-01-47-046-BaroMerge.csv
Sundown Well_2012-08-04_13-01-47-046.csv
Well 10_Append_2002-01-30_01-07-45-953-BaroMerge.csv
Well 10_Append_2002-01-30_01-07-45-953.csv
Well 11_Append_2012-07-25_13-06-55-218-BaroMerge.csv
Well 11_Append_2012-07-25_13-06-55-218.csv
Well 12_Append_2012-07-26_15-12-27-859-BaroMerge.csv
Well 12_Append_2012-07-26_15-12-27-859.csv
Well 1_Append_2012-08-04_18-32-03-625-BaroMerge.csv
Well 1_Append_2012-08-04_18-32-03-625.csv
Well 2_Append_2012-07-24_10-00-37-062-BaroMerge.csv
Well 2_Append_2012-07-24_10-00-37-062.csv
Well 3_Append_2002-01-29_23-22-55-390-BaroMerge.csv
Well 3_Append_2002-01-29_23-22-55-390.csv
Well 4_Append_2012-07-26_12-00-33-796-BaroMerge.csv
Well 4_Append_2012-07-26_12-00-33-796.csv
Well 5_Append_2002-01-30_22-20-16-453-BaroMerge.csv
Well 5_Append_2002-01-30_22-20-16-453.csv
Well 6_Append_2002-01-29_21-50-19-109-BaroMerge.csv
Well 6_Append_2002-01-29_21-50-19-109.csv
Well 7_Append_2002-01-31_00-38-06-281-BaroMerge.csv
Well 7_Append_2002-01-31_00-38-06-281.csv
Well 8_Append_2012-07-23_09-25-23-093-BaroMerge.csv
Well 8_Append_2012-07-23_09-25-23-093.csv
Well 9_Append_2012-07-25_14-49-11-078-BaroMerge.csv
Well 9_Append_2012-07-25_14-49-11-078.csv
Oxb_2012+_2013-08-19_10-20-37-421.csv
Oxb_2012+_Append_2002-01-05_19-03-25-375.csv
Rose_2012+_2013-08-22_11-06-10-343.csv
Rose_2012+_Append_2002-01-05_02-35-54-796.csv
2011+_2002-01-22_04-44-38-593.csv
2011+_Append_2016-08-22_13-13-09-390.csv
WB1_2012+_2013-08-17_18-06-19-406.csv
WB_cc_2012+_Append_2013-08-17_11-35-37-968.csv
WB_dx_2012+_Append_2013-08-17_10-56-58-515.csv

Time to knit: 1

## there are baromerged files. These have different columns

## extract the baromerged
filesOther_csv.baromrg <- filesOther_csv %>% 
  filter(str_detect(file_name, "BaroMerge"))

Time to knit: 0

## exclude the baromerged
filesOther_csv <- filesOther_csv %>% 
  filter(!str_detect(file_name, "BaroMerge"))         
## add some header info
filesOther_csv <- filesOther_csv %>% 
  mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c"))) %>% 
  mutate(header = map(path_full, read_csv, skip = 0,
  n_max = 10))

## Munge: extract out info from file header
filesOther_csv <- filesOther_csv %>% 
  mutate(hdr_info = map(path_full, read_csv, skip =13,
  n_max = 13, col_names = c("hdr_var","hdr_val"))) %>% 
  unnest(hdr_info) %>% 
  filter(!is.na(hdr_val) & !is.na(hdr_var)) %>%
  filter(hdr_var == "Site" | hdr_var == "Serial Number") %>% 
  pivot_wider(names_from = hdr_var, values_from = hdr_val) %>% 
  clean_names() 

Time to knit: 7

Attribute with the serial number from the header.

## extract out the actual data
logger_rawOther <- filesOther_csv %>%
  unnest(import)

Time to knit: 0

# select and attribute with the serial number and site from the header
logger_rawOther <- logger_rawOther %>%
  select(file_name,date_time, psi, temp_c, serial_number) %>% 
  distinct() %>% 
  mutate(date_time = anytime::anytime(date_time)) %>% 
  mutate(date = date(date_time)) %>% 
  mutate(yr = year(date_time))

Time to knit: 11

Distill to daily time step

## create daily
logger_rawOth.dly <- logger_rawOther %>% 
  group_by(file_name,date, serial_number) %>% 
  dplyr::summarise(psi.dly.mean = mean(psi, na.rm=TRUE), tempC.dly.mean = mean(temp_c, na.rm=TRUE)) %>%
  ungroup() %>% 
  distinct()

Time to knit: 2

logger_rawOth.dly %>% 
  mutate(yr = as.integer(year(date))) %>% 
  group_by(file_name) %>% 
  summarise(min.yr = min(yr), max.yr = max(yr)) %>% 
  gt() %>% 
  tab_header(title = "Most files in this batch do not have any 2016+ data")
Most files in this batch do not have any 2016+ data
file_name min.yr max.yr
2011+_2002-01-22_04-44-38-593.csv 2011 2015
2011+_2002-01-22_05-21-15-375.csv 2011 2015
2011+_Append_2002-01-02_10-49-46-531.csv 2011 2015
2011+_Append_2016-08-22_13-13-09-390.csv 2011 2016
Baro 1 - WT PSS_Append_2012-07-24_11-09-02-812.csv 2010 2012
Baro 2 - Eagle PSS_Append_2002-01-30_22-28-04-218.csv 2010 2012
EB1_cc_2012+_2013-08-12_14-40-29-031.csv 2012 2013
EB1_cc_2012+_Append_2002-01-01_21-10-19-890.csv 2012 2015
EB1_dx_2012+_2013-08-12_13-09-07-515.csv 2012 2013
EB1_dx_2012+_Append_2002-01-01_20-42-56-265.csv 2012 2015
EB2_2012+_2013-08-15_09-43-28-921.csv 2012 2013
EB2_2012+_Append_2002-01-02_21-30-07-796.csv 2012 2015
EB2_cc_2012+_2013-08-15_10-00-57-156.csv 2012 2013
EB2_dc_2012+_2013-08-15_10-29-51-843.csv 2012 2013
EB2_dc_2012+_Append_2002-01-01_23-52-51-953.csv 2012 2015
EB2_dx_2012+_2013-08-15_10-40-34-078.csv 2012 2013
EB2_dx_2012+_Append_2002-01-01_23-30-37-984.csv 2012 2015
Elk_cc_2012+_2013-08-11_18-02-57-078.csv 2012 2013
Elk_cc_2012+_Append_2002-01-03_01-55-51-250.csv 2012 2015
Elk_dx_2012+_2013-08-11_18-04-54-625.csv 2012 2013
Elk_dx_2012+_Append_2002-01-03_01-48-13-703.csv 2012 2015
Elk1_2012+_2013-08-11_17-45-49-640.csv 2012 2013
Elk1_2012+_Append_2002-01-03_03-13-33-875.csv 2012 2015
Elk4_2012+_2013-08-11_10-22-35-281.csv 2012 2013
Elk4_2012+_Append_2002-01-05_23-47-41-500.csv 2012 2015
Elk5_SG_Append_2002-01-01_21-01-09-656.csv 2015 2016
Elk5_Well_Append_2002-01-01_03-10-40-796.csv 2015 2016
GHole_2012+_2013-08-21_13-20-44-156.csv 2012 2013
GHole_2012+_Append_2002-01-01_00-04-32-625.csv 2012 2015
Jumbo Scirpus Well_Append_2012-08-07_13-04-19-093.csv 2011 2012
Lava_2012+_2013-08-16_17-39-39-765.csv 2012 2013
Lava_2012+_Append_2002-01-06_02-07-45-359.csv 2012 2015
LB2_2012+_2013-08-19_12-30-52-421.csv 2012 2013
LostC_2012+_2013-08-22_15-18-11-921.csv 2012 2013
LostC_2012+_Append_2002-01-01_20-35-59-296.csv 2012 2015
Oxb_2012+_2013-08-19_10-20-37-421.csv 2012 2013
Oxb_2012+_Append_2002-01-05_19-03-25-375.csv 2012 2015
Rose_2012+_2013-08-22_11-06-10-343.csv 2012 2013
Rose_2012+_Append_2002-01-05_02-35-54-796.csv 2012 2015
Sundown Well_2012-08-04_13-01-47-046.csv 2011 2012
WB_cc_2012+_Append_2013-08-17_11-35-37-968.csv 2012 2013
WB_dx_2012+_Append_2013-08-17_10-56-58-515.csv 2012 2013
WB1_2012+_2013-08-17_18-06-19-406.csv 2012 2013
Well 1_Append_2012-08-04_18-32-03-625.csv 2010 2012
Well 10_Append_2002-01-30_01-07-45-953.csv 2010 2012
Well 11_Append_2012-07-25_13-06-55-218.csv 2010 2012
Well 12_Append_2012-07-26_15-12-27-859.csv 2010 2012
Well 2_Append_2012-07-24_10-00-37-062.csv 2010 2012
Well 3_Append_2002-01-29_23-22-55-390.csv 2010 2012
Well 4_Append_2012-07-26_12-00-33-796.csv 2010 2012
Well 5_Append_2002-01-30_22-20-16-453.csv 2010 2012
Well 6_Append_2002-01-29_21-50-19-109.csv 2010 2012
Well 7_Append_2002-01-31_00-38-06-281.csv 2010 2012
Well 8_Append_2012-07-23_09-25-23-093.csv 2010 2012
Well 9_Append_2012-07-25_14-49-11-078.csv 2010 2012

Time to knit: 1

Most of this seems useless; if the max year is <= 2015, it should already be in the DCC.

logger_rawOth.dly %>% 
  mutate(yr = as.integer(year(date))) %>% 
  group_by(file_name) %>% 
  summarise(min.yr = min(yr), max.yr = max(yr)) %>% 
  filter(max.yr >= 2016) %>% 
  gt() %>% 
  tab_header(title = "Only three files in this batch have any 2016+ data")
Only three files in this batch have any 2016+ data
file_name min.yr max.yr
2011+_Append_2016-08-22_13-13-09-390.csv 2011 2016
Elk5_SG_Append_2002-01-01_21-01-09-656.csv 2015 2016
Elk5_Well_Append_2002-01-01_03-10-40-796.csv 2015 2016

Time to knit: 0

rawOth.16.logger <- logger_rawOth.dly %>% 
  filter(psi.dly.mean < 100) %>% # baro data in mmHg filtered out here
  filter(date > '2016-01-01')

Time to knit: 0

rawOth.16.logger %>% 
  distinct(file_name) %>% 
  gt() %>% 
  tab_header(title = "The only new data produced from this directory scrape")

Time to knit: 0

rawOth.16.logger <- left_join(rawOth.16.logger, tow.baro.dly, by = "date") 

rawOth.16.logger <- rawOth.16.logger %>% 
  mutate(site2 = case_when(grepl("Elk5", file_name, ignore.case = TRUE) ~ "elk5-obs",
                                     TRUE ~ file_name))

rawOth.16.logger <- rawOth.16.logger %>%
  separate(site2, c("site","plot"),sep = "-", remove = FALSE)

rawOth.16.logger <- rawOth.16.logger %>% 
  mutate(psi.dif = psi.dly.mean-baroTow_psi) %>%
  mutate(head_cm = psi.dif*0.70324961490205*100) %>% 
  mutate(yr = as.integer(year(date)))

Time to knit: 0

## bind with the 2018 and 2019 scrape
logger.all <- bind_rows(rawOth.16.logger, logger_raw2018_19.dly.baro)

# logger.all %>% 
#   mutate(yr = as.integer(year(date))) %>% 
#   visdat::vis_dat(warn_large_data = FALSE)

Time to knit: 0

All that crap to net ~500 records in 2016 for one site!

Join in wat_id2 to the non-DCC scrape. ASSUMPTION: serial numbers stay with wells. The well number associated with a given serial number in the DCC is used to attribute the non-DCC scrape.

## Filter to 2016 +
logger.all <- logger.all %>%
  filter(yr >=2016) 

Time to knit: 0

## join in wat_id2 to the non-DCC scrape 
lu.serialNum.watId2 <- lu.serialNum.watId2 %>% 
  mutate(serial_number = as.character(serial_number))

logger.all <- dplyr::left_join(logger.all,lu.serialNum.watId2, by="serial_number")

Time to knit: 0

There are multiple files with serial numbers not present in the DCC. Are these new loggers?

## there are data not joining to dcc on serial number
logger.all %>% 
  filter(is.na(wat_id2)) %>%
  distinct(file_name) %>% 
  gt() %>% 
  tab_header(title = "Files lacking coresponding serial number in DCC")
Files lacking coresponding serial number in DCC
file_name
Elk5_SG_Append_2002-01-01_21-01-09-656.csv
Elk5_SG_Append_2018-08-18_09-42-06-702.csv
Upper_Slough_Append_2018-08-11_12-49-23-948.csv
WB4_2017_Append_2018-08-28_11-38-37-460.csv
Elk5_SG_2019-05-29_11-17-23-283.csv
WB_CC_Upper_Slough_Append_2019-08-12_17-30-18-921.csv
WB4_2017_Append_2019-08-21_12-16-13-003.csv

Time to knit: 0

## prep for row bind with DCC

dcc.dly.mean <- dcc.dly.mean %>% 
  ungroup() %>% 
  mutate(file_name = "DCC") %>% 
  mutate(serial_number = as.character(serial_number)) %>% 
  mutate(head_cm = mean.water_column_m*100) %>% 
  select(-c(month, doy, doy.d, mean.water_column_m)) 
logger.all <- logger.all %>% 
  select(-c(baroTow_mmHg,baroTow_tempC)) %>% 
  rename(psi_logger = psi.dly.mean) %>% 
  rename(psi_baro = baroTow_psi) %>% 
  select(-c(site_hdr))

logger.all %>%
  glimpse()
## Rows: 36,320
## Columns: 13
## $ file_name      <chr> "Elk5_SG_Append_2002-01-01_21-01-09-656.csv", "Elk5_SG_~
## $ date           <date> 2016-01-02, 2016-01-03, 2016-01-04, 2016-01-05, 2016-0~
## $ serial_number  <chr> "302228", "302228", "302228", "302228", "302228", "3022~
## $ psi_logger     <dbl> 12.51200, 12.50175, 12.41521, 12.33721, 12.28196, 12.26~
## $ tempC.dly.mean <dbl> 3.216500, 3.189500, 3.157500, 3.130000, 3.093333, 3.059~
## $ psi_baro       <dbl> 11.87086, 11.85155, 11.73939, 11.64495, 11.60674, 11.59~
## $ site2          <chr> "elk5-obs", "elk5-obs", "elk5-obs", "elk5-obs", "elk5-o~
## $ site           <chr> "elk5", "elk5", "elk5", "elk5", "elk5", "elk5", "elk5",~
## $ plot           <chr> "obs", "obs", "obs", "obs", "obs", "obs", "obs", "obs",~
## $ psi.dif        <dbl> 0.6411433, 0.6501963, 0.6758177, 0.6922586, 0.6752182, ~
## $ head_cm        <dbl> 45.08838, 45.72503, 47.52685, 48.68306, 47.48469, 47.32~
## $ yr             <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2~
## $ wat_id2        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
# compare_df_cols_same(logger.all, dcc.dly.mean)

Time to knit: 0

### BIND DCC
logger.all <- logger.all %>% 
  bind_rows(.,dcc.dly.mean) %>% 
  distinct()

Time to knit: 0

## add in dates
logger.all <- logger.all %>% 
  mutate(yr = lubridate::year(date)) %>% 
  mutate(month = lubridate::month(date)) %>% 
  mutate(doy = lubridate::yday(date)) %>% 
  mutate(doy.d = format(date, format="%m-%d"))

Time to knit: 0

Exploratory Analyses and QA

logger.all %>% 
  visdat::vis_dat(warn_large_data = FALSE) +
  labs(caption = "Combined DCC and other logger data")

Time to knit: 0

logger.all %>%
  group_by(file_name, yr) %>% 
  summarise(n = n()) %>% 
  ggplot(aes(yr,file_name)) +
  geom_tile(aes(fill = n))

Time to knit: 1

logger.all %>%
  ggplot(aes(date, head_cm)) +
  geom_line(aes(color = serial_number)) +
  viridis::scale_color_viridis(discrete=TRUE) +
  theme_minimal() +
  facet_wrap(~yr, scales = "free", ncol = 1) +
  labs(x="", y="Head (cm)", caption = "logger.all")

Time to knit: 6

logger.all %>%
  filter(plot != "obs") %>% 
  ggplot(aes(date, head_cm)) +
  geom_line(aes(color = serial_number)) +
  viridis::scale_color_viridis(discrete=TRUE) +
  theme_minimal() +
  # facet_wrap(site2~yr, scales = "free", ncol = 1) +
  facet_grid(site2~yr) +
  labs(x="", y="Head (cm)", caption = "logger.all")

logger.all %>%
  filter(plot != "obs") %>% 
  ggplot(aes(date, head_cm)) +
  geom_line(aes(color = serial_number)) +
  viridis::scale_color_viridis(discrete=TRUE) +
  theme_minimal() +
  # facet_wrap(site2~yr, scales = "free", ncol = 1) +
  facet_grid(site2~serial_number) +
  labs(x="", y="Head (cm)", caption = "logger.all")

Time to knit: 13

logger.all %>%
  group_by(file_name, yr) %>% 
  summarise(n = n()) %>% 
  ggplot(aes(yr,file_name)) +
  geom_tile(aes(fill = n))

Time to knit: 1

Interactive time series plot - Exp sites only

qa.pl.exp <- logger.all %>%
  filter(plot != "obs") %>% 
  ggplot(aes(doy.d, head_cm)) +
  geom_point(aes(color = serial_number)) +
  viridis::scale_color_viridis(discrete=TRUE) +
  theme_minimal() +
  facet_wrap(~yr, scales = "free_y", ncol = 2) +
  labs(x="", y="Head (cm)", caption = "logger.all")
plotly::ggplotly(qa.pl.exp)

Time to knit: 3

Interactive time series plot - Obs sites only

qa.pl.obs <- logger.all %>%
  filter(plot == "obs") %>% 
  ggplot(aes(date, head_cm)) +
  geom_point(aes(color = serial_number)) +
  viridis::scale_color_viridis(discrete=TRUE) +
  theme_minimal() +
  facet_wrap(~yr, scales = "free", ncol = 2) +
  labs(x="", y="Head (cm)", caption = "logger.all")
plotly::ggplotly(qa.pl.obs)

Time to knit: 3

logger.all %>%
  mutate(doy = yday(date)) %>%
  mutate(yr = as.factor(yr)) %>% 
  ggplot(aes(doy, head_cm)) +
  geom_line(aes(color = yr)) +
  viridis::scale_color_viridis(discrete=TRUE) +
  theme_minimal() +
  facet_wrap(~serial_number, scales = "free", ncol = 2) +
  labs(title = "Time series -- all logger data", x="", y="Head (cm)", caption = "logger.all+DCC")

Time to knit: 7

logger.all %>%
  group_by(serial_number, site, plot, yr) %>% 
  tally() %>% 
  ggplot(aes(yr, serial_number)) +
  theme_minimal() +
  geom_tile(aes(fill = n), color = "white") +
  facet_wrap(~site, scales = "free", ncol=2) +
  labs(fill = "n records", y = "Serial number", caption = "logger.all+DCC")

Time to knit: 4

logger.all %>%
  ggplot(aes(date, head_cm)) +
  geom_line(aes(color = serial_number)) +
  facet_wrap(~yr, scales = "free", ncol=2) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(caption = "combined scrape of all raw files + DCC") +
  facet_wrap(~site, ncol=3)

Time to knit: 5

Some take homes:

There are gaps in data

(Some) out of range values likely linked to when loggers deployed/read

Calculating a z-score by logger serial number and year. This is a pretty rough way to process the data.

logger.all.clean <- logger.all %>% 
  group_by(serial_number,yr) %>% 
  mutate(z_score.bySerial = (psi_logger - mean(psi_logger)) / sd(psi_logger)) %>% 
  ungroup() %>% 
  mutate(flag.zscore = case_when(abs(z_score.bySerial) > 2 ~ "z>2")
         )

Time to knit: 0

logger.all.clean <- logger.all.clean %>% 
  select(-c(tempC.dly.mean, psi.dif, mean.dly.abslevel)) 


logger.all.clean %>% 
  visdat::vis_dat(warn_large_data = FALSE)

logger.all.clean %>%
  filter(is.na(wat_id2)) %>% 
  distinct(site)
## # A tibble: 3 x 1
##   site 
##   <chr>
## 1 elk5 
## 2 wb   
## 3 wb4

Time to knit: 13

logger.all.clean %>% 
  summarytools::descr(z_score.bySerial)

Time to knit: 0

z scores (by serial number and yr): all data

## all data
logger.all.clean %>% 
  ggplot(aes(z_score.bySerial,serial_number)) +
  geom_boxplot() +
  facet_wrap(~yr) +
  theme_minimal() +
  labs(caption = "z scores (by serial number and yr): all records")

Time to knit: 6

z scores (by serial number and yr): |z-score| < 2

logger.all.clean %>%
  filter(is.na(flag.zscore)) %>% 
  ggplot(aes(z_score.bySerial,serial_number)) +
  geom_boxplot() +
  facet_wrap(~yr) +
  theme_minimal() +
  labs(caption = "z scores (by serial number and yr): |z-score| < 2")

Time to knit: 5

logger.all.clean %>%
  filter(is.na(flag.zscore)) %>% 
  ggplot(aes(date, head_cm)) +
  geom_line(aes(color = serial_number)) +
  facet_wrap(~yr, scales = "free", ncol=2) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(caption = "combined scrape of all raw files + DCC; z-score < 2") +
  facet_wrap(~site, ncol=3)

Time to knit: 5

## Export for Further Analysis
logger.all.clean %>% 
  write_csv("./data/processed/data_loggers/visdat::vis_dat(warn_large_data = FALSE)combined_dataloggers_20201207.csv")

Time to knit: 0

## remove char from wat_id2 (e.g., "e5" > "5")
# library(stringr)
numextract <- function(string){ 
  stringr::str_extract(string, "\\-*\\d+\\.*\\d*")
} 
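## quick check of the extractor on illustrative ids (my examples)
numextract(c("e5", "o45_1", "wb4"))  # "5" "45" "4"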

## rename/munge to bind in manual
logger.all.clean <- logger.all.clean %>%
  mutate(wid = numextract(wat_id2)) %>% 
  mutate(wid = as.character(wid))
## wat_id2 was carried in from the logger side...

Time to knit: 0

Before joining manual and logger datasets, need to figure out what to do with the sites in the logger data set lacking a well id (wat_id2 in the logger data set)…

The following raw logger files/serial numbers lack a well ID. The well id (wat_id2) originated in the DCC. So these could have been new loggers or mis-attributed in the DCC. I’ll see if I can deduce their identity, but field notes would help!

logger.all.clean %>% 
  filter(is.na(wid)) %>% 
  distinct(wid, file_name, serial_number) %>% 
  gt() %>% 
  tab_header(title = "Logger files with NA for well id (wid)")
Logger files with NA for well id (wid)
file_name serial_number wid
Elk5_SG_Append_2002-01-01_21-01-09-656.csv 302228 NA
Elk5_SG_Append_2018-08-18_09-42-06-702.csv 302228 NA
Upper_Slough_Append_2018-08-11_12-49-23-948.csv 302174 NA
WB4_2017_Append_2018-08-28_11-38-37-460.csv 519943 NA
Elk5_SG_2019-05-29_11-17-23-283.csv 302228 NA
WB_CC_Upper_Slough_Append_2019-08-12_17-30-18-921.csv 302174 NA
WB4_2017_Append_2019-08-21_12-16-13-003.csv 519943 NA
## create lu for anitjoin
missing.wid.lu <- logger.all.clean %>% 
  filter(is.na(wid)) %>% 
  distinct(wid, file_name, serial_number)

Time to knit: 0

### attribute missing wid for logger. I did some sleuthing.
# No wb4 in the manual well measurements, so the wid was attributed as "wb4". The other missing wid values were possible to fix.

logger.all.clean <- logger.all.clean %>%
  mutate(wid = case_when(is.na(wid) & str_detect(site,"elk5") ~ "40",
                         is.na(wid) & str_detect(file_name,"WB4") ~ "wb4",
                         is.na(wid) & str_detect(file_name,"Slough") ~ "40",
         TRUE ~ wid))
# add fullid for join with manual
logger.all.clean <- logger.all.clean %>% 
  mutate(fullid = paste0(site,"-",plot,"-",wid))

# check
# logger.all.clean %>% 
#   filter(is.na(wid)) %>% 
#   distinct(wid, file_name, serial_number) %>% 
#   gt() %>% 
#   tab_header(title = "Logger files with NA for well id (wid)")

Time to knit: 0

## Export for Further Analysis
logger.all.clean %>% 
  write_csv("./data/processed/data_loggers/combined_dataloggers_20201208.csv")

Time to knit: 0

Integrate manual well measurements

The following code joins in manual well measurements, as processed from raw files in a different Rmd file. It’s assumed loggers have fidelity to wells.

Source: ./data/processed/manual_well_data_raw_20201202.csv

# read in manual measurement
# This is the output of the manual well data Rmd that draws on the raw manual well data files
mw.01.19 <- read_csv("./data/processed/manual_well_data_raw_20201202.csv")

## type convert
mw.01.19 <- mw.01.19 %>% 
  mutate(wid = as.character(wid))

# cleave off some columns for join
mw.01.19 <- mw.01.19 %>% 
  select(-c(yr, doy, mo, mday, site, plot, wid, site2))

mw.01.19 <- mw.01.19 %>%
  mutate(rwte = rwte*-1)

mw.01.19 <- mw.01.19 %>%
  rename(mw_su_cm = su_cm, mw_dtw_cm = dtw_cm, mw_rwte = rwte, mw_flag = flag) %>%
  mutate(mw_flag = as.character(mw_flag))

Time to knit: 0

## join logger manual
log.man <- left_join(logger.all.clean, mw.01.19, by = c("date","fullid"))

log.man <- log.man %>% 
  select(-c(wat_id2, site_type))

Time to knit: 0

### qa
logger.all.clean %>% visdat::vis_dat(warn_large_data = FALSE) +
  labs(title = "logger.all.clean")
  

mw.01.19 %>% visdat::vis_dat(warn_large_data = FALSE) +
  labs(title = "mw.01.19")

mw.01.19 %>% filter(is.na(mw_dtw_cm))

log.man %>% visdat::vis_dat(warn_large_data = FALSE) +
  labs(title = "log.man")

Time to knit: 0

log.man.coincident <- log.man %>%
  filter(!is.na(mw_rwte))

Time to knit: 0

log.man.coincident %>% 
  select(file_name, yr, serial_number, site2, head_cm, mw_su_cm, mw_rwte) %>% 
  distinct() %>% datatable()

Time to knit: 0

# use SU from manual wells to calculate hang height and adjust head_cm from logger

# !!! for records without su, assuming rwte + head = hang height

log.man.coincident <- log.man.coincident %>% 
  # mutate(hang.calc = mw_dtw_cm + head_cm) %>%
  mutate(hang.calc = case_when(
    !is.na(mw_dtw_cm) ~ (mw_dtw_cm + head_cm),
    is.na(mw_dtw_cm) ~ (mw_rwte + head_cm)))

log.man.coincident <- log.man.coincident %>% 
  mutate(rwte.calc = case_when(
    !is.na(mw_su_cm) ~ (hang.calc-head_cm-mw_su_cm),
    is.na(mw_su_cm) ~ (hang.calc-head_cm)))
  # mutate(rwte.calc = hang.calc-head_cm-mw_su_cm)

log.man.coincident %>% visdat::vis_dat(warn_large_data = FALSE) +
  labs(title = "log.man.coincident")

# #
# log.man.coincident %>% 
#   select(hang.calc, hang.calc) %>% 
#   datatable()

## distill
# log.man.hang.calc2join <- log.man.coincident %>%
#   distinct(yr, fullid, hang.calc)
log.man.hang.calc2join <- log.man.coincident %>%
  distinct(serial_number, hang.calc)
# log.man.hang.calc2join <- log.man.coincident %>%
#   distinct(fullid, hang.calc)

# dplyr::anti_join(logger.all.clean, log.man.hang.calc2join, by = "serial_number") %>% View()

Time to knit: 1

The following table shows the coincident data in manual well measurements and loggers based on date and serial number

log.man.coincident %>% 
  datatable()

Time to knit: 0

Estimate RWTE

Calculate hang height from head, su, and dtwt data. The code creates a flag field denoting whether a calculated rwte has hang-height info for that date or is estimated using past data and zero su (applicable to SG?). Hang heights from field forms (if they exist) should be brought in.

log.man.cor <- left_join(log.man, log.man.hang.calc2join, by = c("serial_number"))

log.man.cor %>%
  filter(file_name != "DCC") %>% 
  filter(is.na(hang.calc)) %>% 
  tabyl(site2, yr)
##      site2 2016 2017 2018 2019
##     elk-cc  580  730  596  145
##     elk-dx  580  730  596  144
##   elk5-obs 1092  730  595  149
##    lb4-obs    0  428  507  100
##  lostc-obs  366  365  365  153
##      wb-cc    0  404  588  224
##      wb-dx  438  730  588  225
##    wb4-obs    0  348  605  233
## create flag denoting whether a calculated rwte has info for hang height. Otherwise estimated using past data and 0 su (applicable to SG?). **Hang heights from field forms (if they exist) should be brought in**

log.man.cor <- log.man.cor %>% 
  mutate(rwte.calc = hang.calc - head_cm) %>%
  mutate(flag_hanght = case_when(!is.na(rwte.calc) ~ "estimated")) %>% 
  mutate(rwte.calc = case_when(is.na(rwte.calc) & !is.na(mw_rwte) ~ mw_rwte,
    TRUE ~ rwte.calc))
  # mutate(rwte.calc = case_when(
  #   is.na(rwte.calc) & !is.na(mw_rwte) ~ mw_rwte,
  #   TRUE ~ rwte.calc
  # ))

Time to knit: 1

# write coincident
log.man.coincident %>%
  clean_names() %>% 
  write_csv("./data/processed/data_loggers/logger_manual_coincident_20210123.csv")

Time to knit: 0

## Attempt to attribute sg/well
## add lu from the "Well_Data.shp" file in the "geospatial" directory

lu_wat_id2 <- read_csv("./data/processed/lu_well_wat_id2_from_shape.csv")

logger.all %>% 
  distinct(wat_id2)
## # A tibble: 20 x 1
##    wat_id2
##    <chr>  
##  1 <NA>   
##  2 e40    
##  3 e65    
##  4 o68    
##  5 o30    
##  6 e33    
##  7 e24    
##  8 e48    
##  9 e73    
## 10 e71    
## 11 o11    
## 12 o48    
## 13 e14    
## 14 e10    
## 15 o45_1  
## 16 o7     
## 17 o57    
## 18 o47_2  
## 19 o72    
## 20 o27
try <- left_join(logger.all,lu_wat_id2, by = "wat_id2")

try %>% 
  filter(is.na(type)) %>% 
  distinct(file_name, site, plot, wat_id2)
## # A tibble: 0 x 4
## # ... with 4 variables: file_name <chr>, site <chr>, plot <chr>, wat_id2 <chr>
# I don't know for many of these which are sg vs wells

Time to knit: 0

#### apply tile fun
## qaqc
log.man.cor %>% 
  filter(is.na(rwte.calc)) %>% 
  group_by(file_name, yr) %>% 
  tally() %>% 
  gg.tile(mydf = ., myxcol = file_name, myycol = yr, fill = n,mytitle = "", caption = "Missing rwte.calc records. combined logger/mw")

log.man.cor %>% 
  filter(is.na(rwte.calc)) %>% 
  group_by(serial_number, site, yr) %>% 
  tally() %>%
  gg.tile(mydf = .,myxcol = yr, myycol = serial_number, fill = n, mytitle = "",caption = "Missing rwte.calc records. combined logger/mw")

Time to knit: 2

log.man.cor %>% 
  group_by(serial_number, site, yr) %>% 
  tally() %>%
  gg.tile(mydf = .,myxcol = yr, myycol = serial_number, fill = n, mytitle = "",caption = "All records. combined logger/mw")

Time to knit: 1

log.man.cor %>% 
  group_by(file_name, site, plot, yr) %>% 
  summarytools::descr(rwte.calc, stats = c("mean", "min", "max")) %>% 
  summarytools::tb() %>% 
  datatable(caption = "Summary stats: rwte.calc")

Time to knit: 6

log.man.cor %>% 
  visdat::vis_dat(warn_large_data = FALSE) +
  labs(caption = "Combined logger and manual well data")

Time to knit: 3


Next steps

Find information (if it exists) on hang depths, stickups, etc. These are back-calculated using relationships in the DCC, but this is based on assumptions best checked (e.g., constant SU, hang height, fidelity to well).

Continue to filter out garbage

Make spatial by incorporating coordinates