Updated: 2021 April 13
This document and its code scrape data from the scattered well logger files and combine them into a single analyzable file. In addition to screening and merging over 179 files, the code conducts basic exploratory analyses of the well logger data, primarily for further QA/QC.
The inherited state of organization for the data logger files was a mess. There are dozens of partially overlapping and inconsistently named files to munge. The DCC data (the data that was added to the DCC library repository) are assumed to be reliable.
The code scrapes data from raw logger files discovered while searching for data in the different shared dropbox/drive folders and DK hard drives.
DCC URL: https://mountainscholar.org/handle/10217/195563
The DCC data were processed by D. Kotter to yield relative water table elevations, while newer files still need correction for stickup/hang depth. Those measurements are presumed to exist somewhere and are needed to evaluate possible errors in the barometric corrections derived from the sole barotroll (Tower) found for the post-DCC period. As indicated, these data are raw and need connection to field measurements (e.g., hang height, well stickup). The combined data retain file_name and serial number as attributes to aid the necessary detective work using field notes.
Loggers vary in their start dates and logging intervals, so to synchronize them, data were reduced to a daily time step for export.
# import of tsv
dcc.raw <- readr::read_tsv("data/NSF_DataArchive20180208/Well_Logger_Data_2012-2015/Yell_Well_Data_Logger2012_2015.txt")
dcc <- dcc.raw %>%
clean_names() %>%
tibble::rownames_to_column(var = "RecordID")
There are various issues apparent on inspection. For example:
There are corrupt data in the DCC file: no usable time stamps; everything is character in date format.
"date_and_time" is not formatted as dttm, just a simple date, and it is redundant with "date". The "time" column is not, in fact, time data; we may wish to find the original logger data. This may work for daily aggregated data; otherwise, not sure what to do with the 6 per-day records. The "seconds" column ranges from 0 to 1.0477473 × 10^8. I guess this is cumulative time elapsed, but from when to when? I'm still of the mind that daily aggregates are the way to go forward…
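As a rough sanity check on the cumulative-elapsed-time guess, the maximum of that column converts to about 3.3 years, consistent with a 2012–2015 record (a sketch; assumes clean_names() yielded a numeric 'seconds' column):
## sanity check: if 'seconds' is cumulative elapsed time, its max should
## roughly match the 2012-2015 span of the DCC record
dcc %>%
summarise(max_sec = max(as.numeric(seconds), na.rm = TRUE),
approx_years = max_sec / (60 * 60 * 24 * 365.25))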
## data type changes
dcc <- dcc %>%
mutate(date = mdy(date))
## change case to ease merging with non-DCC data
dcc <- dcc %>%
mutate(site = tolower(site)) %>%
mutate(wat_id2 = tolower(wat_id2)) %>%
mutate(year = as.integer(year)) %>%
rename(serial_number = serial_num)
## make lower case
dcc <- dcc %>%
mutate_if(.predicate = is.character,.funs = tolower)
## modify "site" to be consistent with usage elsewhere
dcc <- dcc %>%
rename(site.dcc = site) %>%
mutate(site = case_when(
grepl("eb1", site.dcc) ~ "eb1",
grepl("eb2", site.dcc) ~ "eb2",
grepl("elk1", site.dcc) ~ "elk1",
grepl("elk4", site.dcc) ~ "elk4",
grepl("elkdx", site.dcc) ~ "elk",
grepl("elkcc", site.dcc) ~ "elk",
grepl("wbcc", site.dcc) ~ "wb",
grepl("wbdx", site.dcc) ~ "wb",
TRUE ~ site.dcc))
## add "plot" consistent with usage elsewhere
dcc <- dcc %>%
mutate(plot = case_when(
grepl("cc", site.dcc) ~ "cc",
grepl("dx", site.dcc) ~ "dx",
grepl("dc", site.dcc) ~ "dc",
grepl("cx", site.dcc) ~ "cx",
TRUE ~ "obs")) %>%
mutate(site2 = paste0(site,"-",plot)) %>%
select(-site.dcc)
# create a lookup (lu) table linking serial_number to wat_id2
lu.serialNum.watId2 <- dcc %>%
distinct(serial_number, wat_id2)
visdat::vis_dat(dcc, warn_large_data = FALSE)
dcc %>%
distinct(time)
## what are the distinct wells?
# dcc %>%
# distinct(site_logger) %>%
# gt() %>%
# tab_header(title = "DCC logger data", subtitle = "Distinct 'site_logger'")
# note there are 19 distinct "site_logger" values and 19 distinct "serial_num"
dcc %>%
distinct(site, serial_number) %>%
gt() %>%
tab_header(title = "DCC logger data", subtitle = "Distinct 'site_logger'")
| DCC logger data | |
|---|---|
| Distinct 'site' and 'serial_number' | |
| serial_number | site |
| 316609 | eb1 |
| 316644 | eb1 |
| 316744 | eb2 |
| 316695 | eb2 |
| 316622 | eb2 |
| 316651 | eb2 |
| 316605 | elk1 |
| 316625 | elk4 |
| 316614 | elk |
| 316604 | elk |
| 316603 | ghole |
| 316724 | lava |
| 316640 | lb2 |
| 316602 | lostc |
| 316745 | oxbow |
| 316642 | rose |
| 316712 | wb1 |
| 316706 | wb |
| 316747 | wb |
### summarize at a daily time step
dcc.dly.mean <- dcc %>%
group_by(site, plot, site2, wat_id2, serial_number, date) %>%
summarise(mean.dly.abslevel = mean(abslevel, na.rm = TRUE),
mean.water_column_m = mean(water_column_m, na.rm = TRUE),
psi_logger = mean(press_psi, na.rm = TRUE),
psi_baro = mean(bar_press_psi, na.rm = TRUE))
# create plot var by parsing site2.
dcc.dly.mean <- dcc.dly.mean %>%
mutate(plot = case_when(str_detect(site2, "cc") ~ "cc",
str_detect(site2, "dc") ~ "dc",
str_detect(site2, "dx") ~ "dx",
str_detect(site2, "cx") ~ "cx",
TRUE ~ "obs"))
## add in year, month, and day-of-year columns
dcc.dly.mean <- dcc.dly.mean %>%
mutate(yr = lubridate::year(date)) %>%
mutate(month = lubridate::month(date)) %>%
mutate(doy = lubridate::yday(date)) %>%
mutate(doy.d = format(date, format="%m-%d"))
## some plots for QA
dcc.dly.mean %>%
mutate(mean.water_column_m = mean.water_column_m*-1) %>%
# distinct(plot)
filter(plot != "obs") %>%
ggplot(aes(x=date,y=mean.water_column_m)) +
geom_line(aes(color=serial_number)) +
geom_hline(yintercept = 0, color = "red", lty = "dashed") +
labs(title = "DCC: Mean daily 'water_column_m' values") +
facet_wrap(~site, ncol=2) +
theme_minimal() +
labs(x = "DOY", y = "mean.dly.abslevel", caption = "mean.dly.abslevel = mean(abslevel,na.rm = TRUE)")
dcc.dly.mean %>%
filter(plot != "obs") %>%
ggplot(aes(x=date,y=mean.dly.abslevel)) +
geom_line(color="blue") +
labs(title = "DCC: Mean daily 'abslevel' values") +
facet_wrap(~serial_number, scales = "free_y") +
theme_minimal() +
labs(x = "Date", y = "Water table elevation (m)",caption = "mean.dly.abslevel = mean(abslevel,na.rm = TRUE)")
# ggsave("waterlevel_loggers_exp.png", width = 8.5, height = 6)
## plot daily traces
dcc.dly.mean %>% # names()
filter(plot != "obs") %>%
filter(month > 2 & month < 10) %>%
mutate(yr = as.factor(yr)) %>%
# ggplot(aes(x=doy,y=mean.water_column_m)) +
ggplot(aes(x=doy,y=mean.dly.abslevel)) +
geom_line(aes(color=yr)) +
labs(x = "Day of year", y = "Water table elevation (m)") +
# labs(title = "Mean daily 'water_column_m' values") +
facet_wrap(~wat_id2, scale = 'free_y', ncol = 3) +
theme_minimal()
# ggsave("waterlevel_loggers_exp_alt2.png", width = 8.5, height = 6)
DCC daily traces
There are clear out-of-range values evident in the DCC data, so these don't represent fully cleaned data. Minimal filtering is done here so that users of the data can perform cleaning as fits their needs.
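For example, a downstream user might band-pass the daily means before analysis (a sketch; the bounds are illustrative placeholders, not vetted values):
## hypothetical user-side cleaning: drop implausible daily means
## (the 1900-2100 bounds are illustrative, not vetted)
dcc.dly.screened <- dcc.dly.mean %>%
filter(is.na(mean.dly.abslevel) |
dplyr::between(mean.dly.abslevel, 1900, 2100))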
## plot daily traces
dcc.dly.mean %>%
filter(plot != "obs") %>%
filter(month > 2 & month < 10) %>%
mutate(yr = as.factor(yr)) %>%
ggplot(aes(x=doy,y=mean.dly.abslevel)) +
geom_line(aes(color=yr)) +
labs(x = "Day of year", y = "Water table elevation (m)") +
facet_wrap(~site2, scale = 'free_y', ncol = 2) +
theme_minimal()
# ggsave("waterlevel_loggers_exp_alt2.png", width = 8.5, height = 6)
### Export munged DCC data
dcc.dly.mean %>%
clean_names() %>%
write_csv("./data/processed/data_loggers/dcc_logger_20201205.csv")
Inherited a mess of files and directories. The following code scrapes data from the various places data were found, merges the data, and eliminates duplicates. The general strategy is to work backwards from 2019, as there are innumerable 'appended' files, in the hope that these capture data from earlier periods.
The only baro data found are from Tower and are provisionally used for correcting the other sites' data.
Source file: TowerBarow_2011+_2019-09-10_11-01-36-258.csv. Based on the date range, this may extend over the full time series?
tow.baro <- read_csv("data/raw/WinSituData/Logger_Data_2019/TowerBarow_2011+_2019-09-10_11-01-36-258.csv", skip=84) %>%
clean_names() %>%
select(-x5) %>%
mutate(date = anytime::anydate(date_and_time))
## the start of the trace looks a little dodgy, trim
tow.baro <- tow.baro %>%
filter(date > as.Date("2011-10-02")) # quote the date: bare 2011-10-02 is arithmetic (= 1999), not a date
## extract date and calc mean daily pressure (entered in mm Hg) and temp
tow.baro.dly <- tow.baro %>%
group_by(date) %>%
dplyr::summarize(mmHg_mean = mean(pressure_mm_hg, na.rm = TRUE), temp_c_mean = mean(temperature_c)) %>%
mutate(yr = year(date))
# convert the mm Hg to psi (units the loggers are in)
tow.baro.dly <- tow.baro.dly %>%
select(-yr) %>%
mutate(baroTow_psi = 0.0193368*mmHg_mean) %>%
rename(baroTow_tempC = temp_c_mean) %>%
rename(baroTow_mmHg = mmHg_mean)
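Two quick checks (a sketch): verify the conversion constant against the standard atmosphere (760 mm Hg should come out to roughly 14.7 psi), and confirm how much of the series the Tower file actually covers:
## 760 mm Hg * 0.0193368 should be ~14.7 psi (1 atm)
760 * 0.0193368
## date coverage of the daily baro file
range(tow.baro.dly$date)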
Exploratory plot of Tower barotroll psi
## plot baro data
tow.baro.dly %>%
# filter(mmHg.mean < 620) %>%
ggplot(aes(date, baroTow_psi)) +
geom_line(color = 'blue') +
theme_minimal() +
labs(title = "Tower Baro daily trace", subtitle = "mean daily psi", caption = "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")
# tow.baro.dly %>%
# mutate(yr = year(date), doy = lubridate::yday(date)) %>%
# mutate(yr = as.factor(yr)) %>%
# ggplot(aes(doy, baroTow_tempC)) +
# geom_line(aes(color = yr)) +
# theme_minimal() +
# labs(x = "Date", y = "Temperature (C)", title = "Tower Baro daily trace", subtitle = "mean daily temp_c", caption = "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")
tow.baro.dly %>%
mutate(yr = year(date), doy = lubridate::yday(date)) %>%
mutate(yr = as.factor(yr)) %>%
ggplot(aes(doy, baroTow_tempC)) +
geom_line(color="grey80") +
# geom_line(aes(color = yr)) +
theme_minimal() +
facet_wrap(~yr) +
labs(x = "Date", y = "Temperature (C)", title = "Tower Baro daily trace", subtitle = "mean daily temp_c", caption = "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")
Temps below -30 C seem questionable…
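If those readings need screening, flagging is probably safer than dropping at this stage (a sketch; the -30 C threshold simply mirrors the eyeball check above):
## flag (rather than drop) implausibly cold baro temps for later review
tow.baro.dly <- tow.baro.dly %>%
mutate(flag_temp = if_else(baroTow_tempC < -30, "suspect", NA_character_))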
# read directory for all exported logger files
# there's a crazy number of inconsistently entered fields. Nothing is clear.
files2019csv <- fs::dir_ls("./data/raw/WinSituData/Logger_Data_2019", recurse = TRUE, glob = "*.csv")
## make a tibble, rename
files2019csv <- files2019csv %>%
enframe() %>%
rename(path_full = value) %>%
select(-name)
## add naked file name and naked path
files2019csv <- files2019csv %>%
mutate(path = fs::path_dir(path_full)) %>%
mutate(file_name = fs::path_file(path_full))
For 2019, there are 20 distinct csv files in the logger directory. Examination of these files reveals that they're inconsistently structured. They appear to be mainly raw export files from WinSitu. Depending on the particular notes entered in the logger, the data start at a different row in each *.csv, complicating parsing. Data are uncorrected. The only baro data seem to be from Tower and are used to process the data.
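One way to make the parsing less brittle than a hard-coded skip = 100 would be to locate the start of the data block in each file before reading (a sketch; it assumes the column-header line contains "Date and Time", which should be verified against the actual WinSitu exports):
## sketch: find where the data block starts instead of hard-coding skip
## (assumes a header line containing "Date and Time" -- verify against the files!)
find_data_start <- function(path, marker = "Date and Time", n_scan = 200) {
lines <- readr::read_lines(path, n_max = n_scan)
which(stringr::str_detect(lines, stringr::fixed(marker)))[1]
}
## usage: read_csv(path, skip = find_data_start(path), col_names = c("date_time", ...))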
## table of distinct file names
files2019csv %>%
distinct(file_name) %>%
gt::gt() %>%
gt::tab_header(title = "Distinct csv files in the 2019 directory")
| Distinct csv files in the 2019 directory |
|---|
| file_name |
| Crystal3_obs_Append_2019-08-15_12-59-18-257.csv |
| Crystal_Well20160819_Append_2019-07-17_12-36-02-157.csv |
| EB1_cc_2012+_Append_2019-08-19_13-23-14-621.csv |
| EB1_dx_2012+_Append_2019-08-19_12-42-23-473.csv |
| EB2_cc_2012+_2019-05-26_16-47-18-414.csv |
| EB2_dc_2012+_2019-05-26_14-34-16-494.csv |
| EB2_dx_2012+_Append_2019-08-19_15-20-14-836.csv |
| Elk1_2012+_2019-05-29_16-08-41-816.csv |
| Elk4_2012+_2019-05-29_10-18-47-766.csv |
| Elk5_SG_2019-05-29_11-17-23-283.csv |
| ElkCC2016_E14_2019-05-25_11-09-54-540.csv |
| ElkDX2016_E10_2019-05-24_14-28-09-495.csv |
| LB2_obs_15min_20170601_2019-05-28_14-50-38-590.csv |
| LB4_Well_15min_20170601_2019-05-28_17-46-32-683.csv |
| LostC_2012+_2019-06-02_12-28-57-572.csv |
| Oxb_2012+_2019-06-02_14-28-23-429.csv |
| TowerBarow_2011+_2019-09-10_11-01-36-258.csv |
| WB4_2017_Append_2019-08-21_12-16-13-003.csv |
| WB_CC_Upper_Slough_Append_2019-08-12_17-30-18-921.csv |
| WB_DX_2016_Append_2019-08-12_13-21-36-660.csv |
# Pull out what appears to be the only baro file
## exclude the baro from the remaining logger files
files2019csv <- files2019csv %>%
filter(file_name != "TowerBarow_2011+_2019-09-10_11-01-36-258.csv")
## all of the csv exports lack depth; uncorrected except...
## add some header info
# files2019csv <- files2019csv %>%
# mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c"))) %>%
# mutate(header = map(path_full, read_csv, skip = 0,
# n_max = 10))
# Fri Mar 26 11:37:35 2021 ------------------------------
## depth is present in some but not all.
files2019csv <- files2019csv %>%
mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c","depth_ft"))) %>%
mutate(header = map(path_full, read_csv, skip = 0,
n_max = 10))
## Munge: extract out info from file header
files2019csv <- files2019csv %>%
mutate(hdr_info = map(path_full, read_csv, skip =13,
n_max = 13, col_names = c("hdr_var","hdr_val"))) %>%
unnest(hdr_info) %>%
filter(!is.na(hdr_val) & !is.na(hdr_var)) %>%
# distinct(hdr_var)
filter(hdr_var == "Site" | hdr_var == "Serial Number") %>%
pivot_wider(names_from = hdr_var, values_from = hdr_val) %>%
clean_names() %>%
mutate(site_hdr = site)
Note: ‘site_hdr’ refers to the ‘site’ attribute recorded in the export file (i.e., it was entered in the logger).
files2019csv %>%
pluck(5,3) %>%
# head() %>%
View()
Attribute with the serial number and site from the header. Note: the "site" is as it exists in the logger; it does not match the format used elsewhere, so more munging…
## extract out the actual data
logger_raw2019 <- files2019csv %>%
unnest(import)
# select and attribute with the serial number and site from the header
logger_raw2019 <- logger_raw2019 %>%
select(file_name,date_time, psi, temp_c, serial_number, site_hdr) %>%
distinct() %>%
mutate(date_time = anytime::anytime(date_time)) %>%
mutate(date = date(date_time)) %>%
mutate(yr = year(date_time))
Distill to daily time step
## create daily
logger_raw2019.dly <- logger_raw2019 %>%
group_by(file_name,date, serial_number, site_hdr) %>%
dplyr::summarise(psi.dly.mean = mean(psi, na.rm=TRUE), tempC.dly.mean = mean(temp_c, na.rm=TRUE)) %>%
distinct()
## qa
logger_raw2019.dly %>%
ggplot(aes(date,psi.dly.mean)) +
geom_line(aes(color = serial_number)) +
facet_wrap(~site_hdr)
## join in the baro
logger_raw2019.dly.baro <- left_join(logger_raw2019.dly, tow.baro.dly, by = "date")
logger_raw2019.dly.baro %>%
visdat::vis_dat()
logger_raw2019.dly.baro %>%
View()
Notes on data collected during the 2018 field season (Lewis Messner, 10/24/2018):
- Data file labeled Slough Creek is WBCC. The pressure transducer was re-labeled for Slough Creek in 2017 but never deployed; it was redeployed in spring to WBCC and not renamed.
- Crystal well was removed in either May or June 2018 and replaced in July 2018. Check timing of removal in May/June. The logger was likely removed from the well, not paused, and returned in July after a reset.
- All data need to be post-processed. All stick-up, total depth, and hanging distances are recorded in the well measurement field notebooks, which will eventually be copied to a spreadsheet.
# read directory for all exported logger files
# there's a crazy number of inconsistently entered fields. Nothing is clear.
files2018csv <- fs::dir_ls("./data/raw/WinSituData/Logger_Data_2018", recurse = TRUE, glob = "*.csv")
## find WSL files
files2018wsl <- fs::dir_ls("./data/raw/WinSituData/Logger_Data_2018", recurse = TRUE, glob = "*.wsl") %>%
enframe() %>%
rename(path_full = value) %>%
select(-name) %>%
mutate(path = fs::path_dir(path_full)) %>%
mutate(file_name = fs::path_file(path_full))
## make a tibble, rename
files2018csv <- files2018csv %>%
enframe() %>%
rename(path_full = value) %>%
select(-name)
## add naked file name and naked path
files2018csv <- files2018csv %>%
mutate(path = fs::path_dir(path_full)) %>%
mutate(file_name = fs::path_file(path_full))
# files2018csv %>%
# distinct(file_name) %>% datatable()
For 2018 alone, there are 22 distinct csv files in the logger directory. Examination of these files reveals that they're inconsistently structured.
## table of distinct file names
files2018csv %>%
distinct(file_name) %>%
gt::gt() %>%
gt::tab_header(title = "Distinct csv files in the 2018 directory")
| Distinct csv files in the 2018 directory |
|---|
| file_name |
| 2011+_2018-08-22_08-59-44-597.csv |
| 2016_Append_2018-08-10_16-00-00-768.csv |
| Crystal3_obs_2018-08-04_14-03-35-424.csv |
| Crystal_Well20160819_2018-05-18_10-00-40-581.csv |
| Crystal_Well20160819_Append_2018-08-04_14-50-00-730.csv |
| EB1_cc_2012+_2018-05-04_10-30-37-980.csv |
| EB1_cc_2012+_Append_2018-08-07_09-54-29-346.csv |
| EB1_dx_2012+_Append_2018-08-07_08-46-02-308.csv |
| EB2_2012+_2018-05-07_13-17-49-386.csv |
| EB2_cc_2012+_Append_2018-08-10_07-39-49-362.csv |
| EB2_dc_2012+_Append_2018-08-07_15-00-09-695.csv |
| EB2_dx_2012+_Append_2018-08-07_15-04-34-299.csv |
| Elk1_2012+_Append_2018-08-20_12-46-02-732.csv |
| Elk4_2012+_Append_2018-08-18_08-45-36-450.csv |
| Elk5_SG_Append_2018-08-18_09-42-06-702.csv |
| ElkCC2016_E14_Append_2018-08-19_11-11-24-857.csv |
| ElkDX2016_E10_Append_2018-08-19_10-22-35-714.csv |
| LB2_obs_15min_20170601_2018-05-22_13-37-17-009.csv |
| LB4_Well_15min_20170601_2018-05-22_12-13-47-292.csv |
| Oxb_2012+_Append_2018-08-24_13-44-53-228.csv |
| Upper_Slough_Append_2018-08-11_12-49-23-948.csv |
| WB4_2017_Append_2018-08-28_11-38-37-460.csv |
## all of the csv exports lack depth; uncorrected.
files2018csv <- files2018csv %>%
mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c"))) %>%
mutate(header = map(path_full, read_csv, skip = 0,
n_max = 40))
## Munge: extract out info from file header
files2018csv <- files2018csv %>%
mutate(hdr_info = map(path_full, read_csv, skip =13,
n_max = 13, col_names = c("hdr_var","hdr_val"))) %>%
unnest(hdr_info) %>%
filter(!is.na(hdr_val) & !is.na(hdr_var)) %>%
# distinct(hdr_var)
filter(hdr_var == "Site" | hdr_var == "Serial Number") %>%
pivot_wider(names_from = hdr_var, values_from = hdr_val) %>%
clean_names() %>%
rename(site_hdr = site)
## remove baro
files2018csv <- files2018csv %>%
filter(file_name != "2011+_2018-08-22_08-59-44-597.csv") # remove the tower baro data. I've previously pulled that out (and it's in mm Hg)
files2018csv %>%
pluck(5,2) %>%
head() %>% View()
files2018csv %>%
pluck(4,2) %>% datatable()
## get names
files2018csv %>%
mutate(names = map(import, names)) %>%
unnest(names) %>%
datatable()
## explore parsing
t <- read_csv("./data/raw/WinSituData/Logger_Data_2018/Exported Data/Exported_20181024/EB2_dc_2012+_Append_2018-08-07_15-00-09-695.csv", skip = 69)
t
t <- read_csv("./data/raw/WinSituData/Logger_Data_2018/Exported Data/Exported_20181024/EB2_dc_2012+_Append_2018-08-07_15-00-09-695.csv")
t
## extract out the actual data
logger_raw2018 <- files2018csv %>%
unnest(import)
# logger_raw2018 %>%
# distinct(file_name)
# select and attribute with the serial number and site from the header. Takes a long time to process...
logger_raw2018 <- logger_raw2018 %>%
select(file_name,date_time, psi, temp_c, serial_number, site_hdr) %>%
distinct() %>%
mutate(date_time = anytime::anytime(date_time)) %>%
mutate(date = date(date_time)) %>%
mutate(yr = year(date_time))
Distill to daily time step
## create daily
logger_raw2018.dly <- logger_raw2018 %>%
group_by(file_name,date, serial_number, site_hdr) %>%
dplyr::summarise(psi.dly.mean = mean(psi, na.rm=TRUE), tempC.dly.mean = mean(temp_c, na.rm=TRUE)) %>%
ungroup() %>%
distinct()
## qa
logger_raw2018.dly %>%
ggplot(aes(date,psi.dly.mean)) +
geom_line(aes(color = serial_number)) +
facet_wrap(~site_hdr)
## join in the baro
logger_raw2018.dly.baro <- left_join(logger_raw2018.dly, tow.baro.dly, by = "date")
logger_raw2018.dly.baro %>%
visdat::vis_dat()
logger_raw2018.dly.baro %>%
View()
# combine and distill down the 2018 and 2019 batches
logger_raw2018_19.dly.baro <- bind_rows(logger_raw2018.dly.baro, logger_raw2019.dly.baro) %>%
distinct()
## add yr
logger_raw2018_19.dly.baro <- logger_raw2018_19.dly.baro %>%
mutate(yr = as.integer(year(date)))
## qa
# logger_raw2018_19.dly.baro %>%
# head()
logger_raw2018_19.dly.baro %>%
tabyl(site_hdr, yr) %>%
datatable(caption = "Count of records by yr and site_hdr")
logger_raw2018_19.dly.baro %>%
tabyl(serial_number, yr) %>%
datatable(caption = "Count of records by yr and serial number")
## munge site
logger_raw2018_19.dly.baro %>%
distinct(site_hdr)
## # A tibble: 20 x 1
## site_hdr
## <chr>
## 1 WestBlacktail_exp dx
## 2 Crystal_Well
## 3 Crystal3
## 4 EastBlacktail1_exp cc
## 5 EastBlacktail1_exp dx
## 6 EastBlacktail2_obs
## 7 EastBlacktail2_exp cc
## 8 EastBlacktail2_exp dc
## 9 EastBlacktail2_exp dx
## 10 Elk1_obs
## 11 Elk4_obs
## 12 Elk5
## 13 Elk_exp cc
## 14 Elk_exp dx
## 15 LowerBlacktail2_obs
## 16 LB4_WELL
## 17 Oxbow_obs
## 18 Slough
## 19 WB4
## 20 LostCreek_obs
## do my best to guess site and plot from the site_hdr field derived from logger header
logger_raw2018_19.dly.baro <- logger_raw2018_19.dly.baro %>%
mutate(site2 = case_when(site_hdr == "WestBlacktail_exp dx" ~ "wb-dx",
site_hdr == "Crystal_Well" ~ "crystal-obs",
site_hdr == "Crystal3" ~ "crystal-obs",
site_hdr == "EastBlacktail1_exp cc" ~ "eb1-cc",
site_hdr == "EastBlacktail1_exp dx" ~ "eb1-dx",
site_hdr == "EastBlacktail2_obs" ~ "eb2-obs",
site_hdr == "EastBlacktail2_exp cc" ~ "eb2-cc",
site_hdr == "EastBlacktail2_exp dc" ~ "eb2-dc",
site_hdr == "EastBlacktail2_exp dx" ~ "eb2-dx",
site_hdr == "Elk1_obs" ~ "elk1-obs",
site_hdr == "Elk4_obs" ~ "elk4-obs",
site_hdr == "Elk5" ~ "elk5-obs",
site_hdr == "Elk_exp cc" ~ "elk-cc",
site_hdr == "Elk_exp dx" ~ "elk-dx",
site_hdr == "LowerBlacktail2_obs" ~ "lb2-obs",
site_hdr == "LB4_WELL" ~ "lb4-obs",
site_hdr == "Oxbow_obs" ~ "oxbow-obs",
site_hdr == "Slough" ~ "wb-cc",
site_hdr == "WB4" ~ "wb4-obs",
site_hdr == "LostCreek_obs" ~ "lostc-obs",
TRUE ~ site_hdr)) %>%
separate(site2, c("site","plot"),sep = "-", remove = FALSE)
# Notes on data collected during the 2018 field season: Lewis Messner - 10/24/2018
# - **Data file labeled Slough Creek is WBCC.
logger_raw2018_19.dly.baro %>%
select(serial_number,site_hdr,site2,site,plot) %>%
distinct() %>%
gt() %>%
tab_header(title = "site info attributed from logger site (site_hdr)", subtitle = "LEWIS please check")
| site info attributed from logger site (site_hdr) | ||||
|---|---|---|---|---|
| LEWIS please check | ||||
| serial_number | site_hdr | site2 | site | plot |
| 316747 | WestBlacktail_exp dx | wb-dx | wb | dx |
| 316603 | Crystal_Well | crystal-obs | crystal | obs |
| 316744 | Crystal3 | crystal-obs | crystal | obs |
| 316609 | EastBlacktail1_exp cc | eb1-cc | eb1 | cc |
| 316644 | EastBlacktail1_exp dx | eb1-dx | eb1 | dx |
| 316744 | EastBlacktail2_obs | eb2-obs | eb2 | obs |
| 316695 | EastBlacktail2_exp cc | eb2-cc | eb2 | cc |
| 316622 | EastBlacktail2_exp dc | eb2-dc | eb2 | dc |
| 316651 | EastBlacktail2_exp dx | eb2-dx | eb2 | dx |
| 316605 | Elk1_obs | elk1-obs | elk1 | obs |
| 316625 | Elk4_obs | elk4-obs | elk4 | obs |
| 302228 | Elk5 | elk5-obs | elk5 | obs |
| 316614 | Elk_exp cc | elk-cc | elk | cc |
| 316604 | Elk_exp dx | elk-dx | elk | dx |
| 316640 | LowerBlacktail2_obs | lb2-obs | lb2 | obs |
| 316706 | LB4_WELL | lb4-obs | lb4 | obs |
| 316745 | Oxbow_obs | oxbow-obs | oxbow | obs |
| 302174 | Slough | wb-cc | wb | cc |
| 519943 | WB4 | wb4-obs | wb4 | obs |
| 316602 | LostCreek_obs | lostc-obs | lostc | obs |
## write to csv to share with LEWIS
# logger_raw2018_19.dly.baro %>%
# select(serial_number,site_hdr,site2,site,plot) %>%
# distinct() %>%
# write_csv("./data/processed/logger_site_attribution_4QC.csv")
## first attempt at baro correction
logger_raw2018_19.dly.baro <- logger_raw2018_19.dly.baro %>%
mutate(psi.dif = psi.dly.mean-baroTow_psi) %>%
mutate(head_cm = psi.dif*0.70324961490205*100)
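The constant is the psi-to-metres-of-fresh-water conversion (the *100 then yields cm): 1 psi supports roughly 0.703 m of water. A quick derivation (a sketch):
## head = pressure / (density * gravity); 1 psi = 6894.76 Pa
6894.76 / (1000 * 9.80665) # ~0.703 m of fresh water per psi, matching the constant above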
logger_raw2018_19.dly.baro %>%
summarytools::descr(head_cm) %>%
summarytools::tb()
## # A tibble: 1 x 16
## variable mean sd min q1 med q3 max mad iqr cv skewness
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 head_cm 34.7 33.6 -31.6 11.6 30.1 57.8 161. 32.5 46.2 0.967 0.506
## # ... with 4 more variables: se.skewness <dbl>, kurtosis <dbl>, n.valid <dbl>,
## # pct.valid <dbl>
logger_raw2018_19.dly.baro %>%
ggplot(aes(date,head_cm)) +
geom_line(aes(color = plot)) +
facet_wrap(~site) +
labs(title = "QA plot", subtitle = "logger_psi - baro_psi converted to cm of head")
I don't know where the missing site data should be. Also, info on the specific well id and location is lacking, so these records cannot be compared to the manual well measurements. No notes on stick-up or hang depth, which are needed to further process the data, were found.
There are several additional directories of files. Scraping these for any data >2016; presumably, earlier data are folded into the DCC?
# the lack of data management on this project is ridiculous.
# read directory for all exported logger files
# there's a crazy number of inconsistently entered fields. Nothing is clear.
filesOther_csv <- fs::dir_ls("./data/raw/WinSituData/Exported Data", recurse = TRUE, glob = "*.csv")
## make a tibble, rename
filesOther_csv <- filesOther_csv %>%
enframe() %>%
rename(path_full = value) %>%
select(-name)
## add naked file name and naked path
filesOther_csv <- filesOther_csv %>%
mutate(path = fs::path_dir(path_full)) %>%
mutate(file_name = fs::path_file(path_full))
filesOther_csv %>%
distinct(file_name) %>% datatable()
In this directory alone, there are 117 distinct csv files. Examination of these files reveals that they're inconsistently structured.
## table of distinct file names
filesOther_csv %>%
distinct(file_name) %>%
gt::gt() %>%
gt::tab_header(title = "Distinct csv files in the 'Exported Data' folder", subtitle = "A bunch of S$#T. All older than DCC. There are even files from Sara Bisbing's AK project.")
| Distinct csv files in the 'Exported Data' folder |
|---|
| A bunch of S$#T. All older than DCC. There are even files from Sara Bisbing's AK project. |
| file_name |
| Well 10_2010-08-03_13-58-57-312-BaroMerge.csv |
| Well 11_Append_2010-08-02_15-12-59-062-BaroMerge.csv |
| Well 12_2010-08-03_09-47-04-125-BaroMerge.csv |
| Well 1_2010-08-02_11-45-02-703-BaroMerge.csv |
| Well 2_2010-08-02_09-20-16-687-BaroMerge.csv |
| Well 3_2010-08-02_10-09-55-437-BaroMerge.csv |
| Well 4_2010-08-03_11-35-21-109-BaroMerge.csv |
| Well 5_Append_2010-08-02_16-42-14-062-BaroMerge.csv |
| Well 6_Append_2010-08-02_12-46-16-234-BaroMerge.csv |
| Well 7_Append_2010-08-02_16-08-02-484-BaroMerge.csv |
| Well 8_2010-08-02_11-28-45-468-BaroMerge.csv |
| Well 9_Append_2010-08-02_14-35-51-375-BaroMerge.csv |
| Well 10_Append_2011-08-05_12-15-22-500-BaroMerge.csv |
| Well 11_Append_2011-08-03_14-22-34-953-BaroMerge.csv |
| Well 12_Append_2011-08-03_10-51-19-453-BaroMerge.csv |
| Well 1_Append_2011-08-02_11-25-03-984-BaroMerge.csv |
| Well 2_Append_2011-07-30_11-13-58-921-BaroMerge.csv |
| Well 3_Append_2011-07-30_12-45-34-359-BaroMerge.csv |
| Well 4_Append_2011-08-01_12-25-31-687-BaroMerge.csv |
| Well 5_Append_2011-07-31_14-16-01-734-BaroMerge.csv |
| Well 6_Append_2011-08-01_13-54-02-250-BaroMerge.csv |
| Well 7_Append_2011-07-31_12-20-02-453-BaroMerge.csv |
| Well 8_Append_2011-08-02_10-20-35-062-BaroMerge.csv |
| Well 9_Append_2011-07-31_15-37-16-328-BaroMerge.csv |
| Akaska 9_Append_2010-06-17_08-38-49-828-BaroMerge.csv |
| Alaska 1_Append_2010-06-22_10-09-13-109-BaroMerge.csv |
| Alaska 2 - WT PEM_2010-06-17_09-45-30-046-BaroMerge.csv |
| alaska 2_Append_2010-06-17_13-28-21-468-BaroMerge.csv |
| alaska 4_2010-06-08_13-37-17-546-BaroMerge.csv |
| Alaska 5_Append_2010-06-16_09-11-32-015-BaroMerge.csv |
| Alaska 6_Append_2010-06-22_13-49-31-953-BaroMerge.csv |
| Alaska 7_Append_2010-06-16_12-01-37-296-BaroMerge.csv |
| Alaska 8_Append_2010-06-21_09-59-18-203-BaroMerge.csv |
| well 11_2010-06-07_12-19-43-109-BaroMerge.csv |
| well 12_2010-06-08_09-13-51-218-BaroMerge.csv |
| Well 10_Append_2011-06-02_17-59-39-218-BaroMerge.csv |
| Well 11_Append_2011-06-03_11-49-08-015-BaroMerge.csv |
| Well 12_Append_2011-06-29_15-20-19-062-BaroMerge.csv |
| Well 1_Append_2011-06-08_14-49-44-921-BaroMerge.csv |
| Well 2_Append_2011-06-26_19-11-44-859-BaroMerge.csv |
| Well 3_Append_2011-06-07_12-39-10-234-BaroMerge.csv |
| Well 4_Append_2011-06-27_10-12-35-531-BaroMerge.csv |
| Well 5_Append_2011-06-02_12-07-40-625-BaroMerge.csv |
| Well 6_Append_2011-06-02_15-45-12-328-BaroMerge.csv |
| Well 7_Append_2011-06-02_11-35-56-468-BaroMerge.csv |
| Well 8_Append_2011-06-08_13-01-01-328-BaroMerge.csv |
| Well 9_Append_2011-06-28_13-46-56-062-BaroMerge.csv |
| 2011+_Append_2002-01-02_10-49-46-531.csv |
| Alaska 10_2010-06-30_09-19-15-421-BaroMerge.csv |
| EB1_cc_2012+_2013-08-12_14-40-29-031.csv |
| EB1_cc_2012+_Append_2002-01-01_21-10-19-890.csv |
| EB1_dx_2012+_2013-08-12_13-09-07-515.csv |
| EB1_dx_2012+_Append_2002-01-01_20-42-56-265.csv |
| EB2_cc_2012+_2013-08-15_10-00-57-156.csv |
| EB2_dc_2012+_2013-08-15_10-29-51-843.csv |
| EB2_dc_2012+_Append_2002-01-01_23-52-51-953.csv |
| EB2_dx_2012+_2013-08-15_10-40-34-078.csv |
| EB2_dx_2012+_Append_2002-01-01_23-30-37-984.csv |
| EB2_2012+_2013-08-15_09-43-28-921.csv |
| EB2_2012+_Append_2002-01-02_21-30-07-796.csv |
| Elk1_2012+_2013-08-11_17-45-49-640.csv |
| Elk1_2012+_Append_2002-01-03_03-13-33-875.csv |
| Elk4_2012+_2013-08-11_10-22-35-281.csv |
| Elk4_2012+_Append_2002-01-05_23-47-41-500.csv |
| Elk5_SG_Append_2002-01-01_21-01-09-656.csv |
| Elk5_Well_Append_2002-01-01_03-10-40-796.csv |
| Elk_cc_2012+_2013-08-11_18-02-57-078.csv |
| Elk_cc_2012+_Append_2002-01-03_01-55-51-250.csv |
| Elk_dx_2012+_2013-08-11_18-04-54-625.csv |
| Elk_dx_2012+_Append_2002-01-03_01-48-13-703.csv |
| GHole_2012+_2013-08-21_13-20-44-156.csv |
| GHole_2012+_Append_2002-01-01_00-04-32-625.csv |
| Lava_2012+_2013-08-16_17-39-39-765.csv |
| Lava_2012+_Append_2002-01-06_02-07-45-359.csv |
| LostC_2012+_2013-08-22_15-18-11-921.csv |
| LostC_2012+_Append_2002-01-01_20-35-59-296.csv |
| LB2_2012+_2013-08-19_12-30-52-421.csv |
| 2011+_2002-01-22_05-21-15-375.csv |
| Baro 1 - WT PSS_Append_2012-07-24_11-09-02-812.csv |
| Baro 2 - Eagle PSS_Append_2002-01-30_22-28-04-218.csv |
| Jumbo Scirpus Well_Append_2012-08-07_13-04-19-093-BaroMerge.csv |
| Jumbo Scirpus Well_Append_2012-08-07_13-04-19-093.csv |
| Sundown Well_2012-08-04_13-01-47-046-BaroMerge.csv |
| Sundown Well_2012-08-04_13-01-47-046.csv |
| Well 10_Append_2002-01-30_01-07-45-953-BaroMerge.csv |
| Well 10_Append_2002-01-30_01-07-45-953.csv |
| Well 11_Append_2012-07-25_13-06-55-218-BaroMerge.csv |
| Well 11_Append_2012-07-25_13-06-55-218.csv |
| Well 12_Append_2012-07-26_15-12-27-859-BaroMerge.csv |
| Well 12_Append_2012-07-26_15-12-27-859.csv |
| Well 1_Append_2012-08-04_18-32-03-625-BaroMerge.csv |
| Well 1_Append_2012-08-04_18-32-03-625.csv |
| Well 2_Append_2012-07-24_10-00-37-062-BaroMerge.csv |
| Well 2_Append_2012-07-24_10-00-37-062.csv |
| Well 3_Append_2002-01-29_23-22-55-390-BaroMerge.csv |
| Well 3_Append_2002-01-29_23-22-55-390.csv |
| Well 4_Append_2012-07-26_12-00-33-796-BaroMerge.csv |
| Well 4_Append_2012-07-26_12-00-33-796.csv |
| Well 5_Append_2002-01-30_22-20-16-453-BaroMerge.csv |
| Well 5_Append_2002-01-30_22-20-16-453.csv |
| Well 6_Append_2002-01-29_21-50-19-109-BaroMerge.csv |
| Well 6_Append_2002-01-29_21-50-19-109.csv |
| Well 7_Append_2002-01-31_00-38-06-281-BaroMerge.csv |
| Well 7_Append_2002-01-31_00-38-06-281.csv |
| Well 8_Append_2012-07-23_09-25-23-093-BaroMerge.csv |
| Well 8_Append_2012-07-23_09-25-23-093.csv |
| Well 9_Append_2012-07-25_14-49-11-078-BaroMerge.csv |
| Well 9_Append_2012-07-25_14-49-11-078.csv |
| Oxb_2012+_2013-08-19_10-20-37-421.csv |
| Oxb_2012+_Append_2002-01-05_19-03-25-375.csv |
| Rose_2012+_2013-08-22_11-06-10-343.csv |
| Rose_2012+_Append_2002-01-05_02-35-54-796.csv |
| 2011+_2002-01-22_04-44-38-593.csv |
| 2011+_Append_2016-08-22_13-13-09-390.csv |
| WB1_2012+_2013-08-17_18-06-19-406.csv |
| WB_cc_2012+_Append_2013-08-17_11-35-37-968.csv |
| WB_dx_2012+_Append_2013-08-17_10-56-58-515.csv |
## there are baromerged files. These have different columns
## extract the baromerged
filesOther_csv.baromrg <- filesOther_csv %>%
filter(str_detect(file_name, "BaroMerge"))
## exclude the baromerged
filesOther_csv <- filesOther_csv %>%
filter(!str_detect(file_name, "BaroMerge"))
## add some header info
filesOther_csv <- filesOther_csv %>%
mutate(import = map(.x = path_full, .f = read_csv, skip=100, col_names = c("date_time","seconds","psi","temp_c"))) %>%
mutate(header = map(path_full, read_csv, skip = 0,
n_max = 10))
## Munge: extract out info from file header
filesOther_csv <- filesOther_csv %>%
mutate(hdr_info = map(path_full, read_csv, skip =13,
n_max = 13, col_names = c("hdr_var","hdr_val"))) %>%
unnest(hdr_info) %>%
filter(!is.na(hdr_val) & !is.na(hdr_var)) %>%
filter(hdr_var == "Site" | hdr_var == "Serial Number") %>%
pivot_wider(names_from = hdr_var, values_from = hdr_val) %>%
clean_names()
Attribute with the serial number from the header.
## extract out the actual data
logger_rawOther <- filesOther_csv %>%
unnest(import)
# select and attribute with the serial number and site from the header
logger_rawOther <- logger_rawOther %>%
select(file_name,date_time, psi, temp_c, serial_number) %>%
distinct() %>%
mutate(date_time = anytime::anytime(date_time)) %>%
mutate(date = date(date_time)) %>%
mutate(yr = year(date_time))
Distill to daily time step
## create daily
logger_rawOth.dly <- logger_rawOther %>%
group_by(file_name,date, serial_number) %>%
dplyr::summarise(psi.dly.mean = mean(psi, na.rm=TRUE), tempC.dly.mean = mean(temp_c, na.rm=TRUE)) %>%
ungroup() %>%
distinct()
logger_rawOth.dly %>%
mutate(yr = as.integer(year(date))) %>%
group_by(file_name) %>%
summarise(min.yr = min(yr), max.yr = max(yr)) %>%
gt() %>%
tab_header(title = "Most files in this batch do not have any 2016+ data")
| Most files in this batch do not have any 2016+ data | ||
|---|---|---|
| file_name | min.yr | max.yr |
| 2011+_2002-01-22_04-44-38-593.csv | 2011 | 2015 |
| 2011+_2002-01-22_05-21-15-375.csv | 2011 | 2015 |
| 2011+_Append_2002-01-02_10-49-46-531.csv | 2011 | 2015 |
| 2011+_Append_2016-08-22_13-13-09-390.csv | 2011 | 2016 |
| Baro 1 - WT PSS_Append_2012-07-24_11-09-02-812.csv | 2010 | 2012 |
| Baro 2 - Eagle PSS_Append_2002-01-30_22-28-04-218.csv | 2010 | 2012 |
| EB1_cc_2012+_2013-08-12_14-40-29-031.csv | 2012 | 2013 |
| EB1_cc_2012+_Append_2002-01-01_21-10-19-890.csv | 2012 | 2015 |
| EB1_dx_2012+_2013-08-12_13-09-07-515.csv | 2012 | 2013 |
| EB1_dx_2012+_Append_2002-01-01_20-42-56-265.csv | 2012 | 2015 |
| EB2_2012+_2013-08-15_09-43-28-921.csv | 2012 | 2013 |
| EB2_2012+_Append_2002-01-02_21-30-07-796.csv | 2012 | 2015 |
| EB2_cc_2012+_2013-08-15_10-00-57-156.csv | 2012 | 2013 |
| EB2_dc_2012+_2013-08-15_10-29-51-843.csv | 2012 | 2013 |
| EB2_dc_2012+_Append_2002-01-01_23-52-51-953.csv | 2012 | 2015 |
| EB2_dx_2012+_2013-08-15_10-40-34-078.csv | 2012 | 2013 |
| EB2_dx_2012+_Append_2002-01-01_23-30-37-984.csv | 2012 | 2015 |
| Elk_cc_2012+_2013-08-11_18-02-57-078.csv | 2012 | 2013 |
| Elk_cc_2012+_Append_2002-01-03_01-55-51-250.csv | 2012 | 2015 |
| Elk_dx_2012+_2013-08-11_18-04-54-625.csv | 2012 | 2013 |
| Elk_dx_2012+_Append_2002-01-03_01-48-13-703.csv | 2012 | 2015 |
| Elk1_2012+_2013-08-11_17-45-49-640.csv | 2012 | 2013 |
| Elk1_2012+_Append_2002-01-03_03-13-33-875.csv | 2012 | 2015 |
| Elk4_2012+_2013-08-11_10-22-35-281.csv | 2012 | 2013 |
| Elk4_2012+_Append_2002-01-05_23-47-41-500.csv | 2012 | 2015 |
| Elk5_SG_Append_2002-01-01_21-01-09-656.csv | 2015 | 2016 |
| Elk5_Well_Append_2002-01-01_03-10-40-796.csv | 2015 | 2016 |
| GHole_2012+_2013-08-21_13-20-44-156.csv | 2012 | 2013 |
| GHole_2012+_Append_2002-01-01_00-04-32-625.csv | 2012 | 2015 |
| Jumbo Scirpus Well_Append_2012-08-07_13-04-19-093.csv | 2011 | 2012 |
| Lava_2012+_2013-08-16_17-39-39-765.csv | 2012 | 2013 |
| Lava_2012+_Append_2002-01-06_02-07-45-359.csv | 2012 | 2015 |
| LB2_2012+_2013-08-19_12-30-52-421.csv | 2012 | 2013 |
| LostC_2012+_2013-08-22_15-18-11-921.csv | 2012 | 2013 |
| LostC_2012+_Append_2002-01-01_20-35-59-296.csv | 2012 | 2015 |
| Oxb_2012+_2013-08-19_10-20-37-421.csv | 2012 | 2013 |
| Oxb_2012+_Append_2002-01-05_19-03-25-375.csv | 2012 | 2015 |
| Rose_2012+_2013-08-22_11-06-10-343.csv | 2012 | 2013 |
| Rose_2012+_Append_2002-01-05_02-35-54-796.csv | 2012 | 2015 |
| Sundown Well_2012-08-04_13-01-47-046.csv | 2011 | 2012 |
| WB_cc_2012+_Append_2013-08-17_11-35-37-968.csv | 2012 | 2013 |
| WB_dx_2012+_Append_2013-08-17_10-56-58-515.csv | 2012 | 2013 |
| WB1_2012+_2013-08-17_18-06-19-406.csv | 2012 | 2013 |
| Well 1_Append_2012-08-04_18-32-03-625.csv | 2010 | 2012 |
| Well 10_Append_2002-01-30_01-07-45-953.csv | 2010 | 2012 |
| Well 11_Append_2012-07-25_13-06-55-218.csv | 2010 | 2012 |
| Well 12_Append_2012-07-26_15-12-27-859.csv | 2010 | 2012 |
| Well 2_Append_2012-07-24_10-00-37-062.csv | 2010 | 2012 |
| Well 3_Append_2002-01-29_23-22-55-390.csv | 2010 | 2012 |
| Well 4_Append_2012-07-26_12-00-33-796.csv | 2010 | 2012 |
| Well 5_Append_2002-01-30_22-20-16-453.csv | 2010 | 2012 |
| Well 6_Append_2002-01-29_21-50-19-109.csv | 2010 | 2012 |
| Well 7_Append_2002-01-31_00-38-06-281.csv | 2010 | 2012 |
| Well 8_Append_2012-07-23_09-25-23-093.csv | 2010 | 2012 |
| Well 9_Append_2012-07-25_14-49-11-078.csv | 2010 | 2012 |
Most of this seems useless: it should already be in the DCC if the max year is <= 2015.
logger_rawOth.dly %>%
mutate(yr = as.integer(year(date))) %>%
group_by(file_name) %>%
summarise(min.yr = min(yr), max.yr = max(yr)) %>%
filter(max.yr >= 2016) %>%
gt() %>%
tab_header(title = "Only three files in this batch have any 2016+ data")
| Only three files in this batch have any 2016+ data | ||
|---|---|---|
| file_name | min.yr | max.yr |
| 2011+_Append_2016-08-22_13-13-09-390.csv | 2011 | 2016 |
| Elk5_SG_Append_2002-01-01_21-01-09-656.csv | 2015 | 2016 |
| Elk5_Well_Append_2002-01-01_03-10-40-796.csv | 2015 | 2016 |
rawOth.16.logger <- logger_rawOth.dly %>%
filter(psi.dly.mean < 100) %>% # drops baro records logged in mm Hg (values ~600-800); well loggers read far below 100 psi
filter(date > '2016-01-01')
rawOth.16.logger %>%
distinct(file_name) %>%
gt() %>%
tab_header(title = "The only new data produced from this directory scrape")
rawOth.16.logger <- left_join(rawOth.16.logger, tow.baro.dly)
rawOth.16.logger <- rawOth.16.logger %>%
mutate(site2 = case_when(grepl("Elk5", file_name, ignore.case = TRUE) ~ "elk5-obs",
TRUE ~ file_name))
rawOth.16.logger <- rawOth.16.logger %>%
separate(site2, c("site","plot"),sep = "-", remove = FALSE)
rawOth.16.logger <- rawOth.16.logger %>%
mutate(psi.dif = psi.dly.mean-baroTow_psi) %>%
mutate(head_cm = psi.dif*0.70324961490205*100) %>%
mutate(yr = as.integer(year(date)))
## bind with the 2018 and 2019 scrape
logger.all <- bind_rows(rawOth.16.logger, logger_raw2018_19.dly.baro)
# logger.all %>%
# mutate(yr = as.integer(year(date))) %>%
# visdat::vis_dat(warn_large_data = FALSE)
All that crap to net ~500 records in 2016 for one site!
Join in wat_id2 to the non-DCC scrape. ASSUMPTION: serial numbers stay with wells. The well number associated with a given serial number in the DCC is used to attribute the non-DCC scrape.
## Filter to 2016 +
logger.all <- logger.all %>%
filter(yr >=2016)
## join in wat_id2 to the non-DCC scrape
lu.serialNum.watId2 <- lu.serialNum.watId2 %>%
mutate(serial_number = as.character(serial_number))
logger.all <- dplyr::left_join(logger.all,lu.serialNum.watId2, by="serial_number")
There are multiple files with serial numbers not present in the DCC. Are these new loggers?
## there are data not joining to dcc on serial number
logger.all %>%
filter(is.na(wat_id2)) %>%
distinct(file_name) %>%
gt() %>%
tab_header(title = "Files lacking coresponding serial number in DCC")
| Files lacking coresponding serial number in DCC |
|---|
| file_name |
| Elk5_SG_Append_2002-01-01_21-01-09-656.csv |
| Elk5_SG_Append_2018-08-18_09-42-06-702.csv |
| Upper_Slough_Append_2018-08-11_12-49-23-948.csv |
| WB4_2017_Append_2018-08-28_11-38-37-460.csv |
| Elk5_SG_2019-05-29_11-17-23-283.csv |
| WB_CC_Upper_Slough_Append_2019-08-12_17-30-18-921.csv |
| WB4_2017_Append_2019-08-21_12-16-13-003.csv |
## prep for row bind with DCC
dcc.dly.mean <- dcc.dly.mean %>%
ungroup() %>%
mutate(file_name = "DCC") %>%
mutate(serial_number = as.character(serial_number)) %>%
mutate(head_cm = mean.water_column_m*100) %>%
select(-c(month, doy, doy.d, mean.water_column_m))
logger.all <- logger.all %>%
select(-c(baroTow_mmHg,baroTow_tempC)) %>%
rename(psi_logger = psi.dly.mean) %>%
rename(psi_baro = baroTow_psi) %>%
select(-c(site_hdr))
logger.all %>%
glimpse()
## Rows: 36,320
## Columns: 13
## $ file_name <chr> "Elk5_SG_Append_2002-01-01_21-01-09-656.csv", "Elk5_SG_~
## $ date <date> 2016-01-02, 2016-01-03, 2016-01-04, 2016-01-05, 2016-0~
## $ serial_number <chr> "302228", "302228", "302228", "302228", "302228", "3022~
## $ psi_logger <dbl> 12.51200, 12.50175, 12.41521, 12.33721, 12.28196, 12.26~
## $ tempC.dly.mean <dbl> 3.216500, 3.189500, 3.157500, 3.130000, 3.093333, 3.059~
## $ psi_baro <dbl> 11.87086, 11.85155, 11.73939, 11.64495, 11.60674, 11.59~
## $ site2 <chr> "elk5-obs", "elk5-obs", "elk5-obs", "elk5-obs", "elk5-o~
## $ site <chr> "elk5", "elk5", "elk5", "elk5", "elk5", "elk5", "elk5",~
## $ plot <chr> "obs", "obs", "obs", "obs", "obs", "obs", "obs", "obs",~
## $ psi.dif <dbl> 0.6411433, 0.6501963, 0.6758177, 0.6922586, 0.6752182, ~
## $ head_cm <dbl> 45.08838, 45.72503, 47.52685, 48.68306, 47.48469, 47.32~
## $ yr <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2~
## $ wat_id2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
# compare_df_cols_same(logger.all, dcc.dly.mean)
### BIND DCC
logger.all <- logger.all %>%
bind_rows(.,dcc.dly.mean) %>%
distinct()
## add in dates
logger.all <- logger.all %>%
mutate(yr = lubridate::year(date)) %>%
mutate(month = lubridate::month(date)) %>%
mutate(doy = lubridate::yday(date)) %>%
mutate(doy.d = format(date, format="%m-%d"))
logger.all %>%
visdat::vis_dat(warn_large_data = FALSE) +
labs(caption = "Combined DCC and other logger data")
logger.all %>%
group_by(file_name, yr) %>%
summarise(n = n()) %>%
ggplot(aes(yr,file_name)) +
geom_tile(aes(fill = n))
logger.all %>%
ggplot(aes(date, head_cm)) +
geom_line(aes(color = serial_number)) +
viridis::scale_color_viridis(discrete=TRUE) +
theme_minimal() +
facet_wrap(~yr, scales = "free", ncol = 1) +
labs(x="", y="Head (cm)", caption = "logger.all")
logger.all %>%
filter(plot != "obs") %>%
ggplot(aes(date, head_cm)) +
geom_line(aes(color = serial_number)) +
viridis::scale_color_viridis(discrete=TRUE) +
theme_minimal() +
# facet_wrap(site2~yr, scales = "free", ncol = 1) +
facet_grid(site2~yr) +
labs(x="", y="Head (cm)", caption = "logger.all")
logger.all %>%
filter(plot != "obs") %>%
ggplot(aes(date, head_cm)) +
geom_line(aes(color = serial_number)) +
viridis::scale_color_viridis(discrete=TRUE) +
theme_minimal() +
# facet_wrap(site2~yr, scales = "free", ncol = 1) +
facet_grid(site2~serial_number) +
labs(x="", y="Head (cm)", caption = "logger.all")
Interactive time series plot - Exp sites only
qa.pl.exp <- logger.all %>%
filter(plot != "obs") %>%
ggplot(aes(doy.d, head_cm)) +
geom_point(aes(color = serial_number)) +
viridis::scale_color_viridis(discrete=TRUE) +
theme_minimal() +
facet_wrap(~yr, scales = "free_y", ncol = 2) +
labs(x="", y="Head (cm)", caption = "logger.all")
plotly::ggplotly(qa.pl.exp)
Interactive time series plot - Obs sites only
qa.pl.obs <- logger.all %>%
filter(plot == "obs") %>%
ggplot(aes(date, head_cm)) +
geom_point(aes(color = serial_number)) +
viridis::scale_color_viridis(discrete=TRUE) +
theme_minimal() +
facet_wrap(~yr, scales = "free", ncol = 2) +
labs(x="", y="Head (cm)", caption = "logger.all")
plotly::ggplotly(qa.pl.obs)
logger.all %>%
mutate(doy = yday(date)) %>%
mutate(yr = as.factor(yr)) %>%
ggplot(aes(doy, head_cm)) +
geom_line(aes(color = yr)) +
viridis::scale_color_viridis(discrete=TRUE) +
theme_minimal() +
facet_wrap(~serial_number, scales = "free", ncol = 2) +
labs(title = "Time series -- all logger data", x="", y="Head (cm)", caption = "logger.all+DCC")
logger.all %>%
group_by(serial_number, site, plot, yr) %>%
tally() %>%
ggplot(aes(yr, serial_number)) +
theme_minimal() +
geom_tile(aes(fill = n), color = "white") +
facet_wrap(~site, scales = "free", ncol=2) +
labs(fill = "n records", y = "Serial number", caption = "logger.all+DCC")
logger.all %>%
ggplot(aes(date, head_cm)) +
geom_line(aes(color = serial_number)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(caption = "combined scrape of all raw files + DCC") +
facet_wrap(~site, ncol=3) ## a later facet_wrap overrides an earlier one, so facet by site only
Some take-homes:
- There are gaps in the data.
- (Some) out-of-range values are likely linked to when loggers were deployed/read.
Calculate a z-score by logger serial number and year. This is a pretty rough way to screen the data.
logger.all.clean <- logger.all %>%
group_by(serial_number, yr) %>%
mutate(z_score.bySerial = (psi_logger - mean(psi_logger, na.rm = TRUE)) / sd(psi_logger, na.rm = TRUE)) %>% # na.rm added so a stray NA doesn't blank a whole group
ungroup() %>%
mutate(flag.zscore = case_when(abs(z_score.bySerial) > 2 ~ "z>2"))
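A quick tally shows how many daily records the |z| > 2 screen would set aside (a sketch; tabyl is from janitor, already used above):
## how many records does the z-score screen flag?
logger.all.clean %>%
tabyl(flag.zscore)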
logger.all.clean <- logger.all.clean %>%
select(-c(tempC.dly.mean, psi.dif, mean.dly.abslevel))
logger.all.clean %>%
visdat::vis_dat(warn_large_data = FALSE)
logger.all.clean %>%
filter(is.na(wat_id2)) %>%
distinct(site)
## # A tibble: 3 x 1
## site
## <chr>
## 1 elk5
## 2 wb
## 3 wb4
logger.all.clean %>%
summarytools::descr(z_score.bySerial)
z scores (by serial number and yr): all data
## all data
logger.all.clean %>%
ggplot(aes(z_score.bySerial,serial_number)) +
geom_boxplot() +
facet_wrap(~yr) +
theme_minimal() +
labs(caption = "z scores (by serial number and yr): all records")
z scores (by serial number and yr): |z-score| < 2
logger.all.clean %>%
filter(is.na(flag.zscore)) %>%
ggplot(aes(z_score.bySerial,serial_number)) +
geom_boxplot() +
facet_wrap(~yr) +
theme_minimal() +
labs(caption = "z scores (by serial number and yr): |z-score| < 2")
logger.all.clean %>%
filter(is.na(flag.zscore)) %>%
ggplot(aes(date, head_cm)) +
geom_line(aes(color = serial_number)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(caption = "combined scrape of all raw files + DCC; z-score < 2") +
facet_wrap(~site, ncol=3) ## a later facet_wrap overrides an earlier one, so facet by site only
## Export for Further Analysis
logger.all.clean %>%
write_csv("./data/processed/data_loggers/visdat::vis_dat(warn_large_data = FALSE)combined_dataloggers_20201207.csv")
## remove char from wat_id2 (e.g., "e5" > "5")
# library(stringr)
numextract <- function(string){
stringr::str_extract(string, "\\-*\\d+\\.*\\d*")
}
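## quick check of the extraction behavior (a sketch; expected output in the comment)
numextract(c("e5", "o45_1", "wb4")) # "5" "45" "4"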
## rename/munge to bind in manual
logger.all.clean <- logger.all.clean %>%
mutate(wid = numextract(wat_id2)) %>%
mutate(wid = as.character(wid))
## wat_id2 was carried in from the logger side...
Before joining the manual and logger datasets, we need to figure out what to do with the sites in the logger dataset that lack a well id (wat_id2)…
The following raw logger files/serial numbers lack a well ID. The well id (wat_id2) originated in the DCC, so these could be new loggers or wells mis-attributed in the DCC. I'll see if I can deduce their identity, but field notes would help!
logger.all.clean %>%
filter(is.na(wid)) %>%
distinct(wid, file_name, serial_number) %>%
gt() %>%
tab_header(title = "Logger files with NA for well id (wid)")
| Logger files with NA for well id (wid) | ||
|---|---|---|
| file_name | serial_number | wid |
| Elk5_SG_Append_2002-01-01_21-01-09-656.csv | 302228 | NA |
| Elk5_SG_Append_2018-08-18_09-42-06-702.csv | 302228 | NA |
| Upper_Slough_Append_2018-08-11_12-49-23-948.csv | 302174 | NA |
| WB4_2017_Append_2018-08-28_11-38-37-460.csv | 519943 | NA |
| Elk5_SG_2019-05-29_11-17-23-283.csv | 302228 | NA |
| WB_CC_Upper_Slough_Append_2019-08-12_17-30-18-921.csv | 302174 | NA |
| WB4_2017_Append_2019-08-21_12-16-13-003.csv | 519943 | NA |
## create lu for anti-join
missing.wid.lu <- logger.all.clean %>%
filter(is.na(wid)) %>%
distinct(wid, file_name, serial_number)
### attribute missing wid for logger. I did some sleuthing.
# No wb4 in the manual well measurements, so attributed the wid as "wb4". The other missing wids could be fixed.
logger.all.clean <- logger.all.clean %>%
mutate(wid = case_when(is.na(wid) & str_detect(site,"elk5") ~ "40",
is.na(wid) & str_detect(file_name,"WB4") ~ "wb4",
is.na(wid) & str_detect(file_name,"Slough") ~ "40",
TRUE ~ wid))
# add fullid for join with manual
logger.all.clean <- logger.all.clean %>%
mutate(fullid = paste0(site,"-",plot,"-",wid))
# check
# logger.all.clean %>%
# filter(is.na(wid)) %>%
# distinct(wid, file_name, serial_number) %>%
# gt() %>%
# tab_header(title = "Logger files with NA for well id (wid)")
## Export for Further Analysis
logger.all.clean %>%
write_csv("./data/processed/data_loggers/combined_dataloggers_20201208.csv")
The following code joins in manual well measurements, as processed from raw files in a different Rmd file. It’s assumed loggers have fidelity to wells.
Source: ./data/processed/manual_well_data_raw_20201202.csv
# read in manual measurement
# This is the output of the manual well data Rmd that draws on the raw manual well data files
mw.01.19 <- read_csv("./data/processed/manual_well_data_raw_20201202.csv")
## type convert
mw.01.19 <- mw.01.19 %>%
mutate(wid = as.character(wid))
# cleave off some columns for join
mw.01.19 <- mw.01.19 %>%
select(-c(yr, doy, mo, mday, site, plot, wid, site2))
mw.01.19 <- mw.01.19 %>%
mutate(rwte = rwte*-1)
mw.01.19 <- mw.01.19 %>%
rename(mw_su_cm = su_cm, mw_dtw_cm = dtw_cm, mw_rwte = rwte, mw_flag = flag) %>%
mutate(mw_flag = as.character(mw_flag))
## join logger manual
log.man <- left_join(logger.all.clean, mw.01.19, by = c("date","fullid"))
log.man <- log.man %>%
select(-c(wat_id2, site_type))
### qa
logger.all.clean %>% visdat::vis_dat(warn_large_data = FALSE) +
labs(title = "logger.all.clean")
mw.01.19 %>% visdat::vis_dat(warn_large_data = FALSE) +
labs(title = "mw.01.19")
mw.01.19 %>% filter(is.na(mw_dtw_cm))
log.man %>% visdat::vis_dat(warn_large_data = FALSE) +
labs(title = "log.man")
log.man.coincident <- log.man %>%
filter(!is.na(mw_rwte))
log.man.coincident %>%
select(file_name, yr, serial_number, site2, head_cm, mw_su_cm, mw_rwte) %>%
distinct() %>% datatable()
# use SU from manual wells to calculate hang height and adjust head_cm from logger
# !!! for records without su, assuming rwte + head = hang height
log.man.coincident <- log.man.coincident %>%
# mutate(hang.calc = mw_dtw_cm + head_cm) %>%
mutate(hang.calc = case_when(
!is.na(mw_dtw_cm) ~ (mw_dtw_cm + head_cm),
is.na(mw_dtw_cm) ~ (mw_rwte + head_cm)))
log.man.coincident <- log.man.coincident %>%
mutate(rwte.calc = case_when(
!is.na(mw_su_cm) ~ (hang.calc-head_cm-mw_su_cm),
is.na(mw_su_cm) ~ (hang.calc-head_cm)))
# mutate(rwte.calc = hang.calc-head_cm-mw_su_cm)
log.man.coincident %>% visdat::vis_dat(warn_large_data = FALSE) +
labs(title = "log.man.coincident")
# #
# log.man.coincident %>%
# select(hang.calc, hang.calc) %>%
# datatable()
## distill
# log.man.hang.calc2join <- log.man.coincident %>%
# distinct(yr, fullid, hang.calc)
log.man.hang.calc2join <- log.man.coincident %>%
distinct(serial_number, hang.calc)
# log.man.hang.calc2join <- log.man.coincident %>%
# distinct(fullid, hang.calc)
# dplyr::anti_join(logger.all.clean, log.man.hang.calc2join, by = "serial_number") %>% View()
The following table shows the coincident records in the manual well measurements and logger data, matched on date and well id (fullid).
log.man.coincident %>%
datatable()
Calculate hang height from head, su, and dtwt data. The code creates a flag field denoting whether a calculated rwte has hang-height info for that date or is estimated using past data and 0 su (applicable to SG?). Hang heights from field forms (if they exist) should be brought in.
log.man.cor <- left_join(log.man, log.man.hang.calc2join, by = c("serial_number"))
log.man.cor %>%
filter(file_name != "DCC") %>%
filter(is.na(hang.calc)) %>%
tabyl(site2, yr)
## site2 2016 2017 2018 2019
## elk-cc 580 730 596 145
## elk-dx 580 730 596 144
## elk5-obs 1092 730 595 149
## lb4-obs 0 428 507 100
## lostc-obs 366 365 365 153
## wb-cc 0 404 588 224
## wb-dx 438 730 588 225
## wb4-obs 0 348 605 233
## create flag denoting whether a calculated rwte has info for hang height. Otherwise estimated using past data and 0 su (applicable to SG?). **Hang heights from field forms (if they exist) should be brought in**
log.man.cor <- log.man.cor %>%
mutate(rwte.calc = hang.calc - head_cm) %>%
mutate(flag_hanght = case_when(!is.na(rwte.calc) ~ "estimated")) %>%
mutate(rwte.calc = case_when(is.na(rwte.calc) & !is.na(mw_rwte) ~ mw_rwte,
TRUE ~ rwte.calc))
# mutate(rwte.calc = case_when(
# is.na(rwte.calc) & !is.na(mw_rwte) ~ mw_rwte,
# TRUE ~ rwte.calc
# ))
# write coincident
log.man.coincident %>%
clean_names() %>%
write_csv("./data/processed/data_loggers/logger_manual_coincident_20210123.csv")
## Attempt to attribute sg/well
## add lu from the "Well_Data.shp" file in the "geospatial" directory
lu_wat_id2 <- read_csv("./data/processed/lu_well_wat_id2_from_shape.csv")
logger.all %>%
distinct(wat_id2)
## # A tibble: 20 x 1
## wat_id2
## <chr>
## 1 <NA>
## 2 e40
## 3 e65
## 4 o68
## 5 o30
## 6 e33
## 7 e24
## 8 e48
## 9 e73
## 10 e71
## 11 o11
## 12 o48
## 13 e14
## 14 e10
## 15 o45_1
## 16 o7
## 17 o57
## 18 o47_2
## 19 o72
## 20 o27
try <- left_join(logger.all,lu_wat_id2, by = "wat_id2")
try %>%
filter(is.na(type)) %>%
distinct(file_name, site, plot, wat_id2)
## # A tibble: 0 x 4
## # ... with 4 variables: file_name <chr>, site <chr>, plot <chr>, wat_id2 <chr>
# I don't know for many of these which are sg vs wells
#### apply tile fun
## qaqc
log.man.cor %>%
filter(is.na(rwte.calc)) %>%
group_by(file_name, yr) %>%
tally() %>%
gg.tile(mydf = ., myxcol = file_name, myycol = yr, fill = n,mytitle = "", caption = "Missing rwte.calc records. combined logger/mw")
log.man.cor %>%
filter(is.na(rwte.calc)) %>%
group_by(serial_number, site, yr) %>%
tally() %>%
gg.tile(mydf = .,myxcol = yr, myycol = serial_number, fill = n, mytitle = "",caption = "Missing rwte.calc records. combined logger/mw")
log.man.cor %>%
group_by(serial_number, site, yr) %>%
tally() %>%
gg.tile(mydf = .,myxcol = yr, myycol = serial_number, fill = n, mytitle = "",caption = "All records. combined logger/mw")
log.man.cor %>%
group_by(file_name, site, plot, yr) %>%
summarytools::descr(rwte.calc, stats = c("mean", "min", "max")) %>%
summarytools::tb() %>%
datatable(caption = "Summary stats: rwte.calc")
log.man.cor %>%
visdat::vis_dat(warn_large_data = FALSE) +
labs(caption = "Combined logger and manual well data")
Next steps:
- Find information (if it exists) on hang depths, stickups, etc. These are back-calculated using relationships in the DCC, but that rests on assumptions best checked (e.g., constant SU, hang height, fidelity to well).
- Continue to filter out garbage.
- Make spatial by incorporating coordinates (a sketch follows).
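For the spatial step, something like the following could work once well coordinates are tracked down (a sketch; the coordinates file and its lon/lat column names are hypothetical placeholders):
## hypothetical: join well coordinates and promote to an sf object
library(sf)
well_xy <- readr::read_csv("./data/processed/well_coordinates.csv") # hypothetical file
logger.sf <- logger.all.clean %>%
dplyr::left_join(well_xy, by = "wid") %>% # assumes the file carries a matching 'wid'
sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) # placeholder column names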