Let’s say we have one site (Crane Creek) with one plot. The plot has a datalogger monitoring two trees, one with a deep sensor, as well as soil moisture. There’s also a separate datalogger monitoring PAR.
The data flow for this setup looks like this:
Notice that the data flows for soil moisture, timestamp, and battery voltage are one-to-many: they are plot level and so go to all trees’ data. The PAR datalogger is site level and goes to all plots.
The plot table captures the data flows above based
on design_link
, and provides treatment/plot/individual
information .
plot_table <- read_csv("plot_table.csv", show_col_types = FALSE)
datatable(plot_table) %>%
formatStyle("design_link", color = "red")
Note that this plot table uses “comma notation”, so expand it before continuing:
plot_table_ex <- expand_df(plot_table)
datatable(plot_table_ex) %>%
formatStyle("design_link", color = "red")
The design table links logger variables with their
design_link
as given in the plot table. This link can
either be a specific measurement name (e.g. S807
) or a
group (logger
).
design_table <- read_csv("design_table.csv", show_col_types = FALSE)
datatable(design_table) %>%
formatStyle("design_link", color = "red") %>%
formatStyle("loggernet_variable", color = "blue")
Finally, the two logger data tables are downloaded from the logger.
library(compasstools)
logger_data1 <- read_csv("logger_data300.csv", show_col_types = FALSE)
datatable(logger_data1) %>%
formatStyle("Logger", color = "green")
logger_data2 <- read_csv("logger_data302.csv", show_col_types = FALSE)
datatable(logger_data2) %>%
formatStyle("Logger", color = "green")
# Logger and Timestamp are 'magic fields' that always travel with data,
# so we handle them differently
ld1 <- pivot_longer(logger_data1, c(-Logger, -Timestamp), names_to = "loggernet_variable")
ld2 <- pivot_longer(logger_data2, c(-Logger, -Timestamp), names_to = "loggernet_variable")
ld1 %>%
bind_rows(ld2) %>%
left_join(design_table, by = c("Logger", "loggernet_variable")) %>%
select(-scale_type) %>%
filter(!is.na(design_link)) -> # in a real pipeline we'd error if not found
dat1
datatable(dat1) %>%
formatStyle("design_link", color = "red") %>%
formatStyle("Logger", color = "green")
This is a crucial step because this join is potentially
one-to-many: more than one plot table entry may list a given
design_link
.
dat1 %>%
right_join(plot_table_ex, by = "design_link") %>%
select(-note) ->
dat2
datatable(dat2)
dat2 %>%
select(-design_link, -loggernet_variable, -Logger) %>%
pivot_wider(names_from = "research_name", values_from = "value") %>%
datatable()
Define a test data frame:
Expand it:
datatable(expand_df(test), options = list(pageLength = 50))