Less simple schema

Setup

Let’s say we have one site (Crane Creek) with one plot. The plot has a datalogger monitoring two trees, one with a deep sensor, as well as soil moisture. There’s also a separate datalogger monitoring PAR.

The data flow for this setup looks like this:

Notice that the data flows for soil moisture, timestamp, and battery voltage are one-to-many: they are plot level and so go to all trees’ data. The PAR datalogger is site level and goes to all plots.

Read the data

The plot table captures the data flows above based on design_link, and provides treatment/plot/individual information .

plot_table <- read_csv("plot_table.csv", show_col_types = FALSE)
datatable(plot_table) %>% 
    formatStyle("design_link", color = "red")

Note that this plot table uses “comma notation”, so expand it before continuing:

plot_table_ex <- expand_df(plot_table)
datatable(plot_table_ex) %>% 
    formatStyle("design_link", color = "red")

The design table links logger variables with their design_link as given in the plot table. This link can either be a specific measurement name (e.g. S807) or a group (logger).

design_table <- read_csv("design_table.csv", show_col_types = FALSE)
datatable(design_table) %>%
    formatStyle("design_link", color = "red") %>%
    formatStyle("loggernet_variable", color = "blue")

Finally, the two logger data tables are downloaded from the logger.

library(compasstools)
logger_data1 <- read_csv("logger_data300.csv", show_col_types = FALSE)
datatable(logger_data1) %>%
    formatStyle("Logger", color = "green")

logger_data2 <- read_csv("logger_data302.csv", show_col_types = FALSE)
datatable(logger_data2) %>%
    formatStyle("Logger", color = "green")

Reshape logger data and join with design info

# Logger and Timestamp are 'magic fields' that always travel with data,
# so we handle them differently
ld1 <- pivot_longer(logger_data1, c(-Logger, -Timestamp), names_to = "loggernet_variable")
ld2 <- pivot_longer(logger_data2, c(-Logger, -Timestamp), names_to = "loggernet_variable")

ld1 %>%
    bind_rows(ld2) %>% 
    left_join(design_table, by = c("Logger", "loggernet_variable")) %>% 
    select(-scale_type) %>% 
    filter(!is.na(design_link)) ->  # in a real pipeline we'd error if not found
    dat1
datatable(dat1) %>% 
    formatStyle("design_link", color = "red") %>% 
    formatStyle("Logger", color = "green")

Join with plot data

This is a crucial step because this join is potentially one-to-many: more than one plot table entry may list a given design_link.

dat1 %>% 
    right_join(plot_table_ex, by = "design_link") %>% 
    select(-note) ->
    dat2
datatable(dat2)

Reshape for final output

dat2 %>% 
    select(-design_link, -loggernet_variable, -Logger) %>% 
    pivot_wider(names_from = "research_name", values_from = "value") %>% 
    datatable()

A little test

Define a test data frame:

Expand it:

datatable(expand_df(test), options = list(pageLength = 50))