McCain / CU Boulder Pilot 2

This R Publication is intended to provide updates on current and ongoing data analysis to the researches involved in analysis of data from McCain’s experimental plots, in which CU Boulder’s decomposition sensors have been installed. Using this approach provides transparency regarding the data analysis methods used, and improves communication between researchers.

All installed sensors were parallelized ladder rung designs, with 4 traces per sensor (PHBV Formulation 2024).

###Initial data processing

# Load Libraries ----------------------------------------------------------
library(RPostgreSQL); library(beepr); library(plyr)
library(tidyverse); library(dplyr); library(lubridate);  library(scales)
library(ggplot2); library(cowplot); library(RColorBrewer); library(ggpubr);
require(RMySQL); library("data.table"); library(stringr)
library(gganimate); library(gifski)

# Import and Process Text File -------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

####IMPORT LOGGED DATA####
#Pull data in from text files and format timestamps
setwd(paste(dirname(rstudioapi::getActiveDocumentContext()$path),
            "/McCain_Decomp_2",sep=""))

file1 <- read.table('Device_1_Unfumigated Jul19.txt', sep=",")
file2 <- read.table('Device_3_Fumigated Jul19.txt', sep=",")

allData <- rbind(file1, file2); rm(file1, file2)
names(allData) <- c("device", "dateTime", "temp", "ch1", "ch2",
                    "ch3", "ch4", "ch5", "ch6")

allData$dateTime <- as.POSIXct(allData$dateTime, format="%m/%d/%y %H:%M:%S")

####Fix the lost RTC timestamps for Device 3####
offData <- allData[allData$dateTime < as.POSIXct("2001-01-01"),]
offData$dateTime <- offData$dateTime + years(24) + months(5) +
                    days(13) + hours(10)

#Recombine
allData <- allData[allData$dateTime > as.POSIXct("2001-01-01"),]
allData <- rbind(allData, offData); rm(offData)

####Pull in the newer Device 3 Data####
file3 <- read.table('Device_3_Aug_Sep.txt', sep=",")

data3 <- file3
names(data3) <- c("device", "dateTime", "temp", "ch1", "ch2",
                    "ch3", "ch4", "ch5", "ch6")
data3$dateTime <- as.POSIXct(data3$dateTime, format="%m/%d/%y %H:%M:%S")

#Fix most of the rows
data3$dateTime <- data3$dateTime + years(24) + months(6) + days(29)

#Delete one weird row
data3 <- data3[-1068,]

#Fix the next RTC reset rows
data3[1068:nrow(data3),]$dateTime <- data3[1068:nrow(data3),]$dateTime +
  months(1) - days(5) + hours(6) + minutes(10)

#Recombine with main dataframe
allData <- rbind(allData, data3)

####Melt the data to split readings into rows####
allData <- melt(allData, id = c("device", "dateTime"), value.name="reading")

#Add field treatments as factors
allData$treatment <- allData$device
allData$treatment <- as.factor(mapvalues(allData$treatment,
                               from=c("Device 1", "Device 3"),
                               to=c("Unfumigated", "Fumigated")))

#Remove temperature data from the dataframe
allData <- allData[allData$variable != "temp",]

#Remove any very high readings (noise and failed sensors)
outData <- allData[allData$reading > 5000,]
allData <- allData[allData$reading <= 5000,]

# Data Visualization ------------------------------------------------------
####Visualize the raw data, color-coded by treatment####
allPlot <- ggplot(data=allData) + theme_bw() + xlab("") +
  geom_point(aes(x=dateTime,y=reading,col=treatment),alpha=0.75,size=0.5) +
  ylab(paste("Sensor Resistance (\u03a9)")) +
  scale_color_brewer(palette = "Set1") +
  ggtitle("Raw Data from Field Study")

#Show as average resistances
aveData <- allData %>% mutate(dateTime = cut(dateTime, breaks='1 hour')) %>%
  group_by(dateTime, treatment) %>% dplyr::summarise(R_mean = mean(reading),
                                                     stDev = sd(reading))

aveData$dateTime <- as.POSIXct(aveData$dateTime)

avePlot <- ggplot(data=aveData) + theme_bw() + xlab("") +
  geom_point(aes(x=dateTime,y=R_mean,col=treatment),size=1.5) +
  geom_errorbar(aes(x=dateTime, 
                              ymin=R_mean-stDev, ymax=R_mean+stDev,
                              col=treatment),width=5, alpha=0.15) +
  ylab(paste("Sensor Resistance (\u03a9)")) +
  scale_color_brewer(palette = "Set1") +
  ggtitle("Hourly Average Data by Field Treatment")

combPlot <- ggarrange(allPlot, avePlot, nrow=2, ncol=1, align="hv",
          common.legend=TRUE, legend="bottom")

The code above converts the data into a single data frame in which each line is a single reading. Two loggers are installed, with 6 sensors each.

Plot the raw data

It appears that the two treatments have resulted in a different sensor response, on average. This is clear both in the raw data, and in the averaged data. Let’s dig deeper!

Usually, we use normalized resistances, rather than raw resistance, so that we can account for initial differences in the sensors and their resistance evolution over time due to microbial decomposition activity.

Normalize Resistances

####Process into Normalized Resistances####
#First, add the very high resistance values back in:
allData <- rbind(allData, outData); rm(outData)

#R_o calcs need to be done several times, since new sensors were installed

#Assign experiment numbers
allData$experiment <- as.factor(0)

t1 <- subset(allData,dateTime < as.POSIXct("2024-06-01"))
t1$experiment <- 1; t1$rStart <- as.POSIXct("2024-04-26 15:00:00")
t1$rEnd <- as.POSIXct("2024-04-27 15:00:00")

t2 <- subset(allData,dateTime > as.POSIXct("2024-06-01"))
t2$experiment <- 2; t2$rStart <- as.POSIXct("2024-06-15 00:00:00")
t2$rEnd <- as.POSIXct("2024-06-16 00:00:00")

allData <- rbind(t1, t2); rm(t1, t2)
allData$experiment <- as.factor(allData$experiment)

#Get initial resistances for all experiments and sensors
r0Data <- allData[allData$dateTime >= allData$rStart &
                    allData$dateTime <= allData$rEnd,]

r0 <- r0Data %>% group_by(experiment, device, variable) %>%
  dplyr::summarize(r0 = mean(reading))

allData <- merge(allData, r0, by=c("experiment","device","variable"))
allData$Rnorm <- allData$reading/allData$r0

#Remove failed sensor during 2nd experiment
badData <- allData[allData$treatment=="Unfumigated" & allData$variable=="ch5" &
                     allData$experiment==2,]

allData <- setdiff(allData, badData)

#Apply a filter to remove any crazy Rnorm values
#allData <- allData[allData$Rnorm > 0.5 & allData$Rnorm < 300,]

#Create a new factor to track individual sensors across treatments
allData$sensor <- as.factor(paste(allData$treatment, allData$variable))

###Visualize the data

####Zoom in to see what things look like without big sudden sensor signals
allPlot <- allPlot + ylim(0,7.5)

#Apply a filter to remove data from failed sensors
#In this case, we'll say any sensor whose resistance is 10x its initial
#resistance has "failed" and its useful life is over:
#allData <- allData[allData$Rnorm <= 10,]

#Show as average normalized resistances
aveData <- allData %>% mutate(dateTime = cut(dateTime, breaks='1 hour')) %>%
  group_by(dateTime, treatment) %>% dplyr::summarise(R_mean = mean(Rnorm),
                                                     stDev = sd(Rnorm))

aveData$dateTime <- as.POSIXct(aveData$dateTime)

avePlot <- ggplot(data=aveData) + theme_bw() + xlab("") +
  geom_point(aes(x=dateTime,y=R_mean,col=treatment),size=1.5) +
  # geom_errorbar(aes(x=dateTime, 
  #                   ymin=R_mean-stDev, ymax=R_mean+stDev,
  #                   col=treatment),width=5, alpha=0.15) +
  ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_brewer(palette = "Set1") +
  ggtitle("Hourly Average Normalized Data by Field Treatment") + ylim(0,7.5)

combPlot <- ggarrange(allPlot, avePlot, nrow=2, ncol=1, align="hv",
                      common.legend=TRUE, legend="bottom")

combPlot

Examining Responses across Scales

Let’s look at the response of the sensors based on treatment, across a few different “decomposition windows.” How do the sensors respond initially? Does the sensor response change over time?

When the plots below reference “Decomposition Window 1” and “Decomposition Window 2,” they are referencing the two experiments carried out at different times, using the same loggers, but different sensors installed in the fields.

In this case, the cascading plots below show Decomposition Windows which are 10 days, 15 days, 22 days, and 25 days long.

The top two plots show the raw data, the middle two plots show that data averaged by treatment, and the lower box plots show the distribution of sensor slopes over time.

#Generate plots looking at different decomposition windows
wLength <- c(10, 15, 21, 25)
yMaxes <- c(1.5, 2, 3, 5)

for(i in 1:4){
  nDays <- wLength[i]; yMax <- yMaxes[i]
  allData$window <- 0
  
  winStart1 <- min(allData[allData$experiment==1,]$dateTime)
  winEnd1 <- winStart1 + days(nDays)
  allData[allData$dateTime >= winStart1 &
          allData$dateTime <= winEnd1,]$window <- 1
  
  winStart2 <- min(allData[allData$experiment==2,]$dateTime)
  winEnd2 <- winStart2 + days(nDays)
  allData[allData$dateTime >= winStart2 &
            allData$dateTime <= winEnd2,]$window <- 2
  
  ####Show data from the two windows (normalized resistance, time-series)####
  plot1 <- allPlot + xlim(winStart1, winEnd1) + ylim(0.9,yMax) +
    ggtitle(paste("Decomposition Window 1:", date(winStart1),":",
                  date(winEnd1))) + theme(legend.position = "none")
  
  plot2 <- allPlot + xlim(winStart2, winEnd2) + ylim(0.9,yMax) +
    ggtitle(paste("Decomposition Window 2:", date(winStart2),":",
                  date(winEnd2))) + theme(legend.position = "none")
  
  plot3 <- avePlot + xlim(winStart1, winEnd1) + ylim(0.9,yMax) +
    ggtitle("") + theme(legend.position = "none")
  
  plot4 <- avePlot + xlim(winStart2, winEnd2) + ylim(0.9,yMax) +
    ggtitle("") + theme(legend.position = "none")
  
####Calculate slopes within decomposition windows
dataWindow1 <- subset(allData, window == 1)
  
dataWindow1$tDay <- as.numeric(dataWindow1$dateTime - 
                                 min(dataWindow1$dateTime))/60/60/24
  
theSlopes1 <- dataWindow1 %>% group_by(treatment, variable) %>%
  dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

dataWindow2 <- subset(allData, window == 2)
  
dataWindow2$tDay <- as.numeric(dataWindow2$dateTime - 
                                 min(dataWindow2$dateTime))/60/60/24
  
theSlopes2 <- dataWindow2 %>% group_by(treatment, variable) %>%
  dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
  
  ####Show slopes as box plots####
box1 <- ggplot(data=theSlopes1,aes(x = treatment, y = slope, fill=treatment)) +
  geom_boxplot(outlier.shape = NA) + scale_fill_brewer(palette = "Set1") +
  geom_point(position = position_jitterdodge(),alpha=0.4, size=4) +
  theme_bw() + scale_color_brewer(palette = "Set1") +
  theme(legend.position = "none") + xlab("") + 
  ylab(bquote('Sensor Slope '(day^-1))) +
  ggtitle(paste("Decomposition Window 1:", date(winStart1),":",
                  date(winEnd1)))

box2 <- ggplot(data=theSlopes2,aes(x = treatment, y = slope, fill=treatment)) +
  geom_boxplot(outlier.shape = NA) + scale_fill_brewer(palette = "Set1") +
  geom_point(position = position_jitterdodge(),alpha=0.4, size=4) +
  theme_bw() + scale_color_brewer(palette = "Set1") +
  theme(legend.position = "none") + xlab("") +
  ylab(bquote('Sensor Slope '(day^-1))) +
  ggtitle(paste("Decomposition Window 2:", date(winStart2),":",
                  date(winEnd2)))


####Combine all
windowPlot <- ggarrange(plot1, plot2, plot3, plot4, nrow=2, ncol=2)
  
doubleBoxes <- ggarrange(box1, box2, nrow=1, ncol = 2)
    
  
show(windowPlot)
show(doubleBoxes)
}

Based on these plots, it looks like the unfumigated sensors responded more slowly initially; then, as time went on, they began to overtake the fumigated sensors. Over a longer timescale, the sensors installed in the unfumigated field responded more. This appears to be the case in both the first experiment, and the second.

Let’s look at this as an animation to really see how things change over time:

#Subset to only include the 2nd, more long-term experiment:
#Also only include the 2nd deployment so we have complete datasets for
#both fields.
expData <- subset(allData, dateTime >= as.POSIXct("2024-06-13"))
expData <- subset(expData, dateTime <= as.POSIXct("2024-07-22"))

expData <- data.frame(sensor = expData$sensor, 
                      dateTime = as.POSIXct(expData$dateTime),
                      Rnorm = expData$Rnorm)

#Get average readings on a 1-hourly timescale
expData <- expData %>%  mutate(dateTime = cut(dateTime, breaks='1 hour')) %>%
  group_by(sensor, dateTime) %>%
  dplyr::summarize(Rnorm = mean(Rnorm, na.rm=T))

expData$dateTime <- as.POSIXct(expData$dateTime)

#Add a column with days elapsed since the start of the experiment
expData$tDay <- as.numeric(expData$dateTime - 
                                 min(expData$dateTime))/60/60/24

#Now, loop through a bunch of decomposition windows and get slopes:
timeData <- data.frame(sensor=0,slope=0,window_length=0) #Initializing row

for(wLength in 1:37){
  sData <- subset(expData, tDay <= wLength)
  
  #Take slopes
  slopes <- sData %>% group_by(sensor) %>% 
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
  
  slopes$window_length <- wLength
  
  timeData <- rbind(timeData,slopes)
}

timeData <- timeData[-1,] #Delete initializing row

#Add treatments column
timeData$treatment <- 0
timeData[str_detect(timeData$sensor, "Fumigated"),]$treatment <- "Fumigated"
timeData[str_detect(timeData$sensor, "Unfumigated"),]$treatment <- "Unfumigated"

####Plot the slope (sensor response) over time####
ggplot(data=timeData, aes(x=window_length, y=slope, col=treatment)) +
  geom_point() + theme_bw() + scale_color_brewer(palette = "Set1") +
  xlab("Days since Experiment Start") + ylab(bquote('Sensor Slope '(day^-1)))

####Represent these data as an animated box plot####
boxPlot_anim <- timeData %>%
  ggplot(aes(treatment,slope,fill=treatment)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position = position_jitterdodge(),alpha=0.4, size=4) +
  theme_bw() + scale_fill_brewer(palette = "Set1") +
  ylim(0, 0.1) +
  labs(title = 'Days Since Start {round(frame*37/100,0)}') +
  theme_bw() + scale_color_brewer(palette = "Set1") +
  ylab(bquote('Sensor Slope '(day^-1))) + xlab("") +
  theme(legend.position = "none") +
  transition_states(window_length)

#Shows how to save the animation as a .gif file
#anim_save("test2.gif", animation = boxPlot_anim, renderer = gifski_renderer(),
#          path = dirname(rstudioapi::getActiveDocumentContext()$path))

boxPlot_anim + ease_aes('cubic-in-out')

Findings

It appears that, taken individually or plotting averages by treatment, the sensors have responded differently in fumigated and non-fumigated fields.
Depending on the decomposition window examined, we see different trends between the treatments.
Initially, Fumigated Field sensors respond more (as evaluated by taking sensor slopes). Then, the response of the Unfumigated Field sensors overtakes the response of the Fumigated Field sensors. We also see much more variation in the sensor response in the Unfumigated field.
The newest data addition, from the end of July on, unfortunately doesn’t include data from the Unfumigated field. However, it appears that sensor behavior has started to become sporadic prior to that data, so it might be fair to say that the signal we care most about has already been captured.

McCain / CU Boulder Pilot 2 - Data Analysis

Taylor Sharpe

2024-10-04

Plot the raw data

Normalize Resistances

Examining Responses across Scales

Findings