Vanderbilt / CU Boulder Microbial Sensor Analysis

This R Publication is intended to provide updates on current and ongoing data analysis to the researches involved in the deployment of CU Boulder’s PHBV sensors (gen2) by Vanderbilt researchers. Using this approach provides transparency regarding the data analysis methods used, and improves communication between researchers.

Three loggers were provided with 3 sensors per logger (parallelized PHBV on long stake PCB). Loggers were installed at three sites, referred to here as grass_burn, grass_no_burn, and juniper_no_burn.

Data Analysis, using the R programming language:

# Load Libraries ----------------------------------------------------------
library(tidyverse); library(dplyr); library(lubridate);  library(scales)
library(ggplot2); library(cowplot); library(RColorBrewer); library(ggpubr); 
library(DescTools); library(stringr); library(zoo)
library(changepoint)
library(reshape)

Import and process data from parallelized sensor loggers (Gen1)

# Import and Process Text Files -------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

dev1 <- read.table('grass site 1 burn Oct 2024.txt',sep=',',header=TRUE)
names(dev1) <- c('device','time','temp','ch1','ch2','ch3','ch4','ch5','ch6')
dev1$site <- 'grass_burn'

dev3 <- read.table('grass site 3 no burn oct 2024.txt',sep=',',header=TRUE)
names(dev3) <- c('device','time','temp','ch1','ch2','ch3','ch4','ch5','ch6')
dev3$site <- 'grass_no_burn'

dev4 <- read.table('juniper site 4 no burn oct 2024.txt',sep=',',header=TRUE)
names(dev4) <- c('device','time','temp','ch1','ch2','ch3','ch4','ch5','ch6')
dev4$site <- 'juniper_no_burn'

####Combine the data####
wideData <- rbind(dev1,dev3,dev4)
rm(dev1,dev3,dev4)

#Format dates
wideData$time <- as.POSIXct(wideData$time,format='%m/%d/%y %H:%M:%S')

#Remove early dates to only include good sensor port data
wideData <- subset(wideData, time > as.POSIXct("2024-06-12 15:00:00"))

####Convert to long format####
allData <- melt(wideData,id=c('site','device','time'))
tempData <- subset(allData, variable == 'temp')
allData <- setdiff(allData,tempData)

####Clean up the data a bit####
names(allData) <- c('site','sensor','time','variable','reading')
allData$sensor <- paste(allData$sensor,allData$variable)
allData <- select(allData, -'variable')

####Normalize the data by channel####
#Get an averaged initial resistance
#r_0 will be the mean resistance value between these dateTimes:
rStart <- as.POSIXct("2024-06-13 00:00:00"); rEnd <- rStart + hours(12)

r_0 <- allData[allData$time > rStart & allData$time < rEnd,] %>% 
  group_by(sensor) %>%
  dplyr::summarise(r0 = mean(reading))

allData <- merge(allData, r_0, by="sensor")

allData$r_norm <- allData$reading / allData$r0

Plot the raw (normalized) data

####Plot raw sensor data in a facet grid####
ggplot(allData) + theme_bw() +
  geom_point(aes(x=time,y=r_norm,col=site),shape=21,size=1.5) +
  facet_wrap(~sensor) + ylim(0.5,1.25) +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_fill_brewer(palette = "Set1") + scale_color_brewer(palette = "Set1")

Simplify data by removing unused sensor channels and failed sensor

Based on the above plot, it looks like some channels were unused in this experiment. We’ll use brute force to only retain sensors that were on active channels.

Additionally, it looks like there’s a sensor sudden failure August 9th for Device 4, channel 2. We’ll remove any data past the failure for that sensor channel.

allData <- subset(allData, sensor %in% 
                    c('Device 1 ch1', 'Device 1 ch2', 'Device 1 ch3',
                      'Device 3 ch1', 'Device 3 ch2', 'Device 3 ch3',
                      'Device 4 ch1', 'Device 4 ch2', 'Device 4 ch3'))

#Replot
ggplot(allData) + theme_bw() +
  geom_point(aes(x=time,y=r_norm,col=site),shape=21,size=1.5) +
  facet_wrap(~sensor) + ylim(0.5,1.25) +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_fill_brewer(palette = "Set1") + scale_color_brewer(palette = "Set1")

#Remove bad data following sensor failure
badData <- allData[allData$sensor=='Device 4 ch2' & 
                     allData$time > as.POSIXct("2024-08-09 21:49:44"),]

allData <- setdiff(allData,badData)

#Define data types
allData$time <- as.POSIXct(allData$time)
allData$site <- as.factor(allData$site) 
allData$sensor <- as.factor(allData$sensor)

Look at overlaid sensor signals

Based on raw sensor signals, do we see significant differences in sensor response as a function of the field in which the sensor is deployed?

ggplot(allData) + theme_bw() +
  geom_point(aes(x=time,y=r_norm,fill=site),shape=21,col="black") +
  scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")"))

We often see larger sensor signal variability in healthy soil. Here, we see high variability between sensors in the juniper site, medium variability in the grass_burn site, and low variability in the grass_no_burn site. We also see generally higher normalized resistance in the juniper site, but across all of the sites, we see a negative slope dominating the response. We most often see this negative slope when the biodegradable polymer is not being colonized by bacteria. We see some interesting moments of events where the sensor sleet as a whole responds. We should examine what might be responsible for these impacts - for example, large rainfall events that impacted all three sites.

Look at averaged sensor signals

We often average sensor signal by treatment, examining these data by looking at both the mean and standard deviation of the sensor signal within each treatment (or field site).

####Let's average sensor signals by site to look for differences####
aveData <- allData %>%
  mutate(hourTime = cut(time, breaks='2 hours')) %>% 
  group_by(site,hourTime) %>%
  dplyr::summarise(stdev = sd(r_norm), r_norm = mean(r_norm, na.rm=T))

aveData$hourTime <- as.POSIXct(aveData$hourTime)
aveData$site <- as.factor(aveData$site)

sitePlot <- ggplot(aveData) + theme_bw() +
  geom_point(aes(x=hourTime,y=r_norm,fill=site),shape=21,col="black") +
  scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")"))

devPlot <- sitePlot + 
  geom_ribbon(aes(x=hourTime,ymax=r_norm+stdev,ymin=r_norm-stdev,
                                      col=site,fill=site), alpha=0.3)

show(sitePlot); show(devPlot)

Zoom in on areas of interest

Examine the first week of sensor response:

startDate <- min(aveData$hourTime); endDate <- startDate + days(7)

firstWeekPlot <- devPlot + xlim(startDate,endDate) + ggtitle("First Week") +
  geom_smooth(aes(x=hourTime,y=r_norm,col=site),method=lm,se=F)

show(firstWeekPlot)

Examine the first month of sensor response:

endDate <- startDate + months(1)

firstMonthPlot <- devPlot + xlim(startDate,endDate) + ggtitle("First Month") +
  geom_smooth(aes(x=hourTime,y=r_norm,col=site),method=lm,se=F)

show(firstMonthPlot)
## `geom_smooth()` using formula = 'y ~ x'

Examine the sudden sensor response in mid-September:

startDate <- as.POSIXct("2024-09-12 00:00:00"); endDate <- startDate + days(20)

zoomPlot <- sitePlot + geom_vline(xintercept=c(startDate,endDate),
            linetype="dashed",color="darkred", linewidth=2,alpha=0.5) +
             ggtitle(paste("September Response",startDate,"to",endDate))

septPlot <- devPlot + xlim(startDate,endDate) +
  geom_smooth(aes(x=hourTime,y=r_norm,col=site),method=lm,se=F)

ggarrange(zoomPlot,septPlot,nrow=2,common.legend = T,legend="bottom")

Include soil moisture data from the two sites

We see some interesting apparent events where the overall downward trend of the microbial sensors is temporarily reversed, indicating spikes of either activity or something environmental (for example, changes in soil moisture due to precipitation). Let’s look at the water content data from the loggers installed at the sites to see how moisture relates to the sensor signals:

#Import soil moisture and temperature data
site1 <- read.csv('Titan_Grass_site1_burn_Oct2024.csv')
names(site1) <- c('row','time','temp','vwc')
site1 <- select(site1, -row); site1$site <- "grass_burn"

site2 <- read.csv('Titan_site2_burn_Juniper_Oct2024.csv')
names(site2) <- c('row','time','temp','vwc')
site2 <- select(site2, -row); site2$site <- "juniper_no_burn"

siteData <- rbind(site1, site2); rm(site1,site2)
siteData$time <- as.POSIXct(siteData$time,format="%m/%d/%Y %H:%M")

siteData <- melt(siteData, id=c('site','time'))

smData <- subset(siteData, variable == "vwc")
tempData <- subset(siteData, variable == "temp")

Let’s look at soil moisture and temperature data from both sites:

#Manually grab the colors we've been using above
myPal <- c(RColorBrewer::brewer.pal(3,"Set1")[1],
           RColorBrewer::brewer.pal(3,"Set1")[3])

smPlot <- ggplot(smData) + theme_bw() +
  geom_point(aes(x=time,y=value,col=site),shape=22) +
  scale_color_manual(values = myPal) + xlab("") +
ylab(expression(paste("Soil Water Content, m"^"3","/m"^"3")))

tempPlot <-  ggplot(tempData) + theme_bw() +
  geom_point(aes(x=time,y=value,col=site),shape=24) +
  scale_color_manual(values = myPal) + xlab("") +
ylab("Soil Temperature (F)")   

ggarrange(smPlot,tempPlot,nrow=2, align="hv")

I don’t know the location of the installation, so I can’t look at historical weather data to quickly get rainfall events. So instead, I’ll extract likely precipitation events from the soil moisture data, just by finding times when the soil moisture content increased:

#First, smooth the data by taking 6-hourly averages:
precData <- smData %>% 
  mutate(hourTime = cut(time, breaks='12 hours')) %>% 
  group_by(site,hourTime) %>%
  dplyr::summarise(value = mean(value, na.rm=T))

precData$hourTime <- as.POSIXct(precData$hourTime)

#From visual examination, identify likely rainfall events:
rainfall <- as.POSIXct(c("2024-06-19",
                         "2024-07-5","2024-07-07","2024-07-19","2024-07-30",
                         "2024-08-11","2024-08-15","2024-08-29",
                         "2024-09-05","2024-09-15","2024-09-22",
                         "2024-10-19"))

ggplot() + theme_bw() + ggtitle("Identifying Likely Rainfall (blue bars)") +
  geom_point(data=precData,aes(x=hourTime,y=value,fill=site),
             col="black",shape=22) +
  scale_fill_manual(values = myPal) +
  geom_vline(xintercept=rainfall,col="darkblue",alpha=0.25,linewidth=2) +
  xlab("") + ylab(expression(paste("Soil Water Content, m"^"3","/m"^"3")))

By visual examination, do we see precipitation driving the spikes in resistance change in the microbial sensor?

precPlot <- sitePlot + 
  geom_vline(xintercept=rainfall,col="darkblue",alpha=0.25,linewidth=2) +
  theme(legend.position = "bottom")
  
show(precPlot)

It sure looks like precipitation likely played a significant role here! Often, we expect to see sudden increases in microbial activity in dry soils immediately following a wetting event. However, it can be tricky to extract actual microbial impacts on the sensors from polymer swelling in more wet environments. While we have run experiments in the lab showing that we do not see a response from water alone, we have also seen that sensors impacted by microbial activity can decrease in resistance when removed from microbially-active media and put in an oven to dry.

So the signal we see here is several things at once:

  1. Overall, these soils seem dry and not highly microbially-active compared to many of our experiments. This leads to a baseline negative slope for all sensors. We believe this baseline negative slope is due to the annealing of the PHBV ink, which is particularly pronounced in dry, hot soils. We have started to pre-anneal sensors before deployment to remove this negative baseline slope.

  2. During wetting events, we see a sometimes temporary sensor response which acts against the underlying negative slope. During one particularly significant wetting event in mid-September, all sensors begin to respond dramatically and overcome this baseline negative-slope tendency.

Overall, it will be necessary to collect more data to see whether these sensors are applicable in this setting, and whether we can differentiate between burned and unburned treatments. We do see a significant difference between the Juniper site and the Grass sites, both in terms of the normalized resistance of the sensors and in terms of sensor slopes in response to wetting events.