Nitrogen Mineralization / Decomp Sensors Data Analysis

This R Publication is intended to provide updates on current and ongoing data analysis to the researches involved in analysis of data from the SuperTarantula experiment at UC Berkeley. Using this approach provides transparency regarding the data analysis methods used, and improves communication between researchers.

# Load Libraries ----------------------------------------------------------
library(tidyverse); library(dplyr); library(lubridate); library(reshape2)
library(ggplot2); library(cowplot); library(RColorBrewer); library(ggpubr) 
library(DescTools); library(scales)

# Import and Process Text File -------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

####IMPORT LOGGED DATA####
#Pull data in from text files and format timestamps
setwd(paste(dirname(rstudioapi::getActiveDocumentContext()$path),
            "/May 2024 Data",sep=""))

rawData <- read.table("N-min_sensors_raw.csv",sep=',', header=TRUE)
rawData <- select(rawData, -Day)
names(rawData)[names(rawData) == "Timestamp"] <- "dateTime"

#Convert columns to rows for data analysis
allData <- melt(rawData,id.vars=c("dateTime"))
names(allData)[names(allData) == "variable"] <- "pot"
names(allData)[names(allData) == "value"] <- "reading"

allData$dateTime <- as.POSIXct(allData$dateTime,format="%m/%d/%Y %H:%M")
allData <- allData[order(allData$dateTime),] #Order by timestamp

# Normalize Resistances ---------------------------------------------------
#R_o will be the mean resistance value between these dateTimes:
rStart <- as.POSIXct("2024-04-26 00:00:00"); rEnd <- rStart + hours(12)
endDate <- max(allData$dateTime)

Ro <- allData[allData$dateTime > rStart & allData$dateTime < rEnd,] %>% 
  group_by(pot) %>% summarize(Ro = mean(reading))

allData <- merge(allData,Ro,by="pot"); allData$Rnorm=allData$reading/allData$Ro

# Map to treatments -------------------------------------------------------
allData$treatment <- as.factor(sub("\\..*","",allData$pot)) #Get treatment types

The code above converts the data into a single data.frame in which each line is a single reading.

Plot the raw data

# Plot the data -----------------------------------------------------------
treatmentColors <- c("Peat" = "red3", "Sandy" = "darkgreen", "Clay" = "royalblue3")
treatmentShapes <- c("Peat" = 1, "Sandy" = 2, "Clay" = 5)

####Plot raw data with free Y scale####
rawPlot <- ggplot() +
  geom_point(data=allData, aes(dateTime,Rnorm,col=treatment, alpha=0.75), size=1) +
  scale_color_manual(name = "Soil Type", values = treatmentColors) +
  scale_shape_manual(values=treatmentShapes, guide="none") +
  scale_alpha(guide="none") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  xlim(rStart, endDate) + theme_bw()

rawPlot

####Plot raw data with constrained Y scale####
rawPlot + ylim(0.9, 1.25)

To the naked eye, it appears that the three treatments (soil types) result in a different sensor response, on average. Let’s dig deeper!

Break out the plots by individual sensor, with linear fit lines

rawPlot + 
  stat_smooth(data=allData, aes(dateTime,Rnorm,col=treatment), method="lm", 
              se=FALSE, linewidth = 1,formula = y~x) + 
              facet_wrap(~pot, nrow=3)

We see some interesting things here. The Peat installations are clearly reacting more than the other two soil types, but also in a more stochastic way. The reactions start largely near the end of the experiment, suggesting that we might be only seeing the beginning of a process that would be more clear with a longer experimental run-time. The Sandy installations appear to have a general increasing slope which is more gradual. The Clay soils appear to have a mostly flat but slightly parabolic shape - surprising, since we don’t generally expect the resistances of these sensors to decrease.

Look at average signals by soil type to visually look for overall trends

####Average sensor signals by treatment####
aveData <- allData %>% 
  mutate(dateTime = cut(dateTime, breaks='3 hour')) %>% 
  group_by(treatment, dateTime) %>%
  dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))

aveData$dateTime <- as.POSIXct(aveData$dateTime)
aveData$tDay <- as.numeric(aveData$dateTime - min(aveData$dateTime))/60/60/24

####Plot the data####
treatmentPlot <- ggplot() +
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25)

treatmentPlot

Extract slopes from the data

In general, our approach for decomposition sensor data analysis has been to extract individual sensor slopes and then look to correlate these slopes to other parameters of interest, for example the enzymatic activity within each instrumented plot. In this case, I will provide two different tables of slopes: One for the complete slope across the entire experiment (after a 12-hour period used to baseline the initial resistances), and one for the slope within some “decomposition window”, the part of the signal that we think contains the most useful information.

####First, smooth the data a bit####
sData <- allData %>% 
  mutate(dateTime = cut(dateTime, breaks='6 hour')) %>% 
  group_by(pot, dateTime) %>%
  dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))

#Add a time elapsed column, in days, called tDay
sData$dateTime <- as.POSIXct(sData$dateTime)
sData$tDay <- as.numeric(sData$dateTime - min(sData$dateTime))/60/60/24

#Get slopes ACROSS THE WHOLE EXPERIMENT
slopesAll <- sData %>%
  group_by(pot) %>%
  dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesAll$treatment <- as.factor(sub("\\..*","",slopesAll$pot)) #Get treatments

allPlot <- ggplot(data=slopesAll) + theme_bw()+theme(legend.position = "none")+
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab(bquote('Sensor Slope '(day^-1))) + 
  ggtitle("Overall Slope")

####Define the beginning and end of a decomposition window####
decompStart <- rStart + hours(1);
decompEnd <- rStart + days(10)

#Get slopes WITHIN THE DECOMPOSITION WINDOW
slopesWindow <- sData[sData$dateTime >= decompStart &
                         sData$dateTime <= decompEnd,] %>%
    group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow$treatment <- as.factor(sub("\\..*","",slopesWindow$pot))

windowPlot <- ggplot(data=slopesWindow) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")

bothBox <- ggarrange(allPlot, windowPlot, ncol=2) #Show both box plots

show(bothBox)

Look at the Decomposition Window analyzed in the previous nMin experiment

These slopes are the linear term of a 2nd-order polynomial fit. When we look at the overall slope distribution across all of the experiment, there is some separation; when we subset the data to only include a decomposition window that starts pretty much immediately after the experiment begins, and ends 10 days after the experiment begins, we see much better separation between soil types.

In the plot below, the decomposition window will be denoted using vertical black dashed lines. This represents the first of three decomposition windows that will be examined.

#Box plot for the First Decomposition Window
slopesBoxPlot <- ggplot(data=slopesWindow) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")

#Treatment plot showing the First Decomposition Window
treatmentPlot <- ggplot() + ggtitle("Averaged Sensor Response by Treatment") +
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  geom_errorbar(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= decompEnd,],
                aes(x=dateTime, ymin=Rnorm-stDev, ymax=Rnorm+stDev,
                    col=treatment, alpha=0.5), width=50000, 
                position=position_dodge(10000)) +
  stat_smooth(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= decompEnd,],
    aes(dateTime,Rnorm,col=treatment), method="lm", se=FALSE, 
              linewidth = 1.25,formula = y~x) +
  geom_vline(xintercept = c(decompStart,decompEnd),
             color="black", linewidth = 1, linetype = "dashed") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) + 
  theme(legend.position = "none")

ggarrange(treatmentPlot, slopesBoxPlot, ncol=2, widths=c(2,1)) #Combine plots

Slopes Tables

This table contains the slopes taken from each individual sensor, within three decomposition windows: 0-10 days, 10-20 days, and 20-30 days.

start1 <- rStart + minutes(1); end1 <- rStart + days(10)
start2 <- end1 + minutes(1); end2 <- start2 + days(10)
start3 <- end2 + minutes(1); end3 <- start3 + days(10)

#Get slopes WITHIN EACH DECOMPOSITION WINDOW
slopesWindow1 <- sData[sData$dateTime >= start1 & sData$dateTime <= end1,] %>%  
  group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow2 <- sData[sData$dateTime >= start2 & sData$dateTime <= end2,] %>%  
  group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow3 <- sData[sData$dateTime >= start3 & sData$dateTime <= end3,] %>%  
  group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

#Manipulate the dataframes a bit
slopesWindow1$treatment <- as.factor(sub("\\..*","",slopesWindow1$pot))
slopesWindow1$window <- as.factor("Decomp Window 1")
slopesWindow2$treatment <- as.factor(sub("\\..*","",slopesWindow2$pot))
slopesWindow2$window <- as.factor("Decomp Window 2")
slopesWindow3$treatment <- as.factor(sub("\\..*","",slopesWindow3$pot))
slopesWindow3$window <- as.factor("Decomp Window 3")

allSlopes <- rbind(slopesWindow1,slopesWindow2,slopesWindow3) #Combine windows

show(allSlopes)

## # A tibble: 45 × 4
##    pot              slope treatment window         
##    <fct>            <dbl> <fct>     <fct>          
##  1 Peat.soil.1   0.000668 Peat      Decomp Window 1
##  2 Peat.soil.2  -0.000604 Peat      Decomp Window 1
##  3 Peat.soil.3  -0.000556 Peat      Decomp Window 1
##  4 Peat.soil.4  -0.000191 Peat      Decomp Window 1
##  5 Peat.soil.5  -0.00126  Peat      Decomp Window 1
##  6 Sandy.soil.1  0.0103   Sandy     Decomp Window 1
##  7 Sandy.soil.2  0.00735  Sandy     Decomp Window 1
##  8 Sandy.soil.3  0.00790  Sandy     Decomp Window 1
##  9 Sandy.soil.4  0.0102   Sandy     Decomp Window 1
## 10 Sandy.soil.5  0.00829  Sandy     Decomp Window 1
## # ℹ 35 more rows

This table contains slopes taken from the AVERAGED sensor signal by treatment, across three windows:

#Get slopes WITHIN EACH DECOMPOSITION WINDOW
aveWindow1 <- aveData[aveData$dateTime >= start1 & aveData$dateTime <= end1,] %>%  
  group_by(treatment) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

aveWindow2 <- aveData[aveData$dateTime >= start2 & aveData$dateTime <= end2,] %>%  
  group_by(treatment) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

aveWindow3 <- aveData[aveData$dateTime >= start3 & aveData$dateTime <= end3,] %>%  
  group_by(treatment) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

#Manipulate the dataframes a bit
aveWindow1$window <- as.factor("Decomp Window 1")
aveWindow2$window <- as.factor("Decomp Window 2")
aveWindow3$window <- as.factor("Decomp Window 3")

aveSlopes <- rbind(aveWindow1,aveWindow2,aveWindow3) #Combine windows

show(aveSlopes)

## # A tibble: 9 × 3
##   treatment     slope window         
##   <fct>         <dbl> <fct>          
## 1 Clay       0.00621  Decomp Window 1
## 2 Peat      -0.000387 Decomp Window 1
## 3 Sandy      0.00879  Decomp Window 1
## 4 Clay      -0.00147  Decomp Window 2
## 5 Peat       0.00436  Decomp Window 2
## 6 Sandy      0.00142  Decomp Window 2
## 7 Clay      -0.000562 Decomp Window 3
## 8 Peat       0.00455  Decomp Window 3
## 9 Sandy     -0.000124 Decomp Window 3

Let’s visually examine the results of taking slopes across three decomposition windows:

#Show slope data as a box plot
windowBoxPlot <- ggplot(data=allSlopes) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
  scale_color_manual(name = "Treatment", values = treatmentColors) +
  xlab("") + ylab(bquote('Sensor Slope '(day^-1))) + facet_wrap(~ window)

windowTreatmentPlot <- ggplot() + 
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  stat_smooth(data=aveData[aveData$dateTime >= start1 & aveData$dateTime <= end1,],
    aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
  stat_smooth(data=aveData[aveData$dateTime >= start2 & aveData$dateTime <= end2,],
    aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
  stat_smooth(data=aveData[aveData$dateTime >= start3 & aveData$dateTime <= end3,],
    aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
   geom_vline(xintercept = c(start1,end1,start2,end2,start3,end3),
             color="black", linewidth = 0.5, linetype = "dashed") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) + 
  theme(legend.position = "none")

ggarrange(windowTreatmentPlot, windowBoxPlot, nrow=2)

Visually, it looks like we are seeing a statistically-significant impact of treatment (soil type) on the slopes that we see from these sensors at least for the first decomposition window. Let’s check this with ANOVAs:

summary(aov(slopesWindow1$slope~slopesWindow1$treatment))

##                         Df    Sum Sq   Mean Sq F value   Pr(>F)    
## slopesWindow1$treatment  2 2.243e-04 1.121e-04   113.9 1.57e-08 ***
## Residuals               12 1.182e-05 9.800e-07                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(slopesWindow2$slope~slopesWindow2$treatment))

##                         Df   Sum Sq   Mean Sq F value Pr(>F)
## slopesWindow2$treatment  2 8.52e-05 4.258e-05   1.596  0.243
## Residuals               12 3.20e-04 2.667e-05

summary(aov(slopesWindow3$slope~slopesWindow3$treatment))

##                         Df    Sum Sq   Mean Sq F value Pr(>F)
## slopesWindow3$treatment  2 8.028e-05 4.014e-05   2.642  0.112
## Residuals               12 1.823e-04 1.519e-05

Findings

We have defined several decomposition windows where we take individual normalized resistance slopes from each sensor. The windows correspond to the first, second, and third ten-day period of the experiment.
We take slopes by running a 2nd-order polynomial regression, and taking just the linear coefficient of the fit as the slope associated with that particular sensor.
Based on a one-way ANOVA test, we see a significant difference between the slope responses from sensors as a function of the soil in which they were installed for only the first window. For the second and third decomposition window, there is not a significant difference between sensor slopes as a function of treatment. we could consider removing outlier sensors from the analysis, assigning them some kind of failure status. In particular, two Peat sensors appear to have particularly high slopes during windows 2 and 3.

Next Questions

Next, we should use the slopes extracted from these sensor signals to see if they correlate to other measurements in these experimental pots:

Do sensor slopes correlate to gas emissions as measured over the course of the experiment?
Do sensor slopes correlate to signals from other co-located sensors?
Do sensor slopes correlate to other measured parameters in the pots?
Are some windows more useful than others? Even though these sensors aren’t failing quickly the way that prior generations do, they may still be “depleted” in some sense without actually failing.