Nitrogen Mineralization / Decomp Sensors Data Analysis

This R Publication is intended to provide updates on current and ongoing data analysis to the researches involved in analysis of data from the SuperTarantula experiment at UC Berkeley. Using this approach provides transparency regarding the data analysis methods used, and improves communication between researchers.

# Load Libraries ----------------------------------------------------------
library(tidyverse); library(dplyr); library(lubridate); library(reshape2)
library(ggplot2); library(cowplot); library(RColorBrewer); library(ggpubr) 
library(DescTools); library(scales)

# Import and Process Text File -------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

####IMPORT LOGGED DATA####
#Pull data in from text files and format timestamps
setwd(paste(dirname(rstudioapi::getActiveDocumentContext()$path),
            "/May 2024 Data",sep=""))

rawData <- read.table("N-min_sensors_raw.csv",sep=',', header=TRUE)
rawData <- select(rawData, -Day)
names(rawData)[names(rawData) == "Timestamp"] <- "dateTime"

#Convert columns to rows for data analysis and rename some columns
allData <- melt(rawData,id.vars=c("dateTime"))
names(allData)[names(allData) == "variable"] <- "pot"
names(allData)[names(allData) == "value"] <- "reading"

allData$dateTime <- as.POSIXct(allData$dateTime,format="%m/%d/%Y %H:%M")
allData <- allData[order(allData$dateTime),] #Order by timestamp

# Normalize Resistances ---------------------------------------------------
#R_o will be the mean resistance value between these dateTimes:
rStart <- as.POSIXct("2024-04-26 00:00:00"); rEnd <- rStart + hours(12)
endDate <- max(allData$dateTime)

Ro <- allData[allData$dateTime > rStart & allData$dateTime < rEnd,] %>% 
  group_by(pot) %>% summarize(Ro = mean(reading))

allData <- merge(allData,Ro,by="pot"); allData$Rnorm=allData$reading/allData$Ro

# Map to treatments -------------------------------------------------------
allData$treatment <- as.factor(sub("\\..*","",allData$pot)) #Get treatment types

The code above converts the data into a single data.frame in which each line is a single reading.

Plot the raw data

# Plot the data -----------------------------------------------------------
treatmentColors <- c("Peat" = "red3", "Sandy" = "darkgreen", "Clay" = "royalblue3")
treatmentShapes <- c("Peat" = 1, "Sandy" = 2, "Clay" = 5)

####Plot raw data with free Y scale####
rawPlot <- ggplot() +
  geom_point(data=allData, aes(dateTime,Rnorm,col=treatment, alpha=0.75), size=1) +
  scale_color_manual(name = "Soil Type", values = treatmentColors) +
  scale_shape_manual(values=treatmentShapes, guide="none") +
  scale_alpha(guide="none") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  xlim(rStart, endDate) + theme_bw()

rawPlot

####Plot raw data with constrained Y scale####
rawPlot + ylim(0.9, 1.25)

To the naked eye, it appears that the three treatments (soil types) result in a different sensor response, on average. Let’s dig deeper!

Break out the plots by individual sensor, with linear fit lines

rawPlot + stat_smooth(data=allData, aes(dateTime,Rnorm,col=treatment), 
           method="lm", se=FALSE, linewidth = 1,formula = y~x) + 
              facet_wrap(~pot, nrow=3)

We see some interesting things here. The Peat installations are clearly reacting more than the other two soil types, but also in a more stochastic way. The reactions start largely near the end of the experiment, suggesting that we might be only seeing the beginning of a process that would be more clear with a longer experimental run-time. The Sandy installations appear to have a general increasing slope which is more gradual. The Clay soils appear to have a mostly flat but slightly parabolic shape - surprising, since we don’t generally expect the resistances of these sensors to decrease.

#From visual examination of the fact plots, "Peat.soil.3" looks like it failed.
#So let's remove it from the rest of the analysis:

allData <- subset(allData, allData$pot != "Peat.soil.3")

Look at average signals by soil type to visually look for overall trends

####Average sensor signals by treatment####
aveData <- allData %>% mutate(dateTime = cut(dateTime, breaks='6 hour')) %>% 
  group_by(treatment, dateTime) %>%
  dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))

aveData$dateTime <- as.POSIXct(aveData$dateTime)
aveData$tDay <- as.numeric(aveData$dateTime - min(aveData$dateTime))/60/60/24

####Plot the data####
treatmentPlot <- ggplot() +
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25)

treatmentPlot

Extract slopes from the data

In general, our approach for decomposition sensor data analysis has been to extract individual sensor slopes and then look to correlate these slopes to other parameters of interest, for example the enzymatic activity within each instrumented plot. In this case, I will provide two different tables of slopes: One for the complete slope across the entire experiment (after a 12-hour period used to baseline the initial resistances), and one for the slope within some “decomposition window”, the part of the signal that we think contains the most useful information.

####First, smooth the data a bit####
sData <- allData %>%  mutate(dateTime = cut(dateTime, breaks='6 hour')) %>% 
  group_by(pot, dateTime) %>%
  dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))

#Add a time elapsed column, in days, called tDay
sData$dateTime <- as.POSIXct(sData$dateTime)
sData$tDay <- as.numeric(sData$dateTime - min(sData$dateTime))/60/60/24

#Get slopes ACROSS THE WHOLE EXPERIMENT
slopesAll <- sData %>% group_by(pot) %>%
  dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesAll$treatment <- as.factor(sub("\\..*","",slopesAll$pot)) #Get treatments

allPlot <- ggplot(data=slopesAll) + theme_bw()+theme(legend.position = "none")+
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab(bquote('Sensor Slope '(day^-1))) + 
  ggtitle("Overall Slope")

####Define the beginning and end of a decomposition window####
decompStart <- rStart + hours(1);
decompEnd <- rStart + days(10)

#Get slopes WITHIN THE DECOMPOSITION WINDOW
slopesWindow <- sData[sData$dateTime >= decompStart &
                         sData$dateTime <= decompEnd,] %>%
    group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow$treatment <- as.factor(sub("\\..*","",slopesWindow$pot))

windowPlot <- ggplot(data=slopesWindow) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")

bothBox <- ggarrange(allPlot, windowPlot, ncol=2) #Combine both box plots

show(bothBox)

Look at the Decomposition Window analyzed in the previous nMin experiment

These slopes are the linear term of a 2nd-order polynomial fit. When we look at the overall slope distribution across all of the experiment, there is some separation; when we subset the data to only include a decomposition window that starts right after the experiment begins, and ends 10 days after the experiment begins, we see much better separation between soil types.

In the plot below, the decomposition window will be denoted using vertical black dashed lines. This represents the first of three decomposition windows that will be examined.

#Box plot for the First Decomposition Window
slopesBoxPlot <- ggplot(data=slopesWindow) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")

#Treatment plot showing the First Decomposition Window
treatmentPlot <- ggplot() + ggtitle("Averaged Sensor Response by Treatment") +
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  geom_errorbar(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= decompEnd,],
                aes(x=dateTime, ymin=Rnorm-stDev, ymax=Rnorm+stDev,
                    col=treatment, alpha=0.5), width=50000, 
                position=position_dodge(10000)) +
  stat_smooth(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= decompEnd,],
    aes(dateTime,Rnorm,col=treatment), method="lm", se=FALSE, 
              linewidth = 1.25,formula = y~x) +
  geom_vline(xintercept = c(decompStart,decompEnd),
             color="black", linewidth = 1, linetype = "dashed") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) + 
  theme(legend.position = "none")

ggarrange(treatmentPlot, slopesBoxPlot, ncol=2, widths=c(2,1)) #Combine plots

Slopes Tables

This table contains the slopes taken from each individual sensor, within three decomposition windows: 0-10 days, 10-20 days, and 20-30 days.

start1 <- rStart + minutes(1); end1 <- rStart + days(10)
start2 <- end1 + minutes(1); end2 <- start2 + days(10)
start3 <- end2 + minutes(1); end3 <- start3 + days(10)

#Get slopes WITHIN EACH DECOMPOSITION WINDOW for EACH POT
slopesWindow1 <- sData[sData$dateTime >= start1 & sData$dateTime <= end1,] %>%  
  group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow2 <- sData[sData$dateTime >= start2 & sData$dateTime <= end2,] %>%  
  group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow3 <- sData[sData$dateTime >= start3 & sData$dateTime <= end3,] %>%  
  group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

#Manipulate the dataframes a bit
slopesWindow1$treatment <- as.factor(sub("\\..*","",slopesWindow1$pot))
slopesWindow1$window <- as.factor("Decomp Window 1")
slopesWindow2$treatment <- as.factor(sub("\\..*","",slopesWindow2$pot))
slopesWindow2$window <- as.factor("Decomp Window 2")
slopesWindow3$treatment <- as.factor(sub("\\..*","",slopesWindow3$pot))
slopesWindow3$window <- as.factor("Decomp Window 3")

allSlopes <- rbind(slopesWindow1,slopesWindow2,slopesWindow3) #Combine windows

show(slopesWindow1);show(slopesWindow2);show(slopesWindow3);

## # A tibble: 14 × 4
##    pot              slope treatment window         
##    <fct>            <dbl> <fct>     <fct>          
##  1 Peat.soil.1   0.000668 Peat      Decomp Window 1
##  2 Peat.soil.2  -0.000604 Peat      Decomp Window 1
##  3 Peat.soil.4  -0.000191 Peat      Decomp Window 1
##  4 Peat.soil.5  -0.00126  Peat      Decomp Window 1
##  5 Sandy.soil.1  0.0103   Sandy     Decomp Window 1
##  6 Sandy.soil.2  0.00735  Sandy     Decomp Window 1
##  7 Sandy.soil.3  0.00790  Sandy     Decomp Window 1
##  8 Sandy.soil.4  0.0102   Sandy     Decomp Window 1
##  9 Sandy.soil.5  0.00829  Sandy     Decomp Window 1
## 10 Clay.soil.1   0.00636  Clay      Decomp Window 1
## 11 Clay.soil.2   0.00601  Clay      Decomp Window 1
## 12 Clay.soil.3   0.00619  Clay      Decomp Window 1
## 13 Clay.soil.4   0.00509  Clay      Decomp Window 1
## 14 Clay.soil.5   0.00739  Clay      Decomp Window 1

## # A tibble: 14 × 4
##    pot                slope treatment window         
##    <fct>              <dbl> <fct>     <fct>          
##  1 Peat.soil.1   0.0000186  Peat      Decomp Window 2
##  2 Peat.soil.2   0.000930   Peat      Decomp Window 2
##  3 Peat.soil.4   0.000688   Peat      Decomp Window 2
##  4 Peat.soil.5   0.00000816 Peat      Decomp Window 2
##  5 Sandy.soil.1 -0.000408   Sandy     Decomp Window 2
##  6 Sandy.soil.2  0.00139    Sandy     Decomp Window 2
##  7 Sandy.soil.3  0.00155    Sandy     Decomp Window 2
##  8 Sandy.soil.4  0.00145    Sandy     Decomp Window 2
##  9 Sandy.soil.5  0.00308    Sandy     Decomp Window 2
## 10 Clay.soil.1  -0.000710   Clay      Decomp Window 2
## 11 Clay.soil.2  -0.00146    Clay      Decomp Window 2
## 12 Clay.soil.3  -0.00175    Clay      Decomp Window 2
## 13 Clay.soil.4  -0.00182    Clay      Decomp Window 2
## 14 Clay.soil.5  -0.00162    Clay      Decomp Window 2

## # A tibble: 14 × 4
##    pot                slope treatment window         
##    <fct>              <dbl> <fct>     <fct>          
##  1 Peat.soil.1   0.00125    Peat      Decomp Window 3
##  2 Peat.soil.2   0.00153    Peat      Decomp Window 3
##  3 Peat.soil.4   0.00232    Peat      Decomp Window 3
##  4 Peat.soil.5   0.00481    Peat      Decomp Window 3
##  5 Sandy.soil.1  0.00306    Sandy     Decomp Window 3
##  6 Sandy.soil.2 -0.00615    Sandy     Decomp Window 3
##  7 Sandy.soil.3  0.00525    Sandy     Decomp Window 3
##  8 Sandy.soil.4 -0.00232    Sandy     Decomp Window 3
##  9 Sandy.soil.5 -0.000461   Sandy     Decomp Window 3
## 10 Clay.soil.1  -0.00000494 Clay      Decomp Window 3
## 11 Clay.soil.2  -0.00200    Clay      Decomp Window 3
## 12 Clay.soil.3  -0.00175    Clay      Decomp Window 3
## 13 Clay.soil.4   0.00154    Clay      Decomp Window 3
## 14 Clay.soil.5  -0.000591   Clay      Decomp Window 3

This table contains slopes taken from the AVERAGED sensor signal by treatment, across three windows:

#Get slopes WITHIN EACH DECOMPOSITION WINDOW FOR AVERAGED SIGNALS
aveWindow1 <- aveData[aveData$dateTime >= start1 & aveData$dateTime <= end1,] %>%  
  group_by(treatment) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

aveWindow2 <- aveData[aveData$dateTime >= start2 & aveData$dateTime <= end2,] %>%  
  group_by(treatment) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

aveWindow3 <- aveData[aveData$dateTime >= start3 & aveData$dateTime <= end3,] %>%  
  group_by(treatment) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

aveWindow1$window <- as.factor("Decomp Window 1")
aveWindow2$window <- as.factor("Decomp Window 2")
aveWindow3$window <- as.factor("Decomp Window 3")

aveSlopes <- rbind(aveWindow1,aveWindow2,aveWindow3) #Combine windows

show(aveSlopes)

## # A tibble: 9 × 3
##   treatment     slope window         
##   <fct>         <dbl> <fct>          
## 1 Clay       0.00621  Decomp Window 1
## 2 Peat      -0.000348 Decomp Window 1
## 3 Sandy      0.00879  Decomp Window 1
## 4 Clay      -0.00147  Decomp Window 2
## 5 Peat       0.000411 Decomp Window 2
## 6 Sandy      0.00141  Decomp Window 2
## 7 Clay      -0.000562 Decomp Window 3
## 8 Peat       0.00248  Decomp Window 3
## 9 Sandy     -0.000123 Decomp Window 3

Decomposition Windows

Let’s visually examine the results of taking slopes across three decomposition windows:

#Show slope data as a box plot
windowBoxPlot <- ggplot(data=allSlopes) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
  scale_color_manual(name = "Treatment", values = treatmentColors) +
  xlab("") + ylab(bquote('Sensor Slope '(day^-1))) + facet_wrap(~ window)

windowTreatmentPlot <- ggplot() + 
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  geom_errorbar(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= end3,],
                aes(x=dateTime, ymin=Rnorm-stDev, ymax=Rnorm+stDev,
                    col=treatment, alpha=0.5), width=50000, 
                position=position_dodge(10000)) +
  stat_smooth(data=aveData[aveData$dateTime >= start1 & aveData$dateTime <= end1,],
    aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
  stat_smooth(data=aveData[aveData$dateTime >= start2 & aveData$dateTime <= end2,],
    aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
  stat_smooth(data=aveData[aveData$dateTime >= start3 & aveData$dateTime <= end3,],
    aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
   geom_vline(xintercept = c(start1,end1,start2,end2,start3,end3),
             color="black", linewidth = 0.5, linetype = "dashed") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) + 
  theme(legend.position = "none")

ggarrange(windowTreatmentPlot, windowBoxPlot, nrow=2)

Visually, it looks like we are seeing a statistically-significant impact of treatment (soil type) on the slopes that we see from these sensors at least for the first decomposition window. Let’s check this with ANOVAs:

summary(aov(slopesWindow1$slope~slopesWindow1$treatment))

##                         Df    Sum Sq   Mean Sq F value   Pr(>F)    
## slopesWindow1$treatment  2 1.927e-04 9.636e-05   89.96 1.52e-07 ***
## Residuals               11 1.178e-05 1.070e-06                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(slopesWindow2$slope~slopesWindow2$treatment))

##                         Df    Sum Sq   Mean Sq F value   Pr(>F)    
## slopesWindow2$treatment  2 2.138e-05 1.069e-05   15.47 0.000635 ***
## Residuals               11 7.598e-06 6.910e-07                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(slopesWindow3$slope~slopesWindow3$treatment))

##                         Df    Sum Sq   Mean Sq F value Pr(>F)
## slopesWindow3$treatment  2 2.319e-05 1.159e-05   1.324  0.305
## Residuals               11 9.630e-05 8.755e-06

Findings

We have defined several decomposition windows where we take individual normalized resistance slopes from each sensor. The windows correspond to the first, second, and third ten-day period of the experiment.
We take slopes by running a 2nd-order polynomial regression, and taking just the linear coefficient of the fit as the slope associated with that particular sensor.
Based on a one-way ANOVA test, we see a significant difference between the slope responses from sensors as a function of the soil in which they were installed for only the first window. However, if we remove the one sensor that seems to have failed early in the experiment, we then see a significant difference in treatment respones for the second window as well.

Next Questions

Next, we should use the slopes extracted from these sensor signals to see if they correlate to other measurements in these experimental pots:

Do sensor slopes correlate to gas emissions as measured over the course of the experiment?
Do sensor slopes correlate to signals from other co-located sensors?
Do sensor slopes correlate to other measured parameters in the pots?
Are some windows more useful than others? Even though these sensors aren’t failing quickly the way that prior generations do, they may still be “depleted” in some sense without actually failing.