This R Publication is intended to provide updates on current and ongoing data analysis to the researches involved in analysis of data from the SuperTarantula experiment at UC Berkeley. Using this approach provides transparency regarding the data analysis methods used, and improves communication between researchers.

# Load Libraries ----------------------------------------------------------
library(tidyverse); library(dplyr); library(lubridate); library(reshape2)
library(ggplot2); library(cowplot); library(RColorBrewer); library(ggpubr) 
library(DescTools); library(scales)

# Import and Process Text File -------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

####IMPORT LOGGED DATA####
#Pull data in from text files and format timestamps
setwd(paste(dirname(rstudioapi::getActiveDocumentContext()$path),
            "/May 2024 Data",sep=""))

rawData <- read.table("N-min_sensors_raw.csv",sep=',', header=TRUE)
rawData <- select(rawData, -Day)
names(rawData)[names(rawData) == "Timestamp"] <- "dateTime"

#Convert columns to rows for data analysis
allData <- melt(rawData,id.vars=c("dateTime"))
names(allData)[names(allData) == "variable"] <- "pot"
names(allData)[names(allData) == "value"] <- "reading"

allData$dateTime <- as.POSIXct(allData$dateTime,format="%m/%d/%Y %H:%M")
allData <- allData[order(allData$dateTime),] #Order by timestamp

# Normalize Resistances ---------------------------------------------------
#R_o will be the mean resistance value between these dateTimes:
rStart <- as.POSIXct("2024-04-26 00:00:00"); rEnd <- rStart + hours(12)
endDate <- max(allData$dateTime)

Ro <- allData[allData$dateTime > rStart & allData$dateTime < rEnd,] %>% 
  group_by(pot) %>% summarize(Ro = mean(reading))

allData <- merge(allData,Ro,by="pot"); allData$Rnorm=allData$reading/allData$Ro

# Map to treatments -------------------------------------------------------
allData$treatment <- as.factor(sub("\\..*","",allData$pot)) #Get treatment types

The code above converts the data into a single data.frame in which each line is a single reading.

Plot the raw data

# Plot the data -----------------------------------------------------------
treatmentColors <- c("Peat" = "darkgreen", "Sandy" = "darkorange", "Clay" = "darkred")
treatmentShapes <- c("Peat" = 1, "Sandy" = 2, "Clay" = 5)

####Plot raw data with free Y scale####
rawPlot <- ggplot() +
  geom_point(data=allData, aes(dateTime,Rnorm,col=treatment, alpha=0.75), size=1) +
  scale_color_manual(name = "Soil Type", values = treatmentColors) +
  scale_shape_manual(values=treatmentShapes, guide="none") +
  scale_alpha(guide="none") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  xlim(rStart, endDate) + theme_bw()

rawPlot

####Plot raw data with constrained Y scale####
rawPlot + ylim(0.9, 1.25)

To the naked eye, it appears that the three treatments (soil types) result in a different sensor response, on average. Let’s dig deeper!

Break out the plots by individual sensor, with linear fit lines

rawPlot + 
  stat_smooth(data=allData, aes(dateTime,Rnorm,col=treatment), method="lm", 
              se=FALSE, linewidth = 1,formula = y~x) + 
              facet_wrap(~pot, nrow=3)

We see some interesting things here. The Peat installations are clearly reacting more than the other two soil types, but also in a more stochastic way. The reactions start largely near the end of the experiment, suggesting that we might be only seeing the beginning of a process that would be more clear with a longer experimental run-time. The Sandy installations appear to have a general increasing slope which is more gradual. The Clay soils appear to have a mostly flat but slightly parabolic shape - surprising, since we don’t generally expect the resistances of these sensors to decrease.

Extract slopes from the data

In general, our approach for decomposition sensor data analysis has been to extract individual sensor slopes and then look to correlate these slopes to other parameters of interest, for example the enzymatic activity within each instrumented plot. In this case, I will provide two different tables of slopes: One for the complete slope across the entire experiment (after a 12-hour period used to baseline the initial resistances), and one for the slope within some “decomposition window”, the part of the signal that we think contains the most useful information.

####First, smooth the data a bit####
sData <- allData %>% 
  mutate(dateTime = cut(dateTime, breaks='6 hour')) %>% 
  group_by(pot, dateTime) %>%
  dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))

#Add a time elapsed column, in days, called tDay
sData$dateTime <- as.POSIXct(sData$dateTime)
sData$tDay <- as.numeric(sData$dateTime - min(sData$dateTime))/60/60/24

#Get slopes ACROSS THE WHOLE EXPERIMENT
slopesAll <- sData %>%
  group_by(pot) %>%
  dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesAll$treatment <- as.factor(sub("\\..*","",slopesAll$pot)) #Get treatments

allPlot <- ggplot(data=slopesAll) + theme_bw()+theme(legend.position = "none")+
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("Sensor Slope") + ggtitle("Overall Slope")

####Define the beginning and end of a decomposition window####
decompStart <- rStart + hours(1)
decompEnd <- rStart + days(10)

#Get slopes WITHIN THE DECOMPOSITION WINDOW
slopesWindow <- sData[sData$dateTime >= decompStart &
                         sData$dateTime <= decompEnd,] %>%
    group_by(pot) %>%
    dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])

slopesWindow$treatment <- as.factor(sub("\\..*","",slopesWindow$pot))

windowPlot <- ggplot(data=slopesWindow) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")

bothBox <- ggarrange(allPlot, windowPlot, ncol=2) #Show both box plots

show(bothBox)

Look at the Decomposition Window analyzed in the previous nMin experiment

These slopes are the linear term of a 2nd-order polynomial fit. When we look at the overall slope distribution across all of the experiment, there is some separation; when we subset the data to only include a decomposition window that starts pretty much immediately after the experiment begins, and ends 10 days after the experiment begins, we see much better separation between soil types.

In the plot below, the decomposition window will be denoted using vertical black dashed lines.

####Remove the outlier sensor's data####
#slopesAll <- slopesAll[slopesAll$pot != "Peat.soil.5",]
#slopesWindow <- slopesWindow[slopesWindow$pot != "Peat.soil.5",]

slopesBoxPlot <- ggplot(data=slopesWindow) + theme_bw() +
  geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
  geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")

####Get new averaged data####
aveData <- allData %>% 
  mutate(dateTime = cut(dateTime, breaks='12 hour')) %>% 
  group_by(treatment, dateTime) %>%
  dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))

aveData$dateTime <- as.POSIXct(aveData$dateTime)

treatmentPlot <- ggplot() + ggtitle("Averaged Sensor Response by Treatment") +
  geom_point(data=aveData,
             aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
  geom_errorbar(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= decompEnd,],
                aes(x=dateTime, ymin=Rnorm-stDev, ymax=Rnorm+stDev,
                    col=treatment, alpha=0.5), width=50000, 
                position=position_dodge(10000)) +
  stat_smooth(data=aveData[aveData$dateTime >= decompStart &
                             aveData$dateTime <= decompEnd,],
    aes(dateTime,Rnorm,col=treatment), method="lm", se=FALSE, 
              linewidth = 1.25,formula = y~x) +
  geom_vline(xintercept = c(decompStart,decompEnd),
             color="black", linewidth = 1, linetype = "dashed") +
  xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
  scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
  scale_shape_manual(values = treatmentShapes, guide="none") +
  xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) + 
  theme(legend.position = "none")

ggarrange(treatmentPlot, slopesBoxPlot, ncol=2, widths=c(2,1))

Visually, it looks like we are seeing a statistically-significant impact of treatment (soil type) on the slopes that we see from these sensors. Let’s check this with an ANOVA:

windowAnova <- aov(slopesWindow$slope~slopesWindow$treatment)
summary(windowAnova)
##                        Df    Sum Sq   Mean Sq F value   Pr(>F)    
## slopesWindow$treatment  2 2.243e-04 1.121e-04   113.9 1.57e-08 ***
## Residuals              12 1.182e-05 9.800e-07                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Slopes Tables

These tables contain the slopes taken from each individual sensor, both across the whole experiment (which we expect will be less useful) and within the decomposition window.

show(slopesAll)
## # A tibble: 15 × 3
##    pot              slope treatment
##    <fct>            <dbl> <fct>    
##  1 Peat.soil.1   0.00312  Peat     
##  2 Peat.soil.2   0.000950 Peat     
##  3 Peat.soil.3   0.00703  Peat     
##  4 Peat.soil.4   0.00226  Peat     
##  5 Peat.soil.5   0.00689  Peat     
##  6 Sandy.soil.1  0.00229  Sandy    
##  7 Sandy.soil.2  0.00141  Sandy    
##  8 Sandy.soil.3  0.00367  Sandy    
##  9 Sandy.soil.4  0.00320  Sandy    
## 10 Sandy.soil.5  0.00152  Sandy    
## 11 Clay.soil.1   0.000873 Clay     
## 12 Clay.soil.2  -0.000234 Clay     
## 13 Clay.soil.3   0.000254 Clay     
## 14 Clay.soil.4   0.000689 Clay     
## 15 Clay.soil.5   0.000490 Clay
show(slopesWindow)
## # A tibble: 15 × 3
##    pot              slope treatment
##    <fct>            <dbl> <fct>    
##  1 Peat.soil.1   0.000668 Peat     
##  2 Peat.soil.2  -0.000604 Peat     
##  3 Peat.soil.3  -0.000556 Peat     
##  4 Peat.soil.4  -0.000191 Peat     
##  5 Peat.soil.5  -0.00126  Peat     
##  6 Sandy.soil.1  0.0103   Sandy    
##  7 Sandy.soil.2  0.00735  Sandy    
##  8 Sandy.soil.3  0.00790  Sandy    
##  9 Sandy.soil.4  0.0102   Sandy    
## 10 Sandy.soil.5  0.00829  Sandy    
## 11 Clay.soil.1   0.00636  Clay     
## 12 Clay.soil.2   0.00601  Clay     
## 13 Clay.soil.3   0.00619  Clay     
## 14 Clay.soil.4   0.00509  Clay     
## 15 Clay.soil.5   0.00739  Clay

Initial Findings

  1. We have defined a “decomposition window” where we take individual normalized resistance slopes from each sensor. In this case, we define this window as starting 1 week into the experiment, and ending when the experimental data ends.

  2. We take slopes by running a 2nd-order polynomial regression, and taking just the linear coefficient of the fit as the slope associated with that particular sensor.

  3. Based on a one-way ANOVA test, we see a significant difference between the slope responses from sensors as a function of the soil in which they were installed.

Next Questions

Next, we should use the slopes extracted from these sensor signals to see if they correlate to other measurements in these experimental pots: