This R Publication is intended to provide updates on current and ongoing data analysis to the researches involved in analysis of data from the SuperTarantula experiment at UC Berkeley. Using this approach provides transparency regarding the data analysis methods used, and improves communication between researchers.
# Load Libraries ----------------------------------------------------------
library(tidyverse); library(dplyr); library(lubridate); library(reshape2)
library(ggplot2); library(cowplot); library(RColorBrewer); library(ggpubr)
library(DescTools); library(scales)
# Import and Process Text File -------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
####IMPORT LOGGED DATA####
#Pull data in from text files and format timestamps
setwd(paste(dirname(rstudioapi::getActiveDocumentContext()$path),
"/May 2024 Data",sep=""))
rawData <- read.table("N-min_sensors_raw.csv",sep=',', header=TRUE)
rawData <- select(rawData, -Day)
names(rawData)[names(rawData) == "Timestamp"] <- "dateTime"
#Convert columns to rows for data analysis and rename some columns
allData <- melt(rawData,id.vars=c("dateTime"))
names(allData)[names(allData) == "variable"] <- "pot"
names(allData)[names(allData) == "value"] <- "reading"
allData$dateTime <- as.POSIXct(allData$dateTime,format="%m/%d/%Y %H:%M")
allData <- allData[order(allData$dateTime),] #Order by timestamp
# Normalize Resistances ---------------------------------------------------
#R_o will be the mean resistance value between these dateTimes:
rStart <- as.POSIXct("2024-04-26 00:00:00"); rEnd <- rStart + hours(12)
endDate <- max(allData$dateTime)
Ro <- allData[allData$dateTime > rStart & allData$dateTime < rEnd,] %>%
group_by(pot) %>% summarize(Ro = mean(reading))
allData <- merge(allData,Ro,by="pot"); allData$Rnorm=allData$reading/allData$Ro
# Map to treatments -------------------------------------------------------
allData$treatment <- as.factor(sub("\\..*","",allData$pot)) #Get treatment types
The code above converts the data into a single data.frame in which each line is a single reading.
# Plot the data -----------------------------------------------------------
treatmentColors <- c("Peat" = "red3", "Sandy" = "darkgreen", "Clay" = "royalblue3")
treatmentShapes <- c("Peat" = 1, "Sandy" = 2, "Clay" = 5)
####Plot raw data with free Y scale####
rawPlot <- ggplot() +
geom_point(data=allData, aes(dateTime,Rnorm,col=treatment, alpha=0.75), size=1) +
scale_color_manual(name = "Soil Type", values = treatmentColors) +
scale_shape_manual(values=treatmentShapes, guide="none") +
scale_alpha(guide="none") +
xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
xlim(rStart, endDate) + theme_bw()
rawPlot
####Plot raw data with constrained Y scale####
rawPlot + ylim(0.9, 1.25)
To the naked eye, it appears that the three treatments (soil types)
result in a different sensor response, on average. Let’s dig deeper!
rawPlot + stat_smooth(data=allData, aes(dateTime,Rnorm,col=treatment),
method="lm", se=FALSE, linewidth = 1,formula = y~x) +
facet_wrap(~pot, nrow=3)
We see some interesting things here. The Peat installations are
clearly reacting more than the other two soil types, but also in a more
stochastic way. The reactions start largely near the end of the
experiment, suggesting that we might be only seeing the beginning of a
process that would be more clear with a longer experimental run-time.
The Sandy installations appear to have a general increasing
slope which is more gradual. The Clay soils appear to have a
mostly flat but slightly parabolic shape - surprising, since we don’t
generally expect the resistances of these sensors to
decrease.
#From visual examination of the fact plots, "Peat.soil.3" looks like it failed.
#So let's remove it from the rest of the analysis:
allData <- subset(allData, allData$pot != "Peat.soil.3")
####Average sensor signals by treatment####
aveData <- allData %>% mutate(dateTime = cut(dateTime, breaks='6 hour')) %>%
group_by(treatment, dateTime) %>%
dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))
aveData$dateTime <- as.POSIXct(aveData$dateTime)
aveData$tDay <- as.numeric(aveData$dateTime - min(aveData$dateTime))/60/60/24
####Plot the data####
treatmentPlot <- ggplot() +
geom_point(data=aveData,
aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
scale_shape_manual(values = treatmentShapes, guide="none") +
xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25)
treatmentPlot
In general, our approach for decomposition sensor data analysis has been to extract individual sensor slopes and then look to correlate these slopes to other parameters of interest, for example the enzymatic activity within each instrumented plot. In this case, I will provide two different tables of slopes: One for the complete slope across the entire experiment (after a 12-hour period used to baseline the initial resistances), and one for the slope within some “decomposition window”, the part of the signal that we think contains the most useful information.
####First, smooth the data a bit####
sData <- allData %>% mutate(dateTime = cut(dateTime, breaks='6 hour')) %>%
group_by(pot, dateTime) %>%
dplyr::summarize(stDev = sd(Rnorm), Rnorm = mean(Rnorm, na.rm=T))
#Add a time elapsed column, in days, called tDay
sData$dateTime <- as.POSIXct(sData$dateTime)
sData$tDay <- as.numeric(sData$dateTime - min(sData$dateTime))/60/60/24
#Get slopes ACROSS THE WHOLE EXPERIMENT
slopesAll <- sData %>% group_by(pot) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
slopesAll$treatment <- as.factor(sub("\\..*","",slopesAll$pot)) #Get treatments
allPlot <- ggplot(data=slopesAll) + theme_bw()+theme(legend.position = "none")+
geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
xlab("") + ylab(bquote('Sensor Slope '(day^-1))) +
ggtitle("Overall Slope")
####Define the beginning and end of a decomposition window####
decompStart <- rStart + hours(1);
decompEnd <- rStart + days(10)
#Get slopes WITHIN THE DECOMPOSITION WINDOW
slopesWindow <- sData[sData$dateTime >= decompStart &
sData$dateTime <= decompEnd,] %>%
group_by(pot) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
slopesWindow$treatment <- as.factor(sub("\\..*","",slopesWindow$pot))
windowPlot <- ggplot(data=slopesWindow) + theme_bw() +
geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")
bothBox <- ggarrange(allPlot, windowPlot, ncol=2) #Combine both box plots
show(bothBox)
These slopes are the linear term of a 2nd-order polynomial fit. When we look at the overall slope distribution across all of the experiment, there is some separation; when we subset the data to only include a decomposition window that starts right after the experiment begins, and ends 10 days after the experiment begins, we see much better separation between soil types.
In the plot below, the decomposition window will be denoted using vertical black dashed lines. This represents the first of three decomposition windows that will be examined.
#Box plot for the First Decomposition Window
slopesBoxPlot <- ggplot(data=slopesWindow) + theme_bw() +
geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
xlab("") + ylab("") + ggtitle("Slope Within Decomp Window")
#Treatment plot showing the First Decomposition Window
treatmentPlot <- ggplot() + ggtitle("Averaged Sensor Response by Treatment") +
geom_point(data=aveData,
aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
geom_errorbar(data=aveData[aveData$dateTime >= decompStart &
aveData$dateTime <= decompEnd,],
aes(x=dateTime, ymin=Rnorm-stDev, ymax=Rnorm+stDev,
col=treatment, alpha=0.5), width=50000,
position=position_dodge(10000)) +
stat_smooth(data=aveData[aveData$dateTime >= decompStart &
aveData$dateTime <= decompEnd,],
aes(dateTime,Rnorm,col=treatment), method="lm", se=FALSE,
linewidth = 1.25,formula = y~x) +
geom_vline(xintercept = c(decompStart,decompEnd),
color="black", linewidth = 1, linetype = "dashed") +
xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
scale_color_manual(name = "Pot Treatment", values = treatmentColors) +
scale_shape_manual(values = treatmentShapes, guide="none") +
xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) +
theme(legend.position = "none")
ggarrange(treatmentPlot, slopesBoxPlot, ncol=2, widths=c(2,1)) #Combine plots
This table contains the slopes taken from each individual sensor, within three decomposition windows: 0-10 days, 10-20 days, and 20-30 days.
start1 <- rStart + minutes(1); end1 <- rStart + days(10)
start2 <- end1 + minutes(1); end2 <- start2 + days(10)
start3 <- end2 + minutes(1); end3 <- start3 + days(10)
#Get slopes WITHIN EACH DECOMPOSITION WINDOW for EACH POT
slopesWindow1 <- sData[sData$dateTime >= start1 & sData$dateTime <= end1,] %>%
group_by(pot) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
slopesWindow2 <- sData[sData$dateTime >= start2 & sData$dateTime <= end2,] %>%
group_by(pot) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
slopesWindow3 <- sData[sData$dateTime >= start3 & sData$dateTime <= end3,] %>%
group_by(pot) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
#Manipulate the dataframes a bit
slopesWindow1$treatment <- as.factor(sub("\\..*","",slopesWindow1$pot))
slopesWindow1$window <- as.factor("Decomp Window 1")
slopesWindow2$treatment <- as.factor(sub("\\..*","",slopesWindow2$pot))
slopesWindow2$window <- as.factor("Decomp Window 2")
slopesWindow3$treatment <- as.factor(sub("\\..*","",slopesWindow3$pot))
slopesWindow3$window <- as.factor("Decomp Window 3")
allSlopes <- rbind(slopesWindow1,slopesWindow2,slopesWindow3) #Combine windows
show(slopesWindow1);show(slopesWindow2);show(slopesWindow3);
## # A tibble: 14 × 4
## pot slope treatment window
## <fct> <dbl> <fct> <fct>
## 1 Peat.soil.1 0.000668 Peat Decomp Window 1
## 2 Peat.soil.2 -0.000604 Peat Decomp Window 1
## 3 Peat.soil.4 -0.000191 Peat Decomp Window 1
## 4 Peat.soil.5 -0.00126 Peat Decomp Window 1
## 5 Sandy.soil.1 0.0103 Sandy Decomp Window 1
## 6 Sandy.soil.2 0.00735 Sandy Decomp Window 1
## 7 Sandy.soil.3 0.00790 Sandy Decomp Window 1
## 8 Sandy.soil.4 0.0102 Sandy Decomp Window 1
## 9 Sandy.soil.5 0.00829 Sandy Decomp Window 1
## 10 Clay.soil.1 0.00636 Clay Decomp Window 1
## 11 Clay.soil.2 0.00601 Clay Decomp Window 1
## 12 Clay.soil.3 0.00619 Clay Decomp Window 1
## 13 Clay.soil.4 0.00509 Clay Decomp Window 1
## 14 Clay.soil.5 0.00739 Clay Decomp Window 1
## # A tibble: 14 × 4
## pot slope treatment window
## <fct> <dbl> <fct> <fct>
## 1 Peat.soil.1 0.0000186 Peat Decomp Window 2
## 2 Peat.soil.2 0.000930 Peat Decomp Window 2
## 3 Peat.soil.4 0.000688 Peat Decomp Window 2
## 4 Peat.soil.5 0.00000816 Peat Decomp Window 2
## 5 Sandy.soil.1 -0.000408 Sandy Decomp Window 2
## 6 Sandy.soil.2 0.00139 Sandy Decomp Window 2
## 7 Sandy.soil.3 0.00155 Sandy Decomp Window 2
## 8 Sandy.soil.4 0.00145 Sandy Decomp Window 2
## 9 Sandy.soil.5 0.00308 Sandy Decomp Window 2
## 10 Clay.soil.1 -0.000710 Clay Decomp Window 2
## 11 Clay.soil.2 -0.00146 Clay Decomp Window 2
## 12 Clay.soil.3 -0.00175 Clay Decomp Window 2
## 13 Clay.soil.4 -0.00182 Clay Decomp Window 2
## 14 Clay.soil.5 -0.00162 Clay Decomp Window 2
## # A tibble: 14 × 4
## pot slope treatment window
## <fct> <dbl> <fct> <fct>
## 1 Peat.soil.1 0.00125 Peat Decomp Window 3
## 2 Peat.soil.2 0.00153 Peat Decomp Window 3
## 3 Peat.soil.4 0.00232 Peat Decomp Window 3
## 4 Peat.soil.5 0.00481 Peat Decomp Window 3
## 5 Sandy.soil.1 0.00306 Sandy Decomp Window 3
## 6 Sandy.soil.2 -0.00615 Sandy Decomp Window 3
## 7 Sandy.soil.3 0.00525 Sandy Decomp Window 3
## 8 Sandy.soil.4 -0.00232 Sandy Decomp Window 3
## 9 Sandy.soil.5 -0.000461 Sandy Decomp Window 3
## 10 Clay.soil.1 -0.00000494 Clay Decomp Window 3
## 11 Clay.soil.2 -0.00200 Clay Decomp Window 3
## 12 Clay.soil.3 -0.00175 Clay Decomp Window 3
## 13 Clay.soil.4 0.00154 Clay Decomp Window 3
## 14 Clay.soil.5 -0.000591 Clay Decomp Window 3
This table contains slopes taken from the AVERAGED sensor signal by treatment, across three windows:
#Get slopes WITHIN EACH DECOMPOSITION WINDOW FOR AVERAGED SIGNALS
aveWindow1 <- aveData[aveData$dateTime >= start1 & aveData$dateTime <= end1,] %>%
group_by(treatment) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
aveWindow2 <- aveData[aveData$dateTime >= start2 & aveData$dateTime <= end2,] %>%
group_by(treatment) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
aveWindow3 <- aveData[aveData$dateTime >= start3 & aveData$dateTime <= end3,] %>%
group_by(treatment) %>%
dplyr::summarize(slope = summary(lm(Rnorm ~ tDay + tDay^2))$coefficients[2])
aveWindow1$window <- as.factor("Decomp Window 1")
aveWindow2$window <- as.factor("Decomp Window 2")
aveWindow3$window <- as.factor("Decomp Window 3")
aveSlopes <- rbind(aveWindow1,aveWindow2,aveWindow3) #Combine windows
show(aveSlopes)
## # A tibble: 9 × 3
## treatment slope window
## <fct> <dbl> <fct>
## 1 Clay 0.00621 Decomp Window 1
## 2 Peat -0.000348 Decomp Window 1
## 3 Sandy 0.00879 Decomp Window 1
## 4 Clay -0.00147 Decomp Window 2
## 5 Peat 0.000411 Decomp Window 2
## 6 Sandy 0.00141 Decomp Window 2
## 7 Clay -0.000562 Decomp Window 3
## 8 Peat 0.00248 Decomp Window 3
## 9 Sandy -0.000123 Decomp Window 3
Let’s visually examine the results of taking slopes across three decomposition windows:
#Show slope data as a box plot
windowBoxPlot <- ggplot(data=allSlopes) + theme_bw() +
geom_boxplot(aes(x=treatment, y=slope, col=treatment)) +
geom_jitter(aes(x=treatment, y=slope, col=treatment), size=2, alpha=0.75) +
scale_color_manual(name = "Treatment", values = treatmentColors) +
xlab("") + ylab(bquote('Sensor Slope '(day^-1))) + facet_wrap(~ window)
windowTreatmentPlot <- ggplot() +
geom_point(data=aveData,
aes(dateTime,Rnorm,col=treatment,shape=treatment), size=1) +
geom_errorbar(data=aveData[aveData$dateTime >= decompStart &
aveData$dateTime <= end3,],
aes(x=dateTime, ymin=Rnorm-stDev, ymax=Rnorm+stDev,
col=treatment, alpha=0.5), width=50000,
position=position_dodge(10000)) +
stat_smooth(data=aveData[aveData$dateTime >= start1 & aveData$dateTime <= end1,],
aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
stat_smooth(data=aveData[aveData$dateTime >= start2 & aveData$dateTime <= end2,],
aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
stat_smooth(data=aveData[aveData$dateTime >= start3 & aveData$dateTime <= end3,],
aes(dateTime,Rnorm,col=treatment), method="lm",se=FALSE,linewidth = 1.25,formula = y~x) +
geom_vline(xintercept = c(start1,end1,start2,end2,start3,end3),
color="black", linewidth = 0.5, linetype = "dashed") +
xlab("") + ylab(expression("Normalized Resistance (R/R"[o]*")")) +
scale_color_manual(name = "Treatment", values = treatmentColors) +
scale_shape_manual(values = treatmentShapes, guide="none") +
xlim(rStart, endDate) + theme_bw() + ylim(0.95,1.25) +
theme(legend.position = "none")
ggarrange(windowTreatmentPlot, windowBoxPlot, nrow=2)
Visually, it looks like we are seeing a statistically-significant impact of treatment (soil type) on the slopes that we see from these sensors at least for the first decomposition window. Let’s check this with ANOVAs:
summary(aov(slopesWindow1$slope~slopesWindow1$treatment))
## Df Sum Sq Mean Sq F value Pr(>F)
## slopesWindow1$treatment 2 1.927e-04 9.636e-05 89.96 1.52e-07 ***
## Residuals 11 1.178e-05 1.070e-06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(slopesWindow2$slope~slopesWindow2$treatment))
## Df Sum Sq Mean Sq F value Pr(>F)
## slopesWindow2$treatment 2 2.138e-05 1.069e-05 15.47 0.000635 ***
## Residuals 11 7.598e-06 6.910e-07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(slopesWindow3$slope~slopesWindow3$treatment))
## Df Sum Sq Mean Sq F value Pr(>F)
## slopesWindow3$treatment 2 2.319e-05 1.159e-05 1.324 0.305
## Residuals 11 9.630e-05 8.755e-06
We have defined several decomposition windows where we take individual normalized resistance slopes from each sensor. The windows correspond to the first, second, and third ten-day period of the experiment.
We take slopes by running a 2nd-order polynomial regression, and taking just the linear coefficient of the fit as the slope associated with that particular sensor.
Based on a one-way ANOVA test, we see a significant difference between the slope responses from sensors as a function of the soil in which they were installed for only the first window. However, if we remove the one sensor that seems to have failed early in the experiment, we then see a significant difference in treatment respones for the second window as well.
Next, we should use the slopes extracted from these sensor signals to see if they correlate to other measurements in these experimental pots: