Introduction

Plants from experimental setup

Plants from experimental setup

Legumes, which are vital to global food security and sustainable agriculture, depend on symbiotic relationships with nitrogen-fixing soil bacteria like Bradyrhizobium to thrive in nutrient-poor soils. As climate change exacerbates variability in precipitation, understanding how this critical symbiosis responds to water stress becomes essential for protecting crop productivity. This pilot study investigates how soybean plants inoculated with Bradyrhizobium use their enrrgy and allocate biomass under low, ambient, and high water treatments. Specifically, our analysis focuses on metrics concerning reproductive activity (seed pod production) and plant health that can be used to predict crop yields. While the absence of a negative control (one microbial group n=120) limits data regarding the role of Bradyrhizobium, the findings shed light on drought resilience strategies, revealing trade-offs between root expansion and reproductive success under water scarcity. Such insights are urgently needed to guide farmers in adapting cultivation practices to increasingly unpredictable growing conditions.

Load Libraries & Prepare Data

library(ggplot2)
library(dplyr)
library(rstatix)
library("RColorBrewer")

#load harvest data
harvest.df <- read.csv("SoybeanSeptemberCehlar2024.csv", header = TRUE)

Note: Not shown is some additional preparation of the data. filtering outliers, setting the factors, renaming and ordering groups, etc.

Defining Our Metrics:

Root To Shoot Ratio → Biomass Allocation, is the plant investing in bigger roots or bigger shoots? (green leafy aboveground parts)

Leggy-ness Ratio → Trifoliate Count/Above ground length. Think: how leafy is this plant? In our case, leafier = healthier plant

Seed Pod Count → Seed Pod metrics relate to reproductive activity, or plant “fitness”

Seed Pod Mass → Plentiful and massive seed pods = more food for us and more profit for our farmers

Lets Explore the Raw Data

For our metrics, lets look at distribution patterns by Water Treatment using histograms. This is a good way to help us think about and visualize potential relationships in our data before analysis.

## Statistical Analysis:

It’s important to understand what kind of data you’re working with before you approach statistical analysis. In our case, we’re looking at the effects of three different treatment groups (Low Water, Ambient Water and High Water) across a series of response variables corresponding to plant health and fitness.

Since we’ll be comparing metrics across three groups, an ANOVA (analysis of variance) will be the most appropriate test to compare the means of our data. In compliance with the assumptions of ANOVA, we must assess the normality of our residuals based on a linear model. If our residuals are not normal, and cannot be made normal using a log transformation, then we can use a Kruskal Wallis test as a nonparametric verison of ANOVA. If the residuals are normally distributed, then we can assume that the ANOVA accurately quantifies/represents the relationship between our plant metrics and the watering treatment.

In the following section, we inspect our data to see which test is most appropriate. We also make use of relevant post-hoc tests as well as the summarise() function to more thoroughly examine our data.

#R:S
#build a model
r2s.model <- lm(Root2ShootRatio ~ WaterTreatment, data = subset.df)
#check for normality
qqnorm(r2s.model$residuals)
qqline(r2s.model$residuals)

hist(r2s.model$residuals)

shapiro.test(r2s.model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  r2s.model$residuals
## W = 0.94921, p-value = 0.0007897
#residuals are not normal
#R2S log transformation
subset.df$LogRoot2ShootRatio <- log(subset.df$Root2ShootRatio)
#new model using log transformed R2S
LogRoot2ShootModel <- lm(LogRoot2ShootRatio ~ WaterTreatment, data = subset.df)
#check transformed residuals
qqnorm(LogRoot2ShootModel$residuals)
qqline(LogRoot2ShootModel$residuals)

hist(LogRoot2ShootModel$residuals)

shapiro.test(LogRoot2ShootModel$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  LogRoot2ShootModel$residuals
## W = 0.98395, p-value = 0.2725
#new residuals are normal
#ANOVA for LogRoot2ShootRatio 
anova(LogRoot2ShootModel)
## Analysis of Variance Table
## 
## Response: LogRoot2ShootRatio
##                Df  Sum Sq Mean Sq F value    Pr(>F)    
## WaterTreatment  2  9.5603  4.7802  23.097 6.467e-09 ***
## Residuals      96 19.8681  0.2070                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Root2ShootRatio is afffected by water treatment
TukeyHSD(aov(LogRoot2ShootModel))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = LogRoot2ShootModel)
## 
## $WaterTreatment
##                                diff        lwr         upr     p adj
## Ambient Water-Low Water  -0.4103462 -0.6790386 -0.14165393 0.0012922
## High Water-Low Water     -0.7549388 -1.0195886 -0.49028907 0.0000000
## High Water-Ambient Water -0.3445926 -0.6113325 -0.07785265 0.0076468
#All Root2Shoot Ratios are significantly different based on water treatment. 
summarise(data_by_water, group_mean = mean(Root2ShootRatio, na.rm=TRUE))
## # A tibble: 3 Ă— 2
##   WaterTreatment group_mean
##   <fct>               <dbl>
## 1 Low Water           0.630
## 2 Ambient Water       0.415
## 3 High Water          0.297
#Root to shoot is significantly decreased in response to the low water conditions

#Legginess
#build a model
legginess.model <- lm(LegRatio ~ WaterTreatment, data = subset.df)
#check for normality
qqnorm(legginess.model$residuals)
qqline(legginess.model$residuals)

hist(legginess.model$residuals)

shapiro.test(legginess.model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  legginess.model$residuals
## W = 0.58095, p-value = 1.321e-15
#residuals are not normal
#attempted log transformation. its complicated because of our 0's. Giving in and using a nonparametric test
kruskal.test(LegRatio ~ WaterTreatment, data= subset.df)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  LegRatio by WaterTreatment
## Kruskal-Wallis chi-squared = 40.617, df = 2, p-value = 1.514e-09
#Legginess was significantly affected by water treatment 
subset.df %>% 
  dunn_test(LegRatio ~ WaterTreatment)
## # A tibble: 3 Ă— 9
##   .y.      group1      group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>    <chr>       <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 LegRatio Low Water   Ambie…    34    33     0.498 6.19e-1 6.19e-1 ns          
## 2 LegRatio Low Water   High …    34    35     5.75  8.83e-9 2.65e-8 ****        
## 3 LegRatio Ambient Wa… High …    33    35     5.21  1.92e-7 3.84e-7 ****
#Root to shoot is significantly decreased in response to the low water conditions
summarise(data_by_water, group_mean = mean(LegRatio, na.rm=TRUE))
## # A tibble: 3 Ă— 2
##   WaterTreatment group_mean
##   <fct>               <dbl>
## 1 Low Water         0.00571
## 2 Ambient Water     0.0105 
## 3 High Water        0.0394
#seed pod count 
is.numeric(subset.df$SeedPodCount)
## [1] TRUE
spc.model <- lm(SeedPodCount ~ WaterTreatment, data = subset.df)
#check for normality
qqnorm(spc.model$residuals)
qqline(spc.model$residuals)

hist(spc.model$residuals) #damn. thats good.

shapiro.test(spc.model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  spc.model$residuals
## W = 0.98007, p-value = 0.1265
#residuals are normal
anova(spc.model)
## Analysis of Variance Table
## 
## Response: SeedPodCount
##                Df  Sum Sq Mean Sq F value    Pr(>F)    
## WaterTreatment  2  99.819  49.909  15.879 1.044e-06 ***
## Residuals      99 311.171   3.143                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#significant
TukeyHSD(aov(spc.model))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = spc.model)
## 
## $WaterTreatment
##                               diff        lwr      upr     p adj
## Ambient Water-Low Water  1.8172906  0.7864198 2.848161 0.0001751
## High Water-Low Water     2.2865546  1.2707404 3.302369 0.0000017
## High Water-Ambient Water 0.4692641 -0.5543275 1.492856 0.5219668
#seed pod count was significantly impacted in the low water treatment 
summarise(data_by_water, group_mean = mean(SeedPodCount, na.rm=TRUE))
## # A tibble: 3 Ă— 2
##   WaterTreatment group_mean
##   <fct>               <dbl>
## 1 Low Water            2.97
## 2 Ambient Water        4.79
## 3 High Water           5.26
#seed pod wet mass
spwm.model <- lm(SeedPotWetMass ~ WaterTreatment, data = subset.df)
qqnorm(spwm.model$residuals)
qqline(spwm.model$residuals)

hist(spwm.model$residuals)

shapiro.test(spwm.model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  spwm.model$residuals
## W = 0.91819, p-value = 1.495e-05
#residuals are not normal
#seed pod wet mass log transformation attempt
subset.df$LogSPWM <- log(subset.df$SeedPotWetMass)
#new model using log transformed data
LogSPWM.model <- lm(LogSPWM ~ WaterTreatment, data = subset.df)
#check transformed residuals
qqnorm(LogSPWM.model$residuals)
qqline(LogSPWM.model$residuals)

hist(LogSPWM.model$residuals)

shapiro.test(LogSPWM.model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  LogSPWM.model$residuals
## W = 0.98811, p-value = 0.539
#new residuals are normal. attempt was sucessful.
anova(LogSPWM.model)
## Analysis of Variance Table
## 
## Response: LogSPWM
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## WaterTreatment  2 48.723 24.3615  65.159 < 2.2e-16 ***
## Residuals      94 35.144  0.3739                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#p value= significant 
#post hoc
TukeyHSD(aov(LogSPWM.model))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = LogSPWM.model)
## 
## $WaterTreatment
##                               diff       lwr      upr    p adj
## Ambient Water-Low Water  1.0387305 0.6745226 1.402938 0.00e+00
## High Water-Low Water     1.7379492 1.3737413 2.102157 0.00e+00
## High Water-Ambient Water 0.6992187 0.3407467 1.057691 3.27e-05
#all different
summarise(data_by_water, group_mean = mean(SeedPotWetMass, na.rm=TRUE))
## # A tibble: 3 Ă— 2
##   WaterTreatment group_mean
##   <fct>               <dbl>
## 1 Low Water           0.809
## 2 Ambient Water       2.11 
## 3 High Water          3.76

When looking at the root to shoot ratio, the results from the ANOVA test indicated that the root to shoot ratio was significantly affected by the water treatments (low, ambient, and high) as the p-value, being 6.467e-09, was less than 0.05. This was also the case after conducting the Tukey-Kramer test as the p-values for all root to shoot ratios were less than 0.05, indicating they were significantly different based on the water treatments. The summarization test for the root to shoot ratio indicated that plants shifted towards the roots as the group mean of 0.630 for the low water treatment was higher than the means of the ambient and high-water treatments (0.415 & 0.297). So, plants under low water treatment produced larger roots than shoots. The Kruskal-Wallis test for the leggy-ness showed that leggedness was significantly affected by water treatment as the final p-value was 1.514e-09. For the ANOVA test for the seed pod count, the p-value of 1.044e-06 indicated that the seed pod count was significantly affected by the water treatments. For the Tukey-Kramer test for the seed pod count, the count was only significantly impacted under low water treatment as the p-values for the differences (ambient water – low water & high water – low water) were less than 0.05. After summarizing the seed pod count data, the results indicated that seed pod count significantly decreased in response to the low water conditions as the group mean, which was 17.8, was lower compared to the group means of the ambient and high-water treatment groups (25.8 & 26.9). The ANOVA test for the seed pod wet mass indicated it was significantly impacted by water treatment as the p-value was 2.2e-16. The Tukey-Kramer test also indicated the same as the p-values for all differences were significant. The summarization for the seed pod wet mass data showed that the seed pod wet mass significantly decreased under low water treatment because the group mean, which was 0.809, was less than the group means of the ambient and high-water treatments (2.11 & 3.76).

Plot The Data

In experiments, graphs and plots are almost always used to help visualize the experimental results. The two help show readers visually how the response variables vary based on the treatments provided (explanatory variable) or not provided (positive control). Graphs and plots help the experimenters and readers visually communicate findings and interpret results.

#root2shoot
ggplot(subset.df, aes(x = WaterTreatment, y = Root2ShootRatio, fill = WaterTreatment)) +
  geom_boxplot() + 
  labs(title = "Root to Shoot Ratio by Water Treatment",
       x = NULL, fill = "Water Treatment",
       y = "Root to Shoot Ratio") + geom_jitter(shape=16, position=position_jitter(0.2)) + theme_minimal() +
  scale_fill_brewer(palette="Reds")
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_point()`).

#legginess
ggplot(subset.df, aes(x = WaterTreatment, y = LegRatio, fill = WaterTreatment)) +
  geom_boxplot() + 
  labs(title = "Legginess Ratio by Water Treatment",
       x = NULL, fill = "Water Treatment",
       y = "Leg Raio") + geom_jitter(shape=16, position=position_jitter(0.2)) + theme_minimal() + scale_fill_brewer(palette="Greens")

#seed pod count
ggplot(subset.df, aes(x = WaterTreatment, y = SeedPodCount, fill = WaterTreatment)) +
  geom_boxplot() + 
  labs(title = "Seed Pod Count by Water Treatment",
       x = NULL, fill = "Water Treatment",
       y = "Seed Pod Count") + geom_jitter(shape=16, position=position_jitter(0.2)) + theme_minimal() + scale_fill_brewer(palette="Blues")

#Seed pod wet mass
ggplot(subset.df, aes(x = WaterTreatment, y = SeedPotWetMass, fill = WaterTreatment)) +
  geom_boxplot() + 
  labs(title = "Seed Pod Wet Mass by Water Treatment",
       x = NULL, fill = "Water Treatment",
       y = "Seed Pod Wet Mass (g)") + geom_jitter(shape=16, position=position_jitter(0.2)) + theme_minimal() + scale_fill_brewer(palette="Purples")
## Warning: Removed 5 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).

Conclusion

This experiment sought to examine how soybean plants inoculated with Bradyrhizobium respond to 3 different watering treatments: Low, Ambient, and High. The results showed that plants extended their roots, have fewer trifoliates, and produce fewer seed pods and produce them at smaller masses when water is less available.

These results could be useful for farmers by letting them know to prepare for droughts, since our data shows that soybean yields could suffer when water is scarce.

While the data suggests that soybean plants fare worse in terms producing crops when water is scarce, future studies including a negative control could properly examine the role of Bradyrhizobium in the plants’ response. Additionally, the findings of this experiment could be expanded by using different Bradyrhizobium strains and native legumes fare under different water treatments.

References:

Legumes and pulses. The Nutrition Source. (2024, November 15). https://nutritionsource.hsph.harvard.edu/legumes-pulses/

Plant-Rhizobia Relationship. Crop Science US. (n.d.) https://www.cropscience.bayer.us/articles/bayer/plant-rhizobia-relationship

Pictures By: Meghan Cehlar, La Salle University