Plant Community Recovery in Restored Wetlands

Kinga Stryszowska-Hill

August, 24, 2021

Background

Wetlands are incredibly valuable ecosystems that are under threat. Wetlands provide important ecosystem services such as nutrient and sediment retention, flood protection, groundwater recharge, and recreational opportunities (e.g. wildlife viewing, hunting, hiking). The United States has lost over 50% of its pre-colonization wetlands, mostly through conversion to agriculture.

I am part of a collaborative project researching the recovery of wetlands. Our team of scientists studies the hydrology, water quality, and biological communities of wetlands restored from agricultural uses. We want to understand how well the wetlands are functioning after restoration and, based on our research, we want to make recommendations for future restoration projects.

This is what a restored wetland looks like. This wetland was farmed about 10 years ago. Since then, the wetland has been allowed to flood from the nearby river and is now supporting a natural plant community.

The project below walks through insights generated from collecting plant community data from 14 wetland sites in western Kentucky.

Load libraries

library(dplyr)
library(tidyr) 
library(knitr)  #formatting HTML document
library(forcats) #categorical data
library(reshape2)
library(ggplot2) #graphing
library(rstatix) #shapiro_test
library(here)
library(car)

Load Data

I collected a large data set of plant community characteristics from the study wetlands. In this project, I will be looking at the proportion of species in a wetland in different plant groups (exotic species, woody species, and perennial species).

I have 3 wetland types:

  • Degraded: these wetlands were destroyed in the past and are remain in a degraded state
  • Restored: these wetlands were destroyed in the past but have been restored
  • Reference: these wetlands were never destroyed and remain in their original state
Plants <- read.csv(here::here("Data", "Plants.csv")) |>
  dplyr::select(Site, Type, Age = Age2, Pexotic, Pwoody, Pperennial) |>
  # Reorder site type
  dplyr::mutate(Type = forcats::fct_relevel(Type, "Degraded", "Restored","Reference"))

knitr::kable(Plants)
Site Type Age Pexotic Pwoody Pperennial
ALEN Restored 6 0.13 0.20 0.80
BCYP Degraded 0 0.13 0.13 0.60
COFY Restored 13 0.06 0.17 0.67
GDMN Restored 5 0.10 0.38 0.62
GUTH Restored 2 0.22 0.05 0.68
HEST Restored 7 0.10 0.43 0.81
HOPK Restored 5 0.16 0.12 0.76
HWST Restored 7 0.08 0.31 0.73
LATW Reference NA 0.04 0.50 0.89
MATW Reference NA 0.09 0.41 0.94
OBOT Reference NA 0.00 0.37 0.89
OWMA Degraded 0 0.18 0.21 0.65
SARE Reference NA 0.12 0.40 0.95
SWAN Restored 3 0.20 0.03 0.83
SWAS Restored 3 0.10 0.23 0.71
WINW Reference NA 0.04 0.75 0.96

Summary Statistics

plants_summary <- Plants %>% 
  dplyr::group_by(Type) %>%
  dplyr::select(Pexotic:Pperennial) %>%
  dplyr::summarise(across(everything(), 
                          list(mean=mean, sd=sd, min = min, max = max))) %>%
  tidyr::pivot_longer(!Type, names_to = "stat")%>%
  tidyr::separate(stat, into=c("variable", "stat"), sep="_")%>%
  tidyr::spread(stat, value)%>%
  dplyr::select(Type, variable, mean, sd, min, max)

knitr::kable(plants_summary)
Type variable mean sd min max
Degraded Pexotic 0.1550000 0.0353553 0.13 0.18
Degraded Pperennial 0.6250000 0.0353553 0.60 0.65
Degraded Pwoody 0.1700000 0.0565685 0.13 0.21
Restored Pexotic 0.1277778 0.0547215 0.06 0.22
Restored Pperennial 0.7344444 0.0712585 0.62 0.83
Restored Pwoody 0.2133333 0.1393736 0.03 0.43
Reference Pexotic 0.0580000 0.0471169 0.00 0.12
Reference Pperennial 0.9260000 0.0336155 0.89 0.96
Reference Pwoody 0.4860000 0.1553383 0.37 0.75

Visualize

melted_proportion <- Plants %>%
  dplyr::select(Type, Pexotic, Pwoody, Pperennial)%>%
  reshape2::melt(id.var = "Type")%>%
  dplyr::mutate(metric=dplyr::recode(variable,
                              Pexotic = "Exotic",
                              Pwoody = "Woody",
                              Pperennial = "Perennial")) %>%
  dplyr::mutate(Type = fct_relevel(Type, "Degraded", "Restored","Reference"))
ggplot2::ggplot(data = melted_proportion, aes(x=metric, y=value))+ 
  ggplot2::geom_boxplot(aes(fill=Type))+
  ggplot2::scale_fill_manual(values=c("#FFF591", "#05DFD7", "#FA26A0"))+ 
  ggplot2::scale_y_continuous(limits = c(0, 1), expand = c(0,0))+
  ggplot2::theme_bw()+
  ggplot2::ylab("Proportion of species in plant groups")+
  ggplot2::xlab("")+
  ggplot2::theme(legend.position="top",
        axis.title.y = element_text(size=12), 
        #axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        panel.background = element_blank(), 
       # panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_blank())

Statistical Tests

ANOVA

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups. For this analysis, we have three groups of wetlands (Degraded, Restored, and Reference). We want to compare the plant community characteristics between the three groups. We will use a one-way anova.

The assumptions of the ANOVA statistical test are

  • Each group sample is drawn from a normally distributed population
  • All populations have a common variance
  • All samples are drawn independently of each other

Normality Tests

We will use the Shapiro-Wilk test for normality of distribution of each

normality <- dplyr::select(Plants, Pexotic, Pwoody, Pperennial)
norm <- lapply(normality, shapiro.test)
norm_table <- sapply(norm, '[', c("statistic", "p.value"))
t(norm_table)
##            statistic p.value  
## Pexotic    0.9791179 0.9562162
## Pwoody     0.9454124 0.4206575
## Pperennial 0.9396001 0.3441444

All 5 plant groups meet normality assumptions

Variance homogeneity test

leveneTest(Pexotic ~ Type, data = Plants)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.1544 0.8585
##       13
leveneTest(Pwoody ~ Type, data = Plants)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.3895 0.6851
##       13
leveneTest(Pperennial ~ Type, data = Plants)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  2.0443 0.1691
##       13

All 5 plant groups meet homogeneity of variance assumption

ANOVA tests

I perform a one-way ANOVA to test for difference in means of groups. I also preformed the post-hoc Tukey’s Honestly Significant Difference test to check which groups were different.

summary(Pexotic.aov<-aov(Pexotic~ Type, data = Plants))
##             Df  Sum Sq  Mean Sq F value Pr(>F)  
## Type         2 0.02041 0.010204   3.892 0.0474 *
## Residuals   13 0.03409 0.002622                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stats::TukeyHSD(Pexotic.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Pexotic ~ Type, data = Plants)
## 
## $Type
##                           diff        lwr         upr     p adj
## Restored-Degraded  -0.02722222 -0.1329160 0.078471558 0.7789222
## Reference-Degraded -0.09700000 -0.2101197 0.016119731 0.0972405
## Reference-Restored -0.06977778 -0.1451909 0.005635376 0.0711586
summary(Pwoody.aov<-aov(Pwoody~ Type, data = Plants))
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Type         2 0.2736 0.13681   6.971 0.00876 **
## Residuals   13 0.2551 0.01962                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stats::TukeyHSD(Pwoody.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Pwoody ~ Type, data = Plants)
## 
## $Type
##                          diff          lwr       upr     p adj
## Restored-Degraded  0.04333333 -0.245825522 0.3324922 0.9178473
## Reference-Degraded 0.31600000  0.006525103 0.6254749 0.0452020
## Reference-Restored 0.27266667  0.066350069 0.4789833 0.0103841
summary(Pperennial.aov<-aov(Pperennial~ Type, data = Plants))
##             Df  Sum Sq Mean Sq F value   Pr(>F)    
## Type         2 0.17330 0.08665   24.28 4.07e-05 ***
## Residuals   13 0.04639 0.00357                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stats::TukeyHSD(Pperennial.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Pperennial ~ Type, data = Plants)
## 
## $Type
##                         diff        lwr       upr     p adj
## Restored-Degraded  0.1094444 -0.0138623 0.2327512 0.0847438
## Reference-Degraded 0.3010000  0.1690298 0.4329702 0.0001185
## Reference-Restored 0.1915556  0.1035754 0.2795357 0.0001852

Visualize Results

Before, we produce a faceted plot with the results of the ANOVA tests annotated, we need to prepare some labels. The labels are lines that will show which groups were significantly different.

line1<-tibble(metric="Perennial", 
              x=c("Degraded", "Degraded", "Reference", "Reference"), 
              y=c(1.1, 1.11, 1.11, 1.1))

line2<-tibble(metric="Perennial", 
              x=c("Restored", "Restored", "Reference", "Reference"), 
              y=c(0.97, 0.98, 0.98, 0.97))

line3<-tibble(metric="Woody", 
              x=c("Degraded", "Degraded", "Reference", "Reference"), 
              y=c(1.1, 1.11, 1.11, 1.1))

line4<-tibble(metric="Woody", 
              x=c("Restored", "Restored", "Reference", "Reference"), 
              y=c(0.97, 0.98, 0.98, 0.97))

# This text annotation will place p-values near the lines
ann_text_proportion <- data.frame(label=c("italic(p) == 0.0001", 
                                          "italic(p) == 0.0002", 
                                          "italic(p) == 0.045", 
                                          "italic(p) == 0.010"),
                             metric = c("Perennial", "Perennial", "Woody", "Woody"), 
                             x = c(2, 2.5, 2, 2.5), 
                             y = c(1.16, 1.02, 1.16, 1.02))
ggplot(data = melted_proportion, aes(x = Type, y = value))+ 
  geom_boxplot(aes(fill = Type))+
  geom_jitter(data = melted_proportion, 
              aes(y = value), 
              size = 2, 
              color = "#5d5c61", 
              alpha = 0.2)+
  facet_grid(~ factor(metric))+
  scale_fill_manual(values = c("#FFF591", "#05DFD7", "#FA26A0"))+
  theme_bw()+
  scale_y_continuous(limits = c(0, 1.2), 
                     expand = c(0,0), 
                     breaks = seq(0, 1, 0.1))+
  ylab("Proportion of species in plant group")+
  xlab("")+
  theme(axis.title.x = element_text(size=12),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_blank(),
        legend.position="top", 
        legend.box = "horizontal",
        strip.background = element_rect(fill = "black"),
        strip.text = element_text(color = "white", size = 11)) +
  geom_line(data = line1, aes(x = x, y = y, group=1))+
  geom_line(data = line2, aes(x = x, y = y, group=1))+
  geom_line(data = line3, aes(x = x, y = y, group=1))+
  geom_line(data = line4, aes(x = x, y = y, group=1))+
  geom_text(data = ann_text_proportion, 
            mapping = aes(x = x, y = y, label = label), 
            parse=TRUE, size=3)

ANOVA Findings

I found significant differences between wetland types for the Perennial and Woody groups. My analysis revealed the the undisturbed reference wetlands are fully developed forests that have more woody species (especially woody vines) and more perennial species. Even though restoration practices for wetlands are important, I found that the wetlands in my study were still young and still different from reference wetlands.

Linear Regression

The Least Squares Regression model is used to find a line of best fit between a predictor and response variable. It is a commonly used model to test the relationship between two variables. In my case I have two variables: wetland age and plant proportion in each group. I would like to test the linear relationship between wetland restoration age and each plant group.

The assumptions of a linear regression model are

  • The relationship between X and the mean of Y is linear
  • Homogeneity of variances of residuals (Homoscedasticity)
  • Observations are independent of each other
  • The residuals are normally distributed

I don’t have the age of reference wetlands (because it is hard to guesstimate how old these native forests are), so I will eliminate them from the analysis. The degraded wetlands have an age on 0.

PlantsAge <- Plants %>% dplyr::filter(!Type == "Reference")

Build Regression Models

Exotic

summary(Pexotic.model<-lm(Pexotic~Age, data=PlantsAge))
## 
## Call:
## lm(formula = Pexotic ~ Age, data = PlantsAge)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.048189 -0.029843  0.003465  0.020433  0.062362 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.176535   0.019598   9.008 8.48e-06 ***
## Age         -0.009449   0.003357  -2.815   0.0202 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03951 on 9 degrees of freedom
## Multiple R-squared:  0.4682, Adjusted R-squared:  0.4091 
## F-statistic: 7.924 on 1 and 9 DF,  p-value: 0.02021
# Test for normality of residuals
rstatix::shapiro_test(residuals(Pexotic.model))
## # A tibble: 1 × 3
##   variable                 statistic p.value
##   <chr>                        <dbl>   <dbl>
## 1 residuals(Pexotic.model)     0.944   0.564
# Check for outliers
model.metrics <- augment(Pexotic.model)
model.metrics %>% filter(abs(.std.resid)>3) %>% as.data.frame()
## [1] Pexotic    Age        .fitted    .resid     .hat       .sigma     .cooksd   
## [8] .std.resid
## <0 rows> (or 0-length row.names)
# Test for constant variance
car::ncvTest(Pexotic.model)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 1.380061, Df = 1, p = 0.24009
# Plot
ggplot(PlantsAge, aes(x = Age, y = Pexotic))+
  geom_point(aes(fill = Type), shape = 21, size = 3)+
  scale_fill_manual(values = c("#FFF591", "#05DFD7")) +
  geom_line(aes(x = Age, 
                y = Pexotic.model$fitted.values), 
            size = 0.5, 
            linetype = "dotted")+
  theme_classic(base_size = 12)+
  theme(legend.position = c(0.85, 0.85))+
  ylab("Proportion of exotic species")+
  xlab("Age")+
  annotate("text", label = paste0("R-squared = ",
                                  round(summary(Pexotic.model)$adj.r.squared, 2)), 
           x=1, y=0.25, size=4)

Woody

summary(Pwoody.model<-lm(Pwoody~Age, data=PlantsAge))
## 
## Call:
## lm(formula = Pwoody ~ Age, data = PlantsAge)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15784 -0.10744 -0.02014  0.06678  0.19909 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.15553    0.06310   2.465   0.0359 *
## Age          0.01077    0.01081   0.996   0.3451  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1272 on 9 degrees of freedom
## Multiple R-squared:  0.09936,    Adjusted R-squared:  -0.0007134 
## F-statistic: 0.9929 on 1 and 9 DF,  p-value: 0.3451
rstatix::shapiro_test(residuals(Pwoody.model))
## # A tibble: 1 × 3
##   variable                statistic p.value
##   <chr>                       <dbl>   <dbl>
## 1 residuals(Pwoody.model)     0.941   0.532
model.metrics<-augment(Pwoody.model)
model.metrics%>%filter(abs(.std.resid)>3)%>%as.data.frame()
## [1] Pwoody     Age        .fitted    .resid     .hat       .sigma     .cooksd   
## [8] .std.resid
## <0 rows> (or 0-length row.names)
ncvTest(Pwoody.model)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.4712205, Df = 1, p = 0.49243
ggplot(PlantsAge, aes(x=Age, y=Pwoody))+
  geom_point(aes(fill=Type), shape=21, size=4)+
  scale_fill_manual(values = c("#FFF591", "#05DFD7")) +
  geom_line(aes(x = Age, 
                y = Pwoody.model$fitted.values), 
            size = 0.5, 
            linetype = "dotted")+
  theme_classic(base_size = 12)+
  ylab("Proportion of woody species")+
  xlab("Age")+
   theme(legend.position = c(0.85, 0.85))+
  annotate("text", label = paste0("R-squared = ",
                                  round(summary(Pwoody.model)$adj.r.squared, 2)), 
           x=2, y=0.6, size=4)

Perennial

summary(Pperennial.model<-lm(Pperennial~Age, data=PlantsAge))
## 
## Call:
## lm(formula = Pperennial ~ Age, data = PlantsAge)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.096457 -0.064341  0.003031  0.060915  0.124055 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.690177   0.039699  17.385 3.11e-08 ***
## Age         0.005256   0.006799   0.773    0.459    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08003 on 9 degrees of freedom
## Multiple R-squared:  0.06226,    Adjusted R-squared:  -0.04193 
## F-statistic: 0.5975 on 1 and 9 DF,  p-value: 0.4593
rstatix::shapiro_test(residuals(Pperennial.model))
## # A tibble: 1 × 3
##   variable                    statistic p.value
##   <chr>                           <dbl>   <dbl>
## 1 residuals(Pperennial.model)     0.936   0.479
model.metrics<-augment(Pperennial.model)
model.metrics%>%filter(abs(.std.resid)>3)%>%as.data.frame()
## [1] Pperennial Age        .fitted    .resid     .hat       .sigma     .cooksd   
## [8] .std.resid
## <0 rows> (or 0-length row.names)
ncvTest(Pperennial.model)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.06792006, Df = 1, p = 0.79439
ggplot(PlantsAge, aes(x=Age, y=Pperennial))+
  geom_point(aes(fill=Type), shape=21, size=4)+
  scale_fill_manual(values = c("#FFF591", "#05DFD7")) +
  geom_line(aes(x = Age, 
                y = Pperennial.model$fitted.values), 
            size = 0.5, 
            linetype = "dotted")+
  theme_classic(base_size = 12)+
  ylab("Proportion of perennial species")+
  xlab("Age")+
  theme(legend.position = c(0.85, 0.85))+
  annotate("text", 
           label = paste0("R-squared = ", 
                          round(summary(Pperennial.model)$adj.r.squared, 2)), 
           x=2, y=0.85, size=4)

Visualize Results

ann_text_regressions_p <- data.frame(label = c("italic(p) == 0.24", 
                                               "italic(p) == 0.79", 
                                               "italic(p) == 0.49"), 
                                   Proportion = c("Exotic", 
                                                  "Perennial", 
                                                  "Woody"), 
                                   x=c(7, 7, 7), 
                                   y=c(1.04, 1.04, 1.04))


melt_age <- PlantsAge %>%
  select(Age, Pexotic, Pwoody, Pperennial) %>%
  pivot_longer(!Age, names_to = "Proportion", values_to = "stat") |>
  dplyr::mutate(Proportion = dplyr::recode(Proportion,
                                Pexotic = "Exotic",
                                Pwoody = "Woody",
                                Pperennial = "Perennial")) 


ggplot(data = melt_age, aes(x=Age, y=stat)) + 
  geom_point(size=2)+
  facet_grid(~ factor(Proportion)) +
  geom_smooth(colour = "#A555EC", 
              method = lm, 
              level = 0.60, 
              size = 0.5, 
              fill = "#9DF1DF") +
  theme_bw()+
  scale_y_continuous(limits = c(0, 1.1), 
                     expand = c(0,0), 
                     breaks = seq(0, 1, 0.1)) +
  scale_x_continuous(limits = c(0, 13), breaks=c(0,5,10,13))+
   ylab("Proportion of species in plant group") +
  xlab("Age")+
  theme(axis.title.x = element_text(size=12), 
        axis.title.y = element_text(size=12), 
        panel.background = element_blank(), 
        panel.grid.major = element_blank(),  
        panel.grid.minor = element_blank(),  
        plot.background = element_blank(),
        strip.background = element_rect(fill = "black"),
        strip.text = element_text(color = "white", size = 11)) +
  geom_text(data = ann_text_regressions_p, 
            mapping = aes(x = x, y = y, label = label), 
            parse = TRUE, 
            size = 4)

Regression Findings

The relationship between wetland restoration age and plant community is inconclusive. I hypothesized that perennial and woody proportions would increase with wetland age as the forests matured, but the observed relationships were weak. The wetland in my study were all still quite young. I think this relationship should be investigated again as the wetlands continue to age.

Conclusions

Even though restored wetlands are no longer farmed and they support a native plant community, they are still slow to recover to a pre-disturbance state. It will likely take decades for these young forests to reach maturity. Furthermore, these wetlands exist in a highly fragmented and agricultural landscape, so full recovery might be unlikely. We recommend long term monitoring of the plant community in restored wetland and also intentional planting of understory species to accelerate recovery. Wetlands, regardless of condition, might still serve important functions in the landscape and restoration programs should continue their worthwhile initiatives.