Plant Community Recovery in Restored Wetlands

Kinga Stryszowska-Hill

August, 24, 2021

Background

Wetlands are incredibly valuable ecosystems that are under threat. Wetlands provide important ecosystem services such as nutrient and sediment retention, flood protection, groundwater recharge, and recreational opportunities (e.g. wildlife viewing, hunting, hiking). The United States has lost over 50% of its pre-colonization wetlands, mostly through conversion to agriculture.

I am part of a collaborative project researching the recovery of wetlands. Our team of scientists studies the hydrology, water quality, and biological communities of wetlands restored from agricultural uses. We want to understand how well the wetlands are functioning after restoration and, based on our research, we want to make recommendations for future restoration projects.

This is what a restored wetland looks like. This wetland was farmed about 10 years ago. Since then, the wetland has been allowed to flood from the nearby river and is now supporting a natural plant community.

The project below walks through insights generated from collecting plant community data from 14 wetland sites in western Kentucky.

Load libraries

library(dplyr)
library(tidyr) 
library(knitr)  #formatting HTML document
library(forcats) #categorical data
library(reshape2)
library(ggplot2) #graphing
library(rstatix) #shapiro_test
library(here)
library(car)

Load Data

I collected a large data set of plant community characteristics from the study wetlands. In this project, I will be looking at the proportion of species in a wetland in different plant groups (exotic species, woody species, and perennial species).

I have 3 wetland types:

Degraded: these wetlands were destroyed in the past and are remain in a degraded state
Restored: these wetlands were destroyed in the past but have been restored
Reference: these wetlands were never destroyed and remain in their original state

Plants <- read.csv(here::here("Data", "Plants.csv")) |>
  dplyr::select(Site, Type, Age = Age2, Pexotic, Pwoody, Pperennial) |>
  # Reorder site type
  dplyr::mutate(Type = forcats::fct_relevel(Type, "Degraded", "Restored","Reference"))

knitr::kable(Plants)

Site	Type	Age	Pexotic	Pwoody	Pperennial
ALEN	Restored	6	0.13	0.20	0.80
BCYP	Degraded	0	0.13	0.13	0.60
COFY	Restored	13	0.06	0.17	0.67
GDMN	Restored	5	0.10	0.38	0.62
GUTH	Restored	2	0.22	0.05	0.68
HEST	Restored	7	0.10	0.43	0.81
HOPK	Restored	5	0.16	0.12	0.76
HWST	Restored	7	0.08	0.31	0.73
LATW	Reference	NA	0.04	0.50	0.89
MATW	Reference	NA	0.09	0.41	0.94
OBOT	Reference	NA	0.00	0.37	0.89
OWMA	Degraded	0	0.18	0.21	0.65
SARE	Reference	NA	0.12	0.40	0.95
SWAN	Restored	3	0.20	0.03	0.83
SWAS	Restored	3	0.10	0.23	0.71
WINW	Reference	NA	0.04	0.75	0.96

Summary Statistics

plants_summary <- Plants %>% 
  dplyr::group_by(Type) %>%
  dplyr::select(Pexotic:Pperennial) %>%
  dplyr::summarise(across(everything(), 
                          list(mean=mean, sd=sd, min = min, max = max))) %>%
  tidyr::pivot_longer(!Type, names_to = "stat")%>%
  tidyr::separate(stat, into=c("variable", "stat"), sep="_")%>%
  tidyr::spread(stat, value)%>%
  dplyr::select(Type, variable, mean, sd, min, max)

knitr::kable(plants_summary)

Type	variable	mean	sd	min	max
Degraded	Pexotic	0.1550000	0.0353553	0.13	0.18
Degraded	Pperennial	0.6250000	0.0353553	0.60	0.65
Degraded	Pwoody	0.1700000	0.0565685	0.13	0.21
Restored	Pexotic	0.1277778	0.0547215	0.06	0.22
Restored	Pperennial	0.7344444	0.0712585	0.62	0.83
Restored	Pwoody	0.2133333	0.1393736	0.03	0.43
Reference	Pexotic	0.0580000	0.0471169	0.00	0.12
Reference	Pperennial	0.9260000	0.0336155	0.89	0.96
Reference	Pwoody	0.4860000	0.1553383	0.37	0.75

Visualize

melted_proportion <- Plants %>%
  dplyr::select(Type, Pexotic, Pwoody, Pperennial)%>%
  reshape2::melt(id.var = "Type")%>%
  dplyr::mutate(metric=dplyr::recode(variable,
                              Pexotic = "Exotic",
                              Pwoody = "Woody",
                              Pperennial = "Perennial")) %>%
  dplyr::mutate(Type = fct_relevel(Type, "Degraded", "Restored","Reference"))

ggplot2::ggplot(data = melted_proportion, aes(x=metric, y=value))+ 
  ggplot2::geom_boxplot(aes(fill=Type))+
  ggplot2::scale_fill_manual(values=c("#FFF591", "#05DFD7", "#FA26A0"))+ 
  ggplot2::scale_y_continuous(limits = c(0, 1), expand = c(0,0))+
  ggplot2::theme_bw()+
  ggplot2::ylab("Proportion of species in plant groups")+
  ggplot2::xlab("")+
  ggplot2::theme(legend.position="top",
        axis.title.y = element_text(size=12), 
        #axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        panel.background = element_blank(), 
       # panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_blank())

Statistical Tests

ANOVA

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups. For this analysis, we have three groups of wetlands (Degraded, Restored, and Reference). We want to compare the plant community characteristics between the three groups. We will use a one-way anova.

The assumptions of the ANOVA statistical test are

Each group sample is drawn from a normally distributed population
All populations have a common variance
All samples are drawn independently of each other

Normality Tests

We will use the Shapiro-Wilk test for normality of distribution of each

normality <- dplyr::select(Plants, Pexotic, Pwoody, Pperennial)
norm <- lapply(normality, shapiro.test)
norm_table <- sapply(norm, '[', c("statistic", "p.value"))
t(norm_table)

##            statistic p.value  
## Pexotic    0.9791179 0.9562162
## Pwoody     0.9454124 0.4206575
## Pperennial 0.9396001 0.3441444

All 5 plant groups meet normality assumptions

Variance homogeneity test

leveneTest(Pexotic ~ Type, data = Plants)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.1544 0.8585
##       13

leveneTest(Pwoody ~ Type, data = Plants)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.3895 0.6851
##       13

leveneTest(Pperennial ~ Type, data = Plants)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  2.0443 0.1691
##       13

All 5 plant groups meet homogeneity of variance assumption

ANOVA tests

I perform a one-way ANOVA to test for difference in means of groups. I also preformed the post-hoc Tukey’s Honestly Significant Difference test to check which groups were different.

summary(Pexotic.aov<-aov(Pexotic~ Type, data = Plants))

##             Df  Sum Sq  Mean Sq F value Pr(>F)  
## Type         2 0.02041 0.010204   3.892 0.0474 *
## Residuals   13 0.03409 0.002622                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

stats::TukeyHSD(Pexotic.aov)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Pexotic ~ Type, data = Plants)
## 
## $Type
##                           diff        lwr         upr     p adj
## Restored-Degraded  -0.02722222 -0.1329160 0.078471558 0.7789222
## Reference-Degraded -0.09700000 -0.2101197 0.016119731 0.0972405
## Reference-Restored -0.06977778 -0.1451909 0.005635376 0.0711586

summary(Pwoody.aov<-aov(Pwoody~ Type, data = Plants))

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Type         2 0.2736 0.13681   6.971 0.00876 **
## Residuals   13 0.2551 0.01962                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

stats::TukeyHSD(Pwoody.aov)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Pwoody ~ Type, data = Plants)
## 
## $Type
##                          diff          lwr       upr     p adj
## Restored-Degraded  0.04333333 -0.245825522 0.3324922 0.9178473
## Reference-Degraded 0.31600000  0.006525103 0.6254749 0.0452020
## Reference-Restored 0.27266667  0.066350069 0.4789833 0.0103841

summary(Pperennial.aov<-aov(Pperennial~ Type, data = Plants))

##             Df  Sum Sq Mean Sq F value   Pr(>F)    
## Type         2 0.17330 0.08665   24.28 4.07e-05 ***
## Residuals   13 0.04639 0.00357                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

stats::TukeyHSD(Pperennial.aov)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Pperennial ~ Type, data = Plants)
## 
## $Type
##                         diff        lwr       upr     p adj
## Restored-Degraded  0.1094444 -0.0138623 0.2327512 0.0847438
## Reference-Degraded 0.3010000  0.1690298 0.4329702 0.0001185
## Reference-Restored 0.1915556  0.1035754 0.2795357 0.0001852

Visualize Results

Before, we produce a faceted plot with the results of the ANOVA tests annotated, we need to prepare some labels. The labels are lines that will show which groups were significantly different.

line1<-tibble(metric="Perennial", 
              x=c("Degraded", "Degraded", "Reference", "Reference"), 
              y=c(1.1, 1.11, 1.11, 1.1))

line2<-tibble(metric="Perennial", 
              x=c("Restored", "Restored", "Reference", "Reference"), 
              y=c(0.97, 0.98, 0.98, 0.97))

line3<-tibble(metric="Woody", 
              x=c("Degraded", "Degraded", "Reference", "Reference"), 
              y=c(1.1, 1.11, 1.11, 1.1))

line4<-tibble(metric="Woody", 
              x=c("Restored", "Restored", "Reference", "Reference"), 
              y=c(0.97, 0.98, 0.98, 0.97))

# This text annotation will place p-values near the lines
ann_text_proportion <- data.frame(label=c("italic(p) == 0.0001", 
                                          "italic(p) == 0.0002", 
                                          "italic(p) == 0.045", 
                                          "italic(p) == 0.010"),
                             metric = c("Perennial", "Perennial", "Woody", "Woody"), 
                             x = c(2, 2.5, 2, 2.5), 
                             y = c(1.16, 1.02, 1.16, 1.02))

ggplot(data = melted_proportion, aes(x = Type, y = value))+ 
  geom_boxplot(aes(fill = Type))+
  geom_jitter(data = melted_proportion, 
              aes(y = value), 
              size = 2, 
              color = "#5d5c61", 
              alpha = 0.2)+
  facet_grid(~ factor(metric))+
  scale_fill_manual(values = c("#FFF591", "#05DFD7", "#FA26A0"))+
  theme_bw()+
  scale_y_continuous(limits = c(0, 1.2), 
                     expand = c(0,0), 
                     breaks = seq(0, 1, 0.1))+
  ylab("Proportion of species in plant group")+
  xlab("")+
  theme(axis.title.x = element_text(size=12),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_blank(),
        legend.position="top", 
        legend.box = "horizontal",
        strip.background = element_rect(fill = "black"),
        strip.text = element_text(color = "white", size = 11)) +
  geom_line(data = line1, aes(x = x, y = y, group=1))+
  geom_line(data = line2, aes(x = x, y = y, group=1))+
  geom_line(data = line3, aes(x = x, y = y, group=1))+
  geom_line(data = line4, aes(x = x, y = y, group=1))+
  geom_text(data = ann_text_proportion, 
            mapping = aes(x = x, y = y, label = label), 
            parse=TRUE, size=3)

ANOVA Findings

I found significant differences between wetland types for the Perennial and Woody groups. My analysis revealed the the undisturbed reference wetlands are fully developed forests that have more woody species (especially woody vines) and more perennial species. Even though restoration practices for wetlands are important, I found that the wetlands in my study were still young and still different from reference wetlands.

Linear Regression

The Least Squares Regression model is used to find a line of best fit between a predictor and response variable. It is a commonly used model to test the relationship between two variables. In my case I have two variables: wetland age and plant proportion in each group. I would like to test the linear relationship between wetland restoration age and each plant group.

The assumptions of a linear regression model are

The relationship between X and the mean of Y is linear
Homogeneity of variances of residuals (Homoscedasticity)
Observations are independent of each other
The residuals are normally distributed

I don’t have the age of reference wetlands (because it is hard to guesstimate how old these native forests are), so I will eliminate them from the analysis. The degraded wetlands have an age on 0.

PlantsAge <- Plants %>% dplyr::filter(!Type == "Reference")

Build Regression Models

Exotic

summary(Pexotic.model<-lm(Pexotic~Age, data=PlantsAge))

## 
## Call:
## lm(formula = Pexotic ~ Age, data = PlantsAge)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.048189 -0.029843  0.003465  0.020433  0.062362 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.176535   0.019598   9.008 8.48e-06 ***
## Age         -0.009449   0.003357  -2.815   0.0202 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03951 on 9 degrees of freedom
## Multiple R-squared:  0.4682, Adjusted R-squared:  0.4091 
## F-statistic: 7.924 on 1 and 9 DF,  p-value: 0.02021

# Test for normality of residuals
rstatix::shapiro_test(residuals(Pexotic.model))

## # A tibble: 1 × 3
##   variable                 statistic p.value
##   <chr>                        <dbl>   <dbl>
## 1 residuals(Pexotic.model)     0.944   0.564

# Check for outliers
model.metrics <- augment(Pexotic.model)
model.metrics %>% filter(abs(.std.resid)>3) %>% as.data.frame()

## [1] Pexotic    Age        .fitted    .resid     .hat       .sigma     .cooksd   
## [8] .std.resid
## <0 rows> (or 0-length row.names)

# Test for constant variance
car::ncvTest(Pexotic.model)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 1.380061, Df = 1, p = 0.24009

# Plot
ggplot(PlantsAge, aes(x = Age, y = Pexotic))+
  geom_point(aes(fill = Type), shape = 21, size = 3)+
  scale_fill_manual(values = c("#FFF591", "#05DFD7")) +
  geom_line(aes(x = Age, 
                y = Pexotic.model$fitted.values), 
            size = 0.5, 
            linetype = "dotted")+
  theme_classic(base_size = 12)+
  theme(legend.position = c(0.85, 0.85))+
  ylab("Proportion of exotic species")+
  xlab("Age")+
  annotate("text", label = paste0("R-squared = ",
                                  round(summary(Pexotic.model)$adj.r.squared, 2)), 
           x=1, y=0.25, size=4)

Woody

summary(Pwoody.model<-lm(Pwoody~Age, data=PlantsAge))

## 
## Call:
## lm(formula = Pwoody ~ Age, data = PlantsAge)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15784 -0.10744 -0.02014  0.06678  0.19909 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.15553    0.06310   2.465   0.0359 *
## Age          0.01077    0.01081   0.996   0.3451  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1272 on 9 degrees of freedom
## Multiple R-squared:  0.09936,    Adjusted R-squared:  -0.0007134 
## F-statistic: 0.9929 on 1 and 9 DF,  p-value: 0.3451

rstatix::shapiro_test(residuals(Pwoody.model))

## # A tibble: 1 × 3
##   variable                statistic p.value
##   <chr>                       <dbl>   <dbl>
## 1 residuals(Pwoody.model)     0.941   0.532

model.metrics<-augment(Pwoody.model)
model.metrics%>%filter(abs(.std.resid)>3)%>%as.data.frame()

## [1] Pwoody     Age        .fitted    .resid     .hat       .sigma     .cooksd   
## [8] .std.resid
## <0 rows> (or 0-length row.names)

ncvTest(Pwoody.model)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.4712205, Df = 1, p = 0.49243

ggplot(PlantsAge, aes(x=Age, y=Pwoody))+
  geom_point(aes(fill=Type), shape=21, size=4)+
  scale_fill_manual(values = c("#FFF591", "#05DFD7")) +
  geom_line(aes(x = Age, 
                y = Pwoody.model$fitted.values), 
            size = 0.5, 
            linetype = "dotted")+
  theme_classic(base_size = 12)+
  ylab("Proportion of woody species")+
  xlab("Age")+
   theme(legend.position = c(0.85, 0.85))+
  annotate("text", label = paste0("R-squared = ",
                                  round(summary(Pwoody.model)$adj.r.squared, 2)), 
           x=2, y=0.6, size=4)

Perennial

summary(Pperennial.model<-lm(Pperennial~Age, data=PlantsAge))

## 
## Call:
## lm(formula = Pperennial ~ Age, data = PlantsAge)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.096457 -0.064341  0.003031  0.060915  0.124055 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.690177   0.039699  17.385 3.11e-08 ***
## Age         0.005256   0.006799   0.773    0.459    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08003 on 9 degrees of freedom
## Multiple R-squared:  0.06226,    Adjusted R-squared:  -0.04193 
## F-statistic: 0.5975 on 1 and 9 DF,  p-value: 0.4593

rstatix::shapiro_test(residuals(Pperennial.model))

## # A tibble: 1 × 3
##   variable                    statistic p.value
##   <chr>                           <dbl>   <dbl>
## 1 residuals(Pperennial.model)     0.936   0.479

model.metrics<-augment(Pperennial.model)
model.metrics%>%filter(abs(.std.resid)>3)%>%as.data.frame()

## [1] Pperennial Age        .fitted    .resid     .hat       .sigma     .cooksd   
## [8] .std.resid
## <0 rows> (or 0-length row.names)

ncvTest(Pperennial.model)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.06792006, Df = 1, p = 0.79439

ggplot(PlantsAge, aes(x=Age, y=Pperennial))+
  geom_point(aes(fill=Type), shape=21, size=4)+
  scale_fill_manual(values = c("#FFF591", "#05DFD7")) +
  geom_line(aes(x = Age, 
                y = Pperennial.model$fitted.values), 
            size = 0.5, 
            linetype = "dotted")+
  theme_classic(base_size = 12)+
  ylab("Proportion of perennial species")+
  xlab("Age")+
  theme(legend.position = c(0.85, 0.85))+
  annotate("text", 
           label = paste0("R-squared = ", 
                          round(summary(Pperennial.model)$adj.r.squared, 2)), 
           x=2, y=0.85, size=4)

Visualize Results

ann_text_regressions_p <- data.frame(label = c("italic(p) == 0.24", 
                                               "italic(p) == 0.79", 
                                               "italic(p) == 0.49"), 
                                   Proportion = c("Exotic", 
                                                  "Perennial", 
                                                  "Woody"), 
                                   x=c(7, 7, 7), 
                                   y=c(1.04, 1.04, 1.04))


melt_age <- PlantsAge %>%
  select(Age, Pexotic, Pwoody, Pperennial) %>%
  pivot_longer(!Age, names_to = "Proportion", values_to = "stat") |>
  dplyr::mutate(Proportion = dplyr::recode(Proportion,
                                Pexotic = "Exotic",
                                Pwoody = "Woody",
                                Pperennial = "Perennial")) 


ggplot(data = melt_age, aes(x=Age, y=stat)) + 
  geom_point(size=2)+
  facet_grid(~ factor(Proportion)) +
  geom_smooth(colour = "#A555EC", 
              method = lm, 
              level = 0.60, 
              size = 0.5, 
              fill = "#9DF1DF") +
  theme_bw()+
  scale_y_continuous(limits = c(0, 1.1), 
                     expand = c(0,0), 
                     breaks = seq(0, 1, 0.1)) +
  scale_x_continuous(limits = c(0, 13), breaks=c(0,5,10,13))+
   ylab("Proportion of species in plant group") +
  xlab("Age")+
  theme(axis.title.x = element_text(size=12), 
        axis.title.y = element_text(size=12), 
        panel.background = element_blank(), 
        panel.grid.major = element_blank(),  
        panel.grid.minor = element_blank(),  
        plot.background = element_blank(),
        strip.background = element_rect(fill = "black"),
        strip.text = element_text(color = "white", size = 11)) +
  geom_text(data = ann_text_regressions_p, 
            mapping = aes(x = x, y = y, label = label), 
            parse = TRUE, 
            size = 4)

Regression Findings

The relationship between wetland restoration age and plant community is inconclusive. I hypothesized that perennial and woody proportions would increase with wetland age as the forests matured, but the observed relationships were weak. The wetland in my study were all still quite young. I think this relationship should be investigated again as the wetlands continue to age.

Conclusions

Even though restored wetlands are no longer farmed and they support a native plant community, they are still slow to recover to a pre-disturbance state. It will likely take decades for these young forests to reach maturity. Furthermore, these wetlands exist in a highly fragmented and agricultural landscape, so full recovery might be unlikely. We recommend long term monitoring of the plant community in restored wetland and also intentional planting of understory species to accelerate recovery. Wetlands, regardless of condition, might still serve important functions in the landscape and restoration programs should continue their worthwhile initiatives.