Presence of Phragmites australis in the 2011 NWCA

In this example we will look at Phragmites australis, a widepread invasive species of grass found in the United States. We will use multiple data sets from the 2011 NWCA data to determine:
* Which NWCA plots P. australis was found in
* What is the native status of P. australis?
* What was the % cover of P. australis on the plots where it was found?
* Whether the presence of P. australis affects the presence of other native graminoid species
* Whether the presence of P. australis is related to site quality.
All of the data used in this tutorial can be found at: https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys.

Retrieving the Data

There are multiple ways to bring data from the NWCA surveys into R. In this example we will bring in the data sets from our working directory, but it is also very easy to take the data directly from the NWCA data page. We’ll be using the following six files in this example:
1) Plant Species Cover and Height: Plot level data on individual species cover and height
2) Plant CC and Native Status Values: Species level data on the Coefficient of Conservatism and Native status of every observed species:state pair
3) Vegetation Metrics: Site level data on calculated metrics including average absolute cover of different taxon groups
4) Vegetation MMI Values: Site level data on metrics used in the VMMI including FQAI
5) Site Stressors: Site level data on stressors such as vegetation removal and ditching
6) Site Information: Site level data on site specific information such as location and wetland type
All of these datasets have unique information that we can use to look at the presence of P. australis across sites.

#load required packages
library ("RCurl")

## Warning: package 'RCurl' was built under R version 3.4.4

## Loading required package: bitops

library("tidyr")

## Warning: package 'tidyr' was built under R version 3.4.4

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:RCurl':
## 
##     complete

library("plyr")

## Warning: package 'plyr' was built under R version 3.4.4

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("data.table")

## Warning: package 'data.table' was built under R version 3.4.4

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

library("stringr")

## Warning: package 'stringr' was built under R version 3.4.4

library("ggplot2")

## Warning: package 'ggplot2' was built under R version 3.4.4

library("ggiraph")

## Warning: package 'ggiraph' was built under R version 3.4.4

library("ggpmisc")

## Warning: package 'ggpmisc' was built under R version 3.4.4

library("ggpubr")

## Warning: package 'ggpubr' was built under R version 3.4.4

## Loading required package: magrittr

## Warning: package 'magrittr' was built under R version 3.4.4

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:tidyr':
## 
##     extract

library("magrittr")
# Bring in the Plant Cover data
#plant.cover.csv <- getURL("https://www.epa.gov/sites/production/files/2016-10/nwca2011_plant_pres_cvr.csv")
#plant.cover <- read.csv(text = plant.cover.csv)

Where is Phragmites australis present on NWCA plots?

We’ll use the Plant Species Cover and Height dataset to determine which 2011 NWCA plots P. australis was found in. This dataset includes information on UID, PLOT, SPECIES and COVER (as percent) in separate columns. It is important to note that if a species is found in at least one plot at a site, it will appear in all five of the sites plots in the dataset, but will have a recorded cover and height of “0” in plots where it was not present. By filtering the data for SPECIES == “PHRAGMITES AUSTRALIS” and COVER > 0, we create a data set that only lists plots where P. australis was present.

plant.cover <- read.csv("nwca2011_plant_pres_cvr.csv")
plant.cover

# Select data where cover is > 0
plant.cover.nonzero <- filter(plant.cover, COVER > 0)

## Warning: package 'bindrcpp' was built under R version 3.4.4

# Select PHRAGMITES AUSTRALIS 
cover.paustralis <- filter(plant.cover.nonzero, SPECIES == "PHRAGMITES AUSTRALIS")
#Create a table that lists sites and plots where P. australis is present
head(cover.paustralis)

To quickly summarize simple aspects of the data we can use the summary() function. This will allow us to see:
1) The number of plots in each state that P. australis was present

2) The Min. Med. Mean and Max. cover of P. australis across our data set
3) The height classes recorded for P. australis.

# Factor; summarize the number of plots per state where P australis is present
summary(cover.paustralis$STATE)

## AL AR AZ CA CO CT DE FL GA IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC 
##  6  0  0  1  0  7 45  0  0  0  0  0  0  7  0 69 10 81  0  0  7  0  0  0  4 
## ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY 
##  5  1  1 11 11  0 18  0  0  0  4  7  3  2  0 28  0 25  0  0  4  0  0

# Integer; summarize cover of P australis across all plots where present
summary(cover.paustralis$COVER) # integer

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.10    3.00   50.00   47.19   91.00  100.00

#Factor; summarize height classes of P australis where present (height classes are 1, 2, 3, 4, 5, 6, E)
summary(cover.paustralis$HEIGHT) # factor

##       1   2   3   4   5   6   E 
##   3  15 168 171   0   0   0   0

Our summaries show:
1) There are 23 states that had plots where P. australis was recorded
2) The Minimum cover of P. australis on a plot was 0.10, the Maxiumum was 100, the Median was 50, and the Mean was 47.19
3) P. australis was recorded in height classes 1, 2 and 3, with a majority of the plants in categories 2 and 3.

Another way to take a look at our data set is to create a boxplot to look at % cover on plots by state:

In the 23 states where P. australis was found on NWCA plots, what was its % cover on those plots? This will help us see the states where it is most prevalent. (CT, DE, LA, MA, MD, MN, NC, NY, PA, RI, TX, VA, WI)

# plot STATE vs COVER
p1 <- ggboxplot(cover.paustralis, x = "STATE", y = "COVER", 
          color = "STATE",
          legend = "none",
          add = "mean",
          ylab = "% COVER", xlab = "STATE", title = "% Cover of P. australis Per Plot (where cover > 0)", xtickslab.rt = 45, ytickslab.rt = 45)
p1

What is the native status of P. australis?

The dataset Plant CC & Native Status has information on the native status of every state:taxon pair. State taxon pairs represent the combination of every species that was found in every state. The CC of a species and Native Status of a species will often differ between states. We can use this dataset to determine whether P. australis is considered an invasive species in every state that it was found in during the NWCA survey:

#Bring in the Native Status data set
#natstat.csv <- getURL("https://www.epa.gov/sites/production/files/2016-10/nwca2011_planttaxa_cc_natstat.csv")
#nat.stat <- read.csv(text = natstat.csv)
nat.stat <- read.csv("nwca2011_planttaxa_cc_natstat.csv")

#create a summary of native status
nat.stat.paustralis <- filter(nat.stat, USDA_NAME == "PHRAGMITES AUSTRALIS")
summary(nat.stat.paustralis$NWCA_NATSTAT)

##       ADV CRYP INTR  NAT  UND 
##    0    0    0   23    0    0

Our summary shows that of the 23 states in which P. australis was recorded, it is considered an introduced species in all of them. This is helpful to know going forward as we look at non-native stressors and the cover of native gramanoid species at sites where P. australis was recorded.

What is the average percent cover of P. australis on sites where it’s present?

We’ll look at average site cover (instead of plot cover as we did in the previous examples) in order to integrate other data features, such as stressors and site quality which are only available at the site level.

First we will create a table of the average cover of P. australis at each site. To do this we filter our plot.cover data set for SPECIES == PHRAGMITES AUSTRALIS, and then calculate the average of the five plot cover values.

# Filter plot.cover to only include P. australis
plot.cover <- filter(plant.cover, SPECIES == "PHRAGMITES AUSTRALIS")
# merge UID and PLOT columns to create a unique identifier for every plot
plot.cover <- unite(plot.cover, "uidplot", c("UID", "PLOT"), sep = "")
# select the columns we need (uidPlot, cover, species)
plot.cover <- plot.cover[,c(2, 11, 14, 17)]
plot.cover$observation <- 1:nrow(plot.cover)
# Spread the data 
plot.cover <- spread(plot.cover, SPECIES, COVER)
# convert data.frame to data.table
plot.cover <- data.table(plot.cover)
# replace NA with blanks
plot.cover[is.na(plot.cover)] <- ""
# Collapse data by uidplot
plot.cover <- plot.cover[, lapply(.SD, paste0, collapse=""), by=uidplot]
# remove column 2 and 3
plot.cover <- plot.cover[,-c(2,3)]
# replace blanks with 0
plot.cover[plot.cover == ""] <- 0
plot.cover$uidplot <- str_sub(plot.cover$uidplot, 1, str_length(plot.cover$uidplot)-1)
# convert columns to numeric
plot.cover <- plot.cover[, lapply(.SD, as.numeric), by=`uidplot`]
plot.cover <- plot.cover[, lapply(.SD, mean), by=uidplot]
#Rename columns
names(plot.cover)[1]<-"UID"
names(plot.cover)[2]<-"PHRAGMITES_AUSTRALIS"
plot.cover

We can then add additional information into this table, such as STATE, ECOREGION, HGM Class and Wetland Class…. from the Site Information data set. We can combine the site.info data set with our plot.cover dataset by merging a key column, in this case UID. A key column is a column shared by both data sets that you’re trying to combine. For the vegetation data UID and SPECIES are common key columns you’ll use.

# Add Class, HGM, Eco_4, and State into the dataset.
#site.info.csv <- getURL("https://www.epa.gov/sites/production/files/2016-10/nwca2011_siteinfo.csv")
#site.info <- read.csv(text = site.info.csv)
site.info <- read.csv("nwca2011_siteinfo.csv")

site.info <- site.info[, c(2, 12, 13, 34, 52)]
plot.cover[, `UID`:=as.numeric(`UID`)]
plot.cover <- merge(plot.cover, site.info, by = "UID")
plot.cover <- distinct(plot.cover)

site.info

plot.cover

How does the cover of P. australis vary between NWCA agreggated ecoregions?

p2 <- ggboxplot(plot.cover, x = "NWCA_ECO4", y = "PHRAGMITES_AUSTRALIS", 
          color = "NWCA_ECO4",
          legend = "none",
          add = "mean",
          ylab = "% COVER", xlab = "NWCA_ECO4", title = "% Cover of P. australis Per Site", xtickslab.rt = 45, ytickslab.rt = 45)
p2

How does the cover of P. australis vary between different HGM classes?

p3 <- ggboxplot(plot.cover, x = "CLASS_FIELD_HGM", y = "PHRAGMITES_AUSTRALIS", 
          color = "CLASS_FIELD_HGM",
          legend = "none",
          add = "mean",
          ylab = "% COVER", xlab = "CLASS_FIELD_HGM", title = "% Cover of P. australis Per Site", xtickslab.rt = 45, ytickslab.rt = 45)
p3

How does the cover of P. australis differ between wetland class?

p5 <- ggboxplot(plot.cover, x = "CLASS_FIELD_FWSST", y = "PHRAGMITES_AUSTRALIS", 
          color = "CLASS_FIELD_FWSST",
          legend = "none",
          add = "mean",
          ylab = "% COVER", xlab = "CLASS_FIELD_FWSST", title = "% Cover of P. australis Per Site", xtickslab.rt = 45, ytickslab.rt = 45)
p5

How does P. australis cover relate to native graminoid cover

We can look at how the cover of P. australis on sites compares to native graminoid cover

We’ll use information from the Plant Metrics file which has average absolute cover for vegetation groups such as Graminoids, Native Graminoids and Non-Native Graminoids. We’ll join the veg.metrics data set to the plant.cover data set again by using UID as a key column.

#veg.metrics.csv <- getURL("https://www.epa.gov/sites/production/files/2016-10/nwca2011_vegmetrics.csv")
#veg.metrics <- read.csv(text = veg.metrics.csv)
veg.metrics <- read.csv("nwca2011_vegmetrics.csv")
veg.metrics <- veg.metrics[,c(2, 199:201)]
plot.cover <- merge(plot.cover, veg.metrics, by = "UID")
veg.metrics

plot.cover

my.formula <- y ~ x
p6 <- ggplot(data = plot.cover, aes(x = XABCOV_GRAMINOID, y = PHRAGMITES_AUSTRALIS)) +
   geom_smooth(method = "lm", se=FALSE, color="black", formula = my.formula) +         
   geom_point() 
p6 + theme_bw() + stat_poly_eq(formula = my.formula, 
                aes(label = paste(..rr.label.., sep = "~~~")), 
                parse = TRUE) +  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")) + labs(y="P. australis Cover", x = "Average Absolute Graminoid Cover ")

my.formula <- y ~ x
p8 <- ggplot(data = plot.cover, aes(x = XABCOV_GRAMINOID_NAT, y = PHRAGMITES_AUSTRALIS)) +
   geom_smooth(method = "lm", se=FALSE, color="black") +         
   geom_point() 
p8 + theme_bw() + stat_poly_eq(formula = my.formula, 
                aes(label = paste(..rr.label.., sep = "~~~")), 
                parse = TRUE) +  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")) + labs(y="P. australis Cover", x = "Average Absolute Native Graminoid Cover ")

As the cover of P. australis increases on a plot, the cover of native grasses decreases.

Because we still have site level information such as wetland class in our dataset, we can use it to facet the dataset and view the relationship between P. australis and Native graminoid cover on smaller scales.

my.formula <- y ~ x
p9 <- ggplot(data = plot.cover, aes(x = XABCOV_GRAMINOID_NAT, y = PHRAGMITES_AUSTRALIS)) +
   geom_smooth(method = "lm", se=FALSE, color="black") +         
   geom_point() + facet_wrap(~CLASS_FIELD_FWSST)
p9 + theme_bw() + stat_poly_eq(formula = my.formula, 
                aes(label = paste(..rr.label.., sep = "~~~")), 
                parse = TRUE) +  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")) + labs(y="P. australis Cover", x = "Average Absolute Native Graminoid Cover ")

# + facet_wrap(~CLASS_FIELD_FWSST)
# + scale_y_continuous(trans='log2')

Are there differences in cover between sites with different stressors and vegetation quality?

We can bring in data from the VMMI and Stressors and Condition datasets to look at how the cover of P. australis differs between sites with different levels of stressors and vegetation condition. The VMMI data set has values such as FQAI and VMMI that we can use to assess vegetation quality, while the Stressors and Condition dataset will give us information on the level of stress a site is experiencing due to factors such as ditching, vegetation removal, invasive species and heavy metals. Again, UID is a good key column to use to combine these two data sets with plot.cover.

#vmmi.csv <- getURL("https://www.epa.gov/sites/production/files/2016-10/nwca2011_vegmmi.csv")
#vmmi <- read.csv (text = vmmi.csv)
vmmi <- read.csv("nwca2011_vegmmi.csv")


#stressors.csv <- getURL("https://www.epa.gov/sites/production/files/2016-10/nwca2011_cond_stress.csv")
#stressors <- read.csv(text = stressors.csv)
stressors <- read.csv("nwca2011_cond_stress.csv")

vmmi <- vmmi[,c(2, 8, 9, 10, 11)]
stressors <- stressors[c(2, 36:41, 45, 47, 49)]

plot.cover <- merge(plot.cover, vmmi, by = "UID")
plot.cover <- merge(plot.cover, stressors, by = "UID")#plot.cover
vmmi

stressors

plot.cover

p10 <- ggboxplot(plot.cover, x = "STRESS_DAM", y = "PHRAGMITES_AUSTRALIS", 
          #color = "VEGCOND.x",
          legend = "none",
          add = "mean",
          ylab = "% COVER of  P. australis", xlab = "Stress from Dams", title = "% Cover of P. australis in Relation to Stress from Dams  ", xtickslab.rt = 45, ytickslab.rt = 45,  order = c("Low", "Moderate", "High"))
p10

p11 <- ggboxplot(plot.cover, x = "STRESS_NONNATIVE", y = "PHRAGMITES_AUSTRALIS", 
          #color = "VEGCOND.x",
          legend = "none",
          add = "mean",
          ylab = "% COVER of  P. australis", xlab = "Stress from Non-Natives", title = "% Cover of P. australis in Relation to Non-Native Stress", xtickslab.rt = 45, ytickslab.rt = 45,  order = c("Low", "Moderate", "High"))
p11

p12 <- ggboxplot(plot.cover, x = "STRESS_HEAVYMETAL", y = "PHRAGMITES_AUSTRALIS", 
          #color = "VEGCOND.x",
          legend = "none",
          add = "mean",
          ylab = "% COVER of  P. australis", xlab = "Stress from Heavy Metals", title = "% Cover of P. australis in Relation to Stress from Heavy Metals", xtickslab.rt = 45, ytickslab.rt = 45,  order = c("Low", "Moderate", "High"))
p12

p13 <- ggboxplot(plot.cover, x = "STRESS_SOILP", y = "PHRAGMITES_AUSTRALIS", 
          #color = "VEGCOND.x",
          legend = "none",
          add = "mean",
          ylab = "% COVER of  P. australis", xlab = "Stress from Soil P", title = "% Cover of P. australis in Relation to Stress from Soil Phosphorus", xtickslab.rt = 45, ytickslab.rt = 45,  order = c("Low", "Moderate", "High"))
p13

p14 <- ggplot(data = plot.cover, aes(x = VMMI, y = PHRAGMITES_AUSTRALIS)) +
   geom_smooth(method = "lm", se=FALSE, color="black") +         
   geom_point()  + facet_wrap(~VEGCOND)
p14 + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) + labs(y="P australis Cover", x = "VMMI")

Conclusion

We brought together data from six different NWCA data sets to look at the presence of Phragmites australis on NWCA plots
Pay attention to whether a data set is at the Species, Plot or Site level
Look at key columns to see how datasets can be merged together