This tutorial will walk you through the basic skill of calculating the average trait value for each species in a dataset that contains multiple trait measurements for multiple species. We will create a new data-frame that groups data by species, providing an average trait value for each species present in the original dataset. Having trait averages for species can be useful for a number of reasons, including creating figures and using the resulting dataset in a secondary analysis or index calculation.
To calculate trait averages and organize the resulting data-frame we will be using tidyr and dplyr (this method will be very similar to the method we used in the community weighted mean calculation tutorial). You should start by installing tidyverse if you have not already done so (use install.packages("tidyverse")). You should also load the appropriate packages as shown here:
# Load packages
library(tidyr)
library(dplyr)
Be sure to set your working directory! HINT - setwd()
In order to calculate species averages for traits, all you need is a data-frame with a species column and trait values for each sample of each species.
Your data should look something like this (only some columns are shown; the code below also uses a C.N..ratio. column):
Species | Height..cm. | SLA..cm2.g. | X.N..N.mass. | N15 | X.C..Cmass. | C13 |
---|---|---|---|---|---|---|
ammophila_breviligulata | 156.0 | 70.570 | 1.03 | 1.74 | 49.21 | -26.91 |
cyperus_esculentes | 130.0 | 66.726 | 0.87 | 0.85 | 47.84 | -13.14 |
panicum_amarum | 127.0 | 293.776 | 1.27 | -4.14 | 42.74 | -11.87 |
setaria_parvifolia | 77.0 | 138.056 | 0.51 | -2.92 | 46.27 | -11.68 |
spartina_patens | 116.0 | 169.155 | 1.25 | 3.76 | 46.86 | -12.69 |
ammophila_breviligulata | 68.5 | 51.094 | 0.88 | -0.54 | 48.85 | -25.78 |
andropogon_virginicus | 102.5 | 78.732 | 0.86 | 0.70 | 39.24 | -13.19 |
conyza_canadensis | 58.0 | 141.304 | 1.63 | -2.61 | 47.22 | -29.12 |
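
The dotted column names in the table are typical of what read.csv() produces when the original headers contain spaces, parentheses, or symbols, because it passes them through make.names() by default. A minimal sketch of that conversion follows; the raw headers used here are hypothetical, since the actual headers of the CSV file are not shown in this tutorial.

```r
# Hypothetical raw headers (an assumption; the real file's headers are not shown)
raw.headers <- c("Species", "Height (cm)", "SLA (cm2/g)")

# make.names() replaces characters that are not valid in R names with "."
clean.headers <- make.names(raw.headers)
print(clean.headers)
```

If your own column names differ from the ones in this tutorial, running names() on your data-frame will show what to type inside mean().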
First we will want to direct the result to a new data-frame; this will help us check our work easily in R (in this example I am naming the data-frame summarize.spp.avg). In the code chunk below you will notice an operator that may be new to you (%>%). This is the pipe operator, and using it is called “piping”. It takes the output of one statement and makes it the input of the next statement. It is a commonly used operator in dplyr.
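To make piping concrete, here is a small sketch comparing a nested call with the equivalent piped version. The three numbers are the first three Height values from the example table; the variable names are made up for illustration.

```r
library(dplyr)  # dplyr makes the %>% operator available

# First three Height values from the example table above
heights <- c(156.0, 130.0, 127.0)

# Nested version: read from the inside out
nested <- round(mean(heights), 1)

# Piped version: read from left to right, one step at a time
piped <- heights %>% mean() %>% round(1)

# Both approaches give the same answer
print(identical(nested, piped))
```

The piped version does exactly the same work; it is simply easier to read once several steps are chained together.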
In the example below, we are taking all the data from nutnet.spp.avg and using it as the input for the group_by() function. Grouping by “Species” tells R that we want our final result of average trait values to be grouped by species name. We then use the output of group_by(Species) as the input for the summarize() function. The summarize() function collapses each group into a single row using whatever statistical functions we supply. We can use the mean() function to calculate averages.
In the summarize() statement you first name each column you want in the summary (here I use the name of the trait that will be averaged for each species). Next, you define the statistical function you want R to apply to the original column; here we want mean().
# Calculating species trait averages using dplyr functions
summarize.spp.avg <-                # New data-frame where we can inspect the result
  nutnet.spp.avg %>%                # Input data; first step in the chain of statements
  group_by(Species) %>%             # Groups the data by species
  summarize(                        # Defines how we want the trait averages summarized
    Height = mean(Height..cm.),     # Actual calculation of the per-species means
    SLA = mean(SLA..cm2.g.),
    Nmass = mean(X.N..N.mass.),
    N15 = mean(N15),
    Cmass = mean(X.C..Cmass.),
    C13 = mean(C13),
    CNratio = mean(C.N..ratio.)
  )
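When a dataset has many trait columns, writing one mean() line per trait becomes repetitive. One possible alternative, assuming dplyr 1.0.0 or later, is to average every numeric column at once with across(). The sketch below uses a toy data-frame built from a few rows of the example table; the name toy.traits and the use of na.rm = TRUE (which skips missing values) are assumptions added for illustration and are not part of the method above.

```r
library(dplyr)

# Toy data: four rows and two trait columns copied from the example table
toy.traits <- data.frame(
  Species = c("ammophila_breviligulata", "cyperus_esculentes",
              "ammophila_breviligulata", "panicum_amarum"),
  Height..cm. = c(156.0, 130.0, 68.5, 127.0),
  SLA..cm2.g. = c(70.570, 66.726, 51.094, 293.776)
)

# Average every numeric column within each species
toy.spp.avg <-
  toy.traits %>%
  group_by(Species) %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

print(toy.spp.avg)
```

Note that this version keeps the original column names (for example Height..cm.) rather than the shorter names chosen by hand in the chunk above.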
Your new summarize.spp.avg data-frame should now appear in your R environment. It should look something like this (the CNratio column is not shown):
Species | Height | SLA | Nmass | N15 | Cmass | C13 |
---|---|---|---|---|---|---|
ammophila_breviligulata | 110.74500 | 61.31740 | 0.9440000 | 1.6260000 | 48.70100 | -26.12250 |
andropogon_virginicus | 110.15455 | 65.51773 | 0.7690909 | 2.1372727 | 47.38909 | -12.94273 |
conyza_canadensis | 76.91429 | 102.59471 | 1.0414286 | -1.4557143 | 47.52714 | -20.26143 |
cyperus_esculentes | 122.25000 | 152.98050 | 0.8675000 | 0.6650000 | 46.22375 | -12.56625 |
fimbrystylis_castanae | 76.00000 | 169.70050 | 1.0350000 | -2.3150000 | 48.40500 | -20.56000 |
gnaphalium_purpureum | 57.00000 | 247.14300 | 1.6300000 | -2.4200000 | 45.98000 | -30.19000 |
panicum_amarum | 83.60000 | 183.56900 | 1.4560000 | -0.1250000 | 46.08400 | -20.69900 |
setaria_parvifolia | 96.57692 | 181.21869 | 0.8192308 | 0.9023077 | 45.94692 | -12.12231 |
You should notice that you no longer have repeating species observations because we grouped by species identity.
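If you would like to confirm this rather than check by eye, a quick sanity check is sketched below. It uses a toy data-frame built from four rows of the example table; the names toy.traits and n.samples are hypothetical, and the n() count of samples per species is an optional extra that is not part of the original summary.

```r
library(dplyr)

# Toy data: four Height measurements copied from the example table
toy.traits <- data.frame(
  Species = c("ammophila_breviligulata", "cyperus_esculentes",
              "ammophila_breviligulata", "panicum_amarum"),
  Height..cm. = c(156.0, 130.0, 68.5, 127.0)
)

toy.spp.avg <-
  toy.traits %>%
  group_by(Species) %>%
  summarize(
    Height = mean(Height..cm.),  # per-species mean
    n.samples = n()              # number of rows that went into each mean
  )

# anyDuplicated() returns 0 when no species name is repeated
print(anyDuplicated(toy.spp.avg$Species))

# The summary should have exactly one row per distinct species
print(nrow(toy.spp.avg) == n_distinct(toy.traits$Species))
```

Keeping a sample count next to each average can help when deciding how much weight to give a mean that is based on only one or two samples.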
As a final step, write your result data-frame to a .csv file so that it can be easily read into R or other statistical software for further analysis.
write.csv(summarize.spp.avg, "summarize_spp_avg.csv")
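By default write.csv() also writes R's row names as an unnamed first column, which shows up as an extra column when the file is read back in. A small sketch of one way to avoid that, and to confirm the file round-trips cleanly, is shown below. The toy data-frame and the temporary file path are stand-ins for illustration only.

```r
# Toy summary table standing in for summarize.spp.avg
toy.spp.avg <- data.frame(
  Species = c("ammophila_breviligulata", "cyperus_esculentes"),
  Height = c(112.25, 130.0)
)

# Temporary file used only for this illustration
out.file <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the column of row numbers
write.csv(toy.spp.avg, out.file, row.names = FALSE)

# Read the file back in and confirm the columns are as expected
check <- read.csv(out.file)
print(names(check))
```

Whether you keep or drop the row names is a matter of preference; dropping them simply keeps the saved file to the species and trait columns.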
# Not Run
# Load packages
library(tidyr)
library(dplyr)
# SET WORKING DIRECTORY!
# Calculating species trait averages and summarizing by species - using dplyr
nutnet.spp.avg <- read.csv("fnxl.trait.nutnet_spp.avg.csv")
summarize.spp.avg <-
  nutnet.spp.avg %>%
  group_by(Species) %>%
  summarize(Height = mean(Height..cm.),
            SLA = mean(SLA..cm2.g.),
            Nmass = mean(X.N..N.mass.),
            N15 = mean(N15),
            Cmass = mean(X.C..Cmass.),
            C13 = mean(C13),
            CNratio = mean(C.N..ratio.)
  )
write.csv(summarize.spp.avg, "summarize_spp_avg.csv")