This tutorial walks you through a basic skill: calculating the average trait value for each species in a dataset that holds multiple trait measurements for multiple species. We will create a new data-frame that groups the data by species and provides an average trait value for each species present in the original dataset. Species-level trait averages are useful for several purposes, including creating figures and feeding the resulting dataset into a secondary analysis or index calculation.

To calculate trait averages and organize the resulting data-frame, we will use tidyr and dplyr (this method is very similar to the one we used in the community weighted mean calculation tutorial). Start by installing tidyverse if you have not already done so (use install.packages("tidyverse")). Then load the appropriate packages as shown here:

# Load packages
library(tidyr)
library(dplyr)

Be sure to set your working directory! HINT - setwd()

## How your imported data should look

To calculate species averages for traits, all you need is a data-frame with a species column and trait values for each sample of each species.

Your data should look something like this:

Ex. Dataframe
Species Height..cm. SLA..cm2.g. X.N..N.mass. N15 X.C..Cmass. C13
ammophila_breviligulata 156.0 70.570 1.03 1.74 49.21 -26.91
cyperus_esculentes 130.0 66.726 0.87 0.85 47.84 -13.14
panicum_amarum 127.0 293.776 1.27 -4.14 42.74 -11.87
setaria_parvifolia 77.0 138.056 0.51 -2.92 46.27 -11.68
spartina_patens 116.0 169.155 1.25 3.76 46.86 -12.69
ammophila_breviligulata 68.5 51.094 0.88 -0.54 48.85 -25.78
andropogon_virginicus 102.5 78.732 0.86 0.70 39.24 -13.19
conyza_canadensis 58.0 141.304 1.63 -2.61 47.22 -29.12
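Column names such as Height..cm. are typical of what R produces when it converts spreadsheet-style headers into syntactically valid names on import. The sketch below illustrates this with base R's make.names(); the raw headers are hypothetical, chosen only to show the pattern, and are not taken from the original data file.

```r
# Hypothetical raw headers, as they might appear in a spreadsheet (assumption)
raw_headers <- c("Species", "Height (cm)", "SLA (cm2/g)")

# make.names() replaces characters that are invalid in R names with "."
clean_headers <- make.names(raw_headers)
print(clean_headers)

# After importing your own data, check the exact names with names(your_dataframe)
# so that the names used inside summarize() match them.
```

Checking the names before summarizing helps avoid "object not found" errors caused by a mismatch between the names in your file and the names in your code.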

## Summarizing Species Averages in New Dataframe

First, we will direct the result to a new data-frame, which makes it easy to check our work in R (in this example I am naming the data-frame summarize.spp.avg). In the code chunk below you will notice an operator that may be new to you (%>%). This is called “piping”. It takes the output of one statement and makes it the input of the next statement. It is a commonly used operator in dplyr.
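As a minimal illustration of piping, the sketch below computes the same value twice: once with nested function calls and once with %>%. The vector x is a made-up example and has nothing to do with the trait data.

```r
library(dplyr)  # loading dplyr makes the %>% operator available

x <- c(1, 4, 9)  # made-up numbers for illustration

# Nested form: read from the inside out
nested <- sum(sqrt(x))

# Piped form: read from left to right
piped <- x %>% sqrt() %>% sum()

print(nested)
print(piped)
```

Both forms give the same result; the piped version simply lists the steps in the order they are carried out.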

In the example below, we take all the data from the nutnet.spp.avg data-frame and use it as the input for the group_by() function. Grouping by “Species” tells R that we want our final result of average trait values to be grouped by species name. We then use the output of group_by(Species) as the input for the summarize() function. The summarize() function is how we pull together different statistical functions. Here we use the mean() function to calculate averages.

In the summarize() statement, you first name each column of the summary (here I use the name of the trait that will be averaged for each species). Next, you define the statistical function you want R to perform on that trait. Here we want mean().

# Calculating species trait averages using dplyr functions
summarize.spp.avg <-   # New dataframe where we can inspect the result
  nutnet.spp.avg %>%   # First step in the next string of statements
  group_by(Species) %>%   # Groups the summary by species
  summarize(           # Coding for how we want our averages summarized
    Height = mean(Height..cm.),   # Actual calculation of the species averages
    SLA = mean(SLA..cm2.g.),
    Nmass = mean(X.N..N.mass.),
    N15 = mean(N15),
    Cmass = mean(X.C..Cmass.),
    C13 = mean(C13),
    CNratio = mean(C.N..ratio.)   # Needs a C.N..ratio. column; drop this line if your data has none
  )
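If you have many trait columns, naming each one inside summarize() gets tedious. The following is a hedged alternative sketch, not the method used above: it relies on dplyr's across() (available in dplyr 1.0.0 and later) to average every numeric column at once. The toy data-frame holds three rows and two traits copied from the example table. Using na.rm = TRUE to skip missing values, and adding the n.samples column, are assumptions made for illustration.

```r
library(dplyr)

# Toy data: three rows and two traits copied from the example table above
toy <- data.frame(
  Species = c("ammophila_breviligulata", "cyperus_esculentes",
              "ammophila_breviligulata"),
  Height..cm. = c(156.0, 130.0, 68.5),
  SLA..cm2.g. = c(70.570, 66.726, 51.094)
)

toy.avg <- toy %>%
  group_by(Species) %>%
  summarize(
    # Average every numeric column; na.rm = TRUE ignores missing values (assumption)
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    # Number of samples behind each average (illustrative extra column)
    n.samples = n()
  )

print(toy.avg)
```

Note that across() keeps the original column names, so you would rename the columns afterwards if you want shorter names such as Height or SLA.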

## How your result data should look

Your new summarize.spp.avg data-frame should be added to the data environment. It should look something like this:

Ex. Result Dataframe
Species Height SLA Nmass N15 Cmass C13
ammophila_breviligulata 110.74500 61.31740 0.9440000 1.6260000 48.70100 -26.12250
andropogon_virginicus 110.15455 65.51773 0.7690909 2.1372727 47.38909 -12.94273
conyza_canadensis 76.91429 102.59471 1.0414286 -1.4557143 47.52714 -20.26143
cyperus_esculentes 122.25000 152.98050 0.8675000 0.6650000 46.22375 -12.56625
fimbrystylis_castanae 76.00000 169.70050 1.0350000 -2.3150000 48.40500 -20.56000
gnaphalium_purpureum 57.00000 247.14300 1.6300000 -2.4200000 45.98000 -30.19000
panicum_amarum 83.60000 183.56900 1.4560000 -0.1250000 46.08400 -20.69900
setaria_parvifolia 96.57692 181.21869 0.8192308 0.9023077 45.94692 -12.12231

You should notice that you no longer have repeating species observations because we grouped by species identity.

## Write .csv of new species averages

As a final step you should write your result data-frame as a .csv and save it so that it can be easily read into R or other statistical software if you are interested in running further analysis on the data.

write.csv(summarize.spp.avg, "summarize_spp_avg.csv")
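By default, write.csv() also writes the row names as an extra, unnamed first column, which reappears as a column called X when the file is read back in. The sketch below shows one way to avoid this with row.names = FALSE. The small data-frame and the temporary file path are stand-ins for illustration, with values copied from the first two rows of the example result.

```r
# Stand-in for summarize.spp.avg (values from the first two example result rows)
example.avg <- data.frame(
  Species = c("ammophila_breviligulata", "andropogon_virginicus"),
  Height = c(110.74500, 110.15455)
)

# Temporary file used only for this illustration
out.path <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(example.avg, out.path, row.names = FALSE)

# Read the file back in to confirm that it round-trips cleanly
check <- read.csv(out.path)
print(names(check))
```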

## Now try with your own data!

# Not Run
library(tidyr)
library(dplyr)

# SET WORKING DIRECTORY!

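# Calculate species trait averages and summarize by species - using dplyr
# (adapt the summarize.spp.avg code above to your own data-frame and column names)

# Write the result to .csv
write.csv(summarize.spp.avg, "summarize_spp_avg.csv")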