install.packages("tidyverse", repos = "http://cran.us.r-project.org")
library(tidyverse)species_x_site_matrix_tidyverse
Introduction
Certain analyses, such as NMDS, require your data to appear in “wide” format referred to as a “species x site matrix”. Here, each row is a sample and each column is a unique species. Within the cells are values denoting the number of individuals recorded.
However, it is not practical to enter data in this format, whether in the field or the lab. A more typical form is “long” format, with a column for “species name” or similar, and a column for “total”. In this format, only species encountered in each sample are recorded.
Additionally, your record sheet may have columns/variables not required for analysis and sub-samples that need summing at a replicate/plot level.
Conversion to wide format and the streamlining/summarising of your data set can easily be done within R using the Tidyverse package. Using the pipe operator (%>%, or the shortcut Shift+Ctrl+M) you can string together functions to reduce the number of intermediary objects.
Running Code
If you have not done already, install Tidyverse and call it with the library function.
This tutorial will use a dummy record sheet. Head to https://github.com/brewstej/tutorials to download or use the following code to automatically load the record sheet into an object called “mydata”.
urlfile = "https://raw.githubusercontent.com/brewstej/tutorials/refs/heads/main/example_species_record.csv"
mydata <- read_csv(url(urlfile))head(mydata)# A tibble: 6 × 8
plot.id sub.samp date habitat species.name male female total
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 P1 sub1 07/03/2025 grass A.species 10 2 12
2 P1 sub1 07/03/2025 grass B.species 6 4 10
3 P1 sub1 07/03/2025 grass C.species 2 3 5
4 P1 sub2 07/03/2025 grass D.species 1 2 3
5 P1 sub2 07/03/2025 grass A.species 12 4 16
6 P2 sub1 07/03/2025 grass C.species 12 12 24
Have a look at the variable titles and first few lines of data in the object “mydata”. You will notice that each plot (our unit of replication in this example) comprises two sub samples labelled “sub1” and “sub2” here. These could be quadrats, pitfall traps, repeated timed counts. Regardless, we want to pool these data and display a single row per species and a summed count of the individuals at a plot level.
We can use the “group_by” and “summarise” functions to do this. Give the function the variables you want to retain and use for grouping. Any variables not selected will be omitted from the output or summarised based on the subsequent “summarise” function. Here, we are creating a new variable called “new.total” and performing the function “sum” on the existing variable “total”.
The output will then be presented according to the grouping variables specified.
mydata_summarised <- mydata %>%
group_by(plot.id, habitat, species.name) %>%
summarise(new.total = sum(total))head(mydata_summarised)# A tibble: 6 × 4
# Groups: plot.id, habitat [2]
plot.id habitat species.name new.total
<chr> <chr> <chr> <dbl>
1 P1 grass A.species 28
2 P1 grass B.species 10
3 P1 grass C.species 5
4 P1 grass D.species 3
5 P10 heath A.species 21
6 P10 heath B.species 18
You should now see a new object, with all previous sub samples summed and presented as a single row of data per species, while retaining plot and habitat grouping.
We can now create the pivot table as follows:
mydata_matrix <- mydata_summarised %>%
pivot_wider(names_from = species.name, values_from = new.total, values_fill = 0)head(mydata_matrix)# A tibble: 6 × 8
# Groups: plot.id, habitat [6]
plot.id habitat A.species B.species C.species D.species E.species F.species
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 P1 grass 28 10 5 3 0 0
2 P10 heath 21 18 14 11 6 9
3 P2 grass 33 0 24 25 15 10
4 P3 grass 11 22 10 0 24 0
5 P4 grass 39 0 20 0 0 0
6 P5 grass 8 12 0 9 0 20
Pivoting wider introduces NA values (you didn’t enter zeros in your record sheet!). These are transformed into “0” with the values_fill argument.
All in one code chunk
The various stages have been outlined separately, to aid understanding and allow description. However, these can be combined and run as one:
mydata_matrix2 <- mydata %>%
group_by(plot.id, habitat, species.name) %>%
summarise(new.total = sum(total)) %>%
pivot_wider(names_from = species.name, values_from = new.total, values_fill = 0)head(mydata_matrix2)# A tibble: 6 × 8
# Groups: plot.id, habitat [6]
plot.id habitat A.species B.species C.species D.species E.species F.species
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 P1 grass 28 10 5 3 0 0
2 P10 heath 21 18 14 11 6 9
3 P2 grass 33 0 24 25 15 10
4 P3 grass 11 22 10 0 24 0
5 P4 grass 39 0 20 0 0 0
6 P5 grass 8 12 0 9 0 20
You can now swap out the dataset and variables for your own!
Bonus
P.S. if you wish to filter out your data based on certain variables, you can easily do this as part of the same process. Say, for example, I only wanted to look at a species x site matrix for “heath” habitat. I can use the “filter” function and specify that “habitat” must be exactly equal to (==) “heath”. There are many other operators you can use here (for numerical values etc.).
While you no longer need to group by “habitat” as there is only one, omitting it at this stage would remove it from the resulting object.
mydata_matrix_heath <- mydata %>%
filter(habitat == "heath") %>%
group_by(plot.id, habitat, species.name) %>%
summarise(new.total = sum(total)) %>%
pivot_wider(names_from = species.name, values_from = new.total, values_fill = 0)head(mydata_matrix_heath)# A tibble: 5 × 8
# Groups: plot.id, habitat [5]
plot.id habitat A.species B.species C.species D.species E.species F.species
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 P10 heath 21 18 14 11 6 9
2 P6 heath 0 16 0 9 0 0
3 P7 heath 3 0 0 0 0 22
4 P8 heath 17 0 41 0 0 4
5 P9 heath 0 0 0 0 15 0