This tutorial covers a basic skill: reconfiguring dataframes so you can use them with R functions that require a specific dataframe format. One of the most common reconfigurations you will come across in ecology is changing a dataframe from “wide” format to “long” format. This is most often needed when your dataset is in matrix format.
tidyr - and dplyr for good measure!
To reconfigure our dataframes we will be using packages that are part of the tidyverse. Start by installing tidyverse using install.packages("tidyverse"), then load the tidyr and dplyr packages as shown in the code below.
# Load packages
library("tidyr")
library("dplyr")
Be sure to set your working directory! HINT -> setwd()
You can think of the following data as a dummy dataset!
For the purposes of this example, we will take community data in “wide” (or matrix) format and convert it to “long” format. To start, the data should consist of a plot column and species columns (if you have multiple years or replicates in your data, they should be included as columns as well). In this example, each row represents a plot, with the abundance of each species (% cover) in its own column. The example dataset is called nutnet.spp.mat. Below is an example of how the data should look while it is in wide form:
| Plot | Ammophila_breviligulata | Andropogon_virginicus | Chamaesyce_maculata | Conyza_canadensis |
|---|---|---|---|---|
| 1 | 24 | 0 | 0 | 1 |
| 2 | 10 | 3 | 0 | 10 |
| 3 | 17 | 8 | 0 | 3 |
| 4 | 27 | 11 | 0 | 10 |
| 5 | 20 | 5 | 1 | 5 |
| 6 | 15 | 0 | 0 | 1 |
| 7 | 15 | 15 | 0 | 2 |
| 8 | 25 | 16 | 0 | 1 |
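If you do not have the CSV file to hand, the table above can be typed in directly as a dataframe. This is only a sketch for following along: the name nutnet.spp.demo is made up for this example and is not part of the original dataset, and the values are copied from the eight rows shown above.

```r
# Hypothetical stand-in for the imported data, typed in from the table above.
# The name nutnet.spp.demo is illustrative only.
nutnet.spp.demo <- data.frame(
  Plot                    = 1:8,
  Ammophila_breviligulata = c(24, 10, 17, 27, 20, 15, 15, 25),
  Andropogon_virginicus   = c(0, 3, 8, 11, 5, 0, 15, 16),
  Chamaesyce_maculata     = c(0, 0, 0, 0, 1, 0, 0, 0),
  Conyza_canadensis       = c(1, 10, 3, 10, 5, 1, 2, 1)
)

# Inspect the structure: one row per plot, one column per species
str(nutnet.spp.demo)
```

Assigning this dataframe to nutnet.spp.mat instead should let you run the rest of the tutorial code without importing a file.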
First, create a name for your new dataframe. Here, we name our new dataframe nutnet.spp.long. Below we will use the tidyr function called gather(). We will use nutnet.spp.mat as the input dataframe and will identify our “key” (Species) and “value” (Cover) columns. It is important to indicate that you do NOT want to include your “Plot” variable in the dataframe conversion. We do this by finishing the code off with -c(), which tells R which columns should be left out of the conversion. Once you have supplied all this information, press Ctrl+Enter and let tidyr do the rest!
# Converting matrix data to long data
nutnet.spp.long <-        # Name of our new df
  gather(nutnet.spp.mat,  # Indicate function and old data
         Species,         # This is your "Key"
         Cover,           # This is your "Value"
         -c(Plot))        # This tells R not to include your "Plot" column as a "Species"
| Plot | Species | Cover |
|---|---|---|
| 1 | Ammophila_breviligulata | 24 |
| 2 | Ammophila_breviligulata | 10 |
| 3 | Ammophila_breviligulata | 17 |
| 4 | Ammophila_breviligulata | 27 |
| 5 | Ammophila_breviligulata | 20 |
| 6 | Ammophila_breviligulata | 15 |
| 7 | Ammophila_breviligulata | 15 |
| 8 | Ammophila_breviligulata | 25 |
You should find that your new dataframe now has Plot, Species, and Cover as variables, with the plot numbers repeating for each species in your dataset!
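As a side note, gather() still works, but the tidyr documentation marks it as superseded by pivot_longer() (available from tidyr 1.0.0 onwards). The sketch below shows the same wide-to-long conversion written both ways on a small made-up dataframe. The names wide, long_gather and long_pivot are illustrative only and do not come from the tutorial data.

```r
library(tidyr)

# Small illustrative dataframe (values borrowed from the first three plots)
wide <- data.frame(
  Plot                    = 1:3,
  Ammophila_breviligulata = c(24, 10, 17),
  Andropogon_virginicus   = c(0, 3, 8)
)

# The approach used in this tutorial
long_gather <- gather(wide, Species, Cover, -c(Plot))

# The same conversion with pivot_longer(): every column except Plot
# is stacked, column names go to Species and cell values go to Cover
long_pivot <- pivot_longer(wide,
                           cols      = -Plot,
                           names_to  = "Species",
                           values_to = "Cover")

# Both results have one row per plot-species combination
nrow(long_gather)
nrow(long_pivot)
```

The two results hold the same values but in a different row order: gather() stacks one species after another, while pivot_longer() keeps all the species for a plot together. pivot_longer() also returns a tibble rather than a plain data.frame.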
# This is our imported df
nutnet.spp.mat <- read.csv("nutnet_spp_matrix_2017.csv")
# Converting wide (matrix) data to long data
nutnet.spp.long <-        # Name of our new df
  gather(nutnet.spp.mat,  # Indicate function and old data
         Species,         # This is your "Key"
         Cover,           # This is your "Value"
         -c(Plot))        # This tells R not to include your "Plot" column as a "Species"
# Save the long-format data to a new CSV file
write.csv(nutnet.spp.long, "nutnet_spp_long.csv")
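One thing to be aware of when saving: by default write.csv() also writes the row names, which show up as an extra column (named X) when the file is read back in. The sketch below illustrates this on a tiny made-up dataframe written to temporary files. The names long_demo, path_default and path_clean are hypothetical and used only for this example.

```r
# Tiny illustrative long-format dataframe
long_demo <- data.frame(
  Plot    = c(1, 2),
  Species = c("Ammophila_breviligulata", "Ammophila_breviligulata"),
  Cover   = c(24, 10)
)

# Temporary files so nothing is written to your working directory
path_default <- tempfile(fileext = ".csv")
path_clean   <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(long_demo, path_default)

# With row.names = FALSE only the data columns are written
write.csv(long_demo, path_clean, row.names = FALSE)

# Compare the column names after reading each file back in
names(read.csv(path_default))
names(read.csv(path_clean))
```

Whether you want the extra column depends on how you plan to use the file later, so adding row.names = FALSE is an optional tweak rather than a required step.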