Please keep in mind that each of these tutorials shows just one of the many ways you can code a solution! We challenge you to find a way to reach the desired result that is efficient for you! Don’t be afraid to ‘Google it’!
There will be times when you have a large master dataset containing all of your data in one data frame. Sometimes you want to take a subset of that dataset to run an analysis. This tutorial will show you how to make new .csv files of specific data subsets from one large dataset.
In this tutorial I am using a matrix of species in plots that have been given different nutrient enrichment treatments. First, I want to select() for specific species and then slice() for my nutrient treatments to create .csv files on species abundance for each treatment. An example of my dataset can be seen in the Example Data section.
As with many things in R, there are many ways to code toward a similar outcome. Here we will be using tidyr and dplyr, which are both part of the tidyverse. If you have not already done so, install the tidyverse using install.packages("tidyverse"), then load the tidyr and dplyr packages as shown in the code below.
#Load packages
#install.packages("tidyverse") #I have this commented out because it is already installed
library("tidyr")
library("dplyr")
DON’T FORGET TO SET YOUR WORKING DIRECTORY! HINT -> setwd()
| X | Ammophila_breviligulata | Andropogon_virginicus | Chamaesyce_maculata | Conyza_canadensis |
|---|---|---|---|---|
| 1 | 24 | 0 | 0 | 1 |
| 2 | 10 | 3 | 0 | 10 |
| 3 | 17 | 8 | 0 | 3 |
| 4 | 27 | 11 | 0 | 10 |
| 5 | 20 | 5 | 1 | 5 |
| 6 | 15 | 0 | 0 | 1 |
| 7 | 15 | 15 | 0 | 2 |
| 8 | 25 | 16 | 0 | 1 |
In the example data above, “X” represents plot number, corresponding to a replicate of a nutrient treatment. Species names are the column names (variables) of the matrix, and the values are the % cover of each species in each plot. Once your data is loaded and looks accurate, we can start selecting for specific species.
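If you want to follow along without the original file, here is a minimal sketch that rebuilds the eight example rows by hand and runs a few quick checks that the data "looks accurate". The values are copied from the table above; building the data frame with data.frame() instead of read.csv() is an assumption made only so the example stands on its own.

```r
# Rebuild the example table by hand (values copied from the table above)
nutnet.spp.dat <- data.frame(
  X = 1:8,
  Ammophila_breviligulata = c(24, 10, 17, 27, 20, 15, 15, 25),
  Andropogon_virginicus   = c(0, 3, 8, 11, 5, 0, 15, 16),
  Chamaesyce_maculata     = c(0, 0, 0, 0, 1, 0, 0, 0),
  Conyza_canadensis       = c(1, 10, 3, 10, 5, 1, 2, 1)
)

# Quick checks before doing anything else
str(nutnet.spp.dat)   # column names and types (cover columns should be numeric)
head(nutnet.spp.dat)  # first six rows
dim(nutnet.spp.dat)   # number of rows (plots) and columns
```

If a cover column shows up as character instead of numeric in str(), that usually points to a stray non-numeric entry in the .csv.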
We will be using functions in dplyr and tidyr to select the species that we want cover data for. For example, I may only want cover data from Ammophila_breviligulata, Andropogon_virginicus, and Conyza_canadensis but not Chamaesyce_maculata. To make this selection we can use the select() function in dplyr. In the selection code below I use piping, which is written as %>%. Simply put, it takes the output of one statement and makes it the input of the next statement. It is a commonly used operator in R when working with dplyr and/or tidyr.
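As a small aside, here is a sketch of what the pipe is doing. It uses a tiny made-up data frame called spp (a hypothetical name, with three rows taken from the example table) and compares a normal function call with the piped version of the same call.

```r
library(dplyr)

# Tiny illustrative data frame (hypothetical name `spp`)
spp <- data.frame(
  X = 1:3,
  Ammophila_breviligulata = c(24, 10, 17),
  Chamaesyce_maculata = c(0, 0, 0)
)

# Without the pipe: the data frame is written as the first argument
nested <- select(spp, X, Ammophila_breviligulata)

# With the pipe: the left-hand side is passed in as the first argument
piped <- spp %>% select(X, Ammophila_breviligulata)

# Both calls give the same data frame
identical(nested, piped)
```

The pipe becomes more useful as you chain several steps together, because the code then reads top to bottom in the order the steps happen.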
# Selecting specific species
nutnet.spp.dat <- nutnet.spp.dat %>% # Tells R which data frame we want to use
  select(c( # This is where we specify which columns (variables) to keep
    X,
    Ammophila_breviligulata,
    Andropogon_virginicus,
    Conyza_canadensis
  ))
We want to keep our plot numbers for the next step!
Our result should be an edit to the nutnet.spp.dat data frame. Only the species we indicate should be included in the edited data frame. For example:
| X | Ammophila_breviligulata | Andropogon_virginicus | Conyza_canadensis |
|---|---|---|---|
| 1 | 24 | 0 | 1 |
| 2 | 10 | 3 | 10 |
| 3 | 17 | 8 | 3 |
| 4 | 27 | 11 | 10 |
| 5 | 20 | 5 | 5 |
| 6 | 15 | 0 | 1 |
| 7 | 15 | 15 | 2 |
| 8 | 25 | 16 | 1 |
You can see that all of our plots are still represented but we have effectively removed the Chamaesyce_maculata column.
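When you only want to drop one or two columns, it can be shorter to name the columns to remove instead of the ones to keep. The sketch below is one possible alternative, not the approach used in this tutorial; it uses a hypothetical data frame called spp with the first three rows of the example data, and relies on select() accepting a minus sign in front of a column name.

```r
library(dplyr)

# First three rows of the example data (hypothetical name `spp`)
spp <- data.frame(
  X = 1:3,
  Ammophila_breviligulata = c(24, 10, 17),
  Andropogon_virginicus   = c(0, 3, 8),
  Chamaesyce_maculata     = c(0, 0, 0),
  Conyza_canadensis       = c(1, 10, 3)
)

# Keep columns by naming them (as in the tutorial)
kept <- spp %>%
  select(X, Ammophila_breviligulata, Andropogon_virginicus, Conyza_canadensis)

# Drop a column by putting a minus sign in front of its name
dropped <- spp %>%
  select(-Chamaesyce_maculata)

# Same columns, same order
identical(kept, dropped)
```

Naming the columns to keep is safer when the master dataset may gain new columns later; dropping by name is handier when the list of columns to keep is long.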
Next we will choose which plots we want to include in the data subset.
We will use the slice() function to select the plots (represented as X in the example dataset) that we want to include in our data subset. For example, if we only want plots that are fertilized with Nitrogen, and we know that those are plot numbers 4, 5, 8, 12, and 20, then we can pick out just those plots. Note that slice() works on row positions rather than on the values in X, so this approach relies on plot 1 being in row 1, plot 2 in row 2, and so on. Here we do it by telling R which rows are not Nitrogen treatments and should therefore be dropped (a negative index removes rows). In the chunk of code below I make this Nitrogen subset a new df that will appear when I run the code.
I will make a new df called N.spp.mat because the final code should produce a nice matrix of my Nitrogen plots with exactly the species I told R I wanted.
slice() is not the only way to create a subset (you can also use the filter() function in dplyr). But because of the way slice() was built, it can be faster when you are dealing with large datasets.
# Filtering based on plot # (X)
N.spp.mat <- slice(nutnet.spp.dat, -c(1:3, 13:19, 6:7, 9:11)) # Specifies which plots to slice out
Our result is a tidy data frame that should look something like this:
| X | Ammophila_breviligulata | Andropogon_virginicus | Conyza_canadensis |
|---|---|---|---|
| 4 | 27 | 11 | 10 |
| 5 | 20 | 5 | 5 |
| 8 | 25 | 16 | 1 |
| 12 | 25 | 16 | 1 |
| 20 | 22 | 7 | 2 |
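To illustrate the difference between slice() and filter() mentioned above, here is a rough sketch. slice() picks rows by position, while filter() picks rows by the values in a column. The data frame plots and its cover column are made-up placeholder values (not real data), and n_plots is a hypothetical name for the Nitrogen plot numbers given in the text.

```r
library(dplyr)

# Made-up placeholder data: 20 plots with one invented cover column
plots <- data.frame(X = 1:20, cover = seq(5, 100, by = 5))

# Nitrogen plot numbers from the text (hypothetical variable name)
n_plots <- c(4, 5, 8, 12, 20)

# filter() matches on the VALUES in X
by_value <- plots %>% filter(X %in% n_plots)

# slice() matches on ROW POSITION; same result only because row i holds plot i
by_position <- plots %>% slice(n_plots)

# If the rows are re-ordered, the two approaches no longer agree
shuffled <- plots %>% arrange(desc(X))
shuffled_filter <- shuffled %>% filter(X %in% n_plots) # still the N plots
shuffled_slice  <- shuffled %>% slice(n_plots)         # now different plots

by_value$X
shuffled_slice$X
```

If your plots are not stored in plot-number order, or some plots are missing, matching on the values in X with filter() is the safer choice.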
# Set working directory
setwd("~/Google Drive File Stream/My Drive/CPEL Lab Data/Personnel folders/Joe Brown/Chapter 3 - Nutrient Network/Chapter_3_Analysis/")
# Reading in .csv for total nutnet species matrix
nutnet.spp.dat <- read.csv("FD_calculations_by_treatment/FD_nutnet_spp_matrix_2017.csv", row.names = 2)
# Select for the species that have trait data
nutnet.spp.dat <- nutnet.spp.dat %>%
  select(
    c(
      X,
      Ammophila_breviligulata,
      Andropogon_virginicus,
      Conyza_canadensis,
      Cyperus_esculentes,
      Fimbristylis_castanea,
      Gnaphalium_purpureum,
      Panicum_amarum,
      Setaria_parviflora,
      Solidago_sempervirens,
      Spartina_patens
    )
  )
# Filtering based on plot # (X) - slicing out the plots that aren't N
N.spp.mat <- slice(nutnet.spp.dat, -c(1:3, 13:19, 6:7, 9:11))
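The code above stops once N.spp.mat is created, so here is a minimal sketch of the last step, writing the subset out as its own .csv with base R's write.csv(). The file name N_spp_matrix.csv is hypothetical, and the example writes to a temporary folder only so it can be run anywhere; in practice you would probably give a path inside your working directory. The small N.spp.mat below uses three rows from the result table so the example stands on its own.

```r
# Stand-in for the Nitrogen subset (three rows from the result table above)
N.spp.mat <- data.frame(
  X = c(4, 5, 8),
  Ammophila_breviligulata = c(27, 20, 25),
  Andropogon_virginicus   = c(11, 5, 16),
  Conyza_canadensis       = c(10, 5, 1)
)

# Hypothetical output file; swap tempdir() for a folder of your choice
out_file <- file.path(tempdir(), "N_spp_matrix.csv")

# row.names = FALSE stops R from adding an extra column of row numbers
write.csv(N.spp.mat, out_file, row.names = FALSE)

# Read the file back in to check that it was written as expected
check <- read.csv(out_file)
head(check)
```

Repeating the slice() and write.csv() steps with the plot numbers for each treatment would give one .csv per treatment.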