tell us

separate() is a function from the tidyr package, which is part of the tidyverse. It splits the values in one column into several new columns. This is useful when a single column clumps together two or more variables, for example when one text value stores two pieces of information joined by a comma.
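To make the idea concrete before we use real data, here is a minimal sketch on a small hypothetical table (the names `toy`, `date`, `year` and `month` are made up for illustration). Each `date` value packs a year and a month together, and separate() splits them at the hyphen.

```r
library(tibble)
library(tidyr)

# A small hypothetical table: each "date" value packs a year and a month
toy <- tibble(id = 1:3, date = c("2021-06", "2021-07", "2022-01"))

# Split "date" at the hyphen into two new columns, "year" and "month";
# the original "date" column is dropped by default (remove = TRUE)
toy_split <- separate(toy, date, into = c("year", "month"), sep = "-")
print(toy_split)
```

The new columns stay as character vectors unless you ask separate() to convert them.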

show us

The first step is to load the packages we need (install them first with install.packages() if you haven't already). Here we load the tidyverse package, and the palmerpenguins package for our data.

install and load packages

library(tidyverse)  # attaches ggplot2, dplyr, tidyr and the other core tidyverse packages
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
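library(tidyr)           # already attached by tidyverse; loading it again is harmless
library(palmerpenguins)  # provides the penguins and penguins_raw datasets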
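If you are not sure whether the packages are installed yet, one possible approach is to install only the missing ones and then load them. This is a sketch, not the tutorial's own setup; the variable names `pkgs` and `not_installed` are illustrative, and the CRAN mirror URL is just one common choice.

```r
# Packages this tutorial needs
pkgs <- c("tidyverse", "palmerpenguins")

# requireNamespace() returns FALSE (quietly) for packages that are not installed
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing, using an explicit CRAN mirror
if (length(not_installed) > 0) {
  install.packages(not_installed, repos = "https://cloud.r-project.org")
}

library(tidyverse)
library(palmerpenguins)
```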

get some data

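The palmerpenguins package ships with the penguins_raw dataset, which is available as soon as the package is loaded. Calling data(package = 'palmerpenguins') lists the datasets included in the package.

We use print() to view the first rows of the data along with the type of each column.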

data(package = 'palmerpenguins')  # list the datasets included in palmerpenguins
print(penguins_raw)               # show the first 10 rows and the column types
## # A tibble: 344 x 17
##    studyName `Sample Number` Species       Region Island Stage   `Individual ID`
##    <chr>               <dbl> <chr>         <chr>  <chr>  <chr>   <chr>          
##  1 PAL0708                 1 Adelie Pengu~ Anvers Torge~ Adult,~ N1A1           
##  2 PAL0708                 2 Adelie Pengu~ Anvers Torge~ Adult,~ N1A2           
##  3 PAL0708                 3 Adelie Pengu~ Anvers Torge~ Adult,~ N2A1           
##  4 PAL0708                 4 Adelie Pengu~ Anvers Torge~ Adult,~ N2A2           
##  5 PAL0708                 5 Adelie Pengu~ Anvers Torge~ Adult,~ N3A1           
##  6 PAL0708                 6 Adelie Pengu~ Anvers Torge~ Adult,~ N3A2           
##  7 PAL0708                 7 Adelie Pengu~ Anvers Torge~ Adult,~ N4A1           
##  8 PAL0708                 8 Adelie Pengu~ Anvers Torge~ Adult,~ N4A2           
##  9 PAL0708                 9 Adelie Pengu~ Anvers Torge~ Adult,~ N5A1           
## 10 PAL0708                10 Adelie Pengu~ Anvers Torge~ Adult,~ N5A2           
## # ... with 334 more rows, and 10 more variables: Clutch Completion <chr>,
## #   Date Egg <date>, Culmen Length (mm) <dbl>, Culmen Depth (mm) <dbl>,
## #   Flipper Length (mm) <dbl>, Body Mass (g) <dbl>, Sex <chr>,
## #   Delta 15 N (o/oo) <dbl>, Delta 13 C (o/oo) <dbl>, Comments <chr>
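Because print() hides wide columns behind a "... with 10 more variables" footer, dplyr's glimpse() can be a handier way to see every column at once. This is an optional alternative view, not part of the original workflow.

```r
library(dplyr)
library(palmerpenguins)

# glimpse() lists every column with its type and first few values,
# so no column is hidden in a footer
glimpse(penguins_raw)
```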

As you can see, the Stage column holds two pieces of information separated by a comma: the penguin's life stage (for example, Adult) and its egg stage.
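The printed tibble truncates Stage (Adult,~), so it can help to look at the full values before splitting. A minimal sketch using dplyr's count(), where `stage_counts` is just an illustrative name:

```r
library(dplyr)
library(palmerpenguins)

# Tally each distinct value of Stage; the result has only two columns,
# so the Stage text is shown in full
stage_counts <- count(penguins_raw, Stage)
print(stage_counts)
```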

We want to split this information into two columns.

use the function

This is where we can use the separate function!

To use separate(), you first name the column you want to split. You then pass “into = c()” with the names of the new columns. Finally, you use “sep” to give the separator between the pieces; when sep is a character string, separate() treats it as a regular expression.
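Because sep is read as a regular expression, characters such as a period need escaping. The sketch below uses a small hypothetical table (`codes`, `letter` and `number` are illustrative names) and also shows the optional convert argument, which turns the number-like pieces into a numeric column.

```r
library(tibble)
library(tidyr)

# Hypothetical codes where a letter and a number are joined by a period
codes <- tibble(code = c("A.1", "B.2", "C.3"))

# "." alone would match every character, so the period is escaped as "\\.";
# convert = TRUE converts the "number" pieces from character to numeric
codes_split <- separate(codes, code, into = c("letter", "number"),
                        sep = "\\.", convert = TRUE)
print(codes_split)
```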

Here we split on a comma (sep = ","). Everything to the left of the comma goes into “Stage” and everything to the right goes into “Egg stage”. Because we split on the comma alone, the space that follows it stays at the start of each “Egg stage” value, which is why the output shows " 1 Egg Sta~".

penguins_raw %>%
  # split Stage at the comma into two new columns
  separate(Stage, into = c("Stage", "Egg stage"), sep = ",")
## # A tibble: 344 x 18
##    studyName `Sample Number` Species           Region Island  Stage `Egg stage` 
##    <chr>               <dbl> <chr>             <chr>  <chr>   <chr> <chr>       
##  1 PAL0708                 1 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  2 PAL0708                 2 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  3 PAL0708                 3 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  4 PAL0708                 4 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  5 PAL0708                 5 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  6 PAL0708                 6 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  7 PAL0708                 7 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  8 PAL0708                 8 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
##  9 PAL0708                 9 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
## 10 PAL0708                10 Adelie Penguin (~ Anvers Torger~ Adult " 1 Egg Sta~
## # ... with 334 more rows, and 11 more variables: Individual ID <chr>,
## #   Clutch Completion <chr>, Date Egg <date>, Culmen Length (mm) <dbl>,
## #   Culmen Depth (mm) <dbl>, Flipper Length (mm) <dbl>, Body Mass (g) <dbl>,
## #   Sex <chr>, Delta 15 N (o/oo) <dbl>, Delta 13 C (o/oo) <dbl>, Comments <chr>
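One way to drop that leading space is to split on a comma followed by a space instead. The sketch below does that and also saves the result, since the pipeline above only prints it; `penguins_stages` is an illustrative name, not something from the original tutorial.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Split on ", " (comma plus space) so "Egg stage" has no leading space,
# and store the result so later steps can use it
penguins_stages <- penguins_raw %>%
  separate(Stage, into = c("Stage", "Egg stage"), sep = ", ")

# Show just the two new columns
penguins_stages %>%
  select(Stage, `Egg stage`) %>%
  print()
```

Backticks are needed around `Egg stage` because the column name contains a space.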

more resources

We found out how to use the separate function by googling “tidyr separate function blog”.

Some useful links included:

https://blog.rstudio.com/2014/07/22/introducing-tidyr/

https://tidyr.tidyverse.org/reference/separate.html

Both links show examples of how to use the function, which were super helpful when making our own tutorial.