Notes on the tidyr package

This is a compilation of notes, with examples, on using the tidyr package in R.

Reshaping data using tidyr

The tidyr package provides four functions to help you change the layout of your data set.

gather(): collapses columns into rows
spread(): spreads rows into columns
separate(): splits a single column into multiple columns
unite(): unites multiple columns into one

Example data sets

We’ll use the built-in R data set USArrests. We start by subsetting a small data set, which will be used as the example data set in the next sections:

library(tidyr)
# Keep four states (rows 1, 10, 20 and 30) as a small example data set
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data

The row names are the state names, so let us use the function cbind() to add a column named “state” to the data. This makes the data tidy and the analysis easier.

# Turn the row names (state names) into an explicit "state" column
my_data <- cbind(state = rownames(my_data), my_data)
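As a possible alternative (our suggestion, not used elsewhere in these notes), the tibble package, which tidyr depends on, provides rownames_to_column() to move the row names into a named column in one call. The sketch below assumes tibble is installed; the names subset_data and my_data_alt are hypothetical.

```r
library(tibble)

# Rebuild the four-state subset used throughout these notes
subset_data <- USArrests[c(1, 10, 20, 30), ]

# Copy the row names into a new first column called "state"
my_data_alt <- rownames_to_column(subset_data, var = "state")
my_data_alt
```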

gather()

The function gather() collapses multiple columns into key-value pairs. It produces a “long” data format from a “wide” one. It is an alternative to the melt() function [in the reshape2 package].
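To make the wide-to-long idea concrete, here is a minimal sketch on a small made-up data frame. The scores data and its column names are hypothetical and not part of USArrests. Each gathered column contributes one row per student, and its name goes into the key column.

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per test
scores <- data.frame(
  student = c("a", "b"),
  test1 = c(80, 90),
  test2 = c(70, 85),
  stringsAsFactors = FALSE
)

# Collapse test1 and test2 into key-value pairs; "student" is excluded
# from gathering, so it is repeated for each key
scores_long <- gather(scores, key = "test", value = "score", -student)
scores_long
```

The result has 4 rows (2 students × 2 tests): the key column test holds the former column names and score holds their values.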

Simplified format
- data: A data frame
- key, value: Names of the key and value columns to create in the output
- ...: Columns to gather; prefix a column with a minus sign (e.g. -state) to exclude it
Examples of usage:
- Gather all columns except the state column

my_data2 <- gather(my_data, key = "arrest_attribute", value = "arrest_estimate", -state)

my_data2

(See note ¹ below.)
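The columns to gather can also be selected by name instead of excluding state. The sketch below rebuilds my_data so that it runs on its own, then checks that the column range Murder:Rape selects the same four columns as -state, so both calls give the same result. The names by_exclusion and by_range are only for illustration.

```r
library(tidyr)

# Rebuild the example data set
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Exclude the identifier column ...
by_exclusion <- gather(my_data, key = "arrest_attribute", value = "arrest_estimate", -state)
# ... or name the range of columns to gather (Murder through Rape)
by_range <- gather(my_data, key = "arrest_attribute", value = "arrest_estimate", Murder:Rape)

identical(by_exclusion, by_range)
```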

Gather only the Murder and Assault columns

# Keep only the state, Murder and Assault columns (columns 1 to 3)
new_data <- my_data[, 1:3]
new_data <- gather(new_data, key = "arrest_attribute", value = "arrest_estimate", -state)

new_data
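Subsetting first is one option. As an alternative sketch, the columns to gather can be listed directly. Columns that are not listed (here UrbanPop and Rape, as well as state) are then kept as identifier columns and repeated for each key. The name partial is hypothetical.

```r
library(tidyr)

my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Gather only Murder and Assault; state, UrbanPop and Rape are carried along
partial <- gather(my_data, key = "arrest_attribute", value = "arrest_estimate", Murder, Assault)
partial
```

This returns 8 rows (4 states × 2 gathered columns) with the columns state, UrbanPop, Rape, arrest_attribute and arrest_estimate.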

spread()

The function spread() does the reverse of gather(). It takes two columns (key and value) and spreads them into multiple columns. It produces a “wide” data format from a “long” one. It is an alternative to the dcast() function [in the reshape2 package].
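A minimal sketch of the long-to-wide direction, using a hypothetical scores data frame in which one student/test combination is missing. spread() creates one column per distinct key value. Missing combinations become NA unless the fill argument supplies a replacement, here 0.

```r
library(tidyr)

# Hypothetical long data: student "b" has no test2 score
scores_long <- data.frame(
  student = c("a", "b", "a"),
  test = c("test1", "test1", "test2"),
  score = c(80, 90, 70),
  stringsAsFactors = FALSE
)

# One column per distinct value of "test"; the missing cell is set to 0
scores_wide <- spread(scores_long, key = "test", value = "score", fill = 0)
scores_wide
```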

Simplified format

data: A data frame
key: The name of the column whose values will be used as column headings
value: The name of the column whose values will populate the cells

Examples of usage

Spread “my_data2” to turn it back into the original data:

my_data3 <- spread(my_data2, key = "arrest_attribute", value = "arrest_estimate")

my_data3
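Because spread() undoes gather(), a round trip should recover the original values. The sketch below checks this. Since spread() may not return the columns or rows in their original order, it compares the set of column names and then matches rows by state before comparing values. The helper names same_columns, same_values and value_cols are hypothetical.

```r
library(tidyr)

my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Long format and back again
my_data2 <- gather(my_data, key = "arrest_attribute", value = "arrest_estimate", -state)
my_data3 <- spread(my_data2, key = "arrest_attribute", value = "arrest_estimate")

# spread() recreates the same set of columns ...
same_columns <- setequal(names(my_data3), names(my_data))

# ... and, once rows are matched by state, the same values in each column
my_data3 <- my_data3[match(my_data$state, my_data3$state), ]
value_cols <- c("Murder", "Assault", "UrbanPop", "Rape")
same_values <- all(vapply(value_cols, function(col) {
  isTRUE(all.equal(as.numeric(my_data3[[col]]), as.numeric(my_data[[col]])))
}, logical(1)))

c(same_columns = same_columns, same_values = same_values)
```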

unite()

The function unite() takes multiple columns and pastes them together into one column.

Simplified format

data: A data frame
col: The name of the new column to add
...: The columns to unite
sep: Separator to use between values
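A minimal sketch with a hypothetical data frame of date parts (the dates data and its columns are made up for illustration). The columns listed after col are pasted together, in that order, with sep between them, and by default they are dropped from the result.

```r
library(tidyr)

# Hypothetical data: a date stored as three separate character columns
dates <- data.frame(
  year = c("2020", "2021"),
  month = c("01", "06"),
  day = c("15", "30"),
  stringsAsFactors = FALSE
)

# Paste year, month and day into a single "date" column, separated by "-"
dates_united <- unite(dates, col = "date", year, month, day, sep = "-")
dates_united
```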

Examples of usage

The R code below uses the data set “my_data” and unites the columns Murder and Assault.

my_data4 <- unite(my_data, col = "Murder_Assault", Murder, Assault, sep = "_")

my_data4
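By default unite() drops the columns it combines, so my_data4 no longer has separate Murder and Assault columns. As a sketch, passing remove = FALSE keeps them alongside the new column. The check below also confirms that the united values are the two columns pasted together with "_". The name my_data4_kept is hypothetical.

```r
library(tidyr)

my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Same call as above, but keep the original Murder and Assault columns
my_data4_kept <- unite(my_data, col = "Murder_Assault", Murder, Assault,
                       sep = "_", remove = FALSE)
names(my_data4_kept)

# The united column equals pasting the two columns with "_"
identical(my_data4_kept$Murder_Assault,
          paste(my_data$Murder, my_data$Assault, sep = "_"))
```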

separate()

The function separate() is the reverse of unite(). It takes the values in a single character column and splits them into multiple columns.

Simplified format

data: A data frame
col: The name of the column to separate
into: A character vector specifying the names of the new variables to create
sep: The separator between columns:
- If character, it is interpreted as a regular expression.
- If numeric, it is interpreted as positions to split at. Positive values start at 1 at the far left of the string; negative values start at -1 at the far right of the string.
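The two kinds of sep described above can be sketched on small hypothetical data (the codes and mixed data frames are made up for illustration). A positive position counts from the left, a negative one from the right, and a character sep is treated as a regular expression.

```r
library(tidyr)

# Hypothetical codes made of two letters followed by three digits
codes <- data.frame(code = c("AB123", "CD456"), stringsAsFactors = FALSE)

# Positive position: split after the 2nd character, counting from the left
from_left <- separate(codes, col = "code", into = c("letters", "digits"), sep = 2)

# Negative position: split before the last 3 characters, counting from the right
from_right <- separate(codes, col = "code", into = c("letters", "digits"), sep = -3)

# Character sep is a regular expression: split on either "-" or "_"
mixed <- data.frame(x = c("a-1", "b_2"), stringsAsFactors = FALSE)
by_regex <- separate(mixed, col = "x", into = c("letter", "number"), sep = "[-_]")

from_left
from_right
by_regex
```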

Examples of Usage

Separate the column “Murder_Assault” into two columns.

separate(my_data4, col = "Murder_Assault", into = c("Murder", "Assault"), sep = "_")
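By default separate() returns character columns, so the result above holds strings such as "13.2" rather than numbers. In the sketch below, which rebuilds my_data4 so it runs on its own, setting convert = TRUE makes separate() run type.convert() on the new columns and restores numeric values. The names as_text and restored are hypothetical.

```r
library(tidyr)

my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)
my_data4 <- unite(my_data, col = "Murder_Assault", Murder, Assault, sep = "_")

# Without convert, the split columns are character vectors
as_text <- separate(my_data4, col = "Murder_Assault", into = c("Murder", "Assault"), sep = "_")
class(as_text$Murder)

# With convert = TRUE, they are converted back to numbers
restored <- separate(my_data4, col = "Murder_Assault", into = c("Murder", "Assault"),
                     sep = "_", convert = TRUE)
class(restored$Murder)
all.equal(restored$Murder, my_data$Murder)
```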

¹ Note that all column names except “state” have been collapsed into a single key column (arrest_attribute), and their values into a single value column (arrest_estimate).