This is an R Markdown file.

This is a compilation of notes with examples of using the tidyr package in R.

To begin with, let us load the package:

library(tidyr)

What is a tidy data set?

A data set is called tidy when each column represents a variable and each row represents an observation.

Reshaping data using tidyr

The tidyr package provides four functions to help you change the layout of your data set.

  1. gather(): collapses columns into rows
  2. spread(): spreads rows into columns
  3. separate(): splits a single column into multiple columns
  4. unite(): unites multiple columns into one

Example data sets

We’ll use R’s built-in USArrests data set. We start by taking a small subset of it, which serves as the example data set in the following sections:
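The subsetting code did not survive in these notes, so the following is only a minimal sketch. The name my_data matches the name used later in the text, but the choice of rows 1, 10, 20 and 30 is an assumption made for illustration.

```r
# USArrests ships with base R (datasets package), so no import is needed.
# Assumption: the four row indices below are an arbitrary illustrative choice.
my_data <- USArrests[c(1, 10, 20, 30), ]

# Show the small example data set; the states are stored as row names
my_data
```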

Row names are states, so let us use the function cbind() to add a column named “state” to the data. This will make the data tidy and the analysis easier.
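One way this step might look is sketched below. It rebuilds the subset first so that the snippet runs on its own; the row indices are again an illustrative assumption.

```r
# Assumption: same illustrative subset of USArrests as before
my_data <- USArrests[c(1, 10, 20, 30), ]

# Copy the row names (the states) into a proper first column named "state"
my_data <- cbind(state = rownames(my_data), my_data)

my_data
```

Putting the new column first keeps the identifier next to the measurements when the data is printed.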

gather()

The function gather() collapses multiple columns into key-value pairs. It produces a “long” data format from a “wide” one. It’s an alternative to the melt() function [in the reshape2 package].

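  1. Simplified format: gather(data, key, value, ...)
    • data: A data frame
    • key, value: Names of the key and value columns to create in the output
    • ...: The columns to gather; use -x to exclude a column x
  2. Examples of usage:
    • Gather all columns except the column state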
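The first example can be sketched as follows. The result name my_data2 is taken from the text further down; the key and value column names (arrest_attribute, arrest_estimate) and the row indices are assumptions chosen for illustration. In recent tidyr versions gather() is superseded by pivot_longer(), but it is still available.

```r
library(tidyr)

# Assumption: illustrative subset of USArrests with a "state" column
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Gather every column except state into key-value pairs.
# The key/value column names are hypothetical, chosen for readability.
my_data2 <- gather(my_data,
                   key = "arrest_attribute",
                   value = "arrest_estimate",
                   -state)

my_data2
```

Each of the four states now appears once per gathered column, so the long table has one row per state and measure.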

Gather only the Murder and Assault columns:
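A possible version of this second example is shown below. The object name murder_assault_long and the key and value column names are hypothetical; only the gathered columns come from the text.

```r
library(tidyr)

# Assumption: illustrative subset of USArrests with a "state" column
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Name the columns to gather explicitly; all other columns are left as they are
murder_assault_long <- gather(my_data,
                              key = "arrest_attribute",
                              value = "arrest_estimate",
                              Murder, Assault)

murder_assault_long
```

Columns that are not listed (state, UrbanPop and Rape) are kept and repeated for each gathered row.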

spread()

The function spread() does the reverse of gather(). It takes two columns (key and value) and spreads them into multiple columns. It produces a “wide” data format from a “long” one. It’s an alternative to the dcast() function [in the reshape2 package].

  1. Simplified format: spread(data, key, value)
    • data: A data frame
    • key: The name of the column whose values will be used as column headings
    • value: The name of the column whose values will populate the cells
  2. Examples of usage

Spread “my_data2” to turn it back into the original data:
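The snippet below is a sketch of that round trip. It rebuilds my_data2 first so that it runs on its own; the row indices, the key and value column names, and the result name my_data_wide are assumptions for illustration.

```r
library(tidyr)

# Assumption: illustrative subset of USArrests with a "state" column
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Long format, as produced by gathering all columns except state
my_data2 <- gather(my_data,
                   key = "arrest_attribute",
                   value = "arrest_estimate",
                   -state)

# Spread the key column back into one column per measure
my_data_wide <- spread(my_data2,
                       key = arrest_attribute,
                       value = arrest_estimate)

my_data_wide
```

The values match the original data, although the column order may differ because spread() builds the new columns from the values of the key column.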

unite()

The function unite() takes multiple columns and pastes them together into one.

  1. Simplified format: unite(data, col, ..., sep = "_")
    • data: A data frame
    • col: The name of the new column to add
    • ...: The columns to unite
    • sep: Separator to use between values
  2. Examples of usage

The R code below uses the data set “my_data” and unites the columns Murder and Assault.
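A minimal sketch of this example follows. The new column name Murder_Assault is taken from the separate() section of these notes; the row indices and the result name my_data_united are assumptions.

```r
library(tidyr)

# Assumption: illustrative subset of USArrests with a "state" column
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Paste Murder and Assault together into one character column,
# using "_" between the two values
my_data_united <- unite(my_data,
                        col = "Murder_Assault",
                        Murder, Assault,
                        sep = "_")

my_data_united
```

By default unite() removes the input columns; passing remove = FALSE keeps them alongside the new column.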

separate()

The function separate() is the reverse of unite(). It takes values inside a single character column and separates them into multiple columns.

  1. Simplified format: separate(data, col, into, sep = "[^[:alnum:]]+")
    • data: A data frame
    • col: The name of the column to separate
    • into: Character vector specifying the names of the new variables to be created
    • sep: Separator between columns:
      • If character, it is interpreted as a regular expression.
      • If numeric, it is interpreted as positions to split at. Positive values start at 1 at the far left of the string; negative values start at -1 at the far right of the string.
  2. Examples of usage

Separate the column “Murder_Assault” into two columns.
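The sketch below first recreates a united column and then splits it again. The object names and row indices are assumptions. Note that sep = "_" is given explicitly: the default separator matches any non-alphanumeric character, so it would also split at the decimal point in the Murder values. The last call uses a small hypothetical data frame to illustrate a numeric sep.

```r
library(tidyr)

# Assumption: illustrative subset of USArrests with a "state" column
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Build the united column that is to be separated again
my_data_united <- unite(my_data, col = "Murder_Assault",
                        Murder, Assault, sep = "_")

# Character sep: split at the underscore only
my_data_sep <- separate(my_data_united,
                        col = "Murder_Assault",
                        into = c("Murder", "Assault"),
                        sep = "_")
my_data_sep

# Numeric sep: split by position (hypothetical data, for illustration only)
codes <- data.frame(code = c("AL01", "GA10"))
codes_sep <- separate(codes, col = "code", into = c("abbr", "num"), sep = 2)
codes_sep
```

The separated columns are character vectors; separate() has a convert argument (convert = TRUE) that turns them back into numeric columns where possible.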


Note that in the first gather() example, all column names except “state” have been collapsed into a single key column.