This is a compilation of notes with examples on how to use the tidyr package in R.
To begin with, let us load the package.
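The loading step might look like the sketch below. It assumes tidyr is already installed; the commented-out install line is only a reminder for a first-time setup.

```r
# install.packages("tidyr")  # run once if tidyr is not yet installed
library(tidyr)               # attach the package for the current session
```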
What is a tidy data set?
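A data set is called tidy when each column represents a variable and each row represents an observation.
The tidyr package provides four functions to help you change the layout of your data set: gather(), spread(), unite() and separate().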
Example data sets
We’ll use the R built-in USArrests data set. We start by subsetting a small data set, which will be used in the next sections as an example data set:
Row names are states, so let us use the function cbind() to add a column named “state” to the data. This will make the data tidy and the analysis easier.
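A minimal sketch of these two steps follows. The row indices are an assumption made for illustration (any small subset would do); the name my_data matches the object used in the rest of these notes.

```r
# Assumption: rows 1, 10, 20 and 30 are an arbitrary pick for a small example
my_data <- USArrests[c(1, 10, 20, 30), ]

# Copy the row names (the state names) into a regular column called "state"
my_data <- cbind(state = rownames(my_data), my_data)
my_data
```

Putting state first keeps the identifier column at the left, which makes the later reshaping calls easier to read.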
The function gather() collapses multiple columns into key-value pairs. It produces a “long” data format from a “wide” one. It’s an alternative to the melt() function in the reshape2 package.
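As a sketch of the general case, the block below gathers every column except state. The data preparation is repeated so the block runs on its own; the row subset and the name long_data are assumptions chosen for illustration.

```r
library(tidyr)

# Example data (assumed subset of USArrests, with state as a column)
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Collapse all columns except state into a key column and a value column
long_data <- gather(my_data,
                    key = "arrest_attribute",
                    value = "arrest_estimate",
                    -state)
long_data
```

Each of the four states now appears once per gathered column, so the result has 16 rows.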
Gather only the Murder and Assault columns:
# Keep the state column plus Murder and Assault (columns 1 to 3)
new_data <- my_data[, 1:3]
# Collapse every remaining column except state into key-value pairs
new_data <- gather(new_data, key = "arrest_attribute", value = "arrest_estimate", -state)
new_data
The function spread() does the reverse of gather(). It takes two columns (key and value) and spreads them into multiple columns. It produces a “wide” data format from a “long” one. It’s an alternative to the cast() function in the reshape2 package.
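To illustrate spread(), the sketch below first builds a long data set and then spreads it back out. The row subset and the names long_data and wide_data are assumptions made for this example.

```r
library(tidyr)

# Example data (assumed subset of USArrests, with state as a column)
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Long format: one row per state and arrest attribute
long_data <- gather(my_data[, 1:3],
                    key = "arrest_attribute",
                    value = "arrest_estimate",
                    -state)

# Wide format again: one column per distinct value of the key column
wide_data <- spread(long_data,
                    key = "arrest_attribute",
                    value = "arrest_estimate")
wide_data
```

The spread columns may come back in a different order than in the original data, so it is safer to refer to them by name than by position.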
The function unite() takes multiple columns and pastes them together into one.
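A small sketch of unite() is shown below. It assumes we want to combine Murder and Assault into a column named Murder_Assault, joined by an underscore; the row subset and the name united_data are also assumptions.

```r
library(tidyr)

# Example data (assumed subset of USArrests, with state as a column)
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Paste Murder and Assault together into one character column
united_data <- unite(my_data,
                     col = "Murder_Assault",
                     Murder, Assault,
                     sep = "_")
united_data
```

By default the input columns are dropped from the result; passing remove = FALSE keeps them alongside the new column.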
The function separate() is the reverse of unite(). It takes values inside a single character column and separates them into multiple columns.
Separate the column “Murder_Assault” into two columns:
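The sketch below shows one way this could be done. The united data set is rebuilt first so the block runs on its own; the row subset and the names united_data and separated_data are assumptions for illustration.

```r
library(tidyr)

# Example data (assumed subset of USArrests, with state as a column)
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)
united_data <- unite(my_data, col = "Murder_Assault", Murder, Assault, sep = "_")

# Split Murder_Assault at the underscore into two columns
separated_data <- separate(united_data,
                           col = "Murder_Assault",
                           into = c("Murder", "Assault"),
                           sep = "_")
separated_data
```

The separator is given explicitly because the default splits on any non-alphanumeric character, which would also break values at the decimal point. The new columns are character vectors unless convert = TRUE is passed.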
Note that all column names have been collapsed into a single column, except for the “state” column.