This vignette will cover the pivot_longer and pivot_wider functions. These functions are an alternative to gather() and spread() while incorporating features from other packages.
The gather and spread functions are used to “tidy” data, rearranging data that should be in rows in the columns or vice versa. Gather() moves columns into rows while the spread() function moves rows into columns.
These functions are great, however, the syntax is not easy to remember and takes multiple tries to get it right. The functions pivot_longer() and pivot_wider() are meant to replace gather() and spread(), in a more intuitive way.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
knitr::opts_chunk$set(echo = TRUE)
The dataset used to demonstrate pivot functions is the drug-use-by-age dataset from 538.
drug_df_0 = read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## age = col_character(),
## `cocaine-frequency` = col_character(),
## `crack-frequency` = col_character(),
## `heroin-frequency` = col_character(),
## `inhalant-frequency` = col_character(),
## `oxycontin-frequency` = col_character(),
## `meth-frequency` = col_character()
## )
## See spec(...) for full column specifications.
glimpse(drug_df_0)
## Observations: 17
## Variables: 28
## $ age <chr> "12", "13", "14", "15", "16", "17", "18",...
## $ n <dbl> 2798, 2757, 2792, 2956, 3058, 3038, 2469,...
## $ `alcohol-use` <dbl> 3.9, 8.5, 18.1, 29.2, 40.1, 49.3, 58.7, 6...
## $ `alcohol-frequency` <dbl> 3, 6, 5, 6, 10, 13, 24, 36, 48, 52, 52, 5...
## $ `marijuana-use` <dbl> 1.1, 3.4, 8.7, 14.5, 22.5, 28.0, 33.7, 33...
## $ `marijuana-frequency` <dbl> 4, 15, 24, 25, 30, 36, 52, 60, 60, 52, 52...
## $ `cocaine-use` <dbl> 0.1, 0.1, 0.1, 0.5, 1.0, 2.0, 3.2, 4.1, 4...
## $ `cocaine-frequency` <chr> "5.0", "1.0", "5.5", "4.0", "7.0", "5.0",...
## $ `crack-use` <dbl> 0.0, 0.0, 0.0, 0.1, 0.0, 0.1, 0.4, 0.5, 0...
## $ `crack-frequency` <chr> "-", "3.0", "-", "9.5", "1.0", "21.0", "1...
## $ `heroin-use` <dbl> 0.1, 0.0, 0.1, 0.2, 0.1, 0.1, 0.4, 0.5, 0...
## $ `heroin-frequency` <chr> "35.5", "-", "2.0", "1.0", "66.5", "64.0"...
## $ `hallucinogen-use` <dbl> 0.2, 0.6, 1.6, 2.1, 3.4, 4.8, 7.0, 8.6, 7...
## $ `hallucinogen-frequency` <dbl> 52, 6, 3, 4, 3, 3, 4, 3, 2, 4, 3, 2, 3, 2...
## $ `inhalant-use` <dbl> 1.6, 2.5, 2.6, 2.5, 3.0, 2.0, 1.8, 1.4, 1...
## $ `inhalant-frequency` <chr> "19.0", "12.0", "5.0", "5.5", "3.0", "4.0...
## $ `pain-releiver-use` <dbl> 2.0, 2.4, 3.9, 5.5, 6.2, 8.5, 9.2, 9.4, 1...
## $ `pain-releiver-frequency` <dbl> 36, 14, 12, 10, 7, 9, 12, 12, 10, 15, 15,...
## $ `oxycontin-use` <dbl> 0.1, 0.1, 0.4, 0.8, 1.1, 1.4, 1.7, 1.5, 1...
## $ `oxycontin-frequency` <chr> "24.5", "41.0", "4.5", "3.0", "4.0", "6.0...
## $ `tranquilizer-use` <dbl> 0.2, 0.3, 0.9, 2.0, 2.4, 3.5, 4.9, 4.2, 5...
## $ `tranquilizer-frequency` <dbl> 52.0, 25.5, 5.0, 4.5, 11.0, 7.0, 12.0, 4....
## $ `stimulant-use` <dbl> 0.2, 0.3, 0.8, 1.5, 1.8, 2.8, 3.0, 3.3, 4...
## $ `stimulant-frequency` <dbl> 2.0, 4.0, 12.0, 6.0, 9.5, 9.0, 8.0, 6.0, ...
## $ `meth-use` <dbl> 0.0, 0.1, 0.1, 0.3, 0.3, 0.6, 0.5, 0.4, 0...
## $ `meth-frequency` <chr> "-", "5.0", "24.0", "10.5", "36.0", "48.0...
## $ `sedative-use` <dbl> 0.2, 0.1, 0.2, 0.4, 0.2, 0.5, 0.4, 0.3, 0...
## $ `sedative-frequency` <dbl> 13.0, 19.0, 16.5, 30.0, 3.0, 6.5, 10.0, 6...
Now, lets introduce the pivot_longer() function. As the name impllies, it makes datasets longer by increasing the number of rows and decreasing the number of columns. Let’s imagine we only care about total drug users per category and not frequency. In this dataset, frequency is a character type and we would need to convert to dbl in order to use the function.
#Let's clean the data and get rid of the unnecessary columns
drug_df <- drug_df_0[,-c(4,6,8,10,12,14,16,18,20,22,24,26,28)]
drug_df_morphin_time <- drug_df %>% pivot_longer(-c(age, n), names_to = "Drugs", values_to = "User_%" )
drug_df_morphin_time
## # A tibble: 221 x 4
## age n Drugs `User_%`
## <chr> <dbl> <chr> <dbl>
## 1 12 2798 alcohol-use 3.9
## 2 12 2798 marijuana-use 1.1
## 3 12 2798 cocaine-use 0.1
## 4 12 2798 crack-use 0
## 5 12 2798 heroin-use 0.1
## 6 12 2798 hallucinogen-use 0.2
## 7 12 2798 inhalant-use 1.6
## 8 12 2798 pain-releiver-use 2
## 9 12 2798 oxycontin-use 0.1
## 10 12 2798 tranquilizer-use 0.2
## # ... with 211 more rows
In the above example, I chose to create a new data frame but that was a personal preference not mandatory for the use of the pivot_longer function.
To illustrate the pivot_wider function, we are going to revert the dataset back to wide.
drug_df_widen <- drug_df_morphin_time %>% pivot_wider(names_from = `Drugs`, values_from = `User_%` )
drug_df_widen
## # A tibble: 17 x 15
## age n `alcohol-use` `marijuana-use` `cocaine-use` `crack-use`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12 2798 3.9 1.1 0.1 0
## 2 13 2757 8.5 3.4 0.1 0
## 3 14 2792 18.1 8.7 0.1 0
## 4 15 2956 29.2 14.5 0.5 0.1
## 5 16 3058 40.1 22.5 1 0
## 6 17 3038 49.3 28 2 0.1
## 7 18 2469 58.7 33.7 3.2 0.4
## 8 19 2223 64.6 33.4 4.1 0.5
## 9 20 2271 69.7 34 4.9 0.6
## 10 21 2354 83.2 33 4.8 0.5
## 11 22-23 4707 84.2 28.4 4.5 0.5
## 12 24-25 4591 83.1 24.9 4 0.5
## 13 26-29 2628 80.7 20.8 3.2 0.4
## 14 30-34 2864 77.5 16.4 2.1 0.5
## 15 35-49 7391 75 10.4 1.5 0.5
## 16 50-64 3923 67.2 7.3 0.9 0.4
## 17 65+ 2448 49.3 1.2 0 0
## # ... with 9 more variables: `heroin-use` <dbl>, `hallucinogen-use` <dbl>,
## # `inhalant-use` <dbl>, `pain-releiver-use` <dbl>, `oxycontin-use` <dbl>,
## # `tranquilizer-use` <dbl>, `stimulant-use` <dbl>, `meth-use` <dbl>,
## # `sedative-use` <dbl>
The new dataset is the original dataset.
There are more complicated uses of pivot_longer() and pivot_wider(). Further exploration of the pivot functions can be found here:
https://tidyr.tidyverse.org/articles/pivot.html
But even at it’s most basic use, the pivot functions are intuitive and easy to use.