Data Journalism with R - Storybench tidyr pivoting

Martin Frigaard 2019-07-12

Objectives
Part one: Tidy data
- The tidy data principles
  - Indexed vs. Cartesian
    - Indexed data
    - Cartesian data
Part two: Grouping
Part three: Pivoting
- Pivoting == rotating 90˚
Pivoting Example 1: Categorical/ordinal variable across multiple columns
Pivoting Example 2: Moving categorical or ordinal variables across columns
- Pivoting wider options
- Read more

TLDR: This tutorial was prompted by the recent changes to the tidyr package (see the tweet from Hadley Wickham below). Two functions for reshaping columns and rows (gather() and spread()) were replaced with tidyr::pivot_longer() and tidyr::pivot_wider() functions.

Thanks to all 2649 (!!!) people who completed my survey about table shapes! I've done analysed the data at https://t.co/hyu1o91xRm and the new functions will be called pivot_longer() and pivot_wider() #rstats

— Hadley Wickham (@hadleywickham) March 24, 2019

# this will require the newest version of tidyr from github
# devtools::install_github("tidyverse/tidyr")
library(tidyr)

TIDY⬢TIDY⬢TIDY⬢TIDY⬢TIDY⬢TIDY⬢TID⬢TIDY
 ⬢ __  _    __   .    ⬡           ⬢  . 
  / /_(_)__/ /_ ___  _____ _______ ___ 
 / __/ / _  / // / |/ / -_) __(_-</ -_)
 \__/_/\_,_/\_, /|___/\__/_/ /___/\__/ 
      ⬢  . /___/      ⬡      .       ⬢
TIDY⬢TIDY⬢TIDY⬢TIDY⬢TIDY⬢TIDY⬢TID⬢TIDY

Objectives

This tutorial will cover three concepts about working with data in the tidyverse:

tidy data,
grouping, and
the new pivoting verbs in tidyr

A solid understanding of these topics makes it easier to manipulate and re-structure your data for visualizations and modeling in the tidyverse.

I’ll be using examples of spreadsheets that were designed for data entry, not necessarily statistical modeling or graphics.

My goal is that by showing the reasoning behind the data entry process, you’ll walk away with a better understanding (and hopefully a little less frustration) for why data are collected in so many different ways. To follow along with this tutorial, download the data here.

Part one: Tidy data

Tidy data is… “a standardized way to link the structure of a dataset (its physical layout) with its semantics (its meaning)”

If you’ve worked with SQL and relational databases, you’ll recognize most of these concepts. Hadley Wickham distilled a lot of the technical jargon from Edgar F. Codd’s ‘normal form’ and applied it to statistical terms. More importantly, he translated these essential principles into concepts and terms a broader audience can grasp and use for data manipulation.

base::load(file = "data/tidyr-data.RData")

The tidy data principles

Tidy data refers to ‘rectangular’ data. These are the data we typically see in spreadsheet software like Googlesheets, Microsoft Excel, or in a relational database like MySQL, PostgreSQL, or Microsoft Access, The three principles for tidy data are:

Variables make up the columns
Observations (or cases) go in the rows
Values are in cells

Put them together, and these three statements make up the contents in a tidy data frame or tibble. While these principles might seem obvious at first, many of the data arrangements we encounter in real life don’t adhere to this guidance.

Indexed vs. Cartesian

I prefer to refer to tidy data as “indexed”, and wide data as “Cartesian” (this terminology is from the ggplot2 text) because these terms help me understand what is happening when the data get transformed. Two examples are below:

Indexed data

In the Indexed data frame, the group and number variables are used to keep track of each unique value of measure.

group	number	measure
A	ID001	1098
C	ID001	3049
D	ID003	2394
B	ID004	9301
C	ID006	4092

Cartesian data

The same data are represented in the Cartesian data frame, but in this table the measure values are at the intersection of group and each ID### (the number from the previous table).

group	ID001	ID003	ID004	ID006
A	1098	NA	NA	NA
B	NA	NA	9301	NA
C	3049	NA	NA	4092
D	NA	2394	NA	NA

As you can see, the Cartesian orientation uses up more columns (and fills in the gaps with missing (NA) values).

Part two: Grouping

Grouping produces summary data tables using functions from the dplyr package. Similar to GROUP BY in SQL, dplyr::group_by() silently groups a data frame (which means we don’t see any changes) and then applies aggregate functions using dplyr::summarize(). Examples of these are mean(), median(), sum(), n(), sd(), etc.

Take the data frame below, DataTibble, which has 5 variables:

group_var - a categorical group
year - the calendar year the measurements were collected
x_measurement and y_measurement - these are randomly generated numbers
ordinal_y_var - this is an ordinal variable corresponding to the values in y_measurement (greater than or equal to 800 is "high" or 3, greater than or equal to 500 and less than 750 is "med" or 2, and less than 500 is "low" or 1).

group_var	year	x_measurement	y_measurement	ordinal_y_var
A	2018	11.81	532.37	med
A	2017	28.46	116.04	low
A	2016	49.15	304.21	low
B	2018	87.56	719.38	med
B	2017	11.33	984.38	high
C	2018	15.87	959.41	high
C	2017	63.76	962.27	high
C	2016	96.03	744.52	med

If I apply dplyr::group_by() to the group_var in DataTibble, I will see no visible result.

DataTibble %>%
  dplyr::group_by(group_var)

group_var	year	x_measurement	y_measurement	ordinal_y_var
A	2018	11.81	532.37	med
A	2017	28.46	116.04	low
A	2016	49.15	304.21	low
B	2018	87.56	719.38	med
B	2017	11.33	984.38	high
C	2018	15.87	959.41	high
C	2017	63.76	962.27	high
C	2016	96.03	744.52	med

But when I combine dplyr::group_by() with dplyr::summarize(), I can collapse DataTibble into a smaller table by supplying an aggregate function to summarize(). Below I use summarize() to get the mean of x_measurement and y_measurement and n() to get the total number in each group.

DataTibble %>%
  dplyr::group_by(group_var) %>%
  dplyr::summarize(
    x_mean = mean(x_measurement),
    y_mean = mean(y_measurement),
    no = n()
  )

group_var	x_mean	y_mean	no
A	29.80667	317.5400	3
B	49.44500	851.8800	2
C	58.55333	888.7333	3

Grouping can also work with categorical/factor variables. The code below uses dplyr::count() to summarize the number of ordinal_y_var levels per category of group_var.

DataTibble %>%
  dplyr::count(group_var, ordinal_y_var)

group_var	ordinal_y_var	n
A	med	1
A	low	2
B	high	1
B	med	1
C	high	2
C	med	1

This table isn’t as easy to read, because all of the information is oriented vertically. I can move the values of group_var into individual columns to make it easier on the eyes using tidyr::spread().

DataTibble %>%
  dplyr::count(group_var,ordinal_y_var) %>%
  tidyr::spread(key = group_var,value = n)

ordinal_y_var	A	B	C
high	NA	1	2
med	1	1	1
low	2	NA	NA

Notice how this creates a table with different dimensions? This arrangement can quickly be undone with the accompanying tidyr::gather() function.

tidyr::gather() works a lot like the tidyr::spread() function, but also requires us to specify that the missing values should be removed (na.rm = TRUE).

I also added a dplyr::select() statement to arrange these values so they are similar to the table above.

DataTibble %>%
  dplyr::count(group_var, ordinal_y_var) %>%
  tidyr::spread(key = group_var, value = n) %>%
  tidyr::gather(key = group_var, value = "n", 
                -ordinal_y_var, na.rm = TRUE) %>%
  dplyr::select(group_var, ordinal_y_var, n)

group_var	ordinal_y_var	n
A	med	1
A	low	2
B	high	1
B	med	1
C	high	2
C	med	1

Part three: Pivoting

All this data manipulation brings us to pivoting, the recent additions to the tidyr package. These will be slowly replacing the functions I used above for reshaping data frames, tidyr::gather() and tidyr::spread(). I found it refreshing to learn that I wasn’t the only person struggling to use these functions. Hadley Wickham, the package developer/author, confessed he also struggles when using these functions,

Many people don’t find the names intuitive and find it hard to remember which direction corresponds to spreading and which to gathering. It also seems surprisingly hard to remember the arguments to these functions, meaning that many people (including me!) have to consult the documentation every time.

Decisions like these are examples of why I appreciate the tidyverse, because I can tell a lot of thought gets put into identifying language that accurately capture the users intentions. Knowing how to reshape data is an important skill for any data scientist, and the tidyr::pivot_ functions make restructuring very clear and explicit.

Pivoting == rotating 90˚

Whenever we use the pivot_ functions, we’re changing angles between the columns and rows. If the tables are pivoting from wide to longer, the column names and values rotate 90˚ into an index row.

When pivoting from long to wide, the index variable shifts 90˚ across the column names, and the values slide in underneath the names.

These two tidyr::pivot_ functions give users the ability to quickly rotate their data from columns to rows (and back), and provide many arguments for customizing the new data orientations. Tidying data is a great skill to start with because most of the data you’ll encounter in the tidyverse is going to be in columns and rows (or you will want to get them that way).

Pivoting Example 1: Categorical/ordinal variable across multiple columns

Vasily Lomachenko, is currently the best pound-for-pound boxer in the world. We’re going to start by manipulating a data set of Lomachenko’s fights from the BoxRec database. The fight information is oriented in a way that makes sense for the person entering the data, but it’s not ideal for analysis or modeling.

LomaWideSmall %>% utils::head()

opponent	date	fight_1	fight_2	fight_3	fight_4	fight_5	fight_6	fight_7	fight_8	fight_9	fight_10	fight_11	fight_12	fight_13	fight_14	fight_15
José Ramírez	2013-10-12	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Orlando Salido	2014-03-01	NA	Loss	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Gary Russell Jr.	2014-06-21	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Chonlatarn Piriyapinyo	2014-11-22	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Gamalier Rodríguez	2015-05-02	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Romulo Koasicha	2015-11-07	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA

This data arrangement is Cartesian, because each fight number has it’s own column (fight_1 through fight_14), and the result (Win/Loss) is located at the intersection between the fight numbered columns and the corresponding opponent and date. Having the data in this arrangement might seem odd to a data scientist, but this configuration makes sense for a fan entering the fights into a spreadsheet as they happen in real time. Consider the chronological chain of events involved with a each fight,

An opponent is announced, and our excited fan enters the information into the first two column/rows in a spreadsheet and titles it, ’Lomachenko`
The date for the first fight gets entered into the B column (the second in the table),
In order to track an athlete’s win/loss record over the course of their career, a number is also marked for each fight (starting with 1) in a column titled, fight_1. and the result gets put in the corresponding cell (Win or Loss)

As you can see, the steps above are sensible for someone wanting to track their favorite athlete (or sports team) over time. I’ve encountered excel tables with orientations that mirror the sequence of events they’re capturing. This “data-entry friendly” format captures enough information to be useful, and it has a few computational abilities. For example, a fan could use filtering to count the number of matches against a particular opponent, or sort the date column to figure what Lomachenko’s late fight. Spreadsheets like these are a cross between a timeline and a record book, and they do a good enough job at both tasks to justify their current structure.

The pivot vignette conveys that ‘wide’ formats or data arrangements are common because their primary purpose and design is centered around recording data (and not visualization or modeling),

Pivoting long

We are going to start by pivoting the LomaWideSmall data frame to a long or ‘tidy’ format. We accomplish this using the tidyr::pivot_longer() function.

LomaWideSmall %>%
    
  tidyr::pivot_longer(
      
    cols = starts_with("fight"),

    names_to = "fight_no",

    values_to = "result",

    names_prefix = "fight_",

    values_drop_na = TRUE)

opponent	date	fight_no	result
José Ramírez	2013-10-12	1	Win
Orlando Salido	2014-03-01	2	Loss
Gary Russell Jr.	2014-06-21	3	Win
Chonlatarn Piriyapinyo	2014-11-22	4	Win
Gamalier Rodríguez	2015-05-02	5	Win
Romulo Koasicha	2015-11-07	6	Win
Román Martínez	2016-06-11	7	Win
Nicholas Walters	2016-11-26	8	Win
Jason Sosa	2017-04-08	9	Win
Miguel Marriaga	2017-08-05	10	Win
Guillermo Rigondeaux	2017-12-09	11	Win
Jorge Linares	2018-05-12	12	Win
José Pedraza	2018-12-08	13	Win
Anthony Crolla	2019-04-12	14	Win

What happens in pivot_longer()?

tidyr::pivot_longer() extends the tidyr::spread() function by adding a few arguments to make this re-structuring more deliberate:

The cols argument contains the columns we want to pivot (and it takes tidyselect helper functions). These are going to be the new index columns.
The names_to and values_to are the new columns we will be creating from all of the fight_ variables. The names_to is the new index column, meaning how would you ‘look up’ the value in an adjacent column. The values_to are the contents of the cells that correspond to each value in the names_to variable.
The names_prefix is a regular expression pattern I can use to clean the old variable names as they become the values in the new columns. I enter fight_ here because is will remove the text from the front the variable name and only enter the number into the new fight_no variable.

Unfortunately, we can see that the new fight_no variable is still formatted as a character. Luckily, the new tidyr::pivot_longer() function also has a tidyr::pivot_longer_spec(), which allows for additional specifications on the data transformation.

Pivoting longer (with specifications)

A pivoting spec is a data frame that describes the metadata stored in the column name, with one row for each column, and one column for each variable mashed into the column name.

The tidyr::pivot_longer_spec() function allows even more specifications on what to do with the data during the transformation. Because we are still in the tidyverse, the output is a data fame. Explicitly creating an object that contains data about your data is not necessarily a novel concept in R, but it’s very handy when that object is similar to the other objects you’re already working with (i.e. a data frame or tibble).

Creating a spec for the LomaWideSmall data frame is a three-step process:

1) Define the arguments for reshaping the data

I enter the same arguments I entered above in the tidyr::pivot_longer() function, with the exception of the na.rm = TRUE argument. These results get stored into an object I will name LomaSpec, which is a data frame with three variables in it: .name, .value, and fight_no.

LomaSpec <- LomaWideSmall %>%
    
  tidyr::pivot_longer_spec(
      
    cols = starts_with("fight"),

    names_to = "fight_no",

    values_to = "result",

    names_prefix = "fight_")

LomaSpec

.name	.value	fight_no
fight_1	result	1
fight_2	result	2
fight_3	result	3
fight_4	result	4
fight_5	result	5
fight_6	result	6
fight_7	result	7
fight_8	result	8
fight_9	result	9
fight_10	result	10
fight_11	result	11
fight_12	result	12
fight_13	result	13
fight_14	result	14
fight_15	result	15

The three columns in LomaSpec contain metadata (data about our data) on the transformation I want to perform–specifically the original columns (.name) and the corresponding cell values (.value). The other variable (fight_no) gets carried over from the transformation as well.

2) Format the variable(s) in the spec

If I want to format the fight_no variable, I can include those arguments within the LomaSpec data frame.

# format the variable
LomaSpec <- LomaSpec %>%
    
  dplyr::mutate(fight_no = as.numeric(fight_no))

LomaSpec$fight_no %>% glimpse(78)

##  num [1:15] 1 2 3 4 5 6 7 8 9 10 ...

3) Supply the formatted spec object to pivot_longer_spec() function

Now LomaSpec can get supplied to the tidyr::pivot_longer() function and the fight_no variable is properly formatted (note I still need to provide the na.rm = TRUE argument).

# supply it to the pivot_longer
LomaFightsLong <- LomaWideSmall %>%
    
  tidyr::pivot_longer(
      
    spec = LomaSpec,
    
    values_drop_na = TRUE)

LomaFightsLong

opponent	date	fight_no	result
José Ramírez	2013-10-12	1	Win
Orlando Salido	2014-03-01	2	Loss
Gary Russell Jr.	2014-06-21	3	Win
Chonlatarn Piriyapinyo	2014-11-22	4	Win
Gamalier Rodríguez	2015-05-02	5	Win
Romulo Koasicha	2015-11-07	6	Win
Román Martínez	2016-06-11	7	Win
Nicholas Walters	2016-11-26	8	Win
Jason Sosa	2017-04-08	9	Win
Miguel Marriaga	2017-08-05	10	Win
Guillermo Rigondeaux	2017-12-09	11	Win
Jorge Linares	2018-05-12	12	Win
José Pedraza	2018-12-08	13	Win
Anthony Crolla	2019-04-12	14	Win

These data are now in a tidy (indexed) format.

Pivoting wider

It’s less common, but sometimes you’ll need to move variables from a tidy (or indexed) column arrangement into a wide (Cartesian) format.

Pivoting Example 2: Moving categorical or ordinal variables across columns

In order to restructure from a wide to long data frame, we need to use the tidyr::pivot_wider() function. This works much like the pivot_longer() function, but with a few different arguments.

Consider a data frame with the same Lomachenko fight records, but this time arranged by date. Imagine the following events that would lead to the creation of the LomaDatesWide data:

It’s announced that Lomachenko will be fighting an opponent and location is announced, and our fan enters the information into the first two column/rows in her ’Lomachenko` spreadsheet
A fight_number column is created to document the number of fights (as they happen)
The date for the first fight (2013-10-12) gets entered into the the fourth column in the table, and the result (Win) gets put in the corresponding cell
After the scorecards are collected, the result is announced (see key below), the fight_record is updated (with the official outcome of the bout)
The official round and time (rounds and time) is recorded for when the fight had to stop (or the total number of rounds if it went the distance),
Titles and belts are listed in the notes section
When the next fight happens, our fan right-clicks on the last recorded fight, and inserts a new column and date, and repeats steps 1-6

I tend to find tables like these when the data entry is done using ‘Freeze Columns’ or ‘Freeze Rows’.

$freeze cells$

The LomaFights data frame has the fight records for Lomachenko from his wikipedia table. To create a spreadsheet like the one above, we can use the tidyr::pivot_wider() function.

I add a few dplyr functions to get the data frame to look identical to the excel file above.

LomaFights %>% 
    # the columns come from the dates
    tidyr::pivot_wider(names_from = date, 
                       # the values will be the result of the fight
                       values_from = result) %>% 
    # arrange by the fight number 
    dplyr::arrange(no) %>% 
    # rearrange the columns
    dplyr::select(opponent, location,
                  
                  dplyr::starts_with("20"),
                  
                  dplyr::everything())

opponent	location	2013-10-12	2014-03-01	2014-06-21	2014-11-22	2015-05-02	2015-11-07	2016-06-11	2016-11-26	2017-04-08	2017-08-05	2017-12-09	2018-05-12	2018-12-08	2019-04-12	2019-08-31	no	record	type	rounds	time	notes
José Ramírez	Thomas & Mack Center, Paradise, Nevada, US	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	1	1–0	TKO	4 (10)	2:55	Won WBO International featherweight title
Orlando Salido	Alamodome, San Antonio, Texas, US	NA	Loss	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2	1–1	SD	12	NA	For vacant WBO featherweight title
Gary Russell Jr.	StubHub Center, Carson, California, US	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	3	2–1	MD	12	NA	Won vacant WBO featherweight title
Chonlatarn Piriyapinyo	Cotai Arena, Macau, SAR	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	4	3–1	UD	12	NA	Retained WBO featherweight title
Gamalier Rodríguez	MGM Grand Garden Arena, Paradise, Nevada, US	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	5	4–1	KO	9 (12)	0:50	Retained WBO featherweight title
Romulo Koasicha	Thomas & Mack Center, Paradise, Nevada, US	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	NA	6	5–1	KO	10 (12)	2:35	Retained WBO featherweight title
Román Martínez	The Theater at Madison Square Garden, New York City, New York, US	NA	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	NA	7	6–1	KO	5 (12)	1:09	Won WBO junior lightweight title
Nicholas Walters	Cosmopolitan of Las Vegas, Paradise, Nevada, US	NA	NA	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	NA	8	7–1	RTD	7 (12)	3:00	Retained WBO junior lightweight title
Jason Sosa	MGM National Harbor, Oxon Hill, Maryland, US	NA	NA	NA	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	NA	9	8–1	RTD	9 (12)	3:00	Retained WBO junior lightweight title
Miguel Marriaga	Microsoft Theater, Los Angeles, California, US	NA	NA	NA	NA	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	NA	10	9–1	RTD	7 (12)	3:00	Retained WBO junior lightweight title
Guillermo Rigondeaux	The Theater at Madison Square Garden, New York City, New York, US	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Win	NA	NA	NA	NA	11	10–1	RTD	6 (12)	3:00	Retained WBO junior lightweight title
Jorge Linares	Madison Square Garden, New York City, New York, US	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Win	NA	NA	NA	12	11–1	TKO	10 (12)	2:08	Won WBA (Super) and The Ring lightweight titles
José Pedraza	Hulu Theater, New York City, New York, US	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Win	NA	NA	13	12–1	UD	12	NA	Retained WBA (Super) and The Ring lightweight titles;Won WBO lightweight title
Anthony Crolla	Staples Center, Los Angeles, California, US	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Win	NA	14	13–1	KO	4 (12)	0:58	Retained WBA (Super), WBO, and The Ring lightweight titles
Luke Campbell	The O2 Arena, London, England, UK	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	15	NA	NA	NA	NA	Defending WBA (Super), WBO, and The Ring lightweight titlesFor vacant WBC lightweight title

Pivoting wider options

The output is very close to the excel sheet above, except missing values in excel are empty (not an NA like R). I can use the tidyr::pivot_wider(x, values_fill()) to include this change to the data.

LomaFights %>% 
    tidyr::pivot_wider(names_from = date, 
                       values_from = result,
                       values_fill = list(result = "")) %>% 
    dplyr::arrange(no) %>% 
    dplyr::select(opponent, 
                  location,
                  dplyr::starts_with("20"))

opponent	location	2013-10-12	2014-03-01	2014-06-21	2014-11-22	2015-05-02	2015-11-07	2016-06-11	2016-11-26	2017-04-08	2017-08-05	2017-12-09	2018-05-12	2018-12-08	2019-04-12	2019-08-31
José Ramírez	Thomas & Mack Center, Paradise, Nevada, US	Win
Orlando Salido	Alamodome, San Antonio, Texas, US		Loss
Gary Russell Jr.	StubHub Center, Carson, California, US			Win
Chonlatarn Piriyapinyo	Cotai Arena, Macau, SAR				Win
Gamalier Rodríguez	MGM Grand Garden Arena, Paradise, Nevada, US					Win
Romulo Koasicha	Thomas & Mack Center, Paradise, Nevada, US						Win
Román Martínez	The Theater at Madison Square Garden, New York City, New York, US							Win
Nicholas Walters	Cosmopolitan of Las Vegas, Paradise, Nevada, US								Win
Jason Sosa	MGM National Harbor, Oxon Hill, Maryland, US									Win
Miguel Marriaga	Microsoft Theater, Los Angeles, California, US										Win
Guillermo Rigondeaux	The Theater at Madison Square Garden, New York City, New York, US											Win
Jorge Linares	Madison Square Garden, New York City, New York, US												Win
José Pedraza	Hulu Theater, New York City, New York, US													Win
Anthony Crolla	Staples Center, Los Angeles, California, US														Win
Luke Campbell	The O2 Arena, London, England, UK															NA

Another neat part of the pivot_wider() function is the values_from argument can take multiple columns. For example, if I wanted to see how man Wins resulted in TKOs (technical knock-outs) or KOs (knock-outs) and their corresponding rounds and time, I could use the following to see rounds and time in separate columns.

LomaFights %>% 
    dplyr::filter(result == "Win") %>% 
    pivot_wider(names_from = c(type),
                values_from = c(rounds, time),
                values_fill = list(rounds = "",
                                   time = "")) %>% 
    dplyr::select(opponent,
                  dplyr::contains("_KO"),
                  dplyr::contains("_TKO"))

opponent	rounds_KO	time_KO	rounds_TKO	time_TKO
José Ramírez			4 (10)	2:55
Gary Russell Jr.
Chonlatarn Piriyapinyo
Gamalier Rodríguez	9 (12)	0:50
Romulo Koasicha	10 (12)	2:35
Román Martínez	5 (12)	1:09
Nicholas Walters
Jason Sosa
Miguel Marriaga
Guillermo Rigondeaux
Jorge Linares			10 (12)	2:08
José Pedraza
Anthony Crolla	4 (12)	0:58

We can also use the tidyr::pivot_wider() function like tidyr::spread() function above. If I want to see the fight results by type, but with Win and Loss in separate columns, I can use the following sequence.

LomaFights %>% 
    dplyr::filter(!is.na(result)) %>% 
    dplyr::count(type, result) %>% 
    tidyr::pivot_wider(
            names_from = result, 
            values_from = n,
            values_fill = list(n = 0))

type	Win	Loss
KO	4	0
MD	1	0
RTD	4	0
SD	0	1
TKO	2	0
UD	2	0

As you can see, these new tidyr::pivot_ functions extend tidyr::gather() and tidyr::spread() with more options and flexibility. I think you’ll find these are a welcome addition to the data wrangling packages and in the tidyverse.

Check out the pivoting vignette on the tidyverse website.
Alison Presmanes Hill has some great examples in this Github repo.
Read the NEWS.md file on the tidyr github page for new changes to these functions (and others!)
Be sure to check out Vasiliy Lomachenko when he fights Anthony Crolla on April 12!