Setup

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(nycflights23)
library(fivethirtyeight)

Comments

The focus of this chapter in Modern Dive is making a dataframe “tidy” by pivotting to a longer version.

The pivot_longer() command has three arguments.

The columns to pivot, frequently specified by excluding one or more columns.
A column in the new, longer dataframe, which will contain the names of the existing column names. This is the parameter names_to.
A column in the new, longer dataframe, which will contain the values in the existing columns, which are to be pivoted. This is the parameter values_to.

Task

You can use pivot_longer() even if it doesn’t make analytical sense. As an exercise, do the following:

Start with the cdc dataframe.

Add a variable ID =1:nrow(cdc).

Use pivot longer to place the values of height and weight in a new column called metric. Call the new dataframe nonsense.

Keep only gender, age, metric and value.

Use glimpse() to verify your work.

Solution

load("cdc.Rdata")

cdc = cdc %>% 
  mutate(ID = 1:nrow(cdc))

nonsense = cdc %>% 
  pivot_longer(cols = c(height,weight),
               names_to = "metric",
               values_to = "value") %>% 
  select(ID,gender,age,metric,value)

head(nonsense)

## # A tibble: 6 × 5
##      ID gender   age metric value
##   <int> <fct>  <int> <chr>  <dbl>
## 1     1 m         77 height    70
## 2     1 m         77 weight   175
## 3     2 f         33 height    64
## 4     2 f         33 weight   125
## 5     3 f         49 height    60
## 6     3 f         49 weight   105

Note that every row in the original dataframe has turned into two rows in the longer dataframe.

Whoops!

We now have a problem if we want to compute the BMI. It requires an algebraic expression in height and weight. To do this, height and weight have to be in the same row of the dataframe.

Use pivot_wider() to undo nonsense and turn it back into sense. We just need to reverse what we did. Use head() to verify your work.

Solution

sense = nonsense %>% 
  pivot_wider(names_from = metric,
              values_from = value)

head(sense)

## # A tibble: 6 × 5
##      ID gender   age height weight
##   <int> <fct>  <int>  <dbl>  <dbl>
## 1     1 m         77     70    175
## 2     2 f         33     64    125
## 3     3 f         49     60    105
## 4     4 f         42     66    132
## 5     5 f         55     61    150
## 6     6 f         55     64    114

If we hadn’t created the ID variable, this would have failed.

Pivot

Setup

Comments

Task

Solution

Whoops!

Solution