weekly challenge: pivot

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readxl)
library(here)

## here() starts at /Users/clairebattaglia

the challenge

This week’s challenge is to successfully use the pivot_longer() function. I’ll be using the dataset debt_in_trillions.xlsx. I’ve already loaded the appropriate packages (see above code chunk) so the first step is to read in the data and see what we’re dealing with. Instead of setting the working directory as I’ve been doing, I’ll try a new method, just for the sake of learning something new: the here() function, from the here package.

# read in data using here() function and assign to object
debt <- read_excel(here("Documents", "DACSS601Fall21", "_data"))

It looks like the here package has set my starting place as /Users/clairebattaglia and I’m having trouble figuring out how to access _data from there. For the sake of moving forward today, then, I am going to revert back to simply setting the working directory.

# set working directory
setwd("~/Documents/DACSS601Fall21/_data")

# read in and assign data to object
debt <- read_excel("debt_in_trillions.xlsx")

Let’s check it by previewing the dataset.

# preview dataset
head(debt)

## # A tibble: 6 × 8
##   `Year and Quarter` Mortgage `HE Revolving` `Auto Loan` `Credit Card`
##   <chr>                 <dbl>          <dbl>       <dbl>         <dbl>
## 1 03:Q1                  4.94          0.242       0.641         0.688
## 2 03:Q2                  5.08          0.26        0.622         0.693
## 3 03:Q3                  5.18          0.269       0.684         0.693
## 4 03:Q4                  5.66          0.302       0.704         0.698
## 5 04:Q1                  5.84          0.328       0.72          0.695
## 6 04:Q2                  5.97          0.367       0.743         0.697
## # … with 3 more variables: Student Loan <dbl>, Other <dbl>, Total <dbl>

Looks good. As far as datasets go, this one is actually in reasonable shape. I can see a few things that need to be done, though.

to-do

split the column Year and Quarter into two columns: 1) Year and 2) Quarter
use the pivot_longer() function to transform this “wide” table into a “long” table
get rid of the column Total, not because it’s not important but because I can easily calculate the totals from the other columns if/when I need that information so it’s redundant

splitting one column into two

I’ll use the separate() function to do this. The two elements year and quarter are separated by a colon and consistently formatted so it should be pretty straightforward to separate them.

# separate Year and Quarter column into 2 columns
debt <- debt %>%
  separate(col = 1, into = c('Year', 'Quarter'), sep = ':')

# preview dataset to check work
head(debt)

## # A tibble: 6 × 9
##   Year  Quarter Mortgage `HE Revolving` `Auto Loan` `Credit Card` `Student Loan`
##   <chr> <chr>      <dbl>          <dbl>       <dbl>         <dbl>          <dbl>
## 1 03    Q1          4.94          0.242       0.641         0.688          0.241
## 2 03    Q2          5.08          0.26        0.622         0.693          0.243
## 3 03    Q3          5.18          0.269       0.684         0.693          0.249
## 4 03    Q4          5.66          0.302       0.704         0.698          0.253
## 5 04    Q1          5.84          0.328       0.72          0.695          0.260
## 6 04    Q2          5.97          0.367       0.743         0.697          0.263
## # … with 2 more variables: Other <dbl>, Total <dbl>

Success! I now have two columns: Year and Quarter.

pivoting from wide to long

Next I’ll use the pivot_longer() function to put everything in the right place. Year and quarter are both variables and are already in their own columns—good! The remaining columns, however, are actually all values of the variable type of debt. Thus, I need to create a column Type.

# pivot_longer to create new column for type of debt
debt <- debt %>%
  pivot_longer(
    cols = c(`Mortgage`, `HE Revolving`, `Auto Loan`, `Credit Card`, `Student Loan`, `Other`),
    names_to = "Type",
    values_to = "Amount"
  )

# view dataset to check work
debt

## # A tibble: 444 × 5
##    Year  Quarter Total Type         Amount
##    <chr> <chr>   <dbl> <chr>         <dbl>
##  1 03    Q1       7.23 Mortgage      4.94 
##  2 03    Q1       7.23 HE Revolving  0.242
##  3 03    Q1       7.23 Auto Loan     0.641
##  4 03    Q1       7.23 Credit Card   0.688
##  5 03    Q1       7.23 Student Loan  0.241
##  6 03    Q1       7.23 Other         0.478
##  7 03    Q2       7.38 Mortgage      5.08 
##  8 03    Q2       7.38 HE Revolving  0.26 
##  9 03    Q2       7.38 Auto Loan     0.622
## 10 03    Q2       7.38 Credit Card   0.693
## # … with 434 more rows

selecting the columns I want

Finally, let’s get rid of the column Total. I’ll use the select() function to recreate the dataset with only the columns I want. Since I want to keep four columns and remove one, it will actually be easier to specify the one column I don’t want to include.

# recreate dataset without Total column
debt <- debt %>%
  select(-Total)

# view dataset to check work
debt

## # A tibble: 444 × 4
##    Year  Quarter Type         Amount
##    <chr> <chr>   <chr>         <dbl>
##  1 03    Q1      Mortgage      4.94 
##  2 03    Q1      HE Revolving  0.242
##  3 03    Q1      Auto Loan     0.641
##  4 03    Q1      Credit Card   0.688
##  5 03    Q1      Student Loan  0.241
##  6 03    Q1      Other         0.478
##  7 03    Q2      Mortgage      5.08 
##  8 03    Q2      HE Revolving  0.26 
##  9 03    Q2      Auto Loan     0.622
## 10 03    Q2      Credit Card   0.693
## # … with 434 more rows

Sweet! Next up: turning this into some kind of cool graph!

moving forward

I ran into a few challenges today that I’ll need to figure out:

using the here package
using the separate() function using column names instead of column numbers (line 55). When I used the column name I received an error msg that there was no column with that name.

weekly challenge: pivot_longer