knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
library(here)
## here() starts at /Users/clairebattaglia
This week’s challenge is to successfully use the pivot_longer() function. I’ll be using the dataset debt_in_trillions.xlsx. I’ve already loaded the appropriate packages (see above code chunk) so the first step is to read in the data and see what we’re dealing with. Instead of setting the working directory as I’ve been doing, I’ll try a new method, just for the sake of learning something new: the here() function, from the here package.
# read in data using here() function and assign to object
debt <- read_excel(here("Documents", "DACSS601Fall21", "_data"))
It looks like the here package has set my starting place as /Users/clairebattaglia and I’m having trouble figuring out how to access _data from there. For the sake of moving forward today, then, I am going to revert back to simply setting the working directory.
# set working directory
setwd("~/Documents/DACSS601Fall21/_data")
# read in and assign data to object
debt <- read_excel("debt_in_trillions.xlsx")
Let’s check it by previewing the dataset.
# preview dataset
head(debt)
## # A tibble: 6 × 8
## `Year and Quarter` Mortgage `HE Revolving` `Auto Loan` `Credit Card`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 03:Q1 4.94 0.242 0.641 0.688
## 2 03:Q2 5.08 0.26 0.622 0.693
## 3 03:Q3 5.18 0.269 0.684 0.693
## 4 03:Q4 5.66 0.302 0.704 0.698
## 5 04:Q1 5.84 0.328 0.72 0.695
## 6 04:Q2 5.97 0.367 0.743 0.697
## # … with 3 more variables: Student Loan <dbl>, Other <dbl>, Total <dbl>
Looks good. As far as datasets go, this one is actually in reasonable shape. I can see a few things that need to be done, though.
I’ll use the separate() function to do this. The two elements year and quarter are separated by a colon and consistently formatted so it should be pretty straightforward to separate them.
# separate Year and Quarter column into 2 columns
debt <- debt %>%
separate(col = 1, into = c('Year', 'Quarter'), sep = ':')
# preview dataset to check work
head(debt)
## # A tibble: 6 × 9
## Year Quarter Mortgage `HE Revolving` `Auto Loan` `Credit Card` `Student Loan`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 03 Q1 4.94 0.242 0.641 0.688 0.241
## 2 03 Q2 5.08 0.26 0.622 0.693 0.243
## 3 03 Q3 5.18 0.269 0.684 0.693 0.249
## 4 03 Q4 5.66 0.302 0.704 0.698 0.253
## 5 04 Q1 5.84 0.328 0.72 0.695 0.260
## 6 04 Q2 5.97 0.367 0.743 0.697 0.263
## # … with 2 more variables: Other <dbl>, Total <dbl>
Success! I now have two columns: Year and Quarter.
Next I’ll use the pivot_longer() function to put everything in the right place. Year and quarter are both variables and are already in their own columns—good! The remaining columns, however, are actually all values of the variable type of debt. Thus, I need to create a column Type.
# pivot_longer to create new column for type of debt
debt <- debt %>%
pivot_longer(
cols = c(`Mortgage`, `HE Revolving`, `Auto Loan`, `Credit Card`, `Student Loan`, `Other`),
names_to = "Type",
values_to = "Amount"
)
# view dataset to check work
debt
## # A tibble: 444 × 5
## Year Quarter Total Type Amount
## <chr> <chr> <dbl> <chr> <dbl>
## 1 03 Q1 7.23 Mortgage 4.94
## 2 03 Q1 7.23 HE Revolving 0.242
## 3 03 Q1 7.23 Auto Loan 0.641
## 4 03 Q1 7.23 Credit Card 0.688
## 5 03 Q1 7.23 Student Loan 0.241
## 6 03 Q1 7.23 Other 0.478
## 7 03 Q2 7.38 Mortgage 5.08
## 8 03 Q2 7.38 HE Revolving 0.26
## 9 03 Q2 7.38 Auto Loan 0.622
## 10 03 Q2 7.38 Credit Card 0.693
## # … with 434 more rows
Finally, let’s get rid of the column Total. I’ll use the select() function to recreate the dataset with only the columns I want. Since I want to keep four columns and remove one, it will actually be easier to specify the one column I don’t want to include.
# recreate dataset without Total column
debt <- debt %>%
select(-Total)
# view dataset to check work
debt
## # A tibble: 444 × 4
## Year Quarter Type Amount
## <chr> <chr> <chr> <dbl>
## 1 03 Q1 Mortgage 4.94
## 2 03 Q1 HE Revolving 0.242
## 3 03 Q1 Auto Loan 0.641
## 4 03 Q1 Credit Card 0.688
## 5 03 Q1 Student Loan 0.241
## 6 03 Q1 Other 0.478
## 7 03 Q2 Mortgage 5.08
## 8 03 Q2 HE Revolving 0.26
## 9 03 Q2 Auto Loan 0.622
## 10 03 Q2 Credit Card 0.693
## # … with 434 more rows
Sweet! Next up: turning this into some kind of cool graph!
I ran into a few challenges today that I’ll need to figure out: