the challenge

This week’s challenge is to play around with visualizing the data I wrangled last week as a time series. In addition to the tidyverse and ggplot2 packages, I’ll likely need the lubridate package, which I’ve already loaded. After reading this review of packages for handling dates and times in R, I’ve also loaded the anytime and flipTime packages, as they also may come in handy.

the data

The dataset I’m working with is simply named Debt in Trillions so there is a lot I don’t know about it, actually. For the sake of practice I am going to make some assumptions about it, while noting that if I were actually analyzing it, I would need to answer some questions before proceeding.

Last week I separated the Year and Quarter column into two columns, Year and Quarter, and pivoted the table from wide to long by collapsing the columns that were values of the variable type of debt into the column Type.

To get started, I’ll pass the tidy dataset into this new file.

# load tidy dataset
load("~/Documents/DACSS601Fall21/_data/debtintrillionstidy")

# print
debt
## # A tibble: 444 × 4
##    Year  Quarter Type         Amount
##    <chr> <chr>   <chr>         <dbl>
##  1 03    Q1      Mortgage      4.94 
##  2 03    Q1      HE Revolving  0.242
##  3 03    Q1      Auto Loan     0.641
##  4 03    Q1      Credit Card   0.688
##  5 03    Q1      Student Loan  0.241
##  6 03    Q1      Other         0.478
##  7 03    Q2      Mortgage      5.08 
##  8 03    Q2      HE Revolving  0.26 
##  9 03    Q2      Auto Loan     0.622
## 10 03    Q2      Credit Card   0.693
## # … with 434 more rows

Looks good. This is, indeed, the tidy version of the data.

time series

In order to visualize this dataset as a time series, I will need to have a single time variable of the variable type date. Because I currently have two time variables, year and quarter, I’ll need to do some further wrangling before I can plot my data.

# get class of year and quarter columns
class("Year")
## [1] "character"
class("Quarter")
## [1] "character"

Calling the class() command tells me how R is interpreting a particular type of data. In the above code chunk I can see that R views everything in the Year and Quarter columns as a character. This is important to know because it determines the conversion method.

I’m going to start by recoding the quarters as a partial date, in the format DD-MM, using the last day of each quarter. I think the simplest way to do this is with mutate() and case_when().

# recode quarters as DD-MM
debt <- debt %>%
  mutate(month_day = case_when(
    Quarter == "Q1" ~ '03-31',
    Quarter == "Q2" ~ '06-30',
    Quarter == "Q3" ~ '09-30',
    Quarter == "Q4" ~ '12-31'
  ))

# print
debt
## # A tibble: 444 × 5
##    Year  Quarter Type         Amount month_day
##    <chr> <chr>   <chr>         <dbl> <chr>    
##  1 03    Q1      Mortgage      4.94  03-31    
##  2 03    Q1      HE Revolving  0.242 03-31    
##  3 03    Q1      Auto Loan     0.641 03-31    
##  4 03    Q1      Credit Card   0.688 03-31    
##  5 03    Q1      Student Loan  0.241 03-31    
##  6 03    Q1      Other         0.478 03-31    
##  7 03    Q2      Mortgage      5.08  06-30    
##  8 03    Q2      HE Revolving  0.26  06-30    
##  9 03    Q2      Auto Loan     0.622 06-30    
## 10 03    Q2      Credit Card   0.693 06-30    
## # … with 434 more rows

Next I’ll convert the year variable type from a character to an integer.

# convert year variable type to integer
debt$Year <- as.integer(as.character(debt$Year))

# print
debt
## # A tibble: 444 × 5
##     Year Quarter Type         Amount month_day
##    <int> <chr>   <chr>         <dbl> <chr>    
##  1     3 Q1      Mortgage      4.94  03-31    
##  2     3 Q1      HE Revolving  0.242 03-31    
##  3     3 Q1      Auto Loan     0.641 03-31    
##  4     3 Q1      Credit Card   0.688 03-31    
##  5     3 Q1      Student Loan  0.241 03-31    
##  6     3 Q1      Other         0.478 03-31    
##  7     3 Q2      Mortgage      5.08  06-30    
##  8     3 Q2      HE Revolving  0.26  06-30    
##  9     3 Q2      Auto Loan     0.622 06-30    
## 10     3 Q2      Credit Card   0.693 06-30    
## # … with 434 more rows

Now that R recognizes those values as integers, I can simply add 2000 to each value to give me the year in the format YYYY.

# add 2000 to each value in year
debt$fullyear <- debt$Year + 2000

# print
debt
## # A tibble: 444 × 6
##     Year Quarter Type         Amount month_day fullyear
##    <int> <chr>   <chr>         <dbl> <chr>        <dbl>
##  1     3 Q1      Mortgage      4.94  03-31         2003
##  2     3 Q1      HE Revolving  0.242 03-31         2003
##  3     3 Q1      Auto Loan     0.641 03-31         2003
##  4     3 Q1      Credit Card   0.688 03-31         2003
##  5     3 Q1      Student Loan  0.241 03-31         2003
##  6     3 Q1      Other         0.478 03-31         2003
##  7     3 Q2      Mortgage      5.08  06-30         2003
##  8     3 Q2      HE Revolving  0.26  06-30         2003
##  9     3 Q2      Auto Loan     0.622 06-30         2003
## 10     3 Q2      Credit Card   0.693 06-30         2003
## # … with 434 more rows

Now I’ll concatenate the two columns I just created: fullyear and month_day into a new column: date.

# concatenate fullyear and month_date into date column
debt$date <- paste(debt$fullyear, debt$month_day)

# print
debt
## # A tibble: 444 × 7
##     Year Quarter Type         Amount month_day fullyear date      
##    <int> <chr>   <chr>         <dbl> <chr>        <dbl> <chr>     
##  1     3 Q1      Mortgage      4.94  03-31         2003 2003 03-31
##  2     3 Q1      HE Revolving  0.242 03-31         2003 2003 03-31
##  3     3 Q1      Auto Loan     0.641 03-31         2003 2003 03-31
##  4     3 Q1      Credit Card   0.688 03-31         2003 2003 03-31
##  5     3 Q1      Student Loan  0.241 03-31         2003 2003 03-31
##  6     3 Q1      Other         0.478 03-31         2003 2003 03-31
##  7     3 Q2      Mortgage      5.08  06-30         2003 2003 06-30
##  8     3 Q2      HE Revolving  0.26  06-30         2003 2003 06-30
##  9     3 Q2      Auto Loan     0.622 06-30         2003 2003 06-30
## 10     3 Q2      Credit Card   0.693 06-30         2003 2003 06-30
## # … with 434 more rows

Now I’ve got all the pieces of a date (YYYY, MM, DD) in a single column, date, but I haven’t yet told R to recognize it as a such. After playing around with how to do this for hours (days?), I ended up with something embarrassingly simple: mutate the date column into a new column, using anydate() from the anytime package to change the variable type in the process.

# create new column in date format
debt <- debt %>%
  mutate(Date = anydate(date))

# print
debt
## # A tibble: 444 × 8
##     Year Quarter Type         Amount month_day fullyear date       Date      
##    <int> <chr>   <chr>         <dbl> <chr>        <dbl> <chr>      <date>    
##  1     3 Q1      Mortgage      4.94  03-31         2003 2003 03-31 2003-03-31
##  2     3 Q1      HE Revolving  0.242 03-31         2003 2003 03-31 2003-03-31
##  3     3 Q1      Auto Loan     0.641 03-31         2003 2003 03-31 2003-03-31
##  4     3 Q1      Credit Card   0.688 03-31         2003 2003 03-31 2003-03-31
##  5     3 Q1      Student Loan  0.241 03-31         2003 2003 03-31 2003-03-31
##  6     3 Q1      Other         0.478 03-31         2003 2003 03-31 2003-03-31
##  7     3 Q2      Mortgage      5.08  06-30         2003 2003 06-30 2003-06-30
##  8     3 Q2      HE Revolving  0.26  06-30         2003 2003 06-30 2003-06-30
##  9     3 Q2      Auto Loan     0.622 06-30         2003 2003 06-30 2003-06-30
## 10     3 Q2      Credit Card   0.693 06-30         2003 2003 06-30 2003-06-30
## # … with 434 more rows

Success!

In the process of figuring this out, I’ve created an entire slew of unnecessary columns that are now clutter so before going any further, I’m going to use subset() to create a subset of my dataset containing only the columns I need.

# create subset to remove unnecessary columns
finaldebt <- subset(debt, select = -c(Year, Quarter, month_day, fullyear, date))

# view to check
finaldebt
## # A tibble: 444 × 3
##    Type         Amount Date      
##    <chr>         <dbl> <date>    
##  1 Mortgage      4.94  2003-03-31
##  2 HE Revolving  0.242 2003-03-31
##  3 Auto Loan     0.641 2003-03-31
##  4 Credit Card   0.688 2003-03-31
##  5 Student Loan  0.241 2003-03-31
##  6 Other         0.478 2003-03-31
##  7 Mortgage      5.08  2003-06-30
##  8 HE Revolving  0.26  2003-06-30
##  9 Auto Loan     0.622 2003-06-30
## 10 Credit Card   0.693 2003-06-30
## # … with 434 more rows

Much cleaner. Now to visualize!

visualization

To start, I’m going to try a simple scatterplot.

# create scatterplot
ggplot(data = finaldebt) +
  geom_point(mapping = aes(x = Date, y = Amount, color = Type))

Too many points for my taste so I’m going to “smooth” the data points into a line. I’ll also add a title and tweak the axis labels. Because of what I don’t know about this dataset, I won’t be able to create a very specific title or axis labels.

# create smooth
ggplot(data = finaldebt) +
  geom_smooth(mapping = aes(x = Date, y = Amount, color = Type)) +
  labs(title = "Debt", y = "amount in trillions", x = "date")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Finally, let’s separate this plot into facets.

# facet plot by type of debt
ggplot(data = finaldebt) +
  geom_smooth(mapping = aes(x = Date, y = Amount)) +
  facet_wrap(~ Type, nrow = 2) +
  labs(title = "Debt", y = "amount in trillions", x = "date")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

There we go! A first stab at visualization.