Challenge 6

Read in the Data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
debt <- read_excel("../challenge_datasets/debt_in_trillions.xlsx")
debt
# A tibble: 74 × 8
   `Year and Quarter` Mortgage `HE Revolving` `Auto Loan` `Credit Card`
   <chr>                 <dbl>          <dbl>       <dbl>         <dbl>
 1 03:Q1                  4.94          0.242       0.641         0.688
 2 03:Q2                  5.08          0.26        0.622         0.693
 3 03:Q3                  5.18          0.269       0.684         0.693
 4 03:Q4                  5.66          0.302       0.704         0.698
 5 04:Q1                  5.84          0.328       0.72          0.695
 6 04:Q2                  5.97          0.367       0.743         0.697
 7 04:Q3                  6.21          0.426       0.751         0.706
 8 04:Q4                  6.36          0.468       0.728         0.717
 9 05:Q1                  6.51          0.502       0.725         0.71 
10 05:Q2                  6.70          0.528       0.774         0.717
# ℹ 64 more rows
# ℹ 3 more variables: `Student Loan` <dbl>, Other <dbl>, Total <dbl>

Briefly describe the data

While I am making some leaps in assumptions, I believe that the data shows the cumulative debt of Americans in trillions of dollars in different areas, such as mortgages or credit cards. It also gives the time the debt was measured as a string and the total debt across all measured areas.

Tidy Data (as needed)

The only problem with our data is that the time column is given as a string of years and dates. Using the yq function from lubridate we can parse these entries into a date variable.

debt <- debt %>%
  mutate(`Year and Quarter` = yq(`Year and Quarter`)) %>%
  rename(Date = `Year and Quarter`)

Time Dependent Visualization

First, we can create a graph of all the various measures of debt vs time to see the big picture. I chose to use a line graph vs time as it can easily show trends over time for multiple numerical variables on a single graph.

ggplot(debt, aes(x = Date)) +
  geom_line(aes(y = Mortgage, color = "Mortgage")) +
  geom_line(aes(y = `HE Revolving`, color = "HE Revolving")) +
  geom_line(aes(y = `Auto Loan`, color = "Auto Loan")) +
  geom_line(aes(y = `Credit Card`, color = "Credit Card")) +
  geom_line(aes(y = `Student Loan`, color = "Student Loan")) +
  geom_line(aes(y = `Other`, color = "Other")) +
  geom_line(aes(y = `Total`, color = "Total")) +
  labs(title = "Debt Measures Over Time",
       x = "Date",
       y = "Debt in Trillions") +
  scale_color_discrete(name = "Measure of Debt") +
  theme_minimal()

We see that really the only two lines which are easily interpretable in our graph are the Total and Mortgage. We see that Mortgage debt trended upward until the 2008 financial crisis where it began to trend downward. However around 2014 it began to trend upward again and has since surpassed the level it was at before 2008.

In order to get a better look at the debt measures which are squashed on the bottom, let’s only graph those 5 measures:

ggplot(debt, aes(x = Date)) +
  geom_line(aes(y = `HE Revolving`, color = "HE Revolving")) +
  geom_line(aes(y = `Auto Loan`, color = "Auto Loan")) +
  geom_line(aes(y = `Credit Card`, color = "Credit Card")) +
  geom_line(aes(y = `Student Loan`, color = "Student Loan")) +
  geom_line(aes(y = `Other`, color = "Other")) +
  labs(title = "Debt Measures Over Time",
       x = "Date",
       y = "Debt in Trillions") +
  scale_color_discrete(name = "Measure of Debt") +
  theme_minimal()

Now we can more clearly see the overall trends in the data. For instance, we see that student loan debt has risen linearly and outpaced all other measures. We can also see that auto loans were stable until roughly 2012 when they started to increase at a similar rate to student loan debt.

Visualizing Part-Whole Relationships

We can observe some interesting similarities and differences among the measures of debt. For instance, let’s isolate Auto Loans and Student Loans from 2012 onward and compare their curves:

subset_debt <- subset(debt, Date >= as.Date("2012-01-01"))

ggplot(subset_debt, aes(x = Date)) +
  geom_line(aes(y = `Auto Loan`, color = "Auto Loan")) +
  geom_line(aes(y = `Student Loan`, color = "Student Loan")) +
  labs(title = "Debt Measures Over Time",
       x = "Date",
       y = "Debt in Trillions") +
  scale_color_discrete(name = "Measure of Debt") +
  theme_minimal()

We see that the two lines almost perfectly match each other in rate of increase (slope) over the course of those 10 years, with auto loans being offset above student loans.

I also noticed that credit card and mortgage appeared to have similar overall shapes, although there magnitude’s were very different. Let’s scale down mortgage by a factor of 10 and then graph it compared to credit card debt over time:

debt$Mortgage_scaled <- debt$Mortgage / 10

ggplot(debt, aes(x = Date)) +
  geom_line(aes(y = `Credit Card`, color = "Credit Card")) +
  geom_line(aes(y = Mortgage_scaled, color = "Mortgage scaled")) +
  labs(title = "Debt Measures Over Time",
       x = "Date",
       y = "Debt in Trillions") +
  scale_color_discrete(name = "Measure of Debt") +
  theme_minimal()

We see that generally, when credit card debt increases so does mortgage. Both increased prior to 2008, decreased after to 2012, and then began generally increasing until 2019.