Tasks

1. Debt over time

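# Packages used below: dplyr/tidyr/ggplot2/forcats (tidyverse), theme_wsj() (ggthemes),
# scale_*_jcolors() (jcolors), dollar/percent (scales), ggplotly() (plotly),
# datatable() (DT), rationalize() (hablar)
library(tidyverse)
library(ggthemes)
library(jcolors)
library(scales)
library(plotly)
library(DT)
library(hablar)
scf.2016.stats.2 <- scf.2016 %>%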
  group_by(YEAR) %>%
  summarise_at(vars(PIRMORT, PIRTOTAL, DEBT2INC, LEVRATIO), list(mean = mean, median = median, total = sum)) 
p1 <- ggplot(data = scf.2016.stats.2, aes(x = YEAR)) +
  geom_smooth(aes(y = LEVRATIO_median, color = "Median debt to assets")) +
  geom_smooth(aes(y = DEBT2INC_median, color = "Median debt to income"))
p1 +
  theme_wsj() +
  scale_color_jcolors(palette = "pal6") +
  scale_x_continuous(breaks=seq(1990, 2015, 5)) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(plot.caption = element_text(size = 8),
        plot.title = element_text(size=18),
        legend.title=element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Median ratio of debt to assets and income")

Plot 1 shows fitted curves of the median debt-to-income and debt-to-assets ratios over time. Both ratios cover all household debt, not only student loans, so the plot shows how household debt as a whole has changed historically. The debt-to-income ratio grew until the 2008 recession and has been shrinking since then. However, the debt-to-assets ratio has grown steadily over the same period, flattening somewhat in recent years.

scf.2016.stats.8 <- scf.2016 %>%
  group_by(YEAR) %>%
  summarise_at(vars(NH_MORT, CCBAL, EDN_INST, VEH_INST, OTHLOC), list(mean = mean)) %>%
  gather(NH_MORT_mean:OTHLOC_mean, key = "type", value = "mean") %>%
  arrange(type, YEAR)
scf.2016.stats.9 <- scf.2016.stats.8 %>%
  mutate(type = case_when(type == "NH_MORT_mean" ~ "Mortgage", 
                          type == "CCBAL_mean" ~ "Credit card", 
                          type == "EDN_INST_mean" ~ "Student loan", 
                          type == "VEH_INST_mean" ~ "Car payment", 
                          type == "OTHLOC_mean" ~ "Other"))
p2 <- ggplot(scf.2016.stats.9, aes(x = YEAR, 
                             y = mean,
                             fill = fct_reorder(type, mean))) + 
  geom_area() 
p2i <- p2 +
  theme_wsj() +
  scale_fill_viridis_d() +
  scale_x_continuous(breaks=seq(1990, 2015, 5)) +
  scale_y_continuous(labels = dollar) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average household debt")
p2i

Plot 2 is a stacked area chart of average debt over time, color-coded by type of debt. This allows comparison between types of debt, both in dollar amounts and as proportions of the total. For instance, although average household debt has decreased since 2008, student loans grew both in dollars and as a share of total debt, even though they still account for only a small fraction of it.
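To check the proportional reading of a stacked area chart, each type's share of the yearly total can be computed directly. The following is a minimal base-R sketch; the data frame `toy` and its numbers are hypothetical placeholders (not SCF values), with column names mirroring `scf.2016.stats.9`.

```r
# Hypothetical long-format table shaped like scf.2016.stats.9
# (one row per survey year and debt type); the values are made up.
toy <- data.frame(
  YEAR = c(2013, 2013, 2016, 2016),
  type = c("Mortgage", "Student loan", "Mortgage", "Student loan"),
  mean = c(90000, 5000, 85000, 7500)
)

# Share of each debt type in that year's stacked total:
# ave() returns the per-year sum aligned with each row.
toy$share <- toy$mean / ave(toy$mean, toy$YEAR, FUN = sum)

# Total stack height per year (the top edge of the area chart)
totals <- tapply(toy$mean, toy$YEAR, sum)

print(toy)
print(totals)
```

In this made-up example the total falls from 2013 to 2016 while the student-loan share rises, which is the kind of pattern the stacked chart lets readers see at a glance.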

I would recommend plot 2 over plot 1, since readers understand dollar values more easily than ratios, and plot 2 better highlights the historical patterns of both student debt and debt as a whole.

2. Tell me who you are

scf.2016.only <- filter(scf.2016, YEAR == 2016)
scf.2016.only.2 <- scf.2016.only %>%
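  # Share of total debt that is student loans; 0/0 (households with no debt) gives NaN,
  # which replace_na() sets to 0
  mutate(ed_rat = EDN_INST / DEBT) %>%
  replace_na(list(ed_rat = 0))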
scf.2016.only.3 <- scf.2016.only.2 %>%
  mutate(AGECL.cat = as.character(AGECL),
         HHSEX.cat = as.character(HHSEX),
         EDCL.cat = as.character(EDCL),
         FAMSTRUCT.cat = as.character(FAMSTRUCT),
         HOUSECL.cat = as.character(HOUSECL),
         MARRIED.cat = as.character(MARRIED),
         RACE.cat = as.character(RACE))
scf.2016.only.4 <- scf.2016.only.3 %>%
  mutate(KIDS.cat = case_when(
    KIDS > 0 ~ 1,
    TRUE     ~ 0)) %>%
  mutate(KIDS.cat = as.character(KIDS.cat)) %>%
  mutate(gen_kids = case_when(HHSEX.cat == "1" & KIDS.cat == "1" ~ "Man with children", 
                              HHSEX.cat == "1" & KIDS.cat == "0" ~ "Man without children",
                              HHSEX.cat == "2" & KIDS.cat == "1" ~ "Woman with children",
                              HHSEX.cat == "2" & KIDS.cat == "0" ~ "Woman without children")) %>%
  mutate(KIDS.cat.2 = case_when(KIDS.cat == "0" ~ "No children",
                                KIDS.cat == "1" ~ "Children")) %>%
  mutate(HHSEX.cat.2 = case_when(HHSEX.cat == "1" ~ "Men",
                                 HHSEX.cat == "2" ~ "Women"))
scf.2016.only.educ.gen_kids.2 <- scf.2016.only.4 %>%
  group_by(EDUC, KIDS.cat.2, HHSEX.cat.2) %>%
  summarise_at(vars(PIRMORT, PIRTOTAL, DEBT2INC, LEVRATIO, DEBT, EDN_INST, ed_rat), list(mean = mean, median = median, total = sum))
p3 <- ggplot(scf.2016.only.educ.gen_kids.2, aes(x = EDUC, y = EDN_INST_mean, group = KIDS.cat.2, color = KIDS.cat.2)) +
  geom_line(lwd = 1.5) +
  xlim(7, NA) +
  facet_wrap(~HHSEX.cat.2)
p3 +
  theme_wsj() +
  scale_color_jcolors(palette = "pal6") +
  scale_y_continuous(labels = dollar) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  xlab("Education level (EDUC code)") +
  theme(axis.title = element_text(size = 14),
        axis.title.y = element_blank(),
        plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average student loan debt")

Plot 3 is a line chart for 2016 only, comparing average student loan debt across education levels for heads of household with and without children, separately for men and women. Note that EDUC is an ordinal education code rather than a count of years (12 = bachelor's, 13 = master's, 14 = doctorate, as in the mapping used for plot 5). Predictably, student loan debt rises beyond high school (from level 10, an associate's degree, upward), and is slightly higher for parents of both genders. However, women's student debt rises much faster than men's: the gap is small at the bachelor's level (12), but at the doctoral level (14) women's average debt is more than double men's.

scf.2016.only.5 <- scf.2016.only.4 %>%
  mutate(MARRIED.cat.2 = case_when(
    MARRIED.cat == "1" ~ "Married", 
    MARRIED.cat == "2" ~ "Single"))
scf.2016.only.educ.gen_marr.2 <- scf.2016.only.5 %>%
  group_by(EDUC, MARRIED.cat.2, HHSEX.cat.2) %>%
  summarise_at(vars(PIRMORT, PIRTOTAL, DEBT2INC, LEVRATIO, DEBT, EDN_INST, ed_rat), list(mean = mean, median = median, total = sum))
p4 <- ggplot(scf.2016.only.educ.gen_marr.2, aes(x = EDUC, y = EDN_INST_mean, group = MARRIED.cat.2, color = MARRIED.cat.2)) +
  geom_smooth(se = FALSE, lwd = 1.5) +
  xlim(7, NA) +
  facet_wrap(~HHSEX.cat.2)
p4 +
  theme_wsj() +
  scale_color_jcolors(palette = "pal6") +
  scale_y_continuous(labels = dollar) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  xlab("Education level (EDUC code)") +
  theme(axis.title = element_text(size = 14),
        axis.title.y = element_blank(),
        plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average student debt")

Plot 4 is similar, except that it displays a fitted curve instead of a line, and it compares average student debt by marital status rather than parenthood. Here the pattern differs by gender. For men, being married increases student debt up to the master's level (13), after which it goes down. This may suggest that married men's education is partly subsidized by their spouses. For women, marriage increases student loan debt only after starting college (level 9), and falls below the level for single women only at the master's level (13), and then by a small margin. Once again, average student loan debt is higher for women than for men, increasing sharply after the start of college (level 9).

scf.2016.only.7 <- scf.2016.only.5 %>%
  mutate(EDUC_cat = case_when(
    EDUC <= 7  ~ "None", 
    EDUC == 8  ~ "High school",
    EDUC == 9  ~ "High school",
    EDUC == 10 ~ "Associate's",
    EDUC == 11 ~ "Associate's",
    EDUC == 12 ~ "Bachelor's",
    EDUC == 13 ~ "Master's",
    EDUC == 14 ~ "Doctorate",
    TRUE       ~ "other")) %>%
  filter(!(EDUC_cat == "other")) %>%
  mutate(EDUC_fac = factor(EDUC_cat, levels = c("None", "High school", "Associate's", "Bachelor's", "Master's", "Doctorate")))
scf.2016.only.educ_fac <- scf.2016.only.7 %>%
  group_by(EDUC_fac) %>%
  summarise_at(vars(NH_MORT, CCBAL, EDN_INST, VEH_INST, OTHLOC), list(mean = mean)) %>%
  gather(NH_MORT_mean:OTHLOC_mean, key = "type", value = "mean") %>%
  arrange(type, EDUC_fac) %>%
  mutate(type = case_when(type == "NH_MORT_mean" ~ "Mortgage", 
                          type == "CCBAL_mean" ~ "Credit card", 
                          type == "EDN_INST_mean" ~ "Student loan", 
                          type == "VEH_INST_mean" ~ "Car payment", 
                          type == "OTHLOC_mean" ~ "Other"))
p5 <- ggplot(scf.2016.only.educ_fac, aes(fill = fct_reorder(type, mean), y = mean, x = EDUC_fac)) + 
  geom_bar(position="stack", stat="identity")
p5i <- p5 +
  theme_wsj() +
  scale_fill_jcolors(palette = "pal7") +
  scale_y_continuous(labels = dollar) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average debt by highest degree earned")
p5i

Plot 5 is a stacked bar chart of average debt in 2016 by highest degree earned, color-coded by type of debt. It is similar to plot 2, but groups debt by degree category instead of by survey year. As with plot 2, this presentation makes it easy to compare types of debt both in dollar terms and as proportions. As expected, average total debt increases with each level of education up to the doctoral level. Debt jumps significantly from a two-year to a four-year degree, and again from a bachelor's to a master's degree, likely due to the difference in average age between holders of each degree. Interestingly, average student loan debt is about the same for associate's and bachelor's degrees, and again about the same for master's and doctoral degrees.

I would recommend using plot 5 alongside either plot 3 or plot 4. Although the numeric education code in plots 3 and 4 is harder for readers to interpret than degree categories, those plots highlight important demographic patterns by gender, parenthood, and marital status.

3. Wealth and income distribution

scf.2016.2 <- scf.2016 %>%
  mutate(mort_rat = NH_MORT / DEBT,
         cred_rat = CCBAL / DEBT,
         ed_rat = EDN_INST / DEBT,
         car_rat = VEH_INST / DEBT) %>%
  replace_na(list(mort_rat = 0,
                  cred_rat = 0,
                  ed_rat = 0,
                  car_rat = 0)) %>%
  mutate(INCCAT.cat = as.character(INCCAT),
         NWCAT.cat = as.character(NWCAT)) %>%
  mutate(INCCAT.cat.2 = case_when(
    INCCAT.cat == "1" ~ "0-19", 
    INCCAT.cat == "2" ~ "20-39",
    INCCAT.cat == "3" ~ "40-59",
    INCCAT.cat == "4" ~ "60-79",
    # Groups 5 and 6 are combined into the top band so they match the factor levels below
    INCCAT.cat == "5" ~ "80-100",
    INCCAT.cat == "6" ~ "80-100")) %>%
  mutate(INCCAT.cat.2 = factor(INCCAT.cat.2, levels = c("0-19", "20-39", "40-59", "60-79", "80-100"))) %>%
  mutate(NWCAT.cat.2 = case_when(
    NWCAT.cat == "1" ~ "0-25", 
    NWCAT.cat == "2" ~ "25-50",
    NWCAT.cat == "3" ~ "50-75",
    NWCAT.cat == "4" ~ "75-100",
    NWCAT.cat == "5" ~ "75-100")) %>%
  mutate(NWCAT.cat.2 = factor(NWCAT.cat.2, levels = c("0-25", "25-50", "50-75", "75-100")))
scf.2016.2.only <- scf.2016.2 %>%
  filter(YEAR == 2016)
scf.2016.stats.15 <- scf.2016.2.only %>%
  group_by(INCOME) %>%
  summarise_at(vars(DEBT, EDN_INST, ed_rat, DEBT2INC, LEVRATIO), list(mean = mean, median = median, total = sum))
p6 <- ggplot(scf.2016.stats.15, aes(x = INCOME)) + 
  geom_smooth(aes(y = EDN_INST_mean), color = "#009E73") +
  # Total debt is divided by 100 to share the left axis; sec_axis() multiplies the labels back
  geom_smooth(aes(y = DEBT_mean / 100), color = "#0072B2") + 
  scale_x_continuous(limits = c(0, 1000000), labels = dollar) +
  scale_y_continuous(name = "Student loans", labels = dollar,
                     sec.axis = sec_axis(~ . * 100, name = "Total debt", labels = dollar)) + 
  coord_cartesian(ylim = c(0, 7500))
p6 + 
  theme_wsj() +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(axis.title = element_text(size = 12),
        axis.title.y = element_text(color = "#009E73"),
        axis.title.y.right = element_text(color = "#0072B2"), 
        axis.title.x = element_blank(),
        plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average debt by income")

Plot 6 shows fitted curves of average student loan debt and average total debt by income, up to $1 million. Because student loan debt is so much smaller than total debt, the plot uses two y-axes: total debt is divided by 100 so that it shares the left scale, and the right axis labels multiply it back by 100. This makes it possible to see the patterns of both types of debt on the same graph: student loan debt increases slightly up to an income of around $300,000 and then decreases sharply, while total debt increases steadily with income, flattening briefly around $500,000.
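The secondary axis works because the plotted series and the axis labels use the same constant in opposite directions. A minimal base-R sketch of that round trip follows. The helper `scale_factor_for()` is hypothetical (not part of ggplot2); it picks the constant from the data ranges instead of hard-coding 100 (plot 6) or 20 (plot 7), and the debt values are made up.

```r
# Hypothetical helper: a factor that brings the secondary series
# onto roughly the same range as the primary series.
scale_factor_for <- function(primary, secondary) {
  max(secondary, na.rm = TRUE) / max(primary, na.rm = TRUE)
}

student <- c(2000, 4000, 6000)        # made-up student-loan means
total   <- c(150000, 400000, 700000)  # made-up total-debt means

k <- scale_factor_for(student, total) # 700000 / 6000 here
plotted <- total / k                  # values drawn against the left axis

# sec_axis(~ . * k) labels the right axis with plotted * k,
# which recovers the original total-debt values.
recovered <- plotted * k
print(k)
print(all.equal(recovered, total))
```

In the plot code, the same `k` would replace the literal constant in both places (`y = DEBT_mean / k` and `sec_axis(~ . * k, ...)`), so the two can never drift out of sync.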

scf.2016.stats.18 <- scf.2016.2.only %>%
  group_by(NETWORTH) %>%
  summarise_at(vars(DEBT, EDN_INST, ed_rat, DEBT2INC, LEVRATIO), list(mean = mean, median = median, total = sum))
p7 <- ggplot(scf.2016.stats.18, aes(x = NETWORTH)) + 
  geom_smooth(aes(y = EDN_INST_mean), color = "#009E73") +
  # Total debt is divided by 20 to share the left axis; sec_axis() multiplies the labels back
  geom_smooth(aes(y = DEBT_mean / 20), color = "#0072B2") + 
  scale_x_continuous(limits = c(0, 1000000), labels = dollar) +
  scale_y_continuous(name = "Student loans", labels = dollar,
                     sec.axis = sec_axis(~ . * 20, name = "Total debt", labels = dollar)) + 
  coord_cartesian(ylim = c(0, 10000))
p7 +
  theme_wsj() +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(axis.title = element_text(size = 12),
        axis.title.y = element_text(color = "#009E73"),
        axis.title.y.right = element_text(color = "#0072B2"), 
        axis.title.x = element_blank(),
        plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average debt by net worth")

Plot 7 is similar to plot 6, but the x-axis is net worth (and total debt is scaled by 20 rather than 100). By this measure, student loan debt follows a more complex pattern, with several local maxima and minima, but it generally rises sharply up to a net worth of $200,000 and then declines until about $750,000. By contrast, total debt rises at a decreasing rate as net worth approaches the $1 million cutoff, with only a slight dip around $500,000.

I recommend using plot 6, although the double axes might not be immediately clear to readers. I would use plot 7 only as a comparison to plot 6, not on its own, since the relationship between debt and net worth is harder to interpret.

4. Going broke

scf.2016.bank <- filter(scf.2016.2, BNKRUPLAST5 == 1)
scf.2016.bank.stats <- scf.2016.bank %>%
  group_by(YEAR) %>%
  summarise_at(vars(PIRMORT, PIRTOTAL, DEBT2INC, LEVRATIO, DEBT, EDN_INST), list(mean = mean, median = median, total = sum))
p8 <- ggplot(scf.2016.bank.stats, aes(x = YEAR)) +
  geom_smooth(se = FALSE, aes(y = DEBT_mean, color = "Total debt")) + 
  geom_smooth(se = FALSE, aes(y = EDN_INST_mean, color="Student loans"))
p8 + 
  theme_wsj() +
  scale_color_jcolors(palette = "pal6") +
  scale_x_continuous(breaks=seq(1990, 2015, 5)) +
  scale_y_continuous(labels = dollar) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(plot.caption = element_text(size = 8),
        plot.title = element_text(size=18),
        legend.title=element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average debt in households \nwith a recent bankruptcy")

Plot 8 is a fitted curve of mean student loan debt and total debt over time for households that had a bankruptcy in the past 5 years. It shows that while the average total debt for recently bankrupt households has gone up and down over the years, the average student loan debt for these households has steadily increased.

scf.2016.bank.stats.2 <- scf.2016.bank %>%
  group_by(YEAR) %>%
  summarise_at(vars(NH_MORT, CCBAL, EDN_INST, VEH_INST, OTHLOC), list(mean = mean)) %>%
  gather(NH_MORT_mean:OTHLOC_mean, key = "type", value = "mean") %>%
  arrange(type, YEAR) %>%
  mutate(type = case_when(type == "NH_MORT_mean" ~ "Mortgage", 
                          type == "CCBAL_mean" ~ "Credit card", 
                          type == "EDN_INST_mean" ~ "Student loan", 
                          type == "VEH_INST_mean" ~ "Car payment", 
                          type == "OTHLOC_mean" ~ "Other"))
p9 <- ggplot(scf.2016.bank.stats.2, aes(x = YEAR, 
                             y = mean,
                             fill = fct_reorder(type, mean))) + 
  geom_area()
p9 +
  theme_wsj() +
  scale_fill_viridis_d() +
  scale_x_continuous(breaks=seq(1990, 2015, 5)) +
  scale_y_continuous(labels = dollar) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Average debt in households \nwith a recent bankruptcy")

Plot 9 is a stacked area chart of average debt over time for households that had a bankruptcy in the past 5 years, color-coded by type of debt. It is the same as plot 2, but restricted to households with recent bankruptcies. Once again, this allows easy comparison between types of debt. In this case, while the average total debt of recently bankrupt households has flatlined since 2010, their average student loan debt has steadily increased.

scf.2016.bank.2 <- scf.2016.2 %>%
  mutate(bank.cat = case_when(BNKRUPLAST5 == 0 ~ "No recent bankruptcy", 
                              BNKRUPLAST5 == 1 ~ "Recent bankruptcy",
                              TRUE             ~ "unknown")) %>%
  filter(!(bank.cat == "unknown")) %>%
  mutate(food_all = (FOODHOME + FOODDELV + FOODAWAY)) %>%
  mutate(FOODDELV.pct = (FOODDELV/food_all)) %>%
  mutate(FOODAWAY.pct = (FOODAWAY/food_all)) %>%
  rationalize() %>%   # hablar: turns Inf and NaN (e.g. 0/0 food spending) into NA
  replace_na(list(FOODDELV.pct = 0,
                  FOODAWAY.pct = 0))
scf.2016.bank.stats.7 <- scf.2016.bank.2 %>%
  group_by(YEAR, bank.cat) %>%
  summarise_at(vars(FOODDELV.pct, FOODAWAY.pct), list(mean = mean, median = median, total = sum))
p10 <- ggplot(scf.2016.bank.stats.7, aes(x = YEAR, y = FOODDELV.pct_mean, group = bank.cat, color = bank.cat)) + 
  geom_line(lwd = 1.5) +
  xlim(2004, NA)
p10 +
  theme_wsj() +
  scale_color_jcolors(palette = "pal6") +
  scale_y_continuous(labels = percent) +
  labs(caption = "Source: 2016 Survey of Consumer Finances (SCF)") +
  theme(plot.caption = element_text(size = 8),
        plot.title = element_text(size = 18),
        legend.title = element_blank(),
        panel.background = element_rect(fill = NA),
        panel.ontop = TRUE) +
  ggtitle("Food budget spent on delivery")

As requested, my final static plot concerns food spending and bankruptcies. Plot 10 is a line chart of the average share of the food budget spent on delivery, for households that did and did not experience a recent bankruptcy. Recently bankrupt households had a small spike in delivery spending in 2007, but their share has decreased overall since then. For households without a recent bankruptcy, the share has also decreased steadily, with a small uptick in 2016. Note that these data are only available from 2004 onward, so each line has only five survey waves, too few to fit a reliable line or curve. In addition, all of these changes fall between 1% and 3% of the total food budget.
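The delivery percentages come from dividing delivery spending by total food spending, which yields `NaN` for any household reporting no food spending at all; the original pipeline handles this with `rationalize()` and `replace_na()`. Below is a base-R sketch of the same idea using made-up spending values. It also lists the triennial survey waves between 2004 and 2016, which is why each line in plot 10 has only five points.

```r
# Made-up annual spending for three households (dollars)
foodhome <- c(5000, 0, 4000)
fooddelv <- c( 100, 0,  200)
foodaway <- c( 900, 0,    0)

food_all  <- foodhome + fooddelv + foodaway
deliv_pct <- fooddelv / food_all   # 0 / 0 gives NaN for household 2

# Treat non-finite ratios (NaN, Inf) as zero, as the original pipeline does
deliv_pct[!is.finite(deliv_pct)] <- 0

# The SCF is conducted every three years, so 2004-2016 has five waves
waves <- seq(2004, 2016, by = 3)

print(deliv_pct)
print(waves)
```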

I would recommend against using plot 10, and would recommend plot 9 over plot 8. Plot 9 conveys a great deal of information clearly and efficiently, especially if displayed alongside plot 2 to highlight the differences in debt composition between recently bankrupt households and households in general. However, in the absence of a deeper discussion of bankruptcy, plot 8 should suffice.

Interactivity

5. Make two plots interactive

ggplotly(p2i)
ggplotly(p5i)

Plots 2 and 5 would benefit from interactive online versions, since stacked plots can be difficult to interpret. Being able to hover over each type of debt for a given year or category would allow readers to further explore the data and reliably compare values that do not begin at the same point on the y-axis.

6. Data Table

scf.2016.stats.9.p <- scf.2016 %>%
  group_by(YEAR) %>%
  summarise_at(vars(DEBT, NH_MORT, CCBAL, EDN_INST, VEH_INST, OTHLOC), list(mean = mean)) %>%
  rename("All debt mean" = "DEBT_mean",
         "Mortgage mean" = "NH_MORT_mean",
         "Credit card mean" = "CCBAL_mean",
         "Student loan mean" = "EDN_INST_mean",
         "Car payment mean" = "VEH_INST_mean",
         "Other debt mean" = "OTHLOC_mean")
datatable(scf.2016.stats.9.p)

This table shows the data behind plot 2: the mean value of each debt type, plus all debt combined, for each survey year. Presenting the data this way lets readers sort by any debt type, for instance to see which years had the highest and lowest average student loan debt compared with other types of debt.
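Outside the interactive table, the same ranking can be produced with a plain sort. This is a base-R sketch on a made-up slice of the table; the values are not SCF results, and the column name simply mirrors the renamed column above.

```r
# Made-up slice of the yearly means table
tbl <- data.frame(
  YEAR = c(2010, 2013, 2016),
  `Student loan mean` = c(5200, 6100, 7400),
  check.names = FALSE   # keep the space in the column name
)

# Years ordered from highest to lowest average student loan debt
ranked <- tbl[order(tbl[["Student loan mean"]], decreasing = TRUE), ]
print(ranked)
```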

