Exploring the Data from The CFPB’s National Financial Well-Being Survey

Author

Annet Isa

Source: Consumer Financial Protection Bureau

Introduction

In 2015, the Consumer Financial Protection Bureau, the federal agency responsible for the protection of consumers in the financial sector, published a scale to measure individual financial well-being. The CFPB describes financial well-being as “having financial security and financial freedom of choice, in the present and in the future” (1). In 2017, the CFPB published the results of a national survey that measured the financial well-being of a sample of 6,394 adults. The survey included “measures of individual and household characteristics, income and employment, savings and safety nets, financial experiences, and behaviors, skills and attitudes that have been hypothesized to influence adults’ levels of financial well-being” (2).

The resulting dataset is rich for mining and could yield answers to questions such as:

How does one’s socioeconomic class affect financial well-being?
Does a high income guarantee a high level of financial well-being?
How does age impact one’s financial well-being?

I am curious whether financial well-being simply tracks income. Can you have a high financial well-being score without a high income?

Beginning Steps

#load relevant libraries
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggthemes)

Warning: package 'ggthemes' was built under R version 4.3.3

#I first opened the R stub code the CFPB provided and read the original NFWBS_PUF_2016_data.csv into that document. After remapping the factor values, I saved the resulting dataframe as a separate file. That is the file being read in below. 
cfpb_df <- readRDS("cfpb_df.rds")

I refer to the dataset’s codebook for explanations of the variable names (4).

The measured characteristics serve as categorical data. In addition to a CSV of the survey responses, the CFPB provided stub code that recodes the survey answers so they are suitable for quantitative exploration (3). The dataset required no further cleaning.
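As a rough illustration of what that recoding step involves, here is a minimal sketch on a toy data frame. The rows and the three income labels are made up for the example; the real labels come from the codebook and the CFPB's stub code, which this does not reproduce.

```r
# Toy stand-in for a few rows of the raw CSV (values are made up)
raw <- data.frame(
  PPINCIMP = c(1, 9, 4, 1),
  FWBscore = c(48, 71, 55, 62)
)

# Hypothetical labels for illustration; the real ones are in the codebook
income_labels <- c(
  "1" = "Less than $20,000",
  "4" = "$40,000 to $49,999",
  "9" = "$150,000 or more"
)

# Map the numeric codes onto labelled factor levels
recoded <- raw
recoded$PPINCIMP <- factor(
  raw$PPINCIMP,
  levels = as.numeric(names(income_labels)),
  labels = unname(income_labels)
)

# Round-trip through an .rds file, which keeps the factor levels intact
path <- tempfile(fileext = ".rds")
saveRDS(recoded, path)
restored <- readRDS(path)

levels(restored$PPINCIMP)
```

Saving as .rds rather than CSV means the recoded factors do not have to be rebuilt each time the file is read.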

Question

Are high self-scores on financial knowledge reflective of reality? Of the four variables below, which is the best predictor of financial well-being?

SUBKNOWL1: Respondents rated their overall financial knowledge on a scale of 1 - 7 (7 being the highest).
FSscore: An assigned score based on the respondent’s answer to questions about how they manage money.
LMscore: A score based on the respondent’s answers to three specific questions.
KHscore: A score based on the respondent’s answers to nine specific questions.

#Examining the relationship between the self-rate on overall financial knowledge and the financial well-being score

p1 <- ggplot(cfpb_df, aes(x = SUBKNOWL1, y = FWBscore)) +
  #jitter alone; adding geom_point() as well would draw every point twice
  geom_jitter(alpha = 0.5) +
  theme_bw()
p1

m1 <- lm(FWBscore ~ SUBKNOWL1, data = cfpb_df)
summary(m1)


Call:
lm(formula = FWBscore ~ SUBKNOWL1, data = cfpb_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-69.398  -7.346  -0.294   7.654  54.810 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2163     0.6222   59.81   <2e-16 ***
SUBKNOWL1     4.0260     0.1284   31.36   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.18 on 6392 degrees of freedom
Multiple R-squared:  0.1334,    Adjusted R-squared:  0.1332 
F-statistic: 983.6 on 1 and 6392 DF,  p-value: < 2.2e-16

cor(cfpb_df$SUBKNOWL1, cfpb_df$FWBscore)

[1] 0.3651878
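The correlation and the model summary above are two views of the same number: with a single predictor, R-squared is the square of the Pearson correlation. The sketch below checks this on simulated data (the variable names and coefficients are stand-ins, not the survey data) and then on the figures reported above.

```r
set.seed(1)
n <- 500

# Simulated stand-ins for a 1-7 self-rating and a well-being score
knowledge <- sample(1:7, n, replace = TRUE)
wellbeing <- 37 + 4 * knowledge + rnorm(n, sd = 13)

fit <- lm(wellbeing ~ knowledge)
r <- cor(knowledge, wellbeing)
r_squared <- summary(fit)$r.squared

# TRUE: squared correlation equals R-squared in simple linear regression
all.equal(r^2, r_squared)

# The same check on the reported correlation
round(0.3651878^2, 4)  # 0.1334, matching the Multiple R-squared above
```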

p3 <- ggplot(cfpb_df, aes(x = FSscore, y = FWBscore)) +
  #jitter alone; adding geom_point() as well would draw every point twice
  geom_jitter(alpha = 0.5) +
  theme_bw()
p3
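One way to compare the four candidate predictors on equal footing is to fit a one-predictor model for each and line up the adjusted R-squared values. The sketch below does this on simulated data: only the column names match the survey, and the effect sizes are invented, so the ranking it prints says nothing about the real dataset.

```r
set.seed(42)
n <- 1000

# Simulated stand-ins; only the column names match the survey
sim <- data.frame(
  SUBKNOWL1 = sample(1:7, n, replace = TRUE),
  FSscore   = rnorm(n, mean = 50, sd = 12),
  LMscore   = sample(0:3, n, replace = TRUE),
  KHscore   = rnorm(n)
)
# Invented relationship: the outcome depends on SUBKNOWL1 and FSscore only
sim$FWBscore <- 30 + 2 * sim$SUBKNOWL1 + 0.4 * sim$FSscore + rnorm(n, sd = 10)

predictors <- c("SUBKNOWL1", "FSscore", "LMscore", "KHscore")

# Fit FWBscore ~ <predictor> for each candidate and keep adjusted R-squared
adj_r2 <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "FWBscore"), data = sim)
  summary(fit)$adj.r.squared
})

sort(adj_r2, decreasing = TRUE)
```

Swapping `sim` for the survey data frame would give the corresponding comparison on the real responses.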

Filtering

With 6,394 observations, the scatterplots look like Rorschach blots. For this project, I will focus on the subjects with a financial well-being score of 70 or higher.

FWBS_high <- cfpb_df %>%
  filter(FWBscore >= 70)

hp1 <- ggplot(FWBS_high, aes(x = SUBKNOWL1, y = FWBscore)) +
  #jitter alone; adding geom_point() as well would draw every point twice
  geom_jitter(alpha = 0.5) +
  theme_economist()
hp1

hm1 <- lm(FWBscore ~ SUBKNOWL1, data = FWBS_high)
summary(hm1)


Call:
lm(formula = FWBscore ~ SUBKNOWL1, data = FWBS_high)

Residuals:
   Min     1Q Median     3Q    Max 
-9.634 -5.410 -1.522  3.759 19.816 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  71.8468     1.0301  69.744  < 2e-16 ***
SUBKNOWL1     1.1125     0.1868   5.956 3.58e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.793 on 1005 degrees of freedom
Multiple R-squared:  0.03409,   Adjusted R-squared:  0.03313 
F-statistic: 35.47 on 1 and 1005 DF,  p-value: 3.576e-09

cor(FWBS_high$SUBKNOWL1, FWBS_high$FWBscore)

[1] 0.1846372

Analysis 1

When the data is limited to subjects with financial well-being scores of 70 or higher, there is even less of a correlation between the self-rating on financial knowledge and the financial well-being score. A correlation of 0.185 is close to 0, which would mean no linear relationship at all. The adjusted R-squared value of 0.033 says the model explains only about 3% of the variability in the financial well-being score. The model equation

Financial Well-Being Score = 1.113(Self-Rated Knowledge Score) + 71.847

FWBscore = 1.113(SUBKNOWL1) + 71.847

says a one-unit increase in the self-rated financial knowledge score is associated with an average increase of 1.113 points in the financial well-being score.
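Part of the drop in slope and correlation may come from the filter itself: keeping only rows where the outcome is 70 or higher truncates the range of the outcome, which tends to flatten the fitted line regardless of the underlying relationship. The sketch below shows the effect on simulated data with a known slope of 4 (all values are invented for illustration).

```r
set.seed(7)
n <- 6000

# Simulated data with a true slope of 4
knowledge <- sample(1:7, n, replace = TRUE)
wellbeing <- 37 + 4 * knowledge + rnorm(n, sd = 13)

# Full sample
full_r <- cor(knowledge, wellbeing)
full_slope <- coef(lm(wellbeing ~ knowledge))[["knowledge"]]

# Same data, keeping only rows where the outcome is 70 or higher
keep <- wellbeing >= 70
high_r <- cor(knowledge[keep], wellbeing[keep])
high_slope <- coef(lm(wellbeing ~ knowledge, subset = keep))[["knowledge"]]

c(full_r = full_r, high_r = high_r)
c(full_slope = full_slope, high_slope = high_slope)
```

In this simulation both the correlation and the slope shrink in the filtered subset even though the data-generating relationship is unchanged, so the weaker fit in the high-scoring group is worth reading with that in mind.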

Diagnostic Plots

#load the relevant library
library(DataExplorer)
#create a smaller dataframe focusing on the 5 variables of interest
FWBS2 <- FWBS_high %>%
  select(FWBscore, SUBKNOWL1, FSscore, LMscore, KHscore)
#use DataExplorer to create a correlation plot
plot_correlation(FWBS2)

Analysis 2

The strongest correlations are between 1) the self-rating of financial knowledge & the financial skill score and 2) the graded KHscore & the graded LMscore. Even so, a correlation of (+/-) 0.50 is generally considered moderate at best. Of the variables above, the financial skill score has the strongest correlation with financial well-being.

Out of curiosity, I ran plot_correlation() on the entire FWBS_high dataframe (1,007 observations, 217 variables) to see if there were any interesting color clusters. I then selected a subset of 30 columns for exploration. If I had a wall-sized screen, it would be interesting to see which pairs of variables had the deepest hues.

plot_correlation(FWBS_high)

#Note to self, get a bigger monitor
FWBS3 <- FWBS_high %>%
  select(1:30)
#use DataExplorer to create a correlation plot
plot_correlation(FWBS3)
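Without a wall-sized screen, the strongest pairs can also be read off numerically by computing the correlation matrix and sorting the pairs by absolute value. This is a minimal sketch on simulated columns; it uses only numeric columns, which is an assumption that may not match how the correlation plot treats factors.

```r
set.seed(3)
n <- 200

# Simulated columns with known relationships
sim <- data.frame(a = rnorm(n))
sim$b <- sim$a + rnorm(n, sd = 0.3)   # strongly related to a
sim$c <- rnorm(n)                     # unrelated to a and b
sim$d <- -sim$c + rnorm(n, sd = 1)    # negatively related to c
sim$label <- sample(c("x", "y"), n, replace = TRUE)  # non-numeric, skipped

# Correlation matrix over the numeric columns only
numeric_cols <- sim[sapply(sim, is.numeric)]
cors <- cor(numeric_cols, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle, no diagonal)
idx <- which(upper.tri(cors), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cors)[idx[, "row"]],
  var2 = colnames(cors)[idx[, "col"]],
  r    = cors[idx]
)

# Strongest relationships first, regardless of sign
pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]
head(pairs, 3)
```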

Main Visualization

Is there a relationship between income level and the financial well-being score?

#PPINCIMP is the household income bracket, from 1 (less than $20K) to 9 ($150K+)
FWBS_plot <- FWBS_high %>%
  group_by(FWBscore, PPINCIMP) %>%
  summarise(count = n())

`summarise()` has grouped output by 'FWBscore'. You can override using the
`.groups` argument.

library(highcharter)

Warning: package 'highcharter' was built under R version 4.3.3

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

hc_FWBS <- highchart() %>%
  hc_add_theme(
    hc_theme_ffx()
    ) %>%
  hc_chart(type = "column") %>%
  hc_title(text = "Distribution of Financial Well-Being Scores Across Income Brackets") %>%
  hc_caption(text = "Source: Consumer Financial Protection Bureau") %>%
  hc_xAxis(categories = unique(FWBS_plot$FWBscore),
           title = list(text = "Financial Well-Being Score")) %>%
  hc_yAxis(title = list(text = "Number of Respondents")) %>%
  hc_plotOptions(column = list(stacking = "normal")) %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "1") %>% 
      .$count,
    name = "Less than $20,000") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "2") %>% 
      .$count,
    name = "$20,000 - $29,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "3") %>% 
      .$count,
    name = "$30,000 - $39,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "4") %>% 
      .$count,
    name = "$40,000 - $49,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "5") %>% 
      .$count,
    name = "$50,000 - $59,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "6") %>% 
      .$count,
    name = "$60,000 - $74,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "7") %>% 
      .$count,
    name = "$75,000 - $99,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "8") %>% 
      .$count,
    name = "$100,000 - $149,999") %>%
  hc_add_series(
    data = FWBS_plot %>% 
      filter(PPINCIMP == "9") %>% 
      .$count,
    name = "$150,000+") %>%
  hc_legend(title = list(text= "Income Brackets")) %>%
  hc_tooltip(
    headerFormat = "<b>Score: {point.x}</b><br>",
    pointFormat = "{series.name}: {point.y} <br>Total: {point.stackTotal}"
  )  

hc_FWBS

The visualization suggests there is a relationship. The interactive chart makes it easy to see that financial well-being scores tend to rise with income. The relationship is not strictly linear: the highest financial well-being scores (95) are in the second-highest income bracket ($100,000 - $149,999).

As you add or subtract income brackets from the columns, the column total automatically updates.
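One thing worth checking in a stacked chart like this: each series is matched to the x-axis categories by position, so every series needs exactly one value per score. If an income bracket has no respondents at some score, a filtered count vector comes out shorter and its later values shift left. Whether that happens here depends on the data. The sketch below, on made-up counts, shows one way to build series that are guaranteed to line up, using a cross-tabulation that fills missing combinations with zeros.

```r
# Toy responses: bracket "9" has no respondents at score 71
toy <- data.frame(
  FWBscore = c(70, 70, 71, 72, 72, 72),
  PPINCIMP = c("8", "9", "8", "8", "9", "9")
)

# Cross-tabulate so every score/bracket combination exists, with zeros
grid <- xtabs(~ FWBscore + PPINCIMP, data = toy)

# One value per score for each bracket, in the same order as the categories
scores   <- rownames(grid)
series_8 <- as.vector(grid[, "8"])
series_9 <- as.vector(grid[, "9"])

scores    # "70" "71" "72"
series_8  # 1 1 1
series_9  # 1 0 2
```

Each column of `grid` could then be passed as the data for one series, with `scores` as the x-axis categories.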

Conclusions

I was surprised at how weakly self-rated financial knowledge was associated with financial well-being. In our capitalistic society, money equals choice, one of the pillars of financial well-being according to the CFPB. I hope the CFPB continues to publish these national surveys. I wonder how the pandemic and inflation have changed the national financial well-being scores (if at all).

Citations

1. “Measuring financial well-being: A guide to using the CFPB Financial Well-Being Scale” https://files.consumerfinance.gov/f/201512_cfpb_financial-well-being-user-guide-scale.pdf
2. “National Financial Well-Being Survey: Public Use File User’s Guide” https://files.consumerfinance.gov/f/documents/cfpb_nfwbs-puf-user-guide.pdf
3. “Financial well-being survey data” https://www.consumerfinance.gov/data-research/financial-well-being-survey-data/
4. “National Financial Well-Being Survey: Public Use File Codebook” https://files.consumerfinance.gov/f/documents/cfpb_nfwbs-puf-codebook.pdf