Exploring the Data from the CFPB’s National Financial Well-Being Survey
Author
Annet Isa
Source: Consumer Financial Protection Bureau
Introduction
In 2015, the Consumer Financial Protection Bureau (CFPB), the federal agency responsible for protecting consumers in the financial sector, published a scale to measure individual financial well-being. The CFPB describes financial well-being as “having financial security and financial freedom of choice, in the present and in the future” (1). In 2017, the CFPB published the results of a national survey that measured the financial well-being of a sample of 6,394 adults. The survey included “measures of individual and household characteristics, income and employment, savings and safety nets, financial experiences, and behaviors, skills and attitudes that have been hypothesized to influence adults’ levels of financial well-being” (2).
The resulting dataset is rich for mining and could yield answers to questions such as:
How does one’s socioeconomic class affect financial well-being?
Does a high income guarantee a high level of financial well-being?
How does age impact one’s financial well-being?
I am curious whether financial well-being simply tracks income. Can you have a high financial well-being score without a high income?
Beginning Steps
# load relevant libraries
library(tidyverse)
library(ggthemes)
# I first opened the R stub code the CFPB provided and read the original
# NFWBS_PUF_2016_data.csv into that document. After remapping the factor
# values, I saved the resulting dataframe as a separate file.
# That is the file being read in below.
cfpb_df <- readRDS("cfpb_df.rds")
I refer to the dataset’s codebook for explanations of the variable names(4).
The measured characteristics serve as categorical data. In addition to a CSV of the survey responses, the CFPB provided stub code to recode survey answers so they were suitable for quantitative exploration(3). The dataset required no cleaning.
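The preparation step described above (read the CSV, recode values, save the result) could look roughly like the sketch below. It runs on a made-up four-row file and is not the CFPB stub code: the column values, the `Bracket n` labels, the `id` column, and the temporary file paths are all assumptions for illustration.

```r
# Toy stand-in for the CFPB CSV; every value below is invented for illustration
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(
  data.frame(id = 1:4, FWBscore = c(55, 72, 48, 81), PPINCIMP = c(1, 9, 3, 8)),
  csv_path,
  row.names = FALSE
)

# Step 1: read the raw survey responses
raw_df <- read.csv(csv_path)

# Step 2: recode a numeric code into a labeled factor (labels are hypothetical)
raw_df$PPINCIMP <- factor(raw_df$PPINCIMP, levels = 1:9,
                          labels = paste("Bracket", 1:9))

# Step 3: save as .rds so the factor levels survive, then read the file back
saveRDS(raw_df, rds_path)
toy_df <- readRDS(rds_path)
str(toy_df)
```

Saving to `.rds` instead of back to CSV keeps the factor levels and their order, so the recoding does not have to be repeated each time the data is loaded.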
Question
Are high self-scores on financial knowledge reflective of reality? Of the four variables below, which is the best predictor of financial well-being?
SUBKNOWL1: Respondents rated their overall financial knowledge on a scale of 1 - 7 (7 being the highest).
FSscore: An assigned score based on the respondent’s answers to questions about how they manage money.
LMscore: A score based on the respondent’s answers to three specific questions.
KHscore: A score based on the respondent’s answers to nine specific questions.
# Examining the relationship between the self-rate on overall financial knowledge and the financial well-being score
p1 <- ggplot(cfpb_df, aes(x = SUBKNOWL1, y = FWBscore)) +
  geom_point(alpha = 0.5) +
  geom_jitter() +
  theme_bw()
p1
m1 <- lm(FWBscore ~ SUBKNOWL1, data = cfpb_df)
summary(m1)
Call:
lm(formula = FWBscore ~ SUBKNOWL1, data = cfpb_df)

Residuals:
    Min      1Q  Median      3Q     Max
-69.398  -7.346  -0.294   7.654  54.810

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2163     0.6222   59.81   <2e-16 ***
SUBKNOWL1     4.0260     0.1284   31.36   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.18 on 6392 degrees of freedom
Multiple R-squared: 0.1334, Adjusted R-squared: 0.1332
F-statistic: 983.6 on 1 and 6392 DF, p-value: < 2.2e-16
Plotting all 6,394 observations produces scatterplots that look like Rorschach blots. For the rest of this project, I focus on the respondents with a financial well-being score of 70 or higher (the FWBS_high dataframe used below).
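The code that creates `FWBS_high` is not shown in this write-up. A minimal sketch, assuming it was a plain dplyr filter on `FWBscore`, is below; `toy_df` and its values are made up so the block runs on its own.

```r
library(dplyr)

# Toy data standing in for cfpb_df; the values are made up for illustration
toy_df <- data.frame(
  FWBscore  = c(45, 70, 82, 69, 95, 58),
  SUBKNOWL1 = c(4, 7, 5, 6, 3, 5)
)

# Keep only respondents with a financial well-being score of 70 or higher
# (assumed to mirror how FWBS_high was built from cfpb_df)
toy_high <- toy_df %>%
  filter(FWBscore >= 70)

nrow(toy_high)
```

With the real data, replacing `toy_df` with `cfpb_df` should yield the subset analyzed in the rest of the post, provided the cutoff was applied as `>= 70`.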
hm1 <- lm(FWBscore ~ SUBKNOWL1, data = FWBS_high)
summary(hm1)
Call:
lm(formula = FWBscore ~ SUBKNOWL1, data = FWBS_high)

Residuals:
   Min     1Q Median     3Q    Max
-9.634 -5.410 -1.522  3.759 19.816

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  71.8468     1.0301  69.744  < 2e-16 ***
SUBKNOWL1     1.1125     0.1868   5.956 3.58e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.793 on 1005 degrees of freedom
Multiple R-squared: 0.03409, Adjusted R-squared: 0.03313
F-statistic: 35.47 on 1 and 1005 DF, p-value: 3.576e-09
cor(FWBS_high$SUBKNOWL1, FWBS_high$FWBscore)
[1] 0.1846372
Analysis 1
When the data is limited to subjects with financial well-being scores of 70 or higher, the correlation between the self-rating on financial knowledge and the financial well-being score is even weaker than in the full sample, where the R-squared of 0.1334 corresponds to a correlation of about 0.37. A correlation of 0.185 is close to 0, which would be no correlation at all. The adjusted R-squared value of 0.033 says the model explains only about 3% of the variability in the financial well-being score. The model equation
$$\widehat{\text{FWBscore}} = 71.847 + 1.113 \times \text{SUBKNOWL1}$$
says a one-unit increase in the self-rated financial knowledge score is associated with an average increase of 1.113 in the financial well-being score.
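To make the size of that effect concrete, the arithmetic can be worked through by hand. The sketch below uses only the coefficients and the correlation reported above; the variable names are my own.

```r
# Coefficients copied from the summary(hm1) output
intercept <- 71.8468
slope     <- 1.1125

# Predicted financial well-being score for each self-rating from 1 to 7
ratings   <- 1:7
predicted <- intercept + slope * ratings
round(predicted, 2)

# Gap in predicted score between the lowest and highest self-rating
rating_gap <- slope * (max(ratings) - min(ratings))
rating_gap

# For a one-predictor regression, the squared correlation equals R-squared
r <- 0.1846372
round(r^2, 5)
```

Moving from the lowest self-rating to the highest shifts the predicted score by under 7 points, and the squared correlation reproduces the Multiple R-squared of 0.03409 from the model summary.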
Diagnostic Plots
# load the relevant library
library(DataExplorer)

# create a smaller dataframe focusing on the 5 variables of interest
FWBS2 <- FWBS_high %>%
  select(FWBscore, SUBKNOWL1, FSscore, LMscore, KHscore)

# use DataExplorer to create a correlation plot
plot_correlation(FWBS2)
Analysis 2
The strongest correlations are between 1) the self-rating of financial knowledge and the financial skill score and 2) the graded KHscore and the graded LMscore. Even so, a correlation of around ±0.50 is generally considered moderate at best, not strong. Of the variables above, the financial skill score has the strongest correlation with financial well-being.
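A heatmap shows the pattern, but the exact values are easier to compare as numbers. The sketch below uses base R's `cor()` on a small made-up dataframe that stands in for `FWBS2`; the values are invented, so the output says nothing about the survey itself.

```r
# Toy data standing in for FWBS2; the values are made up for illustration
toy_df <- data.frame(
  FWBscore  = c(70, 74, 78, 81, 85, 90, 95),
  SUBKNOWL1 = c(5, 4, 6, 5, 7, 5, 6),
  FSscore   = c(48, 52, 50, 58, 61, 60, 67),
  LMscore   = c(2, 3, 2, 3, 3, 2, 3),
  KHscore   = c(0.2, 0.5, -0.1, 0.9, 0.4, 0.3, 1.1)
)

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(toy_df), 2)

# Correlation of each predictor with FWBscore, strongest first
fwb_cor <- cor_mat["FWBscore", colnames(cor_mat) != "FWBscore"]
ranked  <- sort(abs(fwb_cor), decreasing = TRUE)
ranked
```

Running the same three lines on the real `FWBS2` would give the numbers behind the colors in the correlation plot.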
Out of curiosity, I ran plot_correlation() on the entire FWBS_high dataframe (1,007 observations, 217 variables) to see if there were any interesting color clusters. I then selected the first 30 columns for exploration. With a wall-sized screen, it would be interesting to see which pairs of variables had the deepest hues.
plot_correlation(FWBS_high)
# Note to self, get a bigger monitor
FWBS3 <- FWBS_high %>%
  select(1:30)

# use DataExplorer to create a correlation plot
plot_correlation(FWBS3)
Main Visualization
Is there a relationship between income level and the financial well-being score?
# PPINCIMP is the household income bracket, from 1 ($20K) to 9 ($150K+)
FWBS_plot <- FWBS_high %>%
  group_by(FWBscore, PPINCIMP) %>%
  summarise(count = n())
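By default, `summarise()` after a two-variable `group_by()` returns a result that is still grouped by the first variable. A minimal sketch of two ways to get the same counts ungrouped is below; `toy_high` and its values are made up for illustration.

```r
library(dplyr)

# Toy data standing in for FWBS_high; the values are made up for illustration
toy_high <- data.frame(
  FWBscore = c(70, 70, 70, 82, 82, 95),
  PPINCIMP = c(8, 8, 9, 7, 9, 8)
)

# Same counts as group_by() + summarise(count = n()), with the grouping dropped
toy_plot <- toy_high %>%
  group_by(FWBscore, PPINCIMP) %>%
  summarise(count = n(), .groups = "drop")

# count() is a shorthand that returns the same counts, also ungrouped
toy_plot2 <- toy_high %>%
  count(FWBscore, PPINCIMP, name = "count")

toy_plot
```

Dropping the grouping is optional for a chart, but it avoids surprises if the summarised dataframe is later filtered or mutated.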
library(highcharter)
The visualization suggests there is a relationship. Interactivity makes it easy to see that as income grows, financial well-being tends to increase. The relationship is not strictly linear: the highest financial well-being scores (95) are in the second-highest income bracket ($100,000 - $149,999).
As you add or subtract income brackets from the columns, the column total automatically updates.
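The visual impression can be backed up with a number. The sketch below shows one hedged way to do that: a per-bracket median plus Spearman's rank correlation, which suits an ordinal bracket code. It assumes `PPINCIMP` is stored as a number (a factor would need `as.integer()` first), and the data is made up, so the result does not describe the survey.

```r
# Toy data standing in for FWBS_high; the values are made up for illustration
toy_high <- data.frame(
  PPINCIMP = c(3, 5, 6, 7, 8, 8, 9, 9),
  FWBscore = c(71, 70, 74, 73, 95, 80, 86, 78)
)

# Median well-being score within each income bracket
bracket_median <- aggregate(FWBscore ~ PPINCIMP, data = toy_high, FUN = median)
bracket_median

# Spearman's rank correlation asks whether scores tend to rise with the
# bracket, without assuming the relationship is a straight line
rho <- cor(toy_high$PPINCIMP, toy_high$FWBscore, method = "spearman")
rho
```

A positive `rho` alongside a bracket whose median beats the one above it is the same "increasing but not strictly linear" pattern described for the chart.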
Conclusions
I was surprised at how little impact financial knowledge had on financial well-being. In our capitalistic society, money equals choice, one of the pillars of financial well-being according to the CFPB. I hope the CFPB continues to publish these national surveys. I wonder how the pandemic and inflation have changed the national financial well-being scores (if at all).
Citations
(1) “Measuring financial well-being: A guide to using the CFPB Financial Well-Being Scale” https://files.consumerfinance.gov/f/201512_cfpb_financial-well-being-user-guide-scale.pdf
(2) “National Financial Well-Being Survey: Public Use File User’s Guide” https://files.consumerfinance.gov/f/documents/cfpb_nfwbs-puf-user-guide.pdf