Malfeasance in the financial industry has been an area of controversy in America and around the world over the past several years, from misleading homeowners and investors during the Great Recession to more recent scandals at Wells Fargo. As a consumer, I recently left one bank myself after one too many hidden fees revealed themselves. I was therefore interested in studying which types of issues US consumers lodge complaints about most often, and what that data can tell us.
This project involved exploring the U.S. Consumer Financial Protection Bureau's (CFPB) Consumer Complaints database and relating trends in the data to actual mortgage default activity. The database records characteristics of each complaint submitted to the CFPB, including the type of financial product, the institution that was the target of the complaint, and the state in which the complaint originated. It began collecting data in November 2011 and has since documented over 650,000 complaints, totaling 240 MB.
I paired this data with S&P's First Mortgage Default Index, which tracks monthly defaults on first mortgages, to compare activity in the complaint database with mortgage defaults in the US.
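A minimal sketch of how a monthly table like `sp_comp` could be assembled: collapse complaint dates to months, compute the share of each month's complaints that concern mortgages, then join to the index by month. The column names (`received`, `product`) and all values below are hypothetical stand-ins, not the author's code or the real CFPB and S&P data, and the share is kept as a fraction between 0 and 1.

```r
# Hypothetical complaint-level records (made-up values for illustration)
complaints <- data.frame(
  received = as.Date(c("2012-01-05", "2012-01-20", "2012-01-28",
                       "2012-02-03", "2012-02-14", "2012-02-25", "2012-02-27")),
  product  = c("Mortgage", "Credit card", "Mortgage",
               "Mortgage", "Bank account or service", "Credit card", "Credit card"),
  stringsAsFactors = FALSE
)

# Hypothetical monthly default index values, keyed by the first day of each month
default_index <- data.frame(
  Date  = as.Date(c("2012-01-01", "2012-02-01")),
  Index = c(2.10, 2.05)
)

# Collapse each complaint date to the first day of its month
complaints$Date <- as.Date(format(complaints$received, "%Y-%m-01"))

# Share of each month's complaints that concern mortgages (mean of a logical)
monthly <- aggregate(data.frame(Percent = complaints$product == "Mortgage"),
                     by = list(Date = complaints$Date), FUN = mean)

# Inner join on month: keep only months present in both sources
sp_comp <- merge(monthly, default_index, by = "Date")
print(sp_comp)
```

`merge()` performs an inner join by default, so months missing from either source drop out. That is one reasonable choice when the complaint data starts later than the index.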
The key result of this analysis is that the percentage of all complaints that relate to mortgages is statistically significantly correlated with the mortgage default index. Over the period studied (the past 5 years), both mortgage defaults and the percentage of complaints related to mortgages have decreased at similar rates.
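```r
# Widen the right margin to make room for a second y-axis label
par(mar = c(5, 4, 4, 5) + .1)
# Left axis: mortgage default index over time (red)
plot(sp_comp$Date, sp_comp$Index, type = "l", col = "red", xlab = "Date", ylab = "Default Index",
     main = "Mortgage Default Index and Mortgage Complaint % Over Time")
# Draw the next plot on top of the current one instead of starting a new page
par(new = TRUE)
# Mortgage share of complaints (blue), with its own axes and labels suppressed
plot(sp_comp$Date, sp_comp$Percent, type = "l", col = "blue", xaxt = "n", yaxt = "n", xlab = "", ylab = "")
# Right-hand axis and label for the complaint share
axis(4)
mtext("Mortgage Complaints as % Overall Complaints", side = 4, line = 3)
legend("topright", col = c("red", "blue"), lty = 1, legend = c("Default Index", "Complaint %"))
```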
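The key result rests on the correlation between the two plotted series. The sketch below shows that test with base R `cor.test()`. It runs on a short made-up pair of series standing in for `sp_comp$Index` and `sp_comp$Percent`, and the helper name `correlate_series` is hypothetical.

```r
# Pearson correlation test for two aligned monthly series (hypothetical helper)
correlate_series <- function(index, share) {
  ct <- cor.test(index, share, method = "pearson")
  c(r = unname(ct$estimate), p_value = ct$p.value)
}

# Made-up series: the complaint share falls roughly in step with the index
index <- c(2.4, 2.2, 2.1, 1.9, 1.7, 1.6, 1.5, 1.4)
share <- c(0.40, 0.39, 0.35, 0.33, 0.30, 0.31, 0.27, 0.26)

res <- correlate_series(index, share)
print(round(res, 4))
```

With a single predictor, the p-value from `cor.test()` equals the slope p-value from `lm()`. The correlation test and a simple regression of one series on the other therefore test the same hypothesis.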
I began with an exploratory analysis of the Consumer Complaints database. Key findings are summarized in the following plots:
```r
# Print plot objects created in earlier code (not shown)
compyrplot
cosplot
compprodplot
plot1
```
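The plot objects above are built in code that is not shown. As a hedged illustration of the tallies that exploratory plots like these typically start from, the sketch below counts complaints by year, product, and company with base R `table()`. The column names (`received`, `product`, `company`) and the records are made up and are not the CFPB file's actual schema.

```r
# Hypothetical complaint-level records (made-up values for illustration)
complaints <- data.frame(
  received = as.Date(c("2012-03-01", "2012-07-15", "2013-02-10",
                       "2013-05-22", "2013-11-30", "2014-01-08")),
  product  = c("Mortgage", "Mortgage", "Credit card",
               "Mortgage", "Debt collection", "Credit card"),
  company  = c("Bank A", "Bank B", "Bank A", "Bank A", "Collector C", "Bank B"),
  stringsAsFactors = FALSE
)

# Complaints per calendar year
by_year <- table(format(complaints$received, "%Y"))

# Complaints per product, most frequent first
by_product <- sort(table(complaints$product), decreasing = TRUE)

# The two companies with the most complaints
top_companies <- head(sort(table(complaints$company), decreasing = TRUE), 2)

print(by_year)
print(by_product)
print(top_companies)
```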
I then joined the Consumer Complaints database to S&P's First Mortgage Default Index to identify how consumer complaints may relate to mortgage defaults. First, I regressed the percentage of complaints related to mortgages on the default index and found a statistically significant relationship.
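```r
sp_model <- lm(sp_comp$Percent ~ sp_comp$Index)  # mortgage % of complaints ~ default index
plot(sp_comp$Index, sp_comp$Percent, main = "Mortgage Percent of Complaints by Default Index")
abline(sp_model)  # overlay the fitted regression line
summary(sp_model)
```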
```
Call:
lm(formula = sp_comp$Percent ~ sp_comp$Index)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.181796 -0.041419 -0.008645  0.028899  0.158355 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.404e-04  2.805e-02   0.005    0.996
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06858 on 57 degrees of freedom
Multiple R-squared:  0.7587,  Adjusted R-squared:  0.7544 
F-statistic: 179.2 on 1 and 57 DF,  p-value: < 2.2e-16
```
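The slope row does not appear in the regression output above. Two standard identities for a regression with one predictor still let us read off the strength of the relationship: Pearson's r is the square root of Multiple R-squared (signed like the slope), and the slope's t statistic squared equals the F statistic. The quick check below applies them to the reported figures. It is arithmetic on the printed summary, not output from the original analysis, and the positive sign of r assumes the positive relationship described in the text.

```r
# Figures copied from the regression summary (Percent ~ Index)
r_squared <- 0.7587
f_stat    <- 179.2
df_resid  <- 57

# Pearson r implied by R-squared; taken as positive because the text reports a positive relationship
r <- sqrt(r_squared)

# With one predictor, the slope's t statistic satisfies t^2 = F
t_slope <- sqrt(f_stat)

# Two-sided p-value for that t on the residual degrees of freedom
p_slope <- 2 * pt(t_slope, df = df_resid, lower.tail = FALSE)

# Consistency check: for one predictor, R^2 = F / (F + residual df)
r_squared_from_f <- f_stat / (f_stat + df_resid)

print(round(c(r = r, t_slope = t_slope, r2_from_f = r_squared_from_f), 4))
```

An implied correlation of roughly 0.87 is what "highly correlated" means here.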
Next, I examined whether the year-over-year change in the monthly index was related to the year-over-year change in the percentage of mortgage complaints. This would show whether the changes in the two values followed a pattern (for instance, whether larger changes in one coincided with similarly large changes in the other). However, the relationship, while positive, was weak and not statistically significant.
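One way to build year-over-year changes from a monthly series is a lag-12 difference, sketched below with base R `diff(x, lag = 12)`. The text does not say whether the original analysis used absolute differences (as here) or percentage changes, so treat that choice, the helper name `yoy_change`, and the made-up series as assumptions. The column names mirror `sp_yoy`, and each series loses its first 12 months.

```r
# Year-over-year change of a monthly series: value in month t minus value in month t - 12
yoy_change <- function(x) diff(x, lag = 12)

# Made-up monthly series: 24 months of a default index and a mortgage complaint share
months <- seq(as.Date("2012-01-01"), by = "month", length.out = 24)
index  <- seq(2.4, 1.25, length.out = 24)
share  <- seq(0.40, 0.17, length.out = 24)

# Attach each change to the later month of its pair (months 13 to 24)
sp_yoy_toy <- data.frame(
  Date      = months[-(1:12)],
  YChange   = yoy_change(index),  # default index change
  YoYChange = yoy_change(share)   # mortgage complaint share change
)
print(sp_yoy_toy)
```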
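```r
spyoy_model <- lm(sp_yoy$YChange ~ sp_yoy$YoYChange)  # index change ~ complaint % change
plot(sp_yoy$YoYChange, sp_yoy$YChange, main = "Year over Year Monthly Change in Default Index by Change in Mortgage Complaint %", xlab = "Mortgage Complaint % Change", ylab = "Default Index Change")
abline(spyoy_model)  # overlay the fitted regression line
summary(spyoy_model)
```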
```
Call:
lm(formula = sp_yoy$YChange ~ sp_yoy$YoYChange)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23732 -0.06572  0.02544  0.05407  0.15616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.05175    0.02873  -1.801   0.0782 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09401 on 46 degrees of freedom
Multiple R-squared:  0.02361,  Adjusted R-squared:  0.002385 
F-statistic: 1.112 on 1 and 46 DF,  p-value: 0.2971
```
In summary, mortgage defaults and the percentage of complaints related to mortgages are highly correlated, although their year-over-year changes are not significantly related. This result suggests that as the mortgage crisis abated, consumer complaints shifted away from mortgages toward other financial products. The Consumer Complaints database is a valuable dataset, but it is still fairly young. Its insights will become more reliable once data collection reaches a steady state rather than its current growth phase.
There are more interesting analyses to be performed on this database. Complaints about specific companies could be linked to stock price movements or to settlement amounts for malfeasance. Per capita analyses could identify which states generate the most complaints per person. Overall, I believe that, if maintained, the Consumer Complaints database can offer valuable insights into what troubles American financial consumers well into the future.
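As a hedged sketch of the per capita idea, the snippet below converts complaint counts into complaints per 100,000 residents and ranks states by that rate. The state labels, counts, and populations are made up purely for illustration, and the helper name `per_capita_rate` is hypothetical. In practice the counts would come from the database's state field and the populations from a separate census source.

```r
# Complaints per `per` residents, ranked from highest to lowest (hypothetical helper)
per_capita_rate <- function(counts, population, per = 1e5) {
  # Align populations to the same state order as the counts before dividing
  pop <- population[names(counts)]
  sort(counts / pop * per, decreasing = TRUE)
}

# Made-up inputs: the state with the fewest complaints has the smallest population
counts     <- c(A = 900, B = 500, C = 40)
population <- c(C = 0.6e6, A = 39e6, B = 28e6)

rates <- per_capita_rate(counts, population)
print(round(rates, 3))
```

The made-up example shows why the per capita view matters: state C has by far the fewest complaints in absolute terms but the highest rate per resident.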