I was planning on using complaints from the Consumer Financial Protection Bureau (CFPB) as part of my data. They might serve only as a temporary placeholder or to inform some insights, because after a little research I found a 2013 paper on textual analysis in financial literature. The paper explores using a linearized phrase-structure model to identify multiple semantic orientations in short economic articles. It gives good guidance on sentiment analysis of financial, economic, and business language, and I think it may lead me to look at sentiment about consumer debt in newspaper and magazine articles. I am also searching through some subreddits to see whether I can find useful discussions to analyze and possibly compare with official agency reports.
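The phrase-structure model from the 2013 paper is well beyond a quick sketch. A much simpler stand-in that is often used as a first pass is dictionary (lexicon) scoring, where a text's score is the sum of the polarities of the words it contains. The sketch below is that simpler technique, not the paper's method. The lexicon `finance_lexicon` and the helpers `tokenize()` and `score_sentiment()` are hypothetical. A real analysis would swap in a published finance-specific word list.

```r
# Hypothetical mini-lexicon for illustration only: -1 = negative, +1 = positive.
finance_lexicon <- c(
  default = -1, delinquent = -1, debt = -1, foreclosure = -1, loss = -1,
  growth = 1, recovery = 1, gain = 1, improve = 1, repaid = 1
)

# Lowercase the text, replace anything that is not a letter with a space,
# and split on whitespace; empty tokens are dropped.
tokenize <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z ]", " ", text)
  words <- strsplit(text, "\\s+")[[1]]
  words[nzchar(words)]
}

# Net sentiment = sum of lexicon values for every matched word (0 if none match).
score_sentiment <- function(text, lexicon = finance_lexicon) {
  words <- tokenize(text)
  sum(lexicon[words[words %in% names(lexicon)]])
}

articles <- c(
  "Household debt rose again as delinquent accounts increased.",
  "Strong wage growth helped consumers improve their finances."
)
scores <- vapply(articles, score_sentiment, numeric(1), USE.NAMES = FALSE)
print(scores)  # -2 for the first article, 2 for the second
```

Word-level scoring ignores negation and context ("not a loss" still counts as negative), which is the kind of gap the phrase-structure approach in the paper is meant to address.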
I also found a paper on text analysis of debt collection calls. This may be another avenue to consider, since it is valuable to understand which debt collection methods are effective and which are not. The study used automatic speech recognition, voice mining, natural language processing, and machine learning. It found that moral-appeal strategies and legal-warning strategies both increase repayment, because they trigger either happy emotions or fear emotions. However, a collector should use one strategy consistently, because using both together does nothing to shorten repayment times.
The dataset I am currently exploring contains a great deal of information: 1,048,575 observations and 18 variables. I have been trying to download it directly from the website but have been running into trouble when running my code. I suspect this may simply be an issue of using Chrome rather than Firefox.
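One way to sidestep browser problems is to download the export from within R. The download lines are commented out because they need network access. They also assume that the bulk export lives at `https://files.consumerfinance.gov/ccdb/complaints.csv.zip`, so check the CFPB site for the current link. The helper `read_complaints()` is hypothetical. It also flags a possible problem with the row count. A total of 1,048,575 rows plus one header row is exactly Excel's worksheet limit of 1,048,576 rows, so a file that was opened and re-saved in Excel may be missing complaints. `read.csv()`'s default `check.names = TRUE` converts headers that are not valid R names, such as "Date received", into names like `Date.received`. This presumably explains the trailing dots in names like `Timely.response.`.

```r
# Excel worksheets hold at most 1,048,576 rows, including the header row.
excel_row_limit <- 1048576L

# Hypothetical helper: read a CFPB complaints CSV and warn if the row count
# suggests the file was truncated by a round trip through Excel.
read_complaints <- function(path) {
  complaints <- read.csv(path, stringsAsFactors = FALSE, check.names = TRUE)
  if (nrow(complaints) == excel_row_limit - 1L) {
    warning("Row count equals Excel's limit; the file may have been truncated.")
  }
  complaints
}

# Hypothetical download (requires network access; the URL is an assumption):
# zip_path <- tempfile(fileext = ".zip")
# download.file("https://files.consumerfinance.gov/ccdb/complaints.csv.zip",
#               zip_path, mode = "wb")
# csv_path <- unzip(zip_path, exdir = tempdir())[1]
# complaints <- read_complaints(csv_path)

# Offline demo on a tiny temporary file with CFPB-style headers.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date received,Product,Complaint ID",
             "2019-09-24,Debt collection,1"), tmp)
demo <- read_complaints(tmp)
str(demo)
```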
```r
summary(complaints)
##  Date.received        Product            Sub.product           Issue
##  Length:1048575     Length:1048575     Length:1048575     Length:1048575
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##   Sub.issue         Consumer.complaint.narrative Company.public.response
##  Length:1048575     Length:1048575               Length:1048575
##  Class :character   Class :character             Class :character
##  Mode  :character   Mode  :character             Mode  :character
##
##    Company             State             ZIP.code             Tags
##  Length:1048575     Length:1048575     Length:1048575     Length:1048575
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##  Consumer.consent.provided. Submitted.via      Date.sent.to.company
##  Length:1048575             Length:1048575     Length:1048575
##  Class :character           Class :character   Class :character
##  Mode  :character           Mode  :character   Mode  :character
##
##  Company.response.to.consumer Timely.response.   Consumer.disputed.
##  Length:1048575               Length:1048575     Length:1048575
##  Class :character             Class :character   Class :character
##  Mode  :character             Mode  :character   Mode  :character
##
##   Complaint.ID
##  Min.   :      1
##  1st Qu.:1644959
##  Median :3202842
##  Mean   :2770479
##  3rd Qu.:3757252
##  Max.   :5238788
```
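Every column except `Complaint.ID` was read as character, so `summary()` reports only lengths. Converting the date columns to `Date` and the categorical columns to factors makes `summary()` show date ranges and category counts instead. The sketch below uses a hypothetical helper, `prepare_complaints()`. It assumes dates are stored as `YYYY-MM-DD`, so pass `date_format = "%m/%d/%Y"` if the file was re-saved by Excel in US format. It also turns empty narratives into `NA`, because `read.csv()` leaves empty fields in character columns as `""`.

```r
# Hypothetical helper: give the CFPB columns more useful types.
prepare_complaints <- function(complaints, date_format = "%Y-%m-%d") {
  # Parse whichever date columns are present.
  date_cols <- intersect(c("Date.received", "Date.sent.to.company"),
                         names(complaints))
  for (col in date_cols) {
    complaints[[col]] <- as.Date(complaints[[col]], format = date_format)
  }
  # Treat low-cardinality text columns as categories.
  factor_cols <- intersect(c("Product", "Issue", "State", "Submitted.via",
                             "Company.response.to.consumer", "Timely.response."),
                           names(complaints))
  for (col in factor_cols) {
    complaints[[col]] <- factor(complaints[[col]])
  }
  # Mark complaints without a narrative as missing rather than "".
  if ("Consumer.complaint.narrative" %in% names(complaints)) {
    narr <- complaints$Consumer.complaint.narrative
    complaints$Consumer.complaint.narrative[!is.na(narr) & narr == ""] <- NA
  }
  complaints
}

# Demo on hypothetical toy rows.
toy <- data.frame(
  Date.received = c("2019-09-24", "2020-01-15"),
  Product = c("Debt collection", "Mortgage"),
  Consumer.complaint.narrative = c("", "They kept calling me."),
  stringsAsFactors = FALSE
)
clean <- prepare_complaints(toy)
summary(clean)
```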
The variable I am mainly interested in exploring further is the Consumer Complaint Narrative, which gives me an opportunity to apply some NLP methods. I want to try to pull out common themes among complaints about the same type of issue. Each complaint also records the company it was sent to and how that company responded. Because of this, it should be fairly straightforward to identify which entities received and responded to complaints, and to gain insight into which of them handle consumer complaints better.
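A crude first look at both ideas can be sketched in base R. For themes, the sketch counts the most frequent non-stop-words within each `Issue`. This is a simple frequency proxy, and topic models such as LDA would be the more complete technique. For company handling, it computes each company's share of timely responses. The stop-word list, the helper names, and the toy rows are hypothetical. The sketch also assumes that `Timely.response.` holds `"Yes"`/`"No"` values. `"xxxx"` is in the stop list because the CFPB redacts personal details in narratives with strings such as XXXX.

```r
# Hypothetical stop-word list; a real analysis would use a fuller list.
stop_words <- c("the", "a", "an", "and", "or", "to", "of", "in", "on", "at",
                "my", "i", "me", "is", "was", "it", "they", "that", "this",
                "for", "with", "not", "xxxx")

# Top-n most frequent words in the narratives for each issue.
top_terms_by_issue <- function(narratives, issues, n = 5, stop = stop_words) {
  keep <- !is.na(narratives) & nzchar(narratives)
  texts <- split(narratives[keep], issues[keep])
  lapply(texts, function(docs) {
    words <- unlist(strsplit(gsub("[^a-z ]", " ", tolower(docs)), "\\s+"))
    words <- words[nzchar(words) & !(words %in% stop)]
    counts <- sort(table(words), decreasing = TRUE)
    head(names(counts), n)
  })
}

# Share of complaints each company answered on time, highest first.
timely_rate_by_company <- function(companies, timely) {
  rate <- tapply(timely == "Yes", companies, mean)
  sort(setNames(as.numeric(rate), names(rate)), decreasing = TRUE)
}

# Demo on hypothetical toy rows.
toy <- data.frame(
  Issue = c("Attempts to collect debt not owed",
            "Attempts to collect debt not owed",
            "Communication tactics"),
  Consumer.complaint.narrative = c(
    "The collector says I owe a debt I already paid.",
    "This debt is not mine and the collector refuses to verify it.",
    "They call my phone every day, even at work."),
  Company = c("Company A", "Company B", "Company A"),
  Timely.response. = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)
themes <- top_terms_by_issue(toy$Consumer.complaint.narrative, toy$Issue)
rates <- timely_rate_by_company(toy$Company, toy$Timely.response.)
print(themes)
print(rates)
```

Timeliness is only one measure of handling. `Company.response.to.consumer` could be tabulated by company in the same way with `table(companies, responses)`.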
However, for now I am using this dataset as a placeholder for a project. I am not very excited about the idea right now and am considering other, more fun analysis projects. One idea I think would be fun is a compilation of cookbooks, each focusing on a different cuisine. Finding ingredients common to dishes from around the world might be an enjoyable exercise and possibly an easier way to begin working with text as data.
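Once ingredient lists have been extracted from each cookbook, finding common ingredients reduces to set operations and counting. The sketch below assumes that the extraction has already produced one character vector per cuisine. The `cookbooks` data and both helper names are hypothetical.

```r
# Hypothetical toy data: one ingredient vector per cookbook/cuisine.
cookbooks <- list(
  Mexican = c("onion", "garlic", "cumin", "lime", "chili"),
  Indian  = c("onion", "garlic", "cumin", "ginger", "turmeric"),
  Italian = c("onion", "garlic", "tomato", "basil", "olive oil")
)

# Number of cuisines each ingredient appears in (counted once per cuisine).
ingredient_spread <- function(books) {
  counts <- table(unlist(lapply(books, unique), use.names = FALSE))
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Ingredients that appear in every cuisine.
shared_by_all <- function(books) Reduce(intersect, books)

spread <- ingredient_spread(cookbooks)
shared <- shared_by_all(cookbooks)
print(spread)
print(shared)
```

Real recipe text would first need normalization, for example mapping "garlic cloves" and "minced garlic" to "garlic". That step is likely where most of the text-as-data work would be.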