This report was provided in R markdown (format:html).

Q1 - What financial topics do consumers discuss on social media and what caused the consumers to post about this topic?

Analyzing the model-generated topics with a clustering algorithm (shown below as large blue bars) reveals four categories.

They are:

  1. News & marketing
  2. Complaints about fees associated with banking
  3. Banking customer service complaints
  4. Overall dissatisfaction with banks

The most frequently occurring topic in the data was stock trading news. The second most popular topic was a marketing campaign organized around the hashtag #getcollegeready. In aggregate, the customer service related topics (clusters 2, 3, and 4, referred to in the rest of this report as Hot Topics) accounted for approximately 15-20% of the data.

Click anywhere on the object below to collapse or expand sections/clusters or to get more information like high probability words and representative sample documents.

## Performing hierarchical topic clustering ... 
## Generating JSON representation of the model ...

Deliverable A - Describe your Approach and Methodology. Include a visual representation of your analytic process flow.

To address the questions raised by the challenge, the data was first filtered to include only relevant banks, then cleaned and processed. Multiple topic models were generated from the processed data using different algorithms, the models were evaluated for their usefulness, and one model was selected. Visualizations were created to interpret the output of the model and to explore the identified topics in detail. Once the customer-related topics were identified, analysis focused on those topics: the sentiment of the records within each topic was scored in relation to bank names, correlations between bank names and frequent words were calculated, and the distribution of bank names across the Hot Topics was explored.

Deliverable B - Discuss the data and its relationship to social conversation drivers.

In analyzing the relationship between the content of posts and what was driving the conversations taking place, a sentiment analysis was performed. First, a polarity score was calculated for all of the documents in the corpus used by our model. Then, to better understand the positive or negative relationship between posters and certain topics, individual topics were scored for sentiment as well. This allows the sentiment of All Topics, represented by the Standardized Polarity, to be compared with that of the Hot Topics. Topics 3 and 19 (related to customer service/experience) carry the most negative sentiment.

##            Standardized Polarity
## All Topics            0.09322253
## Topic 2              -0.20194620
## Topic 3              -0.55033973
## Topic 6              -0.38922330
## Topic 12             -0.01211667
## Topic 19             -0.45077812
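The report does not define how the Standardized Polarity column was computed. A minimal sketch, assuming it is the mean document polarity of a group divided by the standard deviation of that group's polarity scores (the definition used by the standardized mean polarity in qdap's polarity output), could look like the following. The scores, the topic labels, and the helper name `standardized_polarity` are hypothetical and are not taken from the report's data.

```r
# Hypothetical per-document polarity scores; NOT taken from the report's data.
scores <- data.frame(
  topic    = c("topic_x", "topic_x", "topic_x", "topic_y", "topic_y", "topic_y"),
  polarity = c(-0.5, -0.1, -0.3, 0.2, 0.0, 0.4)
)

# Assumed definition: mean polarity divided by its standard deviation.
standardized_polarity <- function(x) mean(x) / sd(x)

# One value per topic, plus one value for the whole corpus.
by_topic <- tapply(scores$polarity, scores$topic, standardized_polarity)
overall  <- standardized_polarity(scores$polarity)

print(round(c("All Topics" = overall, by_topic), 3))
```

Dividing by the standard deviation makes groups with very different spreads of scores easier to compare, which is one reason a standardized value may be preferred over a raw mean when ranking topics.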

To explore the relationship between topics further, a plot was constructed to compare correlations between topics. Select topics 3 and 19 to explore their relationship, and click on the plots to see high probability words and representative (stemmed) samples.

*Selecting the classification variable ‘MediaType’ from the color drop-down shows whether a comment came from Twitter or Facebook. Select Color=MediaType on the visualization below and explore the topics.

## Sampling 5000 documents for visualization.

Deliverable C - Document your code and reference the analytic process flow-diagram from deliverable A.

All R code files were submitted separately, labeled by major stage of development. The code is commented for readability and understanding.

The code is also available on GitHub:

https://github.com/sco-lo-digital/WellsFargo_Analytics_Challenge.git

Q2 - Are the topics and “substance” consistent across the industry or are they isolated to individual banks?

Topics and substance are not consistent across the industry. An initial investigation of topics identifies which bank names are prominent words in which topic.

To further explore the consistency of the discussions, the number of occurrences of each bank name in each of the Hot Topics was counted, and the results were surprising. The table below shows that references to Bank B and Bank A were most prevalent in the Hot Topic discussions, followed by references to Bank D and then Bank C.

##                             Bank A Bank B Bank C Bank D
## Topic 2: Branch Cust Serv      394    460     57    190
## Topic 3: Overall Cust Serv     429    426    111    107
## Topic 6: ATM Fees              442    465     54    152
## Topic 12: Transaction Fees     289    523     56    203
## Topic 19: Cursing out Banks    433    447     33    164
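The ranking stated above can be checked directly from the table. The sketch below re-enters the counts from the table by hand and totals them per bank; the object names (`mentions`, `bank_totals`, `topic_shares`) are illustrative and are not taken from the submitted code.

```r
# Mention counts copied from the table above (rows: Hot Topics, columns: banks).
mentions <- matrix(
  c(394, 460,  57, 190,
    429, 426, 111, 107,
    442, 465,  54, 152,
    289, 523,  56, 203,
    433, 447,  33, 164),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Topic 2", "Topic 3", "Topic 6", "Topic 12", "Topic 19"),
    c("Bank A", "Bank B", "Bank C", "Bank D")
  )
)

# Total mentions per bank across the Hot Topics, sorted from most to least.
bank_totals <- sort(colSums(mentions), decreasing = TRUE)
print(bank_totals)

# Share of each topic's bank mentions attributable to each bank (rows sum to 1).
topic_shares <- prop.table(mentions, margin = 1)
print(round(topic_shares, 2))
```

The totals come to 2321 for Bank B, 1987 for Bank A, 816 for Bank D, and 311 for Bank C, which matches the ordering described in the text. These are raw counts; they are not adjusted for how often each bank is mentioned in the data set overall.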

Deliverable D - Create a list of topics and substance you found

A plain text representation of topics with associated words can be found in the appendix. For a summary of topics, see legend below.

Topic Legend

  • 1 - Credit card related

  • 2 - Branch customer service

  • 3 - Customer service in general

  • 4 - Events taking place at bank-named arenas/stadiums

  • 5 - Financial stock upgrades/downgrades

  • 6 - ATM fees

  • 7 - Online marketing campaign #getCollegeReady

  • 8 - Bank mortgage litigation settlement news

  • 9 - Social commentary along with bank names mentioned (see T13)

  • 10 - Bank robbery news

  • 11 - Bank investment in companies

  • 12 - Transaction fees complaints

  • 13 - Social commentary along with bank names mentioned

  • 14 - Bank traders going to jail news

  • 15 - Bank named stadiums mentioned

  • 16 - Bank marketing programs targeted at “Main Street”

  • 17 - Bank earnings announcements

  • 18 - Bank responses to customer complaints

  • 19 - Cursing out banks

  • 20 - Bank named arena concert marketing

*Topics 5, 7, 11, 15, and 18 appear to be more heavily discussed on Twitter than on Facebook.

Deliverable E - Create a narrative of insights supported by the quantitative results (should include graphs or charts)

People take to social media to express their views on a wide variety of subjects that intersect with the financial industry. As the Standardized Polarity table above shows, individuals in this data set appeared to be most negative about two topics, 3 and 19. Topic 3, about customer service, was particularly negative. The plot below shows polarity for 1000 records in Topic 3.

plot(sentimentTopic3)

The extreme readings on this graph are noteworthy, but lack a framework for comparison. We can plot the sentiment for all topics to see just how negative Topic 3 really is in comparison. As you can see, the difference is not just marginal.

plot(sentimentAll)

What would be more useful, however, is to capture a polarity index over time as new data becomes available. In this way we could track the overall sentiment for certain topics and see whether customer attitudes were changing or even seasonal.
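A polarity index of this kind was not built for the report. A minimal sketch of the idea, assuming each record carries a posting date and a polarity score, averages the scores per calendar month. The dates, scores, and column names below are hypothetical.

```r
# Hypothetical records: a posting date and a polarity score per document.
posts <- data.frame(
  date     = as.Date(c("2015-01-05", "2015-01-20", "2015-02-03",
                       "2015-02-17", "2015-03-02", "2015-03-25")),
  polarity = c(-0.4, -0.2, -0.6, -0.4, -0.1, 0.1)
)

# Bucket each record by calendar month, then average polarity per bucket.
posts$month <- format(posts$date, "%Y-%m")
monthly_index <- aggregate(polarity ~ month, data = posts, FUN = mean)

print(monthly_index)
```

Rerunning the aggregation as new records arrive extends the index by one row per month, and the same grouping could be applied per topic to follow the Hot Topics separately.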

Nevertheless, sentiment is just part of the story. The conversation can be understood at a deeper level by exploring correlations among words. To demonstrate the usefulness of this approach, we can explore the words in Topic 3 that are highly correlated with Bank B, the name that came up most frequently in the Hot Topics.

Bank B, it turns out, is highly associated with the words “switch” (likely an action that customers are considering), “phone”, “wait”, and “call”. Long wait times could certainly lead to customers venting their frustrations on social media.
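The report does not state how the word associations were calculated. One common approach, sketched here as an assumption, is to build a document-term matrix and take the Pearson correlation between the bank name's column and every other term's column. The documents below are hypothetical and only reuse a few stemmed terms mentioned in the report.

```r
# Hypothetical (stemmed) documents; NOT drawn from the report's corpus.
docs <- c(
  "bankb wait phone call",
  "bankb switch wait",
  "banka fee charg",
  "bankb call wait phone",
  "banka branch teller",
  "bankd fee account"
)

# Build a binary document-term matrix: 1 if the term appears in the document.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens, function(tk) as.numeric(vocab %in% tk),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Pearson correlation of every other term with the bank name across documents.
target <- "bankb"
assoc  <- cor(dtm[, vocab != target], dtm[, target])[, 1]

print(round(sort(assoc, decreasing = TRUE), 2))
```

Terms that tend to appear in the same documents as the bank name get correlations near 1, while terms that mostly appear without it get negative values. A correlation threshold can then be used to keep only the strongest associations.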

A look at the dendrogram below shows the similarity relationships among words in Topic 3. Bank B’s close distance to words like “worst”, “horrible”, and “terrible” further emphasizes its high negative polarity and its frequency of occurrence in Hot Topics. The outlier, Bank A, is depicted on the far right-hand side, on a branch of its own.
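A dendrogram like the one described is typically produced by hierarchical clustering of term profiles. The sketch below assumes Euclidean distance and Ward linkage, neither of which is confirmed by the report, and uses a small hypothetical term-by-document matrix.

```r
# Hypothetical term-by-document counts; rows are (stemmed) terms.
term_doc <- matrix(
  c(1, 1, 1, 0, 0, 0,   # bankb
    1, 1, 0, 0, 0, 0,   # worst
    1, 0, 1, 0, 0, 0,   # horribl
    0, 0, 0, 1, 1, 1,   # banka
    0, 0, 0, 1, 1, 0),  # fee
  nrow = 5, byrow = TRUE,
  dimnames = list(c("bankb", "worst", "horribl", "banka", "fee"), NULL)
)

# Euclidean distance between term profiles, then agglomerative clustering.
term_dist <- dist(term_doc, method = "euclidean")
hc <- hclust(term_dist, method = "ward.D2")

# Cut the tree into two groups to see which terms sit together.
groups <- cutree(hc, k = 2)
print(groups)
```

Calling `plot(hc)` on the result draws the dendrogram. Terms that share a low branch, such as the bank name and the negative words in this toy example, are the ones whose usage patterns across documents are most alike.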

In conclusion, the analysis shows that customer perceptions of Bank B appear to be more negative than those of the other banks. A key focus area for Bank B should be shortening customer call wait times. Bank A is also struggling with negative perceptions and comments. In fact, Bank A not only shows up more frequently than Bank B in Topic 3 (overall customer service discussions), but it also has a more negative sentiment score within that topic. Sentiment scores for Banks C and D within Topic 3 suggest that they are viewed less negatively, but still negatively. All four banks need to improve customer perceptions and should continue to monitor social media sentiment and topic development.

Appendix

What follows are supplementary exhibits to enhance the report reader’s understanding of the author’s analytical process.

The number of topics was set to 20 based on the plot below.

STM Model 3 of 4 was selected for its perceived optimal combination of semantic coherence and exclusivity.

List of topics with Top Words in the following categories

## Topic 1 Top Words:
##       Highest Prob: card, call, credit, use, phone, number, debit 
##       FREX: card, credit, debit, app, purbankd, reward, replac 
##       Lift: android, bankbmobl, bankbquickc, cell, chipen, clone, internationbank 
##       Score: card, call, credit, debit, use, phone, number 
## Topic 2 Top Words:
##       Highest Prob: wait, ever, branch, person, two, teller, morn 
##       FREX: teller, line, car, commerci, branch, drive, bankbnew 
##       Lift: mngr, navi, saspa, bankbnew, car, clerk, commerci 
##       Score: ever, wait, branch, teller, worst, line, car 
## Topic 3 Top Words:
##       Highest Prob: time, custom, servic, chang, stop, anoth, sinc 
##       FREX: servic, custom, switch, poor, suck, absolut, horribl 
##       Lift: bankbupd, accountfraud, custom, idvni, moron, servic, switch 
##       Score: custom, servic, time, suck, chang, stop, bad 
## Topic 4 Top Words:
##       Highest Prob: will, day, today, come, great, love, last 
##       FREX: lot, build, park, veteran, communiti, tomorrow, friend 
##       Lift: breakfast, habitat, veteran, birthday, easter, lunch, mortgagefre 
##       Score: day, will, great, come, love, today, famili 
## Topic 5 Top Words:
##       Highest Prob: bankc, bankd, bit, rate, ift, buy, stock 
##       FREX: rate, ift, stock, neutral, reiter, overweight, initi 
##       Lift: analystcolor, bankcopen, csco, cso, dakota, groupon, hedg 
##       Score: bankc, rate, bit, ift, neutral, stock, bankd 
## Topic 6 Top Words:
##       Highest Prob: money, just, dont, tri, give, cant, never 
##       FREX: that, didnt, told, someon, enough, took, ask 
##       Lift: owe, boo, husband, huh, runaround, spoke, bother 
##       Score: money, just, dont, tri, cant, never, someon 
## Topic 7 Top Words:
##       Highest Prob: banka, contest, getcollegereadi, support, move, chicago, run 
##       FREX: contest, getcollegereadi, chicago, marathon, pack, prize, winner 
##       Lift: chariti, org, plannedparenthood, prize, runner, winner, bankske 
##       Score: banka, getcollegereadi, contest, chicago, marathon, pack, support 
## Topic 8 Top Words:
##       Highest Prob: pay, loan, mortgag, home, million, report, deal 
##       FREX: loan, mortgag, million, billion, settlement, unit, feder 
##       Lift: appeal, cfpb, circuit, civil, complianc, countrywid, fdic 
##       Score: mortgag, pay, loan, million, settlement, billion, home 
## Topic 9 Top Words:
##       Highest Prob: like, one, make, look, week, way, next 
##       FREX: place, there, coupl, mayb, way, week, presid 
##       Lift: parenthood, rack, anonym, pcmxln, hmu, interc, vacat 
##       Score: like, make, one, look, way, week, next 
## Topic 10 Top Words:
##       Highest Prob: bank, bankd, new, man, offic, post, citi 
##       FREX: rob, polic, robberi, suspect, search, hacker, south 
##       Lift: hillhurst, bankrol, chandler, deputi, male, search, sheriff 
##       Score: bank, bankd, new, robberi, rob, polic, man 
## Topic 11 Top Words:
##       Highest Prob: compani, market, news, say, corpor, goo, ceo 
##       FREX: news, corpor, ceo, appl, exclus, break, outlook 
##       Lift: structur, analys, appl, barcelona, bigdata, cartel, ceo 
##       Score: market, news, downgrad, goo, upgrad, corpor, compani 
## Topic 12 Top Words:
##       Highest Prob: account, check, charg, close, open, deposit, fee 
##       FREX: deposit, fee, cash, check, save, close, direct 
##       Lift: minus, paperless, consent, insuffici, legbank, save, fee 
##       Score: account, check, deposit, fee, charg, close, cash 
## Topic 13 Top Words:
##       Highest Prob: work, want, take, peopl, still, think, rebank 
##       FREX: lol, pretti, anybodi, work, far, rebank, rememb 
##       Lift: anybodi, bush, satan, pretti, henc, homosexu, everyday 
##       Score: work, peopl, want, take, think, rebank, still 
## Topic 14 Top Words:
##       Highest Prob: year, first, big, interest, world, employe, offer 
##       FREX: trader, dead, trust, own, tom, american, public 
##       Lift: conspir, libor, abcn, babe, bbcnew, chiefobi, contributor 
##       Score: year, first, big, trader, interest, former, offer 
## Topic 15 Top Words:
##       Highest Prob: bankb, rettwit, stadium, real, hire, school, student 
##       FREX: stadium, hire, charlott, panther, carolina, nfl, estat 
##       Lift: bankster, buyday, midfield, milan, nascar, pleyuk, undeclar 
##       Score: bankb, rettwit, stadium, panther, charlott, hire, financ 
## Topic 16 Top Words:
##       Highest Prob: bankd, vote, street, main, busi, mission, program 
##       FREX: vote, street, main, mission, program, small, learn 
##       Lift: photographi, main, mission, small, bakeri, cafe, full 
##       Score: mission, vote, main, street, swing, program, small 
## Topic 17 Top Words:
##       Highest Prob: manag, bankd, financi, asset, advis, wealth, stockport 
##       FREX: manag, financi, asset, advis, wealth, stockport, profit 
##       Lift: ebankc, manchest, onforb, ounc, virgin, weigh, manag 
##       Score: manag, stockport, asset, financi, wealth, advis, bankd 
## Topic 18 Top Words:
##       Highest Prob: thank, can, need, help, resp, pleas, know 
##       FREX: help, resp, pleas, feedbankb, donat, store, appreci 
##       Lift: checkmark, funer, hello, help, nearest, ollow, pleas 
##       Score: resp, thank, help, pleas, can, feedbankb, need 
## Topic 19 Top Words:
##       Highest Prob: get, now, got, good, fuck, right, job 
##       FREX: fuck, shit, ass, damn, interview, nigga, bitch 
##       Lift: dumb, bitch, bouta, dis, doe, fuckin, gettin 
##       Score: get, got, fuck, now, good, shit, hate 
## Topic 20 Top Words:
##       Highest Prob: center, see, photo, share, ticket, game, watch 
##       FREX: center, photo, ticket, event, arena, philadelphia, tour 
##       Lift: admiss, bankacnkotb, bankersaf, exhibit, hockey, meek, minnesota 
##       Score: center, photo, share, ticket, see, bankac, game