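library(twitteR)   # twListToDF()
library(dplyr)     # %>%, filter(), mutate()
library(stringr)   # str_detect(), str_replace_all()
library(tidytext)  # unnest_tokens(), stop_words
# Convert the list of TD Bank statuses into a data frame
td_df <- twListToDF(td)
# Keep only the client name from the HTML anchor stored in statusSource
td_df$statusSource <- substr(td_df$statusSource,
                             regexpr('>', td_df$statusSource) + 1,
                             regexpr('</a>', td_df$statusSource) - 1)
# Drop quoted tweets, strip links and escaped ampersands, tokenize, and remove
# stop words (`reg` is the tokenizing pattern; it is assumed to be defined elsewhere)
td_wrds <- td_df %>% filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% c("@tdbank", "rt", stop_words$word),
         str_detect(word, "[a-z]"))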
# Same cleaning steps for the PNC tweets
pnc_df <- twListToDF(pnc)
# Index into pnc_df (not td_df) so the positions match the strings being cut
pnc_df$statusSource <- substr(pnc_df$statusSource,
                              regexpr('>', pnc_df$statusSource) + 1,
                              regexpr('</a>', pnc_df$statusSource) - 1)
pnc_wrds <- pnc_df %>% filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% c("@pncbank", "rt", "https", stop_words$word),
         str_detect(word, "[a-z]"))
Social Media As Customer Service
I work for a bank, which means that every year, without fail, I get an email touting our ranking in the JD Power standings across our footprint. JD Power rates a number of industries, including financial institutions, against set criteria. The category that interested me was Customer Service, where PNC Bank was ranked the #1 institution nationally. I decided to spend this assignment comparing the Twitter presence of PNC Bank to that of my employer, TD Bank.
Data Collection
Instead of searching by hashtag (#), I decided to find the tweets sent directly to the companies being studied by searching for each bank's handle (@TDBank_US and @PNCBank). Only 850 of the 1,000 requested tweets could be pulled for TD Bank, but the difference was not substantial, so both data sets were cleaned and readied for analysis.
Company WordClouds: High Level
To get an idea of each bank’s brand identity on Twitter, let’s see what everyone is saying to them. Below are wordclouds constructed from the text of each tweet collected for the company. Words in the center appear more frequently than those toward the edges. Even from this high-level view of the data pulled from Twitter, we can already see many differences and similarities. You’ll notice that a heavy focus on local sports teams seems to be one of the main themes for both banks.
TD
PNC
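A wordcloud is driven by nothing more than word counts, so it can help to see that step on its own. The block below is a minimal base R sketch on a few made-up tweets; the tweets and the tiny stop-word list are hypothetical and are not taken from the collected data.

```r
# Toy tweets standing in for the collected text (hypothetical data)
tweets <- c("Go team go! Thanks @PNCBank",
            "Thanks for the support @PNCBank",
            "Great game tonight, go team")

# Lowercase, then split on anything that is not a letter, digit, or @
words <- unlist(strsplit(tolower(tweets), "[^a-z0-9@]+"))
words <- words[nzchar(words)]

# Drop the bank handle and a tiny, hypothetical stop-word list
drop <- c("@pncbank", "the", "for")
words <- words[!words %in% drop]

# Count each word and sort from most to least frequent
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```

Counts like these are what a plotting function such as wordcloud() from the wordcloud package takes as its words and freq arguments, so the biggest words in the cloud are simply the most frequent ones.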
Daily Sentiment in November
Let’s look at the overall emotional feel for each company over the last week or so. The bar graphs below show the net sentiment (positive words minus negative words) summed over each day’s tweets. The Bing sentiment lexicon was used because it provides easy-to-work-with binary categories. It looks like most of this month comes out net positive when it comes to their tweets. Of course, we’d have to get into the nitty-gritty of which words are being evaluated to be sure this isn’t just happening by chance.
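The daily score described above (positive words minus negative words) can be sketched in a few lines of base R. This is only an illustration: the tokens, the dates, and the five-word lexicon are hypothetical stand-ins, not the actual Bing lexicon or the collected tweets.

```r
# Hypothetical tokens: one row per word, tagged with the day it was tweeted
tokens <- data.frame(
  day  = c("11-01", "11-01", "11-01", "11-01", "11-02", "11-02", "11-02"),
  word = c("great", "thanks", "slow", "bank", "fraud", "love", "worst"),
  stringsAsFactors = FALSE
)

# Tiny stand-in for a binary positive/negative lexicon
lexicon <- data.frame(
  word      = c("great", "thanks", "love", "slow", "fraud", "worst"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Keep only words found in the lexicon ("bank" drops out here)
scored <- merge(tokens, lexicon, by = "word")

# +1 for each positive word, -1 for each negative word
scored$value <- ifelse(scored$sentiment == "positive", 1, -1)

# Net sentiment per day = positives minus negatives
daily <- aggregate(value ~ day, data = scored, FUN = sum)
names(daily)[2] <- "net_sentiment"
print(daily)
```

Words that are not in the lexicon contribute nothing to the total, which is one reason to check which words are actually being scored.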
The Effects of Retweeting
So far we’ve been treating each tweet equally, but in reality many tweets are shared through retweeting over long periods of time. It’s probably best that we look at any sentiment differences between retweeted tweets and tweets that were posted only once.
Daily Sentiment
Looking at the daily chart with retweets separated out, we can see that the overall tone of the retweets is generally more positive based on the analysis. Remember that the chart magnifies any retweets on a given day, so if one tweet had a score of 4 and was shared 6 times, it would add +24 to the daily sum. That said, we shouldn’t discount this, because retweets are what strengthen a company’s core brand on Twitter. It appears that TD Bank may actually get an advantage from retweets simply because of its lower volume compared to PNC.
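To make the magnifying effect concrete, here is a small base R sketch that weights each tweet's score by the number of times it was shared. The three tweets and the column names are hypothetical; the first row reproduces the score of 4 shared 6 times from the example above.

```r
# Hypothetical tweets: a sentiment score and how many times each was shared
tweets <- data.frame(
  day    = c("11-01", "11-01", "11-02"),
  score  = c(4, -1, 2),
  shares = c(6, 1, 3),
  stringsAsFactors = FALSE
)

# Each share counts the tweet's score again
tweets$weighted <- tweets$score * tweets$shares

# Daily sums with and without the retweet weighting
daily_unweighted <- tapply(tweets$score, tweets$day, sum)
daily_weighted   <- tapply(tweets$weighted, tweets$day, sum)

print(rbind(unweighted = daily_unweighted, weighted = daily_weighted))
```

A single widely shared tweet can dominate a day's total, so a low-volume account can see its daily sentiment swing further than a high-volume one.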
Contribution to Sentiment
The charts below show the words that contribute the most to both positive and negative retweet sentiment. You’ll notice right away that the two charts share a very common word at the top. Check out the words categorized as negative and you’ll see that TD has “fierce” toward the top. We’re not given the context, but based on the heavy sports messaging we saw before, we can guess this is probably related to a game or team. This is the danger of relying too heavily on sentiment analysis without customizing the sentiment lexicon to your field of study.
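One way to customize a lexicon is to drop words that are not really negative in your domain before scoring. The sketch below does this in base R for “fierce”; the three-word lexicon, the word list, and the helper net_score() are all hypothetical and only illustrate the idea.

```r
# Tiny stand-in lexicon; "fierce" is tagged negative, as in the chart above
lexicon <- data.frame(
  word      = c("fierce", "fraud", "support"),
  sentiment = c("negative", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Words we decide are not negative in a sports-heavy context (an assumption)
domain_neutral <- c("fierce")
custom_lexicon <- lexicon[!lexicon$word %in% domain_neutral, ]

# Hypothetical helper: positives minus negatives for a vector of words
net_score <- function(words, lex) {
  tags <- lex$sentiment[match(words, lex$word)]
  sum(tags == "positive", na.rm = TRUE) - sum(tags == "negative", na.rm = TRUE)
}

# Hypothetical words pulled from a handful of retweets
words <- c("fierce", "support", "fierce", "fraud")

original_score <- net_score(words, lexicon)         # "fierce" counted as negative
custom_score   <- net_score(words, custom_lexicon)  # "fierce" ignored
print(c(original = original_score, custom = custom_score))
```

Here the same words move from a net negative score to a neutral one once “fierce” is no longer counted against the bank.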
Word Associations
The frequency of the words “support” and “supporting” on both companies’ sentiment contribution charts means that we’ve found a buzzword that permeates quite a few of the communications related to each institution. We’re going to dive into the qdap and tm packages to build a word association chart and find out what these banks are supporting. Below you can see the words that appear most frequently with either “support” or “supporting”. It’s easy to see how the banks differ when it comes to the causes they support.
TD “Support” Associations
PNC “Supporting” Associations
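Word association of this kind is usually measured as the correlation between one term and every other term across documents, which is the idea behind a function such as findAssocs() in tm. The base R sketch below computes that correlation by hand on five made-up tweets; the tweets are hypothetical and the code is not the exact qdap/tm workflow used for the charts.

```r
# Toy tweets standing in for the cleaned text (hypothetical data)
tweets <- c("proud to support local schools",
            "we support local schools",
            "great game tonight",
            "support the arts",
            "game day")

# Split into words and build the vocabulary
tokens <- strsplit(tweets, " ")
vocab  <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per tweet, one column per word
dtm <- t(vapply(tokens,
                function(tk) as.integer(table(factor(tk, levels = vocab))),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Correlation of every term with "support" across the tweets
assoc <- cor(dtm)[, "support"]
assoc <- sort(assoc[names(assoc) != "support"], decreasing = TRUE)
print(round(assoc, 2))
```

Terms that tend to show up in the same tweets as “support” get a positive correlation, while terms that only appear in other tweets, such as “game” here, come out negative.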
Conclusion
Well, there’s no way we’d find PNC’s magic sauce from only 1,000 tweets, but we’re off to a good start. We can see from the wordclouds and buzzwords how each bank makes its message unique and memorable. TD also has its own unique brand, and regional differences can definitely be seen in the analysis. We’re in a brave new world with the advent of text mining; it’s amazing how much information we can obtain at the click of a button.