In preparation for tonight’s game I thought it would be appropriate to talk about beer. I initially started with my favorite beer, #Yuengling, but Twitter had too few responses to even get me 1,000 observations. Lesson to be learned, Yuengling: you need to be more active on social media, especially for the “cult” beer that you claim to be.
I employed the following packages to help me in my analysis:
library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(dplyr)
library(knitr)
library(wordcloud)
I set my keys and tokens (hidden) and then pulled in 1,000 tweets with the hashtag #beer.
I then converted them to a data frame that I could work with.
#retrieve Twitter data
num_tweets <- 1000
bt <- searchTwitter('#Beer', n = num_tweets)
#Convert to data frame
bt_df <- twListToDF(bt)
I looked at the top platforms used to post these tweets. Interestingly, over the past several days of running this analysis, the results have changed. Initially, Twitter Web Client was by far the primary source of #beer tweets. Coming into the weekend, you can see that Instagram has largely taken over the position as the largest source of #beer tweets.
So what causes this? Who uses the Twitter Web Client? My guess is that businesses and organizations focused on promoting beer and beer-related events are much more active on weekdays. Come the weekend, far more individual users are tweeting about their experiences, and they do so in much higher volumes.
#Tweets by platform
bt_df$statusSource = substr(bt_df$statusSource,
regexpr('>', bt_df$statusSource) + 1,
regexpr('</a>', bt_df$statusSource) - 1)
beer_platform <- bt_df %>% group_by(statusSource) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = 100 * n/sum(n)) %>%
arrange(desc(n)) %>% top_n(5)
#List platform
kable(beer_platform, digits = 1, align = 'c')
| statusSource | n | percent_of_tweets |
|---|---|---|
| Instagram | 182 | 18.2 |
| Twitter for iPhone | 169 | 16.9 |
| Twitter Web Client | 142 | 14.2 |
| Twitter for Android | 110 | 11.0 |
| IFTTT | 71 | 7.1 |
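The `substr`/`regexpr` step above is easier to follow on a single value. Here is a minimal sketch, assuming `statusSource` arrives as an HTML anchor wrapped around the platform name; the `raw_source` string is an invented example of that shape, not a row from the data.

```r
# Hypothetical raw value shaped like the HTML anchor stored in statusSource
raw_source <- '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'

# Keep the text between the end of the opening tag ('>') and the closing '</a>'
clean_source <- substr(raw_source,
                       regexpr('>', raw_source) + 1,
                       regexpr('</a>', raw_source) - 1)

print(clean_source)
```

Because `regexpr` returns the position of the first match only, this relies on the opening tag containing no `>` before the one that closes it.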
Below, we look at the platform use as a percentage of all tweets.
#Platform Chart
pf <- c("Instagram", "Twitter for iPhone", "Twitter for Android", "Twitter Web Client", "IFTTT")
pf_df <- beer_platform %>% filter(statusSource %in% pf)
ggplot(pf_df, aes(x = statusSource, y = percent_of_tweets, fill=statusSource)) +
geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Spectral") +
xlab("Platform") +
ylab("Percent of Tweets") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(legend.position = "none")
An additional piece of data that would help is some way to separate individual users from corporate accounts. I think this would prove or disprove my hypothesis.
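Short of labeling accounts, one way to probe the weekday-versus-weekend pattern described earlier is to split the platform shares by day type. This is only a sketch: it assumes the `created` timestamp column that `twListToDF` returns, uses an invented toy data frame in place of `bt_df`, and the names `toy_df`, `day_type`, and `platform_by_daytype` are hypothetical.

```r
library(dplyr)

# Toy stand-in for bt_df; the rows are invented for illustration
toy_df <- data.frame(
  created = as.POSIXct(c("2018-01-01 12:00:00", "2018-01-03 09:30:00",
                         "2018-01-03 15:00:00", "2018-01-06 20:00:00",
                         "2018-01-06 21:00:00", "2018-01-07 14:00:00"),
                       tz = "UTC"),
  statusSource = c("Twitter Web Client", "Twitter Web Client",
                   "Twitter for iPhone", "Instagram",
                   "Instagram", "Twitter for iPhone"),
  stringsAsFactors = FALSE
)

platform_by_daytype <- toy_df %>%
  # "%u" is the ISO weekday number (1 = Monday, ..., 7 = Sunday)
  mutate(day_type = ifelse(format(created, "%u") %in% c("6", "7"),
                           "weekend", "weekday")) %>%
  count(day_type, statusSource) %>%
  group_by(day_type) %>%
  # Share of each platform within its own day type
  mutate(percent_of_tweets = 100 * n / sum(n)) %>%
  ungroup() %>%
  arrange(day_type, desc(n))

print(platform_by_daytype)
```

With only 1,000 tweets per pull, a real comparison would need several pulls spread across the week before the shares mean much.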
The top users support my claim that beer organizations are some of the most active tweeters. If you are tweeting multiple times within the same couple of hours on any given day, I hope you are a beer organization. If you are an individual tweeting 10+ times within 5 hours, I suggest you stop reading this analysis and seek superior resources.
#Top users
beer_user <- bt_df %>% group_by(screenName) %>% summarize(n = n()) %>% mutate(percent_of_tweets = 100 * n/sum(n)) %>% arrange(desc(n)) %>% slice(1:10)
kable(beer_user, digits = 1, align = 'c')
| screenName | n | percent_of_tweets |
|---|---|---|
| shop_beer | 37 | 3.7 |
| BeerBulletin | 35 | 3.5 |
| ISO_FT | 19 | 1.9 |
| BeerAndPizzaDay | 11 | 1.1 |
| BrewStuds | 7 | 0.7 |
| Ohio_Digital | 7 | 0.7 |
| CCCrewHumor | 6 | 0.6 |
| flavorfulworld | 6 | 0.6 |
| jbeer_en | 6 | 0.6 |
| wtf_compilation | 5 | 0.5 |
For a visualization, I have created a word cloud of the top 25 words associated with #beer. Again, we can suspect that beer organizations are some of the top users: other top words such as tix, https, enter, and brewery suggest that many tweets are promotional and direct users to a website.
#Tidytext
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
beer_words <- bt_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
#Top Words
beer_words_list <- beer_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(25)
#Word Cloud
beer_words_list %>% arrange(desc(n)) %>% with (wordcloud(word, n, max.words = 25, scale=c(5,.5),min.freq=3, random.order=FALSE, rot.per=.1, colors=brewer.pal(10, "Spectral")))
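To see what the `reg` pattern keeps and drops, here is a small sketch on one invented tweet. It swaps in base R's `strsplit` for `unnest_tokens` so it runs without tidytext, so treat it as an approximation of the tokenizing step rather than the exact call used above.

```r
# Same pattern as reg above: split on anything that is not a letter, digit,
# '#', '@', or an apostrophe that sits inside a word
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Invented example tweet, not taken from the data
tweet <- "Can't wait for #Beer with @friend & pizza!"

# unnest_tokens() lowercases by default, so the sketch does the same
tokens <- strsplit(tolower(tweet), reg, perl = TRUE)[[1]]

# Adjacent delimiters (" & ") leave empty strings behind; drop them
tokens <- tokens[nzchar(tokens)]

print(tokens)
```

The point of the pattern is that hashtags, mentions, and contractions survive as single tokens instead of being broken apart at `#`, `@`, or `'`.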
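#Sentiment data frame
#Newer tidytext releases no longer ship a lexicon column in sentiments; there, get_sentiments("nrc") returns this table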
nrc <- sentiments %>%
filter(lexicon == "nrc") %>%
select(word, sentiment)
#Join Words and Sentiments
beer_words_sentiments <- beer_words %>% inner_join(nrc, by = "word")
beer_words_sentiments_list <- beer_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
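One detail of the join above is worth spelling out: a word can carry several sentiment labels in the lexicon, so the counts are word-sentiment pairs rather than tweets. The sketch below uses an invented word table and a hypothetical mini-lexicon shaped like `nrc`; its labels are illustrative and not copied from the NRC lexicon.

```r
library(dplyr)

# Toy word table, one row per (tweet id, word); values are invented
toy_words <- data.frame(
  id = c("1", "2", "2"),
  word = c("tasty", "failure", "beer"),
  stringsAsFactors = FALSE
)

# Hypothetical mini-lexicon with the same columns as nrc (word, sentiment)
toy_lexicon <- data.frame(
  word = c("tasty", "tasty", "failure", "failure", "failure"),
  sentiment = c("positive", "joy", "negative", "sadness", "disgust"),
  stringsAsFactors = FALSE
)

# Words missing from the lexicon drop out; multi-label words repeat
toy_sentiments <- toy_words %>% inner_join(toy_lexicon, by = "word")
toy_counts <- toy_sentiments %>% count(sentiment, sort = TRUE)

print(toy_sentiments)
print(toy_counts)
```

Three input words become five rows here, which is why the sentiment totals can exceed the number of tweets.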
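Positive is by far the most common sentiment, with 522 of the 1,723 sentiment matches, or about 30%. Keep in mind that these counts are word matches rather than posts, and one word can carry several sentiment labels. The positive skew seems likely to reflect both the beer selling block (distributors, manufacturers, retailers) and the generally good times that people tweeting about beer are experiencing.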
Conversely, there are a number of lesser sentiments such as negative, fear, anger, and disgust. One can only assume that these are probably the users rather than the marketing efforts at work. Hungover perhaps?
kable(beer_words_sentiments_list, align = 'c')
| sentiment | n |
|---|---|
| positive | 522 |
| joy | 343 |
| anticipation | 223 |
| trust | 166 |
| negative | 152 |
| sadness | 82 |
| surprise | 73 |
| anger | 63 |
| fear | 60 |
| disgust | 39 |
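To put the counts above on a common scale, this short sketch converts them to shares of all sentiment matches. The numbers are copied from the table; `sentiment_counts` and `percent` are names introduced here for illustration.

```r
# Counts copied from the sentiment table above
sentiment_counts <- data.frame(
  sentiment = c("positive", "joy", "anticipation", "trust", "negative",
                "sadness", "surprise", "anger", "fear", "disgust"),
  n = c(522, 343, 223, 166, 152, 82, 73, 63, 60, 39),
  stringsAsFactors = FALSE
)

# Share of each sentiment among all word-sentiment matches
total_matches <- sum(sentiment_counts$n)
sentiment_counts$percent <- round(100 * sentiment_counts$n / total_matches, 1)

print(total_matches)
print(sentiment_counts)
```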
#Positive Sentiment
pos_tw_ids <- beer_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)
pos_tw_ids_list <- bt_df %>% inner_join(pos_tw_ids, by = "id") %>% select(word) %>% group_by(word) %>% summarize(n = n())%>% arrange(desc(n)) %>% slice(1:5)
#Negative Sentiment (using the "disgust" category)
disg_tw_ids <- beer_words_sentiments %>% filter(sentiment == "disgust") %>% distinct(id, word)
disg_tw_ids_list <- bt_df %>% inner_join(disg_tw_ids, by = "id") %>% select(word) %>% group_by(word) %>% summarize(n = n())%>% arrange(desc(n)) %>% slice(1:5)
Below, the top words for the positive and negative sentiments seem to be more accurate than what we saw in the lecture notes exercise. The positive words are far more concentrated than the negative ones. People are associating beer with activities (football) and enjoyment (tasty, perfect).
Please forgive the vulgarity of the negative sentiments. People clearly have strong, though less frequent, feelings towards beer. Even so, some negative sentiments such as failure could imply a positive message for beer. “The game was a failure, time for a beer!”
#List Top Sentiments
kable(pos_tw_ids_list, align = 'c')
| word | n |
|---|---|
| beer | 135 |
| craft | 21 |
| football | 15 |
| tasty | 15 |
| perfect | 13 |
kable(disg_tw_ids_list, align = 'c')
| word | n |
|---|---|
| poison | 3 |
| dank | 2 |
| disgraceful | 2 |
| shit | 2 |
| sour | 2 |
In a final, simple analysis, I looked at how many posts were retweets. Surprisingly, this breakdown did not change much from earlier in the week. I would have guessed that the many beer-supporting organizations would drive more retweets. Nonetheless, we see that a sizable share of all tweets, 259 of 1,000 or about 26%, are retweets. I think this breakdown would be interesting to see in other analyses, suggesting which topics are more often retweeted versus organically created.
This analysis could go further, such as looking at retweets by platform, which might suggest whether tweets by organizations or individuals are more often retweeted.
#Retweet Count
#Go further - turn into pie chart
retweet <- bt_df %>% select(text, isRetweet) %>% group_by(isRetweet) %>% summarize(n = n())
kable(retweet, align = 'c')
| isRetweet | n |
|---|---|
| FALSE | 741 |
| TRUE | 259 |
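The retweets-by-platform idea mentioned above could be sketched along these lines. The data frame is an invented stand-in for `bt_df`, keeping only the `statusSource` and `isRetweet` columns, and the column names `tweets`, `retweets`, and `percent_retweets` are hypothetical.

```r
library(dplyr)

# Toy stand-in for bt_df; the rows are invented for illustration
toy_df <- data.frame(
  statusSource = c("Instagram", "Instagram", "Twitter Web Client",
                   "Twitter Web Client", "Twitter for iPhone",
                   "Twitter for iPhone"),
  isRetweet = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

retweet_by_platform <- toy_df %>%
  group_by(statusSource) %>%
  summarize(tweets = n(),
            retweets = sum(isRetweet),
            # mean() of a logical column is the fraction that is TRUE
            percent_retweets = 100 * mean(isRetweet)) %>%
  arrange(desc(percent_retweets))

print(retweet_by_platform)
```

On the real data, platforms with only a handful of tweets would give noisy percentages, so a minimum tweet count per platform is probably worth adding.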
And with that, I will leave you with one of the finest works by Edgar Allan Poe:
Fill with mingled cream and amber,
I will drain that glass again.
Such hilarious visions clamber
Through the chamber of my brain -
Quaintest thoughts - queerest fancies
Come to life and fade away;
What care I how time advances?
I am drinking ale today.