The Tom Brady Stats: Trends and Word-cloud

This tutorial teaches you how to create trend plots (daily like and share counts) and word-clouds from Facebook data. You can easily adapt the R code to Twitter data.

Let’s load the libraries. Make sure these libraries are installed on your computer.

library(ggplot2)
library(lubridate)
library(scales)
library(tm)
library(stringr)
library(wordcloud)
library(syuzhet)
library(reshape2)
library(dplyr)
library(Rfacebook)
library(rCharts)

Set up your Facebook API for data mining. Use your own API key and token.
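
The setup code is not shown in this tutorial, so here is a minimal sketch of one way to create the token object used below. It assumes you have registered a Facebook app and stored its credentials in two environment variables, FB_APP_ID and FB_APP_SECRET (these variable names are hypothetical, chosen for illustration). fbOAuth() comes from the Rfacebook package.

```r
# Assumption: the app credentials live in the hypothetical environment
# variables FB_APP_ID and FB_APP_SECRET, so they never appear in the script.
app_id <- Sys.getenv("FB_APP_ID")
app_secret <- Sys.getenv("FB_APP_SECRET")
have_credentials <- nzchar(app_id) && nzchar(app_secret)

if (have_credentials && requireNamespace("Rfacebook", quietly = TRUE)) {
  # Opens a browser window to authorize the app and returns an OAuth token
  token <- Rfacebook::fbOAuth(app_id = app_id, app_secret = app_secret)
} else {
  message("Set FB_APP_ID and FB_APP_SECRET before requesting a token.")
}
```

Keeping the credentials out of the script makes it safer to share your code with classmates.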

With the API set up, we can download posts from Tom Brady’s official Facebook page. Q: What is the name of the dataframe that has Tom Brady’s posts?

brady_posts <- getPage(page='TomBrady', token=token, n = 2000, feed = FALSE,reactions = TRUE)

Clean up timestamps in his posts. Refer to the previous tutorial on cleaning timestamps in Twitter data. Q: Are timestamps in Facebook data different from Twitter’s? How?

# Parse the timestamps, convert them to US Eastern time, and store the calendar date in a new column called created_date.
brady_posts$created_time <- ymd_hms(brady_posts$created_time) 
brady_posts$created_time <- with_tz(brady_posts$created_time,"America/New_York")
# Pass tz explicitly: as.Date() on a date-time can otherwise fall back to UTC.
brady_posts$created_date <- as.Date(brady_posts$created_time, tz = "America/New_York")
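
To see why the time zone matters, here is a small sketch using a single made-up timestamp in the ISO 8601 style that Facebook returns (the value itself is hypothetical). A post published at 02:30 UTC belongs to the previous calendar day in New York, so the date you group by depends on the time zone you pick.

```r
library(lubridate)

# Hypothetical Facebook-style timestamp, expressed in UTC
raw_time <- "2017-09-30T02:30:00+0000"

utc_time <- ymd_hms(raw_time)                        # parsed as UTC
local_time <- with_tz(utc_time, "America/New_York")  # same instant, Eastern clock

# Calendar date in each zone; tz is passed explicitly so the result
# does not depend on the default used by as.Date()
utc_date <- as.Date(utc_time, tz = "UTC")
local_date <- as.Date(local_time, tz = "America/New_York")

print(utc_date)
print(local_date)
```

The two printed dates differ by one day, even though both describe the same moment.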

Let’s keep only his posts from 2012 onward.

brady_posts <- brady_posts[brady_posts$created_date>="2012-01-01",]

Calculate the average number of likes and shares for Tom Brady.

# Create a factor copy of created_date, called created_date_label. It serves as the grouping label in the daily summary below.
brady_posts$created_date_label <- as.factor(brady_posts$created_date)

print(c("the average number of likes:", mean(brady_posts$likes_count)))

## [1] "the average number of likes:" "55385.2112676056"

print(c("the average number of shares:",mean(brady_posts$shares_count)))

## [1] "the average number of shares:" "5579.7382629108"

brady_posts_popularity <- brady_posts %>% 
  group_by(created_date_label) %>% 
  summarise(avg_likes = mean(likes_count),
            avg_shares = mean(shares_count),
            post_count = length(unique(id))) %>% melt

## Using created_date_label as id variables

names(brady_posts_popularity) <- c("day", "type", "value")
brady_posts_popularity$day <- as.Date(brady_posts_popularity$day)
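
The summarise-then-melt step is easier to follow on a tiny example. The sketch below uses a made-up data frame (toy_posts and all of its values are hypothetical) with the same column names as brady_posts. It uses n_distinct() as a shorthand for length(unique()) and names the id column explicitly in melt().

```r
library(dplyr)
library(reshape2)

# Hypothetical data: three posts spread over two days
toy_posts <- data.frame(
  id = c("p1", "p2", "p3"),
  created_date_label = factor(c("2017-09-01", "2017-09-01", "2017-09-02")),
  likes_count = c(100, 300, 50),
  shares_count = c(10, 30, 5)
)

toy_daily <- toy_posts %>%
  group_by(created_date_label) %>%
  summarise(avg_likes = mean(likes_count),
            avg_shares = mean(shares_count),
            post_count = n_distinct(id)) %>%
  as.data.frame() %>%
  melt(id.vars = "created_date_label")  # wide to long: one row per day and measure

names(toy_daily) <- c("day", "type", "value")
toy_daily$day <- as.Date(toy_daily$day)

print(toy_daily)
```

After melting, each day contributes one row per measure (avg_likes, avg_shares, post_count), which is the long layout ggplot2 expects when you color lines by type.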

Let’s use the ggplot2 library to visualize the daily like and share counts.

ggplot(data = brady_posts_popularity, aes(x = day, y = value, group = type)) +
  geom_line(size = 0.9, alpha = 0.7, aes(color = type)) +
  geom_point(size = 0) +
  ylim(0, NA) +
  theme(legend.title=element_blank(), axis.title.x = element_blank()) +
  ylab("Engagement indicators") + 
  ggtitle("Popularity over time")
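
Average likes run into the tens of thousands while post_count usually stays in single digits, so the three lines can be hard to compare on a shared axis. One possible variation, which is not part of the original tutorial, is to give each indicator its own panel with facet_wrap() and a free y-axis. The sketch below uses made-up numbers (toy_popularity is hypothetical) in the same long layout as brady_posts_popularity.

```r
library(ggplot2)

# Hypothetical long-format data: three days, three indicators
toy_popularity <- data.frame(
  day = rep(as.Date(c("2017-09-01", "2017-09-02", "2017-09-03")), times = 3),
  type = rep(c("avg_likes", "avg_shares", "post_count"), each = 3),
  value = c(52000, 61000, 48000, 5100, 6400, 4700, 1, 2, 1)
)

p <- ggplot(toy_popularity, aes(x = day, y = value, group = type)) +
  geom_line(aes(color = type)) +
  facet_wrap(~ type, ncol = 1, scales = "free_y") +  # one panel per indicator
  theme(legend.position = "none", axis.title.x = element_blank()) +
  ylab("Engagement indicators") +
  ggtitle("Popularity over time")

print(p)
```

To try it on the real data, replace toy_popularity with brady_posts_popularity.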

Hate or love? Since February 2016, each Facebook post carries reaction counts (love, haha, wow, angry, sad, and so on), which offer a window into audience sentiment. Let’s take a peek at his most beloved posts. We start with the posts published since March 2016 that received at least 100 love reactions (love_count >= 100), then clean their text for the word-cloud.

brady_posts_since16 <- brady_posts[brady_posts$created_date>="2016-03-01",]
# which() drops rows whose love_count is NA instead of returning all-NA rows
brady_posts_since16_loved <- brady_posts_since16[which(brady_posts_since16$love_count >=100),]
loved_posts <- str_replace_all(brady_posts_since16_loved$message, "@\\w+", "") # strip @mentions
loved_posts <- loved_posts[!is.na(loved_posts)] # drop posts that have no text
wordCorpus <- Corpus(VectorSource(loved_posts)) 
wordCorpus <- tm_map(wordCorpus, removePunctuation) 
wordCorpus <- tm_map(wordCorpus, content_transformer(tolower)) # convert to lower case
wordCorpus <- tm_map(wordCorpus, removeWords, stopwords("english")) # remove stopwords
wordCorpus <- tm_map(wordCorpus, removeWords, c("nfl")) # remove a domain-specific word
wordCorpus <- tm_map(wordCorpus, stripWhitespace)
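
To check what each cleaning step does, it helps to run the same pipeline on one or two sentences you can read by eye. The messages below are made up for illustration. The sketch also drops missing messages first, which is an assumption about how you may want to treat posts without text.

```r
library(tm)
library(stringr)

# Hypothetical messages; NA stands for a post with no text
toy_messages <- c("Thanks @Patriots fans! GREAT win in the NFL.", NA)
toy_messages <- toy_messages[!is.na(toy_messages)]
toy_messages <- str_replace_all(toy_messages, "@\\w+", "")  # strip @mentions

toy_corpus <- Corpus(VectorSource(toy_messages))
toy_corpus <- tm_map(toy_corpus, removePunctuation)
toy_corpus <- tm_map(toy_corpus, content_transformer(tolower))
toy_corpus <- tm_map(toy_corpus, removeWords, c(stopwords("english"), "nfl"))
toy_corpus <- tm_map(toy_corpus, stripWhitespace)

# Pull the cleaned text back out of the first document
cleaned <- trimws(as.character(toy_corpus[[1]]))
print(cleaned)
```

Note that the order matters: "nfl" is removed after the text is lower-cased, so "NFL" in the original message is caught as well.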

Now, let’s create a word-cloud of his beloved posts. There are many ways to tweak the word-cloud to make it more visually appealing. See the wordcloud documentation: https://cran.r-project.org/web/packages/wordcloud/wordcloud.pdf

pal <- brewer.pal(8,"Dark2")
pal <- pal[-(1:4)]
set.seed(123)
wordcloud(words = wordCorpus, scale=c(8,.2), min.freq=3,max.words=Inf, random.order=FALSE, 
          rot.per=0.15, use.r.layout=FALSE, colors=pal)
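
The min.freq argument is easier to understand once you look at the word counts behind the cloud. As a rough sketch on made-up text (toy_docs is hypothetical), you can build a term-document matrix with tm, sum the counts per word, and see which words would survive a given threshold. This is an alternative way of feeding wordcloud(), not the approach used above, which passes the corpus directly.

```r
library(tm)

# Hypothetical cleaned documents
toy_docs <- c("win win team", "team win fans", "fans love team")
toy_corpus <- Corpus(VectorSource(toy_docs))

# Rows are words, columns are documents
tdm <- TermDocumentMatrix(toy_corpus)
word_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(word_freq)

# Words that would be drawn with min.freq = 3
frequent <- word_freq[word_freq >= 3]

# Optional: draw the cloud from explicit words and counts
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(123)
  wordcloud::wordcloud(words = names(word_freq), freq = word_freq, min.freq = 2)
}
```

Inspecting word_freq before plotting is also a quick way to spot leftover noise words that you may want to add to the removeWords step.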

This tutorial was developed for COMM497DB Fall 2017, taught at UMass-Amherst.

If you find this tutorial helpful and would like to use it in your projects, please acknowledge the source:

Xu, Weiai W. (2017). The Tom Brady Stats: Trends and Word-cloud. Amherst, MA: http://curiositybits.com

Wayne Weiai Xu

09/30/2017