Introduction

For this assignment I wanted to see how two NBA teams compared on Twitter. The first team I chose was the Boston Celtics, my favorite team, and the second was the Philadelphia 76ers. The Celtics have had an up-and-down season so far, while the 76ers have been one of the worst teams in the league. I thought this would be an interesting comparison because Boston (and New England) is known for strongly supporting its sports teams, but the same can’t always be said for Philadelphia basketball fans. I think the sentiments of the tweets for each team could be especially interesting to compare.

Loading libraries and Twitter authentication

First I loaded all the libraries I needed.

library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(dplyr)
library(knitr)
library(wordcloud2)

Then I logged in to Twitter.
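The login code is not shown here, so the following is only a minimal sketch of what that step could look like with twitteR's setup_twitter_oauth(). The environment variable names and the read_twitter_credentials() helper are assumptions made up for this example, not part of the original analysis.

```r
# Hypothetical helper: read the four API keys from environment variables.
# The variable names below are assumptions chosen for illustration.
read_twitter_credentials <- function() {
  list(
    consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
    consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
}

creds <- read_twitter_credentials()

# Sys.getenv() returns "" for unset variables, so check that all four are filled.
have_all_keys <- all(nzchar(unlist(creds)))

if (have_all_keys && requireNamespace("twitteR", quietly = TRUE)) {
  twitteR::setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
                               creds$access_token, creds$access_secret)
} else {
  message("Twitter credentials not set; skipping authentication.")
}
```

Keeping the keys in environment variables rather than in the script means the document can be shared without exposing them.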

Boston Celtics

First I pulled the last 1000 tweets that used #Celtics and created a dataframe.

num_tweets <- 1000
Celtics <- searchTwitter('#Celtics', n = num_tweets)
Celtics_df <- twListToDF(Celtics)
head(Celtics_df)

Next I looked at the tweet count by platform.

Celtics_df$statusSource = substr(Celtics_df$statusSource, 
                            regexpr('>', Celtics_df$statusSource) + 1, 
                            regexpr('</a>', Celtics_df$statusSource) - 1)
Celtics_platform <- Celtics_df %>% group_by(statusSource) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n))
kable(Celtics_platform %>% top_n(10))

statusSource            n  percent_of_tweets
Twitter for iPhone    347  0.347
Twitter for Android   294  0.294
Twitter Web Client    112  0.112
IFTTT                  87  0.087
SocialOomph            41  0.041
TweetDeck              32  0.032
Twitter for iPad       23  0.023
Facebook                6  0.006
Libsyn On-Publish       6  0.006
celtics_fanly           5  0.005
Hootsuite               5  0.005

It looks like the two most popular platforms are the iPhone and Android apps, which together account for about 64% of the tweets. This might be from fans who are tweeting while watching the game live, or at a bar.
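The statusSource cleanup above works because the raw field is an HTML link, and the platform name is the text between the opening tag and the closing </a>. Here is a small sketch on two made-up raw values (the URLs are illustrative, not taken from the data):

```r
# Two made-up raw statusSource values in the HTML-link form Twitter returns.
raw_source <- c(
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="https://ifttt.com" rel="nofollow">IFTTT</a>'
)

# Keep the text after the first ">" and before the closing "</a>".
clean_source <- substr(raw_source,
                       regexpr(">", raw_source) + 1,
                       regexpr("</a>", raw_source) - 1)
print(clean_source)
```

This relies on the first ">" in the string being the end of the opening tag, which holds as long as the link URL itself contains no ">".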

After that I wanted to see if there were any superfans that showed up in the most active users.

kable(Celtics_df %>% 
  group_by(screenName) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n)) %>%
  top_n(10))

screenName         n  percent_of_tweets
celtic_rookie     67  0.067
CelticsViews      42  0.042
EspiriTruth       14  0.014
peskydefender      9  0.009
CSNNE              8  0.008
celtics_fanly      5  0.005
SMHerlin           5  0.005
CelticsPregame     4  0.004
FaguinhoMV         4  0.004
JmCeltics          4  0.004
kc1nyk             4  0.004
MaNiNhO_t2p2       4  0.004
NBA_Scholar        4  0.004

It looks like there is a mix of Celtics fans and Celtics media sources among the top tweeters.

Once I knew who was doing the most #Celtics tweeting, I wanted to take a closer look at what they were tweeting.

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Celtics_words <- Celtics_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

kable(Celtics_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(20))

word            n
#celtics      997
rt            665
horford       603
al            443
block         413
@celtics      365
@nba          358
ahead         294
bucket        294
crucial       292
la            187
game           97
#nba           92
win            92
victoire       84
@parlonsnba    82
le             76
celtics        74
pistons        71
qui            70

CelticsWC <- Celtics_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(40)
wordcloud2(CelticsWC, size = 3, gridSize = 1, color = 'green', minSize = 12)

Most of the common words were pretty generic, and could easily come from just about any NBA team or city. However, one player did show up in two places on the top 20 list. Al Horford’s name makes up two of the top four most common words tweeted. His first game back from injury was last night, so it makes sense that fans would be excited to see him. He also had a block at the end of the game to help the Celtics win, which could be why the word block is so high up on the list as well.
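To see what the tokenizing step does to a single tweet, here is a rough base-R sketch that uses strsplit() as a stand-in for unnest_tokens(). The tweet and the three-word stop list are made up for illustration; the real analysis uses tidytext's stop_words.

```r
# Same separator pattern as in the analysis: split on anything that is not a
# letter, digit, #, @ or an apostrophe inside a word.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# A made-up tweet, not taken from the data.
tweet <- "RT @celtics: Al Horford's block was crucial! #Celtics https://t.co/abc123"

# Drop links and escaped ampersands first.
clean <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)

# Lowercase, split, and discard the empty strings left by adjacent separators.
tokens <- strsplit(tolower(clean), reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]

# Tiny illustrative stop list (an assumption for this sketch).
toy_stop_words <- c("was", "the", "a")
kept <- tokens[!tokens %in% toy_stop_words & grepl("[a-z]", tokens)]
print(kept)
```

Hashtags and mentions survive as single tokens because # and @ are kept out of the separator pattern, which is why #celtics and @celtics show up as words of their own in the table.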

Then I looked at the sentiments found in the #Celtics tweets. The Celtics have won the last two games, so I expected that most tweets would be pretty positive.

nrc <- sentiments %>%
  filter(lexicon == "nrc") %>%
  select(word, sentiment)
Celtics_words_sentiments <- Celtics_words %>% inner_join(nrc, by = "word")

kable(Celtics_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))

sentiment        n
positive       770
trust          434
negative       151
fear           136
anticipation   130
anger          129
disgust        114
joy             85
sadness         79
surprise        65

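As I expected, positive was by far the most common sentiment. These counts are word matches rather than tweets, and a word can carry more than one NRC label, but positive matches (770) outnumber negative ones (151) by about five to one and make up a bit over a third of all sentiment matches. The Celtics had an exciting win last night, so I would expect the recent tweets to reflect that.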
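Because the table counts word matches rather than tweets, how positive the result looks depends on what the positive count is divided by. A quick sketch of the arithmetic, with the counts copied from the table above:

```r
# NRC match counts for the #Celtics tweets, copied from the table above.
celtics_counts <- c(positive = 770, trust = 434, negative = 151, fear = 136,
                    anticipation = 130, anger = 129, disgust = 114, joy = 85,
                    sadness = 79, surprise = 65)

# Positive as a share of every sentiment match.
share_of_all <- celtics_counts[["positive"]] / sum(celtics_counts)

# Positive as a share of the positive and negative matches only.
share_pos_neg <- celtics_counts[["positive"]] /
  (celtics_counts[["positive"]] + celtics_counts[["negative"]])

round(c(share_of_all = share_of_all, share_pos_neg = share_pos_neg), 3)
```

Positive is about 37% of all matches, but about 84% when only the positive and negative labels are compared.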

A quick look at the positive tweets and specific words shows that four of the tweets are talking about how the Celtics finally have all their players healthy. There are also a couple tweets describing how good specific players are.

pos_tw_ids <- Celtics_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)

kable(Celtics_df %>% inner_join(pos_tw_ids, by = "id") %>% select(word) %>% slice(1:10))

word
pick
don
passion
winning
passion
winning
passion
winning
lead
ahead

I also looked at tweets categorized with the sadness sentiment. There were a couple of tweets from after Friday night’s loss and before Saturday’s win, so sadness makes sense for those tweets. A couple of tweets don’t seem to fit the sadness sentiment, but specific words like ‘killing’ pulled from the tweet are what caused them to be labeled as sadness. This shows how important it is to look at the overall tweet before determining sentiment, rather than just pulling out key words.

sadness_tw_ids <- Celtics_words_sentiments %>% filter(sentiment == "sadness") %>% distinct(id, word)
kable(Celtics_df %>% inner_join(sadness_tw_ids, by = "id") %>% select(word) %>% slice(1:10))

word
inter
winning
winning
winning
harry
ruined
trickery
bad
tough
killing
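The mislabeling comes from the join itself: each word is matched against the lexicon on its own, with no view of the rest of the tweet. A toy sketch of that effect follows. Both data frames are invented for illustration, and the three-row lexicon is a hypothetical stand-in for the NRC lexicon rather than a copy of it.

```r
library(dplyr)

# Two invented tweets, already split into one word per row.
toy_words <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  word = c("horford", "killing", "tonight", "tough", "loss"),
  stringsAsFactors = FALSE
)

# Hypothetical mini lexicon standing in for NRC.
toy_lexicon <- data.frame(
  word      = c("killing", "tough", "loss"),
  sentiment = c("sadness", "sadness", "sadness"),
  stringsAsFactors = FALSE
)

# Word-level join: tweet 1 is tagged sadness purely because it contains "killing".
tagged <- toy_words %>% inner_join(toy_lexicon, by = "word")
print(tagged)
```

Tweet 1 could easily be praise for a player who is playing well, yet it ends up with the same label as tweet 2, which is about a loss.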

Philadelphia 76ers

Again, I started with pulling the 1000 most recent tweets that used #76ers and created a dataframe.

Philly <- searchTwitter('#76ers', n = num_tweets)
Philly_df <- twListToDF(Philly)

Then I looked at the most common platforms used by Philadelphia fans.

Philly_df$statusSource = substr(Philly_df$statusSource, 
                            regexpr('>', Philly_df$statusSource) + 1, 
                            regexpr('</a>', Philly_df$statusSource) - 1)
Philly_platform <- Philly_df %>% group_by(statusSource) %>% 
  summarize(n = n()) %>% 
  mutate(percent_of_tweets = n / sum(n)) %>% 
  arrange(desc(n))
kable(Philly_platform %>% top_n(10))

statusSource            n  percent_of_tweets
SocialOomph           211  0.211
dlvr.it               179  0.179
Twitter Web Client    164  0.164
Twitter for iPhone    140  0.140
Twitter for Android   114  0.114
ri_76ers               17  0.017
Rotoinfo.com NBA       17  0.017
TweetDeck              16  0.016
SocialNewsDesk         13  0.013
NBA Daily Lineups      12  0.012

I looked at the most common words found in the tweets.

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Philly_words <- Philly_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

PhillyWC <- Philly_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(30)
wordcloud2(PhillyWC, size = 3, gridSize = 1, color = 'red', minSize = 12)
## Warning in if (class(data) == "table") {: the condition has length > 1 and
## only the first element will be used

Like the Celtics tweets, some of the words on this list were related to basketball in general. Only one player, Joel Embiid, broke into the most common words. Two other NBA teams (the Suns and the Timberwolves) also showed up in the top 20 words, which makes sense because the 76ers played those teams recently.

I also looked at the sentiments from Philadelphia tweets.

# Philly_words was already built in the previous step, so it is reused here.
Philly_words_sentiments <- Philly_words %>% inner_join(nrc, by = "word")
kable(Philly_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))

sentiment        n
positive       369
anticipation   299
negative       240
trust          224
joy            198
fear           180
sadness        165
anger          152
surprise        85
disgust         63

Fewer than half of the sentiment word matches were positive (369 of 1,975, about 19%), which I did not find very surprising. The team has won only three of 13 games so far this season, so I don’t expect fans to be very positive. However, one of those wins was last night, so the positive sentiment could definitely have been higher.

Comparing the Boston Celtics and Philadelphia 76ers

First I created a city variable that I used to combine the dataframes.

Celtics_platform$city <- "Boston"
Philly_platform$city <- "Philadelphia"
Celtics_words_sentiments$city <- "Boston"
Philly_words_sentiments$city <- "Philadelphia"
platform <- rbind(Celtics_platform, Philly_platform)
words_sentiments <- rbind(Celtics_words_sentiments, Philly_words_sentiments)

Then I started by comparing the most common platforms.

pf <- c("dlvr.it", "Twitter for iPhone", "Twitter for Android", "SocialOomph", "Twitter Web Client")
pf_df <- platform %>% filter(statusSource %in% pf)
ggplot(pf_df, aes(x = statusSource, y = percent_of_tweets, fill = city)) + 
  geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Dark2") +
  xlab("Platform") +
  ylab("Share of tweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

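Boston fans tweet much more frequently from their cell phones: the iPhone and Android apps account for about 64% of the #Celtics tweets but only about 25% of the #76ers tweets. Philadelphia tweets were spread much more evenly across platforms, with SocialOomph and dlvr.it at the top of the list.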
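The cell phone comparison can be checked by adding up the iPhone and Android shares for each city. A short sketch, with the shares copied from the two platform tables earlier:

```r
library(dplyr)

# iPhone and Android shares copied from the two platform tables.
mobile_rows <- data.frame(
  city = c("Boston", "Boston", "Philadelphia", "Philadelphia"),
  statusSource = c("Twitter for iPhone", "Twitter for Android",
                   "Twitter for iPhone", "Twitter for Android"),
  percent_of_tweets = c(0.347, 0.294, 0.140, 0.114),
  stringsAsFactors = FALSE
)

# Total share of tweets sent from the two phone apps, per city.
mobile_share <- mobile_rows %>%
  group_by(city) %>%
  summarize(mobile = sum(percent_of_tweets))
print(mobile_share)
```

That works out to roughly 0.64 for Boston and 0.25 for Philadelphia.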

I finished by comparing the sentiment of the tweets from the two teams.

sent_df <- words_sentiments %>% 
  group_by(city, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n))

ggplot(sent_df, aes(x = sentiment, y = frequency, fill = city)) + 
  geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Dark2") +
  xlab("Sentiment") +
  ylab("Share of sentiment word matches") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The Celtics had a much higher share of sentiment matches in the positive and trust categories. The team had a great win last night and the fans are usually very supportive of their team, even when it is struggling. The Celtics tweets also had a slightly higher share in the disgust category, although without looking at the full text of the tweets it’s hard to guess why. The 76ers showed a much higher share in the anticipation, joy, negative, and sadness categories. The negative and sadness results make sense given how the season has gone so far for the team. The joy and anticipation could be in response to the recent win, but again it is hard to tell without pulling out the specific tweets in those categories. It would be interesting to check again later in the season to see if this comparison changes as teams go on winning or losing streaks.
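The two teams do not have the same number of sentiment matches, so the comparison only works on shares within each city rather than raw counts. Here is a sketch of that normalization, with the counts copied from the two sentiment tables above:

```r
library(dplyr)

sentiment_names <- c("positive", "trust", "negative", "fear", "anticipation",
                     "anger", "disgust", "joy", "sadness", "surprise")

# Match counts copied from the two sentiment tables, in the order above.
sentiment_counts <- data.frame(
  city      = rep(c("Boston", "Philadelphia"), each = 10),
  sentiment = rep(sentiment_names, times = 2),
  n         = c(770, 434, 151, 136, 130, 129, 114, 85, 79, 65,
                369, 224, 240, 180, 299, 152, 63, 198, 165, 85),
  stringsAsFactors = FALSE
)

# Divide each count by its own city's total so the shares are comparable.
sentiment_share <- sentiment_counts %>%
  group_by(city) %>%
  mutate(frequency = n / sum(n)) %>%
  ungroup()

# Sanity check: the shares should add up to 1 within each city.
city_totals <- sentiment_share %>%
  group_by(city) %>%
  summarize(total_n = sum(n), total_share = sum(frequency))
print(city_totals)

sentiment_share %>%
  filter(sentiment %in% c("positive", "sadness")) %>%
  mutate(frequency = round(frequency, 3))
```

On these numbers positive is about 37% of Boston's matches and 19% of Philadelphia's, while sadness is about 4% and 8%.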