Assignment 4- Analysis of Boston #Celtics Tweets

Introduction

For this assignment I wanted to see how two NBA teams compared on Twitter. The first team I chose was the Boston Celtics, my favorite team, and the second was the Philadelphia 76ers. The Celtics have had an up-and-down season so far, while the 76ers have been one of the worst teams in the league. I thought this would be an interesting comparison because Boston (and New England) is known for strongly supporting its sports teams, but the same can’t always be said for Philadelphia basketball fans. I expected the sentiments of the tweets for each team to be especially interesting to compare.

Loading libraries and Twitter authentication

First I loaded all the libraries I’ll need.

library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(dplyr)
library(knitr)
library(wordcloud2)

Then I logged in to Twitter.
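The login code is not shown in the report. A minimal sketch of how it could look with twitteR, assuming the four API keys are stored in environment variables (the variable names below are hypothetical, not taken from the original):

```r
# Hypothetical environment variable names; define them outside the script
# (for example in ~/.Renviron) so the keys never appear in the report.
creds <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                      "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))
have_creds <- all(nzchar(creds))

if (have_creds && requireNamespace("twitteR", quietly = TRUE)) {
  # Register the OAuth credentials used by later searchTwitter() calls
  twitteR::setup_twitter_oauth(consumer_key = creds[[1]],
                               consumer_secret = creds[[2]],
                               access_token = creds[[3]],
                               access_secret = creds[[4]])
} else {
  message("Twitter credentials not found; skipping authentication.")
}
```

Keeping the keys in environment variables rather than in the script means the report can be shared without exposing them.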

Boston Celtics

First I pulled the last 1000 tweets that used #Celtics and created a dataframe.

num_tweets <- 1000
Celtics <- searchTwitter('#Celtics', n = num_tweets)
Celtics_df <- twListToDF(Celtics)
head(Celtics_df)

Next I looked at the tweet count by platform.

Celtics_df$statusSource = substr(Celtics_df$statusSource, 
                            regexpr('>', Celtics_df$statusSource) + 1, 
                            regexpr('</a>', Celtics_df$statusSource) - 1)
Celtics_platform <- Celtics_df %>% group_by(statusSource) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n))
kable(Celtics_platform %>% top_n(10))

statusSource	n	percent_of_tweets
Twitter for iPhone	347	0.347
Twitter for Android	294	0.294
Twitter Web Client	112	0.112
IFTTT	87	0.087
SocialOomph	41	0.041
TweetDeck	32	0.032
Twitter for iPad	23	0.023
Facebook	6	0.006
Libsyn On-Publish	6	0.006
celtics_fanly	5	0.005
Hootsuite	5	0.005

The two most popular platforms are both mobile apps (Twitter for iPhone and Twitter for Android), which together account for about 64% of the tweets. This might be from fans who are tweeting while watching the game live, or at a bar.
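The substr() and regexpr() step above works because statusSource arrives as an HTML link, and the platform name is the text between the first '>' and the closing '</a>'. A small sketch on a single made-up value (the URL and attributes are illustrative assumptions):

```r
# Example statusSource value; the exact href is an assumption for illustration
src <- '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'

first <- regexpr('>', src) + 1      # first character after the opening tag
last  <- regexpr('</a>', src) - 1   # last character before the closing tag
client <- substr(src, first, last)
print(client)
```

This approach assumes each value contains exactly one link; a value without the tag would yield an empty string.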

After that I wanted to see if there were any superfans who showed up among the most active users.

kable(Celtics_df %>% 
  group_by(screenName) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n)) %>%
  top_n(10))

screenName	n	percent_of_tweets
celtic_rookie	67	0.067
CelticsViews	42	0.042
EspiriTruth	14	0.014
peskydefender	9	0.009
CSNNE	8	0.008
celtics_fanly	5	0.005
SMHerlin	5	0.005
CelticsPregame	4	0.004
FaguinhoMV	4	0.004
JmCeltics	4	0.004
kc1nyk	4	0.004
MaNiNhO_t2p2	4	0.004
NBA_Scholar	4	0.004

It looks like there is a mix of Celtics fans and Celtics media sources among the top tweeters.
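The table has 13 rows rather than 10 because top_n() keeps ties: every user whose count matches the 10th-largest value is included. A base R sketch of the same rule, using the counts from the table plus three hypothetical lower counts to stand in for the remaining users:

```r
# Counts from the table above; the trailing 3, 3, 2 are hypothetical
n <- c(67, 42, 14, 9, 8, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 2)

cutoff <- sort(n, decreasing = TRUE)[10]  # 10th-largest count
kept <- n[n >= cutoff]                    # ties at the cutoff are all kept
length(kept)
```

If exactly ten rows are wanted, taking the first ten rows after sorting (for example with head()) avoids the tie expansion.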

Once I knew who the most active #Celtics tweeters were, I wanted to take a closer look at what was being tweeted.

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Celtics_words <- Celtics_df %>%
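  # drop tweets that begin with a quotation mark (quoted tweets)
  filter(!str_detect(text, '^"')) %>%
  # strip t.co links and HTML-escaped ampersands
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  # split on anything that is not a letter, digit, #, @, or in-word apostrophe
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  # remove stop words and tokens that contain no letters
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))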

kable(Celtics_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(20))

word	n
#celtics	997
rt	665
horford	603
al	443
block	413
@celtics	365
@nba	358
ahead	294
bucket	294
crucial	292
la	187
game	97
#nba	92
win	92
victoire	84
@parlonsnba	82
le	76
celtics	74
pistons	71
qui	70

CelticsWC <- Celtics_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(40)
wordcloud2(CelticsWC, size = 3, gridSize = 1, color = 'green', minSize = 12)

Most of the common words were pretty generic, and could easily come from just about any NBA team or city. However, one player did show up in two places on the top 20 list. Al Horford’s name makes up two of the top four most common words tweeted. His first game back from injury was last night, so it makes sense that fans would be excited to see him. He also had a block at the end of the game to help the Celtics win, which could be why the word block is so high up on the list as well.
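To see why hashtags and handles such as #celtics and @nba survive as single words, it helps to run the cleaning and splitting steps on one tweet. The sketch below uses base R's gsub() and strsplit() as stand-ins for str_replace_all() and unnest_tokens(), so it approximates the pipeline rather than reproducing it; the tweet text is made up.

```r
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Made-up tweet for illustration
tweet <- "RT @celtics: Horford's block! #Celtics https://t.co/abc123"

# Remove t.co links and escaped ampersands, as in the pipeline above
clean <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)

# Lowercase, split on the pattern, and drop empty pieces
pieces <- strsplit(tolower(clean), reg, perl = TRUE)[[1]]
tokens <- pieces[nzchar(pieces)]
print(tokens)
```

Because '#', '@', and in-word apostrophes are excluded from the split pattern, the hashtag, the handle, and the possessive each stay in one piece. This is also why "rt" appears near the top of the word list: it is a token like any other unless it is filtered out.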

Then I looked at the sentiments found in the #Celtics tweets. The Celtics have won the last two games, so I expected that most tweets would be pretty positive.

nrc <- sentiments %>%
  filter(lexicon == "nrc") %>%
  select(word, sentiment)
Celtics_words_sentiments <- Celtics_words %>% inner_join(nrc, by = "word")

kable(Celtics_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))

sentiment	n
positive	770
trust	434
negative	151
fear	136
anticipation	130
anger	129
disgust	114
joy	85
sadness	79
surprise	65

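As I expected, positive was by far the most common sentiment: 770 of the 2,093 word-level sentiment matches (about 37%), well ahead of negative at 151. Note that these counts are word matches against the NRC lexicon, not tweets, and a single word can carry more than one sentiment. The Celtics had an exciting win last night, so I would expect the recent tweets to reflect that.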

A quick look at the positive tweets and specific words shows that four of the tweets are talking about how the Celtics finally have all their players healthy. There are also a couple tweets describing how good specific players are.

pos_tw_ids <- Celtics_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)

kable(Celtics_df %>% inner_join(pos_tw_ids, by = "id") %>% select(word) %>% slice(1:10))

word

pick
don
passion winning passion winning passion winning lead
ahead

I also looked at tweets categorized with the sadness sentiment. There were a couple of tweets from after Friday night’s loss and before Saturday’s win, so sadness makes sense for those tweets. A couple of tweets don’t seem to fit the sadness sentiment, but specific words like ‘killing’ pulled from the tweet are what caused them to be labeled as sadness. This shows how important it is to look at the overall tweet before determining sentiment, rather than just pulling out key words.

sadness_tw_ids <- Celtics_words_sentiments %>% filter(sentiment == "sadness") %>% distinct(id, word)
kable(Celtics_df %>% inner_join(sadness_tw_ids, by = "id") %>% select(word) %>% slice(1:10))

word

inter
winning
winning
winning
harry
ruined
trickery bad
tough
killing
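One simple way to look at the overall tweet rather than single key words is to add up the word scores within each tweet. The sketch below is an illustration only: the tweets and the tiny +1/-1 lexicon are hypothetical, and this is not the NRC lexicon or the method used in the analysis above.

```r
# Hypothetical lexicon and tweets, for illustration only
lexicon <- data.frame(word = c("killing", "great", "win", "ruined", "tough"),
                      score = c(-1, 1, 1, -1, -1),
                      stringsAsFactors = FALSE)
tweets <- data.frame(id = c(1, 2),
                     text = c("horford is killing it great win",
                              "tough loss ruined my night"),
                     stringsAsFactors = FALSE)

# One row per (tweet, word)
words <- do.call(rbind, lapply(seq_len(nrow(tweets)), function(i) {
  data.frame(id = tweets$id[i],
             word = strsplit(tweets$text[i], " ")[[1]],
             stringsAsFactors = FALSE)
}))

# Keep only words found in the lexicon, then sum the scores per tweet
scored <- merge(words, lexicon, by = "word")
net <- aggregate(score ~ id, data = scored, FUN = sum)
print(net)
```

In this toy example the first tweet contains 'killing' but still nets out positive because of 'great' and 'win', which is the kind of context a single-word label misses.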

Philadelphia 76ers

Again, I started with pulling the 1000 most recent tweets that used #76ers and created a dataframe.

Philly <- searchTwitter('#76ers', n = num_tweets)
Philly_df <- twListToDF(Philly)

Then I looked at the most common platforms used by Philadelphia fans.

Philly_df$statusSource = substr(Philly_df$statusSource, 
                            regexpr('>', Philly_df$statusSource) + 1, 
                            regexpr('</a>', Philly_df$statusSource) - 1)
Philly_platform <- Philly_df %>% group_by(statusSource) %>% 
  summarize(n = n()) %>% 
  mutate(percent_of_tweets = n / sum(n)) %>% 
  arrange(desc(n))
kable(Philly_platform %>% top_n(10))

statusSource	n	percent_of_tweets
SocialOomph	211	0.211
dlvr.it	179	0.179
Twitter Web Client	164	0.164
Twitter for iPhone	140	0.140
Twitter for Android	114	0.114
ri_76ers	17	0.017
Rotoinfo.com NBA	17	0.017
TweetDeck	16	0.016
SocialNewsDesk	13	0.013
NBA Daily Lineups	12	0.012

I looked at the most common words found in the tweets.

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
Philly_words <- Philly_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

PhillyWC <- Philly_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(30)
wordcloud2(PhillyWC, size = 3, gridSize = 1, color = 'red', minSize = 12)

## Warning in if (class(data) == "table") {: the condition has length > 1 and
## only the first element will be used

Like the Celtics tweets, some of the words on this list were related to basketball in general. Only one player broke into the most common words, Joel Embiid. Two other NBA teams (the Suns and the Timberwolves) also showed up in the top 20 words, which makes sense because the 76ers played them recently.

I also looked at the sentiments from Philadelphia tweets.

Philly_words <- Philly_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
Philly_words_sentiments <- Philly_words %>% inner_join(nrc, by = "word")
kable(Philly_words_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n)))

sentiment	n
positive	369
anticipation	299
negative	240
trust	224
joy	198
fear	180
sadness	165
anger	152
surprise	85
disgust	63

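Positive was still the most common label, but it made up a much smaller share than for the Celtics: 369 of 1,975 word-level sentiment matches (about 19%), followed by anticipation (299) and negative (240). I did not find this very surprising. The team has won only three of 13 games so far this season, so I don’t expect fans to be very positive. However, one of those wins was last night, so the positive sentiment could definitely have been higher.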

Comparing the Boston Celtics and Philadelphia 76ers

First I created a city variable that I used to combine the dataframes.

Celtics_platform$city <- "Boston"
Philly_platform$city <- "Philadelphia"
Celtics_words_sentiments$city <- "Boston"
Philly_words_sentiments$city <- "Philadelphia"
platform <- rbind(Celtics_platform, Philly_platform)
words_sentiments <- rbind(Celtics_words_sentiments, Philly_words_sentiments)

Then I started by comparing the most common platforms.

pf <- c("dlvr.it", "Twitter for iPhone", "Twitter for Android", "SocialOomph", "Twitter Web Client")
pf_df <- platform %>% filter(statusSource %in% pf)
ggplot(pf_df, aes(x = statusSource, y = percent_of_tweets, fill = city)) + 
  geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Dark2") +
  xlab("Platform") +
  ylab("Percent of tweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

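Boston fans tweet much more frequently from their cell phones: about 64% of #Celtics tweets came from the iPhone and Android apps, compared with about 25% of #76ers tweets. Philadelphia tweets were spread much more evenly across platforms, with scheduling and auto-posting services (SocialOomph and dlvr.it) at the top of the list.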

I finished by comparing the sentiment of the tweets from the two teams.

sent_df <- words_sentiments %>% 
  group_by(city, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n))

ggplot(sent_df, aes(x = sentiment, y = frequency, fill = city)) + 
  geom_bar(stat = "identity", position = "dodge") + scale_fill_brewer(palette="Dark2") +
  xlab("Sentiment") +
  ylab("Percent of tweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The Celtics had a much higher share of sentiment matches in the positive and trust categories. The team had a great win last night and the fans are usually very supportive of their team, even when it is struggling. The Celtics tweets also had a slightly higher share in the disgust category, although without looking at the full text of the tweets it’s hard to guess why. The 76ers show a much higher share in the anticipation, joy, negative, and sadness categories. The negative and sadness shares make sense given how the season is going so far for the team. The joy and anticipation could be in response to the recent win, but again it is hard to tell without pulling out the specific tweets in those categories. It would be interesting to check again later in the season to see if this comparison changes as teams go on winning or losing streaks.
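The frequencies in the chart are computed within each city, so each team's bars sum to 1 and the two teams can be compared even though their totals differ. A base R sketch of that normalization, using the sentiment counts from the two tables above (ave() stands in for the group_by/summarize/mutate pipeline):

```r
sentiment <- c("positive", "trust", "negative", "fear", "anticipation",
               "anger", "disgust", "joy", "sadness", "surprise")
boston <- c(770, 434, 151, 136, 130, 129, 114, 85, 79, 65)
philly <- c(369, 224, 240, 180, 299, 152, 63, 198, 165, 85)

sent_df <- data.frame(city = rep(c("Boston", "Philadelphia"), each = 10),
                      sentiment = rep(sentiment, 2),
                      n = c(boston, philly),
                      stringsAsFactors = FALSE)

# Divide each count by its own city's total
sent_df$frequency <- ave(sent_df$n, sent_df$city,
                         FUN = function(x) x / sum(x))

print(sent_df[sent_df$sentiment == "positive", ])
```

Note that the denominator is the number of word-level sentiment matches for each city, not the number of tweets.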

Lily Parenteau

November 13, 2016
