Introduction

For this project I am going to look at Twitter data for my two favorite sports teams, the Buffalo Bills and the Buffalo Sabres, and compare the sentiment of the tweets about each team. Both teams are beloved by the city, but recently the Bills have trended upward while the Sabres have struggled mightily. I am interested in seeing whether there is a tangible difference in sentiment.

Packages

Before getting into any analysis or data collection we need a few packages. Because we are working with Twitter, we need the rtweet package to collect the data. The tidytext and lubridate packages make the data more manageable. Finally, the tidyverse lets us analyze the data through visualizations, and stargazer makes the regression output easier to read.
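As a minimal sketch, loading the packages named above could look like this. It assumes they are already installed from CRAN; the comments are my summary of what each one is used for in this project.

```r
# Load the packages named above (assumes they are already installed).
library(rtweet)     # collecting tweets from the Twitter API
library(tidytext)   # splitting text into words and sentiment lexicons
library(lubridate)  # working with tweet dates and times
library(tidyverse)  # dplyr, tidyr, readr, ggplot2 and friends
library(stargazer)  # nicer regression tables
```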

Setup and Authentication

We need a Twitter developer account to gain access to the tweets. Once we have one, we can use the app's key and secret to authenticate from the R console. Once this is complete we can begin to collect data from Twitter.
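A hedged sketch of the authentication step is below. It assumes an rtweet release before 1.0, where `create_token()` was the usual way to authenticate; as far as I know, newer releases deprecate it in favor of other helpers. The environment variable names, the app name, and the `authenticate_twitter()` helper are all hypothetical, and the function quietly does nothing when no credentials are set.

```r
# Sketch only: assumes rtweet < 1.0 and hypothetical environment variable names.
# Keeping the key and secret in environment variables avoids hard-coding them.
authenticate_twitter <- function(key = Sys.getenv("TWITTER_API_KEY"),
                                 secret = Sys.getenv("TWITTER_API_SECRET")) {
  if (!nzchar(key) || !nzchar(secret)) {
    message("Twitter key/secret not set; skipping authentication.")
    return(invisible(NULL))
  }
  rtweet::create_token(
    app = "buffalo-sentiment",   # hypothetical app name
    consumer_key = key,
    consumer_secret = secret
  )
}

token <- authenticate_twitter()
```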

Teams

Some of the work was completed in a separate R script that saved the data for each team as its own CSV. I searched for all the tweets that referenced the Twitter handles of the Bills or the Sabres. Then I loaded the NRC sentiment lexicon and joined it to each team's tweets, so that each data frame shows the sentiments found in each tweet. Beyond positive and negative, the NRC lexicon includes sentiments like trust, fear, joy, and more. After this step we have two complete data sets to work with, and because the tweets were saved to CSVs before being read back into data frames, the data will not change between runs.
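The setup script itself is not shown here, so the following is only a sketch of the kind of steps described above. The three tweets, the tiny lexicon, and the file name are made up for illustration. The toy lexicon has the same two columns as the NRC lexicon (`word` and `sentiment`), so the real lexicon could be swapped in.

```r
library(dplyr)
library(tidytext)
library(readr)

# Toy stand-ins (assumptions): three invented tweets and a tiny lexicon.
tweets <- tibble(
  status_id = c("1", "2", "3"),
  text = c("Love this team, great win",
           "Terrible loss, so much anger",
           "Excited for the draft")
)
toy_lexicon <- tibble(
  word      = c("love", "great", "win", "terrible", "loss", "anger", "excited"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "anger", "anticipation")
)

# One row per word, then keep only the words the lexicon knows about.
tweet_sentiments <- tweets %>%
  unnest_tokens(word, text) %>%
  inner_join(toy_lexicon, by = "word")

# Save the result so later analysis does not depend on new tweets arriving.
out_file <- file.path(tempdir(), "team_sentiments.csv")  # hypothetical file name
write_csv(tweet_sentiments, out_file)
frozen <- read_csv(out_file, col_types = cols(.default = col_character()))
```

Reading the CSV back gives the same rows every time, which is the point of saving a snapshot instead of querying Twitter again.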

NRC Sentiments

Here is how we got the NRC data into the R console. We used this call in the setup R script to combine the lexicon with the Twitter data, and we will use it directly in some piping later.

nrc <- get_sentiments("nrc")

Who has the more positive Twitter

The first analysis is a simple visualization of the positivity of tweets. We want to see the general opinions of people on Twitter who referenced the teams' handles. We collected the most recent 2000 tweets to see what the general feelings were.

The results are not surprising, because the teams seem to be going in completely different directions right now. Given the previous seasons and recent performance, the disparity makes a lot of sense. The Sabres have similar amounts of negative and positive tweets, which is concerning for them. The big issue is that their amounts of trust and joy are close to their amounts of anger and fear. The key difference between the two teams is how many more positive words the Bills have. The Bills Twitter has almost double the number of positive words. Another interesting thing for the Bills is the number of anticipation words. With the NFL draft this week, it makes sense that there are not only a lot of positive words but anticipatory ones as well.
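A comparison like the one discussed above could be drawn roughly as follows. This is a sketch with made-up counts, not the project's results; the column names `team`, `sentiment`, and `n` are assumptions about how the counts might be stored.

```r
library(dplyr)
library(ggplot2)

# Made-up counts (assumption), only to show the shape of the chart.
sentiment_counts <- tibble(
  team      = rep(c("Bills", "Sabres"), each = 4),
  sentiment = rep(c("positive", "negative", "anticipation", "anger"), times = 2),
  n         = c(10, 10, 10, 10, 5, 5, 5, 5)
)

# Side-by-side bars: one group per sentiment, one bar per team.
sentiment_plot <- ggplot(sentiment_counts,
                         aes(x = sentiment, y = n, fill = team)) +
  geom_col(position = "dodge") +
  labs(x = "NRC sentiment", y = "Number of words", fill = "Team")

print(sentiment_plot)
```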

Further Setup

We are going to generate some more workable data by splitting the tweets into individual words and looking at the sentiment of each word. We turn these into separate data frames with one variable per sentiment. Most importantly, we now have an overall positivity variable.
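Here is a sketch of how word-level sentiments could be turned into one row per tweet with a column per sentiment. The toy rows are invented, and the definition of positivity used here (positive words minus negative words) is my assumption for illustration, since the exact formula is not spelled out above.

```r
library(dplyr)
library(tidyr)

# Toy word-level rows (assumption): one row per matched word per tweet.
word_sentiments <- tibble(
  status_id = c("1", "1", "1", "2", "2", "3"),
  sentiment = c("positive", "positive", "joy", "negative", "anger", "positive")
)

tweet_level <- word_sentiments %>%
  count(status_id, sentiment) %>%                    # words per tweet and sentiment
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%                   # one column per sentiment
  mutate(positivity = positive - negative)           # assumed definition
```

With this layout, each sentiment can be used directly as a variable in a plot or a regression.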

Working with some regression

The first thing I am going to look into is how favorite counts are affected. I want to see which sentiment variables most affect the favorite count.

## 
## Call:
## glm(formula = favorite_count ~ negative + positive, data = rtweetsabres_2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
##  -7.00   -2.32   -0.99   -0.99  416.10  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.98634    0.14966   6.591 4.49e-11 ***
## negative     0.41896    0.06127   6.838 8.25e-12 ***
## positive     0.91447    0.12898   7.090 1.38e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 285.6201)
## 
##     Null deviance: 5768169  on 20084  degrees of freedom
## Residual deviance: 5735824  on 20082  degrees of freedom
## AIC: 170578
## 
## Number of Fisher Scoring iterations: 2
## 
## Sabres Regression Results
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                         favorite_count       
## ---------------------------------------------
## negative               0.419*** (0.061)      
## positive               0.914*** (0.129)      
## Constant               0.986*** (0.150)      
## ---------------------------------------------
## Observations                20,085           
## Log Likelihood            -85,285.830        
## Akaike Inf. Crit.         170,577.700        
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01
## 
## Call:
## glm(formula = favorite_count ~ negative + positive, data = rtweetBills_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
##   -4.12    -4.12    -4.12    -4.12  1694.88  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.1221     0.3669  11.234   <2e-16 ***
## negative     -1.8276     5.9311  -0.308    0.758    
## positive     -1.9253     7.0556  -0.273    0.785    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2922.429)
## 
##     Null deviance: 63554788  on 21749  degrees of freedom
## Residual deviance: 63554072  on 21747  degrees of freedom
## AIC: 235298
## 
## Number of Fisher Scoring iterations: 2
## 
## Bills Regression Results
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                         favorite_count       
## ---------------------------------------------
## negative                -1.828 (5.931)       
## positive                -1.925 (7.056)       
## Constant               4.122*** (0.367)      
## ---------------------------------------------
## Observations                21,750           
## Log Likelihood           -117,645.800        
## Akaike Inf. Crit.         235,297.500        
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

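This regression is somewhat surprising. For the Bills, the coefficients on both negative and positive are below zero, which would suggest fewer likes for tweets with more sentiment words, but neither is statistically significant (p = 0.758 and p = 0.785), so there is no real evidence of an effect. For the Sabres, both coefficients are positive and highly significant (p < 0.001), so tweets with more positive or negative words do seem to get more likes. Even so, the residual deviance is barely lower than the null deviance in both models, so sentiment explains very little of the variation in favorite counts, and I do not think these results explain why this is occurring. We are going to move to a different regression that is more sentiment driven.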
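The calls behind output like the above can be sketched as follows. The formula is the one shown in the printed `Call:` lines; the data here are simulated stand-ins (an assumption), so the numbers will not match the real results.

```r
library(stargazer)

# Simulated stand-in (assumption) for a data frame like rtweetsabres_2.
set.seed(42)
n_rows <- 200
toy_tweets <- data.frame(
  negative = rpois(n_rows, 0.5),
  positive = rpois(n_rows, 1)
)
toy_tweets$favorite_count <- rpois(n_rows, 2)

# Same formula as in the output above; with no family given, glm() is gaussian.
toy_model <- glm(favorite_count ~ negative + positive, data = toy_tweets)
summary(toy_model)

# Text table in the same style as the regression results tables above.
stargazer(toy_model, type = "text", title = "Toy Regression Results")
```

Because no family is specified, `glm()` fits a gaussian model, which gives the same coefficients as `lm()` with the same formula.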

Sabres Gameday

The Sabres lost a difficult game on Sunday, April 25th to the New York Rangers. I wanted to inspect what effect this had on positive tweets the day before, the day of, and the day after the defeat, to see if there is a trend to follow. I elected to include positivity as well, to get a better feel for whether the tweets are actually high in positivity or there is simply a greater number of positive words being used.

The relationship is visible: the number of positive words used in tweets greatly decreased on the day of and the day after the game. I felt that including positivity would help, and I feel that it did. Saturday showed the most variance while including the most positive words, which fits the idea of people getting ready for the game the day before. The lower number of positive words on the day of and the day after the game also makes sense. Even on Sunday, I think there were more positives in anticipation of the game itself. On Monday there were not many positive things to say about the Sabres. This could also follow the trend that nobody really enjoys Mondays anyway.
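A day-by-day summary like the one discussed above could be built along these lines. The timestamps and positive-word counts are invented, the year 2021 is my assumption (April 25th falls on a Sunday in that year), and the column names `created_at` and `positive` are assumed as well.

```r
library(dplyr)
library(lubridate)

# Toy tweets (assumption): invented timestamps around the game and
# invented positive-word counts per tweet.
toy_sabres <- tibble(
  created_at = ymd_hms(c("2021-04-24 18:00:00", "2021-04-24 21:30:00",
                         "2021-04-25 15:00:00", "2021-04-26 09:00:00")),
  positive   = c(3, 1, 2, 0)
)

daily_positive <- toy_sabres %>%
  mutate(day = as_date(created_at),                  # drop the time of day
         weekday = wday(day, label = TRUE, abbr = FALSE)) %>%
  group_by(day, weekday) %>%
  summarise(tweets = n(),
            positive_words = sum(positive),
            mean_positive = mean(positive),
            .groups = "drop")
```

Looking at both the total and the per-tweet mean helps separate a day with many tweets from a day with genuinely more positive tweets.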

Conclusion

I think my overall assumption that Bills Twitter is more positive than Sabres Twitter has rung true. The two franchises have gone in different directions recently. Somewhat unfortunately for the Sabres social media folks, Buffalo fans are very vocal on Twitter, whether they feel positive or negative about the team. As a fan with a Twitter account, I can say that I make my feelings known to them when I feel it is necessary.