Assignment 5: Football Analytics Twitter

Option 1

Introduction

For this assignment, I analyze and compare tweets from the accounts of Pro Football Focus (@PFF) and NFL Next Gen Stats (@NextGenStats). Using the Twitter API, I scraped approximately 2,000 tweets from the two accounts combined, covering 9/19/2019 through 5/10/2020. I selected these two profiles because their tweets are interesting and regularly appear in my feed whenever I am on Twitter. I am also passionate about football analytics and want to see how these two accounts compare on this specific social media platform.

Packages

The following packages are critical for my analysis of these Twitter accounts.

## -- Attaching packages -------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0     v purrr   0.3.3
## v tibble  3.0.0     v dplyr   0.8.5
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ----------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

3.1: Importing the Dataset

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   created_at = col_datetime(format = ""),
##   display_text_width = col_double(),
##   is_quote = col_logical(),
##   is_retweet = col_logical(),
##   favorite_count = col_double(),
##   retweet_count = col_double(),
##   quote_count = col_logical(),
##   reply_count = col_logical(),
##   symbols = col_logical(),
##   ext_media_type = col_logical(),
##   quoted_created_at = col_datetime(format = ""),
##   quoted_favorite_count = col_double(),
##   quoted_retweet_count = col_double(),
##   quoted_followers_count = col_double(),
##   quoted_friends_count = col_double(),
##   quoted_statuses_count = col_double(),
##   quoted_description = col_logical(),
##   quoted_verified = col_logical(),
##   retweet_created_at = col_datetime(format = ""),
##   retweet_favorite_count = col_double()
##   # ... with 20 more columns
## )
## See spec(...) for full column specifications.

My Analysis

Question 1

Which source do Pro Football Focus and NFL Next Gen Stats tweet from the most? How many favorites do they get from tweeting from these sources and how does each account compare?

Results and Interpretation: As you can see, the majority of tweets from both Pro Football Focus and Next Gen Stats come from the Twitter Web App. It was also interesting to see that Pro Football Focus tweeted nearly twice as often as Next Gen Stats during this span. The Twitter Web App is Twitter's standard browser-based client, so it makes sense that most of both accounts' tweets come from there. PFF also publishes through Sprout Social, a social media management tool, along with a small number of tweets from Twitter Media Studio and TweetDeck.
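The source breakdown above comes down to a grouping step. The sketch below is a minimal, hypothetical version in base R, so it runs without extra packages. It assumes the imported data sits in a data frame called `tweets` (an assumed name) with the `screen_name`, `source`, and `favorite_count` columns from the Twitter API data, and it also reports total and mean favorites per source, which addresses the second half of the question.

```r
# Hypothetical helper: tweet counts and favorites per account and source.
# Assumes `tweets` has screen_name, source and favorite_count columns.
# Rows with a missing favorite_count are dropped by aggregate's default na.action.
summarise_sources <- function(tweets) {
  # Number of tweets for each account/source pair
  counts <- aggregate(favorite_count ~ screen_name + source,
                      data = tweets, FUN = length)
  names(counts)[3] <- "n_tweets"

  # Total favorites for each account/source pair
  totals <- aggregate(favorite_count ~ screen_name + source,
                      data = tweets, FUN = sum)
  names(totals)[3] <- "total_favorites"

  out <- merge(counts, totals, by = c("screen_name", "source"))
  out$mean_favorites <- out$total_favorites / out$n_tweets

  # Within each account, list the most-used source first
  out <- out[order(out$screen_name, -out$n_tweets), ]
  rownames(out) <- NULL
  out
}
```

For example, `summarise_sources(tweets)` would list each account's sources from most to least used. With the tidyverse attached above, `tweets %>% group_by(screen_name, source) %>% summarise(n_tweets = n(), total_favorites = sum(favorite_count))` gives the same counts and totals.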

Question 2

Is there a correlation between the number of favorites and retweets on tweets from Pro Football Focus and NFL Next Gen Stats? What can we infer from this data, which covers September 2019 through May 2020?

Looking at this graph, both Pro Football Focus and Next Gen Stats show a decent positive correlation between the number of favorites and retweets their tweets receive. Next Gen Stats' points appear to follow a straight line more closely than PFF's, suggesting that favorites and retweets track each other more consistently for Next Gen Stats. The dense cluster of tweets between 0 and roughly 3,000 favorites shows that most tweets from both accounts fall in that range, and both accounts receive noticeably fewer retweets than favorites, which is typical on Twitter. One thing both accounts could do is put out more content that appeals to casual, die-hard, and analytical football fans alike, for example by interacting more with people on Twitter.
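To put a number on how closely favorites and retweets move together, a per-account correlation coefficient is a natural complement to the scatter plot. The sketch below is a minimal base R version, again assuming a hypothetical `tweets` data frame with `screen_name`, `favorite_count`, and `retweet_count` columns. It defaults to Pearson correlation; passing `method = "spearman"` uses rank correlation instead, which is less sensitive to a handful of tweets far outside the 0 to 3,000 favorites cluster.

```r
# Hypothetical helper: correlation between favorites and retweets per account.
# Assumes `tweets` has screen_name, favorite_count and retweet_count columns.
correlation_by_account <- function(tweets, method = "pearson") {
  accounts <- sort(unique(as.character(tweets$screen_name)))
  r <- vapply(accounts, function(acct) {
    rows <- tweets[tweets$screen_name == acct, ]
    # "complete.obs" drops tweets with a missing count before correlating
    cor(rows$favorite_count, rows$retweet_count,
        method = method, use = "complete.obs")
  }, numeric(1))
  data.frame(screen_name = accounts, correlation = unname(r),
             stringsAsFactors = FALSE)
}
```

A value close to 1 for an account means its favorites and retweets rise together almost linearly, so comparing the two accounts' values makes the "tighter fit" observation concrete.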

Question 3

What are the most popular tweets from Pro Football Focus and Next Gen Stats, in terms of the number of favorites each tweet gets? What do you think is the reason behind this?

screen_name   favorite_count  retweet_count  hashtags                source
NextGenStats  8190            2104           CLEvsBAL, Browns        Twitter Web App
NextGenStats  5878            1537           TNFonPrime              Twitter Web App
NextGenStats  5086            1077           LACvsKC, ChiefsKingdom  Twitter Web App
PFF           4882            864            NA                      Sprout Social
PFF           4420            898            NA                      Twitter Web App
NextGenStats  4326            1026           MINvsKC, ChiefsKingdom  Twitter Web App
PFF           3557            507            NA                      Sprout Social
NextGenStats  3454            886            TENvsHOU, Titans        Twitter Web App
PFF           3130            357            NA                      Sprout Social
NextGenStats  2941            417            SBLIV, ChiefsKingdom    Twitter Web App
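A top-10 table like the one above can be produced by sorting on `favorite_count`. This is a minimal base R sketch under the same assumption of a hypothetical `tweets` data frame holding the columns shown in the table; the commented usage lines also count how many of the top tweets belong to each account.

```r
# Hypothetical helper: the n most-favorited tweets across both accounts.
top_tweets <- function(tweets, n = 10) {
  cols <- c("screen_name", "favorite_count", "retweet_count", "hashtags", "source")
  # Sort all tweets by favorites, highest first, and keep the table's columns
  ranked <- tweets[order(tweets$favorite_count, decreasing = TRUE), cols]
  rownames(ranked) <- NULL
  head(ranked, n)
}

# Example usage: how many of the top tweets come from each account
# top <- top_tweets(tweets)
# table(top$screen_name)
```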

Looking at the 10 most-favorited tweets, NFL Next Gen Stats owns six of the top 10 in this 2,000-tweet dataset, including the three most-favorited tweets. What surprised me even more was that the most-favorited tweet came from the Browns' game against the Ravens (CLEvsBAL). The tweet showed Nick Chubb of the Browns reaching approximately 21 MPH on an 88-yard touchdown run, which is quite impressive! The game was early in the season, before the Ravens, who went 14-2 in the 2019 NFL season, had received much recognition. Another interesting point is that none of PFF's four tweets in the top 10 included a hashtag, while all six of Next Gen Stats' tweets did. Based on this trend, though from a small sample, PFF seems to earn favorites without relying on hashtags, while Next Gen Stats appears to draw more attention by using game and team hashtags. However, because every top Next Gen Stats tweet used a hashtag and no top PFF tweet did, this sample cannot separate the effect of hashtags from the effect of the account itself.
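The hashtag pattern above rests on only ten tweets, so a natural check is to compare favorites for tweets with and without hashtags across the whole dataset. The sketch below assumes the `hashtags` column is text that is NA (or an empty string) when a tweet has no hashtag, matching how the table displays it; if `hashtags` is stored as a list column instead, the `has_hashtag` test would need to change. It reports the median rather than the mean because a handful of highly favorited tweets could pull the mean upward. The `tweets` data frame name is again hypothetical.

```r
# Hypothetical helper: median favorites for tweets with vs. without hashtags.
# Assumes `hashtags` is a character column that is NA or "" when a tweet
# has no hashtag.
hashtag_effect <- function(tweets) {
  tweets$has_hashtag <- !is.na(tweets$hashtags) & nzchar(tweets$hashtags)

  # Number of tweets in each account/hashtag group
  n <- aggregate(favorite_count ~ screen_name + has_hashtag,
                 data = tweets, FUN = length)
  names(n)[3] <- "n_tweets"

  # Median favorites in each account/hashtag group
  med <- aggregate(favorite_count ~ screen_name + has_hashtag,
                   data = tweets, FUN = median)
  names(med)[3] <- "median_favorites"

  merge(n, med, by = c("screen_name", "has_hashtag"))
}
```

This is still a descriptive comparison: even across all tweets, it cannot show that hashtags cause more favorites, but the `n_tweets` column shows how many tweets back each median.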