In this notebook we analyze tweets from March Madness 2018
Use regular expressions to clean the tweet text
Become familiar with some natural language processing tools
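As a minimal sketch of the regular-expression cleaning mentioned above, the helper below strips links, mentions, and the # symbol from tweet text with base R's gsub(). The function name clean_tweet and the specific patterns are assumptions made for illustration, not the cleaning rules used elsewhere in this notebook.

```r
# Hypothetical helper: clean_tweet() is not part of any package
clean_tweet <- function(x) {
  x <- gsub("https?://\\S+", " ", x)             # drop http/https links
  x <- gsub("pic\\.twitter\\.com/\\S+", " ", x)  # drop embedded picture links
  x <- gsub("@\\w+", " ", x)                     # drop @mentions
  x <- gsub("#", "", x, fixed = TRUE)            # keep hashtag words, drop '#'
  x <- gsub("\\s+", " ", x)                      # collapse repeated whitespace
  trimws(x)
}

# Sample text taken from the dataset preview
sample_tweet <- "Bye Sister Jean! Good run #LoyolaChicago. #GoBlue #MarchMadnesspic.twitter.com/d3heqA1ljL"
cleaned <- clean_tweet(sample_tweet)
print(cleaned)
```

The function is vectorized, so it could be applied to a whole text column in one call.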
# Here we are checking if the package is installed
if(!require("tidyverse")){
install.packages("tidyverse", dependencies = TRUE)
library("tidyverse")
}
if(!require("syuzhet")){
install.packages("syuzhet", dependencies = TRUE)
library("syuzhet")
}
if(!require("cleanNLP")){
install.packages("cleanNLP", dependencies = TRUE)
library("cleanNLP")
}
if(!require("magrittr")){
install.packages("magrittr", dependencies = TRUE)
library("magrittr")
}
if(!require("wordcloud")){
install.packages("wordcloud", dependencies = TRUE)
library("wordcloud")
}
mydata = read.csv(file="data/march_madness.csv")
head(mydata)
## tweet_id
## 1 9.802206e+17
## 2 9.802732e+17
## 3 9.802187e+17
## 4 9.784337e+17
## 5 9.802407e+17
## 6 9.802403e+17
## text
## 1 Good #Caturday evening. Chloe is watching the Loyola vs Michigan game. She's trying to help. She knows mom & dad's grandaughter attends Loyola. We hope everyone's having a fabulous day. #LoyolaChicago #FinalFour pic.twitter.com/gg8DWm6Hu3
## 2 Look, I get that that you're all excited that you beat an 11 seed, but the fact that some of you guys are attacking a 98 year old nun is absolutely disrespectful and disgusting #LoyolaChicago #Michigan #SisterJean #FinalFour
## 3 #LoyolaChicago is here for a reason . #Michigan better WAKE UP or the dream season is OVER!!#FinalFour
## 4 Check it out! Full Video on YouTube#YouTube #MarchMadness#NCAA #NCAABasketball #Basketball #Duke #Michigan #LoyolaChicago #SisterJean #Kansas #KansasState #FloridaState #TexasTech #Villanova #NCAAMens #NCAATournament#ChiseledAdonishttps://youtu.be/zTvoa2uFlhQ pic.twitter.com/qUjUWqzxcI
## 5 Bye Sister Jean! Good run #LoyolaChicago. #GoBlue #MarchMadnesspic.twitter.com/d3heqA1ljL
## 6 Ben Richardson was extremely emotional leaving the court, screaming into the Jersey he had pulled to cover his face. Valiant warrior, left it all on the floor. #LoyolaChicago pic.twitter.com/Zu2btOkKxl
## username fullname date datetime
## 1 @mill_cats Mill Cats - RIP Itsy 2018-03-31 2018-03-31T23:09:34Z
## 2 @HoodieMobster A.J. Schlott 2018-04-01 2018-04-01T02:38:48Z
## 3 @DanLeach971 Dan Leach 2018-03-31 2018-03-31T23:01:56Z
## 4 @chiseledadonis ChiseledAdonis 2018-03-27 2018-03-27T00:49:15Z
## 5 @ProSportsExtra Pro Sports Extra 2018-04-01 2018-04-01T00:29:18Z
## 6 @RyanSchuiling Ryan Schuiling 2018-04-01 2018-04-01T00:27:55Z
## verified reply retweets favorite
## 1 0 13 31 151
## 2 0 3 4 17
## 3 1 12 14 74
## 4 0 0 6 20
## 5 0 2 2 5
## 6 0 4 6 45
## links
## 1 #Caturday; #LoyolaChicago; #FinalFour; https://t.co/gg8DWm6Hu3
## 2 #LoyolaChicago; #Michigan; #SisterJean; #FinalFour
## 3 #LoyolaChicago; #Michigan; #FinalFour
## 4 #YouTube; #MarchMadness; #NCAA; #NCAABasketball; #Basketball; #Duke; #Michigan; #LoyolaChicago; #SisterJean; #Kansas; #KansasState; #FloridaState; #TexasTech; #Villanova; #NCAAMens; #NCAATournament; #ChiseledAdonis; https://t.co/yWdM1tD00K; https://t.co/qUjUWqzxcI
## 5 #LoyolaChicago; #GoBlue; #MarchMadness; https://t.co/d3heqA1ljL
## 6 #LoyolaChicago; https://t.co/Zu2btOkKxl
This dataset is titled “March Madness” and contains Twitter statistics from the basketball games. It includes fields such as tweet text, retweets, favorites, and timestamps, along with unique tweet IDs, hashtags, and usernames.
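One caveat on the unique tweet IDs: the head() output shows tweet_id in scientific notation, which means read.csv() parsed it as a double. Values near 9.8e17 are larger than 2^53, beyond which doubles cannot represent every integer exactly, so distinct IDs can collapse to the same number. A minimal sketch of a workaround, assuming the column is named tweet_id as in the output above, is to read that column as character. The two IDs in the inline CSV are made up for illustration.

```r
# Made-up IDs that differ only in the last digit
csv_text <- "tweet_id,retweets\n980220612345678901,31\n980220612345678902,4"

# Default parsing turns the IDs into doubles
as_number <- read.csv(text = csv_text)
# Forcing character keeps every digit
as_text <- read.csv(text = csv_text, colClasses = c(tweet_id = "character"))

# Parsed as doubles, the two IDs are no longer distinguishable
print(length(unique(as_number$tweet_id)))
# Parsed as character, both IDs survive intact
print(as_text$tweet_id)
```

The same colClasses argument could be passed when reading data/march_madness.csv if exact IDs are needed.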
knitr::include_graphics('images/Screen Shot 2018-04-18 at 7.33.19 PM.png')
This is a bar chart with retweets on the vertical axis and the top six hashtags on the horizontal axis. Each bar shows the count of retweets per hashtag. I used Tableau’s filter function together with an IF/ELSEIF calculated field to merge different capitalizations of the same hashtag. The formula looks like: IF CONTAINS([Links], "#sisterjean") OR CONTAINS([Links], "#SisterJean") THEN "Sister Jean" ELSEIF CONTAINS([Links], "#OnwardLU") OR CONTAINS([Links], "#onwardLU") THEN "Onward LU".
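For comparison, the same grouping can be sketched in R. The example assumes a links column like the one in mydata. The helper name label_hashtag and the fallback label "Other" are assumptions for illustration, since the Tableau formula above does not show its else branch. Matching with ignore.case = TRUE removes the need to list each capitalization separately.

```r
# Hypothetical helper mirroring the Tableau IF / ELSEIF logic
label_hashtag <- function(links) {
  ifelse(grepl("#sisterjean", links, ignore.case = TRUE), "Sister Jean",
  ifelse(grepl("#onwardlu", links, ignore.case = TRUE), "Onward LU",
         "Other"))  # "Other" is an assumed fallback label
}

# Two link strings from the head(mydata) output, plus one made-up value
links <- c("#LoyolaChicago; #Michigan; #SisterJean; #FinalFour",
           "#LoyolaChicago; #GoBlue; #MarchMadness; https://t.co/d3heqA1ljL",
           "#onwardLU")
hashtag_group <- label_hashtag(links)
print(hashtag_group)
```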
knitr::include_graphics('images/Screen Shot 2018-04-18 at 7.33.26 PM.png')
Number of tweets per day in March! Almost 10k on March 24th and almost 3k on the day before the game.
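A rough R counterpart of the counts behind this chart can be sketched with table(). The dates below are made up and stand in for a column such as mydata$date; they are not values from the dataset.

```r
# Made-up dates standing in for a column such as mydata$date
dates <- as.Date(c("2018-03-23", "2018-03-24", "2018-03-24",
                   "2018-03-25", "2018-03-24"))

# Count tweets per calendar day, then put the busiest day first
tweets_per_day <- sort(table(dates), decreasing = TRUE)
print(tweets_per_day)

busiest_day <- names(tweets_per_day)[1]
```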
knitr::include_graphics('images/Screen Shot 2018-04-18 at 7.33.32 PM.png')
Top Sister Jean-related tweets by retweet count. The bigger the bubble, the more retweets.
knitr::include_graphics('images/Screen Shot 2018-04-18 at 7.33.45 PM.png')
Top retweeted tweets overall. Same encoding as above, but not limited to Sister Jean.
knitr::include_graphics('images/Screen Shot 2018-04-18 at 8.27.01 PM.png')
Usernames with ten or more tweets. “@LALATE” is the user with the most tweets in the data’s timeframe.
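The same threshold filter can be sketched in R. The usernames below are made up and stand in for mydata$username. The cutoff of ten comes from the chart description, while the variable names are assumptions for illustration.

```r
# Made-up usernames standing in for mydata$username
usernames <- c(rep("@user_a", 12), rep("@user_b", 10), rep("@user_c", 3))
min_tweets <- 10  # threshold used in the chart description

# Tweets per user, keeping only users at or above the threshold
counts <- table(usernames)
frequent <- sort(counts[counts >= min_tweets], decreasing = TRUE)
print(frequent)

top_user <- names(frequent)[1]
```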
Done above.
According to my analysis in Tableau, Loyola is viewed positively, as most hashtags and mentions involved basketball, making the Final Four, and Sister Jean. Tweets mentioning Loyola occurred most frequently on the day of the March 24th game. There are 20189 rows of data, so the overall positive sentiment rests on a large sample and should be a fairly reliable portrait of how Loyola is viewed.
summary(mydata)
## tweet_id
## Min. :3.542e+16
## 1st Qu.:9.774e+17
## Median :9.777e+17
## Mean :9.753e+17
## 3rd Qu.:9.777e+17
## Max. :9.824e+17
##
## text
## Yes! My favorite sports city has a team in the #FinalFour of #MarchMadness! The Loyola-Chicago Ramblers are in the final 4 for the 1st time since 1963! They’ve been underdogs this whole tournament & still keep winning! Go @RamblersMBB! Win for #SisterJean! #OnwardLU #NoFinishLine: 8
## Congrats @RamblersMBB : 7
## Let’s go @RamblersMBB : 4
## Congrats @RamblersMBB. : 3
## Congratulations @RamblersMBB : 3
## I love #SisterJean : 3
## (Other) :20159
## username fullname date
## @LALATE : 81 LALATE : 81 2018-03-25:10708
## @RamblersMBB : 30 Loyola Basketball: 31 2018-03-23: 2976
## @SkywayChicago : 27 Steve Timble : 27 2018-03-24: 2274
## @chicagomargaret: 21 Margaret Holt : 21 2018-03-26: 1504
## @sschrimp : 18 Mark : 21 2018-03-18: 1099
## @loyolaforus : 16 Steve : 19 2018-03-27: 241
## (Other) :19994 (Other) :19987 (Other) : 1385
## datetime verified reply
## 2018-03-25T00:21:10Z: 16 Min. :0.00000 Min. : 0.0000
## 2018-03-25T00:21:31Z: 16 1st Qu.:0.00000 1st Qu.: 0.0000
## 2018-03-25T00:21:09Z: 15 Median :0.00000 Median : 0.0000
## 2018-03-25T00:21:35Z: 15 Mean :0.06192 Mean : 0.3467
## 2018-03-25T00:21:08Z: 14 3rd Qu.:0.00000 3rd Qu.: 0.0000
## 2018-03-25T00:21:11Z: 14 Max. :1.00000 Max. :591.0000
## (Other) :20097
## retweets favorite
## Min. : 0.000 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.0
## Median : 0.000 Median : 1.0
## Mean : 3.146 Mean : 15.8
## 3rd Qu.: 0.000 3rd Qu.: 3.0
## Max. :5143.000 Max. :32180.0
##
## links
## @RamblersMBB : 1139
## #LoyolaChicago : 1027
## #SisterJean : 778
## https://twitter.com#SisterJean: 231
## #LoyolaChicago; #MarchMadness : 208
## (Other) :16117
## NA's : 687
@LALATE has the most tweets, at 81. Retweets range from a minimum of 0 to a maximum of 5143. The most common entries in links are @RamblersMBB (1139) and #LoyolaChicago (1027). These are neutral-to-positive statistics that back up my argument.
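# Parse the ISO 8601 timestamps so the x-axis is a real time scale, not a factor
mydata$time <- as.POSIXct(as.character(mydata$datetime), format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
# Scatter plot of favorites per tweet over time
ggplot(mydata, aes(x = time, y = favorite)) + geom_point()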
More favorites over time!
I would recommend that Loyola create its own official hashtag. There should also be a Twitter competition with prizes to drive traffic. For example, Loyola could announce that whoever posts the best fan picture with the hashtag #LoyolaChicago wins a free T-shirt or a similar prize. This would drive trending tweets and increase the university’s visibility!
The data quality score is 53% for the updated Sentiment_March_Madness data (provided on Sakai).
knitr::include_graphics('images/Screen Shot 2018-04-18 at 9.00.22 PM.png')
Above is the dashboard I was presented with for Watson Exploration.
knitr::include_graphics('images/Screen Shot 2018-04-23 at 7.21.51 PM.png')
The graph above shows that most anger falls in the Bing sentiment score range of -2 to 2, mostly from verified accounts.
knitr::include_graphics('images/Screen Shot 2018-04-23 at 7.24.41 PM.png')
Most favorites are on tweets from verified accounts! That is good, since a verified account is a stronger, more relevant account.
knitr::include_graphics('images/Unknown.png')
This is a Watson graph of how favorites and retweets are associated: the more favorites, the more retweets!
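The association can also be checked numerically in R with cor(). As a small sketch, the vectors below use only the six rows printed by head(mydata), so the resulting values are illustrative and say nothing definitive about the full dataset. On the real data the same calls could be run on mydata$retweets and mydata$favorite.

```r
# Retweet and favorite counts from the six rows shown by head(mydata)
retweets <- c(31, 4, 14, 6, 2, 6)
favorite <- c(151, 17, 74, 20, 5, 45)

# Pearson measures linear association; Spearman uses ranks and is
# less sensitive to a few tweets with very large counts
pearson_r <- cor(retweets, favorite, method = "pearson")
spearman_rho <- cor(retweets, favorite, method = "spearman")
print(c(pearson = pearson_r, spearman = spearman_rho))
```

Because the summary output shows a few tweets with very large counts (up to 5143 retweets and 32180 favorites), the rank-based Spearman coefficient may be the more robust of the two on the full data.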