In this notebook we analyze tweets from March Madness 2018.
Use regular expressions to clean the tweet text.
Get familiar with some natural language processing tools.
# Here we are checking if the package is installed
if(!require("tidyverse")){
install.packages("tidyverse", dependencies = TRUE)
library("tidyverse")
}
if(!require("syuzhet")){
install.packages("syuzhet", dependencies = TRUE)
library("syuzhet")
}
if(!require("cleanNLP")){
install.packages("cleanNLP", dependencies = TRUE)
library("cleanNLP")
}
if(!require("magrittr")){
install.packages("magrittr", dependencies = TRUE)
library("magrittr")
}
if(!require("wordcloud")){
install.packages("wordcloud", dependencies = TRUE)
library("wordcloud")
}
mydata = read_csv('data/sentiment_march_madness.csv')
## Parsed with column specification:
## cols(
## .default = col_integer(),
## tweet_id = col_double(),
## text = col_character(),
## username = col_character(),
## fullname = col_character(),
## date = col_date(format = ""),
## datetime = col_datetime(format = ""),
## links = col_character()
## )
## See spec(...) for full column specifications.
summary(mydata)
## tweet_id text username
## Min. :3.542e+16 Length:20187 Length:20187
## 1st Qu.:9.774e+17 Class :character Class :character
## Median :9.777e+17 Mode :character Mode :character
## Mean :9.753e+17
## 3rd Qu.:9.777e+17
## Max. :9.824e+17
## fullname date datetime
## Length:20187 Min. :2011-02-09 Min. :2011-02-09 19:42:51
## Class :character 1st Qu.:2018-03-24 1st Qu.:2018-03-24 11:03:02
## Mode :character Median :2018-03-25 Median :2018-03-25 00:22:38
## Mean :2018-03-18 Mean :2018-03-18 10:40:33
## 3rd Qu.:2018-03-25 3rd Qu.:2018-03-25 03:08:07
## Max. :2018-04-06 Max. :2018-04-06 21:24:17
## verified reply retweets favorite
## Min. :0.00000 Min. : 0.0000 Min. : 0.000 Min. : 0.0
## 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0
## Median :0.00000 Median : 0.0000 Median : 0.000 Median : 1.0
## Mean :0.06192 Mean : 0.3467 Mean : 3.146 Mean : 15.8
## 3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 3.0
## Max. :1.00000 Max. :591.0000 Max. :5143.000 Max. :32180.0
## anger anticipation disgust fear
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.1342 Mean :0.4359 Mean :0.07143 Mean :0.1612
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :4.0000 Max. :7.0000 Max. :3.00000 Max. :6.0000
## joy sadness surprise trust
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.421 Mean :0.1122 Mean :0.1798 Mean :0.4806
## 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :8.000 Max. :5.0000 Max. :4.0000 Max. :7.0000
## negative positive sentiment_bing links
## Min. :0.0000 Min. :0.0000 Min. :-5.0000 Length:20187
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.0000 Class :character
## Median :0.0000 Median :0.0000 Median : 0.0000 Mode :character
## Mean :0.2395 Mean :0.6676 Mean : 0.5141
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.: 1.0000
## Max. :6.0000 Max. :9.0000 Max. :11.0000
tweets = read_csv('data/march_madness.csv')
## Parsed with column specification:
## cols(
## tweet_id = col_double(),
## text = col_character(),
## username = col_character(),
## fullname = col_character(),
## date = col_date(format = ""),
## datetime = col_datetime(format = ""),
## verified = col_integer(),
## reply = col_integer(),
## retweets = col_integer(),
## favorite = col_integer(),
## links = col_character()
## )
summary(tweets)
## tweet_id text username
## Min. :3.542e+16 Length:20187 Length:20187
## 1st Qu.:9.774e+17 Class :character Class :character
## Median :9.777e+17 Mode :character Mode :character
## Mean :9.753e+17
## 3rd Qu.:9.777e+17
## Max. :9.824e+17
## fullname date datetime
## Length:20187 Min. :2011-02-09 Min. :2011-02-09 19:42:51
## Class :character 1st Qu.:2018-03-24 1st Qu.:2018-03-24 11:03:02
## Mode :character Median :2018-03-25 Median :2018-03-25 00:22:38
## Mean :2018-03-18 Mean :2018-03-18 10:40:33
## 3rd Qu.:2018-03-25 3rd Qu.:2018-03-25 03:08:07
## Max. :2018-04-06 Max. :2018-04-06 21:24:17
## verified reply retweets favorite
## Min. :0.00000 Min. : 0.0000 Min. : 0.000 Min. : 0.0
## 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0
## Median :0.00000 Median : 0.0000 Median : 0.000 Median : 1.0
## Mean :0.06192 Mean : 0.3467 Mean : 3.146 Mean : 15.8
## 3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 3.0
## Max. :1.00000 Max. :591.0000 Max. :5143.000 Max. :32180.0
## links
## Length:20187
## Class :character
## Mode :character
##
##
##
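The regular-expression cleaning named in the goals at the top of the notebook could look roughly like the sketch below. The helper `clean_tweet` and the sample strings are hypothetical and not part of the original notebook. The choice to drop URLs and @mentions while keeping hashtag words is an assumption about what "clean" should mean here.

```r
# Hypothetical helper: strips URLs, @mentions, the '#' sign, and extra spaces.
clean_tweet <- function(text) {
  text <- gsub("https?://\\S+", "", text, perl = TRUE)  # drop URLs
  text <- gsub("@\\w+", "", text, perl = TRUE)          # drop @mentions
  text <- gsub("#", "", text, fixed = TRUE)             # keep the hashtag word, drop '#'
  text <- gsub("[^[:alnum:][:space:]']", " ", text)     # other punctuation becomes a space
  text <- gsub("\\s+", " ", text, perl = TRUE)          # collapse repeated whitespace
  trimws(text)
}

# Illustrative strings, not taken from the data set.
sample_text <- c("Go @RamblersMBB! #FinalFour https://t.co/abc123",
                 "What   a game... #MarchMadness")
cleaned <- clean_tweet(sample_text)
cleaned
```

On the real data this would be applied as `tweets$clean_text <- clean_tweet(tweets$text)`, where `clean_text` is a new, hypothetical column name.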
Features such as tweet text, retweets, favorites, and time were analyzed. According to the summary above, the maximum number of retweets was 5,143, the maximum number of replies was 591, and the maximum number of favorites was roughly 32,180.
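Because `summary()` typically prints values rounded to a few significant digits, exact maxima are easier to read off by computing them directly. The sketch below uses a small toy data frame as a stand-in for `tweets`. Its values are illustrative only and do not come from the data set.

```r
# Toy stand-in for the `tweets` data frame; the values are illustrative only.
toy <- data.frame(reply    = c(0, 2, 5),
                  retweets = c(1, 0, 12),
                  favorite = c(3, 40, 7))

engagement_cols <- c("reply", "retweets", "favorite")

# Column-wise maxima; na.rm guards against missing counts.
maxima <- sapply(toy[engagement_cols], max, na.rm = TRUE)
maxima

# Row index of the most-favorited tweet, useful for inspecting its text.
top_row <- which.max(toy$favorite)
```

Replacing `toy` with `tweets` should give the unrounded figures for the real data.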
knitr::include_graphics("Img1.png")
knitr::include_graphics("Img2.png")
knitr::include_graphics("Img3.png")
knitr::include_graphics("Img4.png")
knitr::include_graphics("Img6.png")
Overall, Loyola clearly led its own social media presence as the frontrunner for tweets, retweets, and replies, with the exception of the likely fake LALATE account. The majority of the tweets occurred late at night and at the end of the day, after the games were over.
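One way to check the time-of-day pattern is to count tweets per hour from the `datetime` column. The sketch below uses toy timestamps rather than the real data. It also assumes the timestamps are in UTC, which the source does not state, so hours may need shifting to local game time.

```r
# Toy timestamps standing in for tweets$datetime (illustrative values only).
toy_datetime <- as.POSIXct(c("2018-03-24 23:15:00", "2018-03-24 23:48:10",
                             "2018-03-25 00:22:38", "2018-03-25 14:05:00"),
                           tz = "UTC")

# Extract the hour of day ("00" to "23") and count tweets per hour.
hour_of_day <- format(toy_datetime, "%H")
tweets_per_hour <- table(hour_of_day)
tweets_per_hour
```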
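The image of Loyola is very positive, as it should be for a Cinderella Final Four team with an adorable nun mascot. The sentiment analysis shows a positive tilt in the attitude of tweeters: in the summary above, the mean positive score per tweet (0.6676) is well above the mean negative score (0.2395), and the mean sentiment_bing score is 0.5141.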
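The column names in the sentiment file (anger, anticipation, ..., negative, positive, sentiment_bing) match what the `syuzhet` package produces. Scores like these could plausibly be generated as in the sketch below, though it is an assumption that the file was built this way. The sample sentences are illustrative and not taken from the data set.

```r
library(syuzhet)

# Illustrative sentences, not taken from the data set.
sample_text <- c("What a wonderful win, so happy for this team",
                 "Terrible call by the referee, awful ending")

# NRC lexicon: one row per text, eight emotion columns plus negative/positive.
nrc_scores <- get_nrc_sentiment(sample_text)

# Bing lexicon: a single signed score per text.
bing_scores <- get_sentiment(sample_text, method = "bing")

scored <- data.frame(text = sample_text, nrc_scores,
                     sentiment_bing = bing_scores)
names(scored)
```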
The maximum positive score is 9, higher than the maximum of negative or of any single emotion category (the next highest is joy at 8).

### 2C) Any recommendations to Loyola’s marketing team

Play off the successful underdog angle, an us-against-the-world play, so to speak.

---
knitr::include_graphics("Img7.png")
The most used hashtags were for the Final Four and Loyola Chicago, which makes sense, as Loyola was the team that reached the Final Four.
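Hashtag counts like these can be produced with a regular expression over the tweet text. This is a minimal sketch on illustrative strings, not necessarily how the figure above was made. Lower-casing the hashtags so that case variants are merged is an assumption.

```r
# Illustrative tweets, not taken from the data set.
sample_text <- c("Onward! #FinalFour #LoyolaChicago",
                 "Unreal run #finalfour",
                 "No tags here")

# Pull every hashtag out of each tweet, then lower-case so variants merge.
matches  <- regmatches(sample_text, gregexpr("#[[:alnum:]_]+", sample_text))
hashtags <- tolower(unlist(matches))

# Frequency table, most common first.
hashtag_counts <- sort(table(hashtags), decreasing = TRUE)
hashtag_counts
```

The resulting table can be passed to `wordcloud(names(hashtag_counts), as.vector(hashtag_counts))` from the `wordcloud` package loaded earlier.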
knitr::include_graphics("Img8.png")
Anticipation spikes around games, specifically the one on or around March 25.
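A daily total of the anticipation score is one way to locate such spikes. The sketch below uses a toy data frame with the same `date` and `anticipation` column names as the sentiment file. The values are illustrative only.

```r
# Toy stand-in for the sentiment data; values are illustrative only.
toy <- data.frame(
  date = as.Date(c("2018-03-24", "2018-03-24", "2018-03-25",
                   "2018-03-25", "2018-03-26")),
  anticipation = c(1, 0, 2, 3, 0)
)

# Sum the anticipation score over all tweets posted on each day.
daily <- aggregate(anticipation ~ date, data = toy, FUN = sum)

# The day with the highest total anticipation.
peak_day <- daily[which.max(daily$anticipation), ]
peak_day
```

Using the mean instead of the sum (`FUN = mean`) would separate a rise in anticipation per tweet from a rise in tweet volume.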
knitr::include_graphics("Img9.png")
The main drivers of trust are the positive, joy, and anticipation scores, which makes sense, as all of them are positive-leaning sentiments.
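The source does not say how the drivers in the figure were computed. One simple check is to correlate trust with the other scores, as sketched below on toy values that are illustrative only. Correlation does not establish that one score drives another. Because a word in a lexicon can be tagged with several categories, some of the association may simply reflect shared words.

```r
# Toy scores for illustration only; real values come from the sentiment data.
toy <- data.frame(
  trust        = c(0, 1, 2, 1, 3, 0),
  positive     = c(0, 1, 3, 1, 4, 1),
  joy          = c(0, 1, 2, 0, 3, 0),
  anticipation = c(1, 0, 2, 1, 2, 0),
  anger        = c(2, 0, 0, 1, 0, 1)
)

# Pearson correlation of every other column with trust, strongest first.
correlations <- cor(toy)["trust", ]
ranked <- sort(correlations[names(correlations) != "trust"], decreasing = TRUE)
ranked
```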