About this Notebook



Analytics Toolkit: Require Packages


# Install each package if it is missing, then load it
if(!require("tidyverse")){
  install.packages("tidyverse", dependencies = TRUE)
  library("tidyverse")
}

if(!require("syuzhet")){
  install.packages("syuzhet", dependencies = TRUE)
  library("syuzhet")
}

if(!require("cleanNLP")){
  install.packages("cleanNLP", dependencies = TRUE)
  library("cleanNLP")
}

if(!require("magrittr")){
  install.packages("magrittr", dependencies = TRUE)
  library("magrittr")
}

if(!require("wordcloud")){
  install.packages("wordcloud", dependencies = TRUE)
  library("wordcloud")
}
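The five require/install blocks above all follow the same pattern. The sketch below loops over a vector of package names instead. It uses base R only, and load_packages is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: install any missing packages, then attach them all.
load_packages <- function(pkgs) {
  for (p in pkgs) {
    # requireNamespace() checks whether the package is installed
    # without attaching it to the search path
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p, dependencies = TRUE)
    }
    # character.only = TRUE makes library() read the name from the variable p
    library(p, character.only = TRUE)
  }
  # TRUE for each package that is now attached
  invisible(paste0("package:", pkgs) %in% search())
}
```

For this notebook the call would be load_packages(c("tidyverse", "syuzhet", "cleanNLP", "magrittr", "wordcloud")). Checking with requireNamespace() and then calling library() once avoids loading each package twice.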
mydata = read_csv('data/sentiment_march_madness.csv')
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   tweet_id = col_double(),
##   text = col_character(),
##   username = col_character(),
##   fullname = col_character(),
##   date = col_date(format = ""),
##   datetime = col_datetime(format = ""),
##   links = col_character()
## )
## See spec(...) for full column specifications.
summary(mydata)
##     tweet_id             text             username        
##  Min.   :3.542e+16   Length:20187       Length:20187      
##  1st Qu.:9.774e+17   Class :character   Class :character  
##  Median :9.777e+17   Mode  :character   Mode  :character  
##  Mean   :9.753e+17                                        
##  3rd Qu.:9.777e+17                                        
##  Max.   :9.824e+17                                        
##    fullname              date               datetime                  
##  Length:20187       Min.   :2011-02-09   Min.   :2011-02-09 19:42:51  
##  Class :character   1st Qu.:2018-03-24   1st Qu.:2018-03-24 11:03:02  
##  Mode  :character   Median :2018-03-25   Median :2018-03-25 00:22:38  
##                     Mean   :2018-03-18   Mean   :2018-03-18 10:40:33  
##                     3rd Qu.:2018-03-25   3rd Qu.:2018-03-25 03:08:07  
##                     Max.   :2018-04-06   Max.   :2018-04-06 21:24:17  
##     verified           reply             retweets           favorite      
##  Min.   :0.00000   Min.   :  0.0000   Min.   :   0.000   Min.   :    0.0  
##  1st Qu.:0.00000   1st Qu.:  0.0000   1st Qu.:   0.000   1st Qu.:    0.0  
##  Median :0.00000   Median :  0.0000   Median :   0.000   Median :    1.0  
##  Mean   :0.06192   Mean   :  0.3467   Mean   :   3.146   Mean   :   15.8  
##  3rd Qu.:0.00000   3rd Qu.:  0.0000   3rd Qu.:   0.000   3rd Qu.:    3.0  
##  Max.   :1.00000   Max.   :591.0000   Max.   :5143.000   Max.   :32180.0  
##      anger         anticipation       disgust             fear       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
##  Mean   :0.1342   Mean   :0.4359   Mean   :0.07143   Mean   :0.1612  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :4.0000   Max.   :7.0000   Max.   :3.00000   Max.   :6.0000  
##       joy           sadness          surprise          trust       
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.421   Mean   :0.1122   Mean   :0.1798   Mean   :0.4806  
##  3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :8.000   Max.   :5.0000   Max.   :4.0000   Max.   :7.0000  
##     negative         positive      sentiment_bing       links          
##  Min.   :0.0000   Min.   :0.0000   Min.   :-5.0000   Length:20187      
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.0000   Class :character  
##  Median :0.0000   Median :0.0000   Median : 0.0000   Mode  :character  
##  Mean   :0.2395   Mean   :0.6676   Mean   : 0.5141                     
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.: 1.0000                     
##  Max.   :6.0000   Max.   :9.0000   Max.   :11.0000
tweets = read_csv('data/march_madness.csv')
## Parsed with column specification:
## cols(
##   tweet_id = col_double(),
##   text = col_character(),
##   username = col_character(),
##   fullname = col_character(),
##   date = col_date(format = ""),
##   datetime = col_datetime(format = ""),
##   verified = col_integer(),
##   reply = col_integer(),
##   retweets = col_integer(),
##   favorite = col_integer(),
##   links = col_character()
## )
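Both read_csv() calls parse tweet_id as a double (col_double() in the specifications above). Tweet IDs around 9.8e17 are larger than 2^53. Above that limit a double can no longer represent every integer, so the stored IDs may be rounded. If the IDs are needed for joins or lookups, reading them as text avoids the problem. With readr, that means read_csv(path, col_types = cols(tweet_id = col_character())). The sketch below shows the effect using base R's read.csv() on an inline, made-up ID:

```r
# Made-up 18-digit ID; doubles hold integers exactly only up to 2^53 (about 9.0e15)
csv_text <- "tweet_id,text\n977712345678901234,example tweet\n"

# Default parsing converts the ID column to a double, which rounds it
ids_double <- read.csv(text = csv_text)

# Forcing the ID column to character keeps every digit
ids_char <- read.csv(text = csv_text, colClasses = c(tweet_id = "character"))

sprintf("%.0f", ids_double$tweet_id)  # prints the stored double, not the original ID
ids_char$tweet_id                     # "977712345678901234"
```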

Task 1: Data Exploration - Tableau


1A) Generally describe the data (summary)

summary(tweets)
##     tweet_id             text             username        
##  Min.   :3.542e+16   Length:20187       Length:20187      
##  1st Qu.:9.774e+17   Class :character   Class :character  
##  Median :9.777e+17   Mode  :character   Mode  :character  
##  Mean   :9.753e+17                                        
##  3rd Qu.:9.777e+17                                        
##  Max.   :9.824e+17                                        
##    fullname              date               datetime                  
##  Length:20187       Min.   :2011-02-09   Min.   :2011-02-09 19:42:51  
##  Class :character   1st Qu.:2018-03-24   1st Qu.:2018-03-24 11:03:02  
##  Mode  :character   Median :2018-03-25   Median :2018-03-25 00:22:38  
##                     Mean   :2018-03-18   Mean   :2018-03-18 10:40:33  
##                     3rd Qu.:2018-03-25   3rd Qu.:2018-03-25 03:08:07  
##                     Max.   :2018-04-06   Max.   :2018-04-06 21:24:17  
##     verified           reply             retweets           favorite      
##  Min.   :0.00000   Min.   :  0.0000   Min.   :   0.000   Min.   :    0.0  
##  1st Qu.:0.00000   1st Qu.:  0.0000   1st Qu.:   0.000   1st Qu.:    0.0  
##  Median :0.00000   Median :  0.0000   Median :   0.000   Median :    1.0  
##  Mean   :0.06192   Mean   :  0.3467   Mean   :   3.146   Mean   :   15.8  
##  3rd Qu.:0.00000   3rd Qu.:  0.0000   3rd Qu.:   0.000   3rd Qu.:    3.0  
##  Max.   :1.00000   Max.   :591.0000   Max.   :5143.000   Max.   :32180.0  
##     links          
##  Length:20187      
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

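Features such as the tweet text, retweets, replies, favorites, and time were analyzed. The dataset contains 20,187 tweets. Dates range from 2011-02-09 to 2018-04-06, but half of all tweets fall between 2018-03-24 and 2018-03-25, so the 2011 tweets are outliers. The maximum number of retweets was 5,143, the maximum number of replies was 591, and the maximum number of favorites was about 32,180 (summary() rounds to four significant digits).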
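Because summary() prints rounded values, it can be easier to read the exact maxima directly. Below is a minimal base R sketch that assumes the column names from the read_csv() specification above. engagement_maxima and tweet_date_range are hypothetical helpers, and the toy data frame is made up:

```r
# Hypothetical helpers; column names follow the read_csv() specification above.
engagement_maxima <- function(df) {
  c(
    retweets = max(df$retweets, na.rm = TRUE),
    reply    = max(df$reply, na.rm = TRUE),
    favorite = max(df$favorite, na.rm = TRUE)
  )
}

# Earliest and latest tweet dates
tweet_date_range <- function(df) range(df$date, na.rm = TRUE)

# Made-up toy data in the same shape as the tweets table
toy <- data.frame(
  retweets = c(0, 12, 3),
  reply    = c(1, 0, 5),
  favorite = c(2, 40, 7),
  date     = as.Date(c("2018-03-24", "2018-03-25", "2011-02-09"))
)

engagement_maxima(toy)  # retweets = 12, reply = 5, favorite = 40
tweet_date_range(toy)   # "2011-02-09" "2018-03-25"
```

On the real data, the calls would be engagement_maxima(tweets) and tweet_date_range(tweets).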

1B) Use Tableau to create at least 5 plots

knitr::include_graphics("Img1.png")

knitr::include_graphics("Img2.png")

knitr::include_graphics("Img3.png")

knitr::include_graphics("Img4.png")

knitr::include_graphics("Img6.png")

1C) Explain each plot and relate it to the date/time of the tweets

Overall, Loyola largely controlled its own social media presence and was the clear frontrunner for retweets, tweets, and replies, with the exception of the LALATE account, which is likely fake. Most tweets were posted late at night, at the end of the day, after the games were over.
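To check the late-night pattern in R rather than in Tableau, you can count tweets by hour of day. The sketch below assumes datetime is a POSIXct column. readr parses datetimes as UTC by default, so the tz argument (for example "America/Chicago") is an assumption you would need to set to match the local time you want. tweets_by_hour is a hypothetical helper:

```r
# Hypothetical helper: number of tweets in each hour of the day (0-23).
# tz is an assumption: readr stores parsed datetimes in UTC, so set tz to the
# local time zone you want to analyze (e.g. "America/Chicago").
tweets_by_hour <- function(datetimes, tz = "UTC") {
  hours <- as.integer(format(datetimes, "%H", tz = tz))
  # Using levels 0:23 keeps hours with no tweets in the table as zeros
  table(factor(hours, levels = 0:23))
}

# Made-up timestamps
toy_times <- as.POSIXct(
  c("2018-03-25 01:15:00", "2018-03-25 23:40:00", "2018-03-24 23:05:00"),
  tz = "UTC"
)
counts <- tweets_by_hour(toy_times)
counts[c("1", "23")]  # one tweet at 01:00 UTC, two at 23:00 UTC
```

On the real data, this would be tweets_by_hour(tweets$datetime, tz = "America/Chicago"), provided that time zone matches the one you intend to analyze.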


Task 2: Data Analysis


2A) Based on your plots and data description, give a general narrative of Loyola's image on Twitter

Loyola's image on Twitter is very positive, as you would expect for a Cinderella Final Four team with an adorable nun mascot. The sentiment scores show a clearly positive tilt in the attitude of tweeters.
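One way to quantify this positive tilt is to compare the share of tweets with a positive versus a negative sentiment_bing score (presumably a Bing lexicon score, judging by the column name), alongside the mean NRC positive and negative counts. Below is a minimal sketch that assumes the column names in the sentiment file. sentiment_balance is a hypothetical helper, and the toy values are made up:

```r
# Hypothetical helper; column names follow the sentiment file's specification.
sentiment_balance <- function(df) {
  c(
    share_bing_positive = mean(df$sentiment_bing > 0, na.rm = TRUE),
    share_bing_negative = mean(df$sentiment_bing < 0, na.rm = TRUE),
    mean_nrc_positive   = mean(df$positive, na.rm = TRUE),
    mean_nrc_negative   = mean(df$negative, na.rm = TRUE)
  )
}

# Made-up toy scores for four tweets
toy <- data.frame(
  sentiment_bing = c(2, 0, -1, 1),
  positive       = c(2, 0, 0, 1),
  negative       = c(0, 0, 1, 0)
)
sentiment_balance(toy)  # 0.50, 0.25, 0.75, 0.25
```

Applied to the real data as sentiment_balance(mydata), the last two values should match the means that summary(mydata) already reports for positive and negative.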

2B) Use descriptive statistics to back up your arguments

mydata = read_csv('data/sentiment_march_madness.csv')
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   tweet_id = col_double(),
##   text = col_character(),
##   username = col_character(),
##   fullname = col_character(),
##   date = col_date(format = ""),
##   datetime = col_datetime(format = ""),
##   links = col_character()
## )
## See spec(...) for full column specifications.
summary(mydata)
##     tweet_id             text             username        
##  Min.   :3.542e+16   Length:20187       Length:20187      
##  1st Qu.:9.774e+17   Class :character   Class :character  
##  Median :9.777e+17   Mode  :character   Mode  :character  
##  Mean   :9.753e+17                                        
##  3rd Qu.:9.777e+17                                        
##  Max.   :9.824e+17                                        
##    fullname              date               datetime                  
##  Length:20187       Min.   :2011-02-09   Min.   :2011-02-09 19:42:51  
##  Class :character   1st Qu.:2018-03-24   1st Qu.:2018-03-24 11:03:02  
##  Mode  :character   Median :2018-03-25   Median :2018-03-25 00:22:38  
##                     Mean   :2018-03-18   Mean   :2018-03-18 10:40:33  
##                     3rd Qu.:2018-03-25   3rd Qu.:2018-03-25 03:08:07  
##                     Max.   :2018-04-06   Max.   :2018-04-06 21:24:17  
##     verified           reply             retweets           favorite      
##  Min.   :0.00000   Min.   :  0.0000   Min.   :   0.000   Min.   :    0.0  
##  1st Qu.:0.00000   1st Qu.:  0.0000   1st Qu.:   0.000   1st Qu.:    0.0  
##  Median :0.00000   Median :  0.0000   Median :   0.000   Median :    1.0  
##  Mean   :0.06192   Mean   :  0.3467   Mean   :   3.146   Mean   :   15.8  
##  3rd Qu.:0.00000   3rd Qu.:  0.0000   3rd Qu.:   0.000   3rd Qu.:    3.0  
##  Max.   :1.00000   Max.   :591.0000   Max.   :5143.000   Max.   :32180.0  
##      anger         anticipation       disgust             fear       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
##  Mean   :0.1342   Mean   :0.4359   Mean   :0.07143   Mean   :0.1612  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :4.0000   Max.   :7.0000   Max.   :3.00000   Max.   :6.0000  
##       joy           sadness          surprise          trust       
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.421   Mean   :0.1122   Mean   :0.1798   Mean   :0.4806  
##  3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :8.000   Max.   :5.0000   Max.   :4.0000   Max.   :7.0000  
##     negative         positive      sentiment_bing       links          
##  Min.   :0.0000   Min.   :0.0000   Min.   :-5.0000   Length:20187      
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.0000   Class :character  
##  Median :0.0000   Median :0.0000   Median : 0.0000   Mode  :character  
##  Mean   :0.2395   Mean   :0.6676   Mean   : 0.5141                     
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.: 1.0000                     
##  Max.   :6.0000   Max.   :9.0000   Max.   :11.0000

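Among the NRC emotion and sentiment columns, positive has both the highest maximum (9) and the highest mean (0.6676, versus 0.2395 for negative). The mean sentiment_bing score is also positive (0.5141), and its third quartile is 1.

2C) Any recommendations for Loyola’s marketing team

Play off the successful underdog angle with an "us against the world" narrative.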

Task 3: Watson Analysis


3A) Use Watson Analytics to explore the data

3B) Give at least 3 plots or discoveries using Watson. Explain your findings.

knitr::include_graphics("Img7.png")

The most-used hashtags were for the Final Four and Loyola Chicago. This is expected, since Loyola Chicago was the team that reached the Final Four.
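Hashtag counts like the ones Watson reported can be approximated in R by extracting hashtags from the text column with a regular expression. This is a rough sketch, not Watson's method. The pattern #\w+ is an assumption and will miss hashtags that contain characters other than letters, digits, and underscores. count_hashtags is a hypothetical helper:

```r
# Hypothetical helper: count hashtags in a character vector of tweets.
count_hashtags <- function(text) {
  # "#\\w+" is an assumed pattern: '#' followed by letters, digits, or '_'
  tags <- unlist(regmatches(text, gregexpr("#\\w+", text)))
  # Lower-case so #FinalFour and #finalfour are counted together
  sort(table(tolower(tags)), decreasing = TRUE)
}

# Made-up tweets
toy_text <- c("Go #LoyolaChicago! #FinalFour", "#finalfour bound", "no tags here")
count_hashtags(toy_text)  # #finalfour: 2, #loyolachicago: 1
```

On the real data, head(count_hashtags(tweets$text), 10) would list the ten most frequent hashtags.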

knitr::include_graphics("Img8.png")

Anticipation spikes around games, most notably on or around March 25, which is also the median tweet date in the summary statistics.
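You can check the daily pattern by aggregating the NRC anticipation score by date. The sketch below assumes the date and anticipation columns from the sentiment file. It is not stated whether Watson plotted a daily total or a daily mean, so the summary function is a parameter (sum by default). daily_anticipation is a hypothetical helper:

```r
# Hypothetical helper: combine the NRC anticipation score by calendar day.
# FUN = sum gives total anticipation per day; pass FUN = mean for a per-tweet average.
daily_anticipation <- function(df, FUN = sum) {
  aggregate(anticipation ~ date, data = df, FUN = FUN)
}

# Made-up toy data
toy <- data.frame(
  date         = as.Date(c("2018-03-24", "2018-03-25", "2018-03-25")),
  anticipation = c(0, 2, 1)
)
daily_anticipation(toy)              # 2018-03-24: 0, 2018-03-25: 3
daily_anticipation(toy, FUN = mean)  # 2018-03-24: 0, 2018-03-25: 1.5
```

The total mixes tweet volume with intensity. The mean separates out intensity, which helps when deciding whether a spike reflects more tweets or more anticipatory language per tweet.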

knitr::include_graphics("Img9.png")

The main drivers of trust are positive, joy, and anticipation. This makes sense, since all three are broadly positive sentiments.
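Watson's driver analysis is not reproduced here. As a much simpler stand-in, and a different method, the Pearson correlation between trust and each other NRC column gives a rough sense of which sentiments move together with it. trust_correlations is a hypothetical helper. The default column list mirrors the columns in the sentiment file, and the toy data are made up:

```r
# Rough stand-in for a driver analysis (NOT Watson's method): Pearson correlation
# between trust and each other NRC column, sorted from most positive to most negative.
trust_correlations <- function(df,
                               cols = c("anger", "anticipation", "disgust", "fear",
                                        "joy", "sadness", "surprise",
                                        "negative", "positive")) {
  r <- sapply(cols, function(col) cor(df[[col]], df$trust, use = "complete.obs"))
  # sort() drops NA correlations, e.g. from a column with no variation
  sort(r, decreasing = TRUE)
}

# Made-up toy data
toy <- data.frame(
  trust    = c(0, 1, 2, 3),
  joy      = c(0, 1, 2, 3),
  fear     = c(3, 2, 1, 0),
  positive = c(0, 1, 1, 3)
)
trust_correlations(toy, cols = c("joy", "fear", "positive"))
# joy = 1, positive ≈ 0.92, fear = -1
```

On the real data, the call would be trust_correlations(mydata). Correlations only show co-occurrence, so they may not rank the columns the same way Watson's drivers do.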