About this Notebook



Analytics Toolkit: Require Packages


# Here we are checking if the package is installed
if(!require("tidyverse")){
  install.packages("tidyverse", dependencies = TRUE)
  library("tidyverse")
}
## Warning: package 'ggplot2' was built under R version 3.4.4
if(!require("syuzhet")){
  install.packages("syuzhet", dependencies = TRUE)
  library("syuzhet")
}
## Warning: package 'syuzhet' was built under R version 3.4.4
if(!require("cleanNLP")){
  install.packages("cleanNLP", dependencies = TRUE)
  library("cleanNLP")
}
## Warning: package 'cleanNLP' was built under R version 3.4.4
if(!require("magrittr")){
  install.packages("magrittr", dependencies = TRUE)
  library("magrittr")
}

if(!require("wordcloud")){
  install.packages("wordcloud", dependencies = TRUE)
  library("wordcloud")
}
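
The five blocks above repeat the same check, install, and load pattern. As a minimal sketch, that pattern could be folded into one loop. The name `ensure_packages` is a hypothetical helper introduced here for illustration; it is not part of any package.

```r
# Hypothetical helper: install each package if it is missing, then attach it.
ensure_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() checks availability without attaching the package
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg, dependencies = TRUE)
    }
    # character.only = TRUE lets library() take the name as a string
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage with the packages this notebook relies on (commented out so the
# sketch does not trigger installs on its own):
# ensure_packages(c("tidyverse", "syuzhet", "cleanNLP", "magrittr", "wordcloud"))
```

Keeping the package names in a single vector makes it easier to see at a glance what the notebook depends on.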

Sentiment Analysis: Natural Language Processing


First, let's read the dataset and inspect the first six rows of the emotion and sentiment columns (columns 12 to 21)

tweets <- read.csv("data/sentiment_march_madness.csv")
mydata <- read.csv("data/march_madness.csv")
tweets$tweet_id <- as.character(tweets$tweet_id)
head(tweets[12:21])
##   anticipation disgust fear joy sadness surprise trust negative positive
## 1            2       0    0   2       0        2     2        0        2
## 2            1       3    3   1       2        2     3        4        1
## 3            0       0    0   0       0        0     0        0        1
## 4            1       0    0   1       0        0     0        0        3
## 5            2       0    0   1       0        1     1        0        1
## 6            1       1    3   0       0        0     1        1        2
##   sentiment_bing
## 1              2
## 2             -1
## 3              1
## 4              0
## 5              1
## 6              1
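
One detail worth a sketch: tweet IDs are long enough that reading them as numbers and converting to character afterwards can lose the trailing digits, and before R 4.0.0 `read.csv` turned text columns into factors by default. The example below reads a tiny, made-up CSV (the rows and IDs are invented for illustration) with the ID column declared as character up front.

```r
# Write a tiny, made-up CSV so the sketch runs without the real data file.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "tweet_id,text,sentiment_bing",
  "977700000000000001,Great run by the Ramblers,2",
  "977700000000000002,Tough loss tonight,-1"
), tmp)

# Read IDs as character from the start and keep text as plain strings.
demo <- read.csv(
  tmp,
  colClasses = c(tweet_id = "character"),
  stringsAsFactors = FALSE
)

str(demo)
```

With the real file, the same two arguments could be passed to the `read.csv("data/sentiment_march_madness.csv")` call, assuming the column is named `tweet_id` as it is above.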

Here we can see how sentiment changes across the tweets, plotted in row order

qplot(x = seq_along(tweets$sentiment_bing), 
      y = tweets$sentiment_bing, 
      geom = "line", 
      xlab = "Narrative Time", 
      ylab = "Emotional Valence", 
      main = "Tweets Sentiment Trajectory")
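
With about 20,000 tweets, a raw line plot of per-tweet scores is very noisy. One option, sketched here on made-up scores, is to smooth the series with a centered moving average before plotting. The window size `k` is an arbitrary choice for illustration.

```r
# Made-up per-tweet scores standing in for tweets$sentiment_bing.
scores <- c(2, -1, 1, 0, 1, 1, -2, 3, 0, 1)

# Centered moving average over a window of k tweets.
# stats::filter() returns NA where the window runs off either end.
# The stats:: prefix matters because dplyr (loaded by tidyverse) masks filter().
k <- 3
smoothed <- as.numeric(stats::filter(scores, rep(1 / k, k), sides = 2))

print(round(smoothed, 2))
```

On the real data a much wider window (hundreds of tweets) would probably be needed before a trend becomes visible.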

Look at tweets that express anger, as an example of negative sentiment

angry_tweets <- which(tweets$anger > 0)
tibble(tweet = tweets$text[angry_tweets][1:2])
## # A tibble: 2 x 1
##   tweet                                                                   
##   <fct>                                                                   
## 1 Look  I get that that you re all excited that you beat an 11 seed  but ~
## 2 "Ben Richardson was extremely emotional leaving the court  screaming in~

Look at tweets with positive sentiment

joy_tweets <- which(tweets$joy > 0)
tibble(tweet = tweets$text[joy_tweets][5:7])
## # A tibble: 3 x 1
##   tweet                                                                   
##   <fct>                                                                   
## 1 Thank you Loyola of Chicago and Sr Jean  What great a basketball run pl~
## 2 After being honored at tomorrow s  chicagobulls game the  FinalFour  Ra~
## 3 With everything being said  I respect Loyola so much for what they acco~
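
Filtering on a single emotion column is one way to find examples. Another, sketched below on made-up rows, is to rank tweets by the signed `sentiment_bing` score and take either end of the ranking. The tweet texts here are invented; only the column names follow the dataset.

```r
# Made-up rows; the column names follow the dataset shown above.
demo <- data.frame(
  text = c("What a run", "Terrible call by the refs",
           "Game at 6pm", "So proud of this team"),
  sentiment_bing = c(2, -3, 0, 4),
  stringsAsFactors = FALSE
)

# Sort once by the signed score, then take either end.
ranked <- demo[order(demo$sentiment_bing), ]
most_negative <- head(ranked$text, 1)
most_positive <- tail(ranked$text, 1)

print(most_negative)
print(most_positive)
```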

Let's explore the emotions in the tweets in more depth. Here we extract the emotion variables (columns 11 to 18) and compute each emotion's share of the total.

value <- as.double(colSums(prop.table(tweets[, 11:18])))
emotion <- names(tweets)[11:18]
emotion <- factor(emotion, levels = names(tweets)[11:18][order(value, decreasing = FALSE)])
emotions <- tibble(emotion, percent = value * 100)

head(emotions)
## # A tibble: 6 x 2
##   emotion      percent
##   <fct>          <dbl>
## 1 anger           6.72
## 2 anticipation   21.8 
## 3 disgust         3.58
## 4 fear            8.07
## 5 joy            21.1 
## 6 sadness         5.62
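
To make the calculation above concrete, here is a small worked example on made-up counts. It suggests that `colSums(prop.table(x))` is the same as dividing each column total by the grand total, so each percentage is an emotion's share of all emotion-word counts, not the share of tweets that show that emotion.

```r
# Made-up emotion counts for three tweets (rows) and three emotions (columns).
counts <- data.frame(
  anger = c(1, 0, 0),
  joy   = c(0, 2, 1),
  trust = c(1, 2, 3)
)

# Same expression as in the notebook: divide every cell by the grand total,
# then add up each column.
via_prop_table <- colSums(prop.table(counts))

# Equivalent form: column totals over the grand total.
direct <- colSums(counts) / sum(counts)

percent <- round(direct * 100, 1)
print(percent)
```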

Now we can create a plot of the emotions in the March Madness tweets

ggplot(data = emotions, aes(x = emotion, y = percent)) + 
  geom_bar(stat = "identity", aes(fill = emotion)) + 
  scale_fill_brewer(palette="RdYlGn") + 
  coord_flip() +
  xlab("Emotion") +
  ylab("Percentage")



Task 1: Data Exploration - Tableau


1A) Generally describe the data (summary)

summary(tweets)
##    tweet_id                   text                   username    
##  Length:20187                   : 1273   @LALATE         :   81  
##  Class :character               : 1245   @RamblersMBB    :   30  
##  Mode  :character               :  197   @SkywayChicago  :   27  
##                                 :   51   @chicagomargaret:   21  
##                                 :   35   @sschrimp       :   18  
##                       SisterJean:   15   @loyolaforus    :   16  
##                     (Other)     :17371   (Other)         :19994  
##               fullname             date                       datetime    
##  LALATE           :   81   2018-03-25:10708   2018-03-25T00:21:10Z:   16  
##  Loyola Basketball:   31   2018-03-23: 2976   2018-03-25T00:21:31Z:   16  
##  Steve Timble     :   27   2018-03-24: 2274   2018-03-25T00:21:09Z:   15  
##  Margaret Holt    :   21   2018-03-26: 1504   2018-03-25T00:21:35Z:   15  
##  Mark             :   21   2018-03-18: 1099   2018-03-25T00:21:08Z:   14  
##  Steve            :   19   2018-03-27:  241   2018-03-25T00:21:11Z:   14  
##  (Other)          :19987   (Other)   : 1385   (Other)             :20097  
##     verified           reply             retweets           favorite      
##  Min.   :0.00000   Min.   :  0.0000   Min.   :   0.000   Min.   :    0.0  
##  1st Qu.:0.00000   1st Qu.:  0.0000   1st Qu.:   0.000   1st Qu.:    0.0  
##  Median :0.00000   Median :  0.0000   Median :   0.000   Median :    1.0  
##  Mean   :0.06192   Mean   :  0.3467   Mean   :   3.146   Mean   :   15.8  
##  3rd Qu.:0.00000   3rd Qu.:  0.0000   3rd Qu.:   0.000   3rd Qu.:    3.0  
##  Max.   :1.00000   Max.   :591.0000   Max.   :5143.000   Max.   :32180.0  
##                                                                           
##      anger         anticipation       disgust             fear       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
##  Mean   :0.1342   Mean   :0.4359   Mean   :0.07143   Mean   :0.1612  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :4.0000   Max.   :7.0000   Max.   :3.00000   Max.   :6.0000  
##                                                                      
##       joy           sadness          surprise          trust       
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.421   Mean   :0.1122   Mean   :0.1798   Mean   :0.4806  
##  3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :8.000   Max.   :5.0000   Max.   :4.0000   Max.   :7.0000  
##                                                                    
##     negative         positive      sentiment_bing   
##  Min.   :0.0000   Min.   :0.0000   Min.   :-5.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.0000  
##  Median :0.0000   Median :0.0000   Median : 0.0000  
##  Mean   :0.2395   Mean   :0.6676   Mean   : 0.5141  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.: 1.0000  
##  Max.   :6.0000   Max.   :9.0000   Max.   :11.0000  
##                                                     
##                             links      
##  @RamblersMBB                  : 1139  
##  #LoyolaChicago                : 1027  
##  #SisterJean                   :  778  
##  https://twitter.com#SisterJean:  231  
##  #LoyolaChicago; #MarchMadness :  208  
##  (Other)                       :16117  
##  NA's                          :  687

The dataset records replies, retweets, and favorites for tweets posted between 2/9/11 and 4/6/2018. Most of the accounts are unverified: the mean of the verified column is about 0.06, so roughly 6% of tweets come from verified accounts. Favorites far outnumber retweets and replies, with an average of 15.8 favorites, 3.1 retweets, and 0.35 replies per tweet. Most columns hold character or categorical data (tweet text, usernames, dates, links); the rest are numeric counts and emotion scores. The sentiment is largely positive, with trust and joy among the most common emotions. Of course, Sister Jean was talked about often.
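
The figures in this description can be read off `summary()`, but they can also be computed directly. A minimal sketch on made-up rows (the values are invented; only the column names follow the summary above):

```r
# Made-up rows; column names follow the summary above.
demo <- data.frame(
  date     = c("2018-03-23", "2018-03-25", "2018-03-25", "2018-03-26"),
  verified = c(0, 0, 1, 0),
  reply    = c(0, 1, 4, 0),
  retweets = c(0, 2, 30, 0),
  favorite = c(1, 5, 120, 0),
  stringsAsFactors = FALSE
)

# Share of tweets from verified accounts (mean of a 0/1 column).
verified_share <- mean(demo$verified)

# Average engagement per tweet, by type.
engagement_means <- colMeans(demo[, c("reply", "retweets", "favorite")])

# First and last day covered, and number of tweets per day.
date_range <- range(as.Date(demo$date))
tweets_per_day <- table(demo$date)

print(verified_share)
print(engagement_means)
print(date_range)
print(tweets_per_day)
```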

1B) Use Tableau to create at least 5 plots

1C) Explain each plot and relate it to the date/time of the tweets

knitr::include_graphics("imgs/Final4Tweets.png")

Top 10 accounts by number of tweets that contain #FinalFour. The color indicates whether the account belongs to Michigan, Loyola, or Other. U of M had more accounts tweeting about the Final Four than Loyola did.

knitr::include_graphics("imgs/LoyolaMentions.png")

This graph shows that Loyola was mentioned in more tweets after losing the game on March 31 than on the day of their Final Four game. Additionally, Loyola was tweeted about the most on March 25 by far.

knitr::include_graphics("imgs/LoyolaTweeters.png")

This plot shows the accounts that tweeted about Loyola the most and how many of their tweets included “Loyola”.

knitr::include_graphics("imgs/Top10Tweeters.png")

This image shows the top 10 tweeters over the period for which tweets were pulled.

knitr::include_graphics("imgs/TweetRetweet.png")

This plot compares the number of tweets and retweets over the time period. Tweets spiked on March 26, while the largest spike in retweets came on the day of the final game.
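
The daily counts behind a plot like this could also be reproduced in R. The sketch below uses made-up rows and base R's `aggregate()`; the helper column `n` is introduced here only to count tweets.

```r
# Made-up rows; date and retweets follow the dataset's column names.
demo <- data.frame(
  date = as.Date(c("2018-03-24", "2018-03-25", "2018-03-25",
                   "2018-03-31", "2018-03-31")),
  retweets = c(1, 0, 12, 3, 40)
)

# Helper column: each row is one tweet.
demo$n <- 1

# Sum tweets and retweets per day.
daily <- aggregate(cbind(n, retweets) ~ date, data = demo, FUN = sum)

# Day with the largest retweet total.
peak_retweet_day <- daily$date[which.max(daily$retweets)]

print(daily)
print(peak_retweet_day)
```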


Task 2: Data Analysis


2A) Based on your plots and data description, give a general narrative for the image of Loyola on Twitter

Loyola has a large, positive presence on Twitter. They were talked about both before and after the game. Most negative tweets were written in defense of Sr. Jean or the players rather than against other teams, which is a great testament to Loyola’s sportsmanship. Most tweets about them carried largely positive emotions.

2B) Use descriptive statistics to back up your arguments

ggplot(data = emotions, aes(x = emotion, y = percent)) + 
  geom_bar(stat = "identity", aes(fill = emotion)) + 
  scale_fill_brewer(palette="RdYlGn") + 
  coord_flip() +
  xlab("Emotion") +
  ylab("Percentage")

About 24% of the emotions expressed were trust, and a little over 21% each were anticipation and joy. Only about 3.6% showed disgust and about 6.7% showed anger. Overall, the sentiment toward Loyola is largely positive.
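
The emotion percentages are shares of emotion-word counts. A complementary per-tweet statistic is the share of tweets whose `sentiment_bing` score is negative, neutral, or positive; the summary above reports a mean of 0.5141 and a median of 0 for that column. A minimal sketch on made-up scores:

```r
# Made-up scores standing in for tweets$sentiment_bing.
scores <- c(2, -1, 1, 0, 1, 1, 0, 3, -2, 0)

# sign() maps each score to -1, 0, or 1, which we then label.
polarity <- factor(sign(scores), levels = c(-1, 0, 1),
                   labels = c("negative", "neutral", "positive"))

# Percentage of tweets in each group.
shares <- prop.table(table(polarity)) * 100
print(round(shares, 1))
```

Reporting both views guards against a few emotion-heavy tweets dominating the word-count percentages.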

2C) Any recommendations for Loyola’s marketing team

Have more accounts tweet about Loyola and Sr. Jean, and retweet more often. While Loyola had a great opportunity, they were out-tweeted by U of M because U of M had more accounts dedicated to its team. The sentiment was very positive, so tweets from March Madness could be incorporated into promotions for upcoming athletic events and into outreach to new students. Capitalize on the good PR and use analytics to find positive comments from people unaffiliated with Loyola, since they are less likely to be biased.


Task 3: Watson Analysis


3A) Use Watson Analytics to explore the data

3B) Give at least 3 plots or discoveries using Watson. Explain your findings.

knitr::include_graphics("imgs/Watson.Activity.PNG")

This image compares the total favorites, retweets, and replies seen in March 2018. There were significantly more favorites than retweets and replies. This suggests that people are more likely to engage with tweets in less committal ways (favoriting does not show up on your feed, while replying involves typing a comment back).

knitr::include_graphics("imgs/Watson.Favorite.PNG")

This image compares the March Madness favorites by official, verified accounts and unofficial, unverified accounts over the year thus far. The verified accounts saw much more favoriting activity with the bulk of the favorites occurring in April. Unverified accounts saw much more favoriting in March.

knitr::include_graphics("imgs/Watson3.31v4.1.PNG")

This chart compares the numbers of favorites, retweets, and replies on the day of the Final Four games and the day of the final game. The final game saw much more Twitter activity than the Final Four games.