SENTIMENT ANALYSIS OF US WOMEN'S NATIONAL TEAM TWEETS

DESCRIPTION

This sentiment analysis used a Python mining script against the Twitter API to collect tweets tagged #USWNT (US Women's National Team, soccer) from 01/01/2015 to 07/15/2015. The analysis found that public sentiment was fairly positive in more than 7,975 observations and slightly negative in 500 observations or fewer.

This analysis is based on the Sentiment Analysis in R by Paeng Angnakoon at the University of North Texas Information Research and Analysis (IRA) Lab (https://iralab.unt.edu/).

TWITTER SENTIMENT ANALYSIS - DATA PROCESSING

Environment set up

library(plyr)
## Warning: package 'plyr' was built under R version 3.1.3
library(stringr)
## Warning: package 'stringr' was built under R version 3.1.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
setwd("~/Documents/TWITTERPROJ")

MINING TWITTER API

Tweets were mined with the Tweets_02.py script, which searched for #USWNT tweets posted between 2015-01-01 and 2015-07-15.

PYTHON SCRIPT FOR MINING

import csv
import tweepy

# Twitter API credentials set up and authorization
consumer_key = "u…………….9"
consumer_secret = "9…………………………R"
access_token = "3……………………………………r"
access_token_secret = "D…………………………………1"

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.secure = True
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# Open/create a file to append data
csvFile = open('result.csv', 'a')

# Use the csv writer
csvWriter = csv.writer(csvFile)

# Tweet mining
for tweet in tweepy.Cursor(api.search, q="#USWNT", count=100, since="2015-01-01", until="2015-07-15", include_entities=True, lang="en").items():
    # Write a row to the csv file; the tweet text is encoded as UTF-8
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
    print(tweet.created_at, tweet.text.encode("utf-8"))

csvFile.close()

LOADING DATA

A copy of result.csv, named result_tweets.csv, was created to serve as the working file.

TweetsFile <- "result_tweets.csv"
twf <- read.table(TweetsFile, header= FALSE, sep=",", as.is=TRUE)
head(twf,5)
##              V1
## 1 7/14/15 23:59
## 2 7/14/15 23:59
## 3 7/14/15 23:58
## 4 7/14/15 23:57
## 5 7/14/15 23:57
##                                                                                                                                         V2
## 1 b"RT @SInow: Here's where you can buy any of the #USWNT SI covers, or even the whole set! http://t.co/gkVaaFatxA http://t.co/GA6rHUEVxH"
## 2 b"RT @SInow: Here's where you can buy any of the #USWNT SI covers, or even the whole set! http://t.co/gkVaaFatxA http://t.co/GA6rHUEVxH"
## 3 b"RT @SInow: Here's where you can buy any of the #USWNT SI covers, or even the whole set! http://t.co/gkVaaFatxA http://t.co/GA6rHUEVxH"
## 4                 bRT @GeorgiaSoccer: Couple of hometown heroes on their own SI covers! @kohara19 @moeebrian #USWNT http://t.co/lymrIMYn99
## 5                                                              bRT @hopesolo: Generations of World Cup gold! #USWNT http://t.co/JhOvDWY4B3
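
The b prefix on the tweets in column V2 is consistent with the mining script having been run under Python 3, where csv.writer stores the printed form of a bytes object rather than the text itself. The source does not state the Python version, so treat that as an assumption. The sketch below reproduces the effect on an in-memory file and shows one way to recover the text. The helper names write_row and decode_bytes_field are hypothetical and are not part of the original script.

```python
import ast
import csv
import io


def write_row(text_field):
    """Write one CSV row the way the mining script does and return the raw line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(["7/14/15 23:59", text_field])
    return buffer.getvalue().strip()


def decode_bytes_field(field):
    """Turn a stringified bytes literal such as "b'abc'" back into plain text."""
    if field.startswith(("b'", 'b"')):
        return ast.literal_eval(field).decode("utf-8")
    return field


tweet_text = "Generations of World Cup gold! #USWNT"

# Encoded text is stored as the printed form of a bytes object (b'...')
encoded_line = write_row(tweet_text.encode("utf-8"))
# Plain text is stored unchanged
plain_line = write_row(tweet_text)

print(encoded_line)
print(plain_line)
print(decode_bytes_field("b'Generations of World Cup gold! #USWNT'"))
```

Writing tweet.text without calling encode would avoid the prefix at collection time. Decoding afterwards, as above, is an option when the file has already been collected.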

DATA PREPROCESSING

colnames(twf) <- c("DateTime", "Tweet")
write.csv(twf, file="twf.csv")

CREATING SENTIMENT WORD DICTIONARIES

Load the sentiment word lists from the Hu and Liu opinion lexicon

Opinion Lexicon: Positive and Negative

These files contain a list of POSITIVE and NEGATIVE opinion words (or sentiment words).

These files and papers can all be downloaded from http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

If you use this list, please cite one of the following two papers:

Minqing Hu and Bing Liu. “Mining and Summarizing Customer Reviews.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.

Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing Opinions on the Web.” Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

Notes:

  1. The appearance of an opinion word in a sentence does not necessarily mean that the sentence expresses a positive or negative opinion. See the paper below:

     Bing Liu. “Sentiment Analysis and Subjectivity.” A chapter in Handbook of Natural Language Processing, Second Edition, (editors: N. Indurkhya and F. J. Damerau), 2010.

  2. You will notice many misspelled words in the list. They are not mistakes. They are included as these misspelled words appear frequently in social media content.

hu.liu.pos = scan('~/Documents/TWITTERPROJ/positive-words_0.txt', what='character', comment.char=';')
hu.liu.neg = scan('~/Documents/TWITTERPROJ/negative-words_0.txt', what='character', comment.char=';')

Add words relevant to the US women's soccer context to the lists

pos.words = c(hu.liu.pos, 'sweet', 'goal', 'champions', 'champs')
neg.words = c(hu.liu.neg, 'wtf', 'wait','waiting', 'epicfail', 'mechanical')
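
As an illustration of what the scan() calls with comment.char=';' do, the following Python sketch reads a word list while discarding everything after a semicolon, then appends the soccer-specific terms. The sample lines are made up for the example and are not the contents of the real lexicon file. The function name load_lexicon is hypothetical.

```python
def load_lexicon(lines, comment_char=";"):
    """Return the whitespace-separated words in lines, ignoring comments."""
    words = []
    for line in lines:
        # Drop everything from the comment character to the end of the line
        content = line.split(comment_char, 1)[0]
        words.extend(content.split())
    return words


# Illustrative stand-in for a lexicon file: comment header, blank line, words
sample_file = [
    "; Opinion Lexicon: Positive",
    ";",
    "",
    "amazing",
    "awesome",
    "win",
]

# Extend the base list with the context-specific terms used in this analysis
pos_words = set(load_lexicon(sample_file)) | {"sweet", "goal", "champions", "champs"}
print(sorted(pos_words))
```

Storing the words in a set makes the later membership test fast, and it also removes any duplicates introduced by the added terms.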

SCORING TWEETS

Loading libraries

library (plyr)
library (stringr)

Sentiment Score function

For each tweet, this function performs the following steps:

  1. Cleans the tweet
  2. Converts the tweet to lower case
  3. Splits the tweet into a list of words
  4. Matches the word list against the positive and negative opinion dictionaries
  5. Assigns a sentiment score
  6. Calculates the total sentiment score

score.sentiment = function(sentences, pos.words, neg.words, .progress='none')  
{  
  require(plyr)  
  require(stringr)       
  
  # we got a vector of sentences. plyr will handle a list  
  # or a vector as an "l" for us  
  # we want a simple array ("a") of scores back, so we use   
  # "l" + "a" + "ply" = "laply":  
  
  scores = laply(sentences, function(sentence, pos.words, neg.words) {  
    
    # clean up sentences with R's regex-driven global substitute, gsub():  
    
    # Eliminate punctuation characters
    sentence = gsub('[[:punct:]]', '', sentence)  
    
    # Eliminate control characters
    sentence = gsub('[[:cntrl:]]', '', sentence)  
    
    # Eliminate digits
    sentence = gsub('\\d+', '', sentence)  
    
    # and convert to lower case:  
    sentence = tolower(sentence)  
    
    # split into words. str_split is in the stringr package  
    word.list = str_split(sentence, '\\s+')  
    
    # sometimes a list() is one level of hierarchy too much  
    words = unlist(word.list)  
    
    # compare our words to the dictionaries of positive & negative terms  
    
    pos.matches = match(words, pos.words)  
    neg.matches = match(words, neg.words)  
    
    # match() returns the position of the matched term or NA  
    # we just want a TRUE/FALSE:  
    
    pos.matches = !is.na(pos.matches)  
    
    neg.matches = !is.na(neg.matches)  
    
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():  
    
    score = sum(pos.matches) - sum(neg.matches)  
    
    return(score)  
    
  }, pos.words, neg.words, .progress=.progress )  
  
  scores.df = data.frame(score=scores, text=sentences)  
  return(scores.df)  
} 
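
To make the scoring rule concrete, here is a minimal Python sketch of the same logic applied to three made-up tweets: strip punctuation, control characters and digits, lower-case the text, split on whitespace, then subtract the number of negative matches from the number of positive matches. The word sets are tiny illustrative samples rather than the full lexicon, and score_sentence is a hypothetical name. It is a re-expression for illustration, not the code used in the analysis.

```python
import re
import string

# Character class matching any ASCII punctuation character
PUNCTUATION = "[" + re.escape(string.punctuation) + "]"


def score_sentence(sentence, pos_words, neg_words):
    """Return (# positive words) - (# negative words) for one sentence."""
    sentence = re.sub(PUNCTUATION, "", sentence)          # remove punctuation
    sentence = re.sub(r"[\x00-\x1f\x7f]", "", sentence)   # remove control characters
    sentence = re.sub(r"\d+", "", sentence)               # remove digits
    words = sentence.lower().split()                      # lower-case and split
    positives = sum(word in pos_words for word in words)
    negatives = sum(word in neg_words for word in words)
    return positives - negatives


# Small illustrative dictionaries (assumed for the example)
pos_words = {"great", "goal", "champs"}
neg_words = {"wtf", "waiting"}

tweets = [
    "Great goal by the champs!",
    "WTF, still waiting...",
    "Kickoff at 7pm",
]
scores = [score_sentence(tweet, pos_words, neg_words) for tweet in tweets]
print(scores)  # [3, -2, 0]
```

A score of 0 can mean either that no opinion words were found or that positive and negative matches cancelled out, so the score alone does not distinguish the two cases.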

SCORING THE TWEETS

Read the tweet file into a data frame and coerce the Tweet column to a factor

twfdata <- read.csv("~/Documents/TWITTERPROJ/twf.csv")
twfdata$Tweet<-as.factor(twfdata$Tweet)

Score the tweets using the score.sentiment function

USWNT.scores = score.sentiment(twfdata$Tweet, pos.words, neg.words, .progress='text')

VISUALIZING SCORES FOR #USWNT

plot = ggplot(USWNT.scores, aes(x=score))
plot = plot + geom_histogram(alpha=.20, binwidth=.75, colour="black")
plot = plot + ylab("# of Tweets (01/01/2015-07/15/15)")
plot = plot + xlab("Sentiment Score")
plot = plot + ggtitle("USWNT Twitter Sentiment Analysis")
plot

CREATING A SCORE FILE

write.csv(USWNT.scores, file="USWNT_scores.csv")
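
Counts of positive and negative observations can be read off a score file like this one by tallying the sign of each score. The Python sketch below does this on a small in-memory sample laid out the way write.csv usually writes a data frame: an unnamed row-name column followed by score and text. The sample rows are invented for illustration and tally_scores is a hypothetical helper. It does not reproduce the figures reported in this analysis.

```python
import csv
import io
from collections import Counter


def tally_scores(csv_text):
    """Count positive, negative and neutral rows by the sign of the score column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = int(row["score"])
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


# Invented sample in the assumed layout of USWNT_scores.csv
sample = (
    '"","score","text"\n'
    '"1",2,"great goal"\n'
    '"2",-1,"still waiting"\n'
    '"3",0,"kickoff soon"\n'
    '"4",1,"champs"\n'
)

counts = tally_scores(sample)
print(counts["positive"], counts["negative"], counts["neutral"])  # 2 1 1
```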

RESULTS

The sentiment analysis of #USWNT (US Women's National Team, soccer) tweets collected through the Twitter API with a Python mining script, covering 01/01/2015 to 07/15/2015, found that public sentiment was fairly positive in more than 7,975 observations and slightly negative in 500 observations or fewer. This indicates a strong positive sentiment towards the USWNT in the lead-up to and during the FIFA Women's World Cup.