This sentiment analysis was conducted using the Twitter API and a Python mining script to collect tweets tagged #USWNT (U.S. Women's National Team - Soccer) from 01/01/2015 to 07/15/2015. The analysis found that public sentiment was fairly positive across most of the 7,975 observations and slightly negative in 500 observations or fewer.
This analysis was based on the Sentiment Analysis in R tutorial by Paeng Angnakoon at the University of North Texas Information Research and Analysis (IRA) Lab (https://iralab.unt.edu/).
library(plyr)
## Warning: package 'plyr' was built under R version 3.1.3
library(stringr)
## Warning: package 'stringr' was built under R version 3.1.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
setwd("~/Documents/TWITTERPROJ")
Twitter mining was executed using the Tweets_02.py script to collect #USWNT tweets from 2015-01-01 to 2015-07-15.
import tweepy
import csv

# Twitter API credentials set up and authorization
consumer_key = "u…………….9"
consumer_secret = "9…………………………R"
access_token = "3……………………………………r"
access_token_secret = "D…………………………………1"
auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.secure = True
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Open/create a file to append data and attach a csv writer
csvFile = open('result.csv', 'a')
csvWriter = csv.writer(csvFile)

# Tweet mining: write one row per tweet (created_at, UTF-8 encoded text)
for tweet in tweepy.Cursor(api.search, q="#USWNT", count=100,
                           since="2015-01-01", until="2015-07-15",
                           include_entities=True, lang="en").items():
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
    print(tweet.created_at, tweet.text.encode("utf-8"))
csvFile.close()
A copy of result.csv named result_tweets.csv was created to use as the working file.
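The copy can also be made from within R; the snippet below is a minimal sketch (assuming result.csv sits in the working directory set above):
# Minimal sketch (assumption: result.csv is in the working directory set with setwd() above):
# keep the raw mining output intact and work on a copy
file.copy("result.csv", "result_tweets.csv", overwrite = TRUE)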
TweetsFile <- "result_tweets.csv"
twf <- read.table(TweetsFile, header= FALSE, sep=",", as.is=TRUE)
head(twf,5)
## V1
## 1 7/14/15 23:59
## 2 7/14/15 23:59
## 3 7/14/15 23:58
## 4 7/14/15 23:57
## 5 7/14/15 23:57
## V2
## 1 b"RT @SInow: Here's where you can buy any of the #USWNT SI covers, or even the whole set! http://t.co/gkVaaFatxA http://t.co/GA6rHUEVxH"
## 2 b"RT @SInow: Here's where you can buy any of the #USWNT SI covers, or even the whole set! http://t.co/gkVaaFatxA http://t.co/GA6rHUEVxH"
## 3 b"RT @SInow: Here's where you can buy any of the #USWNT SI covers, or even the whole set! http://t.co/gkVaaFatxA http://t.co/GA6rHUEVxH"
## 4 bRT @GeorgiaSoccer: Couple of hometown heroes on their own SI covers! @kohara19 @moeebrian #USWNT http://t.co/lymrIMYn99
## 5 bRT @hopesolo: Generations of World Cup gold! #USWNT http://t.co/JhOvDWY4B3
colnames(twf) <- c("DateTime", "Tweet")
write.csv(twf, file="twf.csv")
Opinion Lexicon: Positive and Negative
These files contain a list of POSITIVE and NEGATIVE opinion words (or sentiment words).
These files and papers can all be downloaded from http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
If you use this list, please cite one of the following two papers:
Minqing Hu and Bing Liu. “Mining and Summarizing Customer Reviews.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing Opinions on the Web.” Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
Notes: 1. The appearance of an opinion word in a sentence does not necessarily mean that the sentence expresses a positive or negative opinion. See the following paper:
Bing Liu. “Sentiment Analysis and Subjectivity.” A chapter in Handbook of Natural Language Processing, Second Edition (editors: N. Indurkhya and F. J. Damerau), 2010.
# Load the Hu & Liu opinion lexicons; lines starting with ';' are header comments
hu.liu.pos = scan('~/Documents/TWITTERPROJ/positive-words_0.txt', what='character', comment.char=';')
hu.liu.neg = scan('~/Documents/TWITTERPROJ/negative-words_0.txt', what='character', comment.char=';')
# Add domain-specific terms to each list
pos.words = c(hu.liu.pos, 'sweet', 'goal', 'champions', 'champs')
neg.words = c(hu.liu.neg, 'wtf', 'wait', 'waiting', 'epicfail', 'mechanical')
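As a quick, illustrative sanity check (not part of the original script), the augmented lexicons can be inspected to confirm the custom terms were appended:
# Illustrative check (not in the original analysis): are the custom terms present?
c("goal" %in% pos.words, "epicfail" %in% neg.words)
# Lexicon sizes: the Hu & Liu lists plus the custom additions above
length(pos.words)
length(neg.words)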
library (plyr)
library (stringr)
This function performs the following for each tweet:
1. Tweet cleaning
2. Converting the tweet to lower case
3. Splitting the tweet into words to create a list
4. Matching the word list against the positive and negative opinion dictionaries
5. Sentiment score assignment
6. Calculating the total sentiment score
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
require(plyr)
require(stringr)
# we got a vector of sentences. plyr will handle a list
# or a vector as an "l" for us
# we want a simple array ("a") of scores back, so we use
# "l" + "a" + "ply" = "laply":
scores = laply(sentences, function(sentence, pos.words, neg.words) {
# clean up sentences with R's regex-driven global substitute, gsub():
# Eliminate punctuation characters
sentence = gsub('[[:punct:]]', '', sentence)
# Eliminate control characters
sentence = gsub('[[:cntrl:]]', '', sentence)
# Eliminate digits
sentence = gsub('\\d+', '', sentence)
# and convert to lower case:
sentence = tolower(sentence)
# split into words. str_split is in the stringr package
word.list = str_split(sentence, '\\s+')
# sometimes a list() is one level of hierarchy too much
words = unlist(word.list)
# compare our words to the dictionaries of positive & negative terms
pos.matches = match(words, pos.words)
neg.matches = match(words, neg.words)
# match() returns the position of the matched term or NA
# we just want a TRUE/FALSE:
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
# and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
score = sum(pos.matches) - sum(neg.matches)
return(score)
}, pos.words, neg.words, .progress=.progress )
scores.df = data.frame(score=scores, text=sentences)
return(scores.df)
}
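Before scoring the full data set, the function can be exercised on a couple of toy sentences; this is an illustrative sketch (the sentences are made up, not drawn from the collected tweets):
# Illustrative example (toy sentences, not from the collected tweets)
sample.text = c("What a sweet goal by the champs",
                "wtf still waiting for the epicfail replay")
sample.scores = score.sentiment(sample.text, pos.words, neg.words)
sample.scores$score
# Expected: a positive score for the first sentence (sweet, goal, champs)
# and a negative score for the second (wtf, waiting, epicfail)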
Read the tweet file into a data frame and coerce the Tweet column to a factor
twfdata <- read.csv("~/Documents/TWITTERPROJ/twf.csv")
twfdata$Tweet<-as.factor(twfdata$Tweet)
Score the tweets using the score.sentiment function
USWNT.scores = score.sentiment(twfdata$Tweet, pos.words, neg.words, .progress='text')
plot = ggplot(USWNT.scores, aes(x=score))
plot = plot + geom_histogram(alpha=.20, binwidth=.75, colour="black")
plot = plot + ylab("# of Tweets (01/01/2015-07/15/15)")
plot = plot + xlab("Sentiment Score")
plot = plot + ggtitle("USWNT Twitter Sentiment Analysis")
plot
write.csv(USWNT.scores, file="USWNT_scores.csv")
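A simple tally of the score signs (an illustrative sketch, not part of the original output) supports the summary below:
# Illustrative sketch: how many tweets scored positive, neutral, or negative
table(sign(USWNT.scores$score))   # -1 = negative, 0 = neutral, 1 = positive
sum(USWNT.scores$score > 0)       # positive-scoring tweets
sum(USWNT.scores$score < 0)       # negative-scoring tweets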
The sentiment analysis, conducted using the Twitter API and a Python mining script to collect #USWNT (U.S. Women's National Team - Soccer) tweets from 01/01/2015 to 07/15/2015, found that public sentiment was fairly positive across most of the 7,975 observations and slightly negative in 500 observations or fewer. This indicates a strong positive sentiment toward the USWNT in the lead-up to and during the FIFA Women's World Cup.