Sentiment plays an important role in human life. People express their emotions in many forms.
Since its beginnings, social media has provided a convenient platform to express joy, excitement, anger, love, hate, and so on.
According to Google Trends, the search term “sentiment analysis” has been gaining steady traction over the past 5 years. Sentiment refers to the attitude expressed by an individual regarding a certain topic. The Google Trends chart below shows search interest in sentiment analysis of Twitter tweets.
Reference : https://trends.google.com/trends/explore?date=today%205-y&q=sentiment%20analysis%20twitter
‘Sentiment Analysis’ of tweets is one approach to forecasting the stock market.
Below are a few of the sentiments that can signal a potential gain for a stock.
Below are a few of the sentiments that can signal a potential loss for a stock.
The technical details of the analysis follow.
R libraries: ‘quantmod’, ‘dplyr’, ‘TTR’, ‘rtweet’, ‘tm’, ‘wordcloud’, ‘tidytext’, ‘purrr’, ‘ggplot2’, ‘knitr’, ‘tibble’ and ‘tidyr’
To keep the report organized, all custom functions are written in a separate functions.R file, and for privacy, the Twitter authentication details are kept in a twitterAuth.R file. Both files are loaded into this RMD file using source; the setup code block is below.
setwd("D:/PersonalStuff/Rushi/Study/MS_BU_MET/CS688/TermProject/")
liveTweets <- TRUE
startDate <- '2019-04-18'
todaysGainers <- c("HELE","RGEN","F")
todaysLosers <- c("INTC","TRQ","FII")
todaysTrending <- c("TSLA","FB","MSFT")
todaysMostActive <- c("SNAP","SIRI","AMD")
customWords <- c("th","thi","thow")
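The helper files mentioned above are pulled in with `source`. The following is a minimal sketch of that step, assuming both files sit in the working directory. The `loadHelpers` function and its `file.exists` guard are illustrative additions, not part of the original report.

```r
# File names come from the report; the guard and helper are assumptions.
helperFiles <- c("functions.R", "twitterAuth.R")

loadHelpers <- function(files) {
  loaded <- character(0)
  for (f in files) {
    if (file.exists(f)) {
      source(f)                 # evaluates the file in the global environment
      loaded <- c(loaded, f)
    } else {
      message("Skipping missing helper file: ", f)
    }
  }
  invisible(loaded)             # return the files that were actually loaded
}

loadedFiles <- loadHelpers(helperFiles)
```

Checking for the file first turns a missing helper into a readable message instead of an error halfway through knitting.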
Authentication to the Twitter account. (The code and output are intentionally hidden for privacy.)
We call stockSymbolsAPI from the TTR library to get company header information.
The table below shows today’s company profiles available for our stocks.
kable(allstockData[round(runif(10, 50, 500)),], caption = "Company Profile for some random stocks with their Industry and exchange name")
| | Symbol | Name | LastSale | MarketCap | IPOyear | Sector | Industry | Exchange |
|---|---|---|---|---|---|---|---|---|
| 94 | ERH | Wells Fargo Utilities and High Income Fund | 12.9800 | $120.14M | 2004 | NA | NA | AMEX |
| 301 | AAON | AAON, Inc. | 50.2100 | $2.61B | NA | Capital Goods | Industrial Machinery/Components | NASDAQ |
| 66 | CUO | Continental Materials Corporation | 18.4931 | $31.07M | NA | Capital Goods | Building Materials | AMEX |
| 350 | ADMS | Adamas Pharmaceuticals, Inc. | 6.3200 | $173.93M | 2014 | Health Care | Major Pharmaceuticals | NASDAQ |
| 338 | ACTG | Acacia Research Corporation | 3.1800 | $157.88M | NA | Miscellaneous | Multi-Sector Companies | NASDAQ |
| 191 | NGD | New Gold Inc. | 0.8799 | $509.56M | NA | Basic Industries | Precious Metals | AMEX |
| 120 | GLU-PB | The Gabelli Global Utility and Income Trust | 52.4600 | NA | NA | NA | NA | AMEX |
| 382 | AGMH | AGM Group Holdings Inc. | 17.9400 | $595.9M | 2018 | Technology | EDP Services | NASDAQ |
| 85 | EIM | Eaton Vance Municipal Bond Fund | 12.4400 | $843.95M | 2002 | NA | NA | AMEX |
| 433 | ALOT | AstroNova, Inc. | 25.0200 | $174.94M | 1983 | Technology | Computer peripheral equipment | NASDAQ |
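The `kable` call above picks rows with `round(runif(10, 50, 500))`, which can return the same index more than once. The sketch below shows an alternative that draws distinct rows with `sample`. The `toyStocks` table is made-up data standing in for `allstockData`, and the seed is arbitrary.

```r
# Toy stand-in for allstockData; the real table comes from the TTR lookup.
set.seed(688)  # arbitrary seed so the draw is repeatable
toyStocks <- data.frame(
  Symbol = paste0("SYM", 1:500),
  LastSale = round(runif(500, 1, 200), 2),
  stringsAsFactors = FALSE
)

# sample() draws 10 distinct integer row indices from rows 50..500
rowIds <- sample(50:500, size = 10)
randomStocks <- toyStocks[rowIds, ]
nrow(randomStocks)
```

Because `sample` draws without replacement by default, the resulting table never repeats a company.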
kable(todaysGainers, caption = "Today's Gainers")
| Trade Time | Last | stock | Name | LastSale | MarketCap | IPOyear | Sector | Industry | Exchange |
|---|---|---|---|---|---|---|---|---|---|
| 2019-04-30 16:00:01 | 144.00 | HELE | Helen of Troy Limited | 144.00 | $3.69B | NA | Consumer Durables | Home Furnishings | NASDAQ |
| 2019-04-30 16:00:01 | 67.38 | RGEN | Repligen Corporation | 67.38 | $2.96B | 1986 | Health Care | Biotechnology: Biological Products (No Diagnostic Substances) | NASDAQ |
| 2019-04-30 16:00:53 | 10.45 | F | Ford Motor Company | 10.45 | $41.69B | NA | Capital Goods | Auto Manufacturing | NYSE |
kable(todaysLosers, caption = "Today's Losers")
| Trade Time | Last | stock | Name | LastSale | MarketCap | IPOyear | Sector | Industry | Exchange |
|---|---|---|---|---|---|---|---|---|---|
| 2019-04-30 16:00:01 | 51.04 | INTC | Intel Corporation | 51.04 | $228.51B | NA | Technology | Semiconductors | NASDAQ |
| 2019-04-30 16:02:03 | 1.50 | TRQ | Turquoise Hill Resources Ltd. | 1.50 | $3.02B | NA | Basic Industries | Precious Metals | NYSE |
| 2019-04-30 16:04:17 | 30.73 | FII | Federated Investors, Inc. | 30.73 | $3.1B | 1998 | Finance | Investment Managers | NYSE |
We searched for 100 tweets associated with each of the three stocks in the gainer and loser sets.
if(liveTweets == TRUE){
#Get tweets for Gainers
tweets.gainer1 = getTweetsFromCompany(todaysGainers[which(todaysGainers$stock == todaysGainers$stock[1]),])
tweets.gainer2 = getTweetsFromCompany(todaysGainers[which(todaysGainers$stock == todaysGainers$stock[2]),])
tweets.gainer3 = getTweetsFromCompany(todaysGainers[which(todaysGainers$stock == todaysGainers$stock[3]),])
#Get tweets for Losers
tweets.loser1 = getTweetsFromCompany(todaysLosers[which(todaysLosers$stock == todaysLosers$stock[1]),])
tweets.loser2 = getTweetsFromCompany(todaysLosers[which(todaysLosers$stock == todaysLosers$stock[2]),])
tweets.loser3 = getTweetsFromCompany(todaysLosers[which(todaysLosers$stock == todaysLosers$stock[3]),])
}
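The body of `getTweetsFromCompany` lives in functions.R and is not shown in this report. The sketch below is one way such a helper might look, not the author's implementation. The names `buildStockQuery` and `getTweetsSketch` are hypothetical, the query format is an assumption, and the search itself needs the rtweet package and a valid token, so it is defined here but not called.

```r
# Hypothetical helper: build a search query from one row of a stock table.
# Assumes columns `stock` (ticker) and `Name` (company name), as in the tables above.
buildStockQuery <- function(stockRow) {
  paste0("$", stockRow$stock, " OR \"", stockRow$Name, "\"")
}

# Hypothetical stand-in for getTweetsFromCompany(); not called in this sketch.
getTweetsSketch <- function(stockRow, n = 100) {
  tweets <- rtweet::search_tweets(
    q = buildStockQuery(stockRow),
    n = n,
    include_rts = FALSE,  # skip retweets to reduce duplicate text
    lang = "en"
  )
  if (nrow(tweets) > 0) {
    tweets$stock <- stockRow$stock  # tag each tweet with its ticker
  }
  tweets
}

exampleRow <- data.frame(stock = "F", Name = "Ford Motor Company",
                         stringsAsFactors = FALSE)
buildStockQuery(exampleRow)
```

Searching on both the cashtag and the company name is a design choice: short tickers such as F match too much text on their own.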
if(liveTweets == TRUE){
#Report the number of tweets collected for the top three gainers
paste("There are total of", nrow(gainerTweets) , "tweets found for today's top 3 gainers")
#Report the number of tweets collected for the top three losers
paste("There are total of", nrow(loserTweets) , "tweets found for today's top 3 losers")
}
## [1] "There are total of 300 tweets found for today's top 3 losers"
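In R, only the value of the last expression in a braces block is auto-printed, which is likely why only the losers line appears in the output above. The sketch below combines per-stock tweet tables with `rbind` and uses `print` so both counts are shown. The toy tables and the `makeToyTweets` helper are made up for illustration; the report does not show how `gainerTweets` and `loserTweets` were built.

```r
# Toy per-stock tweet tables standing in for tweets.gainer1..3 (100 rows each).
makeToyTweets <- function(ticker, n = 100) {
  data.frame(stock = ticker,
             text = paste("sample tweet", seq_len(n), "about", ticker),
             stringsAsFactors = FALSE)
}
toyGainerTweets <- rbind(makeToyTweets("HELE"), makeToyTweets("RGEN"), makeToyTweets("F"))
toyLoserTweets  <- rbind(makeToyTweets("INTC"), makeToyTweets("TRQ"), makeToyTweets("FII"))

# print() forces output for each line, even inside an if block
if (TRUE) {
  print(paste("There are total of", nrow(toyGainerTweets), "tweets found for today's top 3 gainers"))
  print(paste("There are total of", nrow(toyLoserTweets), "tweets found for today's top 3 losers"))
}
```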
if(liveTweets == TRUE){
data.corpus1 <- getCorpusFromTweets(gainerTweets$stripped_text)
data.corpus2 <- getCorpusFromTweets(loserTweets$stripped_text)
head(data.corpus1)
head(data.corpus2)
}
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 0
## Content: documents: 6
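`getCorpusFromTweets` and the `getProcessedCorpus` function used later are defined in functions.R and are not shown. The sketch below shows how such helpers are commonly written with the tm package. The function names, the choice of cleaning steps, and the sample sentences are assumptions, not the report's actual code.

```r
# Hypothetical stand-in for getCorpusFromTweets(): one document per tweet.
getCorpusSketch <- function(tweetText) {
  tm::VCorpus(tm::VectorSource(tweetText))
}

# Hypothetical stand-in for getProcessedCorpus(): common tm cleaning steps.
processCorpusSketch <- function(corpus, extraWords = character(0)) {
  corpus <- tm::tm_map(corpus, tm::content_transformer(tolower))
  corpus <- tm::tm_map(corpus, tm::removePunctuation)
  corpus <- tm::tm_map(corpus, tm::removeWords,
                       c(tm::stopwords("english"), extraWords))
  tm::tm_map(corpus, tm::stripWhitespace)
}

# Run the example only when tm is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  toyCorpus <- getCorpusSketch(c("Ford shares are UP today!",
                                 "Intel is falling, sell now."))
  cleanCorpus <- processCorpusSketch(toyCorpus)
  print(length(cleanCorpus))             # number of documents
  print(as.character(cleanCorpus[[1]]))  # cleaned text of the first document
}
```

The `extraWords` argument is where a list such as `customWords` could be passed to drop leftover fragments.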
if(liveTweets == TRUE){
save(data.corpus1, file=paste0("data/","stock_gainer_corpus_", Sys.Date(), ".RData"))
save(data.corpus2, file=paste0("data/","stock_loser_corpus_", Sys.Date(), ".RData"))
}
if(liveTweets == FALSE){
load(paste0("data/","stock_gainer_corpus_", Sys.Date(), ".RData"))
load(paste0("data/","stock_loser_corpus_", Sys.Date(), ".RData"))
}
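The `load` calls above look for a file stamped with today's date, so they fail when no corpus was saved on the same day. The sketch below shows one possible fallback that picks the most recent saved file instead. The `latestCorpusFile` helper is hypothetical; the folder and file naming follow the `save` calls above.

```r
# Find the newest saved corpus file for a given prefix, or NULL if none exists.
latestCorpusFile <- function(prefix, dir = "data") {
  pattern <- paste0("^", prefix, "[0-9]{4}-[0-9]{2}-[0-9]{2}[.]RData$")
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) return(NULL)
  sort(files, decreasing = TRUE)[1]  # ISO dates sort chronologically as text
}

gainerFile <- latestCorpusFile("stock_gainer_corpus_")
if (!is.null(gainerFile)) {
  load(gainerFile)  # restores data.corpus1 into the current environment
} else {
  message("No cached gainer corpus found; set liveTweets <- TRUE to fetch tweets.")
}
```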
#process the corpus
gainer.processedCorpus <- getProcessedCorpus(data.corpus1)
loser.processedCorpus <- getProcessedCorpus(data.corpus2)
#The code below creates the DTM; the control parameters remove numbers and keep only words of at least 2 characters
#Optionally, other parameters such as bounds can be passed to keep only words that occur in a specified number of documents
gainer.DTM <- DocumentTermMatrix(gainer.processedCorpus, control = list(
removeNumbers = TRUE, #Remove numbers
wordLengths=c(2,Inf) # keep words of 2 or more characters
#bounds=list(global=c(20,Inf)) # only include words in DTM if they occur in 20 or more documents
))
loser.DTM <- DocumentTermMatrix(loser.processedCorpus, control = list(
removeNumbers = TRUE, #Remove numbers
wordLengths=c(2,Inf) # keep words of 2 or more characters
#bounds=list(global=c(20,Inf)) # only include words in DTM if they occur in 20 or more documents
))
loser.DTM <- as.matrix(loser.DTM) # Document term matrix
gainer.DTM <- as.matrix(gainer.DTM) # Document term matrix
gainer.wordFrequency <- colSums(gainer.DTM)
gainer.wordOrder <- order(gainer.wordFrequency, decreasing = TRUE) # Ordering the frequencies
loser.wordFrequency <- colSums(loser.DTM)
loser.wordOrder <- order(loser.wordFrequency, decreasing = TRUE) # Ordering the frequencies
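The frequency and ordering steps above can be seen on a small example. The sketch below uses a made-up three-tweet matrix, not the report's data, to show how `colSums` and `order` combine into a ranked term list.

```r
# Toy document-term matrix: rows are tweets, columns are terms (made-up counts).
toyDTM <- matrix(
  c(2, 0, 1,
    1, 1, 0,
    3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(Docs = c("1", "2", "3"),
                  Terms = c("buy", "sell", "earnings"))
)

toyFrequency <- colSums(toyDTM)                     # total count per term
toyOrder <- order(toyFrequency, decreasing = TRUE)  # column indices, most frequent first
topTerms <- toyFrequency[toyOrder]                  # named vector sorted by frequency
head(topTerms, 2)
```

The ordered, named vector is a convenient input for a bar plot or a word cloud of the most frequent terms.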
The plot below compares the sentiment scores of the gainers with those of the losers.
Sentiment Score Summary
| stock | Count | Mean | SD | max | min |
|---|---|---|---|---|---|
| Gainer | 324 | 0.4104938 | 0.7090837 | 3 | -3 |
| Loser | 300 | 0.0933333 | 1.0172207 | 5 | -4 |
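The report does not show how the scores in the summary table were computed. The sketch below illustrates one common lexicon-based approach and produces the same summary columns. The lexicon, the tweets, and the `scoreTweet` helper are all made up for illustration.

```r
# Tiny made-up lexicon; a real analysis would use a published one such as
# those available through tidytext::get_sentiments().
toyLexicon <- c(gain = 2, strong = 1, buy = 1, loss = -2, weak = -1, sell = -1)

# Score one tweet: sum the lexicon values of the words it contains.
scoreTweet <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(lexicon[words[words %in% names(lexicon)]])
}

toyTweets <- data.frame(
  stock = c("Gainer", "Gainer", "Loser", "Loser"),
  text = c("Strong gain, buy now", "Nothing new today",
           "Weak quarter, big loss", "Sell before the loss"),
  stringsAsFactors = FALSE
)
toyTweets$score <- vapply(toyTweets$text, scoreTweet, numeric(1),
                          lexicon = toyLexicon, USE.NAMES = FALSE)

# One row per group with the same columns as the summary table above.
scoreSummary <- do.call(rbind, lapply(
  split(toyTweets$score, toyTweets$stock),
  function(s) data.frame(Count = length(s), Mean = mean(s), SD = sd(s),
                         max = max(s), min = min(s))
))
scoreSummary
```

A tweet with no lexicon words scores 0, so neutral tweets pull the group mean toward zero rather than being dropped.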
Sentiment Comparison Cloud
## Joining, by = "word"
## Joining, by = "word"
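A comparison cloud needs a matrix with one row per term and one column per group. The sketch below builds such a matrix in base R and passes it to `comparison.cloud` from the wordcloud package when that package is installed. The word counts and colors are made up, and the report's own plotting code may differ.

```r
# Made-up word counts per group; rows are terms, columns are the two groups.
gainerCounts <- c(buy = 12, strong = 8, growth = 5, risk = 1)
loserCounts  <- c(sell = 10, weak = 7, risk = 6, buy = 2)

allTerms <- union(names(gainerCounts), names(loserCounts))
termMatrix <- matrix(0, nrow = length(allTerms), ncol = 2,
                     dimnames = list(allTerms, c("Gainer", "Loser")))
termMatrix[names(gainerCounts), "Gainer"] <- gainerCounts
termMatrix[names(loserCounts), "Loser"]  <- loserCounts

# Draw the cloud only when the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::comparison.cloud(termMatrix, max.words = 50,
                              colors = c("darkgreen", "red"))
}
```

Terms shared by both groups, such as "risk" here, are drawn on the side where their relative frequency is higher.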