Wordcloud: Based on 1.7 lakh tweets

Demonetization Sentiment: Post Press Release of RBI's Annual Accounts, 30th August 2017

The Government of India withdrew the legal tender status of Rs. 1000 and Rs. 500 denomination currency notes on 8th November 2016. On 30th August 2017, the Reserve Bank of India (RBI) reported in its Annual Accounts (http://pib.nic.in/newsite/PrintRelease.aspx?relid=170378) that Specified Bank Notes (SBNs) of an estimated value of Rs. 15.28 lakh crore had been deposited back. Demonetization was envisaged with the broad objectives of: (i) flushing out black money, (ii) eliminating Fake Indian Currency Notes (FICN), (iii) converting the non-formal economy into a formal economy to expand the tax base and employment, and (iv) giving a big boost to the digitalization of payments to make India a less-cash economy. The cost of this experiment was a loss of jobs (undocumented), largely in the informal labour sector, in addition to more than 100 deaths. The following exercise, however, presents an analytic account of the behavioural response to demonetization drawn from 3 lakh tweets. The first segment focuses on the context, methodology and results of the sentiment analysis; the latter section includes the complete R code used for the analysis.

Twitter Feed: The Data

To avoid any localised bias, 3 lakh tweets were extracted on four separate dates. The 'twitteR' package allows the user to set up a connection from the R terminal to extract tweets, along with the (anonymised) Twitter handles that posted them. The tweet text is first sterilised: cleaned of non-English text, emojis and repeated (retweeted) tweets. The text analysis is then conducted on the remaining 1.7 lakh tweets. In one go, one can extract tweets only from a limited window of 8-10 days; I have therefore collected tweets on four different days and stitched them together to arrive at a clean before-and-after sample, as sketched below. The 1.7 lakh tweets represent more than 55 thousand individual users.
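
The exact collection script is not reproduced in this post; a minimal sketch of the stitching step, assuming each extraction run was saved to an .RData file holding one tweet data frame (the file names here are hypothetical), could look like:

# Hedged sketch: combine tweet batches pulled on different dates
files <- c("dem_batch1.RData", "dem_batch2.RData",
           "dem_batch3.RData", "dem_batch4.RData")
batches <- lapply(files, function(f) {
  e <- new.env()
  load(f, envir = e)                 # each file holds one tweet data frame
  get(ls(e)[1], envir = e)
})
tweets <- do.call(rbind, batches)
tweets <- tweets[!duplicated(tweets$text), ]   # drop repeated (retweeted) text
length(unique(tweets$screenName))              # count of individual users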

Sentiment Analysis: The Methodology

All the tweets are first text-sterilised by removing emoticons, stopwords, unnecessary special characters (except '!') and extra spaces. Each tweet is then scored on the basis of the positive and negative words it contains.
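
The file SentiScr_basic.R (sourced in the code below) is not reproduced in this post; the name and arguments of scr.snt are taken from the calls that follow, while the body below is my assumption of a minimal lexicon-based scorer in the same spirit:

library(stringr)

# Hedged sketch: score = positive lexicon hits minus negative lexicon hits
scr.snt <- function(txt, pos.words, neg.words) {
  scores <- vapply(txt, function(s) {
    words <- unlist(str_split(tolower(s), "\\s+"))   # tokenise on whitespace
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = txt, stringsAsFactors = FALSE)
}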

source("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/SentiScr_basic.R")
# Read Lexicon of all positive and negative words
# ---------------
pos.words <- read.csv("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/Positive Words.csv")
neg.words <- read.csv("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/Negative Words.csv")
# Scan the words into R
# ---------------
pos.words <- scan("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/Positive Words.csv",
                  what = 'character')
neg.words <- scan("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/Negative Words.csv",
                  what = 'character')
# Add +/- words Manually to the list
# ---------------
pos.words <- c(pos.words, 'new', 'nice', 'good', 'horizon')
neg.words <- c(neg.words, 'wtf', 'behind', 'feels', 'ugly', 'back', 'worse',
               'shitty', 'bad', 'no', 'freaking', 'sucks', 'horrible')

# Practical Working of Sentiment Function: Sampled from the tweets 
# ---------------
# Instance of a Negative Tweet
tx1 <- "Frightening statistics Modi made disaster Demonetisation taking its toll on Indias economy"
scr.snt(tx1,pos.words,neg.words)
##   score
## 1    -3
##                                                                                         text
## 1 Frightening statistics Modi made disaster Demonetisation taking its toll on Indias economy
# Instance of a Positive Tweet
tx2 <- "Due to Demonetisation 56 lakh new tax payers have been added indicates better compliance and better tax revenues"
scr.snt(tx2,pos.words,neg.words)
##   score
## 1     3
##                                                                                                               text
## 1 Due to Demonetisation 56 lakh new tax payers have been added indicates better compliance and better tax revenues
# Limitation: Sarcasm 
# tx3 <- "I hav sufferd 4 Lac loss in shares due to Demonetisation But Still IAmWithModi amp WILL deposit100 Rs notes in bnk as it"
# scr.snt(tx3, pos.words, neg.words)   # the lexicon cannot detect the sarcasm here

Sentiment Wave: Feedback over Time

Distribution of positive, negative and neutral feedback from users over time. Notice the overall spike and the movement of the distribution across specific days and time intervals. To explore this further, see the next chart; a sketch of how such a chart can be built follows.
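
The chart itself is drawn from the scored tweets; below is a hedged sketch of the plot, assuming the score data frame res (with the sn labels computed in the code sections later in this post) is joined back to the tweet timestamps from dm:

library(ggplot2)

# Hedged sketch: hourly counts of positive/negative/neutral tweets
res$created <- dm$created                       # timestamps from twitteR
res$hour <- cut(res$created, breaks = "hour")   # bucket into hourly bins
wave <- as.data.frame(table(res$hour, res$sn))
names(wave) <- c("hour", "sentiment", "tweets")
wave$hour <- as.POSIXct(as.character(wave$hour))

ggplot(wave, aes(hour, tweets, colour = sentiment, group = sentiment)) +
  geom_line() +
  geom_point() +
  labs(x = "Time", y = "Number of tweets", colour = "Sentiment")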

Event Study for a specified period: 30th August to 2nd September, 2017


Spectrum of Sentiment Response on Demonetization

The tweets have been scored over a spectrum of positive, negative and neutral sentiments, and further sub-classified by the fervor/intensity of the emotion. The following chart displays the distribution; a sketch of the banding logic follows.
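
The exact band boundaries behind the chart are not shown in the post; a sketch of the sub-classification, with assumed cut-offs, is:

library(ggplot2)

# Hedged sketch: bucket scores into intensity bands (cut-offs are assumed)
res$band <- cut(res$score,
                breaks = c(-Inf, -2, -0.5, 0.5, 2, Inf),
                labels = c("Very Negative", "Negative", "Neutral",
                           "Positive", "Very Positive"))

ggplot(res, aes(band, fill = band)) +
  geom_bar() +
  labs(x = "Sentiment band", y = "Number of tweets") +
  theme(legend.position = "none")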


Setting up the Twitter Connection: OAuth Authentication

R can extract tweets after setting up a connection to Twitter. The setup_twitter_oauth function wraps the OAuth authentication handshake functions from the httr package for a twitteR session. Details of the process are at: https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/setup_twitter_oauth. The code here uses authentication credentials from a pre-established connection (to save processing time).

library(twitteR)

pt <- "/Users/parthkhare/Desktop/TwitterSurvey"
# loads consumerKey, consumerSecret, access_token, access_token_secret
load(paste0(pt, "/twit_cred.RData"))
setup_twitter_oauth(consumerKey, consumerSecret, access_token, access_token_secret)

# Extract Tweets from Twitter
dem <- searchTwitter("#demonetisation", n = 85000)
# Convert the Tweet List to Data Frame
dm <- do.call("rbind", lapply(dem, as.data.frame))
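
twitteR also ships a helper that does this list-to-data-frame conversion in one call; the do.call/rbind pattern above gives the same result:

dm <- twListToDF(dem)   # equivalent to the do.call/rbind line above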

Text Cleaning and Corpus Generation: R Codes

load(file = "/Users/parthkhare/Desktop/TwitterSurvey/Demonetisation/Event/Dem85_2sep.RData")
# Convert tweets to data frame
# ---------------
dm <- dm1
dm$tx <- dm$text

# Text Transformations
# ---------------
# Remove Re-Tweets
dm$tx <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", dm$tx)
# Remove stray "rt" tokens left over from retweets
# (word boundaries avoid clipping words like "party")
dm$tx <- gsub("\\brt\\b", "", dm$tx)
# Replace @UserName
dm$tx <- gsub("@\\w+", "", dm$tx)
# Remove punctuation
dm$tx <- gsub("[[:punct:]]", "", dm$tx)
# Remove links
dm$tx <- gsub("http\\w+", "", dm$tx)
# Remove Special Characters
dm$tx <- gsub("[^a-zA-Z0-9 ]","",dm$tx)
# Remove blank spaces at the beginning
dm$tx <- gsub("^ ", "", dm$tx)
# Remove blank spaces at the end
dm$tx <- gsub(" $", "", dm$tx)
# Collapse runs of spaces/tabs into a single space
dm$tx <- gsub("[ \t]{2,}", " ", dm$tx)

# Corpus Generation (requires the 'tm' package)
# ---------------
library(tm)
crp <- Corpus(VectorSource(dm$tx))

# Sterilisation
# ---------------
# strip white space
crp <- tm_map(crp, stripWhitespace)
# convert all characters to lower case
crp <- tm_map(crp, content_transformer(tolower))

# Add custom 'stop words' to the standard English ones
# ---------------
twev.stp <- c(stopwords('english'), "amp", "can","say","came","http","per")
# remove stop words
crp <- tm_map(crp, removeWords, twev.stp)
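
The wordcloud at the top of this post can be generated from this cleaned corpus; a minimal sketch (the package and parameter choices here are my assumptions, not the original code) is:

library(wordcloud)
library(RColorBrewer)

# Hedged sketch: term frequencies from the cleaned corpus
tdm <- TermDocumentMatrix(crp)
freq <- sort(slam::row_sums(tdm), decreasing = TRUE)   # avoids a dense matrix
wordcloud(names(freq), freq, max.words = 200,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))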

# Sample run of the sentiment function on the parsed tweets
# ---------------
load(file = "/Users/parthkhare/Desktop/TwitterSurvey/Demonetisation/Event/Dem85.prsd_2sep.RData")
#load(file="/Users/parthkhare/Desktop/TwitterSurvey/Demonetisation/Event/SntMan.Dem85_2sep.RData")

# Read the lexicon of positive and negative words as character vectors
# ---------------
pos.words <- scan("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/pve.csv",
                  what = 'character')
neg.words <- scan("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/nve.csv",
                  what = 'character')

# Add +/- words Manually to the list
# ---------------
pos.words <- c(pos.words, 'new', 'nice', 'good', 'horizon')
neg.words <- c(neg.words, 'wtf', 'behind', 'feels', 'ugly', 'back', 'worse',
               'shitty', 'bad', 'no', 'freaking', 'sucks', 'horrible')

# Source function for Sentiment Extraction
source("/Users/parthkhare/Desktop/TwitterSurvey/Lexicon/SentiScr_basic.R")

# Apply the sentiment function to the tweets and label the scores
# ---------------
system.time(res <- scr.snt(dm$tx,pos.words,neg.words))
res$sn <- ifelse(res$score > 0, "Positive","Negative")
res$sn <- ifelse(res$score == 0, "Neutral",res$sn)
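
# A quick check of the resulting split (illustrative):
round(100 * prop.table(table(res$sn)), 1)   # % of Positive/Negative/Neutral tweets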


# With due acknowledgement to Stefan Feuerriegel and Nicolas Proellochs;
# 25%+ more colloquial vocabulary has been added to the lexicon.