Every piece of writing has a tone. Some writing has a positive tone, like Tony Robbins’s speeches or transcripts of Mr. Rogers. Other writing has a negative tone, such as Jude the Obscure by Thomas Hardy. Instruction manuals and cookbooks are examples of neutral writing. Sentiment analysis uses statistics to determine whether the tone of a piece of writing is positive, negative, or neutral. It commonly scores the writing on a scale where 1 indicates a positive piece of writing, 0 a negative one, and 0.5 a neutral one. For example, the following reddit comment has a score of 0.756:
Personally I love Spacey as an actor This started with the Usual Suspects I was blown American Beauty
The Negotiator and the House of cards made it even more easy to like him Kudos for netflix
This is clearly a positive comment. To illustrate the other side of the coin, here is a reddit comment with a score of 0.142:
I m also upset that House of Cards is cancelled but the only person I m mad at about it is Kevin Spacey
I began by downloading all 4 billion reddit comments from 2005 to 2017 in JSON format and importing them into a MongoDB database. For the purposes of this project, I narrowed the time frame to 2015 through 2017 and searched each comment for references to “Spacey” or “Kevin Spacey”. I ran each matching comment through a generalized linear model (using Apache Spark on top of HDFS) that assigned it a score from 0 to 1. I then took the average comment score for each day and plotted it over 2015-2017 below.
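The steps above (filter by search term and date range, score each comment, average per day) can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark and MongoDB pipeline used for the project. It assumes each comment is a dict with `body` and `created_utc` fields, and it stands in a toy logistic model (one kind of generalized linear model) with made-up word weights for the trained model. The names `WEIGHTS`, `score`, and `daily_average_sentiment` are hypothetical.

```python
import math
import re
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical word weights for a toy logistic model. A real model would
# learn its weights from labeled training data.
WEIGHTS = {"love": 1.5, "kudos": 1.0, "like": 0.5, "upset": -1.2, "mad": -1.5}
BIAS = 0.0

# Matching "Spacey" also covers "Kevin Spacey".
SEARCH = re.compile(r"\bspacey\b", re.IGNORECASE)


def score(text):
    """Sigmoid of a weighted word count: a value strictly between 0 and 1."""
    words = re.findall(r"[a-z]+", text.lower())
    z = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


def daily_average_sentiment(comments, start_year=2015, end_year=2017):
    """Return {date: mean score} for comments that mention the search term."""
    by_day = defaultdict(list)
    for c in comments:
        # Assumed field: created_utc holds seconds since the Unix epoch.
        day = datetime.fromtimestamp(int(c["created_utc"]), tz=timezone.utc).date()
        if start_year <= day.year <= end_year and SEARCH.search(c["body"]):
            by_day[day].append(score(c["body"]))
    return {day: sum(s) / len(s) for day, s in sorted(by_day.items())}


# Small usage example with hand-written records.
comments = [
    {"body": "Personally I love Spacey as an actor", "created_utc": 1446336000},
    {"body": "the only person I m mad at about it is Kevin Spacey", "created_utc": 1509494400},
    {"body": "a comment about something else", "created_utc": 1509494400},
]
for day, avg in daily_average_sentiment(comments).items():
    print(day, round(avg, 3))
```

With no weighted words present the toy model returns 0.5, which matches the neutral point of the scale described earlier. The resulting per-day averages are what would be plotted over time.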
For many years, Kevin Spacey was a well-regarded actor and public figure. On reddit, he maintained a sentiment rating of about 0.65. In the fall of 2017, he was accused of inappropriate behavior by another actor. This was followed by several more accusations and widespread media coverage. As this scandal broke, sentiment on reddit declined precipitously.
This plot demonstrates that sentiment analysis of social media data can reveal popular opinion about a public figure. I am sure that the same process can be applied to other public figures, companies, events, or anything else one can capture in a set of search terms.