Project Proposal - Reddit Sentiment Analysis

January 22, 2020

Problem Description

We want to perform a sentiment analysis of comments in the World News subreddit (r/worldnews) on posts about recent news from around the world.
It would be interesting to see how the Reddit community is reacting to recent news, particularly in light of several attention-grabbing events that have happened recently.

Data Collection and Transformation

We will use RedditExtractoR, tidyverse, and other relevant libraries. We do not plan to use a secondary data source.
Data can be retrieved by supplying search arguments, for example:

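library(RedditExtractoR)

# Collect thread URLs from r/worldnews, searching up to 10 pages of results
reddit_links <- reddit_urls(
  subreddit = "worldnews",
  page_threshold = 10)

# Download the comments for every collected thread
reddit_thread <- reddit_content(reddit_links$URL)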

The data is retrieved as a data frame.
Initial data exploration shows no violations of the tidy data rules.
Since we are dealing with text data, we will need to clean and transform the text: remove stopwords, apply stemming, and so on.
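The cleaning step could look roughly like the sketch below. It is an illustration only: it assumes the tidytext and SnowballC packages (not named above), and the toy data frame with its id and comment columns stands in for the retrieved data.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Toy stand-in for the retrieved data; the column names are assumptions.
comments <- tibble(
  id = 1:2,
  comment = c("The talks are failing and the protesters are angry.",
              "Great news, the ceasefire is holding!")
)

tidy_words <- comments %>%
  unnest_tokens(word, comment) %>%        # lowercase, drop punctuation, one word per row
  anti_join(stop_words, by = "word") %>%  # remove common English stopwords
  mutate(stem = wordStem(word))           # Porter stemming via SnowballC

tidy_words
```

Keeping the original word next to its stem makes it easy to check whether stemming merges words that should stay separate.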

Analytics Plan

Explore the comments by posting time and by the number of comments per thread.
Convert the comment text to vectors so that we can perform text analysis on it.
Compute n-grams and TF-IDF.
Apply sentiment analysis of three kinds:
- Positive / Negative
- Emotion
- Numeric
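A minimal sketch of how these steps might be carried out with tidytext is shown below. The package choice, the toy comments, and the thread_id and comment column names are assumptions made for illustration. Only the positive/negative case is shown, using the Bing lexicon that ships with tidytext; the emotion and numeric variants would need other lexicons, such as NRC and AFINN, which tidytext obtains through the textdata package.

```r
library(dplyr)
library(tidytext)

# Toy comments; thread_id and comment are assumed column names.
comments <- tibble(
  thread_id = c("a", "a", "b"),
  comment = c("Terrible news, this is a disaster.",
              "I am happy the rescue worked.",
              "Good progress, but the risk of failure remains.")
)

# One word per row
words <- comments %>%
  unnest_tokens(word, comment)

# N-grams: here bigrams, one pair of adjacent words per row
bigrams <- comments %>%
  unnest_tokens(bigram, comment, token = "ngrams", n = 2)

# TF-IDF, treating each thread as one document
tf_idf <- words %>%
  count(thread_id, word) %>%
  bind_tf_idf(word, thread_id, n)

# Positive / negative word counts per thread using the Bing lexicon
sentiment_counts <- words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(thread_id, sentiment)

sentiment_counts
```

Treating a thread as the document for TF-IDF is one possible choice; individual comments or posting days would work the same way by changing the grouping column.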

Evaluation Plan

Word cloud of the most frequent phrases, plus sentiment analysis results
Graphs and histograms by domain
Sentiment score/count plots
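A sentiment count plot could be drawn along the following lines. This is a sketch assuming ggplot2 (part of tidyverse); the summary table is a hypothetical placeholder, not a result from the Reddit data.

```r
library(ggplot2)

# Hypothetical summary table; the counts are placeholders, not real results.
sentiment_counts <- data.frame(
  sentiment = c("negative", "positive"),
  n = c(3, 2)
)

# Bar chart with one bar per sentiment class
p <- ggplot(sentiment_counts, aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  labs(x = "Sentiment", y = "Number of matched words",
       title = "Sentiment counts (placeholder data)")

print(p)
```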