January 22, 2020

Problem Description

  • We want to run a sentiment analysis on comments from the World News subreddit (r/worldnews), covering posts about recent news from around the world.
  • It would be interesting to see how the Reddit community is reacting to recent news, in light of several attention-grabbing events that have happened recently.

Data Collection and Transformation

  • We will use RedditExtractoR, tidyverse, and other relevant libraries; no secondary data source.
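  • Data can be retrieved by supplying search arguments, like so:
library(RedditExtractoR)

# Collect links to threads in r/worldnews (searches up to 10 result pages)
reddit_links <- reddit_urls(
  subreddit = "worldnews",
  page_threshold = 10)

# Download the comments for each thread URL
reddit_thread <- reddit_content(reddit_links$URL)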
  • Data is retrieved as a data frame.
  • Initial data exploration shows no violations of the tidy data rules.
  • Since we are dealing with text data, we will need to clean and transform the text (remove stopwords, apply stemming, etc.).
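
The cleaning step could look roughly like the sketch below. It is not part of the original plan: it assumes the tidytext and SnowballC packages, uses a small made-up data frame in place of reddit_thread, and assumes the comment text lives in a column named comment.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Toy stand-in for reddit_thread (assumption: comment text is in `comment`)
comments <- tibble(
  id = 1:2,
  comment = c("The markets are reacting badly to the news.",
              "Great reporting, this is really encouraging!")
)

tidy_comments <- comments %>%
  # One lowercase token per row; punctuation is dropped
  unnest_tokens(word, comment) %>%
  # Remove common English stopwords using tidytext's stop_words table
  anti_join(stop_words, by = "word") %>%
  # Reduce each remaining word to its stem (Porter stemmer)
  mutate(stem = wordStem(word))
```

Stemming is kept in a separate column here so the original words remain available for lexicon lookups later on.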

Analytics Plan

  • Data exploration for comments based on posting times and number of comments in threads
  • Convert the comment texts to vectors, so we can perform a text analysis on them.
  • N-grams and TF-IDF
  • Sentiment analysis, scored as:
    • Positive / Negative
    • Emotion
    • Numeric
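
As a rough illustration of the analytics steps above, the sketch below computes bigrams, TF-IDF, and positive/negative sentiment counts with tidytext. The data frame, the id column used as a thread identifier, and the choice of the bing lexicon are assumptions made for the example, not decisions from the plan. The emotion and numeric variants could plausibly use get_sentiments("nrc") and get_sentiments("afinn"), which require a separate lexicon download through the textdata package.

```r
library(dplyr)
library(tidytext)

# Made-up comments; `id` is a hypothetical thread identifier
comments <- tibble(
  id = c("t1", "t1", "t2"),
  comment = c("This is terrible news for the region.",
              "A wonderful outcome, great work by everyone.",
              "Terrible decision and an awful result.")
)

# N-grams: count two-word sequences across all comments
bigrams <- comments %>%
  unnest_tokens(bigram, comment, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

# TF-IDF: weight each word by how specific it is to a thread
tf_idf <- comments %>%
  unnest_tokens(word, comment) %>%
  count(id, word) %>%
  bind_tf_idf(word, id, n)

# Positive / negative: match words against the bing lexicon
sentiment_counts <- comments %>%
  unnest_tokens(word, comment) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(id, sentiment)
```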

Evaluation Plan

  • Word cloud of the most frequent phrases, alongside the sentiment analysis results
  • Graphs, histograms for domains
  • Sentiment score/count plot
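
A sentiment count plot might be sketched as follows with ggplot2. The counts are placeholder values for illustration only, not results, and the sentiment_counts table is a hypothetical summary of the kind a lexicon join would produce.

```r
library(ggplot2)

# Placeholder summary table (values are illustrative, not real results)
sentiment_counts <- data.frame(
  sentiment = c("negative", "positive"),
  n = c(120, 85)
)

# Bar chart of word counts per sentiment class
p <- ggplot(sentiment_counts, aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Sentiment word counts in r/worldnews comments",
       x = "Sentiment",
       y = "Number of words")
```

Calling print(p) renders the chart. A word cloud could be drawn in a similar way from a word/frequency table, for example with the wordcloud package.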