12/9/2020

Business Context

Reddit is a social network and a web content rating and discussion website. Registered members submit content to the site, such as links, text posts, and images, which other members then vote up or down. Posts are organized by subject into user-created boards called “subreddits”, which cover a variety of topics such as news, politics, science, movies, video games, music, books, sports, fitness, cooking, pets, and image-sharing.

Data Cleaning

Data was collected from the Reddit API using an R extractor page. The fetched data is stored in CSV files. Data cleaning and manipulation are done with the tidyr and dplyr libraries.
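As a rough illustration of the kind of cleaning step described above, here is a minimal sketch in Python using only the standard library. The project itself used tidyr and dplyr in R. The column names (id, subreddit, comment, score), the sample rows, and the clean_rows helper are all hypothetical and are not taken from the project's data.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file of fetched Reddit data;
# the column names are assumptions, not the project's actual schema.
RAW_CSV = """id,subreddit,comment,score
c1,movies,Great film!,12
c2,movies,,3
c1,movies,Great film!,12
c3,books,  Loved   the ending  ,7
"""


def clean_rows(csv_text):
    """Drop rows with empty comments or repeated ids and tidy whitespace."""
    seen_ids = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collapse runs of whitespace and trim both ends.
        comment = " ".join(row["comment"].split())
        if not comment or row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        cleaned.append({
            "id": row["id"],
            "subreddit": row["subreddit"],
            "comment": comment,
            "score": int(row["score"]),  # store the score as a number
        })
    return cleaned


rows = clean_rows(RAW_CSV)
for r in rows:
    print(r)
```

In dplyr terms, this roughly corresponds to filtering out empty comments, keeping distinct ids, and converting the score column to a numeric type.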

Problem Statement

Peer comments

Predict sentiment

Multiple regressions to assess sentiment

Evaluation Plan

We used NLP techniques such as TF-IDF and polarity plots, along with lexicons such as NRC and Bing to categorize the words. We also used n-grams (bigrams, etc.) and plotted word clouds to understand sentiment. The polarity plot gave us a good indication of overall sentiment.
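To make the techniques named above concrete, the following is a minimal Python sketch of bigram extraction, TF-IDF, and lexicon-based polarity. It is not the project's code, which was written in R. The sample comments, the four-word LEXICON (a stand-in for a real lexicon such as Bing or NRC), and all function names are hypothetical. The TF-IDF shown is one common variant: term frequency times the natural log of (number of documents / documents containing the term).

```python
import math
import re
from collections import Counter

# Hypothetical comments and a tiny hand-made lexicon, for illustration only.
DOCS = [
    "I love this movie, great acting",
    "I hate this movie, terrible plot",
    "great book, love this ending",
]
LEXICON = {"love": 1, "great": 1, "hate": -1, "terrible": -1}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return adjacent token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def polarity(text):
    """Sum the lexicon scores of the tokens; a positive total suggests positive sentiment."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


scores = tf_idf(DOCS)
polarities = [polarity(d) for d in DOCS]
print(bigrams(tokenize(DOCS[0]))[:2])
print(polarities)
```

A word that appears in every comment (here "this") gets a TF-IDF score of zero, while a word unique to one comment scores highest within it. Plotting the per-comment polarity values is the idea behind a polarity plot.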