Data Collection and Transformation
- We will use RedditExtractoR, tidyverse, and other relevant libraries; no secondary data source will be used.
- Data can be retrieved by supplying the search arguments like so:
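library(RedditExtractoR)  # reddit_urls() and reddit_content() belong to the 2.x API

# Collect thread URLs from r/worldnews, reading up to 10 pages of results
reddit_links <- reddit_urls(
  subreddit = "worldnews",
  page_threshold = 10)
# Download the comments for every collected thread
reddit_thread <- reddit_content(reddit_links$URL)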
- The data is returned as a data frame.
- Initial data exploration shows no violations of the tidy data rules.
- Since we are dealing with text data, we will need to clean and transform the text, for example by removing stopwords and applying stemming.
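The cleaning step in the last bullet could look roughly like the sketch below. It is only an illustration under several assumptions: the tidytext and SnowballC packages are our choice (the plan above names only RedditExtractoR and tidyverse), the text lives in a column called comment, and toy_thread is a made-up stand-in for reddit_thread so the example runs without calling Reddit.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Hypothetical stand-in for reddit_thread; the real data frame has many more
# columns, and the column name `comment` is an assumption.
toy_thread <- tibble(
  id = 1:2,
  comment = c("The leaders are meeting again today.",
              "Markets reacted quickly to the meetings.")
)

clean_tokens <- toy_thread %>%
  # One token per row; unnest_tokens() lowercases and strips punctuation
  unnest_tokens(word, comment) %>%
  # Drop common English stopwords using tidytext's stop_words table
  anti_join(stop_words, by = "word") %>%
  # Reduce each remaining word to its stem with the Snowball English stemmer
  mutate(stem = wordStem(word, language = "english"))

clean_tokens
```

The result stays in a tidy one-token-per-row layout, so it can be passed straight to count() or joined with a sentiment lexicon. Whether stemming helps or hurts will depend on the analysis, so it is kept as a separate column here and the original word is not overwritten.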