Extracting data from Reddit using R

Nicole O'Donnell

January 23, 2022

What is RedditExtractoR?

A minimalistic R wrapper for the Reddit API (application programming interface)
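What "wrapper" means here: a tool like this sends HTTP requests to Reddit's public endpoints and turns the JSON responses into R data frames, so you never build the requests yourself. The sketch below only constructs one such search URL, offline. The endpoint path, the query parameters and the helper name build_search_url() are illustrative assumptions about Reddit's public JSON search. They are not RedditExtractoR's actual code.

```r
# Illustrative only: build a Reddit search URL of the kind an API wrapper requests.
# The endpoint and parameters are assumptions about Reddit's public JSON search,
# not RedditExtractoR internals. build_search_url() is a hypothetical helper.
build_search_url <- function(terms, subreddit = NULL, sort = "comments") {
  # Percent-encode the search terms (a space becomes %20)
  q <- utils::URLencode(terms, reserved = TRUE)
  base <- if (is.null(subreddit)) {
    "https://www.reddit.com/search.json"
  } else {
    paste0("https://www.reddit.com/r/", subreddit, "/search.json")
  }
  # restrict_sr=on limits the search to the given subreddit
  paste0(base, "?q=", q, "&sort=", sort,
         if (!is.null(subreddit)) "&restrict_sr=on" else "")
}

url <- build_search_url("bone marrow", subreddit = "pics")
print(url)
```

The wrapper's job is to hide this step and the JSON parsing that follows. The arguments described later in this walkthrough play the role of these query parameters.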

Install the RedditExtractoR package

install.packages("RedditExtractoR")

Load the package into the current R session

library(RedditExtractoR)
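This walkthrough relies on get_reddit(), which belongs to the package's older interface. Newer RedditExtractoR releases reorganized their functions and may not export it. A quick check using only base R is sketched below. It reports whether your installed version provides get_reddit() before you build a query around it. The variable name has_get_reddit is just an illustrative choice.

```r
# Check whether the installed RedditExtractoR exports get_reddit().
# requireNamespace() returns FALSE (instead of erroring) if the package is missing,
# and && short-circuits so getNamespaceExports() is only called when it is installed.
has_get_reddit <- requireNamespace("RedditExtractoR", quietly = TRUE) &&
  "get_reddit" %in% getNamespaceExports("RedditExtractoR")

if (has_get_reddit) {
  message("get_reddit() is available (RedditExtractoR ",
          as.character(utils::packageVersion("RedditExtractoR")), ")")
} else {
  message("get_reddit() was not found; see help(package = \"RedditExtractoR\") ",
          "for the functions your installed version provides")
}
```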

Building the search query

Use the get_reddit() function to query Reddit data, adjusting the arguments below as needed:

  • Arguments
    • search_terms A string of terms to be searched on Reddit.
    • regex_filter An optional regular expression filter that will remove URLs with titles that do not match the condition.
    • subreddit An optional character string that will restrict the search to the specified subreddit.
    • cn_threshold Comment number threshold that removes URLs with fewer comments than cn_threshold. 0 by default.
    • page_threshold Page threshold that controls the number of pages to be searched.
    • sort_by "comments" to arrange by number of comments, or "new" to arrange by date.
    • wait_time Wait time in seconds between page requests. 2 by default.
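To make the filtering and sorting arguments concrete, the sketch below mimics regex_filter, cn_threshold and sort_by = "comments" on a small invented data frame using base R, so it runs offline. It is not how the package implements them internally. The column names title and num_comments, the threshold of 10 and the case-insensitive match are assumptions chosen for illustration.

```r
# Invented thread list for illustration only (not real Reddit data)
threads <- data.frame(
  title = c("Donated bone marrow today", "Sunset over the bay",
            "Bone marrow drive poster", "Met my bone marrow donor"),
  num_comments = c(120, 40, 3, 250),
  stringsAsFactors = FALSE
)

# Analogue of regex_filter: keep threads whose title matches the pattern
# (ignore.case = TRUE is an assumption made for this sketch)
regex_filter <- "bone marrow"
title_ok <- grepl(regex_filter, threads$title, ignore.case = TRUE)

# Analogue of cn_threshold: drop threads with fewer comments than the threshold
cn_threshold <- 10
comments_ok <- threads$num_comments >= cn_threshold

# Analogue of sort_by = "comments": most-commented threads first
filtered <- threads[title_ok & comments_ok, ]
filtered <- filtered[order(filtered$num_comments, decreasing = TRUE), ]
print(filtered)
```

Here "Bone marrow drive poster" matches the pattern but is dropped by the comment threshold, and "Sunset over the bay" never matches. The two surviving threads come out ordered by comment count.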

Posts in the r/pics subreddit containing the term "bone marrow"

bone_marrow_posts <- get_reddit(search_terms = "bone marrow", subreddit = "pics",
                                cn_threshold = 0, sort_by = "comments")

Output

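sample_n(), select() and %>% come from dplyr, and datatable() comes from DT, so load both before displaying a random sample of 25 rows:

library(dplyr)
library(DT)

datatable(sample_n(bone_marrow_posts, 25) %>% select(structure, author, user, comment))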
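In the selected columns, author appears to hold the thread's original poster, user the commenter, and structure the comment's position in the thread. The sketch below works out reply depth and per-user activity from those columns. It assumes structure encodes nesting as underscore-separated positions (for example "2_1" for the first reply to the second top-level comment), so check that format against your own output first. The data frame is an invented stand-in, not real get_reddit() results.

```r
# Invented stand-in for a few rows of get_reddit() output (illustration only).
# Assumption: `structure` marks nesting as underscore-separated positions.
comments <- data.frame(
  structure = c("1", "1_1", "1_1_1", "2", "2_1"),
  user      = c("user_a", "user_b", "user_a", "user_c", "user_b"),
  stringsAsFactors = FALSE
)

# Depth = number of underscore-separated components (1 = top-level comment)
comments$depth <- lengths(strsplit(comments$structure, "_", fixed = TRUE))

# Split top-level comments from replies
n_top_level <- sum(comments$depth == 1)
n_replies   <- sum(comments$depth > 1)

# Comments per commenter, most active first
per_user <- sort(table(comments$user), decreasing = TRUE)
print(comments)
print(per_user)
```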

Additional Resources

GitHub source code
Full manual on CRAN