First, we need to define a function, get_posts(), to retrieve subreddit posts for us.
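The original definition of get_posts() is not shown here, so the following is only a minimal sketch of how such a function might look. It assumes Reddit's public JSON listing endpoint (/r/<subreddit>/new.json) and pages through it with the `after` cursor. The helper fetch_listing(), the USER_AGENT string, and the max_posts and fetch parameters are hypothetical names introduced for illustration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical identifier; Reddit asks clients to send a descriptive User-Agent.
USER_AGENT = "subreddit-research-sketch/0.1"


def fetch_listing(subreddit, after=None, limit=100):
    """Fetch one page of a subreddit's newest posts as parsed JSON."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    url = (
        f"https://www.reddit.com/r/{subreddit}/new.json?"
        + urllib.parse.urlencode(params)
    )
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def get_posts(subreddit, max_posts=1000, fetch=fetch_listing):
    """Collect raw post records page by page until the listing runs out.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    posts = []
    after = None
    while len(posts) < max_posts:
        page = fetch(subreddit, after=after)
        children = page["data"]["children"]
        if not children:
            break  # empty page: nothing more to retrieve
        posts.extend(children)
        after = page["data"]["after"]
        if after is None:
            break  # no cursor: this was the last page
    return posts[:max_posts]
```

Under these assumptions, a call such as get_posts("highereducation") would return a list of raw post records, each still wrapped in Reddit's nested JSON structure.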
We also need to clean the raw data, because it arrives in a somewhat messy JSON format. We define a second function, clean_posts(), to accomplish this.
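As with get_posts(), the original clean_posts() is not shown, so this is a hedged sketch of one possible cleaning step rather than the implementation used here. It assumes each raw record is a Reddit listing child of the form {"kind": ..., "data": {...}} and keeps a small, illustrative set of fields. The KEEP_FIELDS selection and the derived "date" column are assumptions.

```python
from datetime import datetime, timezone

# Illustrative choice of fields; the original analysis may keep others.
KEEP_FIELDS = ("id", "title", "selftext", "author", "score", "num_comments")


def clean_posts(raw_posts):
    """Flatten raw listing records into simple dicts, one per post."""
    cleaned = []
    for child in raw_posts:
        data = child.get("data", {})
        # Missing fields become None instead of raising KeyError.
        row = {field: data.get(field) for field in KEEP_FIELDS}
        # created_utc is a Unix timestamp; convert it to an ISO date (UTC).
        created = data.get("created_utc")
        row["date"] = (
            datetime.fromtimestamp(created, tz=timezone.utc).date().isoformat()
            if created is not None
            else None
        )
        cleaned.append(row)
    return cleaned
```

The resulting list of flat dictionaries can be loaded into a tabular structure, and the "date" field makes it easy to check how far back the retrieved posts go.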
We can then use get_posts() to retrieve data from our two subreddits of interest, r/highereducation and r/Professors. Note that the Reddit API limits us to a maximum of n = 1,000 posts.
Our data collection retrieved 998 posts from r/highereducation, going back to 2019-06-16, and 994 posts from r/Professors, going back to 2020-05-01.