In 2018, according to Alexa (Amazon's own web traffic analysis platform), Reddit overtook Facebook to become the third most visited website in the United States, behind Google and YouTube and ahead of Facebook and Amazon. If you are unfamiliar with Reddit, it is a social news and discussion board where users submit webpages, photos, videos and text posts, which the community they are posted to can up-vote to front-page visibility or down-vote into obscurity. These communities are called subreddits, and their topics range from politics to movies to technology. Even R programming and machine learning have their own subreddits. Unlike on other social media, users do not go by their first and last names; they use usernames instead.
The RedditExtractoR package scrapes Reddit posts and retrieves their comments. This data can be massive on its own, but combined with text mining packages it can be analysed for trends. So, let's start scraping!
Depending on its popularity, a Reddit post can collect thousands of upvotes and hundreds of comments. The construct_graph function provides a visualisation of the comment chains in a single Reddit thread; however, if there are too many comments, the graph can become unreadable amid all the usernames and comment chains. Let's take an example from one of my favorite subreddits, r/dataisbeautiful.
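library(RedditExtractoR)
Reddit_url = "https://www.reddit.com/r/dataisbeautiful/comments/b5zzht/oc_the_gap_in_life_expectancy_in_each_us_state/"
url_data = reddit_content(Reddit_url) # scrape the comments of this single thread
graph_object = construct_graph(url_data, plot = TRUE) # plot = TRUE draws the comment graph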
In the resulting graph, the username "inspurious_" created the post, and the top tier of comments are the root comments. Arrows pointing down represent replies. As you can see, the graph is crowded.
You may have noticed that the code above uses another function, reddit_content. Its output is a dataframe for a single thread that includes the link to the post, the comments and replies, and their upvote/downvote scores.
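To make the layout of that dataframe more concrete, here is a small base R sketch. It assumes that each comment's position in the thread is stored in a `structure` column such as "1_2_1" (root comment 1, its second reply, that reply's first reply), alongside `user` and `comment_score` columns; the rows below are invented placeholders, not real scraped data. Counting the underscore-separated parts gives each comment's depth, which separates the root comments (the top tier of the graph) from the replies.

```r
# Hypothetical stand-in for a few columns of reddit_content() output; values invented
thread <- data.frame(
  structure     = c("1", "1_1", "1_1_1", "2", "2_1", "3"),
  user          = c("u1", "u2", "u1", "u3", "u4", "u5"),
  comment_score = c(15, 40, 5, 120, 2, 60),
  stringsAsFactors = FALSE
)

# Nesting depth = number of "_"-separated parts (1 = root comment)
thread$depth <- lengths(strsplit(thread$structure, "_", fixed = TRUE))

# Root comments (the top tier of the graph), highest score first
roots <- thread[thread$depth == 1, ]
roots <- roots[order(-roots$comment_score), ]

# How many comments sit at each depth of the thread
depth_counts <- table(thread$depth)
print(roots)
print(depth_counts)
```

Here the three root comments come out in the order "2", "3", "1", and the thread is three levels deep.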
Let's say that, instead of looking at one thread at a time, you want to explore a search term for a popular topic. That can be done in two ways. The first is to search for all Reddit URLs that include the search term in one, multiple, or all subreddits, which is what the reddit_urls function does.
Take, for example, New Zealand Prime Minister and all-around awesome person Jacinda Ardern. Her courageous actions in the last few weeks have been at the centre of many articles from both sides of the political aisle. Let's see how that is reflected on Reddit.
# worldnews and politics are two different subreddits, and they are where we focus our search.
# cn_threshold is the minimum number of comments a thread needs to be returned.
Jacinda_Urls <- reddit_urls(search_terms = "Jacinda", subreddit = "worldnews politics", cn_threshold = 20, sort_by = "comments")
Jacinda_Urls
## date num_comments
## 1 21-03-19 22277
## 2 21-03-19 9691
## 3 19-03-19 6193
## 4 15-03-19 1089
## 5 18-01-18 1055
## 6 20-08-18 822
## 7 25-10-17 804
## 8 19-10-17 572
## 9 20-10-17 203
## 10 23-03-19 198
## 11 22-10-17 85
## 12 22-03-19 48
## 13 07-11-17 42
## 14 21-03-19 40
## 15 22-03-19 47
## 16 25-10-17 31
## title
## 1 Jacinda Ardern has announced that New Zealand will ban all military-style semi-automatic weapons and assault rifles, essentially commencing immediately
## 2 Alexandria Ocasio-Cortez Slams U.S. Inaction on Gun Control as Jacinda Ardern Bans Weapons: 'This Is What Leadership Looks Like'
## 3 "He is a terrorist. He is a criminal. He is an extremist. But he will, when I speak, be nameless." - Jacinda Arderns extraordinary speech to parliament
## 4 "We are a proud nation of more than 200 ethnicities, 160 languages, and amongst that diversity we share common values.. We were chosen because we represent diversity, you may have chosen us, but we utterly reject and condemn you." - New Zealand prime Minster Jacinda Ardern
## 5 New Zealand Prime Minister Jacinda Ardern announces pregnancy
## 6 New Zealand Prime Minister Jacinda Ardern denies herself and her ministers a pay rise.
## 7 New Zealand to ban foreign homebuyers from purchasing existing properties: Jacinda Ardern has announced a dramatic plan to tackle soaring real estate prices in New Zealand, while her deputy claims the country is 'no longer for sale'.
## 8 Jacinda Ardern is next prime minister of New Zealand, Winston Peters confirms
## 9 New Zealand Prime Minister-elect Jacinda Ardern says homelessness proves capitalism is a 'blatant failure'
## 10 World's tallest building lit up with image of Jacinda Ardern
## 11 New Zealand prime minister Jacinda Ardern left Mormon church to support LGBT rights
## 12 America Deserves a Leader as Good as Jacinda Ardern - New York Times Editorial Board
## 13 New Zealand's New Leader Wants to Kill Off Carbon: Jacinda Ardern aims to switch the electricity grid entirely to renewables by 2035, which would place the South Pacific island in a small club of nations ditching fuels like coal and natural gas to cut carbon emissions.
## 14 PM Jacinda Ardern says New Zealand will ban all military-style semi-automatic weapons and all assault rifles | RNZ News
## 15 New Zealand's PM , Jacinda Ardern receives death threats after the terrorist attacl that has happened in New Zealand .
## 16 Jacinda Ardern, 37, has been sworn in as the Prime Minister of New Zealand. She is the youngest person to hold the title in 160 years.
## subreddit
## 1 worldnews
## 2 politics
## 3 worldnews
## 4 worldnews
## 5 worldnews
## 6 worldnews
## 7 worldnews
## 8 worldnews
## 9 worldnews
## 10 worldnews
## 11 worldnews
## 12 politics
## 13 worldnews
## 14 worldnews
## 15 worldnews
## 16 worldnews
## URL
## 1 http://www.reddit.com/r/worldnews/comments/b3kvkr/jacinda_ardern_has_announced_that_new_zealand/
## 2 http://www.reddit.com/r/politics/comments/b3p1a5/alexandria_ocasiocortez_slams_us_inaction_on_gun/
## 3 http://www.reddit.com/r/worldnews/comments/b2rrqr/he_is_a_terrorist_he_is_a_criminal_he_is_an/
## 4 http://www.reddit.com/r/worldnews/comments/b1bsac/we_are_a_proud_nation_of_more_than_200/
## 5 http://www.reddit.com/r/worldnews/comments/7rdckh/new_zealand_prime_minister_jacinda_ardern/
## 6 http://www.reddit.com/r/worldnews/comments/98st18/new_zealand_prime_minister_jacinda_ardern_denies/
## 7 http://www.reddit.com/r/worldnews/comments/78mz9b/new_zealand_to_ban_foreign_homebuyers_from/
## 8 http://www.reddit.com/r/worldnews/comments/77clf8/jacinda_ardern_is_next_prime_minister_of_new/
## 9 http://www.reddit.com/r/worldnews/comments/77pepd/new_zealand_prime_ministerelect_jacinda_ardern/
## 10 http://www.reddit.com/r/worldnews/comments/b4g9mk/worlds_tallest_building_lit_up_with_image_of/
## 11 http://www.reddit.com/r/worldnews/comments/77zl8o/new_zealand_prime_minister_jacinda_ardern_left/
## 12 http://www.reddit.com/r/politics/comments/b3yz9w/america_deserves_a_leader_as_good_as_jacinda/
## 13 http://www.reddit.com/r/worldnews/comments/7bawqb/new_zealands_new_leader_wants_to_kill_off_carbon/
## 14 http://www.reddit.com/r/worldnews/comments/b3kw6c/pm_jacinda_ardern_says_new_zealand_will_ban_all/
## 15 http://www.reddit.com/r/worldnews/comments/b45k64/new_zealands_pm_jacinda_ardern_receives_death/
## 16 http://www.reddit.com/r/worldnews/comments/78rheo/jacinda_ardern_37_has_been_sworn_in_as_the_prime/
The result is a dataframe of all posts in the r/worldnews and r/politics subreddits that contain the name Jacinda in the title and have at least 20 comments, along with their dates, comment counts and URLs.
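As a first taste of trend analysis, the sketch below copies the date, num_comments and subreddit columns from the printed output above into a plain data frame, so it runs without network access; in a live session you would work on Jacinda_Urls directly. It assumes the dates are day-month-year (a row such as 25-10-17 can only be read that way), then counts the matching threads per year and totals the comments per subreddit.

```r
# Values copied by hand from the printed reddit_urls() output above (16 rows)
jacinda_posts <- data.frame(
  date = c("21-03-19", "21-03-19", "19-03-19", "15-03-19", "18-01-18", "20-08-18",
           "25-10-17", "19-10-17", "20-10-17", "23-03-19", "22-10-17", "22-03-19",
           "07-11-17", "21-03-19", "22-03-19", "25-10-17"),
  num_comments = c(22277, 9691, 6193, 1089, 1055, 822, 804, 572,
                   203, 198, 85, 48, 42, 40, 47, 31),
  subreddit = c("worldnews", "politics", rep("worldnews", 9), "politics",
                rep("worldnews", 4)),
  stringsAsFactors = FALSE
)

# Parse the day-month-year strings and extract the year
jacinda_posts$date <- as.Date(jacinda_posts$date, format = "%d-%m-%y")
jacinda_posts$year <- format(jacinda_posts$date, "%Y")

# Number of matching threads per year
posts_per_year <- table(jacinda_posts$year)

# Total comments per subreddit
comments_per_sub <- tapply(jacinda_posts$num_comments, jacinda_posts$subreddit, sum)

print(posts_per_year)
print(comments_per_sub)
```

On these 16 rows, 8 threads date from 2019 (all of them from March), and r/worldnews accounts for 33,458 of the 43,197 comments.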
But what if you want to run sentiment analysis on these posts, or more precisely, on the comments behind that search query? Thankfully, there is another convenient function for that: get_reddit. Let's return to our favorite prime minister and collect all the comments about her.
Jacinda <- get_reddit(search_terms = "Jacinda", subreddit = "worldnews politics", cn_threshold = 20, sort_by = "comments")
This creates a dataframe (named Jacinda in our example) containing all the comments on the posts with Jacinda in their titles in the two subreddits, r/politics and r/worldnews. That dataframe can then be analysed with other text mining packages for sentiment analysis or topic modelling. It has close to 5,000 rows, so we won't print it here.
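As a rough illustration of what sentiment analysis on such a dataframe involves, the sketch below scores each comment by summing word weights from a lexicon. Both the three comments and the six-word lexicon are invented for the example; a real analysis would use the comment text of the Jacinda dataframe (assumed here to live in a column named `comment`) and an established lexicon from a text mining package.

```r
# Invented stand-ins for the comment text of get_reddit() output
comments <- c("What a great and brave leader",
              "This is a terrible idea",
              "Good speech, strong response")

# Hypothetical mini-lexicon: positive words score +1, negative words -1
lexicon <- c(great = 1, brave = 1, good = 1, strong = 1, terrible = -1, bad = -1)

# Split a comment into lower-case words and sum the weights of known words
score_comment <- function(text) {
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  sum(lexicon[words], na.rm = TRUE)  # unknown words index to NA and are dropped
}

scores <- vapply(comments, score_comment, numeric(1), USE.NAMES = FALSE)
print(scores)
```

The three comments score 2, -1 and 2; with a real lexicon the mechanics stay the same and only the word list changes.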
Finally, let's look at the network relationships between users. We will continue working with our Jacinda search results, with the help of another package, dplyr.
library(dplyr)
library(magrittr) # dplyr does not export the %$% exposition pipe used below

# Target the thread with the fewest comments in our search results
Jacinda_Min <- Jacinda_Urls %>%
  filter(num_comments == min(num_comments)) %$%
  URL %>%
  reddit_content()
user_network_list <- Jacinda_Min %>% user_network(include_author = FALSE, agg = TRUE)
user_network_list$plot
This graph shows the relationships between the posters and commenters in the Jacinda-related thread with the fewest comments. This functionality can be helpful if you are interested in a specific user's interactions (a celebrity or a power user, say), since you can then track them across different threads.
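To see roughly what sits behind such a user network, the sketch below derives a who-replied-to-whom edge list from a single thread. It assumes the thread has a `structure` column encoding each comment's position (for example "1_2" for the second reply to root comment 1) and a `user` column; the usernames are hypothetical. Each reply becomes an edge from its author to the author of its parent comment, and repeated pairs are collapsed into a weight. That collapsing step is our reading of what `agg = TRUE` does, not RedditExtractoR's own implementation.

```r
# Hypothetical thread; `structure` encodes each comment's position in the tree
thread <- data.frame(
  structure = c("1", "1_1", "1_1_1", "1_1_2", "2", "2_1"),
  user      = c("alice", "bob", "alice", "alice", "carol", "bob"),
  stringsAsFactors = FALSE
)

# Replies are the comments nested under another comment
is_reply <- grepl("_", thread$structure, fixed = TRUE)
replies  <- thread[is_reply, ]

# A reply's parent id is its own id with the last "_n" segment removed
parent_id   <- sub("_[^_]+$", "", replies$structure)
parent_user <- thread$user[match(parent_id, thread$structure)]

# One edge per reply: replier -> author of the comment being replied to
edges <- data.frame(from = replies$user, to = parent_user, stringsAsFactors = FALSE)

# Collapse repeated (from, to) pairs into a single weighted edge
weighted <- aggregate(list(weight = rep(1, nrow(edges))), by = edges, FUN = sum)
print(weighted)
```

Here alice's two replies to bob collapse into one edge of weight 2, leaving three weighted edges in total.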
And there you have it: those are all the functions in the RedditExtractoR package.
Go redditing now!