R Markdown

Description: This meetup is for anyone interested in learning and sharing knowledge about analyzing Reddit data using R. In this tutorial, we will be using RedditExtractoR and a few other R packages to analyze a dataset of Reddit posts.

Text mining is the process of analyzing large collections of unstructured text data to discover patterns, trends, and insights. With the rise of social media platforms like Reddit, there is a wealth of information available in the form of user-generated content that can be analyzed using text mining techniques.

R is a popular programming language and environment for statistical computing and graphics, widely used in data analysis and data visualization. In recent years, it has also become a powerful tool for text mining and natural language processing.

In this Meetup event, we will explore how to use R for text mining of Reddit data. We will walk through the process of collecting data from Reddit using its API, cleaning and preprocessing the data, and applying text mining techniques such as sentiment analysis and topic modeling. By the end of the session, you will have a basic understanding of how to use R for text mining of social media data and will be able to apply these techniques to similar datasets.

Who should attend?

This meetup is open to all skill levels.

Requirements: Participants will need a laptop to join the online event. Basic knowledge of R programming is recommended but not required. Internet access is required to reach Reddit during the live coding session.

Processing data

Using a few R packages, we will clean and preprocess the data to prepare it for analysis. We will remove stop words, punctuation, and URLs from the text data.

The preprocessing step builds a corpus from the post titles and removes punctuation, URLs, and stop words. It also applies stemming, which reduces words to their root form.
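The cleaning and stemming steps described above can be sketched in base R as follows. This is a minimal, dependency-free illustration, not the tutorial's own code: the `titles` vector, the short `stop_words` list, and the `naive_stem()` and `preprocess()` helpers are hypothetical. With tm, the same steps are usually done with `tm_map()` transformations such as `removePunctuation`, `removeWords` (with `stopwords("english")`), and `stemDocument`, which uses the Porter stemmer from the SnowballC package; `naive_stem()` only strips a few common suffixes.

```r
# Hypothetical post titles standing in for real Reddit data
titles <- c(
  "The BEST posting tips: https://example.com",
  "Is R better than Python for analysing posts?"
)

# Tiny illustrative stop-word list (assumption; tm::stopwords("english") is far longer)
stop_words <- c("a", "an", "the", "is", "than", "for", "of")

# Crude suffix stripper: an illustrative stand-in, NOT the Porter stemmer
naive_stem <- function(words) {
  suffixes <- c("ing", "ed", "es", "s")  # checked in this order
  vapply(words, function(w) {
    for (suf in suffixes) {
      # strip a suffix only if at least three characters of stem remain
      if (endsWith(w, suf) && nchar(w) - nchar(suf) >= 3) {
        return(substr(w, 1, nchar(w) - nchar(suf)))
      }
    }
    w
  }, character(1), USE.NAMES = FALSE)
}

preprocess <- function(x, stop_words) {
  x <- tolower(x)                      # normalise case
  x <- gsub("https?://\\S+", " ", x)   # remove URLs first
  x <- gsub("[[:punct:]]+", " ", x)    # then remove punctuation
  tokens <- strsplit(x, "\\s+")        # split on runs of whitespace
  vapply(tokens, function(w) {
    w <- w[nzchar(w) & !(w %in% stop_words)]  # drop empty tokens and stop words
    paste(naive_stem(w), collapse = " ")
  }, character(1))
}

cleaned <- preprocess(titles, stop_words)
print(cleaned)
```

URLs are removed before punctuation on purpose: stripping punctuation first would break `https://example.com` into stray tokens such as `https` and `example`.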

Creating a Document Term Matrix

We will now create a document-term matrix, which records how often each term occurs in each post title, to represent the text data.
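To make the structure concrete, here is a minimal base-R sketch of a document-term matrix: rows are documents, columns are distinct terms, and each cell counts occurrences. The `docs` vector is hypothetical, already-cleaned example text. With tm, `DocumentTermMatrix()` builds the same kind of matrix from a corpus (`TermDocumentMatrix()` gives the transposed layout), and `as.matrix()` converts either into an ordinary matrix.

```r
# Hypothetical pre-processed post titles (one string per document)
docs <- c("best post tip", "r better python analys post", "post tip r")

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))   # one column per distinct term

# Count each vocabulary term in every document; vapply yields terms x documents
counts <- vapply(tokens,
                 function(tk) as.integer(table(factor(tk, levels = vocab))),
                 integer(length(vocab)))

# Transpose so rows are documents and columns are terms
dtm <- t(counts)
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)
print(dtm)
```

Real document-term matrices are mostly zeros, which is why tm stores them in a sparse format rather than as a dense matrix like this one.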

Text Analysis using tm and other packages

We can now perform text analysis using tm and other packages. We will start by creating a few plots, such as a word cloud, to visualize the most frequent words in the post titles.
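The counts behind such plots are the column sums of a document-term matrix. The sketch below assumes a small, hand-written matrix in place of the real one (its terms and counts are hypothetical) and draws a bar chart with base graphics. A word cloud, for example with the wordcloud package's `wordcloud()` function, would be drawn from the same word and frequency columns.

```r
# Hypothetical document-term matrix: rows = posts, columns = terms
dtm <- matrix(
  c(0, 1, 0, 1, 0, 0, 1,   # doc1: "best post tip"
    1, 0, 1, 1, 1, 1, 0,   # doc2: "r better python analys post"
    0, 0, 0, 1, 0, 1, 1),  # doc3: "post tip r"
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3),
                  c("analys", "best", "better", "post", "python", "r", "tip"))
)

# Total occurrences of each term across all posts, most frequent first
freq <- sort(colSums(dtm), decreasing = TRUE)
freq_df <- data.frame(word = names(freq), freq = unname(freq))
print(head(freq_df, 5))

# Bar chart of the five most frequent terms; las = 2 turns labels perpendicular to the axis
barplot(head(freq, 5), las = 2, main = "Most frequent words", ylab = "Frequency")
```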


References

‘RedditExtractoR’ - An R Package that helps you access the Reddit API: https://github.com/ivan-rivera/RedditExtractor

What Are APIs? - Simply Explained: https://www.youtube.com/watch?v=OVvTv9Hy91Q