Capturing Twitter Content with RTweet

Below are three distinct but complementary R scripts for retrieving and analyzing Twitter content using RTweet version 1.0.2.

The “Retrieving tweets that match a query” script fetches recent tweets that match a customizable set of search terms, stores the tweets’ content and selected metadata in a data frame named FinalData, and exports the results to a comma-separated values file named FinalData.csv. The script lets you set the maximum number of tweets to fetch and the minimum number of times a tweet must have been retweeted to be included.

The “Getting most recent tweets from a user” script fetches up to a customizable number of tweets posted by a specified Twitter user. The fetch begins with the most recent tweet, then works backwards in time until the specified number of tweets has been retrieved. Like the first script, this one stores the content of the tweets and selected metadata in a data frame named FinalData and in a comma-separated values file named FinalData.csv.

Finally, the “Getting most common words in tweets” script produces a list of the words contained in the full_text field of the FinalData data frame, along with the number of times each word appears. The words and their frequencies are stored in a data frame named WordFrequency and in a comma-separated values file named WordFrequency.csv. This script can help identify themes in the tweets retrieved by either of the first two scripts.

The first two scripts require a one-time login to a standard Twitter account, using a Twitter profile name and password. These credentials will be stored in a file on the user’s computer. The script will report the file name and path, in case the user wants to remove the credentials. RTweet does not require users to have developer access to the Twitter API.

See https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview for descriptions of the Twitter variables retrieved by the first two scripts.

Retrieving tweets that match a query

See the “# Query examples” comment block in the script for instructions on customizing the Twitter search terms. By default, the script searches for up to 18,000 English-language tweets (excluding retweets) that mention “Joe Biden” and that have been retweeted 25 times or more. Both the 18,000-tweet maximum and the 25-retweet minimum can be adjusted as needed. The FinalData.csv file will be written to R’s current working directory, which is the script’s directory only if the working directory has been set there.
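
The query string passed to search_tweets() is simply the search terms followed by an optional min_retweets: filter. As a minimal sketch, the helper below assembles such a string in base R so that the terms and the threshold can be changed in one place. The build_query() function is hypothetical: it is not part of RTweet and does not appear in the original script.

```r
# Hypothetical helper, not part of rtweet: builds a search string
# in the same form as the query examples in the script.
build_query <- function(terms, min_retweets = NULL) {
  query <- terms
  if (!is.null(min_retweets)) {
    # A space must separate the terms from the min_retweets: filter
    query <- paste0(query, " min_retweets:", min_retweets)
  }
  query
}

# The script's default query: exact phrase plus a 25-retweet minimum
default_query <- build_query('"Joe Biden"', min_retweets = 25)

# An OR query with no retweet minimum
either_query <- build_query("data OR science")

print(default_query)
print(either_query)
```

The resulting string could then be supplied as the q argument, for example search_tweets(q = default_query, n = 18000).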

##############################################################
## Retrieving tweets that match a query
##############################################################

## Installing and loading packages

if (!require("rtweet")) install.packages("rtweet")
if (!require("httpuv")) install.packages("httpuv")
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("readr")) install.packages("readr")

library(rtweet)
library(httpuv)
library(tidyverse)
library(readr)

# Authentication

auth_setup_default()

# Getting tweets

# Query examples
#
# Both "data" AND "science":
# q = 'data science',
#
# Either "data" OR "science":
# q = 'data OR science',
#
# The exact phrase "data science":
# q = '"data science"',
#
# To specify a minimum retweet count (e.g., 25) for any of the above:
# q = 'data science min_retweets:25',
# q = 'data OR science min_retweets:25',
# q = '"data science" min_retweets:25',
# Change 25 to adjust the minimum number of retweets required.
# Change 18000 to adjust the maximum number of tweets to fetch.
# See search_tweets in the RTweet reference manual for more options

Tweets <- search_tweets(q = '"Joe Biden" min_retweets:25',
                         lang = "en",
                         include_rts = FALSE,
                         n = 18000)

# Getting details about the users who posted the tweets retrieved

Users <- users_data(Tweets)

# Subsetting and merging the Tweets and Users dataframes
# into "FinalData" dataframe, then deleting the component
# dataframes and lists

TweetsSubset <- c("created_at",
                  "id_str",
                  "full_text",
                  "retweet_count",
                  "favorite_count",
                  "lang")
UsersSubset <- c("name",
                 "screen_name",
                 "followers_count",
                 "location",
                 "description",
                 "verified")
Tweets <- data.frame(Tweets[TweetsSubset])
Users <- data.frame(Users[UsersSubset])
FinalData <- cbind(Tweets, Users)
rm("Tweets","Users","TweetsSubset","UsersSubset")

# Adding local (Central) timestamp and tweet URL
# to FinalData dataframe

FinalData$localtime <- as.POSIXct(FinalData$created_at,tz="GMT")
FinalData$localtime <- format(FinalData$localtime, tz = "America/Chicago", usetz = TRUE)
FinalData$URL <- paste("https://twitter.com/user/status/",FinalData$id_str, sep = "")
FinalData <- subset(FinalData, select = -c(id_str))

# Writing FinalData dataframe to .csv

write_excel_csv(FinalData, file = "FinalData.csv")
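
To see what the subsetting, merging, and timestamp steps do without contacting Twitter, the sketch below applies the same base R operations to a small made-up data set. The toy_tweets, toy_users, and toy_final objects and all of their values are invented for illustration; real results from search_tweets() and users_data() contain many more columns.

```r
# Made-up stand-ins for the Tweets and Users data frames
toy_tweets <- data.frame(
  created_at = as.POSIXct(c("2022-08-07 15:30:05", "2022-08-07 23:45:10"),
                          tz = "GMT"),
  id_str = c("1001", "1002"),
  full_text = c("First example tweet", "Second example tweet"),
  retweet_count = c(30L, 45L),
  stringsAsFactors = FALSE
)
toy_users <- data.frame(
  screen_name = c("example_user_a", "example_user_b"),
  followers_count = c(120L, 4500L),
  stringsAsFactors = FALSE
)

# cbind() pairs rows by position, so row i of each data frame
# must describe the same tweet
toy_final <- cbind(toy_tweets, toy_users)

# Convert the GMT timestamp to Central time and build the tweet URL
toy_final$localtime <- format(toy_final$created_at,
                              tz = "America/Chicago", usetz = TRUE)
toy_final$URL <- paste0("https://twitter.com/user/status/", toy_final$id_str)

# Drop the id column once the URL has been built
toy_final <- subset(toy_final, select = -c(id_str))
print(toy_final)
```

Because cbind() matches rows only by position, the approach relies on the user data coming back in the same order as the tweets.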

Getting most recent tweets from a user

See the “# To specify the target user:” comment in the script for instructions on specifying which Twitter user to retrieve tweets from. By default, the script fetches up to 3,500 tweets posted by President Joe Biden’s @JoeBiden Twitter account. The FinalData.csv file will be written to R’s current working directory, which is the script’s directory only if the working directory has been set there.
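
The script expects the account name without its leading “@”. As a small sketch, the helper below strips that character so a handle can be pasted in either form. The clean_handle() function is hypothetical: it is not part of RTweet or of the original script.

```r
# Hypothetical helper, not part of rtweet: removes surrounding
# whitespace and a single leading "@" from a Twitter handle.
clean_handle <- function(handle) {
  sub("^@", "", trimws(handle))
}

target_user <- clean_handle("@JoeBiden")
print(target_user)
```

The cleaned name could then be passed on, for example get_timeline(target_user, n = 3500).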

##############################################################
### Getting most recent tweets from a user
##############################################################

## Installing and loading packages

if (!require("rtweet")) install.packages("rtweet")
if (!require("httpuv")) install.packages("httpuv")
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("readr")) install.packages("readr")

library(rtweet)
library(httpuv)
library(tidyverse)
library(readr)

# Authentication

auth_setup_default()

# Getting tweets

# To specify the target user:
# Remove the "@" from the user's Twitter account name,
# then add the result to the line:
# UserTweets <- get_timeline("xxxxxx",
# in place of xxxxxx.
# Also: Change 3500 to adjust the maximum number
# of tweets to retrieve.

UserTweets <- get_timeline("JoeBiden",
                           n = 3500,
                           retryonratelimit = TRUE,
                           verbose = TRUE)

# Subsetting the retrieved tweets

UserTweetsSubset <- c("created_at",
                      "id_str",
                      "full_text",
                      "retweet_count",
                      "favorite_count",
                      "lang")
FinalData <- data.frame(UserTweets[UserTweetsSubset])
rm("UserTweets", "UserTweetsSubset")

# Adding local (Central) timestamp and tweet URL
# to FinalData dataframe

FinalData$localtime <- as.POSIXct(FinalData$created_at,tz="GMT")
FinalData$localtime <- format(FinalData$localtime, tz = "America/Chicago", usetz = TRUE)
FinalData$URL <- paste("https://twitter.com/user/status/",FinalData$id_str, sep = "")
FinalData <- subset(FinalData, select = -c(id_str))

# Writing FinalData dataframe to .csv

write_excel_csv(FinalData, file = "FinalData.csv")

Producing word counts

This script works with the FinalData data frame produced by either of the two scripts above, so run at least one of them in the same R session before running this one. The WordFrequency.csv file will be written to R’s current working directory, which is the script’s directory only if the working directory has been set there.

##############################################################
### Getting most common words in tweets
##############################################################

# Installing and loading required packages

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
library(tidyverse)
library(tidytext)
library(stringr) # Part of the tidyverse package

# Formatting tweets

remove_reg <- "&amp;|&lt;|&gt;"
tidy_tweets <- FinalData %>% 
  mutate(text = str_remove_all(full_text, remove_reg)) %>%
  unnest_tokens(word, text, token = "tweets") %>%
  filter(!word %in% stop_words$word,
         !word %in% str_remove_all(stop_words$word, "'"),
         str_detect(word, "[a-z]"))

# Generating, displaying, and saving word frequency counts

WordFrequency <- tidy_tweets %>% 
  count(word, sort = TRUE) 
WordFrequency
rm("tidy_tweets", "remove_reg")
write_excel_csv(WordFrequency, file = "WordFrequency.csv")
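
The counting idea can be tried on made-up text without any packages. The sketch below is a rough base R approximation, not the tidytext pipeline used in the script: it splits on non-letter characters rather than using a tweet-aware tokenizer, and the toy_data, toy_stop, and toy_counts objects are invented for illustration.

```r
# Made-up stand-in for the FinalData data frame
toy_data <- data.frame(
  full_text = c("Data science is fun &amp; useful",
                "Science needs data"),
  stringsAsFactors = FALSE
)

# Remove the same HTML entities the script removes
clean_text <- gsub("&amp;|&lt;|&gt;", "", toy_data$full_text)

# Lowercase, then split on anything that is not a letter, digit,
# or apostrophe
words <- unlist(strsplit(tolower(clean_text), "[^a-z0-9']+"))

# Keep non-empty tokens that contain at least one letter
words <- words[nzchar(words) & grepl("[a-z]", words)]

# Tiny hypothetical stop-word list (tidytext's stop_words is far larger)
toy_stop <- c("is", "the", "a", "and")
words <- words[!words %in% toy_stop]

# Count each word and sort from most to least frequent
counts <- sort(table(words), decreasing = TRUE)
toy_counts <- data.frame(word = names(counts),
                         n = as.integer(counts),
                         stringsAsFactors = FALSE)
print(toy_counts)
```

On real tweets, the tidytext version in the script remains the better choice because its stop-word list is much more complete.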

Ken Blake

2022-08-07
