Below are three distinct but complementary R scripts for retrieving and analyzing Twitter content using RTweet version 1.0.2.
The “Retrieving tweets that match a query” script fetches recent tweets that match a set of customizable search terms, stores the tweets’ content and selected metadata in a data frame named FinalData, and exports the results to a comma-separated value file named FinalData.csv. The code can specify a maximum number of tweets to be fetched and a minimum number of times a tweet must have been retweeted before it is included in the fetch.
The “Getting most recent tweets from a user” script fetches up to a customizable number of tweets posted by a specified Twitter user. The fetch begins with the most recent tweet, then works backwards in time until the specified number of tweets has been retrieved. Like the first script, this one stores the content of the tweets and selected metadata in a data frame named FinalData and also in a comma-separated value file named FinalData.csv.
Finally, the “Getting most common words in tweets” script produces a list of the words contained in the full_text field of the FinalData data frame and the number of times each word appears. The words and their frequencies are stored in a data frame named WordFrequency and also in a comma-separated value file named WordFrequency.csv. This script can help identify themes in the tweets retrieved by either of the first two scripts.
The first two scripts require a one-time login to a standard Twitter account, using a Twitter profile name and password. These credentials will be stored in a file on the user’s computer. The script will report the file name and path, in case the user wants to remove the credentials. RTweet does not require users to have developer access to the Twitter API.
See https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview for descriptions of the Twitter variables retrieved by the first two scripts.
See “# Query examples” in the script for instructions on customizing the Twitter search terms. By default, the script searches for up to 18,000 tweets that mention “Joe Biden” and that have been retweeted 25 times or more. The FinalData.csv file will be stored on the user’s computer, in the same directory as the script. The 18,000-tweet maximum and 25-retweet minimum can both be adjusted as needed.
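Because the search terms and the retweet threshold live in a single query string, that string can also be assembled from variables. The following is a minimal sketch only; the helper name build_query and its arguments are hypothetical and do not appear in the script or in RTweet.

```r
# Hypothetical helper (not part of the script or of rtweet): builds a
# search string in the same form as the script's default query
build_query <- function(search_phrase, min_retweets) {
  # Wrap the phrase in double quotes so it is matched as an exact phrase,
  # then append the minimum-retweet operator
  paste0('"', search_phrase, '" min_retweets:', min_retweets)
}

query <- build_query("Joe Biden", 25)
cat(query, "\n")  # "Joe Biden" min_retweets:25
```

The resulting string could then be supplied as the q argument of search_tweets() in place of the hard-coded query.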
##############################################################
## Retrieving tweets that match a query
##############################################################
## Installing and loading packages
if (!require("rtweet")) install.packages("rtweet")
if (!require("httpuv")) install.packages("httpuv")
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("readr")) install.packages("readr")
library(rtweet)
library(httpuv)
library(tidyverse)
library(readr)
# Authentication
auth_setup_default()
# Getting tweets
# Query examples
#
# Both "data" AND "science":
# q = 'data science',
#
# Either "data" OR "science":
# q = 'data OR science',
#
# The exact phrase "data science":
# q = '"data science"',
#
# To specify a minimum retweet count (e.g., 25) for any of the above:
# q = 'data science min_retweets:25',
# q = 'data OR science min_retweets:25',
# q = '"data science" min_retweets:25',
# Change 25 to adjust the minimum number of retweets required.
# Change 18000 to adjust the maximum number of tweets to fetch.
# See search_tweets in the RTweet reference manual for more options
Tweets <- search_tweets(q = '"Joe Biden" min_retweets:25',
                        lang = "en",
                        include_rts = FALSE,
                        n = 18000)
# Getting details about the users who posted the tweets retrieved
Users <- users_data(Tweets)
# Subsetting and merging the Tweets and Users dataframes
# into "FinalData" dataframe, then deleting the component
# dataframes and lists
TweetsSubset <- c("created_at",
                  "id_str",
                  "full_text",
                  "retweet_count",
                  "favorite_count",
                  "lang")
UsersSubset <- c("name",
                 "screen_name",
                 "followers_count",
                 "location",
                 "description",
                 "verified")
Tweets <- data.frame(Tweets[TweetsSubset])
Users <- data.frame(Users[UsersSubset])
FinalData <- cbind(Tweets, Users)
rm("Tweets","Users","TweetsSubset","UsersSubset")
# Adding local (Central) timestamp and tweet URL
# to FinalData dataframe
FinalData$localtime <- as.POSIXct(FinalData$created_at,tz="GMT")
FinalData$localtime <- format(FinalData$localtime, tz = "America/Chicago", usetz = TRUE)
FinalData$URL <- paste("https://twitter.com/user/status/",FinalData$id_str, sep = "")
FinalData <- subset(FinalData, select = -c(id_str))
# Writing FinalData dataframe to .csv
write_excel_csv(FinalData, file = "FinalData.csv")
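The last steps of the script, converting the timestamp to Central time and building a link to each tweet, can be tried in isolation. This is a small sketch using made-up sample values (the timestamp and ID below are invented for illustration, not real tweet data).

```r
# Made-up sample values standing in for one row's created_at and id_str
created_at <- as.POSIXct("2022-09-01 18:30:00", tz = "GMT")
id_str <- "1234567890"

# Render the same instant in U.S. Central time, with the zone abbreviation
localtime <- format(created_at, tz = "America/Chicago", usetz = TRUE)

# Build the tweet URL by appending the ID to the fixed prefix
url <- paste0("https://twitter.com/user/status/", id_str)

cat(localtime, "\n")  # 2022-09-01 13:30:00 CDT
cat(url, "\n")        # https://twitter.com/user/status/1234567890
```

To report times in a different zone, another time zone name such as "America/New_York" can be substituted for "America/Chicago".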
See “# To specify the target user:” in the script for instructions on specifying which Twitter user to retrieve tweets from. By default, the script searches for up to 3,500 tweets posted by President Joe Biden’s @JoeBiden Twitter account. The FinalData.csv file will be stored on the user’s computer, in the same directory as the script.
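The script expects the account name without its leading "@". As a small sketch, that step could be automated with a helper; the name clean_handle is hypothetical and is not part of RTweet.

```r
# Hypothetical helper (not part of rtweet): removes a single leading "@"
# and leaves names that have no "@" prefix unchanged
clean_handle <- function(handle) {
  sub("^@", "", handle)
}

target_user <- clean_handle("@JoeBiden")
cat(target_user, "\n")  # JoeBiden
```

The cleaned name could then be passed as the first argument of get_timeline().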
##############################################################
### Getting most recent tweets from a user
##############################################################
## Installing and loading packages
if (!require("rtweet")) install.packages("rtweet")
if (!require("httpuv")) install.packages("httpuv")
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("readr")) install.packages("readr")
library(rtweet)
library(httpuv)
library(tidyverse)
library(readr)
# Authentication
auth_setup_default()
# Getting tweets
# To specify the target user:
# Remove the "@" from the user's Twitter account name,
# then add the result to the line:
# UserTweets <- get_timeline("xxxxxx",
# in place of xxxxxx.
# Also: Change 3500 to adjust the maximum number
# of tweets to retrieve.
UserTweets <- get_timeline("JoeBiden",
                           n = 3500,
                           retryonratelimit = TRUE,
                           verbose = TRUE)
# Subsetting the retrieved tweets
UserTweetsSubset <- c("created_at",
                      "id_str",
                      "full_text",
                      "retweet_count",
                      "favorite_count",
                      "lang")
FinalData <- data.frame(UserTweets[UserTweetsSubset])
rm("UserTweets", "UserTweetsSubset")
# Adding local (Central) timestamp and tweet URL
# to FinalData dataframe
FinalData$localtime <- as.POSIXct(FinalData$created_at,tz="GMT")
FinalData$localtime <- format(FinalData$localtime, tz = "America/Chicago", usetz = TRUE)
FinalData$URL <- paste("https://twitter.com/user/status/",FinalData$id_str, sep = "")
FinalData <- subset(FinalData, select = -c(id_str))
# Writing FinalData dataframe to .csv
write_excel_csv(FinalData, file = "FinalData.csv")
This script will work with the FinalData data frame produced by either of the two scripts above. Run at least one of those two scripts before running this one. The WordFrequency.csv file will be stored in the same directory as the script.
##############################################################
### Getting most common words in tweets
##############################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
library(tidyverse)
library(tidytext)
library(stringr) # Part of the tidyverse package
# Formatting tweets
# Pattern matching the HTML-escaped characters that appear in tweet text
remove_reg <- "&amp;|&lt;|&gt;"
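tidy_tweets <- FinalData %>%
  mutate(text = str_remove_all(full_text, remove_reg)) %>%
  # Note: newer tidytext releases no longer support token = "tweets";
  # if this step fails, drop the token argument to use the default word tokenizer
  unnest_tokens(word, text, token = "tweets") %>%
  filter(!word %in% stop_words$word,
         !word %in% str_remove_all(stop_words$word, "'"),
         str_detect(word, "[a-z]"))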
# Generating, displaying, and saving word frequency counts
WordFrequency <- tidy_tweets %>%
  count(word, sort = TRUE)
WordFrequency
rm("tidy_tweets", "remove_reg")
write_excel_csv(WordFrequency, file = "WordFrequency.csv")
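To see what the word-count step does without retrieving any tweets, the same idea can be sketched in base R on a toy data frame. Everything here is an assumption made for illustration: the three sample texts, the tiny stop-word list (tidytext's stop_words is far larger), and the names toy and toy_frequency. It is a simplified stand-in, not the tidytext pipeline itself.

```r
# Toy stand-in for FinalData; the real data frame comes from the scripts above
toy <- data.frame(full_text = c("Data science is fun",
                                "Science needs data",
                                "Fun with data"),
                  stringsAsFactors = FALSE)

# Lowercase the text and split it into words on anything that is not a letter
words <- unlist(strsplit(tolower(toy$full_text), "[^a-z']+"))
words <- words[nzchar(words)]

# Tiny illustrative stop-word list (an assumption, not tidytext's list)
stops <- c("is", "with", "the", "a")
words <- words[!words %in% stops]

# Count each remaining word and sort from most to least frequent
counts <- sort(table(words), decreasing = TRUE)
toy_frequency <- data.frame(word = names(counts),
                            n = as.integer(counts),
                            stringsAsFactors = FALSE)
print(toy_frequency)
```

In this toy data, "data" appears in all three texts and so lands at the top of the table, which is the kind of signal the full script surfaces across thousands of tweets.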