How to get Twitter data with rtweet in R

Install and load packages (and vignettes for further documentation)

#devtools::install_github("mkearney/rtweet") # latest working version of rtweet; this is the preferred version to use
#packageVersion("rtweet")

#install.packages("tidyverse")
library(rtweet)
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.5
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter()  masks stats::filter()
## x purrr::flatten() masks rtweet::flatten()
## x dplyr::lag()     masks stats::lag()
library(knitr)

## quick overview of rtweet functions
## vignette("intro", package = "rtweet")
## working with the stream
## vignette("stream", package = "rtweet")
## troubleshooting
## vignette("FAQ", package = "rtweet")

Twitter API

Main Steps:

1) Apply for Twitter API access at: https://developer.twitter.com/en/apply-for-access (a Twitter account is required to go through the process)

Fill out the application form by answering the following questions:

a) The core use case, intent, or business purpose for your use of the Twitter APIs

b) If you intend to analyze Tweets, Twitter users, or their content, share details about the analyses you plan to conduct and the methods or techniques

c) If your use involves Tweeting, Retweeting, or liking content, share how you will interact with Twitter users or their content

d) If you’ll display Twitter content off of Twitter, explain how and where Tweets and Twitter content will be displayed to users of your product or service, including whether Tweets and Twitter content will be displayed at row level or aggregated

2) Create a Twitter App and obtain access tokens

See the auth vignette (https://rtweet.info/articles/auth.html) for instructions on obtaining access to Twitter’s APIs

3) Create a token in R. The commands are commented out and the keys masked because this step depends on the individual Twitter app created

## appname <- "twitter_analysis"
## key <- "12345678901234567890"
## secret <- "12345678901234567890abcdefghijk"
# create token named "twitter_token"
## twitter_token <- rtweet::create_token(app = appname,
##                                       consumer_key = key,
##                                       consumer_secret = secret)

Creating the token in R opens an interactive authentication step in the browser. Click the “Authorize the app” button to finalize the process
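Rather than typing the key and secret into a script, they can be read from environment variables. The sketch below is one possible approach, not part of rtweet: the helper read_twitter_keys() and the variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are hypothetical and can be renamed freely. The create_token() call stays commented out because it is interactive.

```r
# Hypothetical helper: read the app credentials from environment variables
# instead of hard-coding them. The variable names are assumptions; define
# them yourself (for example in .Renviron) under any names you like.
read_twitter_keys <- function(key_var = "TWITTER_CONSUMER_KEY",
                              secret_var = "TWITTER_CONSUMER_SECRET") {
  key <- Sys.getenv(key_var, unset = "")
  secret <- Sys.getenv(secret_var, unset = "")
  # Fail early with a clear message if either value is missing
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set ", key_var, " and ", secret_var, " before creating a token.",
         call. = FALSE)
  }
  list(key = key, secret = secret)
}

# Usage (commented out because it opens a browser for authorization):
# keys <- read_twitter_keys()
# twitter_token <- rtweet::create_token(app = "twitter_analysis",
#                                       consumer_key = keys$key,
#                                       consumer_secret = keys$secret)
```

Keeping the secrets out of the script means the file can be shared or committed without masking anything.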

Save twitter_token in your home directory. The commands are commented out because the token is created interactively and cannot be reproduced within an Rmd file

# path of home directory 
##home_directory <- "C:/DATA/R Working Dir"
# combine with name for token
##file_name <- file.path(home_directory,
##                       "twitter_token.rds")
# save token to home directory
##saveRDS(twitter_token, file = file_name)

# assuming you followed the procedures to create "file_name"
# from the previous code chunk, the code below should
# create and save your environment variable.
# (the trailing "\n" keeps the entry on its own complete line)
##cat(paste0("TWITTER_PAT=", file_name, "\n"),
##    file = file.path(home_directory, ".Renviron"),
##    append = TRUE)
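To check that a saved token can be found again through the environment variable, the round trip can be rehearsed in a temporary directory. This is a minimal sketch using only base R. The variable name DEMO_TWITTER_PAT and the placeholder token object are assumptions made so that a real TWITTER_PAT is not overwritten. With a real setup the variable would be TWITTER_PAT and the object would come from create_token().

```r
# Work in a temporary directory so no real files are touched
demo_dir <- tempfile("renviron_demo_")
dir.create(demo_dir)
# Forward slashes avoid backslash handling issues in .Renviron on Windows
demo_dir <- normalizePath(demo_dir, winslash = "/")

# Placeholder standing in for a real token object
fake_token <- list(app = "twitter_analysis")
token_path <- file.path(demo_dir, "twitter_token.rds")
saveRDS(fake_token, file = token_path)

# Append the variable to a demo .Renviron, ending the line with a newline
renviron_path <- file.path(demo_dir, ".Renviron")
cat(paste0("DEMO_TWITTER_PAT=", token_path, "\n"),
    file = renviron_path, append = TRUE)

# Load the file into the current session and read the token back
readRenviron(renviron_path)
pat <- Sys.getenv("DEMO_TWITTER_PAT")
restored <- readRDS(pat)
identical(restored, fake_token)
```

In a real session R reads .Renviron at startup, so a restart (or a call to readRenviron()) is needed before the variable becomes visible.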

Start collecting and analyzing some Twitter data

## search for 3000 tweets using the rstats hashtag
rt <- rtweet::search_tweets("#rstats", n = 3000, include_rts = FALSE)
## preview tweets data
rt %>% dplyr::glimpse(10)
## Observations: 2,865
## Variables: 88
## $ user_id                 <chr> ...
## $ status_id               <chr> ...
## $ created_at              <dttm> ...
## $ screen_name             <chr> ...
## $ text                    <chr> ...
## $ source                  <chr> ...
## $ display_text_width      <dbl> ...
## $ reply_to_status_id      <chr> ...
## $ reply_to_user_id        <chr> ...
## $ reply_to_screen_name    <chr> ...
## $ is_quote                <lgl> ...
## $ is_retweet              <lgl> ...
## $ favorite_count          <int> ...
## $ retweet_count           <int> ...
## $ hashtags                <list> ...
## $ symbols                 <list> ...
## $ urls_url                <list> ...
## $ urls_t.co               <list> ...
## $ urls_expanded_url       <list> ...
## $ media_url               <list> ...
## $ media_t.co              <list> ...
## $ media_expanded_url      <list> ...
## $ media_type              <list> ...
## $ ext_media_url           <list> ...
## $ ext_media_t.co          <list> ...
## $ ext_media_expanded_url  <list> ...
## $ ext_media_type          <chr> ...
## $ mentions_user_id        <list> ...
## $ mentions_screen_name    <list> ...
## $ lang                    <chr> ...
## $ quoted_status_id        <chr> ...
## $ quoted_text             <chr> ...
## $ quoted_created_at       <dttm> ...
## $ quoted_source           <chr> ...
## $ quoted_favorite_count   <int> ...
## $ quoted_retweet_count    <int> ...
## $ quoted_user_id          <chr> ...
## $ quoted_screen_name      <chr> ...
## $ quoted_name             <chr> ...
## $ quoted_followers_count  <int> ...
## $ quoted_friends_count    <int> ...
## $ quoted_statuses_count   <int> ...
## $ quoted_location         <chr> ...
## $ quoted_description      <chr> ...
## $ quoted_verified         <lgl> ...
## $ retweet_status_id       <chr> ...
## $ retweet_text            <chr> ...
## $ retweet_created_at      <dttm> ...
## $ retweet_source          <chr> ...
## $ retweet_favorite_count  <int> ...
## $ retweet_retweet_count   <int> ...
## $ retweet_user_id         <chr> ...
## $ retweet_screen_name     <chr> ...
## $ retweet_name            <chr> ...
## $ retweet_followers_count <int> ...
## $ retweet_friends_count   <int> ...
## $ retweet_statuses_count  <int> ...
## $ retweet_location        <chr> ...
## $ retweet_description     <chr> ...
## $ retweet_verified        <lgl> ...
## $ place_url               <chr> ...
## $ place_name              <chr> ...
## $ place_full_name         <chr> ...
## $ place_type              <chr> ...
## $ country                 <chr> ...
## $ country_code            <chr> ...
## $ geo_coords              <list> ...
## $ coords_coords           <list> ...
## $ bbox_coords             <list> ...
## $ status_url              <chr> ...
## $ name                    <chr> ...
## $ location                <chr> ...
## $ description             <chr> ...
## $ url                     <chr> ...
## $ protected               <lgl> ...
## $ followers_count         <int> ...
## $ friends_count           <int> ...
## $ listed_count            <int> ...
## $ statuses_count          <int> ...
## $ favourites_count        <int> ...
## $ account_created_at      <dttm> ...
## $ verified                <lgl> ...
## $ profile_url             <chr> ...
## $ profile_expanded_url    <chr> ...
## $ account_lang            <chr> ...
## $ profile_banner_url      <chr> ...
## $ profile_background_url  <chr> ...
## $ profile_image_url       <chr> ...
## plot time series
ts_plot(rt) +
  ggplot2::theme_minimal() +
  ggplot2::theme(plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #rstats Twitter statuses from past 9 days",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
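The time series plot is built from counts of tweets per time interval. The base R sketch below shows that kind of aggregation by hand on a few made-up timestamps (illustrative values, not collected data). It is meant as a rough picture of the idea, not as the code ts_plot() runs internally.

```r
# Made-up timestamps standing in for the created_at column
created_at <- as.POSIXct(
  c("2018-10-01 08:15:00", "2018-10-01 21:40:00", "2018-10-02 09:05:00",
    "2018-10-04 13:30:00", "2018-10-04 13:45:00", "2018-10-04 23:59:00"),
  tz = "UTC"
)

# Truncate each timestamp to its calendar day (UTC) and count per day
day <- as.Date(created_at, tz = "UTC")
daily <- as.data.frame(table(day), stringsAsFactors = FALSE)
names(daily) <- c("day", "n")
daily
```

Days without any tweets (2018-10-03 in this toy input) do not appear in the counts, which is worth keeping in mind when reading gaps in a plot.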

Maps

## search for 1000 tweets sent from the US
rt <- search_tweets(
  "lang:en", geocode = lookup_coords("usa"), n = 1000
)

## create lat/lng variables using all available tweet and profile geo-location data
rt <- lat_lng(rt)

## plot state boundaries
par(mar = c(0, 0, 0, 0))
maps::map("state", lwd = .25)

## plot lat and lng points onto state map
with(rt, points(lng, lat, pch = 20, cex = .75, col = rgb(0, .3, .7, .75)))
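Many tweets carry no usable location, so the lat and lng columns are likely to contain missing values, and some points may fall outside the area being drawn. The sketch below filters a toy data frame before plotting. The coordinates are illustrative, and the bounding box for the contiguous US is an approximate assumption rather than an official boundary.

```r
# Toy stand-in for the output of lat_lng(); values are illustrative only
pts <- data.frame(
  screen_name = c("user_a", "user_b", "user_c", "user_d"),
  lat = c(39.74, NA, 51.51, 37.77),
  lng = c(-104.99, NA, -0.13, -122.42),
  stringsAsFactors = FALSE
)

# Approximate bounding box for the contiguous US (assumed values)
in_box <- !is.na(pts$lat) & !is.na(pts$lng) &
  pts$lat >= 24 & pts$lat <= 50 &
  pts$lng >= -125 & pts$lng <= -66

# Keep only rows with coordinates inside the box
us_pts <- pts[in_box, ]
us_pts
```

The filtered data frame can then be passed to points() in the same way as rt above.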

Streaming tweets

## stream tweets mentioning Machine Learning for 60 seconds
stream_tweets(
  "Machine Learning",
  timeout = 60,
  file_name = "tweetsaboutml.json",
  parse = FALSE
)
## Streaming tweets for 60 seconds...
## Finished streaming tweets!
## streaming data saved as tweetsaboutml.json
## read in the data as a tidy tbl data frame
mlt <- parse_stream("tweetsaboutml.json")
## opening file input connection.
## Found 5 records...
## Imported 5 records. Simplifying...
## closing file input connection.
select(mlt, location, text) %>% kable()
|location                   |text |
|:--------------------------|:----|
|Golden, CO                 |Amazon debuts a scale model autonomous car to teach developers machine learning. https://t.co/99y2DGyQVI |
|Dortmund, Deutschland      |Microsoft and @Nielsen aim to change the #CPG industry by bringing data and machine learning together. Read more on @Forbes: https://t.co/1WFK103jf1 https://t.co/pEe6lxsy9A |
|HollyWood, CA              |Amazon debuts a scale model autonomous car to teach developers machine learning - https://t.co/SBTAyOwNcB https://t.co/0kLqPyMR4W |
|<U+093F>,                  |“Pattern Recognition and Machine Learning” by Dr. @ChrisBishopMSFT is now available for free. Download today for your introduction to the fields of pattern recognition and machine learning: https://t.co/w0GwuuGPlo |
|San Francisco Bay Area, CA |The GANfather @goodfellow_ian talking about Adversarial Machine Learning @Perimeter. The ML and Theoretical Physics worlds collide! https://t.co/8zn6jUiJSi |
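The streamed text includes shortened t.co links, which usually get in the way of text analysis. A minimal base R sketch for removing them follows. The helper strip_urls() is a hypothetical name, not an rtweet function, and the two input strings are copied from the output above.

```r
# Two tweet texts taken from the streamed sample above
text <- c(
  "Amazon debuts a scale model autonomous car to teach developers machine learning. https://t.co/99y2DGyQVI",
  "The ML and Theoretical Physics worlds collide! https://t.co/8zn6jUiJSi"
)

# Hypothetical helper: drop http(s) links, collapse repeated whitespace,
# and trim leading and trailing spaces
strip_urls <- function(x) {
  x <- gsub("https?://\\S+", "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

clean <- strip_urls(text)
clean
```

The same function could be applied to mlt$text before tokenizing or counting words.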

Timelines

## get up to 3000 of the most recent tweets from each of cnn, BBCWorld, and foxnews
tmls <- get_timelines(c("cnn", "BBCWorld", "foxnews"), n = 3000)

## plot the frequency of tweets for each user over time
tmls %>%
  dplyr::filter(created_at > "2018-10-01") %>%
  dplyr::group_by(screen_name) %>%
  ts_plot("days", trim = 1L) +
  ggplot2::geom_point() +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    legend.title = ggplot2::element_blank(),
    legend.position = "bottom",
    plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of Tweets posted by news organization",
    subtitle = "Tweet counts aggregated by day from October 2018",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
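The pipeline above filters on created_at by comparing against a character string and then counts tweets per user and day. The base R sketch below spells out the same two steps on made-up rows (illustrative, not collected data), using an explicit POSIXct cut-off in UTC so that no implicit string conversion is involved. Fixing the time zone to UTC is an assumption made for the example.

```r
# Made-up timeline rows standing in for the output of get_timelines()
tl <- data.frame(
  screen_name = c("cnn", "cnn", "BBCWorld", "foxnews", "foxnews", "foxnews"),
  created_at = as.POSIXct(
    c("2018-09-30 23:50:00", "2018-10-01 10:00:00", "2018-10-01 12:00:00",
      "2018-10-02 08:00:00", "2018-10-02 09:30:00", "2018-09-15 07:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Keep tweets posted after the start of October 2018 (UTC)
cutoff <- as.POSIXct("2018-10-01 00:00:00", tz = "UTC")
recent <- tl[tl$created_at > cutoff, ]

# Count tweets per user and calendar day
recent$day <- as.Date(recent$created_at, tz = "UTC")
recent$n <- 1L
counts <- aggregate(n ~ screen_name + day, data = recent, FUN = sum)
counts
```

Each row of counts corresponds to one point on the per-user lines in the plot.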