This R Markdown document is adapted from the documentation for the rtweet package at https://www.rdocumentation.org/packages/rtweet/versions/0.7.0

rtweet

rtweet provides users a range of functions designed to extract data from Twitter’s REST and streaming APIs.

Installation

To get the current released version from CRAN:

#install.packages("httpuv")
library(httpuv)

## install rtweet from CRAN
#install.packages("rtweet")

## load rtweet package
library(rtweet)

API authorization

All users must be authorized to interact with Twitter’s APIs. Authorizing through your own Twitter app gives you more stable access and broader permissions.
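A minimal sketch of the flow detailed in the sections below (the app name and keys here are placeholders): create a token once from your app’s credentials, then pass it to any rtweet function via the token argument.

## placeholder credentials -- substitute your own from apps.twitter.com
token <- create_token(
  app = "my_app_name",
  consumer_key = "YOUR_API_KEY",
  consumer_secret = "YOUR_API_SECRET"
)

## every rtweet call can then take the token explicitly
rt <- search_tweets("rstats", n = 100, token = token)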

Package features

Search tweets

Search for up to 18,000 (non-retweeted) tweets containing the Election2020 hashtag.

## search for 18,000 tweets using the Election2020 hashtag

api_key <- "hQEd7o3rkkgCyW40y*********"
api_secret <- "8j3WOcBztObXEuAtRo7RZu37**********************"

## authentication via web browser
token <- create_token(
  app = "BDM2020", # the app name can be anything
  consumer_key = api_key, 
  consumer_secret = api_secret
  ) # each person is issued a different key and secret, so copy and paste your own codes here
token
token

rt <- search_tweets(
  "#Election2020", n = 18000, include_rts = FALSE, token = token
)

Rate Limit

Twitter rate limits cap the number of search results returned to 18,000 every 15 minutes. To request more than that, simply set retryonratelimit = TRUE and rtweet will wait for rate limit resets for you.

## search for 250,000 tweets containing the Election2020 hashtag
rt <- search_tweets(
  "#Election2020", n = 250000, include_rts = FALSE, token = token, retryonratelimit = TRUE
)

Quick overview of rtweet package

## quick overview of rtweet functions
vignette("intro", package = "rtweet")

Creating a Twitter App

  • To use the rtweet package, we need to obtain Twitter access tokens from apps.twitter.com. To do so, we will create a new app by providing a Name, Description, and Website of our choosing.
  • Important: in the Callback URL field, make sure to enter http://127.0.0.1:1410
  • Check yes if you agree and then click “Create your Twitter application”
  • Once you’ve successfully created an app, click the tab labeled Settings and uncheck ‘Enable Callback Locking’
  • Then, click the tab labeled Keys and Access Tokens to retrieve your consumer key (API Key) and secret (API Secret)
  • Copy and paste these keys into your R script file and assign them to objects like I’ve done in the code below.
  • Once the keys are read into R, use create_token() and store the output as twitter_token.
  • Finally, log in to your own Twitter account and authorize the rtweet application when the browser window opens.
# Whatever name you assigned to your created app
appname <- "rtweet_token_shin"

# Your own API key
## My own is below
key <- "8RuTxOXIoICVuXZf5tfBjBqwd"

# Your own API secret
## My own is below
secret <- "tYBzWwqGbzMo8VDUDm13WAPOHDfJ9KlqBdEQPAtNg0NtMJPTDU"

# Create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret)

# Check it is stored and working
Sys.getenv("TWITTER_PAT")
get_token()

Note that it is possible to create multiple Twitter apps, resulting in multiple tokens. Twitter regulates the maximum number of requests we can make per 15-minute window (18,000 tweets for search), and abusing these rate limits can result in Twitter revoking your API access entirely. However, Twitter does allow users multiple tokens as long as each token is used for a unique purpose…
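Before launching a large search, you can check how much of the current 15-minute allowance remains with rtweet’s rate_limit() function; "search/tweets" below is the endpoint that search_tweets() uses.

## remaining requests for the search endpoint
rate_limit(twitter_token, query = "search/tweets")

## or list the limits for every endpoint at once
rate_limit(twitter_token)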

So, at this point, we technically have enough to start using rtweet functions; all we need to do is set the token argument to twitter_token (the token object we just created). Later, though, we may want to save the token as an environment variable that points to a saved copy in our home directory, so it is loaded automatically in future sessions.
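A sketch of that setup, following the pattern in rtweet’s authorization vignette (the file name below is arbitrary): save the token to the home directory, then point the TWITTER_PAT variable at it in .Renviron so future sessions pick it up automatically.

## save the token in the home directory (file name is arbitrary)
token_path <- file.path(path.expand("~"), ".rtweet_token.rds")
saveRDS(twitter_token, file = token_path)

## append TWITTER_PAT to .Renviron so every new session finds the token
cat(paste0("TWITTER_PAT=", token_path, "\n"),
    file = file.path(path.expand("~"), ".Renviron"), append = TRUE)

## restart R, then confirm with Sys.getenv("TWITTER_PAT") and get_token()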


Search tweets and compare their frequency

If we wanted to compare topics of tweets, we could conduct two searches for tweets over the same time period and then compare the frequency of tweets for each topic over time as a time series. That’s what I’ve done in the example below.

First, I searched for tweets mentioning “Donald Trump” and “Joe Biden”. We can retrieve up to 18,000 (non-retweeted) tweets per 15 minutes; the search returns Twitter statuses containing the terms from the past 6-9 days.

Be careful when setting the number of tweets to be extracted: as noted above, Twitter caps search results at 18,000 every 15 minutes, so requesting more requires retryonratelimit = TRUE.

# Search for 18,000 tweets containing 'Donald Trump' written in English from the U.S., using 'lang' and 'geocode'
dt_rtweet <- search_tweets("Donald Trump", n = 18000, include_rts = FALSE, lang = "en",
                           geocode = lookup_coords("usa"))
save(dt_rtweet,file="dt_rtweet.RData")
load("dt_rtweet.RData")

# Search for 18,000 tweets containing 'Joe Biden' in English
jb_rtweet <- search_tweets("Joe Biden", n = 18000, include_rts = FALSE, lang = "en",
                           geocode = lookup_coords("usa"))
save(jb_rtweet, file="jb_rtweet.RData")
load("jb_rtweet.RData")

## Search for 250,000 #BTS tweets, using 'retryonratelimit', from the U.S.
bts_rtweet_more <- search_tweets("#BTS", n = 250000, include_rts = FALSE, lang = "en",
                                 geocode = lookup_coords("usa"), retryonratelimit = TRUE)

Let’s visualize the frequency of tweets over time

library(dplyr)
library(lubridate)
library(ggplot2)
# First, we will combine Trump and Biden tweets into one data frame
tweets_all <- bind_rows(dt_rtweet %>% 
                          mutate(entity="Donald Trump"),
                        jb_rtweet %>% 
                          mutate(entity="Joe Biden")) 

# Next, we will aggregate tweets into one-hour bins and count the number of tweets per entity in each hour
tweets_hours <- tweets_all %>% 
  mutate(time_floor = floor_date(created_at, unit = "hour")) %>% 
  count(entity, time_floor)

# Now, we are ready to visualize the time-series data counting tweets by hours
tweets_hours %>%
  ggplot(aes(x=time_floor, y=n, color=entity)) +
  geom_line() +
  theme_bw() +
  labs(x = NULL, y = "Hourly Sum",
       title = "Tracing topic salience of Donald Trump and Joe Biden on Twitter",
       subtitle = "Tweets were aggregated in 1-hour intervals. Retweets were excluded.")

Visualization of tweets’ geo-locations

# We can also visualize #BTS tweets by geo-location over the world.
bts_tweets_rtweet <- search_tweets("#BTS", n = 18000, include_rts = FALSE) # Gather tweets from anywhere

# And create lat (latitude) and lng (longitude) variables using all available geolocation info.
bts_tweets_geo <- lat_lng(bts_tweets_rtweet) %>% 
  filter(!is.na(lng) & !is.na(lat)) %>%  # keep only tweets with both coordinates
  group_by(lng, lat) %>% 
  summarise(sum = n())
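Only a small fraction of tweets carry usable coordinates, so before mapping it is worth a quick sanity check on how much data survived the filter (plain base R, nothing rtweet-specific):

## how many distinct coordinate pairs, and how many geocoded tweets in total?
nrow(bts_tweets_geo)
sum(bts_tweets_geo$sum)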

#install.packages("ggmap")
library(ggmap)

# Mapping Asia
asiamap <- get_map(location="Asia", zoom=3)
ggmap(asiamap) +
    geom_point(aes(x=lng, y=lat, size=sum), data=bts_tweets_geo, colour="tomato", alpha=0.5)

# Mapping Europe
eumap <- get_map(location="Europe", zoom=4)
ggmap(eumap) +
    geom_point(aes(x=lng, y=lat, size=sum), data=bts_tweets_geo, colour="tomato", alpha=0.5)

# Mapping US
usmap <- get_map(location="US", zoom=4)
ggmap(usmap) + 
    geom_point(aes(x=lng, y=lat, size=sum), data=bts_tweets_geo, colour="tomato", alpha=0.5)
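If you do not have a Google Maps key registered for ggmap, a base-graphics alternative adapted from the rtweet README uses the maps package, which needs no API key.

# World map with the maps package (no API key required)
library(maps)
par(mar = c(0, 0, 0, 0))
maps::map("world", lwd = 0.25)
with(bts_tweets_geo, points(lng, lat, pch = 20, cex = 0.75, col = rgb(1, 0.4, 0.3, 0.5)))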