This R Markdown is based on the documentation for the rtweet package at https://www.rdocumentation.org/packages/rtweet/versions/0.7.0. rtweet provides users with a range of functions designed to extract data from Twitter's REST and streaming APIs.
To get the current released version from CRAN:
#install.packages("httpuv")
library(httpuv)
## install rtweet from CRAN
#install.packages("rtweet")
## load rtweet package
library(rtweet)
Search for up to 18,000 (non-retweeted) tweets containing the Election2020 hashtag.
api_key = "hQEd7o3rkkgCyW40y*********"
api_secret = "8j3WOcBztObXEuAtRo7RZu37**********************"
## authentication via web browser
token <- create_token(
app = "BDM2020", # App의 이름은 어떤것이라도 상관없습니다
consumer_key = api_key,
consumer_secret = api_secret
) # the key and secret codes are different for everyone, so copy and paste your own
token
rt <- search_tweets(
"#Election2020", n = 18000, include_rts = FALSE, token = token
)
Twitter rate limits cap the number of search results returned to 18,000 every 15 minutes. To request more than that, simply set retryonratelimit = TRUE
and rtweet will wait for rate limit resets for you.
## search for 250,000 tweets containing the #Election2020 hashtag
rt <- search_tweets(
"#Election2020", n = 250000, include_rts = FALSE, token = token, retryonratelimit = TRUE
)
## quick overview of rtweet functions
vignette("intro", package = "rtweet")
To use the rtweet package, we need to obtain Twitter access tokens from apps.twitter.com. To do so, we will create a new app by providing a Name, Description, and Website of our choosing. In the Callback URL field, make sure to enter this: http://127.0.0.1:1410. Then, under Settings, uncheck 'Enable Callback Locking'. Finally, open Keys and Access Tokens to retrieve your consumer key (API Key) and secret (API Secret), pass them to create_token(), and store the output as twitter_token.
# Whatever name you assigned to your created app
appname <- "rtweet_token_shin"
# Your own API key
## My own is below
key <- "8RuTxOXIoICVuXZf5tfBjBqwd"
# Your own API secret
## My own is below
secret <- "tYBzWwqGbzMo8VDUDm13WAPOHDfJ9KlqBdEQPAtNg0NtMJPTDU"
# Create token named "twitter_token"
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret)
# Check it is stored and working
Sys.getenv("TWITTER_PAT")
get_token()
Note that it is possible to create multiple Twitter apps, resulting in multiple tokens. Twitter discourages abuse of its API, so it caps the number of requests we can make per 15 minutes (18,000 tweets for search). Abusing Twitter rate limits can even result in Twitter completely revoking your API access. However, Twitter does allow users multiple tokens as long as each token is used for a unique purpose…
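Because the limits are tracked per token, it can help to check how many search requests remain before launching a large query. A minimal sketch using rtweet's rate_limit() with the token we created above:
## check the remaining search quota for the current token
rl <- rate_limit(token, "search_tweets")
rl$remaining # requests left in the current 15-minute window
rl$reset # time until the window resets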
So, at this point, we technically have enough to start using rtweet functions; all we need to do is set the token argument equal to twitter_token (the token object we just created). Later, though, we may want to save the token as an environment variable so it can be reused across sessions; the TWITTER_PAT variable points to the saved Twitter token in our home directory.
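A minimal sketch of saving the token manually for future sessions (the file name twitter_token.rds is illustrative):
## save the token to the home directory
saveRDS(twitter_token, file = file.path(Sys.getenv("HOME"), "twitter_token.rds"))
## point TWITTER_PAT at the saved file via .Renviron so new sessions can find it
cat(paste0("TWITTER_PAT=", file.path(Sys.getenv("HOME"), "twitter_token.rds"), "\n"),
  file = file.path(Sys.getenv("HOME"), ".Renviron"), append = TRUE)
## restart R, then verify with Sys.getenv("TWITTER_PAT") and get_token()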
If we wanted to compare the salience of two topics on Twitter, we could conduct two searches over the same time period and then compare tweet frequencies over time as a time series. That's what I've done in the example below.
First, I searched for tweets mentioning "Donald Trump" and "Joe Biden". We can search for up to 18,000 (non-retweeted) tweets per 15 minutes; the search returns Twitter statuses containing the terms from the past 6-9 days.
Be careful in setting the number of tweets to be extracted. As noted above, Twitter rate limits cap search results at 18,000 every 15 minutes; to request more than that, set retryonratelimit = TRUE and rtweet will wait for rate limit resets for you.
# Search for 18,000 tweets containing 'Donald Trump' written in English from the U.S., using 'lang' and 'geocode'
dt_rtweet <- search_tweets("Donald Trump", n = 18000, include_rts = FALSE, lang = "en",
  geocode = lookup_coords("usa"))
save(dt_rtweet,file="dt_rtweet.RData")
load("dt_rtweet.RData")
# Search for 18,000 tweets containing 'Joe Biden' in English
jb_rtweet <- search_tweets("Joe Biden", n = 18000, include_rts = FALSE, lang = "en",
  geocode = lookup_coords("usa"))
save(jb_rtweet, file="jb_rtweet.RData")
load("jb_rtweet.RData")
## Search for 250,000 #BTS tweets, using 'retryonratelimit', from the U.S.
bts_rtweet_more <- search_tweets("#BTS", n = 250000, include_rts = FALSE, lang = "en",
geocode = lookup_coords("usa"), retryonratelimit = TRUE)
To compare the two candidates, we first combine the searches into a single data frame with bind_rows(). To trace tweet frequency over time, lubridate's floor_date allows us to round each timestamp down to a common unit; using "hour" seems to work well for this hourly change in tweets.
library(dplyr)
library(lubridate)
library(ggplot2)
# First, we will combine Trump and Biden tweets into one data frame
tweets_all <- bind_rows(dt_rtweet %>%
mutate(entity="Donald Trump"),
jb_rtweet %>%
mutate(entity="Joe Biden"))
# Next, we will aggregate tweets into hour-long bins by flooring the created_at timestamps, then count tweets per hour for each candidate
tweets_hours <- tweets_all %>%
mutate(time_floor = floor_date(created_at, unit = "hour")) %>%
count(entity, time_floor)
# Now, we are ready to visualize the time-series data counting tweets by hours
tweets_hours %>%
ggplot(aes(x=time_floor, y=n, color=entity)) +
geom_line() +
theme_bw() +
labs(x = NULL, y = "Hourly Sum",
title = "Tracing topic salience of Donald Trump and Joe Biden on Twitter",
subtitle = "Tweets were aggregated in 1-hour intervals. Retweets were excluded.")
# We can also visualize #BTS tweets by geo-location over the world.
bts_tweets_rtweet <- search_tweets("#BTS", n = 18000, include_rts = FALSE) # Gather tweets from anywhere
# And create lat (latitude) and lng (longitude) variables using all available geolocation info.
bts_tweets_geo <- lat_lng(bts_tweets_rtweet) %>%
  group_by(lng, lat) %>%
  summarise(sum = n()) %>%
  filter(!is.na(lng) & !is.na(lat))
#install.packages("ggmap")
library(ggmap)
# Mapping Asia
# Note: looking up a location name with get_map() requires a registered
# Google Maps API key; see ggmap::register_google()
asiamap <- get_map(location="Asia", zoom=3)
ggmap(asiamap) +
geom_point(aes(x=lng, y=lat, size=sum), data=bts_tweets_geo, colour="tomato", alpha=0.5)
# Mapping Europe
eumap <- get_map(location="Europe", zoom=4)
ggmap(eumap) +
geom_point(aes(x=lng, y=lat, size=sum), data=bts_tweets_geo, colour="tomato", alpha=0.5)
# Mapping US
usmap <- get_map(location="US", zoom=4)
ggmap(usmap) +
geom_point(aes(x=lng, y=lat, size=sum), data=bts_tweets_geo, colour="tomato", alpha=0.5)
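The comment above mentions mapping tweets over the world; since get_map()'s location lookup needs a Google key, here is a minimal key-free sketch using ggplot2's built-in world polygons (this assumes the maps package, which is not used elsewhere in this document):
#install.packages("maps")
# Key-free alternative: plot all geotagged #BTS tweets on a world map
world <- map_data("world")
ggplot() +
  geom_polygon(data = world, aes(x = long, y = lat, group = group),
    fill = "grey90", colour = "grey70") +
  geom_point(data = bts_tweets_geo, aes(x = lng, y = lat, size = sum),
    colour = "tomato", alpha = 0.5) +
  theme_minimal()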