JMSC 6116 Lecture 2: Analyzing #MeToo Campaign: Using Twitter API

First of all, we install the “rtweet” package and load it in RStudio.

if (!require("rtweet")) install.packages("rtweet", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)

library("rtweet")  # load the required library

Then, please assign your appname, consumerKey, and consumerSecret to the three “character” variables shown in the following code.

appname <- "YOUR APP NAME"
consumerKey <- "YOUR CONSUMER KEY"
consumerSecret <- "YOUR CONSUMER SECRET"

Set up your Twitter token.

# The following command saves a token file on your computer, so there is no need to set it up each time
twitter_token <- create_token(app = appname, consumer_key = consumerKey, consumer_secret = consumerSecret, set_renv = TRUE)

Once the Twitter API token is ready, we use the command “search_tweets” to search Twitter for posts containing the keyword “#metoo”. The returned object (up to 100 tweets, retweets excluded) is a data frame; [5,] selects its 5th row and $COLUMN_NAME selects one of its columns.

metoo <- search_tweets("#metoo", n = 100, include_rts = FALSE)
class(metoo) # Show its data class

## [1] "tbl_df"     "tbl"        "data.frame"

colnames(metoo) # Show all its columns

##  [1] "status_id"              "created_at"            
##  [3] "user_id"                "screen_name"           
##  [5] "text"                   "source"                
##  [7] "reply_to_status_id"     "reply_to_user_id"      
##  [9] "reply_to_screen_name"   "is_quote"              
## [11] "is_retweet"             "favorite_count"        
## [13] "retweet_count"          "hashtags"              
## [15] "symbols"                "urls_url"              
## [17] "urls_t.co"              "urls_expanded_url"     
## [19] "media_url"              "media_t.co"            
## [21] "media_expanded_url"     "media_type"            
## [23] "ext_media_url"          "ext_media_t.co"        
## [25] "ext_media_expanded_url" "ext_media_type"        
## [27] "mentions_user_id"       "mentions_screen_name"  
## [29] "lang"                   "quoted_status_id"      
## [31] "quoted_text"            "retweet_status_id"     
## [33] "retweet_text"           "place_url"             
## [35] "place_name"             "place_full_name"       
## [37] "place_type"             "country"               
## [39] "country_code"           "geo_coords"            
## [41] "coords_coords"          "bbox_coords"

metoo[5,]$text  ### The text of the status

## [1] "I watched The Accused when I was about 5 yrs old. That movie traumatized me. #metoo Rape Porn is illegal. #stopthedemand #stophumantrafficking"

metoo[5,]$screen_name  ### Screen name of the user who posted this status

## [1] "reeciedoll"

metoo[5,]$created_at ### When this status was created

## [1] "2018-01-23 06:46:16 UTC"

metoo[5,]$retweet_count ### The number of times this status has been retweeted

## [1] 0
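The [5,] and $ indexing used above works on any data frame, so it can be tried without a Twitter token. The following is a minimal sketch on a made-up data frame; the object "toy" and all of its values are illustrative assumptions, not Twitter data.

```r
# A small made-up data frame standing in for the result of search_tweets()
toy <- data.frame(
  screen_name   = c("user_a", "user_b", "user_c", "user_d", "user_e"),
  text          = c("first", "second", "third", "fourth", "fifth"),
  retweet_count = c(3L, 0L, 12L, 1L, 7L),
  stringsAsFactors = FALSE
)

row5 <- toy[5, ]        # [5, ] keeps the 5th row and all columns
row5$text               # $ then picks one column from that row
toy$retweet_count[5]    # the other order: column first, then the 5th element

# Rows can also be selected by a condition instead of a position
popular <- toy[toy$retweet_count > 2, "screen_name"]
popular
```

On a tibble, such as the object returned by rtweet, the last selection returns a one-column tibble rather than a plain character vector.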

## Exercise: try a larger number of tweets by changing n
##rt <- search_tweets("#metoo", n = 18000)
##rt <- search_tweets("#metoo", n = 25000, retryonratelimit = TRUE)  # beyond the 15-minute rate limit
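The exercise refers to the search rate limit. As a rough sketch, and assuming the limit of 18,000 tweets per 15-minute window described in the rtweet documentation, we can estimate how long a larger request has to wait. The variable names below are illustrative and not part of rtweet.

```r
# Assumption: one rate-limit window allows up to 18,000 tweets every 15 minutes
tweets_per_window <- 18000
window_minutes    <- 15

n_wanted <- 25000

# Number of windows the request spans, and the pauses between them
windows_needed   <- ceiling(n_wanted / tweets_per_window)
waits_needed     <- windows_needed - 1
min_wait_minutes <- waits_needed * window_minutes

windows_needed    # 2 windows for 25,000 tweets
min_wait_minutes  # at least 15 minutes of waiting in between
```

This is why the second call in the exercise sets retryonratelimit = TRUE: the request does not fit into a single window.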

Next, we analyze the profile of the handle “metoocenter”, which describes itself as one that “serves as the central repository of predators and their victim’s stories.” Again, the returned value is a data frame, and we can use $ to show its columns.

metoocenter <- lookup_users("metoocenter")
metoocenter$name # Name of the user

## [1] "MeToo Center - Wiki for Predators"

metoocenter$followers_count # Followers count

## [1] 21405

metoocenter$description # User's description

## [1] "MeToo Center serves as the central repository of predators and their victim's stories. Share your #metoo story no matter how big or how small."

metoocenter$location # Location

## [1] "United States"

It is interesting to check who is following this handle. We use the command “get_followers” to collect the list of its followers and use [,] and $ to display the contents of the data frame. Next, we use “lookup_users” to obtain each follower’s profile.

metoocenter_folls <- get_followers("metoocenter", n = 5000)
#metoocenter_folls <- get_followers("metoocenter", n = 75000)  # Change n to obtain the complete list
head(metoocenter_folls)  # First 6 followers

## # A tibble: 6 x 1
##   user_id           
##   <chr>             
## 1 955693154736353280
## 2 953133371655208960
## 3 95155620          
## 4 884354502152683520
## 5 908655987514515458
## 6 954039850013024256

metoocenter_folls_data <- lookup_users(metoocenter_folls$user_id) # Obtain the followers' profile
metoocenter_folls_data[1,]$screen_name # Show first follower's screen name

## [1] "brookielyn3"

metoocenter_folls_data[1,]$location # location

## [1] ""

metoocenter_folls_data[1,]$followers_count # followers count

## [1] 0

class(metoocenter_folls_data$location) # Check the data class of location

## [1] "character"

sort(table(metoocenter_folls_data$location),decreasing = TRUE)[1:15]  # First 15 "locations"

## 
##                     United States  California, USA  Los Angeles, CA 
##             3115               26               22               15 
##     Florida, USA           Canada  London, England   Washington, DC 
##               11               10               10                9 
## Toronto, Ontario     New York, NY Philadelphia, PA              USA 
##                8                7                7                7 
##             日本     Cairo, Egypt      Chicago, IL 
##                6                6                6
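In the table above, the most frequent “location” is the empty string, that is, followers who left the field blank. The following minimal sketch repeats the sort(table()) step on a made-up vector and then drops the blanks before counting. The vector "locations" and its values are assumptions for illustration only.

```r
# Made-up self-reported locations; "" stands for users who left the field blank
locations <- c("", "USA", "", "London, England", "USA", "", "Canada", "USA", "")

# Frequency of each distinct value, most common first
counts <- sort(table(locations), decreasing = TRUE)
counts[1:2]

# Drop blank entries before counting, so the top rows are actual places
filled     <- locations[locations != ""]
top_places <- sort(table(filled), decreasing = TRUE)
names(top_places)[1]
```

Since the field is free text, values such as “USA” and “United States” are still counted separately; merging them would need an extra cleaning step.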

class(metoocenter_folls_data$followers_count) # Check the data class of followers_count

## [1] "integer"

summary(metoocenter_folls_data$followers_count) # Show its summary

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##      0.0      5.0     13.0    543.8     62.0 307608.0
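In the summary above the mean (543.8) is far larger than the median (13.0), which suggests that a few followers with very large audiences pull the mean up. The sketch below shows the same effect on made-up numbers; the vector "followers" is an assumption for illustration, not the actual data.

```r
# Made-up follower counts: mostly small accounts plus one very large one
followers <- c(0, 2, 5, 8, 13, 20, 62, 150, 300000)

med <- median(followers)   # the middle value, unaffected by the large account
avg <- mean(followers)     # pulled up strongly by the single large account

med
avg

# Share of accounts with more than 1,000 followers
share_large <- mean(followers > 1000)
share_large
```

For counts like these, the median and quartiles usually describe a typical follower better than the mean does.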

Next, let’s look at the Twitter trend data and check the class of the returned data.

loc <- trends_available() # locations for which trend data are available
sf <- get_trends("San Francisco") # trending topics in San Francisco
ny <- get_trends("New York") # trending topics in New York
tk <- get_trends("Tokyo") # trending topics in Tokyo
kr <- get_trends("Korea") # trending topics in Korea
ww <- get_trends("Worldwide") # trending topics worldwide
class(ny) # Check data class - data.frame

## [1] "tbl_df"     "tbl"        "data.frame"

Last, we check the extent to which the worldwide trends are shared with the trends of each location.

sum(ww$trend %in% ny$trend)/length(ww$trend) # Share of worldwide trends that also trend in New York

## [1] 0.28

sum(ww$trend %in% sf$trend)/length(ww$trend) # Share of worldwide trends that also trend in San Francisco

## [1] 0.26

sum(ww$trend %in% tk$trend)/length(ww$trend) # Share of worldwide trends that also trend in Tokyo

## [1] 0.02

sum(ww$trend %in% kr$trend)/length(ww$trend) # Share of worldwide trends that also trend in Korea

## [1] 0.06

sum(ny$trend %in% sf$trend)/length(ny$trend) # Share of New York trends that also trend in San Francisco

## [1] 0.88
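The comparisons above all follow one pattern: count how many trends of one list appear in another and divide by the length of the first list. A minimal sketch of that pattern as a helper function follows; "overlap_share" and the trend lists are hypothetical names and made-up values, not rtweet functions or real trend data.

```r
# Share of the trends in `a` that also appear in `b`;
# mean() of a logical vector equals sum(...) / length(a)
overlap_share <- function(a, b) {
  mean(a %in% b)
}

# Made-up trend lists, for illustration only
worldwide <- c("#metoo", "#oscars", "#nba", "#brexit")
new_york  <- c("#metoo", "#nba", "#subway", "#knicks", "#broadway")

overlap_share(worldwide, new_york)  # 2 of 4 worldwide trends -> 0.5
overlap_share(new_york, worldwide)  # 2 of 5 New York trends  -> 0.4
```

The measure is not symmetric: the result depends on which list supplies the denominator, so the order of the two arguments matters when the lists differ in length.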

King-wa Fu

January 22, 2018