This script attempts to gain some insight into the users who follow a particular Twitter account.
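The report looks at that data in several ways: follower and favorite counts, a word cloud of profile descriptions, static and interactive maps of follower locations, and a sortable, searchable table of followers.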
To run this yourself, you need to do a little setup within Twitter and Google, and you need to specify the account you are analyzing.
Pay close attention to the first block of code below, as you will want to adjust it based on what you set up.
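The two things you will need to set up are a Twitter app (which gives you an app name, a consumer key, and a consumer secret) and a Google Maps API key with access to the Geocoding API.
The values you get from these need to be entered in the code below.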
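Those values are read with Sys.getenv(), assuming they are stored in a .Renviron file, as the comments in the setup code explain. As a minimal sketch (the helper name check_credentials() and the placeholder values are hypothetical, not part of the original script), you can fail early with a clear message if any of the four variables is missing, rather than getting a less obvious error from the Twitter or Google calls:

```r
# Example ~/.Renviron entries (placeholder values, not real credentials):
#   TWITTER_APPNAME=my_app_name
#   TWITTER_KEY=my_consumer_key
#   TWITTER_SECRET=my_consumer_secret
#   GOOGLE_MAPS_KEY=my_google_maps_key
# R reads ~/.Renviron at startup; readRenviron("~/.Renviron") reloads it mid-session.

# Hypothetical helper: stop with a clear message if any credential is unset or empty
check_credentials <- function(vars = c("TWITTER_APPNAME", "TWITTER_KEY",
                                       "TWITTER_SECRET", "GOOGLE_MAPS_KEY")) {
  values <- Sys.getenv(vars)          # unset variables come back as ""
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling check_credentials() just before create_token() would surface a missing key immediately.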
# Set the base account you are looking to analyze
tw_account <- "moemkiss"
# This script needs ggmap version 2.7. If you don't have it, uncomment and run the
# following to install ggmap from GitHub. See
# https://stackoverflow.com/questions/36175529/getting-over-query-limit-after-one-request-with-geocode
# for details.
# devtools::install_github("dkahle/ggmap")
# Load libraries
if (!require("pacman")) install.packages("pacman")
pacman::p_load(rtweet,       # accessing the Twitter API
               tidyverse,    # well... we just always need this
               kableExtra,   # nicer table formatting
               ggmap,        # visualizing the maps
               scales,       # getting commas into numbers... :-(
               tm,           # text mining
               SnowballC,    # text mining
               wordcloud,    # word cloud generation
               DT,           # Interactive tables
               RColorBrewer, # For palettes in the word cloud
               plotly)       # interactive visualizations
# Set the max # of users to do geo lookups on
geo_count <- 6000
# Label for what this is
main_label <- paste0("Followers of @",tw_account)
##############
# Get the Twitter app credentials. The code below assumes the app name, key, and secret
# are stored in a .Renviron file. But, you can simply replace the "Sys.getenv()" statements
# with hardcoded strings if you wish.
# Name assigned to created app
tw_appname <- Sys.getenv("TWITTER_APPNAME")
# Key and Secret
tw_key <- Sys.getenv("TWITTER_KEY")
tw_secret <- Sys.getenv("TWITTER_SECRET")
# Create the token
tw_token <- create_token(
  app = tw_appname,
  consumer_key = tw_key,
  consumer_secret = tw_secret)
###############
# Get the Google Maps (Geocoding API) credentials. You can query that API something like 2,500
# times/day for free, and raising that limit costs relatively little. Like the Twitter app
# credentials, you can just hardcode this key if you prefer.
# Google Maps API Key
gmaps_key <- Sys.getenv("GOOGLE_MAPS_KEY")
# Now, register with those credentials
register_google(key = gmaps_key)
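The code below has three basic steps: get the list of the account's followers, look up the profile details for each of those followers, and geocode the location each follower entered in their profile.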
There are some limits to the Twitter API, so if you are analyzing a user with more than roughly 15,000 followers (I have not confirmed the exact threshold), you will need to update the code to split the Twitter requests into batches. You will also need a billing account enabled on the Google account that owns your API key, with the Geocoding API limits adjusted accordingly.
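One way to split the requests into batches is sketched below. This is a generic, hedged sketch: lookup_in_batches() is a hypothetical helper, the default batch size simply mirrors the rough 15,000 figure above rather than a confirmed API limit, and the 15-minute pause in the commented usage is an assumption about Twitter's rate-limit window. Check the rtweet documentation for current limits; get_followers() also has a retryonratelimit argument that may cover part of this.

```r
# Hypothetical helper: apply lookup_fun to ids in fixed-size batches,
# pausing between batches, and combine the per-batch results.
lookup_in_batches <- function(ids, lookup_fun, batch_size = 15000, pause_secs = 0,
                              combine = function(parts) do.call(rbind, parts)) {
  # Batch number for each id: 1 for the first batch_size ids, 2 for the next, ...
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    message("Looking up batch ", i, " of ", length(batches))
    results[[i]] <- lookup_fun(batches[[i]])
    # Wait before the next batch (no wait after the last one)
    if (i < length(batches) && pause_secs > 0) Sys.sleep(pause_secs)
  }
  combine(results)
}

# Possible usage with the objects from this script (assumed 15-minute window;
# dplyr::bind_rows copes better than rbind with rtweet's list columns):
# followers_details <- lookup_in_batches(
#   user_followers$user_id,
#   function(ids) lookup_users(ids, parse = TRUE, token = tw_token),
#   pause_secs = 15 * 60,
#   combine = dplyr::bind_rows)
```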
This will take a while to run, primarily because of the Geocoding API lookups, but you can watch the lookups scroll by in the console as the data is pulled.
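Many followers type in the same location (a city name, for example), so one way to cut down on those lookups is to geocode each distinct location only once and then map the results back to every follower. This is a minimal sketch, not what the script below does: geocode_unique() is a hypothetical helper that accepts any one-location geocoding function, such as the get_lat_lon() function defined in the code below.

```r
# Hypothetical helper: geocode each distinct location once, then expand the
# results back to one row per input location, in the original order.
geocode_unique <- function(locations, geocode_fun) {
  distinct_locations <- unique(locations)
  # One lookup per distinct value; each call should return a one-row data frame
  coords <- do.call(rbind, lapply(distinct_locations, geocode_fun))
  # match() gives, for each input location, its position among the distinct values
  out <- coords[match(locations, distinct_locations), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Possible drop-in for the map_dfr() call below:
# geo_detail <- geocode_unique(followers_details$location, get_lat_lon)
```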
# Get a list of all followers of the user. By default, get_followers() returns at most
# 5,000 followers. You can raise this by adding an "n =" argument below, but check the
# ?get_followers documentation to ensure you understand the ramifications.
user_followers <- get_followers(tw_account, token = tw_token)
# Get the user details for each of those followers
followers_details <- lookup_users(user_followers$user_id, parse = TRUE, token = tw_token)
# The data includes both favourites_count and favorite_count. Documentation is a little
# limited, so we're just going to add them together (treating missing values as 0)
followers_details <- followers_details %>%
  mutate(favourites_count = ifelse(is.na(favourites_count), 0, favourites_count),
         favorite_count = ifelse(is.na(favorite_count), 0, favorite_count)) %>%
  mutate(favorites_count = favourites_count + favorite_count) %>%
  select(-favourites_count, -favorite_count)
############
# Get the Geo Data
############
# Function to try to figure out the location. This won't be perfect, but hopefully it will
# get a good enough chunk. We're going to rely on the Google Maps API for this; it'll do the
# best it can with the location entered. To avoid spurious OVER_QUERY_LIMIT errors, use ggmap
# v2.7 and an API key: https://stackoverflow.com/questions/36175529/getting-over-query-limit-after-one-request-with-geocode
get_lat_lon <- function(location){
  # If the location is missing or blank, then don't even try
  if (is.na(location) || !nzchar(trimws(location))) {
    lon_lat <- data.frame(lon = NA_real_, lat = NA_real_)
  } else {
    # override_limit is a TRUE/FALSE flag in ggmap's geocode()
    lon_lat <- geocode(location, source = "google", override_limit = TRUE)
  }
  # Return the longitude and latitude
  lon_lat
}
# Process the followers_details locations. A chunk of these will come back with no data. This
# bit of code may run for a bit depending on how many followers are being analyzed.
geo_detail <- map_dfr(followers_details$location, get_lat_lon)
# Check whether each result falls inside the continental U.S. and flag those as TRUE.
# This is just based on a rectangle, so some bits of Canada and Mexico will sneak in.
# Followers with no coordinates get NA.
geo_detail <- geo_detail %>%
  mutate(continental_us = lon < -66.9513812 &
           lon > -124.7844079 &
           lat < 49.3457868 &
           lat > 24.7433195)
# Add those values into followers_details
followers_details$longitude <- geo_detail$lon
followers_details$latitude <- geo_detail$lat
followers_details$continental_us <- geo_detail$continental_us
followers_details$`# of Followers` <- followers_details$followers_count
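The rectangle test behind continental_us can also be written as a small standalone function, which makes it easy to check a few known points. This is a sketch: in_continental_us_box() is a hypothetical name, and the city coordinates are approximate. Toronto and Monterrey land inside the box, which is exactly the Canada/Mexico leakage the comment in the code warns about.

```r
# Hypothetical helper: the same rectangle used for continental_us.
# Points with missing coordinates return NA.
in_continental_us_box <- function(lon, lat) {
  lon > -124.7844079 & lon < -66.9513812 &
    lat > 24.7433195 & lat < 49.3457868
}

# Approximate coordinates for a few cities
cities <- data.frame(
  city = c("Chicago", "Toronto", "Monterrey", "London", "Honolulu"),
  lon  = c(-87.63, -79.38, -100.31, -0.13, -157.86),
  lat  = c(41.88, 43.65, 25.69, 51.51, 21.31)
)
cities$in_box <- in_continental_us_box(cities$lon, cities$lat)
cities$in_box
# → TRUE TRUE TRUE FALSE FALSE
```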
The following illustrates how many followers the users in this analysis have, as well as their favorite counts (the combined favorites_count value computed above).
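A minimal sketch of one way to draw this kind of illustration, assuming ggplot2 histograms on a log scale with comma-formatted labels from scales (the chart type is an assumption, and the sample data and plot_count_histogram() helper are hypothetical stand-ins for followers_details):

```r
library(ggplot2)
library(scales)

# Hypothetical sample data standing in for followers_details
followers_sample <- data.frame(
  screen_name     = c("user_a", "user_b", "user_c", "user_d", "user_e"),
  followers_count = c(12, 340, 5600, 78000, 950),
  favorites_count = c(3, 150, 2200, 41000, 610)
)

# Histogram of one count column on a log10 x-axis with comma-formatted labels.
# Counts of 0 become -Inf on the log scale and are dropped with a warning.
plot_count_histogram <- function(df, column, label) {
  ggplot(df, aes(x = .data[[column]])) +
    geom_histogram(bins = 30) +
    scale_x_log10(labels = comma) +
    labs(x = label, y = "Number of users")
}

p_followers <- plot_count_histogram(followers_sample, "followers_count", "# of followers")
p_favorites <- plot_count_histogram(followers_sample, "favorites_count", "# of favorites")
```

The log scale keeps a handful of very large accounts from squashing everyone else into the first bin.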
The following is a word cloud of the descriptions that the users in this analysis entered in their profiles. The only two cleanup steps performed are stopword removal and converting everything to lowercase.
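Those two cleanup steps map directly onto tm functions. Below is a sketch under stated assumptions: the sample descriptions and the description_word_freqs() helper are hypothetical, and this is not necessarily the exact code behind the report's word cloud.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)

# Hypothetical profile descriptions standing in for followers_details$description
descriptions <- c("Data science and R", "I love data", "R user and data fan")

# Lowercase the text, drop English stopwords, and return word counts (most frequent first)
description_word_freqs <- function(text) {
  corpus <- VCorpus(VectorSource(text))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  # By default TermDocumentMatrix() keeps only words of 3+ characters
  tdm <- TermDocumentMatrix(corpus)
  sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
}

freqs <- description_word_freqs(descriptions)

wordcloud(words = names(freqs), freq = freqs, min.freq = 1,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))
```

Note the default three-character minimum: a one-letter word such as "R" never makes it into the cloud.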
The following uses the location users entered in their Twitter profiles, as translated into a specific latitude and longitude by the Google Maps Geocoding API. Note that location is not a required field on Twitter, so this only includes users who entered a location (and whose location the Geocoding API could decipher).
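Because of that, the maps can cover noticeably fewer users than the full follower list. A quick way to quantify the gap is sketched below, assuming the longitude and latitude columns added to followers_details earlier; summarize_mappable() and the sample rows are hypothetical.

```r
# Hypothetical helper: how many followers have usable coordinates?
summarize_mappable <- function(df) {
  has_coords <- !is.na(df$longitude) & !is.na(df$latitude)
  data.frame(
    total_followers  = nrow(df),
    mapped_followers = sum(has_coords),
    share_mapped     = mean(has_coords)
  )
}

# Hypothetical sample rows shaped like followers_details after geocoding
followers_sample <- data.frame(
  screen_name = c("user_a", "user_b", "user_c", "user_d"),
  location    = c("Columbus, OH", "", "the internet", "London"),
  longitude   = c(-83.00, NA, NA, -0.13),
  latitude    = c(39.96, NA, NA, 51.51)
)

summarize_mappable(followers_sample)
```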
Below are a couple of interactive versions of the above maps. The U.S. map still shows some followers located just outside the U.S., but it is a close approximation.
# See https://plot.ly/r/reference/#scattergeo for details -- especially if you want to swap out and
# do different geographic regions (see the "Scope" and "Projections" at that link)
# Interactive world map
# Establish the base world map
world_map <- list(
  scope = 'world',
  projection = list(type = 'orthographic'),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  subunitwidth = 1,
  countrywidth = 1,
  subunitcolor = toRGB("white"),
  countrycolor = toRGB("white")
)
# Create the world map and output it
plot_world <- plot_geo(followers_details, sizes = c(1, 250)) %>%
  add_markers(
    x = ~longitude, y = ~latitude,
    # log1p() (log of 1 + x) avoids -Inf sizes for accounts with 0 followers
    size = ~log1p(`# of Followers`),
    hoverinfo = "text", text = ~screen_name) %>%
  layout(geo = world_map)
plot_world
# Interactive U.S. map
# Set the details for the base map
us_map <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  subunitwidth = 1,
  countrywidth = 1,
  subunitcolor = toRGB("white"),
  countrycolor = toRGB("white")
)
# Create the interactive map
plot_us <- plot_geo(followers_details, locationmode = 'USA-states', sizes = c(1, 250)) %>%
  add_markers(
    x = ~longitude, y = ~latitude,
    # log1p() (log of 1 + x) avoids -Inf sizes for accounts with 0 followers
    size = ~log1p(`# of Followers`),
    hoverinfo = "text", text = ~screen_name) %>%
  layout(geo = us_map)
plot_us
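The maps above show only the screen name on hover. If you also want the follower count in the tooltip, one option is a small formatting helper like the hypothetical make_hover_text() below; plotly renders `<br>` in hover text as a line break.

```r
# Hypothetical helper: "@name<br>12,345 followers" style hover labels
make_hover_text <- function(screen_name, followers_count) {
  noun <- ifelse(followers_count == 1, "follower", "followers")
  paste0("@", screen_name, "<br>",
         formatC(followers_count, format = "d", big.mark = ","), " ", noun)
}

hover_example <- make_hover_text(c("user_a", "user_b"), c(1, 12345))
hover_example
```

To use it, pass `text = ~make_hover_text(screen_name, followers_count)` to add_markers() in place of the existing text argument.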
The following is a sortable, searchable table of followers.
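A table like this can be built with DT's datatable(). The sketch below is an assumption about how such a table might be configured (the sample rows, column labels, and page length are hypothetical), not the report's exact code:

```r
library(DT)

# Hypothetical sample rows standing in for followers_details
followers_sample <- data.frame(
  screen_name     = c("user_a", "user_b", "user_c"),
  location        = c("Columbus, OH", "", "London"),
  followers_count = c(1520, 87, 40210),
  stringsAsFactors = FALSE
)

# Rename columns to the labels the table should display
table_data <- setNames(followers_sample, c("Screen name", "Location", "Followers"))

followers_table <- datatable(
  table_data,
  rownames = FALSE,
  filter = "top",                   # per-column search boxes above the table
  options = list(
    pageLength = 25,
    order = list(list(2, "desc"))   # 0-based JavaScript column index: sort by Followers
  )
)
# Show follower counts with thousands separators and no decimals
followers_table <- formatRound(followers_table, columns = "Followers", digits = 0)
```

In an R Markdown document, leaving followers_table as the last expression of a chunk renders the interactive table.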