This script attempts to get some insight into the users who follow a particular account on Twitter.

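The report looks at that data several ways: how many followers and favorites those users have, how they describe themselves, where they are located, and a sortable list of the followers themselves.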

Important Configuration Information

To run this yourself, you do need to do a little bit of setup within Twitter and Google, as well as specify the user you’re working with.

Pay close attention to the first block of the code below, as you will want to adjust that based on what you set up.

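The two things you will need to set up are a Twitter app (for the app name, consumer key, and consumer secret) and a Google Maps Geocoding API key.

The values you get from those two need to be entered in the code below, or stored in the environment variables that the code reads.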
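
Since the setup code reads its credentials with Sys.getenv(), a quick check that the variables are actually set can save a confusing authentication error later. The following is only a minimal sketch: the helper name `missing_env_vars` is hypothetical, and the variable names are the ones the code below reads.

```r
# Hypothetical helper: return the names of any environment variables
# that are unset or empty.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# The four variables the setup code in this report reads
required_vars <- c("TWITTER_APPNAME", "TWITTER_KEY",
                   "TWITTER_SECRET", "GOOGLE_MAPS_KEY")

missing <- missing_env_vars(required_vars)
if (length(missing) > 0) {
  message("Not set (add these to .Renviron and restart R): ",
          paste(missing, collapse = ", "))
}
```

If nothing is reported, all four values were found in the current R session.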

# Set the base account you are looking to analyze
tw_account <- "moemkiss"

# You need ggmap 2.7 or later, which provides register_google(). If your installed version is
# older than that, uncomment and run the following. See
# https://stackoverflow.com/questions/36175529/getting-over-query-limit-after-one-request-with-geocode
# for details.
# devtools::install_github("dkahle/ggmap")

# Load libraries
if (!require("pacman")) install.packages("pacman")
pacman::p_load(rtweet,          # accessing the Twitter API
               tidyverse,       # well... we just always need this
               kableExtra,      # nicer table formatting
               ggmap,           # visualizing the maps
               scales,          # getting commas into numbers... :-(
               tm,              # text mining
               SnowballC,       # text mining
               wordcloud,       # word cloud generation
               DT,              # Interactive tables
               RColorBrewer,    # For palettes in the word cloud
               plotly)          # interactive visualizations

# Set the max # of users to do geo lookups on
geo_count <- 6000

# Label for what this is
main_label <- paste0("Followers of @",tw_account)

##############
# Get the Twitter app credentials. The code below assumes the app name, key, and secret
# are stored in a .Renviron file. But, you can simply replace the "Sys.getenv()" statements
# with hardcoded strings if you wish.

# Name assigned to created app
tw_appname <- Sys.getenv("TWITTER_APPNAME")

# Key and Secret
tw_key <- Sys.getenv("TWITTER_KEY")
tw_secret <- Sys.getenv("TWITTER_SECRET")

# Create the token
tw_token <- create_token(
    app = tw_appname,
    consumer_key = tw_key,
    consumer_secret = tw_secret)

###############
# Get the Google Maps (Geocoding API) credentials. You can query that API something like 2,500
# times/day for free, and raising the limit costs only a nominal amount. Like the Twitter app
# credentials, you can just hardcode that key if you desire.

# Google Maps API Key
gmaps_key <- Sys.getenv("GOOGLE_MAPS_KEY")

# Now, register with those credentials
register_google(key = gmaps_key)

Get the Data

The code below has three basic steps:

  1. Pull a list of all of the followers of the user specified in the earlier code
  2. Pull the profile details for each of those users
  3. Attempt to get the latitude and longitude for each of the followers based on the “Location” value they entered for their profile

The Twitter API is rate limited, so if you are analyzing a user with a large number of followers (more than about 15,000, a figure I have not verified), you will need to update the code to split the Twitter requests into batches. You will also need a billing account enabled on your Google Cloud account, with the limits for the Geocoding API adjusted accordingly.
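
As a rough illustration of the batching mentioned above, the sketch below splits a vector of follower IDs into fixed-size chunks using base R. The helper name `split_into_batches` and the batch size of 5 are assumptions chosen for the example; pick a batch size that fits the rate limits documented for the rtweet function you are calling.

```r
# Hypothetical helper: split a vector of user IDs into batches of at most
# `batch_size` elements, preserving the original order.
split_into_batches <- function(ids, batch_size) {
  stopifnot(batch_size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Illustrative IDs only; in the report these would come from user_followers$user_id
example_ids <- as.character(1:12)
batches <- split_into_batches(example_ids, batch_size = 5)
lengths(batches)

# Each batch could then be passed to lookup_users() in turn, pausing between
# batches if the rate limit is reached, for example:
# followers_details <- purrr::map_dfr(batches, lookup_users, token = tw_token)
```

With twelve IDs and a batch size of five, this produces three batches of five, five, and two IDs.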

This will take a while to run – primarily for the Geocoding API lookups. But, you’ll be able to watch the lookups flow by in the console as the data gets pulled.
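
One way to shorten the geocoding step, assuming many followers share the same free-text location, is to look up each distinct location once and map the results back to every row. This is a sketch of that idea rather than what the report's code does. Both `geocode_unique` and `fake_geocode` are hypothetical names, and `fake_geocode` is an offline stand-in for a real geocoding call with approximate coordinates.

```r
# Hypothetical stand-in for a geocoder so the sketch runs offline.
fake_geocode <- function(location) {
  known <- list("Columbus, OH" = c(-82.9988, 39.9612),
                "London"       = c(-0.1276, 51.5072))
  hit <- known[[location]]
  if (is.null(hit)) {
    data.frame(lon = NA_real_, lat = NA_real_)
  } else {
    data.frame(lon = hit[1], lat = hit[2])
  }
}

# Hypothetical helper: geocode each distinct, non-empty location only once,
# then expand the results back out to one row per input location.
geocode_unique <- function(locations, geocoder) {
  to_lookup <- unique(locations[!is.na(locations) & nzchar(locations)])
  if (length(to_lookup) == 0) {
    return(data.frame(lon = rep(NA_real_, length(locations)),
                      lat = rep(NA_real_, length(locations))))
  }
  looked_up <- do.call(rbind, lapply(to_lookup, geocoder))
  idx <- match(locations, to_lookup)   # NA where the input was missing or empty
  data.frame(lon = looked_up$lon[idx], lat = looked_up$lat[idx])
}

locations <- c("Columbus, OH", "London", NA, "Columbus, OH", "", "Somewhere odd")
geo <- geocode_unique(locations, fake_geocode)
geo
```

Here six input rows need only three lookups. How much this saves in practice depends on how often the same location string is repeated among the followers.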

# Get a list of all followers of the user. By default, at most 5,000 followers are returned.
# You can up this by adding an "n=" argument below, but check the ?get_followers
# documentation to ensure you understand the ramifications.
user_followers <- get_followers(tw_account, token = tw_token)

# Get the user details for each of those followers
followers_details <- lookup_users(user_followers$user_id, parse = TRUE, token = tw_token)

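# There seem to be both favourites_count and favorite_count. In the Twitter API, favourites_count
# is the number of tweets a user has liked and favorite_count is the number of likes on a single
# tweet, so they measure different things. This report simply adds them together.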
followers_details <- followers_details %>% 
  mutate(favourites_count = ifelse(is.na(favourites_count), 0, favourites_count),
         favorite_count = ifelse(is.na(favorite_count), 0, favorite_count)) %>% 
  mutate(favorites_count = favourites_count + favorite_count) %>% 
  select(-favourites_count, -favorite_count)

############
# Get the Geo Data
############

# Function to try to figure out the location. This won't be perfect, but, hopefully, will
# get a good enough chunk. We're going to rely on the Google Maps API for this -- it'll do the
# best it can with the location entered. To avoid weird OVER_QUERY_LIMIT errors, use ggmap 2.7
# or later and an API key: https://stackoverflow.com/questions/36175529/getting-over-query-limit-after-one-request-with-geocode

get_lat_lon <- function(location){
  
  # If the location is missing or empty, then don't even try
  if(is.na(location) || !nzchar(location)){
    lon_lat <- data.frame(lon = NA_real_, lat = NA_real_)
  } else {
    # override_limit is a logical flag in geocode(), not a count
    lon_lat <- geocode(location, source = "google", override_limit = TRUE)
  }
  
  # Return the longitude and latitude
  lon_lat
}


# Process the followers_details locations. A chunk of these will come back with no data. This
# bit of code may run for a bit depending on how many followers are being analyzed.
geo_detail <- map_dfr(followers_details$location, get_lat_lon)

# Check that the results are inside the continental U.S. and flag the ones that are as TRUE.
# This is just based on a rectangle, so some bits of Canada and Mexico will sneak in.
geo_detail <- geo_detail %>% 
  mutate(continental_us = lon < -66.9513812 & 
           lon > -124.7844079 & 
           lat < 49.3457868 & 
           lat > 24.7433195)

# Add those values into followers_details
followers_details$longitude <- geo_detail$lon
followers_details$latitude <- geo_detail$lat
followers_details$continental_us <- geo_detail$continental_us
followers_details$`# of Followers` <- followers_details$followers_count
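
To see how the rectangle check above behaves, the sketch below wraps the same four bounds in a small function and applies it to a few cities. The helper name `in_continental_us_box` is hypothetical and the city coordinates are approximate, included only for illustration.

```r
# The same rectangle used in the code above, wrapped as a hypothetical helper
in_continental_us_box <- function(lon, lat) {
  lon > -124.7844079 & lon < -66.9513812 &
    lat > 24.7433195 & lat < 49.3457868
}

# Approximate city coordinates, for illustration only
cities <- data.frame(
  city = c("Columbus, OH", "Toronto", "London", "Honolulu"),
  lon  = c(-82.9988, -79.3832, -0.1276, -157.8583),
  lat  = c(39.9612, 43.6532, 51.5072, 21.3069)
)
cities$in_box <- in_continental_us_box(cities$lon, cities$lat)
cities
```

Columbus and Toronto both fall inside the rectangle, which shows how parts of Canada sneak in, while London and Honolulu fall outside it. A missing longitude or latitude yields NA rather than FALSE.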

How Many Followers / How Many Favorites

The following is an illustration of how many followers the users included in this analysis have, as well as how many total times their tweets have been favorited.

How the Users Describe Themselves

The following is a word cloud built from the profile descriptions of the users included in this analysis. The only two bits of cleanup performed are stopword removal and pushing everything to lowercase.
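
The two cleanup steps described above can be sketched in base R as follows. This is not the report's own code, which loads the tm and wordcloud packages for this purpose. The helper name `word_frequencies`, the tiny stopword list, and the sample descriptions are all assumptions made for illustration.

```r
# Small illustrative stopword list (an assumption; a real list is much longer)
stopwords_small <- c("a", "an", "and", "the", "of", "in", "at",
                     "i", "my", "to", "for", "is")

# Hypothetical helper: lowercase, split into words, drop stopwords,
# and count what is left.
word_frequencies <- function(descriptions, stopwords) {
  text  <- tolower(descriptions[!is.na(descriptions)])
  words <- unlist(strsplit(text, "[^a-z]+"))
  words <- words[nzchar(words) & !(words %in% stopwords)]
  sort(table(words), decreasing = TRUE)
}

# Made-up profile descriptions
descriptions <- c("Analytics and data at the office",
                  "I love DATA and R",
                  NA,
                  "Digital analytics, data science")
word_frequencies(descriptions, stopwords_small)
```

In this sample, "data" is the most frequent word with three occurrences, followed by "analytics" with two. A frequency table like this is the kind of input a word cloud function sizes its words by.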

Where Users Are Located

The following uses the location users entered into their Twitter profile, as interpreted into a specific latitude and longitude by the Google Maps Geocoding API. Location is not a required value in Twitter, so this only includes users who entered a location that the Geocoding API could decipher.

Interactive Map(s)

Below are a couple of interactive versions of the above maps. The U.S. map still shows some followers who are outside the U.S., but it’s close.

# See https://plot.ly/r/reference/#scattergeo for details -- especially if you want to swap out and
# do different geographic regions (see the "Scope" and "Projections" at that link)

# Interactive world map

# Establish the base world map
world_map <- list(
  scope = 'world',
  projection = list(type = 'orthographic'),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  subunitwidth = 1,
  countrywidth = 1,
  subunitcolor = toRGB("white"),
  countrycolor = toRGB("white")
)

# Create the world map and output it
# log1p() is used for sizing so that users with zero followers do not produce -Inf
plot_world <- plot_geo(followers_details, sizes = c(1, 250)) %>%
  add_markers(
    x = ~longitude, y = ~latitude, size = ~log1p(`# of Followers`), hoverinfo = "text",
    text = ~screen_name) %>% 
  layout(geo = world_map)

plot_world

# Interactive U.S. map

# Set the details for the base map
us_map <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  subunitwidth = 1,
  countrywidth = 1,
  subunitcolor = toRGB("white"),
  countrycolor = toRGB("white")
)

# Create the interactive map
# log1p() is used for sizing so that users with zero followers do not produce -Inf
plot_us <- plot_geo(followers_details, locationmode = 'USA-states', sizes = c(1, 250)) %>%
  add_markers(
    x = ~longitude, y = ~latitude, size = ~log1p(`# of Followers`), hoverinfo = "text",
    text = ~screen_name) %>%
  layout(geo = us_map)

plot_us

List of Followers

The following is a sortable, searchable table of the followers.