For this analysis, we will capture, process, and visualize tweets from @VaxHuntersCan in order to build a map of pop-up Covid19 vaccine clinics in Canada.
Vaccine Hunters Canada is a volunteer organization helping Canadians navigate the patchwork of Covid19 vaccine clinics, supply, and eligibility criteria. Primarily operating on Twitter & Discord, @VaxHuntersCan has created a crowd-sourced hub for Covid19 vaccine information in Canada. In addition to the usual metadata that the Twitter API gives developers access to, @VaxHuntersCan tweets contain a common data structure from which we can parse useful location, supply, and eligibility information.
This project would not be possible without the great work being done by the creators & maintainers of @VaxHuntersCan.
Authenticate to Twitter API
First, a Twitter developer account is required. Request one here to start building apps that utilize the Twitter API.
Next, create a new app project in your Twitter developer console, and copy and paste your api_key, api_secret_key, access_token, and access_token_secret into the relevant fields of the twitter_auth.R file, which we will source in the snippet below and use as environment variables to generate an auth token.
source("twitter_auth.R")
token <- create_token(
  app = "VaxHuntersTweetBot",
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = access_token,
  access_secret = access_token_secret
)
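For reference, twitter_auth.R itself is not shown in this post. A minimal sketch of what it might contain follows; the placeholder values and the choice to store the keys as plain R variables (rather than true OS environment variables) are assumptions based on how they are used above.

# twitter_auth.R (sketch) -- the values below are placeholders, not real credentials
library(rtweet)

api_key             <- "YOUR_API_KEY"
api_secret_key      <- "YOUR_API_SECRET_KEY"
access_token        <- "YOUR_ACCESS_TOKEN"
access_token_secret <- "YOUR_ACCESS_TOKEN_SECRET"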
Pull Tweets from @VaxHuntersCan Timeline
Here, we use the get_timeline() function from the rtweet package to pull tweets from the @VaxHuntersCan Twitter timeline. This function creates a data frame, which we will call vax_tweets. We then use the colnames() function to take a look at the column headers in the returned data frame.
vax_tweets <- rtweet::get_timeline("VaxHuntersCan", n = 3500, parse = TRUE)
colnames(vax_tweets)
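As a quick sanity check (not part of the original snippet), we can also preview a couple of fields and confirm how many tweets came back; this assumes the rtweet version in use returns created_at and text columns, as the later steps do.

# Preview the first few tweets and count the rows returned
head(vax_tweets[, c("created_at", "text")], 3)
nrow(vax_tweets)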
Pre-process Tweets
Using a custom helper function unescape_html() in conjunction with sapply(), we start cleaning up the vax_tweets$text column by un-escaping HTML values using the xml2 package. This cleans up the text of the tweets so that encodings like '&amp;' display as '&' instead. Next, we replace carriage & line return symbols with blank spaces to further clean up the vax_tweets$text field. Finally, we split the created_at field into separate date & time fields to allow for easier date & time-series analytics.
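The unescape_html() helper itself is not defined in the post; a minimal sketch of one common xml2-based implementation (an assumption about how it was written) is:

# Helper: decode HTML entities (e.g. "&amp;" -> "&") in a single string using xml2
unescape_html <- function(str) {
  xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
}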
# Pre-process tweets by unescaping xml/html values using the xml2 package
vax_tweets$text <- sapply(vax_tweets$text, unescape_html, USE.NAMES = FALSE)

# Replace carriage & line returns using stringr
vax_tweets$text <- str_replace_all(vax_tweets$text, "[\r\n]", " ")

# Split 'created_at' field into date & time fields
vax_tweets$date <- as.Date(vax_tweets$created_at)
vax_tweets$time <- format(vax_tweets$created_at, "%H:%M:%S")
Create a datatable to display and filter the full-text tweets.
datatable(head(vax_tweets,3200),
options = list(
columnDefs = list(list(className = 'dt-left', targets = "_all",width='auto')),
pageLength = 5,
lengthMenu = c(5,10),
fontSize='50%'
))
Authenticate to Google Cloud Natural Language API
source("googleNLP_auth.R")
Perform Entity Analysis
Next, we send the pre-processed tweets to the Google Natural Language API using the googleLanguageR package. The goal here is to use Google's NLP algorithms to perform Entity Analysis on the tweets, extracting address information from vax_tweets$text and returning the list to nlp_result. Because the NLP API has a rate limit of 600 requests/minute, we batch our requests into chunks of 500 and pause for 30 seconds in between requests with the Sys.sleep() command.
#nlp_result1 <- gl_nlp(vax_tweets$text[0:500],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result2 <- gl_nlp(vax_tweets$text[501:1000],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result3 <- gl_nlp(vax_tweets$text[1001:1500],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result4 <- gl_nlp(vax_tweets$text[1501:2000],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result5 <- gl_nlp(vax_tweets$text[2001:2500],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result6 <- gl_nlp(vax_tweets$text[2501:3000],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result7 <- gl_nlp(vax_tweets$text[3001:3500],nlp_type = "analyzeEntities")
#nlp_result <- c(nlp_result1, nlp_result2, nlp_result3, nlp_result4, nlp_result5, nlp_result6, nlp_result7)
#head(nlp_result[["entities"]][[500]]$name, 10)
#head(nlp_result[["entities"]][[500]]$type, 10)
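The chunked calls above could also be expressed as a loop. The following is a sketch that assumes the same 500-tweet chunks, the 30-second pause, and the gl_nlp() interface from googleLanguageR:

# Batch the tweets into chunks of 500 and pause between requests to respect the rate limit
chunk_size <- 500
starts     <- seq(1, nrow(vax_tweets), by = chunk_size)

nlp_chunks <- vector("list", length(starts))
for (i in seq_along(starts)) {
  idx <- starts[i]:min(starts[i] + chunk_size - 1, nrow(vax_tweets))
  nlp_chunks[[i]] <- gl_nlp(vax_tweets$text[idx], nlp_type = "analyzeEntities")
  if (i < length(starts)) Sys.sleep(30)
}

# Combine the per-chunk results into a single list, as above
nlp_result <- do.call(c, nlp_chunks)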
Authenticate to Mapbox API
Similar to the Twitter authentication step above, here we authenticate with Mapbox, which allows us to send the extracted addresses from vax_tweets to the Mapbox (Forward) Geocoding API, which takes in addresses and returns coordinates (longitude/latitude). The free tier through Mapbox allows for 100,000 API requests per month.
source("mapbox_auth.R")
Format & Send Request to Mapbox Geocoding API
Now that we’ve authenticated to the Mapbox API, we can use the mb_geocode() function to convert street addresses into coordinates (longitude/latitude), which will allow us to plot points on a leaflet map.
One-off Geocoding Request
geocoded_address <- mb_geocode(
  search_text = "CN Tower",
  endpoint = "mapbox.places",
  limit = 1,
  types = NULL,
  search_within = NULL,
  language = NULL,
  output = "coordinates",
  access_token = access_token)

geocoded_address
## [1] -79.38716 43.64264
Batch Geocoding Request
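The vax_clinic_locations data frame used below is not constructed in the snippets above. One plausible way to derive it from the entity-analysis output (an assumption, keeping only entities Google tags as addresses or locations, and using dplyr to stack the per-tweet entity tables) is:

# Derive candidate clinic locations from the NLP entities (assumed approach)
entities <- dplyr::bind_rows(nlp_result[["entities"]])
vax_clinic_locations <- unique(entities[entities$type %in% c("ADDRESS", "LOCATION"),
                                        c("name", "type")])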
geocoded_addresses <- lapply(vax_clinic_locations$name, mb_geocode)
geocoded_addresses <- data.frame(matrix(unlist(geocoded_addresses),
                                        nrow = length(geocoded_addresses), byrow = TRUE))

str(geocoded_addresses)
## 'data.frame': 170 obs. of 2 variables:
## $ X1: num -122.3 -119.6 -79.2 -79.3 -122.8 ...
## $ X2: num 49 49.5 43.8 43.8 49.1 ...
Create the Vax Clinic Map with Leaflet
Using the leaflet package, we will build an interactive map to visualize the geocoded clinic locations derived from our vax_tweets data frame.
vaxmap <- leaflet() %>%
  # Base groups
  addTiles(group = "OSM (default)") %>%
  addProviderTiles(providers$CartoDB.Positron, group = "CartoDB",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Dark",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "Satellite",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  addProviderTiles(providers$OpenStreetMap, group = "OSM",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  # Overlay groups
  addProviderTiles(providers$Stamen.TonerLines,
                   options = providerTileOptions(opacity = 0.35),
                   group = "TonerLines") %>%
  addProviderTiles(providers$CartoDB.VoyagerOnlyLabels,
                   group = "Labels") %>%
  addLayersControl(baseGroups = c("CartoDB", "Dark", "Satellite", "OSM"),
                   overlayGroups = c("TonerLines", "Labels"),
                   options = layersControlOptions(collapsed = TRUE)) %>%
  # Clustered markers for the geocoded clinic locations
  addAwesomeMarkers(lng = geocoded_addresses$X1, lat = geocoded_addresses$X2,
                    group = "Clusters",
                    clusterOptions = markerClusterOptions()) %>%
  setView(-79.38716, 43.64264, zoom = 4) %>%
  enableTileCaching() %>%
  addFullscreenControl(pseudoFullscreen = FALSE) %>%
  addResetMapButton()

vaxmap
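To share the finished map outside of an R session, the widget could be exported as a standalone HTML page (optional; not part of the original workflow):

# Save the interactive map as a self-contained HTML file
htmlwidgets::saveWidget(vaxmap, "vaxmap.html", selfcontained = TRUE)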