For this analysis, we will capture, process, and visualize tweets from @VaxHuntersCan in order to build a map of pop-up Covid19 vaccine clinics in Canada.
Vaccine Hunters Canada is a volunteer organization helping Canadians navigate the patchwork of Covid19 vaccine clinics, supply, and eligibility criteria. Primarily operating on Twitter & Discord, @VaxHuntersCan has created a crowd-sourced hub for Covid19 vaccine information in Canada. In addition to the usual metadata that the Twitter API gives developers access to, @VaxHuntersCan tweets contain a common data structure from which we can parse useful location, supply, and eligibility information.
This project would not be possible without the great work being done by the creators & maintainers of @VaxHuntersCan.
Authenticate to Twitter API
First, a Twitter developer account is required. Request one here to start building apps that utilize the Twitter API.
Next, create a new app project in your Twitter developer console, and copy and paste your api_key, api_secret_key, access_token, and access_token_secret into the relevant fields of the twitter_auth.R file, which we will source in the snippet below and use as environment variables to generate an auth token.
source("twitter_auth.R")
token <- create_token(
  app = "VaxHuntersTweetBot",
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = access_token,
  access_secret = access_token_secret
)
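For reference, twitter_auth.R itself is not shown in this post. A minimal sketch of what it might contain follows; the placeholder values and the choice to store the keys as plain R variables (rather than true OS environment variables) are assumptions based on how they are used above.

# twitter_auth.R (sketch) -- the values below are placeholders, not real credentials
library(rtweet)

api_key             <- "YOUR_API_KEY"
api_secret_key      <- "YOUR_API_SECRET_KEY"
access_token        <- "YOUR_ACCESS_TOKEN"
access_token_secret <- "YOUR_ACCESS_TOKEN_SECRET"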
Pull Tweets from @VaxHuntersCan Timeline
Here, we use the get_timeline() function from the rtweet package to pull tweets from the @VaxHuntersCan Twitter timeline. This function creates a data frame, which we will call vax_tweets. We then use the colnames() function to take a look at the column headers in the returned data frame.
vax_tweets <- rtweet::get_timeline("VaxHuntersCan", n = 3500, parse = TRUE)
colnames(vax_tweets)
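As a quick sanity check (not part of the original snippet), we can also preview a couple of fields and confirm how many tweets came back; this assumes the rtweet version in use returns created_at and text columns, as the later steps do.

# Preview the first few tweets and count the rows returned
head(vax_tweets[, c("created_at", "text")], 3)
nrow(vax_tweets)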
Pre-process Tweets
Using a custom helper function unescape_html() in conjunction with sapply(), we start cleaning up the vax_tweets$text column by un-escaping HTML values using the xml2 package. This cleans up the text of the tweets so that encodings like '&amp;' display as '&' instead. Next, we replace carriage & line return symbols with blank spaces to further clean up the vax_tweets$text field. Finally, we split the created_at field into separate date & time fields to allow for easier date & time-series analytics.
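The unescape_html() helper itself is not defined in the post; a minimal sketch of one common xml2-based implementation (an assumption about how it was written) is:

# Helper: decode HTML entities (e.g. "&amp;" -> "&") in a single string using xml2
unescape_html <- function(str) {
  xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
}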
# Pre-process tweets by unescaping xml/html values using the xml2 package
vax_tweets$text <- sapply(vax_tweets$text, unescape_html, USE.NAMES = FALSE)

# Replace carriage & line returns using stringr
vax_tweets$text <- str_replace_all(vax_tweets$text, "[\r\n]", " ")

# Split 'created_at' field into date & time fields
vax_tweets$date <- as.Date(vax_tweets$created_at)
vax_tweets$time <- format(vax_tweets$created_at, "%H:%M:%S")
Create a datatable to display and filter the full-text tweets.
datatable(head(vax_tweets,3200),
options = list(
columnDefs = list(list(className = 'dt-left', targets = "_all",width='auto')),
pageLength = 5,
lengthMenu = c(5,10),
fontSize='50%'
))
Authenticate to Google Cloud Natural Language API
source("googleNLP_auth.R")
Perform Entity Analysis
Next, we send the pre-processed tweets to the Google Natural Language API using the googleLanguageR package. The goal here is to use Google's NLP algorithms to perform Entity Analysis on the tweets, extracting address information from vax_tweets$text and returning the list to nlp_result. Because the NLP API has a rate limit of 600 requests/minute, we batch our requests into chunks of 500 and pause for 30 seconds in between requests with the Sys.sleep() command.
#nlp_result1 <- gl_nlp(vax_tweets$text[0:500],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result2 <- gl_nlp(vax_tweets$text[501:1000],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result3 <- gl_nlp(vax_tweets$text[1001:1500],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result4 <- gl_nlp(vax_tweets$text[1501:2000],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result5 <- gl_nlp(vax_tweets$text[2001:2500],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result6 <- gl_nlp(vax_tweets$text[2501:3000],nlp_type = "analyzeEntities")
#Sys.sleep(30)
#nlp_result7 <- gl_nlp(vax_tweets$text[3001:3500],nlp_type = "analyzeEntities")
#nlp_result <- c(nlp_result1, nlp_result2, nlp_result3, nlp_result4, nlp_result5, nlp_result6, nlp_result7)
#head(nlp_result[["entities"]][[500]]$name, 10)
#head(nlp_result[["entities"]][[500]]$type, 10)
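The chunked calls above could also be expressed as a loop. The following is a sketch that assumes the same 500-tweet chunks, the 30-second pause, and the gl_nlp() interface from googleLanguageR:

# Batch the tweets into chunks of 500 and pause between requests to respect the rate limit
chunk_size <- 500
starts     <- seq(1, nrow(vax_tweets), by = chunk_size)

nlp_chunks <- vector("list", length(starts))
for (i in seq_along(starts)) {
  idx <- starts[i]:min(starts[i] + chunk_size - 1, nrow(vax_tweets))
  nlp_chunks[[i]] <- gl_nlp(vax_tweets$text[idx], nlp_type = "analyzeEntities")
  if (i < length(starts)) Sys.sleep(30)
}

# Combine the per-chunk results into a single list, as above
nlp_result <- do.call(c, nlp_chunks)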
Authenticate to Mapbox API
Similar to the Twitter authentication step above, here we authenticate with Mapbox, which allows us to send the extracted addresses from vax_tweets to the Mapbox (Forward) Geocoding API, which takes in addresses and returns coordinates (longitude/latitude). The free tier through Mapbox allows for 100,000 API requests per month.
source("mapbox_auth.R")
Format & Send Request to Mapbox Geocoding API
Now that we’ve authenticated to the Mapbox API, we can use the mb_geocode() function to convert street addresses into coordinates (longitude/latitude), which will allow us to plot points on a leaflet map.
One-off Geocoding Request
geocoded_address <- mb_geocode(
  search_text = "CN Tower",
  endpoint = "mapbox.places",
  limit = 1,
  types = NULL,
  search_within = NULL,
  language = NULL,
  output = "coordinates",
  access_token = access_token)

geocoded_address
## [1] -79.38716 43.64264
Batch Geocoding Request
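The vax_clinic_locations data frame used below is not constructed in the snippets above. One plausible way to derive it from the entity-analysis output (an assumption, keeping only entities Google tags as addresses or locations, and using dplyr to stack the per-tweet entity tables) is:

# Derive candidate clinic locations from the NLP entities (assumed approach)
entities <- dplyr::bind_rows(nlp_result[["entities"]])
vax_clinic_locations <- unique(entities[entities$type %in% c("ADDRESS", "LOCATION"),
                                        c("name", "type")])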
geocoded_addresses <- lapply(vax_clinic_locations$name, mb_geocode)
geocoded_addresses <- data.frame(matrix(unlist(geocoded_addresses),
                                        nrow = length(geocoded_addresses), byrow = TRUE))

str(geocoded_addresses)
## 'data.frame': 170 obs. of 2 variables:
## $ X1: num -122.3 -119.6 -79.2 -79.3 -122.8 ...
## $ X2: num 49 49.5 43.8 43.8 49.1 ...
Create the Vax Clinic Map with Leaflet
Using the leaflet package, we will build an interactive map to visualize the geocoded clinic locations derived from our vax_tweets data frame.
vaxmap <- leaflet() %>%
  # Base groups
  addTiles(group = "OSM (default)") %>%
  addProviderTiles(providers$CartoDB.Positron, group = "CartoDB",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Dark",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "Satellite",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  addProviderTiles(providers$OpenStreetMap, group = "OSM",
                   options = tileOptions(useCache = TRUE, crossOrigin = TRUE)) %>%
  # Overlay groups
  addProviderTiles(providers$Stamen.TonerLines,
                   options = providerTileOptions(opacity = 0.35),
                   group = "TonerLines") %>%
  addProviderTiles(providers$CartoDB.VoyagerOnlyLabels,
                   group = "Labels") %>%
  addLayersControl(baseGroups = c("CartoDB", "Dark", "Satellite", "OSM"),
                   overlayGroups = c("TonerLines", "Labels"),
                   options = layersControlOptions(collapsed = TRUE)) %>%
  # Clustered markers for the geocoded clinic locations
  addAwesomeMarkers(lng = geocoded_addresses$X1, lat = geocoded_addresses$X2,
                    group = "Clusters",
                    clusterOptions = markerClusterOptions()) %>%
  setView(-79.38716, 43.64264, zoom = 4) %>%
  enableTileCaching() %>%
  addFullscreenControl(pseudoFullscreen = FALSE) %>%
  addResetMapButton()

vaxmap
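To share the finished map outside of an R session, the widget could be exported as a standalone HTML page (optional; not part of the original workflow):

# Save the interactive map as a self-contained HTML file
htmlwidgets::saveWidget(vaxmap, "vaxmap.html", selfcontained = TRUE)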