Web Scraping With Web APIs: Importing, Exploring, and Analyzing the Events Dataset Available Through the New York Times Events API
knitr::opts_chunk$set(message = FALSE, echo = TRUE)
# Library for string manipulation/regex operations
library(stringr)
# Library for data display in tabular format
library(DT)
# Library to gather (to long format) and spread (to wide format) data, to tidy data
library(tidyr)
# Library to filter, transform data
library(dplyr)
# Library to plot
library(ggplot2)
library(knitr)
# Library for loading data
library(jsonlite)
library(ggmap)
library(ggrepel)
The New York Times provides developer APIs for exploring its datasets. The Events API is used here to access events data. Only Comedy and Dance events occurring within a 2500-meter radius of the chosen latitude/longitude over the upcoming weekend, i.e. the three days from Nov 4, 2016 to Nov 6, 2016, are considered.
The following parameters are used to filter the events data.
Parameters Considered
apikey <- c("837a0f631c0442a6823a690e4900e106")
nytevent.baseurl <- "https://api.nytimes.com/svc/events/v2/listings.json?"
latlongparam <- c("40.7589,-73.9851")
radiusparam <- c("2500")
categoryparam <- c("(Comedy+Dance)")
daterangeparam <- c("2016-11-04:2016-11-06")
timeschoiceparam <- c("true")
Forming the URL to Scrape the Events Data
The parameters and filters are appended to the base URL as a query string to form the request URL.
# Each query parameter name must be followed by "=" and then its value
nytevent.url <- paste0(nytevent.baseurl, "api-key=", apikey, "&ll=", latlongparam,
    "&radius=", radiusparam, "&filters=", "category:", categoryparam, ",", "times_pick:",
    timeschoiceparam, "&date_range=", daterangeparam)
nytevent.url
## [1] "https://api.nytimes.com/svc/events/v2/listings.json?api-key=837a0f631c0442a6823a690e4900e106&ll=40.7589,-73.9851&radius=2500&filters=category:(Comedy+Dance),times_pick:true&date_range=2016-11-04:2016-11-06"
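Concatenating a query string by hand makes it easy to drop a separator between a parameter name and its value. As a minimal sketch of an alternative, the query can be assembled from a named list so that every pair is joined with "=" and values are percent-encoded. The helper `build_query()` is hypothetical and not part of the original analysis, and whether the API accepts percent-encoded commas and colons is an assumption.

```r
# Hypothetical helper (not part of the original analysis): builds a query
# string from a named list, percent-encoding each value.
build_query <- function(params) {
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  # Join each name to its value with "=", and the pairs with "&"
  paste(names(params), encoded, sep = "=", collapse = "&")
}

# Same values as the parameters defined above
params <- list(
  ll = "40.7589,-73.9851",
  radius = "2500",
  date_range = "2016-11-04:2016-11-06"
)
query <- build_query(params)
query
```

Because names and values are paired in one place, a missing "=" cannot occur for a single parameter without affecting all of them, which makes such mistakes easier to spot.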
Importing Data From the New York Times Website
The events data is available in JSON format. The URL formed from the parameters above is read and the response is parsed into R.
nytevent.jsonstr <- paste(readLines(nytevent.url), collapse = "")
nyteventdata <- fromJSON(nytevent.jsonstr)
names(nyteventdata)
## [1] "status" "copyright" "num_results" "results"
class(nyteventdata)
## [1] "list"
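Since the parsed response is a plain list, it may be worth checking it before extracting the results. The following is a rough sketch, assuming a hypothetical helper `parse_events()`. The field names `status` and `results` come from the `names()` output above, while the status value "OK" and the sample payload are assumptions made for illustration.

```r
library(jsonlite)

# Hypothetical helper: parses a JSON response and checks it before use.
# The expected status value "OK" is an assumption.
parse_events <- function(json_text) {
  parsed <- tryCatch(
    fromJSON(json_text),
    error = function(e) stop("Response is not valid JSON: ", conditionMessage(e))
  )
  if (!identical(parsed$status, "OK")) {
    stop("Unexpected status: ", parsed$status)
  }
  if (is.null(parsed$results) || length(parsed$results) == 0) {
    warning("No events returned for these filters")
    return(data.frame())
  }
  as.data.frame(parsed$results)
}

# Made-up sample payload, for illustration only
sample_json <- '{"status":"OK","num_results":1,"results":[{"event_id":1,"event_name":"Example"}]}'
events <- parse_events(sample_json)
nrow(events)
```

Failing early with a clear message is usually easier to debug than a later error from `select()` on a missing column.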
Extracting Imported JSON Data Into R Data Frame
nyceventsnearTSQ <- as.data.frame(nyteventdata$results)
nyceventsnearTSQ <- nyceventsnearTSQ %>% select(event_id, event_name, event_detail_url,
web_description, venue_name, geocode_latitude, geocode_longitude, street_address,
category, date_time_description)
nyceventsnearTSQ <- nyceventsnearTSQ %>% mutate(web_description = paste(c(substr(web_description,
start = 1, stop = 250)), "..."))
colnames(nyceventsnearTSQ) <- c("Id", "Event", "URL", "About", "Venue", "Latitude",
"Longitude", "Address", "Category", "DateTime")
datatable(nyceventsnearTSQ)
The Comedy and Dance events taking place over the upcoming weekend are plotted on a map of New York City, across boroughs.
lon <- as.numeric(as.character(nyceventsnearTSQ$Longitude))
lat <- as.numeric(as.character(nyceventsnearTSQ$Latitude))
nycmap <- get_map("New York City", maptype = "roadmap", source = "google", zoom = 12)
# Plot each event as a point, colored by category
mapPoints <- ggmap(nycmap) + geom_point(aes(x = lon, y = lat, color = Category), size = 3,
    data = nyceventsnearTSQ, alpha = 0.5)
mapPoints <- mapPoints + xlab("Longitude") + ylab("Latitude")
# Label each point with the event and venue names
mapPoints <- mapPoints + geom_label_repel(data = nyceventsnearTSQ, aes(x = lon, y = lat,
    label = paste(Event, "@", Venue)), fill = "white", box.padding = unit(0.4, "lines"),
    label.padding = unit(0.1, "lines"), segment.color = "red", segment.size = 1)
mapPoints
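Coordinates that fail to convert to numbers become NA and are dropped by the plotting layers, so it may help to filter them out explicitly before mapping. The following is a small sketch on made-up rows. The column names follow the renamed data frame above, and the values are invented for illustration.

```r
# Made-up rows for illustration; column names follow the renamed data frame
events <- data.frame(
  Event = c("Show A", "Show B", "Show C"),
  Latitude = c("40.7590", "", "40.7614"),
  Longitude = c("-73.9845", "-73.9800", "not available"),
  stringsAsFactors = FALSE
)

# as.numeric() returns NA for text that is not a number
events$lat <- suppressWarnings(as.numeric(events$Latitude))
events$lon <- suppressWarnings(as.numeric(events$Longitude))

# Keep only rows where both coordinates parsed
plottable <- events[!is.na(events$lat) & !is.na(events$lon), ]
plottable$Event
```

Comparing `nrow(events)` with `nrow(plottable)` then shows how many events would be missing from the map.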