Note: Some of the technical terms assume a basic understanding of cartography and intermediate R programming knowledge.
“The purpose of visualization is insight, not pictures.” - Ben Shneiderman
In my last blog post, plotting large data with ggplot2, I wanted to test visualization with static spatial mapping in R. I used about a million geocoded records from the New York City Taxi and Limousine Commission. The data was collected from taxi GPS units at customer pickup and drop-off locations. The commission also makes a shape file available that contains the boundaries of its 261 taxi zones across New York City's boroughs.
The test, which I dubbed a ‘stress’ test, was partially successful: I was able to plot about 439,000 data points in the viewable panel, while the rest fell outside it. ggmap did prove capable of plotting all one million geocodes once the map was zoomed out to fit them. But a static plot with that much data produces undecipherable overlaps, making the visualization unusable.
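To make the "inside the panel" versus "outside the panel" split concrete, here is a minimal sketch in base R that counts how many points fall within a bounding box. The `rides` data frame and the `panel` limits are hypothetical stand-ins, not the data or the map extent from the earlier post.

```r
# Hypothetical stand-in for the ride data: random points around New York City.
set.seed(42)
rides <- data.frame(
  dropoff_longitude = runif(1000, min = -74.3, max = -73.6),
  dropoff_latitude  = runif(1000, min = 40.4, max = 41.0)
)

# Hypothetical panel limits; NOT the extent used in the earlier post.
panel <- list(west = -74.05, east = -73.85, south = 40.65, north = 40.90)

# TRUE for every point that lies inside the visible panel.
in_panel <- with(
  rides,
  dropoff_longitude >= panel$west & dropoff_longitude <= panel$east &
    dropoff_latitude >= panel$south & dropoff_latitude <= panel$north
)

n_visible <- sum(in_panel)
n_hidden  <- sum(!in_panel)
cat("inside panel:", n_visible, " outside panel:", n_hidden, "\n")
```

Points counted in `n_hidden` are still in the data; they simply cannot be seen at that zoom level, which is the situation described above.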
In this post I will use Leaflet, “an open source JavaScript library used to build [interactive] mapping applications,” to plot the 1,000,000 geocodes from the New York City Taxi and Limousine Commission.
Without further ado, let's get right to it.
I will not go over how the data was prepared for plotting, since data preparation was discussed in the previous post. However, a new variable that populates the popup was added for the interactive plot, so the following script starts from there.
As always, we start by loading the required libraries, data and geojson shape files.
#load library
library("leaflet") # Create a Leaflet map widget
library("geojsonio") # Convert various data formats to/from GeoJSON or TopoJSON.
library("dplyr") # Data cleaning
library("mapview") # View spatial objects interactively
setwd("~/Documents/Data-Science/Blog/Blog8")
#load the data
df_ride_total <- read.csv("./data/df_ride_total25k.csv")
ny_taxi_zone_geojson <- readLines("./data/taxi_zones.geojson") %>% paste(collapse = "\n")
# map view for the
dat1 <- geojson_read("./data/taxi_zones.geojson", what = "sp")
#mapview(dat1)
#head(dat1)

Each data point will display its longitude and latitude when clicked with a mouse. Here is the code that adds that feature:
# Build the popup text shown when a marker is clicked
df_ride_total <- df_ride_total %>%
  mutate(
    popupInfo1 = paste(
      "lat", round(dropoff_latitude, 2), ",",
      "long", round(dropoff_longitude, 2)
    )
  )

Finally we are ready to plot and interactively examine the one million data points. The geocoded data for the taxi drop-off locations are combined with the shape file. The shape file, which came in geojson format, was loaded into R with the geojson_read function from the geojsonio library.
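Before rendering, it can help to check what the popup strings actually look like. The sketch below repeats the same `paste()` and `round()` logic on a two-row toy data frame; the `toy` data and its coordinates are illustrative assumptions, not rows from the taxi data.

```r
# Toy data frame (hypothetical coordinates, not taken from the taxi data).
toy <- data.frame(
  dropoff_latitude  = c(40.78306, 40.64131),
  dropoff_longitude = c(-73.97125, -73.77814)
)

# Same popup construction as in the post, written in base R.
# paste() joins its arguments with a single space by default.
toy$popupInfo1 <- paste(
  "lat", round(toy$dropoff_latitude, 2), ",",
  "long", round(toy$dropoff_longitude, 2)
)

print(toy$popupInfo1)
# [1] "lat 40.78 , long -73.97" "lat 40.64 , long -73.78"
```

Note the space before the comma, which comes from `paste()`'s default separator. Rounding to two decimals also means nearby markers can show identical popup text.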
When rendering the plot with ‘knitr’, the browser can take a long time if your computer doesn’t have enough memory; therefore the following interactive plot uses only 25,000 geocode points. If the map doesn’t center on New York, you may have to drag it into place with your mouse. You can zoom and click on the map to explore it interactively, just like the gif above.
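The post does not say how the 25,000 rows were chosen, so the following is only one plausible approach: a simple random sample drawn in base R. The `full` data frame, its size, and the seed are hypothetical.

```r
# Hypothetical full data set: 100,000 synthetic drop-off points.
set.seed(2018)  # arbitrary seed, only so the sample is repeatable
full <- data.frame(
  dropoff_longitude = rnorm(100000, mean = -73.97, sd = 0.05),
  dropoff_latitude  = rnorm(100000, mean = 40.78, sd = 0.05)
)

# Draw 25,000 rows without replacement.
n_keep <- 25000
subsample <- full[sample(nrow(full), n_keep), ]

nrow(subsample)
```

A random sample keeps the overall spatial pattern while cutting the rendering load; writing it out with `write.csv()` would give a smaller file like the one loaded at the top of the script.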
# Keep only the taxi zone for the popup
pp_leaflet_spatial_1 <- leaflet(df_ride_total) %>%
  # Base layers; the group names are the labels shown in the layer control
  addTiles(group = "OpenStreetMap.BlackAndWhite (default)") %>%
  addProviderTiles("Hydda.Full", group = "Full") %>%
  addProviderTiles("Stamen.Toner", group = "Toner") %>%
  addProviderTiles("Esri.WorldStreetMap", group = "WorldStreetMap") %>%
  setView(lng = -73.97125, lat = 40.78306, zoom = 11) %>% # geocode("manhattan, NY")
  # Taxi zone outlines, with a popup table of each zone's attributes
  addPolygons(data = dat1, popup = popupTable(dat1), color = "green", group = "Outline") %>%
  # Drop-off locations, clustered so dense areas stay readable
  addCircleMarkers(
    ~dropoff_longitude,
    ~dropoff_latitude,
    group = "Markers",
    radius = 5,
    color = "red",
    fill = TRUE,
    opacity = 0.8,
    popup = ~popupInfo1,
    # popup settings go in popupOptions; `options` is reserved for path options
    popupOptions = popupOptions(closeButton = TRUE),
    clusterOptions = markerClusterOptions()
    #icon = icon goes here.
  ) %>%
  addLayersControl(
    baseGroups = c(
      "OpenStreetMap.BlackAndWhite (default)",
      "Full",
      "Toner",
      "WorldStreetMap"
    ),
    overlayGroups = c("Markers", "Outline"),
    position = "topleft"
  )
pp_leaflet_spatial_1

In addition to showing the location of each dot, a mouse click on a taxi region reveals the taxi zone number and area size, courtesy of the mapview package. This information is extracted automatically from the geojson file.
As demonstrated, the Leaflet JavaScript library, used here from R (a programming language for statistical computing and graphics), is capable of plotting 1,000,000 data points on a small screen, provided the computer has enough memory (RAM). Otherwise, a cloud resource can be used to plot interactive maps for larger volumes of data, aka ‘big data’.
The advantage of interactive mapping for large data sets is the ability to cluster, zoom, click on individual data points and apply a geojson shape file to accentuate the borders of the New York taxi zones. This makes it more intuitive and useful for examining large geocode data in a higher degree of detail.
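The clustering behaviour mentioned above can be tried in isolation with a much smaller map. This is a minimal sketch using synthetic points, assuming the leaflet package is installed; the `pts` data frame and its values are made up for illustration.

```r
library(leaflet)  # also provides the %>% pipe

# Synthetic points scattered around Manhattan (hypothetical data).
set.seed(1)
pts <- data.frame(
  lng = rnorm(500, mean = -73.97125, sd = 0.03),
  lat = rnorm(500, mean = 40.78306, sd = 0.03)
)

# clusterOptions groups nearby markers into a single counted bubble
# that splits apart as the user zooms in.
m <- leaflet(pts) %>%
  addTiles() %>%
  addCircleMarkers(
    ~lng, ~lat,
    radius = 5,
    clusterOptions = markerClusterOptions()
  )

m  # prints the interactive widget in RStudio or a knitted document
```

Removing the `clusterOptions` argument draws every marker individually, which is a quick way to see why clustering matters once the point count grows.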
If you need consultation on this kind of work, feel free to contact ability.giday@gmail.com.
This is a fully reproducible markdown document generated using the RStudio IDE.