Note: To follow some of the technical terms, it helps to have some understanding of cartography and intermediate R programming knowledge.

“The purpose of visualization is insight, not pictures.” - Ben Shneiderman

Motivation

In my last post, plotting large data with ggplot2, I wanted to test visualization with static spatial mapping in R. I used about a million geocoded records from the New York City Taxi and Limousine Commission. The data was collected from taxi GPS units at customer pickup and drop-off locations. The commission also publishes a shape file that contains the boundaries of all 261 taxi zones in New York City.

The test, which I dubbed a ‘stress’ test, was partially successful: I was able to plot about 439,000 data points in the viewable panel, while the rest fell outside the panel. ggmap did prove capable of plotting all one million geocodes if we zoom out to fit every point. But a static plot with that much data produces indecipherable overlaps, making the visualization unusable.

In this post I will use Leaflet, “an open source JavaScript library used to build [interactive] mapping applications,” to plot the 1,000,000 geocodes from the New York City Taxi and Limousine Commission.

Without further ado, let’s get right to it.

Data Preparation

I will not go over the details of how the data is prepared for plotting, since data preparation was discussed in the previous post. However, a new variable that populates the popup was added for the interactive plot, so the following script starts from there.

Load the libraries and data required to plot the interactive map.

As always, we start by loading the required libraries, data and geojson shape files.

#load library
library("leaflet")          # Create a Leaflet map widget
library("geojsonio")        # Convert various data formats to/from GeoJSON or TopoJSON.
library("dplyr")            # Data cleaning
library("mapview")          # View spatial objects interactively

setwd("~/Documents/Data-Science/Blog/Blog8")
#load the data
df_ride_total <- read.csv("./data/df_ride_total25k.csv")
ny_taxi_zone_geojson <- readLines("./data/taxi_zones.geojson") %>% paste(collapse = "\n")

# read the taxi zones as a spatial (sp) object for use with leaflet and mapview
dat1 <- geojson_read("./data/taxi_zones.geojson", what = "sp")
#mapview(dat1)  
#head(dat1)

Add a feature for the geocode popup

Each data point will display its longitude and latitude when clicked with a mouse. Here is the code that adds that feature:

df_ride_total <- df_ride_total %>% mutate( popupInfo1 = paste(
                                            "lat",   round(dropoff_latitude,2), ",",
                                            "long",  round(dropoff_longitude,2)
                                            )
                                  )
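To see what the popup text looks like, here is a small sketch that applies the same paste() and round() logic to a made-up two-row data frame. The toy_rides name and its coordinates are illustrative assumptions, not part of the taxi data.

```r
library(dplyr)

# Hypothetical drop-off coordinates, for illustration only
toy_rides <- data.frame(
  dropoff_latitude  = c(40.78306, 40.64131),
  dropoff_longitude = c(-73.97125, -73.77814)
)

# Same popup construction as in the script above
toy_rides <- toy_rides %>%
  mutate(popupInfo1 = paste("lat",  round(dropoff_latitude, 2), ",",
                            "long", round(dropoff_longitude, 2)))

toy_rides$popupInfo1
# returns "lat 40.78 , long -73.97" "lat 40.64 , long -73.78"
```

Note that paste() separates its arguments with a space by default, which is why a space appears before the comma. Rounding to two decimals keeps the popup short, at the cost of showing only an approximate position.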

Visualizing a million data points on interactive map

Finally, we are ready to plot and interactively examine the one million data points. The geocoded taxi drop-off locations are combined with the shape file. The shape file, which came in geojson format, was loaded into R with the geojson_read function from the geojsonio library.

Figure1:



Rendering the plot with ‘knitr’ takes a long time in your browser if your computer doesn’t have enough memory; therefore, the following interactive plot uses only 25,000 geocode points. If the map doesn’t center on New York, you may have to drag it to center with your mouse. You can zoom and click on the map to see it in action, just like the gif above.
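The post does not show how the 25,000-point file was produced. One plausible way, sketched here under that assumption, is to draw a random sample from the full data set with dplyr. The df_ride_full and df_ride_sample names are hypothetical, and the synthetic points only stand in for the real taxi data.

```r
library(dplyr)

set.seed(42)  # make the random sample reproducible

# Stand-in for the full data set: random points in a box around New York City
df_ride_full <- data.frame(
  dropoff_longitude = runif(100000, min = -74.05, max = -73.75),
  dropoff_latitude  = runif(100000, min = 40.60,  max = 40.90)
)

# Draw 25,000 rows at random, without replacement
df_ride_sample <- df_ride_full %>% sample_n(25000)

nrow(df_ride_sample)
```

A random sample keeps roughly the same spatial distribution as the full data, so the clusters on the smaller map should still resemble those of the complete one.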

Figure2:

# Keep only the taxi zone for the popup
pp_leaflet_spatial_1 <- leaflet(df_ride_total) %>%
  # Base tiles; the group names are only labels shown in the layer control.
  # Note that addTiles() draws the default OpenStreetMap tiles.
  addTiles(group = "OpenStreetMap.BlackAndWhite (default)") %>%
  addProviderTiles("Hydda.Full", group = "Full") %>%
  addProviderTiles("Stamen.Toner", group = "Toner") %>%
  addProviderTiles("Esri.WorldStreetMap", group = "WorldStreetMap") %>%
  setView(lng = -73.97125, lat = 40.78306, zoom = 11) %>%   # geocode("manhattan, NY")
  # Taxi zone outlines, with a popup table of each zone's attributes
  addPolygons(data = dat1, popup = popupTable(dat1), color = "green", group = "Outline") %>%
  # Drop-off points, clustered so that large numbers of markers stay readable
  addCircleMarkers(~dropoff_longitude,
                   ~dropoff_latitude,
                   group = "Markers",
                   radius = 5,
                   color = "red",
                   fill = TRUE,
                   opacity = 0.8,
                   popup = ~popupInfo1,
                   # popup settings go in popupOptions; options is for path options
                   popupOptions = popupOptions(closeButton = TRUE),
                   clusterOptions = markerClusterOptions()) %>%
  # Layer control to switch base tiles and toggle the overlays
  addLayersControl(baseGroups = c("OpenStreetMap.BlackAndWhite (default)",
                                  "Full",
                                  "Toner",
                                  "WorldStreetMap"),
                   overlayGroups = c("Markers", "Outline"),
                   position = "topleft")
pp_leaflet_spatial_1


Another feature is the layer control, located at the top left of the map. When selected, it offers a choice of four base tiles from different map providers: in addition to the default OpenStreetMap tile, we have the Esri WorldStreetMap, Stamen Toner and Hydda Full tiles. The shape file and the geocode points can also be selected or deselected for different views. These capabilities make interactive maps more intuitive and allow for a better examination of large amounts of data in different parts of the city.

In addition to showing the location of each dot, a mouse click on a taxi region reveals the taxi zone number and area size, courtesy of the mapview package. This information is extracted automatically from the feature attributes stored in the geojson file.
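The marker clustering used in the map above can also be tried in isolation. The following is a minimal sketch on synthetic points (the pts data frame and cluster_map name are assumptions for illustration), with a few of the documented markerClusterOptions() arguments written out explicitly rather than left at their defaults.

```r
library(leaflet)

set.seed(1)

# Synthetic points scattered around Manhattan, for illustration only
pts <- data.frame(
  lng = rnorm(1000, mean = -73.97125, sd = 0.03),
  lat = rnorm(1000, mean = 40.78306,  sd = 0.03)
)

cluster_map <- leaflet(pts) %>%
  addTiles() %>%
  setView(lng = -73.97125, lat = 40.78306, zoom = 11) %>%
  addCircleMarkers(
    ~lng, ~lat,
    radius = 5,
    color = "red",
    clusterOptions = markerClusterOptions(
      showCoverageOnHover = TRUE,  # outline the area a cluster covers on hover
      zoomToBoundsOnClick = TRUE,  # zoom into a cluster when it is clicked
      spiderfyOnMaxZoom   = TRUE   # fan out overlapping points at the deepest zoom
    )
  )

cluster_map
```

Clustering is what keeps a map with many markers responsive: the browser draws a handful of cluster icons at low zoom levels and only reveals individual points as you zoom in.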

Take Away

As demonstrated, the Leaflet JavaScript library, used here from R, a programming language for statistical computing and graphics, is capable of plotting 1,000,000 data points on a small screen, provided the computer has enough memory (RAM). Otherwise, a cloud resource can be used to plot an interactive map for a larger amount of data, aka ‘big data’.

The advantage of interactive mapping for large numbers of data points is the ability to cluster, zoom, click on individual data points, and apply a geojson shape file to accentuate the borders of the New York taxi zones. This makes it more intuitive and useful for examining large geocode data in greater detail.

Contact

If you need consultation on this kind of work, feel free to contact ability.giday@gmail.com.

References

This is a fully reproducible markdown document generated using the RStudio IDE.