Many cities are now providing city data for public viewing online. Virginia Beach, VA is one of them.

As a data science enthusiast, I wanted to learn a little bit more about the data.

The Virginia Beach Police Calls for Service dataset looked like a good place to start.

Even though this web site provides tools for basic exploration of the data, I wanted to take it a step further. So let’s see what else we can learn from the data.

While reading articles on R-bloggers, I stumbled upon the term “Descriptive Analytics”. This lead me to a series of exercises by Vasileios Tsakalos. Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?” The goal is to inform the user about what is going on at the dataset. I liked how he structured his exercises and his analysis. I found it clear and easy to understand and based this analysis on it.

Let’s get to work.

0. Data Exploration

Download the data in JSON format. For this example, I’m working with a local copy of the file.

This article on ZevRoss provides an overview on parsing JSON data from an open data portal.

On this site, the author used RJSONIO’s fromJSON to read the JSON file. I found the extraction of the data especially the location information rather complex. After some investigating, I opted to give rList’s list.load a try.

#vb_data <- list.load("https://data.vbgov.com/api/views/7gep-fpmg/rows.json?accessType=DOWNLOAD")
#vb_data <- fromJSON("./data/vb_police_calls.json", simplifyVector=FALSE)
vb_data_raw <- list.load("./data/vb_police_calls.json")

The JSON data file is composed of 2 sections: meta, data.

The meta section contains descriptive and administrative metadata about the JSON file while the data section contains the actual information to analyze.

Let’s learn a little bit about the JSON file.

info <- vb_data_raw[[names(vb_data_raw['meta'])]]
# JSON meta columns
names(info$view)
##  [1] "id"                       "name"                    
##  [3] "attribution"              "averageRating"           
##  [5] "category"                 "createdAt"               
##  [7] "description"              "displayType"             
##  [9] "downloadCount"            "hideFromCatalog"         
## [11] "hideFromDataJson"         "indexUpdatedAt"          
## [13] "licenseId"                "newBackend"              
## [15] "numberOfComments"         "oid"                     
## [17] "provenance"               "publicationAppendEnabled"
## [19] "publicationDate"          "publicationGroup"        
## [21] "publicationStage"         "rowClass"                
## [23] "rowsUpdatedAt"            "rowsUpdatedBy"           
## [25] "tableId"                  "totalTimesRated"         
## [27] "viewCount"                "viewLastModified"        
## [29] "viewType"                 "columns"                 
## [31] "grants"                   "license"                 
## [33] "metadata"                 "owner"                   
## [35] "query"                    "rights"                  
## [37] "tableAuthor"              "tags"                    
## [39] "flags"
# show description of JSON file
info$view$description
## [1] "This dataset includes information on calls for service to the Virginia Beach Police Department.  Calls for service are requests from citizens for police assistance.  Calls can range from minor problems in the neighborhood (traffic complaints, loud neighbors, and graffiti) to the most serious crimes (burglaries, robberies, and homicides).  Call descriptions are based upon the information  received at the time of dispatch."
# show license
info$view$license
## $name
## [1] "Public Domain"
# show type of data
info$view$category
## [1] "Public Safety"
# show JSON file date, e.g. indexUpdateAt
as.POSIXct(info$view$indexUpdatedAt, origin="1970-01-01")
## [1] "2017-10-14 08:39:35 EDT"

Data Structure

The relevant columns are as follows:

Index Column Name Description
9 Incident Number This number is generated by the computer-aided dispatch system (CAD) and provides a unique ID number for each call for service.
10 Report Number The report number assigned to the call by the Virginia Beach Police Department. Only incidents that result in a police report will be assigned a report number.
11 Call Type Provides the type of call as described by the individual making the call for service. Upon investigation by responding officers, the original call type may be amended and captured as the “Final Call Type.”
12 Zone Reporting District ID used by the Virginia Beach Police Department.
13 Call Disposition This describes how the call was resolved. A list of all case dispositions can be downloaded on the dataset information page.
14 Priority There are five priority codes depending on the severity of the incident. For a description of all priority codes, download the ‘Priority Descriptions’ document on the dataset information page.
15 Subdivision These are subdivisions or neighborhood areas that have been delineated by the Virginia Beach Police Department. To view the map, download the ‘Subdivision Map’ file on the dataset information page.
16 Call Date The date and time the call was received.
17 Entry Date The time the call was entered into the computer-aided dispatch (CAD) system.
18 Dispatch Date The time the call was initially dispatched.
19 Enroute Date Represents the first en route time captured. If multiple units are responding, this time represents the first unit classified as en route.
20 On Scene Date The time the first unit arrives on scene.
21 Close Date Time that the call is concluded.
22 Location Info This includes approximate (down to the street block level) location information where the incident occurred.

The rList’s list.select provides a straightforward method to extract the crime data. It also performs the action as a single operation unlike in the ZevRoss example.

# extract the relevant columns into a data frame plus lat/lng from location info
crimesDF <- vb_data_raw[['data']] %>>%
  list.select(.[9]
              , .[10]
              , .[11]
              , .[12]
              , .[13]
              , .[14]
              , .[15]
              , .[16]
              , .[17]
              , .[18]
              , .[19]
              , .[20]
              , .[21]
              , (.[22] %>>% list.map(.[2]))
              , (.[22] %>>% list.map(.[3]))) %>>%
  list.stack()

# set the column names
colnames(crimesDF) <- c("incident_num"
                        , "report_num"
                        , "call_type"
                        , "zone"
                        , "call_disposition"
                        , "priority"
                        , "subdivision"
                        , "call_date"
                        , "entry_date"
                        , "dispatch_date"
                        , "enroute_date"
                        , "onscene_date"
                        , "close_date"
                        , "lat"
                        , "lng")

1. Data Imputation

In statistics, imputation is the process of replacing missing data with substituted values.

Clean up NULL values. Ensure numeric fields are numeric. Ensure data fields are valid and date fields consistent.

# init NULL report numbers to -1
crimesDF$report_num[crimesDF$report_num == "NULL"] <- -1

# init NULL call dispositions to <UNDEF>
crimesDF$call_disposition[crimesDF$call_disposition == "NULL"] <- "<UNDEF>"

# init NULL call types to -1
crimesDF$call_type[crimesDF$call_type == "NULL"] <- -1

# init NULL subdivisions to <UNDEF>
crimesDF$subdivision[crimesDF$subdivision == "NULL"] <- "<UNDEF>"


# format incident number, report number, lat/lng columns as number
crimesDF[, c(1,2,14:15)] <- lapply(crimesDF[, c(1,2,14:15)], as.numeric)

# format date columns
dateFormat <- "%Y-%m-%dT%H:%M:%S"
dateCols <- names(crimesDF[,grep("_date", names(crimesDF))])
crimesDF[, dateCols] <- lapply(crimesDF[, dateCols]
                        , function(x) as.POSIXct(strptime(as.character(x)
                                                          , format=dateFormat)
                                                 , format=dateFormat))

# define factor variables
crimesDF[c(4)] <- factor(unlist(crimesDF[[c(4)]]))
crimesDF[c(6)] <- factor(unlist(crimesDF[[c(6)]]))

2. Data Formatting

#### Column names
names(crimesDF)
##  [1] "incident_num"     "report_num"       "call_type"       
##  [4] "zone"             "call_disposition" "priority"        
##  [7] "subdivision"      "call_date"        "entry_date"      
## [10] "dispatch_date"    "enroute_date"     "onscene_date"    
## [13] "close_date"       "lat"              "lng"
#### Basic structure
glimpse(crimesDF)
## Observations: 976,699
## Variables: 15
## $ incident_num     <dbl> 151590600, 160250724, 170130295, 162170127, 1...
## $ report_num       <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -...
## $ call_type        <list> ["INFORMATION", "TRAFFIC STOP", "ASSIST CITI...
## $ zone             <fctr> 427, 126, 328, 122, 125, 324, 321, 421, 225,...
## $ call_disposition <list> ["Miscellaneous/Other", "Miscellaneous/Other...
## $ priority         <fctr> 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 4, 3, 3, 4,...
## $ subdivision      <list> ["WINDSOR OAKS WEST", "CARRIAGE HILL/BROOKWO...
## $ call_date        <dttm> 2015-06-08 16:06:23, 2016-01-25 21:44:36, 20...
## $ entry_date       <dttm> 2015-06-08 16:06:23, 2016-01-25 21:44:36, 20...
## $ dispatch_date    <dttm> 2015-06-08 16:06:23, 2016-01-25 21:44:36, 20...
## $ enroute_date     <dttm> 2015-06-08 16:06:23, 2016-01-25 21:44:36, 20...
## $ onscene_date     <dttm> 2015-06-08 16:06:23, 2016-01-25 21:44:36, 20...
## $ close_date       <dttm> 2015-06-08 16:52:54, 2016-01-25 21:55:48, 20...
## $ lat              <dbl> 36.82396, 36.83911, 36.86787, 36.73692, NA, 3...
## $ lng              <dbl> -76.10523, -76.06816, -76.13274, -76.02077, N...
set.seed(29)
#### random data sample
crimesDF %>% sample_n(5)
##        incident_num report_num                       call_type zone
## 97359     153450886         -1 OFFICER INITIATED INVESTIGATION  223
## 235289    171730057         -1                    TRAFFIC STOP  420
## 100809    161670505         -1       CELLULAR 911 HANG-UP CALL  427
## 317997    170831022         -1                    TRAFFIC STOP  228
## 571356    152780216   15041418                         LARCENY  327
##           call_disposition priority                           subdivision
## 97359            No Report        5                    HILLTOP RIDGE APTS
## 235289 Miscellaneous/Other        3          COLLEGE PARK SQUARE SHPG CTR
## 100809           No Report        3 BEHIND WINDSOR WOODS IN WINDSOR WOODS
## 317997 Miscellaneous/Other        3                               SEATACK
## 571356      Offense Report        4                   BAYSIDE HIGH SCHOOL
##                  call_date          entry_date       dispatch_date
## 97359  2015-12-11 22:10:28 2015-12-11 22:10:28 2015-12-11 22:10:28
## 235289 2017-06-22 01:41:40 2017-06-22 01:41:40 2017-06-22 01:41:40
## 100809 2016-06-15 14:54:21 2016-06-15 14:54:57 2016-06-15 14:55:43
## 317997 2017-03-24 22:42:30 2017-03-24 22:42:30 2017-03-24 22:42:30
## 571356 2015-10-05 10:10:26 2015-10-05 10:10:26 2015-10-05 10:10:26
##               enroute_date        onscene_date          close_date
## 97359  2015-12-11 22:10:28 2015-12-11 22:10:28 2015-12-11 23:59:27
## 235289 2017-06-22 01:41:40 2017-06-22 01:41:40 2017-06-22 01:49:40
## 100809 1899-12-30 00:00:00 1899-12-30 00:00:00 2016-06-15 14:55:46
## 317997 2017-03-24 22:42:30 2017-03-24 22:42:30 2017-03-24 22:46:56
## 571356 2015-10-05 10:10:26 2015-10-05 10:10:26 2015-10-05 10:10:42
##             lat       lng
## 97359  36.84947 -76.01617
## 235289       NA        NA
## 100809 36.83401 -76.09659
## 317997 36.81512 -75.99707
## 571356 36.87003 -76.14763

Use the Nominatim API to extract extra location information. Other API’s are available. But this one provides detailed neighborhood information, ZIP codes, and was affordable (FREE).

###################################################
# Helper functions to format OpenStreetMap API 
# request.
###################################################

formatReq <- function(lat, lon) {
  paste0(
  "lat=",
  lat,
  "&lon=",
  lon,
  "&zoom=18&addressdetails=1"
  , apiEmail
  )
}

getLocationDetail <- function(lat, lon) {
  
  URL <- "http://nominatim.openstreetmap.org/reverse?format=json&"
  #response <- GET(paste0(URL, formatReq(lat, lon)))
  response <- http_error(GET(paste0(URL, formatReq(lat, lon)))) 
  
  if (!response) {
    json_data <- jsonlite::fromJSON(paste0(URL, formatReq(lat, lon)), simplifyVector = FALSE)
  
    Sys.sleep(2) # added to alleviate [to a degreee] timeout problems
 
    return (json_data)
  }
}

A local Postgres database was used to cache location information from Nominatim OpenStreetMap. This reduces the number of Nominatim API calls.

The Postgres table to store this information is defined as follows:

  CREATE TABLE location_info (
     ID serial NOT NULL PRIMARY KEY,
     lat NUMERIC(8,6) NOT NULL,
     lng NUMERIC(9,6) NOT NULL,
     info json NOT NULL
  );

  ALTER TABLE public.location_info
    ADD CONSTRAINT location_info_uk1 UNIQUE (lat, lng);

Extract location information from the Postgres database.

options(java.parameters = "-Xmx8048m")
drv = JDBC("org.postgresql.Driver", "../../lib/postgresql-9.4.1212.jar")
con <- dbConnect(drv
                 , url="jdbc:postgresql://localhost:5432/db"
                 , user="postgres"
                 , password="docker")

getLocInfoFromDB<-function(lat, lng) {

  # When the JSON result is parsed, the returned value is formatted as: ["return_value"].  Remove this
  # format before using.
  if (complete.cases(lat, lng)) {
    results <- dbGetQuery(con
                , paste0("SELECT regexp_replace(info -> 'address' ->> 'hamlet'"
                                                , ", '[\"\\[\\]]', '', 'g') AS neighborhood "
                         , ", regexp_replace(info -> 'address' ->> 'postcode'"
                                                , ", '[\"\\[\\]]', '', 'g') AS zip "
                         , "FROM location_info "
                         , "WHERE lat=", lat
                         , " AND lng=", lng))
  } else {
    results <- "|"
  }

  return(results)
}

3. Outlier Treatment

#### number of rows with no locations
nrow(crimesDF %>% filter(is.na(lat) | is.na(lng)))
## [1] 148883
#### sample of the rows with no lat/lng
crimesDF %>% 
  select(subdivision, lat, lng) %>% 
  filter(is.na(lat) | is.na(lng)) %>% 
  sample_n(10)
##                           subdivision lat lng
## 13860              PINE HURST ESTATES  NA  NA
## 123300                   COLLEGE PARK  NA  NA
## 128977 FIRST LANDING  STATE PARK AREA  NA  NA
## 17865                 BAYSIDE BOROUGH  NA  NA
## 34711                         <UNDEF>  NA  NA
## 146287             BAYLAKE PINES AREA  NA  NA
## 58524      OCEANFRONT - 31ST ST SOUTH  NA  NA
## 44818                    THOROUGHGOOD  NA  NA
## 94010             PRINCESS ANNE PLAZA  NA  NA
## 26228      OCEANFRONT - 31ST ST NORTH  NA  NA
# add location info [from database] to crimes data frame
crimesDF.zip <- left_join(crimesDF.loc, distinct_locs, by = c("lat","lng"))

# number of remaining records to process
crimesDF.zip %>% 
  select(lat, lng, zip, neighborhood, subdivision) %>% 
  filter(grepl("^234", zip)) %>% 
  count()
## # A tibble: 1 x 1
##        n
##    <int>
## 1 771969
# number of non Virginia Beach zip codes (invalid lat/lng)
crimesDF.zip %>% 
  select(lat, lng, zip, neighborhood, subdivision) %>% 
  filter(!grepl("^234", zip)) %>% 
  distinct() %>%
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1 55847
# sample of non Virginia Beach zip codes
crimesDF.zip %>% 
  select(lat, lng, zip, neighborhood, subdivision) %>% 
  filter(!grepl("^234", zip)) %>% 
  sample_n(10)
##            lat       lng          zip neighborhood
## 46209 36.78291 -75.96000 character(0) character(0)
## 37000 36.85538 -76.02489 character(0) character(0)
## 20223 36.76113 -76.00533 character(0) character(0)
## 48403 40.27180 -75.22031        18914           NA
## 20251 36.80352 -76.20816 character(0) character(0)
## 37185 39.94620 -82.98280        43206           NA
## 34528 36.82464 -76.14902 character(0) character(0)
## 54326 26.11609 -97.64632        78586           NA
## 54852 35.81262 -78.78464        27513           NA
## 34714 36.78025 -75.98186 character(0) character(0)
##                                  subdivision
## 46209                               DAM NECK
## 37000                                 OCEANA
## 20223 STRAWBRIDGE OFFICE PARK IN STRAWBRIDGE
## 48403                     BELLAMY PLANTATION
## 20251                            LEVEL GREEN
## 37185                        ARAGONA VILLAGE
## 34528                       KEMPSVILLE LAKES
## 54326                                <UNDEF>
## 54852                                <UNDEF>
## 34714                           PINE MEADOWS

4. Data Manipulation

# sample of the types of call by ZIP code
crimesDF.zip %>% 
  select(zip, neighborhood, subdivision, call_type, call_disposition) %>% 
  filter(grepl("^234", zip)) %>% 
  sample_n(10) %>%
  arrange(zip)
##      zip       neighborhood                 subdivision
## 1  23451                 NA  OCEANFRONT - 31ST ST SOUTH
## 2  23451                 NA  OCEANFRONT - 31ST ST SOUTH
## 3  23451                 NA                     HILLTOP
## 4  23454    Southern Points        WOLFSNARE PLANTATION
## 5  23454             Oceana              OCEANA GARDENS
## 6  23455 Smith Lake Terrace     LAKE SMITH TERRACE WEST
## 7  23456       Colony Acres            MUNICIPAL CENTER
## 8  23456       Hills Corner         HERON RIDGE ESTATES
## 9  23462     Pembroke Manor             BAYSIDE BOROUGH
## 10 23464  Lakeville Estates WOODSTOCK ELEMENTARY SCHOOL
##                             call_type    call_disposition
## 1  DISABLED VEHICLE, WRECKER REQUIRED Miscellaneous/Other
## 2     OFFICER INITIATED INVESTIGATION           No Report
## 3                        TRAFFIC STOP Miscellaneous/Other
## 4                      ASSIST CITIZEN Miscellaneous/Other
## 5     DISPUTE - BOYFRIEND/GIRL FRIEND           No Report
## 6                         ANIMAL TRAP Miscellaneous/Other
## 7                    WARRANT TRANSFER       Arrest Report
## 8              ASSIST FIRE DEPARTMENT Miscellaneous/Other
## 9                       BURGLAR ALARM           No Report
## 10          CELLULAR 911 HANG-UP CALL      Case Cancelled
# call disposition breakdown
crimesDF.zip %>% 
  select(incident_num, call_disposition) %>% 
  count(unlist(call_disposition))
## # A tibble: 15 x 2
##    `unlist(call_disposition)`      n
##                         <chr>  <int>
##  1                    <UNDEF>     34
##  2       Accident Report Made  10093
##  3         Animal Bite Report   3772
##  4              Arrest Report  26212
##  5       Burglar Alarm Report  37672
##  6             Case Cancelled 100682
##  7             Duplicate Call  12876
##  8        Miscellaneous/Other 345991
##  9                  No Report 184347
## 10             Offense Report  75450
## 11           Picked Up Animal  17901
## 12           Public Utilities   2208
## 13               Public Works   1261
## 14           Unable to Locate   9245
## 15            Unknown/Uncoded     72
# number of 911 calls
crimesDF.zip %>% 
  select(call_type) %>% 
  filter(grepl("911", call_type)) %>% 
  count(unlist(call_type))
## # A tibble: 2 x 2
##         `unlist(call_type)`      n
##                       <chr>  <int>
## 1          911 HANG UP CALL  16394
## 2 CELLULAR 911 HANG-UP CALL 110409

5a. Data Visualization (Categorical Variables)

# distribution of Virginia Beach ZIP codes - begin with 234
ggplot(crimesDF.zip %>% 
         select(zip) %>% 
         filter(grepl("^234", zip))
       , aes(zip)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  labs(title = "Virginia Beach Crime Incidents By Zip Code", x = "Zip Code", y = "Number of Incidents")

# distribution of crime types
ggplot(crimesDF.zip %>% 
         select(priority, zip) %>% 
         filter(grepl("^234", zip)) %>% 
         select(priority)
       , aes(priority)) +
  geom_bar() +
  labs(title = "Virginia Beach Crime Incidents By Priority", x = "Priority", y = "Number of Incidents")

#  heat map illustrating types of crimes by year and qtr
# call disposition breakdown (by year and qtr)
crimes.yrmo <- crimesDF.zip %>% 
  transmute(incident_num
            , call_disposition
            , date_entered=paste0(year(entry_date)
                                  , '-'
                                  , sprintf("%02d", month(entry_date)))) %>%
  count(disposition=unlist(call_disposition), date_entered)

ggplot(crimes.yrmo, aes(x=date_entered, y=disposition)) +
  geom_raster(aes(fill=n), hjust=0.5, vjust=0.5, interpolate=TRUE) + 
  scale_fill_gradient(name="Number Of Incidents", low = 'yellow', high='red') +
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  labs(title="Crime Incidents", x="Entry Date", y="Disposition")

ggplot(crimes.yrmo, aes(x=date_entered, y=n, fill=disposition)) +
  geom_bar(stat="identity") + 
  #scale_y_continuous(labels=comma) +
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  guides(fill=guide_legend(title="Dispositions")) +
  labs(title="Crime Incidents", x="Entry Date", y="Number of Incidents") 

crimes.zone <- crimesDF %>%
  transmute(zone
            , priority
            , entry_date
            , entry_mo=month(entry_date)
            , entry_yr=year(entry_date)
            , entry_qtr=quarter(entry_date, with_year=TRUE)) %>%
  group_by(zone, priority, entry_mo, entry_yr, entry_qtr) %>%
  count()

ggplot(data=crimes.zone %>%
       select(priority, n, entry_qtr)
       , aes(x=entry_qtr, y=n, color=zone)) +
  # geom_violin(trim = FALSE) +
  stat_summary(fun.y = sum, geom = "line") +  # avg line
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  guides(color=guide_legend(title="Incidents By Zone")) +
  labs(title="Quarterly Incidents By Zone", x="Quarter", y="Number Of Incidents") +
  facet_wrap(~zone)

rm (crimes.yrmo, crimes.zone)

5b. Data Visualization (Continuous Variables)

Let’s explore police response time. Calculate response time as onscene_date-dispatch_date.

crimesDF.response <- crimesDF %>%
 transmute(priority
            , call_type
            , response_time=onscene_date-dispatch_date
            , entry_mo=month(entry_date)
            , entry_yr=year(entry_date)
            , entry_qtr=quarter(entry_date, with_year=TRUE)) %>%
  filter(response_time > 0)

ggplot(data=crimesDF.response %>%
       select(priority, response_time, entry_qtr)
       , aes(x=paste(entry_qtr), y=as.numeric(response_time), color=priority)) +
  scale_y_log10() +
  geom_violin(trim = FALSE) +
  geom_boxplot(width=0.1) +
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  guides(color=guide_legend(title="Call Priority")) +
  labs(title="Response Time By Call Priority"
       , x="Call Priority"
       , y="Response Time (sec)") +
  facet_wrap(~priority)
## Warning in max(data$density): no non-missing arguments to max; returning -
## Inf

5c. Data Visualization (Spatial Variables)

Group the number of incidents depending upon the map’s zoom level. The number inside each circle represents the total number of incidents in that area. When you click on a cluster, the map will automatically zoom into that area and split into smaller clusters or show the individual incidents depending on how zoomed in you are. Small circles represent individual incident reports.

Display a 2017 incident map by zip code. Represent the Virginia Beach Police Stations by blue markers. Display zip code, zone, priority, call type and location information if available.

crimeGrp <- full_join(crimesDF.loc, distinct_locs, by = c("lat","lng")) %>%
  select(lat, lng, priority, call_type, neighborhood, subdivision, zip, zone, entry_date) %>%
  filter(year(entry_date) == 2017) %>%
  group_by(lat
           , lng
           , priority
           , unlist(call_type)
           , neighborhood
           , unlist(subdivision)
           , zip
           , zone) %>%
  count()

colnames(crimeGrp) <- c("lat"
                        , "lng"
                        , "priority"
                        , "call_type"
                        , "neighborhood"
                        , "subdivision"
                        , "zip"
                        , "zone"
                        , "cnt")

# zip code overlay gone 11/2017
m.zip <- leaflet(crimeGrp) %>%
  addTiles() %>%
#  addGeoJSON(zipcodes) %>% # zip code overlay
  addMarkers(lng=vbPrecincts$lng, lat=vbPrecincts$lat, popup=vbPrecincts$descr) %>%
  addCircleMarkers(clusterOptions = markerClusterOptions()
                   , popup=paste("<b>("
                                 , crimeGrp$zip
                                 , "-"
                                 , crimeGrp$zone
                                 , "-"
                                 , crimeGrp$priority
                                 , ")</b> "
                                 , "-"
                                 , crimeGrp$call_type
                                 , "<br/>"
                                 , crimeGrp$neighborhood
                                 , "/"
                                 , crimeGrp$subdivision)) %>%
  setView(-76.1339487, 36.8464062, 11)

m.zip  # Print the map

6. Endnotes

Space allocation (in bytes).

sort(sapply(ls()
            , function(x) {object.size(get(x))}
            )
     ) 
##                 i               lat               lng          apiEmail 
##                48                48                48               120 
##               con               drv       vbPrecincts              info 
##              1632              1632              1664              4200 
##         formatReq getLocationDetail  getLocInfoFromDB         precincts 
##              5200             14512             14552             17096 
##          zipcodes             zones     distinct_locs         crimeZone 
##             19008             19008            471344           2625656 
##            m.zone          crimeGrp             m.zip crimesDF.response 
##           3281192          15883328          27390312          71696776 
##      crimesDF.loc      crimesDF.zip          crimesDF       vb_data_raw 
##         376386104         386335800         440035120        2449401432

Loaded packages.

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 15063)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2    lubridate_1.7.0 leaflet_1.1.0   ggplot2_2.2.1  
##  [5] RJDBC_0.2-5     rJava_0.9-9     DBI_0.7         httr_1.3.1     
##  [9] dplyr_0.7.4     pipeR_0.6.1.3   rlist_0.4.6.1  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.13        compiler_3.4.2      plyr_1.8.4         
##  [4] bindr_0.1           tools_3.4.2         digest_0.6.12      
##  [7] jsonlite_1.5        evaluate_0.10.1     tibble_1.3.4       
## [10] gtable_0.2.0        pkgconfig_2.0.1     rlang_0.1.2        
## [13] shiny_1.0.5         crosstalk_1.0.0     yaml_2.1.14        
## [16] stringr_1.2.0       knitr_1.17          htmlwidgets_0.9    
## [19] rprojroot_1.2       grid_3.4.2          glue_1.2.0         
## [22] data.table_1.10.4-3 R6_2.2.2            rmarkdown_1.6      
## [25] magrittr_1.5        backports_1.1.1     scales_0.5.0       
## [28] htmltools_0.3.6     assertthat_0.2.0    xtable_1.8-2       
## [31] mime_0.5            colorspace_1.3-2    httpuv_1.3.5       
## [34] labeling_0.3        stringi_1.1.5       lazyeval_0.2.1     
## [37] munsell_0.4.3       rjson_0.2.15

Report generated: 2017-11-12 22:31:11