I first learned R in a spatial statistics class, so this topic is near and dear to my heart. Spatial data in R use to be based in the sp package, along with a whole host of other packages you’d need to load to do different things. sp has all the functionality you’d need, but the way its data structures were originally written has major problems. It’s totally out of date in the dawning tidy era, and it was very different from the way any other software handles spatial data. The sf package, which stands for “simple features” has been under development for over a year, and while efforts continue, it is the new state-of-the-art in spatial data for R. This is actually the first time I’ve tried to use sf so this has been very helpful for me!

Reading spatial data

One of the things I like most about sf is how much simpler it is to load spatial data. This is done with a single function: st_read(). This function automatically detects the format of data you’re reading in based on a file extention or database connection, and outputs it as an sf class of data frame regardless of geometry type (point, line, polygon, etc.).

There’s plenty of online resources to help you use st_read(), so I wanted to showcase a couple other methods of obtaining spatial data for R. In the first section, I’ll use existing coordinate columns to turn a regular dataset into a spatial points file. Then, I’ll use two packages that access the U.S. Census Bureau APIs to import their polygon files and demographic data. Loading census data through the API helps make your code more reproducible, and once you have it down, it’s easier than downloading a bunch of files over and over. If you made a mistake, all it takes is to edit your code!

Adding geometry from coordinates

We’re using the 2014 NYPD Stop, Question, and Frisk data for this section, which is included in the project data folder. It came as an excel spreadsheet, so we’ll need the readxl package.

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readxl)
library(here)
## here() starts at /Users/benjaminbellman/Documents/Computer Backup/Repos/r-workshop18
nyc_sqf <- read_xlsx(here("data","2014_sqf_web.xlsx"))
nyc_sqf
## # A tibble: 45,787 x 101
##     year   pct ser_num datestop timestop  city   sex  race      dob   age
##    <dbl> <dbl>   <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1  2014    67      15  1012014     2330     2     1     1 12311900    18
##  2  2014     7      26  1032014     1530     1     1     1 12311900    31
##  3  2014    84      52  1042014     2100     2     1     1 12311900    16
##  4  2014    84      13  1092014     1250     2     1     5 12311900    19
##  5  2014    77      19  1092014     1310     2     1     1 12311900    32
##  6  2014    44      62  1152014     2220     3     1     1 12311900    22
##  7  2014    66      61  1172014     2200     2     1     1 12311900    32
##  8  2014    79      97  1262014      250     2     1     1 12311900    49
##  9  2014    52      70  1302014     2215     3     1     3 12311900    24
## 10  2014    63      38  1312014     2015     2     1     1 12311900    41
## # ... with 45,777 more rows, and 91 more variables: height <dbl>,
## #   weight <dbl>, haircolr <dbl>, eyecolor <dbl>, build <dbl>,
## #   othfeatr <chr>, frisked <dbl>, searched <dbl>, contrabn <dbl>,
## #   pistol <dbl>, riflshot <dbl>, asltweap <dbl>, knifcuti <dbl>,
## #   machgun <dbl>, othrweap <dbl>, arstmade <dbl>, arstoffn <chr>,
## #   sumissue <dbl>, sumoffen <chr>, crimsusp <chr>, detailcm <dbl>,
## #   perobs <dbl>, perstop <dbl>, pf_hands <dbl>, pf_wall <dbl>,
## #   pf_grnd <dbl>, pf_drwep <dbl>, pf_ptwep <dbl>, pf_baton <dbl>,
## #   pf_hcuff <dbl>, pf_pepsp <dbl>, pf_other <dbl>, cs_objcs <dbl>,
## #   cs_descr <dbl>, cs_casng <dbl>, cs_lkout <dbl>, cs_cloth <dbl>,
## #   cs_drgtr <dbl>, cs_furtv <dbl>, cs_vcrim <dbl>, cs_bulge <dbl>,
## #   cs_other <dbl>, rf_vcrim <dbl>, rf_othsw <dbl>, rf_attir <dbl>,
## #   rf_vcact <dbl>, rf_rfcmp <dbl>, rf_verbl <dbl>, rf_knowl <dbl>,
## #   rf_furt <dbl>, rf_bulg <dbl>, sb_hdobj <dbl>, sb_outln <dbl>,
## #   sb_admis <dbl>, sb_other <dbl>, ac_proxm <dbl>, ac_evasv <dbl>,
## #   ac_assoc <dbl>, ac_cgdir <dbl>, ac_incid <dbl>, ac_time <dbl>,
## #   ac_stsnd <dbl>, ac_rept <dbl>, ac_inves <dbl>, ac_other <dbl>,
## #   forceuse <dbl>, inout <dbl>, trhsloc <dbl>, premname <chr>,
## #   addrnum <chr>, stname <chr>, stinter <chr>, crossst <chr>,
## #   addrpct <dbl>, sector <dbl>, beat <dbl>, post <dbl>, xcoord <dbl>,
## #   ycoord <dbl>, typeofid <dbl>, othpers <dbl>, explnstp <dbl>,
## #   repcmd <dbl>, revcmd <dbl>, offunif <dbl>, offverb <dbl>,
## #   officrid <dbl>, offshld <dbl>, radio <dbl>, recstat <dbl>,
## #   linecm <dbl>

We want to use the coordinate columns that tell us where in NYC these stops happened.

select(nyc_sqf, xcoord, ycoord)
## # A tibble: 45,787 x 2
##     xcoord ycoord
##      <dbl>  <dbl>
##  1 1000633 176542
##  2  987521 201066
##  3  988579 191174
##  4  988827 194808
##  5 1005873 185052
##  6 1009416 244229
##  7      NA     NA
##  8  997847 189692
##  9 1011120 254774
## 10 1001738 165362
## # ... with 45,777 more rows

What kind of coordinates are these??? If you were to check the codebook in the data, you’d discover that these coordinates are given in a New York State Plane projection (NAD83 Long Island) that has a unit of feet. That makes sense given fine level of detail being depicted, but it will make mapping difficult, as generally spatial data refers to latitutde and longitude coordinates. We’ll have to transform these coordinates as we create our spatial data.

If you’re not familiar with coordinate reference systems (CRS) and projections, just write this stuff off as GIS mumbo-jumbo for now. But fair warning: this stuff can and WILL cause you some major headaches when overlaying spatial data someday. Pay special attention to the sf metadata so you’ll know where to look for these problems later on.

While there are always many ways to complete tasks like this, I’ll first attach the coordinates to make a spatial data frame, and then I’ll transform the coordinates to lat/lon format. The st_as_sf() function is used for this kind of object conversion, and we use the coords= argument when turning columns into points, specifying the x column first (longitude, then latitude.

library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
# function doesn't accept missing values for coordinates
nyc_sqf <- nyc_sqf %>% 
  filter(is.na(xcoord) == F & is.na(ycoord) == F) %>% 
  st_as_sf(coords = c("xcoord", "ycoord"))
class(nyc_sqf)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
nyc_sqf
## Simple feature collection with 44137 features and 99 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 914151 ymin: 121420 xmax: 1066574 ymax: 271882
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 44,137 x 100
##     year   pct ser_num datestop timestop  city   sex  race      dob   age
##    <dbl> <dbl>   <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1  2014    67      15  1012014     2330     2     1     1 12311900    18
##  2  2014     7      26  1032014     1530     1     1     1 12311900    31
##  3  2014    84      52  1042014     2100     2     1     1 12311900    16
##  4  2014    84      13  1092014     1250     2     1     5 12311900    19
##  5  2014    77      19  1092014     1310     2     1     1 12311900    32
##  6  2014    44      62  1152014     2220     3     1     1 12311900    22
##  7  2014    79      97  1262014      250     2     1     1 12311900    49
##  8  2014    52      70  1302014     2215     3     1     3 12311900    24
##  9  2014    63      38  1312014     2015     2     1     1 12311900    41
## 10  2014    70      98  2062014     2000     2     1     1 12311900    39
## # ... with 44,127 more rows, and 90 more variables: height <dbl>,
## #   weight <dbl>, haircolr <dbl>, eyecolor <dbl>, build <dbl>,
## #   othfeatr <chr>, frisked <dbl>, searched <dbl>, contrabn <dbl>,
## #   pistol <dbl>, riflshot <dbl>, asltweap <dbl>, knifcuti <dbl>,
## #   machgun <dbl>, othrweap <dbl>, arstmade <dbl>, arstoffn <chr>,
## #   sumissue <dbl>, sumoffen <chr>, crimsusp <chr>, detailcm <dbl>,
## #   perobs <dbl>, perstop <dbl>, pf_hands <dbl>, pf_wall <dbl>,
## #   pf_grnd <dbl>, pf_drwep <dbl>, pf_ptwep <dbl>, pf_baton <dbl>,
## #   pf_hcuff <dbl>, pf_pepsp <dbl>, pf_other <dbl>, cs_objcs <dbl>,
## #   cs_descr <dbl>, cs_casng <dbl>, cs_lkout <dbl>, cs_cloth <dbl>,
## #   cs_drgtr <dbl>, cs_furtv <dbl>, cs_vcrim <dbl>, cs_bulge <dbl>,
## #   cs_other <dbl>, rf_vcrim <dbl>, rf_othsw <dbl>, rf_attir <dbl>,
## #   rf_vcact <dbl>, rf_rfcmp <dbl>, rf_verbl <dbl>, rf_knowl <dbl>,
## #   rf_furt <dbl>, rf_bulg <dbl>, sb_hdobj <dbl>, sb_outln <dbl>,
## #   sb_admis <dbl>, sb_other <dbl>, ac_proxm <dbl>, ac_evasv <dbl>,
## #   ac_assoc <dbl>, ac_cgdir <dbl>, ac_incid <dbl>, ac_time <dbl>,
## #   ac_stsnd <dbl>, ac_rept <dbl>, ac_inves <dbl>, ac_other <dbl>,
## #   forceuse <dbl>, inout <dbl>, trhsloc <dbl>, premname <chr>,
## #   addrnum <chr>, stname <chr>, stinter <chr>, crossst <chr>,
## #   addrpct <dbl>, sector <dbl>, beat <dbl>, post <dbl>, typeofid <dbl>,
## #   othpers <dbl>, explnstp <dbl>, repcmd <dbl>, revcmd <dbl>,
## #   offunif <dbl>, offverb <dbl>, officrid <dbl>, offshld <dbl>,
## #   radio <dbl>, recstat <dbl>, linecm <dbl>, geometry <POINT>

The sf class has been added to the file, but it is still a tbl and a data.frame, which was definitely not the case with the old methods. This means that all of the dplyr functions will work on sf objects without any problems. Also be sure to notice the new metadata that appears at the top of the console information, which gives us the geometry type, the bounding box (maximum geographic extent of XY coordinates), and two empty fields called espg (SRID) and proj4string.

GIS MUMBO-JUMBO ALERT: An SRID is an ID number for a specific spatial reference system, while the proj4string is a text string that specifies how the system works in reference to latitude and longitude. When overlaying different spatial data files, it is very important that the files’ coordinates and metadata be in the same CRS. Otherwise, the computer will not overlay the data properly.

First, we need to set the data’s CRS so that the computer knows how to work with the values. We can do this with st_set_crs(). You’ll need to look up exactly which ESPG code or proj4string you’ll need online based on the CRS at a website like http://spatialreference.org/.

nyc_sqf <- st_set_crs(nyc_sqf, 3628)

# Let's take a closer look at the geographic info
select(nyc_sqf, age, geometry)
## Simple feature collection with 44137 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 914151 ymin: 121420 xmax: 1066574 ymax: 271882
## epsg (SRID):    3628
## proj4string:    +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
## # A tibble: 44,137 x 2
##      age                 geometry
##    <dbl> <POINT [US_survey_foot]>
##  1    18         (1000633 176542)
##  2    31          (987521 201066)
##  3    16          (988579 191174)
##  4    19          (988827 194808)
##  5    32         (1005873 185052)
##  6    22         (1009416 244229)
##  7    49          (997847 189692)
##  8    24         (1011120 254774)
##  9    41         (1001738 165362)
## 10    39          (993319 174812)
## # ... with 44,127 more rows

There are a couple things to notice here. The first is that the geographic information is being stored in a single column of the data frame. This is consistent with the majority of spatial data formats, and makes our lives so much easier compared to the old packages. All hail sp!

Second, notice that the coordinates haven’t actually changed. All we did was inform the computer of the CRS for this data. Without it, the computer would not know how to change the data into a new CRS.

Now, let’s use the st_transform() function to all the geographic info in one fell swoop. I want the data to be in lat/lon format using the same datum (NAD83).

nyc_sqf <- st_transform(nyc_sqf, 4269)
select(nyc_sqf, age, geometry)
## Simple feature collection with 44137 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -74.25208 ymin: 40.4997 xmax: -73.70287 ymax: 40.91289
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## # A tibble: 44,137 x 2
##      age             geometry
##    <dbl>          <POINT [°]>
##  1    18 (-73.94096 40.65123)
##  2    31  (-73.9882 40.71856)
##  3    16  (-73.98439 40.6914)
##  4    19 (-73.98349 40.70138)
##  5    32 (-73.92205 40.67458)
##  6    22 (-73.90905 40.83699)
##  7    49 (-73.95097 40.68733)
##  8    24 (-73.90285 40.86593)
##  9    41 (-73.93701 40.62054)
## 10    39 (-73.96732 40.64649)
## # ... with 44,127 more rows

Now, many things have changed with the data. The CRS metadata has changed, the coordinates themselves have changed, and even the spatial units of the geography column have changed from feet to decimal degrees! Dealing with CRS and projections is always frustration, but trust me, this is SO MUCH EASIER than it was with sp.

Downloading U.S. Census data

The Census Bureau maintains APIs and repositories for their huge range of data products, and two fairly recent R packages take advantage of these interfaces: tigris and tidycensus. I’ve decided that tidycensus is a bit too much to learn as part of the workshop, but the syntax is very intuitive. The trouble is generating your API keys, as well as looking for Census codes for specific data you want to download. Both packages are developed by Kyle Walker, an assistant professor of geography at TCU. If you want to use R for spatial and census data, follow his twitter account. He posts helpful tricks and scripts to get started with his packges, and it’s a great way to build your R skills in general.

The tigris package is particularly easy to use, because there’s no API key needed. The Census’s TIGER geometry files are easily downloaded for free, and this package interfaces with their website. The function you use depends on the scale you want geometries for. The arguments narrow your search by state, county, etc. using FIPS codes.

library(tigris)
## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
## 
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
## 
##     plot
ri <- tracts(state = 44)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================| 100%
class(ri)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

Huh??? That’s not a sf object!!! tigris was written according to sp standards, so it returns a very different kind of object (run str(ri) to see for yourself what a nightmare spatial data frames are).

However, this is fine for us. sf is collective effort by many original developers of sp and its other packages, and they have included plenty of tools to convert old spatial objects to simple features. You should remember this function from earlier!

ri <- st_as_sf(ri)
ri
## Simple feature collection with 244 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -71.90726 ymin: 41.09583 xmax: -71.08857 ymax: 42.0188
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## First 10 features:
##   STATEFP COUNTYFP TRACTCE       GEOID   NAME            NAMELSAD MTFCC
## 0      44      005  040102 44005040102 401.02 Census Tract 401.02 G5020
## 1      44      005  040302 44005040302 403.02 Census Tract 403.02 G5020
## 2      44      005  040400 44005040400    404    Census Tract 404 G5020
## 3      44      005  040103 44005040103 401.03 Census Tract 401.03 G5020
## 4      44      005  041000 44005041000    410    Census Tract 410 G5020
## 5      44      005  041602 44005041602 416.02 Census Tract 416.02 G5020
## 6      44      005  040101 44005040101 401.01 Census Tract 401.01 G5020
## 7      44      005  040303 44005040303 403.03 Census Tract 403.03 G5020
## 8      44      005  040200 44005040200    402    Census Tract 402 G5020
## 9      44      005  040500 44005040500    405    Census Tract 405 G5020
##   FUNCSTAT    ALAND   AWATER    INTPTLAT     INTPTLON
## 0        S 15303718  3185447 +41.5634715 -071.2409098
## 1        S  2053830     9506 +41.5122470 -071.2979089
## 2        S 19065891  3923639 +41.5042616 -071.2629589
## 3        S 38234910 19322522 +41.6126140 -071.3168962
## 4        S   711464   564951 +41.4796667 -071.3155318
## 5        S  8113205  1426119 +41.6383338 -071.1945678
## 6        S  6024002  4507838 +41.6310716 -071.2353723
## 7        S  3883359     9444 +41.5448590 -071.2998620
## 8        S  2334457  4304255 +41.5372453 -071.3100859
## 9        S  2405884        0 +41.5075228 -071.3126443
##                         geometry
## 0 MULTIPOLYGON (((-71.26572 4...
## 1 MULTIPOLYGON (((-71.31096 4...
## 2 MULTIPOLYGON (((-71.29494 4...
## 3 MULTIPOLYGON (((-71.29246 4...
## 4 MULTIPOLYGON (((-71.32514 4...
## 5 MULTIPOLYGON (((-71.21682 4...
## 6 MULTIPOLYGON (((-71.22228 4...
## 7 MULTIPOLYGON (((-71.30931 4...
## 8 MULTIPOLYGON (((-71.32327 4...
## 9 MULTIPOLYGON (((-71.31917 4...

GIS operations with R

The main attraction of GIS and spatial data is that spatial attributes allow you to use data together in powerful ways. You can also exploit the spatial information to create new spatial definitions. One easy example is a dissolve. This generates the mathematical union of a set of geometires, and aggregates them into a single geometry. Let’s disslove Providence County into a single object with one row.

pc <- ri %>% 
  filter(COUNTYFP == "007") %>% 
  st_union()
plot(pc)

class(pc)
## [1] "sfc_POLYGON" "sfc"

Notice that this result is not an sf table, but of class sfc, a simple geometry column. This is basically the geographic container that tabular data is attached to.

Another easy example of a GIS operation is a buffer, which simply extends the boundaries of a geometry in all directions by a set distance. Points become circles, lines become oblong polygons, and polygons get bigger! However, we’ll need to convert the data into another CRS, because latitude and longitude are not measures of distance, and are thus inappropriate for buffering.

pc %>% 
  st_transform(3438) %>%   # RI State Plane in feet
  st_buffer(5280) %>%      # One-mile buffer specified in feet
  plot()

One of the most important GIS operations out there is the spatial join. There are many tasks this handles: adding data attributes based on location, counting points inside a polygon, subsetting data based on location, etc.

Let’s use tigris to bring in some NYC spatial data to use with the SQF events. I’d like to subset just the events that happened in the Bronx, and then I’ll eventually map numbers of events across census tracts. First, we download the tracts for Bronx county, and convert the file to simple features.

bronx <- tracts(state = "36", "005") %>% 
  st_as_sf()
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |                                                                 |   1%
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |=======                                                          |  12%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |===========                                                      |  16%
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |===========                                                      |  18%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |=============                                                    |  19%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |=============                                                    |  21%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |===============                                                  |  22%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |===============                                                  |  24%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |=================                                                |  25%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |=================                                                |  27%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
  |                                                                       
  |====================                                             |  32%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=====================                                            |  33%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |======================                                           |  35%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |========================                                         |  38%
  |                                                                       
  |=========================                                        |  38%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |===========================                                      |  42%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |============================                                     |  44%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |=============================                                    |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |==============================                                   |  47%
  |                                                                       
  |===============================                                  |  47%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |=============================================================    |  95%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%

Now, we can use st_join() to attach the attributes of intersecting features to our events.

bronx_sqf <- st_join(nyc_sqf, bronx)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
names(bronx_sqf)
##   [1] "year"     "pct"      "ser_num"  "datestop" "timestop" "city"    
##   [7] "sex"      "race"     "dob"      "age"      "height"   "weight"  
##  [13] "haircolr" "eyecolor" "build"    "othfeatr" "frisked"  "searched"
##  [19] "contrabn" "pistol"   "riflshot" "asltweap" "knifcuti" "machgun" 
##  [25] "othrweap" "arstmade" "arstoffn" "sumissue" "sumoffen" "crimsusp"
##  [31] "detailcm" "perobs"   "perstop"  "pf_hands" "pf_wall"  "pf_grnd" 
##  [37] "pf_drwep" "pf_ptwep" "pf_baton" "pf_hcuff" "pf_pepsp" "pf_other"
##  [43] "cs_objcs" "cs_descr" "cs_casng" "cs_lkout" "cs_cloth" "cs_drgtr"
##  [49] "cs_furtv" "cs_vcrim" "cs_bulge" "cs_other" "rf_vcrim" "rf_othsw"
##  [55] "rf_attir" "rf_vcact" "rf_rfcmp" "rf_verbl" "rf_knowl" "rf_furt" 
##  [61] "rf_bulg"  "sb_hdobj" "sb_outln" "sb_admis" "sb_other" "ac_proxm"
##  [67] "ac_evasv" "ac_assoc" "ac_cgdir" "ac_incid" "ac_time"  "ac_stsnd"
##  [73] "ac_rept"  "ac_inves" "ac_other" "forceuse" "inout"    "trhsloc" 
##  [79] "premname" "addrnum"  "stname"   "stinter"  "crossst"  "addrpct" 
##  [85] "sector"   "beat"     "post"     "typeofid" "othpers"  "explnstp"
##  [91] "repcmd"   "revcmd"   "offunif"  "offverb"  "officrid" "offshld" 
##  [97] "radio"    "recstat"  "linecm"   "STATEFP"  "COUNTYFP" "TRACTCE" 
## [103] "GEOID"    "NAME"     "NAMELSAD" "MTFCC"    "FUNCSTAT" "ALAND"   
## [109] "AWATER"   "INTPTLAT" "INTPTLON" "geometry"

Notice all the new columns from the Bronx census tracts! These columns have empty values where our point data did not overlap the Bronx, so we can use that attribute to subest our data.

table(bronx_sqf$COUNTYFP)
## 
##  005 
## 6553
bronx_sqf <- filter(bronx_sqf, COUNTYFP == "005")
nrow(bronx_sqf)
## [1] 6553

Finally, let’s count the number of 2014 SQF events in Bronx census tracts. This time, there will be a duplicate of each tract for every point within it, so we can reduce this file back to unique tracts by the number of times it appears.

bronx_events <- st_join(bronx, nyc_sqf) %>% 
  group_by(GEOID) %>%                  # group by unique tract ID
  mutate(events = n()) %>% 
  ungroup() %>% 
  select(GEOID, events, geometry) %>%  # drop unwanted columns             
  unique()                             # keep only unique tracts
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
bronx_events
## Simple feature collection with 339 features and 2 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -73.93381 ymin: 40.78536 xmax: -73.74806 ymax: 40.91758
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## # A tibble: 339 x 3
##    GEOID       events                                             geometry
##    <chr>        <int>                                        <POLYGON [°]>
##  1 36005006100     17 ((-73.92688 40.81948, -73.92665 40.82005, -73.92632…
##  2 36005006400      1 ((-73.87394 40.83598, -73.8738 40.83597, -73.87298 …
##  3 36005006700    148 ((-73.92178 40.81953, -73.92148 40.8202, -73.92119 …
##  4 36005006800      5 ((-73.87319 40.8299, -73.87275 40.83003, -73.87179 …
##  5 36005007000      4 ((-73.86872 40.82956, -73.86776 40.82969, -73.86815…
##  6 36005007100     21 ((-73.91763 40.81605, -73.91676 40.81656, -73.91599…
##  7 36005007300     15 ((-73.91258 40.81493, -73.91226 40.81549, -73.91194…
##  8 36005007500     28 ((-73.91138 40.81737, -73.91061 40.81922, -73.91035…
##  9 36005007700     11 ((-73.90692 40.81853, -73.90644 40.81969, -73.906 4…
## 10 36005007800      1 ((-73.85738 40.83214, -73.85627 40.83184, -73.85598…
## # ... with 329 more rows

Mapping with R

There is a large and growing collection of packages that enable different kinds of mapping. I’ll introduce the ggplot2 options available, as well as how to make a simple interactive web map using the leaflet package.

It’s easy to do a quick plot of sfc objects. By default, plot methods generally try to visualize all the data available, so if you just want to quickly see the geometry of an object, you can just feed it as an sfc.

st_geometry(bronx_events) %>% plot()

ggplot2 has built-in functions to handle plotting sf objects. This stuff is in-progress and has some bugs (you ahve to use data = on the sf object), so I recommend looking into a package called tmap. I’ve never used it as I make my “best” maps with ArcGIS, but it’s effective for lots of data and visual styles and is very popular.

ggplot() +
  geom_sf(data = bronx_events, aes(fill = events))

You can add more data objects by adding another geom_sf() function.

samp <- sample_n(bronx_sqf, 50)

ggplot() +
  geom_sf(data = bronx_events, aes(fill = events), size = 0.5) +
  geom_sf(data = samp, col = "red", shape = 3)

One trick I really like comes from the ggmap package. This package will generate a web map tile from a range of sources, including Google. I prefer a clean, simple open source background.

library(ggmap)
m <- get_map("bronx", maptype = "toner-lite", zoom = 12)
## maptype = "toner-lite" is only available with source = "stamen".
## resetting to source = "stamen"...
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=bronx&zoom=12&size=640x640&scale=2&maptype=terrain&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=bronx&sensor=false
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1536.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1536.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1536.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1537.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1537.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1537.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1538.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1538.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1538.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1539.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1539.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1539.png
ggmap(m) +
  geom_sf(data = bronx_events, aes(fill = events), 
          size = 0, alpha = 0.7,
          inherit.aes = FALSE)   # needs this because bugs
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Finally, you can use the leaflet package to create an interactive web maps. To do this, the data’s CRS must be in lat/lon coordinates using WGS84, the datum for web cartography. Leaflet has a website with lots of tutorials on how to use their (very complicated) package.

library(leaflet)

pal <- colorNumeric(palette = "Blues", domain = bronx_events$events)

st_transform(bronx_events, 4326) %>%    # transform to correct CRS
leaflet() %>%
  addTiles() %>%
  addPolygons(stroke = FALSE, smoothFactor = 0.3, fillOpacity = 0.8,
              fillColor = ~pal(events), label = ~paste0("Tract ", GEOID, " - ", events, " stops"))

This is only a small taste of the spatial data capabilities of R. Keep learning and seraching on your own, best of luck!