sf and tigrisI first learned R in a spatial statistics class, so this topic is near and dear to my heart. Spatial data in R use to be based in the sp package, along with a whole host of other packages you’d need to load to do different things. sp has all the functionality you’d need, but the way its data structures were originally written has major problems. It’s totally out of date in the dawning tidy era, and it was very different from the way any other software handles spatial data. The sf package, which stands for “simple features” has been under development for over a year, and while efforts continue, it is the new state-of-the-art in spatial data for R. This is actually the first time I’ve tried to use sf so this has been very helpful for me!
One of the things I like most about sf is how much simpler it is to load spatial data. This is done with a single function: st_read(). This function automatically detects the format of data you’re reading in based on a file extention or database connection, and outputs it as an sf class of data frame regardless of geometry type (point, line, polygon, etc.).
There’s plenty of online resources to help you use st_read(), so I wanted to showcase a couple other methods of obtaining spatial data for R. In the first section, I’ll use existing coordinate columns to turn a regular dataset into a spatial points file. Then, I’ll use two packages that access the U.S. Census Bureau APIs to import their polygon files and demographic data. Loading census data through the API helps make your code more reproducible, and once you have it down, it’s easier than downloading a bunch of files over and over. If you made a mistake, all it takes is to edit your code!
We’re using the 2014 NYPD Stop, Question, and Frisk data for this section, which is included in the project data folder. It came as an excel spreadsheet, so we’ll need the readxl package.
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readxl)
library(here)
## here() starts at /Users/benjaminbellman/Documents/Computer Backup/Repos/r-workshop18
nyc_sqf <- read_xlsx(here("data","2014_sqf_web.xlsx"))
nyc_sqf
## # A tibble: 45,787 x 101
## year pct ser_num datestop timestop city sex race dob age
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 67 15 1012014 2330 2 1 1 12311900 18
## 2 2014 7 26 1032014 1530 1 1 1 12311900 31
## 3 2014 84 52 1042014 2100 2 1 1 12311900 16
## 4 2014 84 13 1092014 1250 2 1 5 12311900 19
## 5 2014 77 19 1092014 1310 2 1 1 12311900 32
## 6 2014 44 62 1152014 2220 3 1 1 12311900 22
## 7 2014 66 61 1172014 2200 2 1 1 12311900 32
## 8 2014 79 97 1262014 250 2 1 1 12311900 49
## 9 2014 52 70 1302014 2215 3 1 3 12311900 24
## 10 2014 63 38 1312014 2015 2 1 1 12311900 41
## # ... with 45,777 more rows, and 91 more variables: height <dbl>,
## # weight <dbl>, haircolr <dbl>, eyecolor <dbl>, build <dbl>,
## # othfeatr <chr>, frisked <dbl>, searched <dbl>, contrabn <dbl>,
## # pistol <dbl>, riflshot <dbl>, asltweap <dbl>, knifcuti <dbl>,
## # machgun <dbl>, othrweap <dbl>, arstmade <dbl>, arstoffn <chr>,
## # sumissue <dbl>, sumoffen <chr>, crimsusp <chr>, detailcm <dbl>,
## # perobs <dbl>, perstop <dbl>, pf_hands <dbl>, pf_wall <dbl>,
## # pf_grnd <dbl>, pf_drwep <dbl>, pf_ptwep <dbl>, pf_baton <dbl>,
## # pf_hcuff <dbl>, pf_pepsp <dbl>, pf_other <dbl>, cs_objcs <dbl>,
## # cs_descr <dbl>, cs_casng <dbl>, cs_lkout <dbl>, cs_cloth <dbl>,
## # cs_drgtr <dbl>, cs_furtv <dbl>, cs_vcrim <dbl>, cs_bulge <dbl>,
## # cs_other <dbl>, rf_vcrim <dbl>, rf_othsw <dbl>, rf_attir <dbl>,
## # rf_vcact <dbl>, rf_rfcmp <dbl>, rf_verbl <dbl>, rf_knowl <dbl>,
## # rf_furt <dbl>, rf_bulg <dbl>, sb_hdobj <dbl>, sb_outln <dbl>,
## # sb_admis <dbl>, sb_other <dbl>, ac_proxm <dbl>, ac_evasv <dbl>,
## # ac_assoc <dbl>, ac_cgdir <dbl>, ac_incid <dbl>, ac_time <dbl>,
## # ac_stsnd <dbl>, ac_rept <dbl>, ac_inves <dbl>, ac_other <dbl>,
## # forceuse <dbl>, inout <dbl>, trhsloc <dbl>, premname <chr>,
## # addrnum <chr>, stname <chr>, stinter <chr>, crossst <chr>,
## # addrpct <dbl>, sector <dbl>, beat <dbl>, post <dbl>, xcoord <dbl>,
## # ycoord <dbl>, typeofid <dbl>, othpers <dbl>, explnstp <dbl>,
## # repcmd <dbl>, revcmd <dbl>, offunif <dbl>, offverb <dbl>,
## # officrid <dbl>, offshld <dbl>, radio <dbl>, recstat <dbl>,
## # linecm <dbl>
We want to use the coordinate columns that tell us where in NYC these stops happened.
select(nyc_sqf, xcoord, ycoord)
## # A tibble: 45,787 x 2
## xcoord ycoord
## <dbl> <dbl>
## 1 1000633 176542
## 2 987521 201066
## 3 988579 191174
## 4 988827 194808
## 5 1005873 185052
## 6 1009416 244229
## 7 NA NA
## 8 997847 189692
## 9 1011120 254774
## 10 1001738 165362
## # ... with 45,777 more rows
What kind of coordinates are these??? If you were to check the codebook in the data, you’d discover that these coordinates are given in a New York State Plane projection (NAD83 Long Island) that has a unit of feet. That makes sense given fine level of detail being depicted, but it will make mapping difficult, as generally spatial data refers to latitutde and longitude coordinates. We’ll have to transform these coordinates as we create our spatial data.
If you’re not familiar with coordinate reference systems (CRS) and projections, just write this stuff off as GIS mumbo-jumbo for now. But fair warning: this stuff can and WILL cause you some major headaches when overlaying spatial data someday. Pay special attention to the sf metadata so you’ll know where to look for these problems later on.
While there are always many ways to complete tasks like this, I’ll first attach the coordinates to make a spatial data frame, and then I’ll transform the coordinates to lat/lon format. The st_as_sf() function is used for this kind of object conversion, and we use the coords= argument when turning columns into points, specifying the x column first (longitude, then latitude.
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
# function doesn't accept missing values for coordinates
nyc_sqf <- nyc_sqf %>%
filter(is.na(xcoord) == F & is.na(ycoord) == F) %>%
st_as_sf(coords = c("xcoord", "ycoord"))
class(nyc_sqf)
## [1] "sf" "tbl_df" "tbl" "data.frame"
nyc_sqf
## Simple feature collection with 44137 features and 99 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 914151 ymin: 121420 xmax: 1066574 ymax: 271882
## epsg (SRID): NA
## proj4string: NA
## # A tibble: 44,137 x 100
## year pct ser_num datestop timestop city sex race dob age
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 67 15 1012014 2330 2 1 1 12311900 18
## 2 2014 7 26 1032014 1530 1 1 1 12311900 31
## 3 2014 84 52 1042014 2100 2 1 1 12311900 16
## 4 2014 84 13 1092014 1250 2 1 5 12311900 19
## 5 2014 77 19 1092014 1310 2 1 1 12311900 32
## 6 2014 44 62 1152014 2220 3 1 1 12311900 22
## 7 2014 79 97 1262014 250 2 1 1 12311900 49
## 8 2014 52 70 1302014 2215 3 1 3 12311900 24
## 9 2014 63 38 1312014 2015 2 1 1 12311900 41
## 10 2014 70 98 2062014 2000 2 1 1 12311900 39
## # ... with 44,127 more rows, and 90 more variables: height <dbl>,
## # weight <dbl>, haircolr <dbl>, eyecolor <dbl>, build <dbl>,
## # othfeatr <chr>, frisked <dbl>, searched <dbl>, contrabn <dbl>,
## # pistol <dbl>, riflshot <dbl>, asltweap <dbl>, knifcuti <dbl>,
## # machgun <dbl>, othrweap <dbl>, arstmade <dbl>, arstoffn <chr>,
## # sumissue <dbl>, sumoffen <chr>, crimsusp <chr>, detailcm <dbl>,
## # perobs <dbl>, perstop <dbl>, pf_hands <dbl>, pf_wall <dbl>,
## # pf_grnd <dbl>, pf_drwep <dbl>, pf_ptwep <dbl>, pf_baton <dbl>,
## # pf_hcuff <dbl>, pf_pepsp <dbl>, pf_other <dbl>, cs_objcs <dbl>,
## # cs_descr <dbl>, cs_casng <dbl>, cs_lkout <dbl>, cs_cloth <dbl>,
## # cs_drgtr <dbl>, cs_furtv <dbl>, cs_vcrim <dbl>, cs_bulge <dbl>,
## # cs_other <dbl>, rf_vcrim <dbl>, rf_othsw <dbl>, rf_attir <dbl>,
## # rf_vcact <dbl>, rf_rfcmp <dbl>, rf_verbl <dbl>, rf_knowl <dbl>,
## # rf_furt <dbl>, rf_bulg <dbl>, sb_hdobj <dbl>, sb_outln <dbl>,
## # sb_admis <dbl>, sb_other <dbl>, ac_proxm <dbl>, ac_evasv <dbl>,
## # ac_assoc <dbl>, ac_cgdir <dbl>, ac_incid <dbl>, ac_time <dbl>,
## # ac_stsnd <dbl>, ac_rept <dbl>, ac_inves <dbl>, ac_other <dbl>,
## # forceuse <dbl>, inout <dbl>, trhsloc <dbl>, premname <chr>,
## # addrnum <chr>, stname <chr>, stinter <chr>, crossst <chr>,
## # addrpct <dbl>, sector <dbl>, beat <dbl>, post <dbl>, typeofid <dbl>,
## # othpers <dbl>, explnstp <dbl>, repcmd <dbl>, revcmd <dbl>,
## # offunif <dbl>, offverb <dbl>, officrid <dbl>, offshld <dbl>,
## # radio <dbl>, recstat <dbl>, linecm <dbl>, geometry <POINT>
The sf class has been added to the file, but it is still a tbl and a data.frame, which was definitely not the case with the old methods. This means that all of the dplyr functions will work on sf objects without any problems. Also be sure to notice the new metadata that appears at the top of the console information, which gives us the geometry type, the bounding box (maximum geographic extent of XY coordinates), and two empty fields called espg (SRID) and proj4string.
GIS MUMBO-JUMBO ALERT: An SRID is an ID number for a specific spatial reference system, while the proj4string is a text string that specifies how the system works in reference to latitude and longitude. When overlaying different spatial data files, it is very important that the files’ coordinates and metadata be in the same CRS. Otherwise, the computer will not overlay the data properly.
First, we need to set the data’s CRS so that the computer knows how to work with the values. We can do this with st_set_crs(). You’ll need to look up exactly which ESPG code or proj4string you’ll need online based on the CRS at a website like http://spatialreference.org/.
nyc_sqf <- st_set_crs(nyc_sqf, 3628)
# Let's take a closer look at the geographic info
select(nyc_sqf, age, geometry)
## Simple feature collection with 44137 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: 914151 ymin: 121420 xmax: 1066574 ymax: 271882
## epsg (SRID): 3628
## proj4string: +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
## # A tibble: 44,137 x 2
## age geometry
## <dbl> <POINT [US_survey_foot]>
## 1 18 (1000633 176542)
## 2 31 (987521 201066)
## 3 16 (988579 191174)
## 4 19 (988827 194808)
## 5 32 (1005873 185052)
## 6 22 (1009416 244229)
## 7 49 (997847 189692)
## 8 24 (1011120 254774)
## 9 41 (1001738 165362)
## 10 39 (993319 174812)
## # ... with 44,127 more rows
There are a couple things to notice here. The first is that the geographic information is being stored in a single column of the data frame. This is consistent with the majority of spatial data formats, and makes our lives so much easier compared to the old packages. All hail sp!
Second, notice that the coordinates haven’t actually changed. All we did was inform the computer of the CRS for this data. Without it, the computer would not know how to change the data into a new CRS.
Now, let’s use the st_transform() function to all the geographic info in one fell swoop. I want the data to be in lat/lon format using the same datum (NAD83).
nyc_sqf <- st_transform(nyc_sqf, 4269)
select(nyc_sqf, age, geometry)
## Simple feature collection with 44137 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: -74.25208 ymin: 40.4997 xmax: -73.70287 ymax: 40.91289
## epsg (SRID): 4269
## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## # A tibble: 44,137 x 2
## age geometry
## <dbl> <POINT [°]>
## 1 18 (-73.94096 40.65123)
## 2 31 (-73.9882 40.71856)
## 3 16 (-73.98439 40.6914)
## 4 19 (-73.98349 40.70138)
## 5 32 (-73.92205 40.67458)
## 6 22 (-73.90905 40.83699)
## 7 49 (-73.95097 40.68733)
## 8 24 (-73.90285 40.86593)
## 9 41 (-73.93701 40.62054)
## 10 39 (-73.96732 40.64649)
## # ... with 44,127 more rows
Now, many things have changed with the data. The CRS metadata has changed, the coordinates themselves have changed, and even the spatial units of the geography column have changed from feet to decimal degrees! Dealing with CRS and projections is always frustration, but trust me, this is SO MUCH EASIER than it was with sp.
The Census Bureau maintains APIs and repositories for their huge range of data products, and two fairly recent R packages take advantage of these interfaces: tigris and tidycensus. I’ve decided that tidycensus is a bit too much to learn as part of the workshop, but the syntax is very intuitive. The trouble is generating your API keys, as well as looking for Census codes for specific data you want to download. Both packages are developed by Kyle Walker, an assistant professor of geography at TCU. If you want to use R for spatial and census data, follow his twitter account. He posts helpful tricks and scripts to get started with his packges, and it’s a great way to build your R skills in general.
The tigris package is particularly easy to use, because there’s no API key needed. The Census’s TIGER geometry files are easily downloaded for free, and this package interfaces with their website. The function you use depends on the scale you want geometries for. The arguments narrow your search by state, county, etc. using FIPS codes.
library(tigris)
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
##
## Attaching package: 'tigris'
## The following object is masked from 'package:graphics':
##
## plot
ri <- tracts(state = 44)
##
|
| | 0%
|
|= | 2%
|
|== | 3%
|
|=== | 5%
|
|==== | 6%
|
|====== | 9%
|
|====== | 10%
|
|======== | 12%
|
|======== | 13%
|
|========== | 15%
|
|========== | 16%
|
|============ | 18%
|
|============ | 19%
|
|============== | 21%
|
|============== | 22%
|
|================ | 24%
|
|================= | 26%
|
|================== | 28%
|
|=================== | 29%
|
|===================== | 32%
|
|====================== | 33%
|
|======================= | 35%
|
|======================== | 37%
|
|========================= | 39%
|
|========================== | 40%
|
|============================ | 43%
|
|============================= | 44%
|
|============================== | 46%
|
|=============================== | 48%
|
|================================ | 50%
|
|================================= | 51%
|
|=================================== | 54%
|
|==================================== | 55%
|
|===================================== | 57%
|
|====================================== | 59%
|
|======================================== | 61%
|
|========================================= | 62%
|
|========================================== | 64%
|
|=========================================== | 66%
|
|============================================ | 68%
|
|============================================= | 70%
|
|=============================================== | 72%
|
|================================================ | 73%
|
|================================================= | 75%
|
|================================================== | 77%
|
|=================================================== | 79%
|
|==================================================== | 80%
|
|===================================================== | 82%
|
|====================================================== | 83%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 100%
class(ri)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
Huh??? That’s not a sf object!!! tigris was written according to sp standards, so it returns a very different kind of object (run str(ri) to see for yourself what a nightmare spatial data frames are).
However, this is fine for us. sf is collective effort by many original developers of sp and its other packages, and they have included plenty of tools to convert old spatial objects to simple features. You should remember this function from earlier!
ri <- st_as_sf(ri)
ri
## Simple feature collection with 244 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -71.90726 ymin: 41.09583 xmax: -71.08857 ymax: 42.0188
## epsg (SRID): 4269
## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## First 10 features:
## STATEFP COUNTYFP TRACTCE GEOID NAME NAMELSAD MTFCC
## 0 44 005 040102 44005040102 401.02 Census Tract 401.02 G5020
## 1 44 005 040302 44005040302 403.02 Census Tract 403.02 G5020
## 2 44 005 040400 44005040400 404 Census Tract 404 G5020
## 3 44 005 040103 44005040103 401.03 Census Tract 401.03 G5020
## 4 44 005 041000 44005041000 410 Census Tract 410 G5020
## 5 44 005 041602 44005041602 416.02 Census Tract 416.02 G5020
## 6 44 005 040101 44005040101 401.01 Census Tract 401.01 G5020
## 7 44 005 040303 44005040303 403.03 Census Tract 403.03 G5020
## 8 44 005 040200 44005040200 402 Census Tract 402 G5020
## 9 44 005 040500 44005040500 405 Census Tract 405 G5020
## FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
## 0 S 15303718 3185447 +41.5634715 -071.2409098
## 1 S 2053830 9506 +41.5122470 -071.2979089
## 2 S 19065891 3923639 +41.5042616 -071.2629589
## 3 S 38234910 19322522 +41.6126140 -071.3168962
## 4 S 711464 564951 +41.4796667 -071.3155318
## 5 S 8113205 1426119 +41.6383338 -071.1945678
## 6 S 6024002 4507838 +41.6310716 -071.2353723
## 7 S 3883359 9444 +41.5448590 -071.2998620
## 8 S 2334457 4304255 +41.5372453 -071.3100859
## 9 S 2405884 0 +41.5075228 -071.3126443
## geometry
## 0 MULTIPOLYGON (((-71.26572 4...
## 1 MULTIPOLYGON (((-71.31096 4...
## 2 MULTIPOLYGON (((-71.29494 4...
## 3 MULTIPOLYGON (((-71.29246 4...
## 4 MULTIPOLYGON (((-71.32514 4...
## 5 MULTIPOLYGON (((-71.21682 4...
## 6 MULTIPOLYGON (((-71.22228 4...
## 7 MULTIPOLYGON (((-71.30931 4...
## 8 MULTIPOLYGON (((-71.32327 4...
## 9 MULTIPOLYGON (((-71.31917 4...
The main attraction of GIS and spatial data is that spatial attributes allow you to use data together in powerful ways. You can also exploit the spatial information to create new spatial definitions. One easy example is a dissolve. This generates the mathematical union of a set of geometires, and aggregates them into a single geometry. Let’s disslove Providence County into a single object with one row.
pc <- ri %>%
filter(COUNTYFP == "007") %>%
st_union()
plot(pc)
class(pc)
## [1] "sfc_POLYGON" "sfc"
Notice that this result is not an sf table, but of class sfc, a simple geometry column. This is basically the geographic container that tabular data is attached to.
Another easy example of a GIS operation is a buffer, which simply extends the boundaries of a geometry in all directions by a set distance. Points become circles, lines become oblong polygons, and polygons get bigger! However, we’ll need to convert the data into another CRS, because latitude and longitude are not measures of distance, and are thus inappropriate for buffering.
pc %>%
st_transform(3438) %>% # RI State Plane in feet
st_buffer(5280) %>% # One-mile buffer specified in feet
plot()
One of the most important GIS operations out there is the spatial join. There are many tasks this handles: adding data attributes based on location, counting points inside a polygon, subsetting data based on location, etc.
Let’s use tigris to bring in some NYC spatial data to use with the SQF events. I’d like to subset just the events that happened in the Bronx, and then I’ll eventually map numbers of events across census tracts. First, we download the tracts for Bronx county, and convert the file to simple features.
bronx <- tracts(state = "36", "005") %>%
st_as_sf()
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======= | 12%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|==================== | 32%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================= | 95%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
Now, we can use st_join() to attach the attributes of intersecting features to our events.
bronx_sqf <- st_join(nyc_sqf, bronx)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
names(bronx_sqf)
## [1] "year" "pct" "ser_num" "datestop" "timestop" "city"
## [7] "sex" "race" "dob" "age" "height" "weight"
## [13] "haircolr" "eyecolor" "build" "othfeatr" "frisked" "searched"
## [19] "contrabn" "pistol" "riflshot" "asltweap" "knifcuti" "machgun"
## [25] "othrweap" "arstmade" "arstoffn" "sumissue" "sumoffen" "crimsusp"
## [31] "detailcm" "perobs" "perstop" "pf_hands" "pf_wall" "pf_grnd"
## [37] "pf_drwep" "pf_ptwep" "pf_baton" "pf_hcuff" "pf_pepsp" "pf_other"
## [43] "cs_objcs" "cs_descr" "cs_casng" "cs_lkout" "cs_cloth" "cs_drgtr"
## [49] "cs_furtv" "cs_vcrim" "cs_bulge" "cs_other" "rf_vcrim" "rf_othsw"
## [55] "rf_attir" "rf_vcact" "rf_rfcmp" "rf_verbl" "rf_knowl" "rf_furt"
## [61] "rf_bulg" "sb_hdobj" "sb_outln" "sb_admis" "sb_other" "ac_proxm"
## [67] "ac_evasv" "ac_assoc" "ac_cgdir" "ac_incid" "ac_time" "ac_stsnd"
## [73] "ac_rept" "ac_inves" "ac_other" "forceuse" "inout" "trhsloc"
## [79] "premname" "addrnum" "stname" "stinter" "crossst" "addrpct"
## [85] "sector" "beat" "post" "typeofid" "othpers" "explnstp"
## [91] "repcmd" "revcmd" "offunif" "offverb" "officrid" "offshld"
## [97] "radio" "recstat" "linecm" "STATEFP" "COUNTYFP" "TRACTCE"
## [103] "GEOID" "NAME" "NAMELSAD" "MTFCC" "FUNCSTAT" "ALAND"
## [109] "AWATER" "INTPTLAT" "INTPTLON" "geometry"
Notice all the new columns from the Bronx census tracts! These columns have empty values where our point data did not overlap the Bronx, so we can use that attribute to subest our data.
table(bronx_sqf$COUNTYFP)
##
## 005
## 6553
bronx_sqf <- filter(bronx_sqf, COUNTYFP == "005")
nrow(bronx_sqf)
## [1] 6553
Finally, let’s count the number of 2014 SQF events in Bronx census tracts. This time, there will be a duplicate of each tract for every point within it, so we can reduce this file back to unique tracts by the number of times it appears.
bronx_events <- st_join(bronx, nyc_sqf) %>%
group_by(GEOID) %>% # group by unique tract ID
mutate(events = n()) %>%
ungroup() %>%
select(GEOID, events, geometry) %>% # drop unwanted columns
unique() # keep only unique tracts
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
bronx_events
## Simple feature collection with 339 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -73.93381 ymin: 40.78536 xmax: -73.74806 ymax: 40.91758
## epsg (SRID): 4269
## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## # A tibble: 339 x 3
## GEOID events geometry
## <chr> <int> <POLYGON [°]>
## 1 36005006100 17 ((-73.92688 40.81948, -73.92665 40.82005, -73.92632…
## 2 36005006400 1 ((-73.87394 40.83598, -73.8738 40.83597, -73.87298 …
## 3 36005006700 148 ((-73.92178 40.81953, -73.92148 40.8202, -73.92119 …
## 4 36005006800 5 ((-73.87319 40.8299, -73.87275 40.83003, -73.87179 …
## 5 36005007000 4 ((-73.86872 40.82956, -73.86776 40.82969, -73.86815…
## 6 36005007100 21 ((-73.91763 40.81605, -73.91676 40.81656, -73.91599…
## 7 36005007300 15 ((-73.91258 40.81493, -73.91226 40.81549, -73.91194…
## 8 36005007500 28 ((-73.91138 40.81737, -73.91061 40.81922, -73.91035…
## 9 36005007700 11 ((-73.90692 40.81853, -73.90644 40.81969, -73.906 4…
## 10 36005007800 1 ((-73.85738 40.83214, -73.85627 40.83184, -73.85598…
## # ... with 329 more rows
There is a large and growing collection of packages that enable different kinds of mapping. I’ll introduce the ggplot2 options available, as well as how to make a simple interactive web map using the leaflet package.
It’s easy to do a quick plot of sfc objects. By default, plot methods generally try to visualize all the data available, so if you just want to quickly see the geometry of an object, you can just feed it as an sfc.
st_geometry(bronx_events) %>% plot()
ggplot2 has built-in functions to handle plotting sf objects. This stuff is in-progress and has some bugs (you ahve to use data = on the sf object), so I recommend looking into a package called tmap. I’ve never used it as I make my “best” maps with ArcGIS, but it’s effective for lots of data and visual styles and is very popular.
ggplot() +
geom_sf(data = bronx_events, aes(fill = events))
You can add more data objects by adding another geom_sf() function.
samp <- sample_n(bronx_sqf, 50)
ggplot() +
geom_sf(data = bronx_events, aes(fill = events), size = 0.5) +
geom_sf(data = samp, col = "red", shape = 3)
One trick I really like comes from the ggmap package. This package will generate a web map tile from a range of sources, including Google. I prefer a clean, simple open source background.
library(ggmap)
m <- get_map("bronx", maptype = "toner-lite", zoom = 12)
## maptype = "toner-lite" is only available with source = "stamen".
## resetting to source = "stamen"...
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=bronx&zoom=12&size=640x640&scale=2&maptype=terrain&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=bronx&sensor=false
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1536.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1536.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1536.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1537.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1537.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1537.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1538.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1538.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1538.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1206/1539.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1207/1539.png
## Map from URL : http://tile.stamen.com/toner-lite/12/1208/1539.png
ggmap(m) +
geom_sf(data = bronx_events, aes(fill = events),
size = 0, alpha = 0.7,
inherit.aes = FALSE) # needs this because bugs
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Finally, you can use the leaflet package to create an interactive web maps. To do this, the data’s CRS must be in lat/lon coordinates using WGS84, the datum for web cartography. Leaflet has a website with lots of tutorials on how to use their (very complicated) package.
library(leaflet)
pal <- colorNumeric(palette = "Blues", domain = bronx_events$events)
st_transform(bronx_events, 4326) %>% # transform to correct CRS
leaflet() %>%
addTiles() %>%
addPolygons(stroke = FALSE, smoothFactor = 0.3, fillOpacity = 0.8,
fillColor = ~pal(events), label = ~paste0("Tract ", GEOID, " - ", events, " stops"))
This is only a small taste of the spatial data capabilities of R. Keep learning and seraching on your own, best of luck!