In this project, I will mainly use two key packages for spatial analysis, the new sf package(https://cran.r-project.org/web/packages/sf/sf.pdf) for working with vector data and the raster package(https://cran.r-project.org/web/packages/raster/raster.pdf) for working with grids to analyze New York City data

For vector data I will use st_read() from sf package, which can be used to read in a wide range of spatial types including shapefiles, geojson, gps, netcdf and others. For raster data we will talk about the raster() and brick() functions. The raster() function reads single-band rasters and brick() reads in multi-band rasters. Single-band, also referred to as a single-layer, raster is one that has just a single set of values for each grid cell. With multi-band rasters, on the other hand, there may be multiple layers, meaning each grid cell might have multiple values. Examples include satellite images where each band represents ranges of frequencies along the electromagnetic spectrum. In a “true color” satellite image, an image that looks like a photo, you would have three bands, one each for the red, green and blue bands of light. In order to read in this type of raster, we would use the brick function.

Reading vector data

The sf package, created by Edzer Pebesma and colleagues, has dramatically simplified reading vector spatial data into R.

setwd("~/Desktop/Spatial Analysis")
trees <- st_read("trees.shp")
## Reading layer `trees' from data source `/Users/sunjie/Desktop/Spatial Analysis/trees.shp' using driver `ESRI Shapefile'
## Simple feature collection with 65217 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.2546 ymin: 40.49894 xmax: -73.70078 ymax: 40.91165
## Geodetic CRS:  WGS 84
neighborhoods <- st_read("neighborhoods.shp")
## Reading layer `neighborhoods' from data source `/Users/sunjie/Desktop/Spatial Analysis/neighborhoods.shp' using driver `ESRI Shapefile'
## Simple feature collection with 195 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS:  WGS 84
parks <- st_read("parks.shp")
## Reading layer `parks' from data source `/Users/sunjie/Desktop/Spatial Analysis/parks.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2008 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25609 ymin: 40.49449 xmax: -73.70905 ymax: 40.91133
## Geodetic CRS:  WGS 84
head(trees)
## Simple feature collection with 6 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.13116 ymin: 40.62351 xmax: -73.80057 ymax: 40.77393
## Geodetic CRS:  WGS 84
##   tree_id  nta longitude latitude stump_diam      species      boroname
## 1  558423 QN76 -73.80057 40.67035          0  honeylocust        Queens
## 2  286191 MN32 -73.95422 40.77393          0 Callery pear     Manhattan
## 3  257044 QN70 -73.92309 40.76196          0  Chinese elm        Queens
## 4  603262 BK09 -73.99866 40.69312          0       cherry      Brooklyn
## 5   41769 SI22 -74.11773 40.63166          0       cherry Staten Island
## 6   24024 SI07 -74.13116 40.62351          0    red maple Staten Island
##                     geometry
## 1 POINT (-73.80057 40.67035)
## 2 POINT (-73.95422 40.77393)
## 3 POINT (-73.92309 40.76196)
## 4 POINT (-73.99866 40.69312)
## 5 POINT (-74.11773 40.63166)
## 6 POINT (-74.13116 40.62351)

There are three shapefiles (one point file and two polygon files). it shows some header metadata about the file and a special geometry column.

Reading raster data

The term “raster” refers to gridded data that can include satellite imagery, aerial photographs (like orthophotos) and other types. In R, raster data can be handled using the raster package created by Robert J. Hijmans.

When working with raster data, one of the most important things to keep in mind is that the raw data can be what is known as “single-band” or “multi-band” and these are handled a little differently in R. Single-band rasters are the simplest, these have a single layer of raster values – a classic example would be an elevation raster where each cell value represents the elevation at that location.

Multi-band rasters will have more than one layer. An example is a color aerial photo in which there would be one band each representing red, green or blue light.

# Read in the tree canopy single-band raster
canopy <- raster("canopy.tif")
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
# Read in the manhattan Landsat image multi-band raster
manhattan <- brick("manhattan.tif")

# Get the class for the new objects
class(canopy)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
class(manhattan)
## [1] "RasterBrick"
## attr(,"package")
## [1] "raster"
# Identify how many layers each object has
nlayers(canopy)
## [1] 1
nlayers(manhattan)
## [1] 3

sf objects are data frames

Spatial objects in sf are just data frames with some special properties. This means that packages like dplyr can be used to manipulate sf objects. Thus I will use the dplyr functions select() to select or drop variables, filter() to filter the data and mutate() to add or alter columns.

# Read in the trees shapefile
trees <- st_read("trees.shp")
## Reading layer `trees' from data source `/Users/sunjie/Desktop/Spatial Analysis/trees.shp' using driver `ESRI Shapefile'
## Simple feature collection with 65217 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.2546 ymin: 40.49894 xmax: -73.70078 ymax: 40.91165
## Geodetic CRS:  WGS 84
trees
## Simple feature collection with 65217 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.2546 ymin: 40.49894 xmax: -73.70078 ymax: 40.91165
## Geodetic CRS:  WGS 84
## First 10 features:
##    tree_id  nta longitude latitude stump_diam          species      boroname
## 1   558423 QN76 -73.80057 40.67035          0      honeylocust        Queens
## 2   286191 MN32 -73.95422 40.77393          0     Callery pear     Manhattan
## 3   257044 QN70 -73.92309 40.76196          0      Chinese elm        Queens
## 4   603262 BK09 -73.99866 40.69312          0           cherry      Brooklyn
## 5    41769 SI22 -74.11773 40.63166          0           cherry Staten Island
## 6    24024 SI07 -74.13116 40.62351          0        red maple Staten Island
## 7   333768 BK45 -73.93787 40.61385          0     silver maple      Brooklyn
## 8   107210 MN03 -73.94048 40.80822          0 London planetree     Manhattan
## 9   584738 SI12 -74.15531 40.63013          0         sweetgum Staten Island
## 10  388799 QN27 -73.86853 40.76100          0 London planetree        Queens
##                      geometry
## 1  POINT (-73.80057 40.67035)
## 2  POINT (-73.95422 40.77393)
## 3  POINT (-73.92309 40.76196)
## 4  POINT (-73.99866 40.69312)
## 5  POINT (-74.11773 40.63166)
## 6  POINT (-74.13116 40.62351)
## 7  POINT (-73.93787 40.61385)
## 8  POINT (-73.94048 40.80822)
## 9  POINT (-74.15531 40.63013)
## 10   POINT (-73.86853 40.761)
# Use filter() to limit to honey locust trees
honeylocust <- trees %>% filter(species == "honeylocust")

# Count the number of rows
nrow(honeylocust)
## [1] 6418
# Limit to tree_id and boroname variables
honeylocust_lim <- honeylocust %>% dplyr::select(tree_id, boroname) 

# Use head() to look at the first few records
head(honeylocust_lim)
## Simple feature collection with 6 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -73.97524 ymin: 40.67035 xmax: -73.80057 ymax: 40.83136
## Geodetic CRS:  WGS 84
##   tree_id boroname                   geometry
## 1  558423   Queens POINT (-73.80057 40.67035)
## 2  548625 Brooklyn POINT (-73.97524 40.68575)
## 3  549439 Brooklyn POINT (-73.94137 40.67557)
## 4  181757 Brooklyn POINT (-73.89377 40.67169)
## 5  379387   Queens  POINT (-73.8221 40.69365)
## 6  383562    Bronx POINT (-73.90041 40.83136)

Geometry is stored in list-columns

A major innovation in sf is that spatial objects are data frames. This is possible thanks, in part, to the list-column.

A list-column behaves, to a certain extent, like any other R column. The main difference is that instead of a standard value such as a single number, character or boolean value, each observation value in that column is a piece of an R list and this list can be as complex as needed. The list column allows us to store far more information in a single variable and sf takes advantage of this by storing all geographic information for each feature in the list.

# Create a standard, non-spatial data frame with one column
df <- data.frame(a = 1:3)

# Add a list column to your data frame
df$b <- list(1:4, 1:5, 1:10)

# Look at your data frame with head
head(df)
##   a                             b
## 1 1                    1, 2, 3, 4
## 2 2                 1, 2, 3, 4, 5
## 3 3 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
# Convert your data frame to a tibble and print on console
as_tibble(df)
## # A tibble: 3 x 2
##       a b         
##   <int> <list>    
## 1     1 <int [4]> 
## 2     2 <int [5]> 
## 3     3 <int [10]>
# Pull out the third observation from both columns individually
df$a[3]
## [1] 3
df$b[3]
## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10

##Extracting geometric information from your vector layers There are several functions in sf that allow us to access geometric information like area from the vector features. For example, the functions st_area() and st_length() return the area and length of the features, respectively.

Note that the result of functions like st_area() and st_length() will not be a traditional vector. Instead the result has a class of units which means the vector result is accompanied by metadata describing the object’s units.

parks <- st_read("parks.shp")
## Reading layer `parks' from data source `/Users/sunjie/Desktop/Spatial Analysis/parks.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2008 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25609 ymin: 40.49449 xmax: -73.70905 ymax: 40.91133
## Geodetic CRS:  WGS 84
# Compute the areas of the parks
areas <- st_area(parks)

# Create a quick histogram of the areas using hist
hist(areas, xlim = c(0, 200000), breaks = 1000)

# Filter to parks greater than 30000 (square meters)
big_parks <- parks %>% filter(unclass(areas) > 30000)

# Plot just the geometry of big_parks
plot(st_geometry(big_parks))

First look at plotting vector spatial objects

I still plot the raw geometry with no attribute color-coding with the st_geometry() function to extract the geometry and plot the result. Either create a new object or nest st_geometry() within the plot() function.

plot(parks)
## Warning: plotting the first 9 out of 14 attributes; use max.plot = 14 to plot
## all

#head(parks)
# Plot just the acres attribute of the parks data
plot(parks["acres"])

# Create a new object of just the parks geometry
#Frequently I just want to plot the raw geometry with no attribute color-coding (e.g., adding county boundaries to a map of points). For this, I can use the st_geometry() function to extract the geometry and plot the result. You can either create a new object or you can nest st_geometry() within the plot() function.
parks_geo <- st_geometry(parks)

plot(parks_geo)

Learning about your raster objects

Instead of storing raster objects in data frames, the raster package stores spatial data in specially designed R classes that contain slots where the data and metadata are stored. The data and metadata can be accessed using a suite of functions. For example, the spatial extent (the bounding box) of the object can be accessed with extent(), the coordinate reference system can be accessed with crs() and the number of grid cells can be determined with ncell().

# Read in the rasters
canopy <- raster("canopy.tif")
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
manhattan <- brick("manhattan.tif")

# Get the extent of the canopy object
extent(canopy)
## class      : Extent 
## xmin       : 1793685 
## xmax       : 1869585 
## ymin       : 2141805 
## ymax       : 2210805
# Get the CRS of the manhattan object
crs(manhattan)
## CRS arguments:
##  +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
# Determine the number of grid cells in both raster objects
ncell(manhattan)
## [1] 619173
ncell(canopy)
## [1] 58190

Accessing raster data values

Raster data can be very big depending on the extent and resolution (grid size). In order to deal with this the raster() and brick() functions are designed to only read in the actual raster values as needed. This could use the inMemory() function on an object to verify and it will return FALSE if the values are not in memory, and if I use the head() function, the raster package will read in only the values needed, not the full set of values.

The raster package only reads in raster values as needed to save space in memory. I can get the values manually using the getValues() function.

# Check if the data is in memory
#inMemory(canopy)

# Use head() to peak at the first few records
#head(canopy)

# Use getValues() to read the values into a vector
vals <- getValues(canopy)

# Use hist() to create a histogram of the values
hist(vals)

Plot your raster object

The plot() function can be used to plot single layers while the plotRGB() function can be used to combine layers into a single image.

# Plot the canopy raster (single raster)
plot(canopy)

# Plot the manhattan raster (as a single image for each layer)
plot(manhattan)

# Plot the manhattan raster as an image
plotRGB(manhattan)

Vector and raster coordinate systems

In order to perform any spatial analysis with more than one layer, the layers should share the same coordinate reference system (CRS) and the first step is determining what coordinate reference system data has. To do this I will use of the sf function st_crs() and the raster function crs().

When the geographic data reading in with sf already has a CRS defined both sf and raster will recognize and retain it. When the CRS is not defined I will need to define it using either the EPSG number or the proj4string.

# Determine the CRS for the neighborhoods and trees vector objects
st_crs(neighborhoods)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["D_unknown",
##         ELLIPSOID["WGS84",6378137,298.257223563,
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["Degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["Degree",0.0174532925199433]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["Degree",0.0174532925199433]]]
st_crs(trees)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["D_unknown",
##         ELLIPSOID["WGS84",6378137,298.257223563,
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["Degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["Degree",0.0174532925199433]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["Degree",0.0174532925199433]]]
# Assign the CRS to trees
crs_1 <- "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(trees) <- crs_1
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
# Determine the CRS for the canopy and manhattan rasters
crs(canopy)
## CRS arguments:
##  +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0
## +ellps=GRS80 +units=m +no_defs
crs(manhattan)
## CRS arguments:
##  +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
# Assign the CRS to manhattan
crs_2 <- "+proj=utm +zone=18 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
crs(manhattan) <- crs_2

Transform layers to a common CRS

When ran st_crs() and crs() beforehand, the CRS’ were different for the different layers. The vector layer’s CRS began with +proj=longlat and the raster layer’s began with +proj=utm or +proj=aea. In order to use these layers together in spatial analysis they need to have the same CRS.

I will transform the objects so they share a single CRS. It is generally best to perform spatial analysis with layers that have a projected CRS. To determine if the object has a projected CRS we can look at the first part of the result from st_crs() or crs() – if it begins with +proj=longlat then the CRS is unprojected.

method = “ngb” will be used in call to projectRaster() to prevent distortion in the manhattan image.

# Get the CRS from the canopy object
the_crs <- crs(canopy, asText = TRUE)

# Project trees to match the CRS of canopy
trees_crs <- st_transform(trees, crs = the_crs)

# Project neighborhoods to match the CRS of canopy
neighborhoods_crs <- st_transform(neighborhoods, crs = the_crs)

# Project manhattan to match the CRS of canopy
manhattan_crs <- projectRaster(manhattan, crs = the_crs, method = "ngb")
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
## = prefer_proj): Discarded datum Unknown based on GRS80 ellipsoid in Proj4
## definition
# Look at the CRS to see if they match
st_crs(trees_crs)
## Coordinate Reference System:
##   User input: +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on GRS80 ellipsoid",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",7019]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Albers Equal Area",
##             ID["EPSG",9822]],
##         PARAMETER["Latitude of false origin",23,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-96,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",29.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",45.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
st_crs(neighborhoods_crs)
## Coordinate Reference System:
##   User input: +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on GRS80 ellipsoid",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",7019]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Albers Equal Area",
##             ID["EPSG",9822]],
##         PARAMETER["Latitude of false origin",23,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-96,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",29.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",45.5,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
crs(manhattan_crs)
## CRS arguments:
##  +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0
## +ellps=GRS80 +units=m +no_defs

Plot vector and raster together

If the layers do not share a common CRS they may not align on a plot. To illustrate, I will initially create a plot with the plot() function and try to add two layers that do not share the same CRS. Then transforming one layer’s CRS to match the other and ploting this with both the plot() function.

Note that for this exercise we returned all the layers to their original CRS and did not retain the changes you made in the last exercise.

plot(canopy)
plot(neighborhoods, add = TRUE)
## Warning in plot.sf(neighborhoods, add = TRUE): ignoring all but the first
## attribute

# See if canopy and neighborhoods share a CRS
st_crs(neighborhoods)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["D_unknown",
##         ELLIPSOID["WGS84",6378137,298.257223563,
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["Degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["Degree",0.0174532925199433]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["Degree",0.0174532925199433]]]
crs(canopy)
## CRS arguments:
##  +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0
## +ellps=GRS80 +units=m +no_defs
# Save the CRS of the canopy layer
the_crs <- crs(canopy, asText = TRUE)

# Transform the neighborhoods CRS to match canopy
neighborhoods_crs <- st_transform(neighborhoods, crs = the_crs)

# Re-run plotting code (run both lines together)
# Do the neighborhoods show up now?
plot(canopy)
plot(neighborhoods_crs, add = TRUE)
## Warning in plot.sf(neighborhoods_crs, add = TRUE): ignoring all but the first
## attribute

# Simply run the tmap code
tm_shape(canopy) + 
    tm_raster() + 
    tm_shape(neighborhoods_crs) + 
    tm_polygons(alpha = 0.5)

They mostly can’t be plot layers together if they don’t have the same CRS.

Dropping geometry from a data frame

In sf we can use data frames for storing spatial objects. This allows us to slice and dice your spatial data in the same way you do for non-spatial data.

# Create a data frame of counts by species
species_counts <- count(trees, species, sort = TRUE)
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
## although coordinates are longitude/latitude, st_union assumes that they are planar
head(species_counts)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOINT
## Dimension:     XY
## Bounding box:  xmin: -74.25443 ymin: 40.49894 xmax: -73.70104 ymax: 40.91165
## CRS:           +proj=longlat +ellps=WGS84 +no_defs
##             species    n                       geometry
## 1  London planetree 8709 MULTIPOINT ((-74.25408 40.5...
## 2       honeylocust 6418 MULTIPOINT ((-74.25426 40.5...
## 3      Callery pear 5902 MULTIPOINT ((-74.25443 40.5...
## 4           pin oak 5355 MULTIPOINT ((-74.25329 40.5...
## 5      Norway maple 3373 MULTIPOINT ((-74.25443 40.5...
## 6 littleleaf linden 3043 MULTIPOINT ((-74.25032 40.5...
# Drop the geometry column
species_no_geometry <- st_set_geometry(species_counts, NULL)

head(species_no_geometry)
##             species    n
## 1  London planetree 8709
## 2       honeylocust 6418
## 3      Callery pear 5902
## 4           pin oak 5355
## 5      Norway maple 3373
## 6 littleleaf linden 3043

The sf package conveniently allows us to apply dplyr verbs to spatial objects. The default of keeping a geometry column is not always what we want, but we can drop that column with st_set_geometry(data, NULL).

Join spatial and non-spatial data

Testing joining spatial and non-spatial data. The trees data has a full county name (the variable is called boroname) but does not have the county codes. The neighborhoods file has both a county name (the variable is called boro_name) and the county codes – neighborhoods are nested within counties. I will create a non-spatial data frame of county name and county code from the neighborhoods object. Then I will join this data frame into the spatial trees object with inner_join().

Simplify the neighborhood boundaries

In sf st_simplify() function can used to reduce line and polygon complexity. Computing the size in megabytes using the handy object_size() function in the pryr package and counting the number of vertices – the number of points required to delineate a line or polygon.

# Plot the neighborhoods geometry
plot(st_geometry(neighborhoods), col = "grey")

# Measure the size of the neighborhoods object
object_size(neighborhoods)
## 1.84 MB
# Compute the number of vertices in the neighborhoods object
pts_neighborhoods <- st_cast(neighborhoods$geometry, "MULTIPOINT")
cnt_neighborhoods <- sapply(pts_neighborhoods, length)
sum(cnt_neighborhoods)
## [1] 210736
# Simplify the neighborhoods object
neighborhoods_simple <- st_simplify(neighborhoods, 
                            preserveTopology = TRUE, 
                            dTolerance = 100)
## Warning in st_simplify.sfc(st_geometry(x), preserveTopology, dTolerance):
## st_simplify does not correctly simplify longitude/latitude data, dTolerance
## needs to be in decimal degrees
# Measure the size of the neighborhoods_simple object
object_size(neighborhoods_simple)
## 194 kB
# Compute the number of vertices in the neighborhoods_simple object
pts_neighborhoods_simple <- st_cast(neighborhoods_simple$geometry, "MULTIPOINT")
cnt_neighborhoods_simple <- sapply(pts_neighborhoods_simple, length)
sum(cnt_neighborhoods_simple)
## [1] 2898
# Plot the neighborhoods_simple object geometry
plot(st_geometry(neighborhoods_simple), col = "grey")

The neighborhoods object was reduced from nearly 2 megabytes and more than 200,000 points to 215 Kb (0.215 megabytes) and 8614 points but it can barely be seen the difference in the plots. The size reduction is 8x smaller and 20x fewer points. This is a good way to reduce object size for computation or, for example, for display on the web.

Converting sf objects to sp objects

Using as() function with Class = “Spatial” to convert an sf object to an sp object (which has a Spatial class), using st_as_sf() and accept the defaults to convert back to sf.

trees <- st_read("trees.shp")
## Reading layer `trees' from data source `/Users/sunjie/Desktop/Spatial Analysis/trees.shp' using driver `ESRI Shapefile'
## Simple feature collection with 65217 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.2546 ymin: 40.49894 xmax: -73.70078 ymax: 40.91165
## Geodetic CRS:  WGS 84
# Convert to Spatial class
trees_sp <- as(trees, Class = "Spatial")
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
## = prefer_proj): Discarded datum Unknown based on WGS84 ellipsoid in Proj4
## definition
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum D_unknown in Proj4 definition
# Confirm conversion, should be "SpatialPointsDataFrame"
class(trees_sp)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
# Convert back to sf
trees_sf <- st_as_sf(trees_sp)

# Confirm conversion
class(trees_sf)
## [1] "sf"         "data.frame"

Converting to and from coordinates

Using st_as_sf() function to convert a data frame of coordinates into an sf object. Specifing the coords argument with the names of the coordinate variables (with the X coordinate/longitude coordinate listed first) and, the crs argument if knowing the CRS of coordinates. The CRS can be specified as a proj4 string or EPSG code.

Using the st_write() function with a hidden argument to convert sf point objects to a data frame with coordinates,forcing sf to include the coordinates in the output file.

# Read in the CSV
#trees <- read.csv("trees.csv")
#head(trees)

# Convert the data frame to an sf object
#trees_sf <- st_as_sf(trees, coords =c("longitude","latitude"), crs = 4326)

# Plot the geometry of the points
#plot(st_geometry(trees_sf))

# Write the file out with coordinates
#st_write(trees_sf, "new_trees.csv",  layer_options = "GEOMETRY=AS_XY", delete_dsn = TRUE)

# Read in the file you just created and check coordinates
#new_trees <- read.csv("new_trees.csv")
#head(new_trees)

Change the raster grid cell size using aggregate

For rasters, the function to reduce resolution is aggregate(), which aggregates grid cells into larger grid cells using a user-defined function (for example, mean or max). The function used to aggregate the values is determined by the fun argument (the default being mean) and the amount of aggregation is driven by the fact.

canopy <- raster("canopy.tif")
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
plot(canopy)

# Determine the raster resolution
res(canopy)
## [1] 300 300
# Determine the number of cells
ncell(canopy)
## [1] 58190
# Aggregate the raster
canopy_small <- aggregate(canopy, fact = 10)

# Plot the new canopy layer
plot(canopy_small)

# Determine the new raster resolution
res(canopy_small)
## [1] 3000 3000
# Determine the number of cells in the new raster
ncell(canopy_small)
## [1] 598

Reading in a raster and then converted it to a lower resolution raster to save on the size of the object and ultimately computation power required.

Change values and handle missing values in rasters

When changing outlier values to NA, in the raster package, reclassification is performed with the reclassify() function.

# Plot the canopy layer to see the values above 100
plot(canopy)

# Set up the matrix
vals <- cbind(100, 300, NA)

# Reclassify 
canopy_reclass <- reclassify(canopy, rcl = vals)

plot(canopy_reclass)

Buffer layers

Computing buffers is a key spatial analysis skill and the resulting buffers have a wide range of uses, for example, identifying the number of roads within one kilometer of a school or computing the number of hazardous waste sites near sensitive natural areas. It is highly recommended that transforming unprojected data to a projected CRS before buffering, although we can buffer data with unprojected coodinate reference systems, but the buffer distance will be more meaningful with a projected CRS.

# Recall df
#df

# Convert the data frame to an sf object             
#df_sf <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)

# Transform the points to match the manhattan CRS
#df_crs <- st_transform(df_sf, crs = crs(manhattan, asText = TRUE))

# Buffer the points
#df_buf <- st_buffer(df_crs, dist = 1000)

#plotRGB(manhattan)
#plot(st_geometry(df_buf), col = "firebrick", add = TRUE)
#plot(st_geometry(df_crs), pch = 16, add = TRUE)

Compute polygon centroids

Similar to buffering, computing polygon centroids is a bedrock geoprocessing task used to assign values and even to help with labeling maps. The function for this in sf is st_centroid().

Also similar to buffering, centroid calculations should generally be performed on data with a projected coordinate reference system.

neighborhoods <- st_read("neighborhoods.shp")
## Reading layer `neighborhoods' from data source `/Users/sunjie/Desktop/Spatial Analysis/neighborhoods.shp' using driver `ESRI Shapefile'
## Simple feature collection with 195 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS:  WGS 84
# Project neighborhoods to match manhattan
neighborhoods_tf <- st_transform(neighborhoods, crs = 32618)

# Compute the neighborhood centroids
centroids <- st_centroid(neighborhoods_tf)
## Warning in st_centroid.sf(neighborhoods_tf): st_centroid assumes attributes are
## constant over geometries of x
# Plot the neighborhood geometry
plot(st_geometry(neighborhoods_tf), col = "grey", border = "white")
plot(centroids, pch = 16, col = "firebrick", add = TRUE)
## Warning in plot.sf(centroids, pch = 16, col = "firebrick", add = TRUE): ignoring
## all but the first attribute

# Plot the neighborhoods and beech trees

plot(st_geometry(neighborhoods), col = "grey", border = "white")
#plot(beech, add = TRUE, pch = 16, col = "forestgreen")

# Compute the coordinates of the bounding box
#st_bbox(beech)

# Create a bounding box polygon
#beech_box <- st_make_grid(beech, n = 1)

# Plot the neighborhoods, add the beech trees and add the new box
plot(st_geometry(neighborhoods), col = "grey", border = "white")

#plot(beech, add = TRUE, pch = 16, col = "forestgreen")
#plot(beech_box, add = TRUE)

The first step toward computing tree densities, knowing tree counts by neighborhoods. Next is to compute the neighborhood areas.

#Producing counts of all trees by neighborhood in NYC and create a single data frame with a column for total trees. The result should be a data frame with no geometry.

# Compute the counts of all trees by nta
tree_counts <- count(trees_crs, nta)
#trees
#trees_crs
head(tree_counts)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOINT
## Dimension:     XY
## Bounding box:  xmin: 1827381 ymin: 2163958 xmax: 1836505 ymax: 2178211
## CRS:           +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs
##    nta   n                       geometry
## 1 BK09 174 MULTIPOINT ((1827381 217733...
## 2 BK17 499 MULTIPOINT ((1833357 216625...
## 3 BK19 132 MULTIPOINT ((1832524 216468...
## 4 BK21 136 MULTIPOINT ((1829872 216471...
## 5 BK23  53 MULTIPOINT ((1831901 216497...
## 6 BK25 396 MULTIPOINT ((1831774 216841...
# Remove the geometry
tree_counts_no_geom <- st_set_geometry(tree_counts, NULL)

# Rename the n variable to tree_cnt
tree_counts_renamed <- rename(tree_counts_no_geom, tree_cnt = n)
  
hist(tree_counts_renamed$tree_cnt)

Then, computing average tree canopy by neighborhood as a percentage so that we can compare if the results are similar.

# Confirm that you have the neighborhood density results
head(neighborhoods)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.00736 ymin: 40.61264 xmax: -73.77574 ymax: 40.8355
## Geodetic CRS:  WGS 84
##   county_fip boro_name ntacode            ntaname boro_code
## 1        047  Brooklyn    BK88       Borough Park         3
## 2        081    Queens    QN52      East Flushing         4
## 3        081    Queens    QN48         Auburndale         4
## 4        081    Queens    QN51        Murray Hill         4
## 5        081    Queens    QN27      East Elmhurst         4
## 6        005     Bronx    BX35 Morrisania-Melrose         2
##                         geometry
## 1 MULTIPOLYGON (((-73.97605 4...
## 2 MULTIPOLYGON (((-73.79493 4...
## 3 MULTIPOLYGON (((-73.77574 4...
## 4 MULTIPOLYGON (((-73.80379 4...
## 5 MULTIPOLYGON (((-73.8611 40...
## 6 MULTIPOLYGON (((-73.89697 4...
# Transform the neighborhoods CRS to match the canopy layer
neighborhoods_crs <- st_transform(neighborhoods, crs = crs(canopy, asText = TRUE))

# Convert neighborhoods object to a Spatial object
neighborhoods_sp <- as(neighborhoods_crs,"Spatial")
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
## = prefer_proj): Discarded datum Unknown based on GRS80 ellipsoid in Proj4
## definition
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj
## = prefer_proj): Discarded datum Unknown based on GRS80 ellipsoid in Proj4
## definition
# Compute the mean of canopy values by neighborhood
canopy_neighborhoods <- extract(canopy, neighborhoods_sp, fun = mean)

# Add the mean canopy values to neighborhoods
neighborhoods_avg_canopy <- mutate(neighborhoods, avg_canopy = canopy_neighborhoods)

We transformed the neighborhoods object’s CRS. This is actually not strictly necessary because extract() can transform CRS on the fly. But it will be needed for plotting and other operations later so doing manually is important here.