This Notebook will demonstrate how to import various types of vector GIS data into R.
First, let’s look at the layers in the data folder by passing the directory to st_layers() from the sf package. This shows the Shapefiles, but not layers stored inside ‘containers’ such as file geodatabases, GeoJSON files, etc.
library(sf)
library(dplyr)  ## provides the %>% pipe used in the plotting code below
## View spatial layers in the data folder.
st_layers("./data")
Driver: ESRI Shapefile
Available layers:
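To see why container formats behave differently, here is a minimal sketch that builds throwaway data in temporary locations: one Shapefile in a folder, plus a GeoPackage holding two layers. The object and layer names (`pts`, `pts_copy`) are made up for illustration. The folder listing shows the Shapefile, while a container’s layers are listed by passing the container file itself to st_layers().

```r
library(sf)

# Two made-up points in NAD83 lat-long (EPSG:4269); illustration only
pts <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(-119.6, 37.7)), st_point(c(-119.5, 37.8)),
                    crs = 4269)
)

# A fresh temporary folder holding one Shapefile named "pts"
shp_dir <- tempfile("shp_demo")
dir.create(shp_dir)
st_write(pts, file.path(shp_dir, "pts.shp"), quiet = TRUE)

# A GeoPackage "container" holding two layers
gpkg_fn <- tempfile(fileext = ".gpkg")
st_write(pts, gpkg_fn, layer = "pts", quiet = TRUE)
st_write(pts, gpkg_fn, layer = "pts_copy", quiet = TRUE)

# Passing the folder lists the Shapefile(s) it contains
shp_layers <- st_layers(shp_dir)
print(shp_layers$name)

# A container's layers are listed by passing the container file itself
gpkg_layers <- st_layers(gpkg_fn)
print(gpkg_layers$name)
```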
Import the ‘yose_boundary’ layer (a Shapefile)
yose_bnd_ll <- st_read(dsn="./data", layer="yose_boundary")
Reading layer `yose_boundary' from data source
`/home/lindangulopez/Desktop/SpatialRWorkshop/data' using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -119.8864 ymin: 37.4947 xmax: -119.1964 ymax: 38.18515
Geodetic CRS: North_American_Datum_1983
# This also works:
# yose_bnd_ll <- st_read(dsn="./data/yose_boundary.shp")
Note 1: when the folder is the dsn, we don’t need to add the .shp extension to the layer name.
Note 2: this code uses a naming convention for variables, e.g., yose_bnd_ll:
yose - all Yosemite layers start with this
bnd - tells me this is the park boundary
ll - lat-long coordinates
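Note 1 says the two ways of calling st_read() are interchangeable for a Shapefile. Here is a minimal sketch that checks this on a throwaway Shapefile written to a temporary folder; the layer name `bnd` is hypothetical.

```r
library(sf)

# Write a throwaway one-feature Shapefile named "bnd" into a fresh temp folder
demo_dir <- tempfile("read_demo")
dir.create(demo_dir)
bnd <- st_sf(name = "demo",
             geometry = st_sfc(st_point(c(-119.6, 37.7)), crs = 4269))
st_write(bnd, file.path(demo_dir, "bnd.shp"), quiet = TRUE)

# Form 1: the folder is the dsn and the layer name has no .shp extension
a <- st_read(dsn = demo_dir, layer = "bnd", quiet = TRUE)

# Form 2: the full path to the .shp file is the dsn
b <- st_read(dsn = file.path(demo_dir, "bnd.shp"), quiet = TRUE)

# Both forms should return the same feature
print(all.equal(st_coordinates(a), st_coordinates(b)))
```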
Write an expression that returns the class (type) of yose_bnd_ll.
## Your answer here
class(yose_bnd_ll)
[1] "sf" "data.frame"
We see that yose_bnd_ll is both an sf object (simple feature data frame) and a data.frame. This means we can use functions designed for either kind of object.
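As a quick illustration of that dual nature, this sketch builds a tiny made-up sf object and applies ordinary data.frame operations to it; the site names and elevations are invented.

```r
library(sf)

# Three made-up sites with an elevation attribute (illustration only)
x <- st_sf(
  site = c("A", "B", "C"),
  elev_m = c(1200, 2500, 3100),
  geometry = st_sfc(st_point(c(-119.6, 37.7)),
                    st_point(c(-119.5, 37.8)),
                    st_point(c(-119.4, 37.9)), crs = 4269)
)

print(class(x))   # "sf" "data.frame"
print(nrow(x))    # base data.frame functions work as usual

# Ordinary row subsetting; the geometry column is kept automatically
high <- x[x$elev_m > 2000, ]
print(class(high))
```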
View the properties of yose_bnd_ll by simply running it by itself:
yose_bnd_ll
Simple feature collection with 1 feature and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -119.8864 ymin: 37.4947 xmax: -119.1964 ymax: 38.18515
Geodetic CRS: North_American_Datum_1983
UNIT_CODE
1 YOSE
GIS_NOTES
1 Lands - http://landsnet.nps.gov/tractsnet/documents/YOSE/METADATA/yose_metadata.xml
UNIT_NAME DATE_EDIT STATE REGION GNIS_ID UNIT_TYPE CREATED_BY
1 Yosemite National Park 2016-01-27 CA PW 255923 National Park Lands
METADATA PARKNAME
1 http://nrdata.nps.gov/programs/Lands/YOSE_METADATA.xml Yosemite
geometry
1 POLYGON ((-119.8456 37.8327...
What coordinate reference system is yose_bnd_ll in?
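One way to check a CRS in code is st_crs(), together with st_is_longlat() to test whether the coordinates are geographic. This sketch uses a made-up point tagged with EPSG:4269 (NAD83 lat-long). For a CRS read from a Shapefile’s .prj file, `$epsg` may come back NA if no EPSG code is matched; the “Geodetic CRS” line in the printed header still names the datum.

```r
library(sf)

# A made-up point in NAD83 lat-long (EPSG:4269)
p <- st_sfc(st_point(c(-119.6, 37.7)), crs = 4269)

crs <- st_crs(p)
print(crs$epsg)          # EPSG code, or NA if none is known
print(st_is_longlat(p))  # TRUE for geographic (lat-long) coordinates
```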
The names() function returns the column labels of a data frame (in this case the attribute table).
## View column names in the attribute table
names(yose_bnd_ll)
[1] "UNIT_CODE" "GIS_NOTES" "UNIT_NAME" "DATE_EDIT" "STATE" "REGION"
[7] "GNIS_ID" "UNIT_TYPE" "CREATED_BY" "METADATA" "PARKNAME" "geometry"
Take note of the last column, geometry. That’s where the geometry of each feature is stored (we’ll come back to that later).
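A short sketch of pulling the two halves of an sf object apart, using a made-up two-point object: st_geometry() returns just the geometry column (an sfc object), and st_drop_geometry() returns just the attribute table as a plain data frame.

```r
library(sf)

# A made-up two-feature sf object
x <- st_sf(
  site = c("A", "B"),
  geometry = st_sfc(st_point(c(-119.6, 37.7)), st_point(c(-119.5, 37.8)),
                    crs = 4269)
)

print(names(x))                   # attribute columns, then "geometry"
g <- st_geometry(x)               # geometry only: an sfc object
attrs <- st_drop_geometry(x)      # attributes only: a plain data.frame
print(class(g))
print(class(attrs))
```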
View the first few rows of the attribute table with head():
head(yose_bnd_ll)
Simple feature collection with 1 feature and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -119.8864 ymin: 37.4947 xmax: -119.1964 ymax: 38.18515
Geodetic CRS: North_American_Datum_1983
UNIT_CODE
1 YOSE
GIS_NOTES
1 Lands - http://landsnet.nps.gov/tractsnet/documents/YOSE/METADATA/yose_metadata.xml
UNIT_NAME DATE_EDIT STATE REGION GNIS_ID UNIT_TYPE CREATED_BY
1 Yosemite National Park 2016-01-27 CA PW 255923 National Park Lands
METADATA PARKNAME
1 http://nrdata.nps.gov/programs/Lands/YOSE_METADATA.xml Yosemite
geometry
1 POLYGON ((-119.8456 37.8327...
To plot just the geometry of an sf object (i.e., without any symbology from the attribute table), we can use the st_geometry() function.
## Plot the geometry (outline) of the Yosemite boundary
plot(yose_bnd_ll %>% st_geometry(), asp=1)
Add axes=TRUE to your plot() statement.
## Plot the geometry (outline) of the Yosemite boundary
plot(yose_bnd_ll %>% st_geometry(), asp=1, axes=TRUE)
Import the Yosemite Points-of-Interest (POI) Shapefile and plot the points.
## Note: despite the _ll suffix, this layer is in UTM coordinates (see 'Projected CRS' below)
yose_poi_ll <- st_read(dsn = "./data", layer = "yose_poi")
Reading layer `yose_poi' from data source
`/home/lindangulopez/Desktop/SpatialRWorkshop/data' using driver `ESRI Shapefile'
Simple feature collection with 2720 features and 30 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 246416.2 ymin: 4153717 xmax: 301510.7 ymax: 4208419
Projected CRS: NAD83 / UTM zone 11N
plot(yose_poi_ll %>% st_geometry())
KML and KMZ files can contain more than one layer. Hence the source (dsn) is the KML file, and you must specify the layer by name.
Import a KML file containing the National Register of Historic Places in Yosemite. First, check that the KML file exists:
## Import KML file
kml_fn <- "./data/yose_historic_pts.kml"
file.exists(kml_fn)
[1] TRUE
View the layers within this KML:
## View the layers in this kml
st_layers(kml_fn)
Driver: LIBKML
Available layers:
Import:
## Import the 'yose_historic_places' layer
yose_hp_ll <- st_read(kml_fn, layer="yose_historic_places")
Reading layer `yose_historic_places' from data source
`/home/lindangulopez/Desktop/SpatialRWorkshop/data/yose_historic_pts.kml'
using driver `LIBKML'
Simple feature collection with 35 features and 15 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -119.8447 ymin: 37.51356 xmax: -119.2165 ymax: 38.08368
Geodetic CRS: WGS 84
View its properties:
## View properties
yose_hp_ll
Simple feature collection with 35 features and 15 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -119.8447 ymin: 37.51356 xmax: -119.2165 ymax: 38.08368
Geodetic CRS: WGS 84
First 10 features:
Name description timestamp begin end
1 Hetch Hetchy Railroad Engine No. 6 <NA> <NA> <NA> <NA>
2 Hodgdon Homestead Cabin <NA> <NA> <NA> <NA>
3 Rangers' Club <NA> <NA> <NA> <NA>
4 Buck Creek Cabin <NA> <NA> <NA> <NA>
5 Wawona Covered Bridge <NA> <NA> <NA> <NA>
6 Crane Flat Fire Lookout <NA> <NA> <NA> <NA>
7 Glacier Point Trailside Museum <NA> <NA> <NA> <NA>
8 McCauley Cabin <NA> <NA> <NA> <NA>
9 Bagby Stationhouse <NA> <NA> <NA> <NA>
10 Great Sierra Mine <NA> <NA> <NA> <NA>
altitudeMode tessellate extrude visibility drawOrder icon prop_type status
1 <NA> -1 0 -1 NA <NA> Structure Listed
2 <NA> -1 0 -1 NA <NA> Structure Listed
3 <NA> -1 0 -1 NA <NA> <NA> Listed
4 <NA> -1 0 -1 NA <NA> Building Eligible
5 <NA> -1 0 -1 NA <NA> Structure Listed
6 <NA> -1 0 -1 NA <NA> Building Listed
7 <NA> -1 0 -1 NA <NA> Building Listed
8 <NA> -1 0 -1 NA <NA> Building Listed
9 <NA> -1 0 -1 NA <NA> Building Listed
10 <NA> -1 0 -1 NA <NA> Site Listed
date period geometry
1 1978/01/30 08:00:00+00 1921-1958 POINT (-119.786 37.67437)
2 1978/06/07 07:00:00+00 1879-1960 POINT (-119.656 37.53924)
3 <NA> 1920-Present POINT (-119.5883 37.74735)
4 2004/08/23 07:00:00+00 1931-1938 POINT (-119.4897 37.56131)
5 2007/01/11 08:00:00+00 1868-1933 POINT (-119.656 37.53859)
6 1996/04/04 08:00:00+00 1931 POINT (-119.8207 37.75978)
7 1978/04/04 08:00:00+00 1924 POINT (-119.5731 37.72916)
8 1977/03/08 08:00:00+00 1902-1972 POINT (-119.3676 37.87812)
9 1979/04/13 08:00:00+00 1907 POINT (-119.7862 37.67439)
10 1978/05/24 07:00:00+00 1881-1884 POINT (-119.2688 37.9276)
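Remember: to overlay more than one layer on a plot, wrap the plot() calls in curly braces (in an R Notebook this keeps them drawing on the same plot) and pass add=TRUE to every plot() call after the first: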
## Plot the boundary, then the historic places
{plot(yose_bnd_ll %>% st_geometry(), asp=1)
plot(yose_hp_ll %>% st_geometry(), add=TRUE)}
Import the California county boundaries, which are saved as a GeoJSON file.
## Import a GeoJSON file
counties_fn <- "./data/ca_counties.geojson"
file.exists(counties_fn)
[1] TRUE
View the layers in this GeoJSON file:
## View the layers
st_layers(counties_fn)
Driver: GeoJSON
Available layers:
Import the ‘ca_counties’ layer:
## Import the 'ca_counties' layer
ca_counties_ll <- st_read(counties_fn)
Reading layer `ca_counties' from data source
`/home/lindangulopez/Desktop/SpatialRWorkshop/data/ca_counties.geojson'
using driver `GeoJSON'
Simple feature collection with 58 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XYZ
Bounding box: xmin: -124.4096 ymin: 32.53416 xmax: -114.1312 ymax: 42.00952
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
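The header shows Dimension: XYZ with a z_range of 0 to 0, which suggests the Z values carry no information here. If you would rather work with 2D geometries, st_zm() drops the Z dimension. A minimal sketch on a made-up triangle:

```r
library(sf)

# A made-up triangle whose vertices all have Z = 0
ring <- rbind(c(-120, 37, 0), c(-119, 37, 0), c(-119, 38, 0), c(-120, 37, 0))
poly_xyz <- st_sfc(st_polygon(list(ring)), crs = 4326)
print(class(poly_xyz[[1]]))   # "XYZ" "POLYGON" "sfg"

# Drop the Z (and any M) dimension, keeping only X and Y
poly_xy <- st_zm(poly_xyz)
print(class(poly_xy[[1]]))    # "XY" "POLYGON" "sfg"
```

Applied to the counties, the equivalent step would be `ca_counties_ll <- st_zm(ca_counties_ll)`.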
Plot the county boundaries.
plot(st_geometry(ca_counties_ll), axes=TRUE)
You can import an ESRI file geodatabase using the sf package (whether you can also write to one depends on the GDAL version sf was built with). A file geodatabase is itself a folder ending in .gdb, and that folder is the source (dsn).
Import Yosemite’s trails from a file geodatabase. First, check that the .gdb folder exists:
## Define the path to the file geodatabase (a folder)
gdb_fn <- "./data/yose_trails.gdb"
file.exists(gdb_fn)
[1] TRUE
View the layers in this source:
st_layers(gdb_fn)
Driver: OpenFileGDB
Available layers:
Import the ‘Trails’ layer
## Import the 'Trails' layer (case sensitive!)
yose_trails <- st_read(gdb_fn, layer="Trails")
Reading layer `Trails' from data source
`/home/lindangulopez/Desktop/SpatialRWorkshop/data/yose_trails.gdb'
using driver `OpenFileGDB'
Simple feature collection with 1074 features and 13 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 245134 ymin: 4153668 xmax: 323239.7 ymax: 4250703
Projected CRS: NAD83 / UTM zone 11N
Plot Yosemite’s Trails:
## Plot the trails layer
plot(st_geometry(yose_trails), axes=TRUE)
The following code does not produce a plot of the park boundary with the trails on top. Can you tell why?
{plot(yose_bnd_ll %>% st_geometry())
plot(yose_trails %>% st_geometry(), add=TRUE)}
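The two layers are in different CRSs: yose_bnd_ll is in NAD83 lat-long (degrees), while yose_trails is in NAD83 / UTM zone 11N (metres), so their coordinates don’t fall on the same axes. A minimal sketch of the fix with a made-up point, assuming EPSG:26911 as the code for NAD83 / UTM zone 11N:

```r
library(sf)

# A made-up point in NAD83 lat-long, like yose_bnd_ll
p_ll <- st_sfc(st_point(c(-119.6, 37.7)), crs = 4269)

# Compare with NAD83 / UTM zone 11N (EPSG:26911), the CRS reported for yose_trails
print(st_crs(p_ll) == st_crs(26911))   # FALSE: degrees vs. metres

# Reproject into the other layer's CRS before overlaying
p_utm <- st_transform(p_ll, 26911)
print(st_coordinates(p_utm))           # easting/northing in metres
```

With the notebook’s objects, the equivalent step would be something like `yose_bnd_utm <- st_transform(yose_bnd_ll, st_crs(yose_trails))`, then plotting yose_bnd_utm in place of yose_bnd_ll.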
Let’s import Yosemite’s watersheds from a GeoPackage file.
## Import watersheds from a geopackage
gpkg_watershd_fn <- "./data/yose_watersheds.gpkg"
file.exists(gpkg_watershd_fn)
[1] TRUE
st_layers(gpkg_watershd_fn)
Driver: GPKG
Available layers:
yose_watersheds <- st_read(gpkg_watershd_fn, layer="calw221")
Reading layer `calw221' from data source
`/home/lindangulopez/Desktop/SpatialRWorkshop/data/yose_watersheds.gpkg'
using driver `GPKG'
Simple feature collection with 127 features and 38 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 1383.82 ymin: -61442.93 xmax: 81596.71 ymax: 26405.66
Projected CRS: unnamed
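Like KML, a GeoPackage can hold several layers, which is why the layer name is passed explicitly. This sketch writes a throwaway two-layer GeoPackage (the layer names `basins` and `outlets` are hypothetical) and reads one layer back by name. If you leave out `layer=` on a multi-layer source, st_read() warns and reads only the first layer.

```r
library(sf)

# Throwaway GeoPackage with two hypothetical layers, in California Albers (EPSG:3310)
gpkg_fn <- tempfile(fileext = ".gpkg")
sq <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
basins  <- st_sf(id = 1, geometry = st_sfc(sq, crs = 3310))
outlets <- st_sf(id = 1:2,
                 geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 3310))
st_write(basins,  gpkg_fn, layer = "basins",  quiet = TRUE)
st_write(outlets, gpkg_fn, layer = "outlets", quiet = TRUE)

# List the layers, then read one of them by name
print(st_layers(gpkg_fn)$name)
outlets_in <- st_read(gpkg_fn, layer = "outlets", quiet = TRUE)
print(nrow(outlets_in))
```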
Plot the watersheds:
plot(st_geometry(yose_watersheds), axes=TRUE)
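What CRS are the Yosemite watersheds in?
ANS. California Albers: an Albers Equal Area projection on the NAD83 datum with standard parallels at 34° and 40.5° N, a central meridian of -120°, and a false northing of -4,000,000 m (the same parameters as EPSG:3310, “NAD83 / California Albers”). It is a common projection for statewide data in California.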
st_crs(yose_watersheds)
Coordinate Reference System:
User input: unnamed
wkt:
BOUNDCRS[
SOURCECRS[
PROJCRS["unnamed",
BASEGEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]],
CONVERSION["unnamed",
METHOD["Albers Equal Area",
ID["EPSG",9822]],
PARAMETER["Latitude of 1st standard parallel",34,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8823]],
PARAMETER["Latitude of 2nd standard parallel",40.5,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8824]],
PARAMETER["Latitude of false origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8821]],
PARAMETER["Longitude of false origin",-120,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8822]],
PARAMETER["Easting at false origin",0,
LENGTHUNIT["Meter",1],
ID["EPSG",8826]],
PARAMETER["Northing at false origin",-4000000,
LENGTHUNIT["Meter",1],
ID["EPSG",8827]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["Meter",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["Meter",1]]]],
TARGETCRS[
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]],
ABRIDGEDTRANSFORMATION["Transformation from NAD83 to WGS84",
METHOD["Position Vector transformation (geog2D domain)",
ID["EPSG",9606]],
PARAMETER["X-axis translation",0,
ID["EPSG",8605]],
PARAMETER["Y-axis translation",0,
ID["EPSG",8606]],
PARAMETER["Z-axis translation",0,
ID["EPSG",8607]],
PARAMETER["X-axis rotation",0,
ID["EPSG",8608]],
PARAMETER["Y-axis rotation",0,
ID["EPSG",8609]],
PARAMETER["Z-axis rotation",0,
ID["EPSG",8610]],
PARAMETER["Scale difference",1,
ID["EPSG",8611]]]]
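To back up the answer above, this sketch rebuilds a PROJ string from the projection parameters in the WKT (an assumption: only the projection parameters are copied, not the WKT’s NAD83-to-WGS84 bound transformation) and projects one made-up NAD83 point with both that definition and EPSG:3310. If the definitions match, the coordinates should agree to well under a centimetre.

```r
library(sf)

# PROJ string rebuilt from the projection parameters in the WKT above (assumption)
watershed_crs <- st_crs(paste(
  "+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120",
  "+x_0=0 +y_0=-4000000 +datum=NAD83 +units=m +no_defs"
))

# Project the same made-up NAD83 point with both definitions
p_ll <- st_sfc(st_point(c(-119.6, 37.7)), crs = 4269)
xy_unnamed <- st_coordinates(st_transform(p_ll, watershed_crs))
xy_3310    <- st_coordinates(st_transform(p_ll, 3310))

# Largest coordinate difference in metres; near zero if the definitions match
print(max(abs(xy_unnamed - xy_3310)))
```

If you want the layer to carry the named EPSG definition, reprojecting with `st_transform(yose_watersheds, 3310)` should leave the coordinates essentially unchanged.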
Import a CSV file containing missing persons records. Step 1 is to import it as a data frame:
## Import missing people csv file
missing_df <- read.csv("./data/yosemite_missing_people.csv", stringsAsFactors = FALSE)
tibble::glimpse(missing_df)
Rows: 213
Columns: 49
$ X <dbl> -119.6632, -119.8099, -119.5958, -119.5599, -119.5937, -119.60…
$ Y <dbl> 37.66355, 37.76910, 37.74595, 37.75631, 37.74561, 37.74521, 37…
$ OBJECTID_1 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ Georef_Unc <dbl> 336.3710, 526.3630, 56.3650, 126.3640, 41.3650, 846.5152, 41.3…
$ Distance <dbl> 1340.26046, 1293.06310, 0.00000, 1760.04205, 357.14291, 1823.4…
$ Type <chr> "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP",…
$ Lat <dbl> 37.66355, 37.76910, 37.74595, 37.75631, 37.74561, 37.74521, 37…
$ Long <dbl> -119.6632, -119.8099, -119.5958, -119.5599, -119.5937, -119.60…
$ Extent <dbl> 310, 500, 30, 100, 15, 15, 15, 25, 15, 405, 15, 30, 150, 15, 4…
$ CaseNumber <int> 20090248, 20090652, 20090940, 20091134, 20091252, 20091345, 20…
$ SARNumber <int> 2009004, 2009014, 2009024, 2009029, 2009036, 2009042, 2009043,…
$ IncidYear <int> 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 20…
$ DateTimeLa <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-…
$ DateTimeIn <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-…
$ DateTimeSu <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-…
$ DateTIme_1 <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-…
$ ContactMet <chr> "Subject Cell Phone", "Reported Missing", "Reported Missing", …
$ EcoRegionD <chr> "Temperate", "Temperate", "Temperate", "Temperate", "Temperate…
$ EcoRegio_1 <chr> "M260 Mediterranean Regime Mountains", "M260 Mediterranean Reg…
$ IncidType <chr> "Search", "Separated Party", "Overdue", "Search", "Separated P…
$ NumberofSu <int> 1, 1, 1, 1, 1, 1, 2, 2, 3, 1, 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 2,…
$ GroupDynam <chr> "Solo Subject", "Solo Subject", "Solo Subject", "Solo Subject"…
$ SubjectCat <chr> "Mental Retardation", "Hiker", "Child (13-15)", "Hiker", "Chil…
$ SubSex <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Group - Mixe…
$ SubAge <int> 31, 0, 14, 35, 6, 29, 0, 0, 0, 23, 0, 0, 54, 0, 15, 13, 72, 19…
$ IPPType <chr> "LKP", "PLS", "LKP", "LKP", "PLS", "PLS", "PLS", "PLS", "LKP",…
$ IPPClassif <chr> "Locality Description (Added)", "Woods", "Building", "Locality…
$ IncidContr <chr> "Darkness", "Unknown", "Unknown", "Snow/Ice", "Unknown", "Unkn…
$ IncidOutco <chr> "Subject Found Alive", "Subject Found Alive", "Subject Found A…
$ Scenario <chr> "Lost", "Separated", "Overdue", "Lost", "Separated", "Overdue"…
$ SubjMedInj <chr> "None", "None", "None", "None", "None", "None", "None", "Other…
$ RescueMeth <chr> "Snow Machine", "Walkout", "Other", "Helicopter", "Other", "Ot…
$ LostPerson <chr> "Route Traveling", "Route Traveling", "Unknown", "Unknown", "N…
$ IPP_GR_Loc <chr> "Badger Pass Ski Area", "Tuolumne Grove", "Lower Falls Restroo…
$ IPP_GR_Typ <chr> "NEAR A FEATURE", "FEATURE (NAMED PLACE)", "NEAR A FEATURE", "…
$ IPP_GR_Pat <chr> "Null", "Null", "Null", "Null", "Null", "Trail", "Null", "Null…
$ IPP_GR_Not <chr> "Subject's last known point was described as \"Near Badger Pas…
$ Intended_D <chr> "Unknown", "Unknown", "Top of Yosemite Falls", "Loop - back to…
$ FindFeatur <chr> "Forest/woods", "Road", "Structure", "Forest/woods", "Structur…
$ Found_GR_L <chr> "Eagle Chair Lift", "Tuolumne Grove Parking", "Lower Falls Res…
$ Found_GR_T <chr> "OFFSET DIRECTION", "FEATURE (NAMED PLACE)", "NEAR A FEATURE",…
$ Found_GR_P <chr> "Null", "Null", "Null", "Null", "Null", "Road", "Null", "Road"…
$ Found_GR_N <chr> "Found just south of the top of the Eale Chair Lift at Badger …
$ Motorized_ <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, …
$ Incident_N <chr> "Subject was snowshoeing, became disoriented, and called for h…
$ TotalTimeM <int> 18, -19, 5, 15, 1, 5, -12, 8, 33, 37, 21, 24, 5, -19, 20, 17, …
$ TotalSearc <int> 1, -19, 0, 1, 1, 1, -22, 5, 13, 28, 2, 1, 2, -23, 1, 2, 1, 1, …
$ GlobalID <chr> "083c9dbc-711f-4127-861d-b2f7b5bb0470", "5f387c80-547a-4a46-97…
Step 2 is to convert it to an sf data frame. The column names (Lat, Long) and their values tell us the coordinates are geographic. We don’t know precisely which datum they use, but passing crs=4326 (WGS84) will be close enough for our purposes.
## Convert to sf
yose_missing_ll <- st_as_sf(missing_df, coords=c("Long", "Lat"), crs=4326)
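The order in `coords` matters: it must be x then y, i.e., longitude first. A minimal sketch with a made-up two-row table that reuses the Lat/Long column names from missing_df; it also shows the `remove` argument, which keeps the coordinate columns as attributes when set to FALSE.

```r
library(sf)

# A made-up two-row table with the same Lat/Long column names as missing_df
df <- data.frame(id = 1:2,
                 Lat  = c(37.66, 37.77),
                 Long = c(-119.66, -119.81))

# coords = c(x, y): longitude first, then latitude
pts <- st_as_sf(df, coords = c("Long", "Lat"), crs = 4326)
print(st_coordinates(pts))   # X column holds longitude, Y holds latitude
print(names(pts))            # Lat and Long are folded into geometry by default

# Keep the original coordinate columns as attributes too
pts_keep <- st_as_sf(df, coords = c("Long", "Lat"), crs = 4326, remove = FALSE)
print(names(pts_keep))
```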
Plot the points over the park boundary to make sure they line up:
{plot(yose_bnd_ll %>% st_geometry(), col=NA, border="chartreuse4", lwd=3, main = "Missing Persons!")
plot(yose_missing_ll %>% st_geometry(), pch=16, cex=0.5, add=TRUE)}
Look at the other GIS files in the data folder. Select one, import it, and plot it.
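## Your answer here
## The watersheds are in California Albers, so reproject them to the boundary's CRS before overlaying
{plot(yose_watersheds %>% st_transform(st_crs(yose_bnd_ll)) %>% st_geometry(), border="gray60", axes=TRUE, main = "Missing Persons and Watersheds")
plot(yose_bnd_ll %>% st_geometry(), col=NA, border="chartreuse4", lwd=3, add=TRUE)
plot(yose_missing_ll %>% st_geometry(), pch=16, cex=0.5, add=TRUE)}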