Class 04: Geographic Commands in R

Introduction

In this class we study the basic commands and functions for spatial analysis. In order to start with the basic understanding of spatial analysis, we install basic packages sf, raster, spData, and spDataLarge.

  • sf: This library helps us to work with Vector data, i.e. maps, lines, points, and geography information in a 2D maps.
  • raster: This library helps us to work with raster data, i.e. maps with other kind of information than vectors such as: land altitute, mountains information, and so on.
  • spData: This library helps us to upload geographic data and work with it easily.
  • spDataLarge: This library helps us to upload geographic data with large information that commonly can not be done in spData.
#install.packages("sf")
#install.packages("raster")
#install.packages("spData")
#remotes::install_github("Nowosad/spDataLarge")

library(sf)        
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster)     
## Loading required package: sp
library(spData)        
library(spDataLarge)  

The vignette commnad let us to search information from the command we are curious.

#vignette(package = "sf") # see which vignettes are available
#vignette("sf1")          # an introduction to the package

Maps

Here by using data got from spData we have access to:

  • alaska - Alaska multipolygon
  • baltimore - House sales prices, Baltimore, MD 1978
  • hawaii - Hawaii multipolygon
  • lnd - The boroughs of London
  • nz - The regions of New Zealand
  • urban_agglomerations - Major urban areas worldwide
  • us_states - US states polygons
  • world - World country polygons

We list the name of the variables in some of our datasets, then we plot them by using the comand plot:

names(world)
##  [1] "iso_a2"    "name_long" "continent" "region_un" "subregion" "type"     
##  [7] "area_km2"  "pop"       "lifeExp"   "gdpPercap" "geom"
names(baltimore)
##  [1] "STATION" "PRICE"   "NROOM"   "DWELL"   "NBATH"   "PATIO"   "FIREPL" 
##  [8] "AC"      "BMENT"   "NSTOR"   "GAR"     "AGE"     "CITCOU"  "LOTSZ"  
## [15] "SQFT"    "X"       "Y"
names(us_states)
## [1] "GEOID"        "NAME"         "REGION"       "AREA"         "total_pop_10"
## [6] "total_pop_15" "geometry"
plot(world)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all

plot(baltimore)

plot(us_states)

We obtain information and summarize it by using the command summary:

data(world)
summary(world["lifeExp"])
##     lifeExp                 geom    
##  Min.   :50.62   MULTIPOLYGON :177  
##  1st Qu.:64.96   epsg:4326    :  0  
##  Median :72.87   +proj=long...:  0  
##  Mean   :70.85                      
##  3rd Qu.:76.78                      
##  Max.   :83.59                      
##  NA's   :10
summary(world)
##     iso_a2           name_long          continent          region_un        
##  Length:177         Length:177         Length:177         Length:177        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   subregion             type              area_km2             pop           
##  Length:177         Length:177         Min.   :    2417   Min.   :5.630e+04  
##  Class :character   Class :character   1st Qu.:   46185   1st Qu.:3.755e+06  
##  Mode  :character   Mode  :character   Median :  185004   Median :1.040e+07  
##                                        Mean   :  832558   Mean   :4.282e+07  
##                                        3rd Qu.:  621860   3rd Qu.:3.075e+07  
##                                        Max.   :17018507   Max.   :1.364e+09  
##                                                           NA's   :10         
##     lifeExp        gdpPercap                   geom    
##  Min.   :50.62   Min.   :   597.1   MULTIPOLYGON :177  
##  1st Qu.:64.96   1st Qu.:  3752.4   epsg:4326    :  0  
##  Median :72.87   Median : 10734.1   +proj=long...:  0  
##  Mean   :70.85   Mean   : 17106.0                      
##  3rd Qu.:76.78   3rd Qu.: 24232.7                      
##  Max.   :83.59   Max.   :120860.1                      
##  NA's   :10      NA's   :17

Lets reduce the data set for get a sample from the first two rows and the first three columns, i.e. first 3 variables. Note that the information includes the variable geometry because it includes the geographic needed information.

world_mini <- world[1:2, 1:3]
world_mini
## Simple feature collection with 2 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -18.28799 xmax: 180 ymax: -0.95
## geographic CRS: WGS 84
## # A tibble: 2 x 4
##   iso_a2 name_long continent                                                geom
##   <chr>  <chr>     <chr>                                      <MULTIPOLYGON [°]>
## 1 FJ     Fiji      Oceania   (((180 -16.06713, 180 -16.55522, 179.3641 -16.8013~
## 2 TZ     Tanzania  Africa    (((33.90371 -0.95, 34.07262 -1.05982, 37.69869 -3.~

With the st_as_sf we modify our data to simple feature type data.

library(sp)
(world_sp <- as(world, Class = "Spatial"))
## class       : SpatialPolygonsDataFrame 
## features    : 177 
## extent      : -180, 180, -90, 83.64513  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 10
## names       : iso_a2,   name_long,     continent,               region_un,      subregion,              type,         area_km2,        pop,          lifeExp,        gdpPercap 
## min values  :     AE, Afghanistan,        Africa,                  Africa,     Antarctica,           Country, 2416.87048266498,      56295,           50.621, 597.135168986395 
## max values  :     ZW,    Zimbabwe, South America, Seven seas (open ocean), Western Europe, Sovereign country, 17018507.4094666, 1364270000, 83.5878048780488,  120860.06755829
(world_sf <- st_as_sf(world_sp))
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## geographic CRS: WGS 84
## First 10 features:
##    iso_a2        name_long     continent region_un          subregion
## 1      FJ             Fiji       Oceania   Oceania          Melanesia
## 2      TZ         Tanzania        Africa    Africa     Eastern Africa
## 3      EH   Western Sahara        Africa    Africa    Northern Africa
## 4      CA           Canada North America  Americas   Northern America
## 5      US    United States North America  Americas   Northern America
## 6      KZ       Kazakhstan          Asia      Asia       Central Asia
## 7      UZ       Uzbekistan          Asia      Asia       Central Asia
## 8      PG Papua New Guinea       Oceania   Oceania          Melanesia
## 9      ID        Indonesia          Asia      Asia South-Eastern Asia
## 10     AR        Argentina South America  Americas      South America
##                 type    area_km2       pop  lifeExp gdpPercap
## 1  Sovereign country    19289.97    885806 69.96000  8222.254
## 2  Sovereign country   932745.79  52234869 64.16300  2402.099
## 3      Indeterminate    96270.60        NA       NA        NA
## 4  Sovereign country 10036042.98  35535348 81.95305 43079.143
## 5            Country  9510743.74 318622525 78.84146 51921.985
## 6  Sovereign country  2729810.51  17288285 71.62000 23587.338
## 7  Sovereign country   461410.26  30757700 71.03900  5370.866
## 8  Sovereign country   464520.07   7755785 65.23000  3709.082
## 9  Sovereign country  1819251.33 255131116 68.85600 10003.089
## 10 Sovereign country  2784468.59  42981515 76.25200 18797.548
##                          geometry
## 1  MULTIPOLYGON (((180 -16.067...
## 2  MULTIPOLYGON (((33.90371 -0...
## 3  MULTIPOLYGON (((-8.66559 27...
## 4  MULTIPOLYGON (((-122.84 49,...
## 5  MULTIPOLYGON (((-122.84 49,...
## 6  MULTIPOLYGON (((87.35997 49...
## 7  MULTIPOLYGON (((55.96819 41...
## 8  MULTIPOLYGON (((141.0002 -2...
## 9  MULTIPOLYGON (((141.0002 -2...
## 10 MULTIPOLYGON (((-68.63401 -...

Lets plot data for some variables from our world dataset:

plot(world[3:6])

plot(world["pop"])

In order to get information only for part of the data, first we explore what we information we have in the variable continent, and thwen we can create subsamples for Aia and South America:

table(world$continent)
## 
##                  Africa              Antarctica                    Asia 
##                      51                       1                      47 
##                  Europe           North America                 Oceania 
##                      39                      18                       7 
## Seven seas (open ocean)           South America 
##                       1                      13
(world_asia <- world[world$continent == "Asia", ])
## Simple feature collection with 47 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 26.04335 ymin: -10.35999 xmax: 145.5431 ymax: 55.38525
## geographic CRS: WGS 84
## # A tibble: 47 x 11
##    iso_a2 name_long continent region_un subregion type  area_km2    pop lifeExp
##    <chr>  <chr>     <chr>     <chr>     <chr>     <chr>    <dbl>  <dbl>   <dbl>
##  1 KZ     Kazakhst~ Asia      Asia      Central ~ Sove~ 2729811. 1.73e7    71.6
##  2 UZ     Uzbekist~ Asia      Asia      Central ~ Sove~  461410. 3.08e7    71.0
##  3 ID     Indonesia Asia      Asia      South-Ea~ Sove~ 1819251. 2.55e8    68.9
##  4 TL     Timor-Le~ Asia      Asia      South-Ea~ Sove~   14715. 1.21e6    68.3
##  5 IL     Israel    Asia      Asia      Western ~ Coun~   22991. 8.22e6    82.2
##  6 LB     Lebanon   Asia      Asia      Western ~ Sove~   10099. 5.60e6    79.2
##  7 PS     Palestine Asia      Asia      Western ~ Disp~    5037. 4.29e6    73.1
##  8 JO     Jordan    Asia      Asia      Western ~ Sove~   89207. 8.81e6    74.0
##  9 AE     United A~ Asia      Asia      Western ~ Sove~   79881. 9.07e6    76.9
## 10 QA     Qatar     Asia      Asia      Western ~ Sove~   11328. 2.37e6    77.9
## # ... with 37 more rows, and 2 more variables: gdpPercap <dbl>,
## #   geom <MULTIPOLYGON [°]>
(asia <- st_union(world_asia))
## Geometry set for 1 feature 
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 26.04335 ymin: -10.35999 xmax: 145.5431 ymax: 55.38525
## geographic CRS: WGS 84
## MULTIPOLYGON (((120.295 -10.25865, 118.9678 -9....
world_samerica = world[world$continent == "South America", ]
samerica = st_union(world_samerica)

Then, we can plot the data for the variable continent:

For this plot, cex it help us to refer to some information we are going to use in our map, in this case we use population. st_centroid help us to calculate the center of a polygon that going to be included in our map by converting the geometry of polygons into points.

plot(world["continent"], reset = FALSE)
cex <- sqrt(world$pop) / 10000
world_cents <- st_centroid(world, of_largest = TRUE)
## Warning in st_centroid.sf(world, of_largest = TRUE): st_centroid assumes
## attributes are constant over geometries of x
## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for longitude/
## latitude data
plot(st_geometry(world_cents), add = TRUE, cex = cex)

Lets plot the map for china: expandBB takes a numeric vector of lenght four that expands the bounding bo of th eplot relative to zero in the following order: bottom, left, top, right.

china = world[world$name_long == "China", ]
plot(st_geometry(china), expandBB = c(0, 0.2, 0.1, 1), col = "gray", lwd = 3)
plot(world_asia[0], add = TRUE)

Lets try with another country:

peru = world[world$name_long == "Peru", ]
plot(st_geometry(peru), expandBB = c(0.5, 1, 0.5, 1), col = "red", lwd = 4)
plot(world_samerica[0], add = TRUE)

Geometries:

Geometries are composed by a set of mathematical representations of spatial relationships, measures, and properties. We have points, lines, and polygons.

## MULTIPOINT
multipoint_matrix = rbind(c(5, 2), c(1, 3), c(3, 4), c(3, 2))
st_multipoint(multipoint_matrix)
## MULTIPOINT ((5 2), (1 3), (3 4), (3 2))
plot(st_multipoint(multipoint_matrix), axes = TRUE) 

## LINESTRING
linestring_matrix = rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2), c(3, 2))
st_linestring(linestring_matrix)
## LINESTRING (1 5, 4 4, 4 1, 2 2, 3 2)
plot(st_linestring(linestring_matrix), axes = TRUE)

## POLYGON
polygon_list = list(rbind(c(1, 5), c(2, 2), c(4, 1), c(4, 4), c(1, 5)))
st_polygon(polygon_list)
## POLYGON ((1 5, 2 2, 4 1, 4 4, 1 5))
plot(st_polygon(polygon_list), axes = TRUE)

## POLYGON with a hole
polygon_border = rbind(c(1, 5), c(2, 2), c(4, 1), c(4, 4), c(1, 5))
polygon_hole = rbind(c(2, 4), c(3, 4), c(3, 3), c(2, 3), c(2, 4))
polygon_with_hole_list = list(polygon_border, polygon_hole)
st_polygon(polygon_with_hole_list)
## POLYGON ((1 5, 2 2, 4 1, 4 4, 1 5), (2 4, 3 4, 3 3, 2 3, 2 4))
plot(st_polygon(polygon_with_hole_list), axes = TRUE)

## MULTILINESTRING
multilinestring_list = list(rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2), c(3, 2)), 
                            rbind(c(1, 2), c(2, 4)))
st_multilinestring((multilinestring_list))
## MULTILINESTRING ((1 5, 4 4, 4 1, 2 2, 3 2), (1 2, 2 4))
plot(st_multilinestring((multilinestring_list)), axes = TRUE)

## MULTIPOLYGON
multipolygon_list = list(list(rbind(c(1, 5), c(2, 2), c(4, 1), c(4, 4), c(1, 5))),
                         list(rbind(c(0, 2), c(1, 2), c(1, 3), c(0, 3), c(0, 2))))
st_multipolygon(multipolygon_list)
## MULTIPOLYGON (((1 5, 2 2, 4 1, 4 4, 1 5)), ((0 2, 1 2, 1 3, 0 3, 0 2)))
plot(st_multipolygon(multipolygon_list), axes = TRUE)

## GEOMETRYCOLLECTION
gemetrycollection_list = list(st_multipoint(multipoint_matrix),
                              st_linestring(linestring_matrix))
st_geometrycollection(gemetrycollection_list)
## GEOMETRYCOLLECTION (MULTIPOINT ((5 2), (1 3), (3 4), (3 2)), LINESTRING (1 5, 4 4, 4 1, 2 2, 3 2))
plot(st_geometrycollection(gemetrycollection_list), axes = TRUE)

A sfc object only contains a single simple feature geometry.

# sfc POINT
point1 = st_point(c(5, 2))
point2 = st_point(c(1, 3))
points_sfc = st_sfc(point1, point2)
points_sfc
## Geometry set for 2 features 
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1 ymin: 2 xmax: 5 ymax: 3
## CRS:            NA
## POINT (5 2)
## POINT (1 3)
plot(points_sfc)

# sfc POLYGON
polygon_list1 = list(rbind(c(1, 5), c(2, 2), c(4, 1), c(4, 4), c(1, 5)))
polygon1 = st_polygon(polygon_list1)
polygon_list2 = list(rbind(c(0, 2), c(1, 2), c(1, 3), c(0, 3), c(0, 2)))
polygon2 = st_polygon(polygon_list2)
polygon_sfc = st_sfc(polygon1, polygon2)
st_geometry_type(polygon_sfc)
## [1] POLYGON POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
plot(polygon_sfc)

# sfc MULTILINESTRING
multilinestring_list1 = list(rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2), c(3, 2)), 
                             rbind(c(1, 2), c(2, 4)))
multilinestring1 = st_multilinestring((multilinestring_list1))
multilinestring_list2 = list(rbind(c(2, 9), c(7, 9), c(5, 6), c(4, 7), c(2, 7)), 
                             rbind(c(1, 7), c(3, 8)))
multilinestring2 = st_multilinestring((multilinestring_list2))
multilinestring_sfc = st_sfc(multilinestring1, multilinestring2)
st_geometry_type(multilinestring_sfc)
## [1] MULTILINESTRING MULTILINESTRING
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
plot(multilinestring_sfc)

# sfc GEOMETRY
point_multilinestring_sfc = st_sfc(point1, multilinestring1)
st_geometry_type(point_multilinestring_sfc)
## [1] POINT           MULTILINESTRING
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

The crs stands for Coordinate Reference System, which refers the way we represent earth’s surface in a 2D surface.

The WGS84 is the most popular coordination reference system. It stands for World Geodetic System made in 1984, and used for GPS systems. Other popular reference system is the NAD83, the North American Datum of 1983, it uses the center of the earth as coordinate origin.

## sfc objects can additionally store information on the coordinate reference systems (CRS).
st_crs(points_sfc)
## Coordinate Reference System: NA
# EPSG definition
points_sfc_wgs = st_sfc(point1, point2, crs = 4326)
st_crs(points_sfc_wgs)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]
# PROJ4STRING definition
st_sfc(point1, point2, crs = "+proj=longlat +datum=WGS84 +no_defs")
## Geometry set for 2 features 
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1 ymin: 2 xmax: 5 ymax: 3
## CRS:            +proj=longlat +datum=WGS84 +no_defs
## POINT (5 2)
## POINT (1 3)
##############  The sf class
lnd_point = st_point(c(0.1, 51.5))                 # sfg object
lnd_geom = st_sfc(lnd_point, crs = 4326)           # sfc object
lnd_attrib = data.frame(                           # data.frame object
  name = "London",
  temperature = 25,
  date = as.Date("2017-06-21")
)
lnd_sf = st_sf(lnd_attrib, geometry = lnd_geom)    # sf object
lnd_sf
## Simple feature collection with 1 feature and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 0.1 ymin: 51.5 xmax: 0.1 ymax: 51.5
## geographic CRS: WGS 84
##     name temperature       date         geometry
## 1 London          25 2017-06-21 POINT (0.1 51.5)
class(lnd_sf)
## [1] "sf"         "data.frame"

Raster data

It is composed by cells formed by vertical and horizontal lines, each of those cells are called pixeles. Each of those cells might has information saved in it such as color, elevation information, and so on.

raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
new_raster = raster(raster_filepath)

# Zion National Park (Utah, USA). For example, srtm.tif is a digital elevation model of this area 

new_raster
## class      : RasterLayer 
## dimensions : 457, 465, 212505  (nrow, ncol, ncell)
## resolution : 0.0008333333, 0.0008333333  (x, y)
## extent     : -113.2396, -112.8521, 37.13208, 37.51292  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : C:/Program Files/R/R-4.0.2/library/spDataLarge/raster/srtm.tif 
## names      : srtm 
## values     : 1024, 2892  (min, max)
ncell(new_raster)   #number of cells (pixels)
## [1] 212505
res(new_raster)     #spatial resolution
## [1] 0.0008333333 0.0008333333
plot(new_raster)

spplot(new_raster)  #new plot type

#Rasters can also be created from scratch using the raster() function.
new_raster2 = raster(nrows = 6, ncols = 6, res = 0.5, 
                     xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
                     vals = 1:36)
plot(new_raster2)

Rasterbrick

A rasterbrick consists of multiple layers, which typically correspond to a single mutispectral stellite file or a single multilayer object in memory. the brick() function creates a Rasterbrick object.

multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge")
r_brick = brick(multi_raster_file)
r_brick
## class      : RasterBrick 
## dimensions : 1428, 1128, 1610784, 4  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 301905, 335745, 4111245, 4154085  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=12 +datum=WGS84 +units=m +no_defs 
## source     : C:/Program Files/R/R-4.0.2/library/spDataLarge/raster/landsat.tif 
## names      : landsat.1, landsat.2, landsat.3, landsat.4 
## min values :      7550,      6404,      5678,      5252 
## max values :     19071,     22051,     25780,     31961
nlayers(r_brick)
## [1] 4
raster_on_disk = raster(r_brick, layer = 1)
raster_in_memory = raster(xmn = 301905, xmx = 335745,
                          ymn = 4111245, ymx = 4154085, 
                          res = 30)
values(raster_in_memory) = sample(seq_len(ncell(raster_in_memory)))
crs(raster_in_memory) = crs(raster_on_disk)
r_stack = stack(raster_in_memory, raster_on_disk)
r_stack
## class      : RasterStack 
## dimensions : 1428, 1128, 1610784, 2  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 301905, 335745, 4111245, 4154085  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=12 +datum=WGS84 +units=m +no_defs 
## names      :   layer, landsat.1 
## min values :       1,      7550 
## max values : 1610784,     19071

Decision on which Raster class should be used depends mostly on the character of input data. Processing of a single mulitilayer file or object is the most effective with RasterBrick, while RasterStack allows calculations based on many files, many Raster* objects, or both.