library(sp)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(raster)
library(ggplot2)
There are two fundamental geographic data models: vector and raster.
The vector data model represents the world using points, lines and polygons. These have discrete, well-defined borders, meaning that vector datasets usually have a high level of precision.
The raster data model divides the surface up into cells of constant size. Raster datasets are used for storing remote sensing data as well as in web-mapping. Rasters aggregate spatially specific features to a given resolution.
The main vector data types are points, lines and polygons. In all cases, the geometry of these data structures consists of sets of coordinate pairs (x, y). Points are the simplest case: each point has one coordinate pair, and n associated variables. It is also possible to combine several points into a multi-point structure, with a single attribute record. For example, all the coffee shops in a town could be considered as a single geometry.
The geometry of lines is a just a little bit more complex. First note that the term ‘line’ refers to a set of one or more polylines (connected series of line segments). For example, in spatial analysis, a river and all its tributaries could be considered as a single ‘line’ (but they could also also be several lines, perhaps one for each tributary river). Lines are represented as ordered sets of coordinates (nodes). The actual line segments can be computed by connecting the points. Thus, the representation of a line is very similar to that of a multi-point structure. The main difference is that the ordering of the points is important, because we need to know which points should be connected. A network (e.g. a road or river network), is a special type of lines geometry where there is additional information about things like flow, connectivity, direction, and distance.
A polygon refers to a set of closed polylines. The geometry is very similar to that of lines, but to close a polygon the last coordinate pair coincides with the first pair. A complication with polygons is that they can have holes (for example, an island inside a lake). Also, valid polygons do not self-intersect. Again, multiple polygons can be considered as a single geometry. For example, Indonesia consists of many islands. Each island can be represented by a single polygon, but together then can be represent a single (multi-) polygon representing the entire country.
Raster data is commonly used to represent spatially continuous phenomena such as elevation. A raster divides the world into a grid of equally sized rectangles (referred to as cells or, in the context of satellite remote sensing, pixels) that all have one or more values (or missing values) for the variables of interest. A raster cell value should normally represent the average (or majority) value for the area it covers.
In contrast to vector data, in raster data the geometry is not explicitly stored as coordinates. It is implicitly set by knowing the spatial extent and the number or rows and columns in which the area is divided. From the extent and number of rows and columns, the size of the raster cells (spatial resolution) can be computed.
Continuous surface data are sometimes stored as triangulated irregular networks (TINs); these are not discussed here.
The basic data types in R are numbers, characters, logical (TRUE or FALSE) and factor values. Values of a single type can be combined in vectors and matrices, and variables of multiple types can be combined into a data.frame. We can represent (only very) basic spatial data with these data types. Let’s say we have the location (represented by longitude and latitude) of ten weather stations (named A to J) and their annual precipitation.
In the example below we make a very simple map. Note that a map is special type of plot (like a scatter plot, barplot, etc.). A map is a plot of geospatial data that also has labels and other graphical objects such as a scale bar or legend. The spatial data itself should not be referred to as a map.
name <- LETTERS[1:10]
longitude <- c(-116.7, -120.4, -116.7, -113.5, -115.5,
-120.8, -119.5, -113.7, -113.7, -110.7)
latitude <- c(45.3, 42.6, 38.9, 42.1, 35.7, 38.9,
36.2, 39, 41.6, 36.9)
stations <- cbind(longitude, latitude)
class(stations)
## [1] "matrix"
head(stations)
## longitude latitude
## [1,] -116.7 45.3
## [2,] -120.4 42.6
## [3,] -116.7 38.9
## [4,] -113.5 42.1
## [5,] -115.5 35.7
## [6,] -120.8 38.9
# Simulated rainfall data
set.seed(0)
precip <- (runif(length(latitude))*10)^3
class(precip)
## [1] "numeric"
head(precip)
## [1] 721.003613 18.716993 51.530302 187.988119 749.127376 8.203534
A map of point locations is not that different from a basic x-y scatter plot. Here I make a plot (a map in this case) that shows the location of the weather stations, and the size of the dots is proportional to the amount of precipitation. The point size is set with argument cex.
psize <- 1 + precip/500
plot(stations, cex=psize, pch=20, col='blue', main='Precipitation')
# add names to plot
text(stations, name, pos=4)
# add a legend
breaks <- c(100, 500, 1000, 2000)
legend("topright", legend=breaks, pch=20, pt.cex=psize, col='blue', bg='gray')
We can add multiple sets of points to the plot, and even draw lines and polygons:
lon <- c(-116.8, -114.2, -112.9, -111.9, -114.2, -115.4, -117.7)
lat <- c(41.3, 42.9, 42.4, 39.8, 37.6, 38.3, 37.6)
x <- cbind(lon, lat)
plot(stations, main='Precipitation')
polygon(x, col='gray', border='light gray')
lines(stations, lwd=1, col='green')
points(x, cex=2, pch=20, col='dark gray')
points(stations, cex=psize, pch=20, col='blue', main='Precipitation')
The above illustrates how numeric vectors representing locations can be used to draw simple maps. It also shows how points can (and typically are) represented by pairs of numbers, and a line and a polygons by a number of these points. Polygons is that they are “closed”, i.e. the first point coincides with the last point, but the polygon function took care of that for us.
There are cases where a simple approach like this may suffice and you may come across this in older R code or packages. Likewise, raster data could be represented by a matrix or higher-order array. Particularly when only dealing with point data such an approach may be practical. For example, a spatial data set representing points and attributes could be made by combining geometry and attributes in a single ’data.frame`.
wst <- data.frame(longitude, latitude, name, precip)
wst
However, wst is a data.frame and R does not automatically understand the special meaning of the first two columns, or to what coordinate reference system it refers (longitude/latitude, or perhaps UTM zone 17S, or ….?).
Moreover, it is non-trivial to do some basic spatial operations. For example, the polygon drawn on the map above might represent a state, and a next question might be which of the 10 stations fall within that polygon. And how about any other operation on spatial data, including reading from and writing data to files? To answer such questions there are a number of R packages that define new spatial data types and that can be used. The most important packages that define such spatial data structures are sp, sf and raster.
Package sp supports spatial data analysis in R. sp defines a set of classes to represent spatial data. A class defines a particular data type. The data.frame is an example of a class. Any particular data.frame you create is an object (instantiation) of that class.
The main reason for defining classes is to create a standard representation of a particular data type to make it easier to write functions (also known as ‘methods’) for them. In fact, the sp package does not provide many functions to modify or analyze spatial data; but the classes it defines are used in more than 100 other R packages that provide specific functionality.
Package sp introduces a number of classes with names that start with Spatial. For vector data, the basic types are the SpatialPoints, SpatialLines, and SpatialPolygons. These classes only represent geometries. To also store attributes, classes are available with these names plus DataFrame, for example, SpatialPolygonsDataFrame and SpatialPointsDataFrame. When referring to any object with a name that starts with Spatial, it is common to write Spatial. When referring to a SpatialPolygons or SpatialPolygonsDataFrame object it is common to write SpatialPolygons. The Spatial classes (and their use) are described in detail by Bivand, Pebesma and Gómez-Rubio.
It is possible to create Spatial* objects from scratch with R code. That can be very useful to create small self contained example to illustrate something, for example to ask a question about how to do a particular operation without needing to give access to the real data you are using. But in real life you will read these from a file or database, for example from a “shapefile”.
longitude <- c(-116.7, -120.4, -116.7, -113.5, -115.5, -120.8, -119.5, -113.7, -113.7, -110.7)
latitude <- c(45.3, 42.6, 38.9, 42.1, 35.7, 38.9, 36.2, 39, 41.6, 36.9)
lonlat <- cbind(longitude, latitude)
class(lonlat)
## [1] "matrix"
lonlat
## longitude latitude
## [1,] -116.7 45.3
## [2,] -120.4 42.6
## [3,] -116.7 38.9
## [4,] -113.5 42.1
## [5,] -115.5 35.7
## [6,] -120.8 38.9
## [7,] -119.5 36.2
## [8,] -113.7 39.0
## [9,] -113.7 41.6
## [10,] -110.7 36.9
Now create a SpatialPoints object
library(sp)
pts <- SpatialPoints(lonlat)
Let’s check what kind of object pts is.
class (pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
And what is inside of it
showDefault(pts)
## An object of class "SpatialPoints"
## Slot "coords":
## longitude latitude
## [1,] -116.7 45.3
## [2,] -120.4 42.6
## [3,] -116.7 38.9
## [4,] -113.5 42.1
## [5,] -115.5 35.7
## [6,] -120.8 38.9
## [7,] -119.5 36.2
## [8,] -113.7 39.0
## [9,] -113.7 41.6
## [10,] -110.7 36.9
##
## Slot "bbox":
## min max
## longitude -120.8 -110.7
## latitude 35.7 45.3
##
## Slot "proj4string":
## CRS arguments: NA
So we see that the object has the coordinates we supplied, but also a bbox. This is a ‘bounding box’, or the ‘spatial extent’ that was computed from the coordinates. There is also a “proj4string”. This stores the coordinate reference system (“crs”, discussed in more detail later). We did not provide the crs so it is unknown (NA). That is not good, so let’s recreate the object, and now provide a crs.
crdref <- CRS('+proj=longlat +datum=WGS84')
pts <- SpatialPoints(lonlat, proj4string=crdref)
pts
## class : SpatialPoints
## features : 10
## extent : -120.8, -110.7, 35.7, 45.3 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
We can use the SpatialPoints object to create a SpatialPointsDataFrame object. First we need a data.frame with the same number of rows as there are geometries.
# Generate random precipitation values, same quantity as points
precipvalue <- runif(nrow(lonlat), min=0, max=100)
df <- data.frame(ID=1:nrow(lonlat), precip=precipvalue)
class(df)
## [1] "data.frame"
df
Combine the SpatialPoints with the data.frame.
ptsdf <- SpatialPointsDataFrame(pts, data=df)
ptsdf
## class : SpatialPointsDataFrame
## features : 10
## extent : -120.8, -110.7, 35.7, 45.3 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## variables : 2
## names : ID, precip
## min values : 1, 6.17862704675645
## max values : 10, 99.1906094830483
## class : SpatialPointsDataFrame
## features : 10
## extent : -120.8, -110.7, 35.7, 45.3 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## variables : 2
## names : ID, precip
## min values : 1, 6.17862704675645
## max values : 10, 99.1906094830483
To see what is inside:
str(ptsdf)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 10 obs. of 2 variables:
## .. ..$ ID : int [1:10] 1 2 3 4 5 6 7 8 9 10
## .. ..$ precip: num [1:10] 6.18 20.6 17.66 68.7 38.41 ...
## ..@ coords.nrs : num(0)
## ..@ coords : num [1:10, 1:2] -117 -120 -117 -114 -116 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:2] "longitude" "latitude"
## ..@ bbox : num [1:2, 1:2] -120.8 35.7 -110.7 45.3
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "longitude" "latitude"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
showDefault(ptsdf)
## An object of class "SpatialPointsDataFrame"
## Slot "data":
## ID precip
## 1 1 6.178627
## 2 2 20.597457
## 3 3 17.655675
## 4 4 68.702285
## 5 5 38.410372
## 6 6 76.984142
## 7 7 49.769924
## 8 8 71.761851
## 9 9 99.190609
## 10 10 38.003518
##
## Slot "coords.nrs":
## numeric(0)
##
## Slot "coords":
## longitude latitude
## [1,] -116.7 45.3
## [2,] -120.4 42.6
## [3,] -116.7 38.9
## [4,] -113.5 42.1
## [5,] -115.5 35.7
## [6,] -120.8 38.9
## [7,] -119.5 36.2
## [8,] -113.7 39.0
## [9,] -113.7 41.6
## [10,] -110.7 36.9
##
## Slot "bbox":
## min max
## longitude -120.8 -110.7
## latitude 35.7 45.3
##
## Slot "proj4string":
## CRS arguments:
## +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Making a SpatialPoints object was easy. Making a SpatialLines and SpatialPolygons object is still relatively straightforward with the spLines and spPolygons functions (from the raster package).
library(raster)
lon <- c(-116.8, -114.2, -112.9, -111.9, -114.2, -115.4, -117.7)
lat <- c(41.3, 42.9, 42.4, 39.8, 37.6, 38.3, 37.6)
lonlat <- cbind(lon, lat)
lns <- spLines(lonlat, crs=crdref)
lns
## class : SpatialLines
## features : 1
## extent : -117.7, -111.9, 37.6, 42.9 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pols <- spPolygons(lonlat, crs=crdref)
pols
## class : SpatialPolygons
## features : 1
## extent : -117.7, -111.9, 37.6, 42.9 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
The structure of the SpatialPolygons class is somewhat complex as it needs to accommodate the possibility of multiple polygons, each consisting of multiple sub-polygons, some of which may be “holes”.
str(pols)
## Formal class 'SpatialPolygons' [package "sp"] with 4 slots
## ..@ polygons :List of 1
## .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
## .. .. .. ..@ Polygons :List of 1
## .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
## .. .. .. .. .. .. ..@ labpt : num [1:2] -114.7 40.1
## .. .. .. .. .. .. ..@ area : num 19.7
## .. .. .. .. .. .. ..@ hole : logi FALSE
## .. .. .. .. .. .. ..@ ringDir: int 1
## .. .. .. .. .. .. ..@ coords : num [1:8, 1:2] -117 -114 -113 -112 -114 ...
## .. .. .. ..@ plotOrder: int 1
## .. .. .. ..@ labpt : num [1:2] -114.7 40.1
## .. .. .. ..@ ID : chr "1"
## .. .. .. ..@ area : num 19.7
## ..@ plotOrder : int 1
## ..@ bbox : num [1:2, 1:2] -117.7 37.6 -111.9 42.9
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "x" "y"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
Fortunately, we do not need to understand how these structures are organized. The main take-home message is that they store geometries (coordinates), the name of the coordinate reference system, and attributes.
We can make use generic R function plot to make a map.
plot(pols, axes=TRUE, las=1)
plot(pols, border='blue', col='yellow', lwd=3, add=TRUE)
points(pts, col='red', pch=20, cex=3)
A lot of geospatial data is stored using the following formats: - Text file formats, such as txt or csv - Spatial file formats, such as ESRI Shapefiles or Geopackages - Database (spatial or not), such as Postgres, PostGIS, etc.
In the old days the R “geospatial approach” consisted of using 3 packages: - sp: provides a set of dedicated classes for the different vector and raster datatypes , and some analysis tools (overlays, etc.), - rgdal: input/output library built over GDAL/OGR to read and write spatial data, - rgeos: geometry library built over GEOS for all geometry operations (union, intersections, buffer, etc).
Since last year the sf (for Simple Features) package has been introduced to manipulate vector data: - It provides users with one unique class for all vector types, - It’s based on Simple Features, a formal standard (ISO 19125-1:2004) widely used in the GIS world that describes how objects in the real world can be represented, - The main class provided by sf is a data.frame, - It combines the capabilities of sp, rgdal, and rgeos under one unique package, - It is easier to install on some platforms than rgdal, - It is much faster, and scales better than sp and rgdal
According to the standard:
“A simple feature is defined by the OpenGIS Abstract specification to have both spatial and non-spatial attributes. Spatial attributes are geometry valued, and simple features are based on 2D geometry with linear interpolation between vertices”.
Basically, spatial is not that special anymore: spatial attributes can be described as a geometry, which is just another attribute of the dataset.
Several types of geometry can be implemented using this standard: POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION. There’s also more fancy stuff like CURVEPOLYGON, etc.
Simple feature types fully supported by sf.
Simple Features are represented using WKT/WKB (Well Known Text/Well Known Binary). It looks like:
### first, let's define work directory
#setwd("/Users/ivanlizarazo/Documents/datos/g4d")
### getting vector data
download.file("http://biogeo.ucdavis.edu/data/diva/adm/DEU_adm.zip",
destfile = "./datos/DEU_adm.zip" , mode='wb')
unzip("./datos/DEU_adm.zip", exdir = "./datos")
### reading a shapefile
library(sf)
deu_adm3 <- st_read("./datos/DEU_adm3.shp")
## Reading layer `DEU_adm3' from data source `/Users/ivanlizarazo/Documents/datos/g4d/notebooks/datos/DEU_adm3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 435 features and 13 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 5.871619 ymin: 47.26986 xmax: 15.03811 ymax: 55.05653
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
class(deu_adm3)
## [1] "sf" "data.frame"
attributes(deu_adm3)
## $names
## [1] "ID_0" "ISO" "NAME_0" "ID_1" "NAME_1"
## [6] "ID_2" "NAME_2" "ID_3" "NAME_3" "TYPE_3"
## [11] "ENGTYPE_3" "NL_NAME_3" "VARNAME_3" "geometry"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
## [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
## [120] 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
## [137] 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
## [154] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170
## [171] 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187
## [188] 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
## [205] 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
## [222] 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
## [239] 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
## [256] 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272
## [273] 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289
## [290] 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
## [307] 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323
## [324] 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
## [341] 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357
## [358] 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374
## [375] 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391
## [392] 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408
## [409] 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425
## [426] 426 427 428 429 430 431 432 433 434 435
##
## $class
## [1] "sf" "data.frame"
##
## $sf_column
## [1] "geometry"
##
## $agr
## ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2
## <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## ID_3 NAME_3 TYPE_3 ENGTYPE_3 NL_NAME_3 VARNAME_3
## <NA> <NA> <NA> <NA> <NA> <NA>
## Levels: constant aggregate identity
Simple Features are represented by sf as a data.frame. The spatial attributes of the Simple Features (or geometry) are represented using WKT. Technically speaking, there are 3 types of classes implemented by the sf package:
The functions provided by sf are prefixed by st_ — similarly to PostGIS. That makes it easy to search for them on the command line too.
It is very often that point data (such as profiles) is exchanged using a text delimited format such as CSV and TXT. These files can be read using the classic read.csv, read.delim, etc., in base R. however, the readr package — from the tidyverse — does provide interesting alternatives that are much faster in read and write.
Let’s take the example dataset meuse from the sp package. meuse is a data.frame — similar to what could have been read from a CSV file using read_csv:
data('meuse', package = "sp")
head(meuse)
The easiest way to modifiy this data.frame so that its coordinates (x and y) are used to generate a geometry column is using st_as_sf (“as simple feature”):
ms <- st_as_sf(
meuse,
coords = c('x', 'y'),
crs = "+init=epsg:28992"
)
# What is ms?
ms
There is a simple plotting funtion including in sf. It is very similar to the old sp::spplot:
plot(ms)
## Warning: plotting the first 10 out of 12 attributes; use max.plot = 12 to
## plot all
A wide selection of spatial data formats can be read using the st_read command (it’s using GDAL/OGR behind the scenes). Unlike readOGR from the rgdal package, generally the command can guess which driver it should use.
file_names <- list.files(path="./datos/", pattern=".shp")
file_names
## [1] "DEU_adm0.shp" "DEU_adm1.shp" "DEU_adm2.shp"
## [4] "DEU_adm3.shp" "sample_deu_adm2.shp" "tdeu_adm2.shp"
file_name <- "./datos/DEU_adm3.shp"
nc3 <- st_read(file_name)
## Reading layer `DEU_adm3' from data source `/Users/ivanlizarazo/Documents/datos/g4d/notebooks/datos/DEU_adm3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 435 features and 13 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 5.871619 ymin: 47.26986 xmax: 15.03811 ymax: 55.05653
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
attributes(nc3)
## $names
## [1] "ID_0" "ISO" "NAME_0" "ID_1" "NAME_1"
## [6] "ID_2" "NAME_2" "ID_3" "NAME_3" "TYPE_3"
## [11] "ENGTYPE_3" "NL_NAME_3" "VARNAME_3" "geometry"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
## [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
## [120] 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
## [137] 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
## [154] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170
## [171] 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187
## [188] 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
## [205] 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
## [222] 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
## [239] 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
## [256] 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272
## [273] 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289
## [290] 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
## [307] 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323
## [324] 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
## [341] 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357
## [358] 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374
## [375] 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391
## [392] 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408
## [409] 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425
## [426] 426 427 428 429 430 431 432 433 434 435
##
## $class
## [1] "sf" "data.frame"
##
## $sf_column
## [1] "geometry"
##
## $agr
## ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2
## <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## ID_3 NAME_3 TYPE_3 ENGTYPE_3 NL_NAME_3 VARNAME_3
## <NA> <NA> <NA> <NA> <NA> <NA>
## Levels: constant aggregate identity
plot(nc3['NAME_3'])
file_name <- "./datos/DEU_adm2.shp"
nc2 <- st_read(file_name)
## Reading layer `DEU_adm2' from data source `/Users/ivanlizarazo/Documents/datos/g4d/notebooks/datos/DEU_adm2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 42 features and 11 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 5.871619 ymin: 47.26986 xmax: 15.03811 ymax: 55.05653
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
attributes(nc2)
## $names
## [1] "ID_0" "ISO" "NAME_0" "ID_1" "NAME_1"
## [6] "ID_2" "NAME_2" "TYPE_2" "ENGTYPE_2" "NL_NAME_2"
## [11] "VARNAME_2" "geometry"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
##
## $class
## [1] "sf" "data.frame"
##
## $sf_column
## [1] "geometry"
##
## $agr
## ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2
## <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2
## <NA> <NA> <NA> <NA>
## Levels: constant aggregate identity
print(nc2)
## Simple feature collection with 42 features and 11 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 5.871619 ymin: 47.26986 xmax: 15.03811 ymax: 55.05653
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2
## 1 86 DEU Germany 1 Baden-Württemberg 1 Freiburg
## 2 86 DEU Germany 1 Baden-Württemberg 2 Karlsruhe
## 3 86 DEU Germany 1 Baden-Württemberg 3 Stuttgart
## 4 86 DEU Germany 1 Baden-Württemberg 4 Tübingen
## 5 86 DEU Germany 2 Bayern 5 Mittelfranken
## 6 86 DEU Germany 2 Bayern 6 Niederbayern
## 7 86 DEU Germany 2 Bayern 7 Oberbayern
## 8 86 DEU Germany 2 Bayern 8 Oberfranken
## 9 86 DEU Germany 2 Bayern 9 Oberpfalz
## 10 86 DEU Germany 2 Bayern 10 Schwaben
## TYPE_2 ENGTYPE_2 NL_NAME_2
## 1 Regierungsbezirk Administrative Region <NA>
## 2 Regierungsbezirk Administrative Region <NA>
## 3 Regierungsbezirk Administrative Region <NA>
## 4 Regierungsbezirk Administrative Region <NA>
## 5 Regierungsbezirk Administrative Region <NA>
## 6 Regierungsbezirk Administrative Region <NA>
## 7 Regierungsbezirk Administrative Region <NA>
## 8 Regierungsbezirk Administrative Region <NA>
## 9 Regierungsbezirk Administrative Region <NA>
## 10 Regierungsbezirk Administrative Region <NA>
## VARNAME_2
## 1 Friburgo|Fribourg
## 2 <NA>
## 3 Estugarda
## 4 Tubinga
## 5 Franconia Central|Middle Franconia|Média Francónia| Moyenne-Franconie
## 6 Lower Bavaria|Baixa Baviera|Baja Baviera|Basse-Bavière
## 7 Upper Bavaria|Alta Baviera|Haute-Bavière
## 8 Upper Franconia|Alta Franconia|Alta Francónia |Haute-Franconie
## 9 Upper Palatinate|Alto Palatinado|Haut-Palatinat
## 10 Swabia|Suabia|Suábia|Souabe
## geometry
## 1 MULTIPOLYGON (((8.708373 47...
## 2 MULTIPOLYGON (((9.45798 49....
## 3 MULTIPOLYGON (((9.65046 49....
## 4 MULTIPOLYGON (((9.951169 48...
## 5 MULTIPOLYGON (((10.74197 49...
## 6 MULTIPOLYGON (((13.18021 49...
## 7 MULTIPOLYGON (((11.41995 49...
## 8 MULTIPOLYGON (((11.38319 50...
## 9 MULTIPOLYGON (((12.26625 50...
## 10 MULTIPOLYGON (((10.59526 49...
plot(nc2['NAME_2'])
nc2$AREA <- st_area(nc2)/1000000
plot(nc2['AREA'])
plot(st_geometry(nc2), graticule=TRUE, key.pos = NULL, axes = TRUE)
Its counterpart is st_write and is just as easy to use:
#st_write(nc, "nc.shp")
st_write(nc2, "ndeu_adm2.shp", delete_layer = TRUE)
## Deleting layer `ndeu_adm2' using driver `ESRI Shapefile'
## Writing layer `ndeu_adm2' to data source `/Users/ivanlizarazo/Documents/datos/g4d/notebooks/ndeu_adm2.shp' using driver `ESRI Shapefile'
## features: 42
## fields: 12
## geometry type: Multi Polygon
The drivers available on our machine can be printed using the st_drivers command:
my_drivers <- st_drivers()
head(my_drivers)
The main functions are st_crs, which retrieves the Coordinate Reference System (CRS) of the sf object, and st_transform, which allows to re-project the dataset into a different CRS.
The CRS can be specified as an EPSG number (that is: an integer), or as a character string follwoing the Proj.4 standard.
# Retrieve projection information
st_crs(nc2)
## Coordinate Reference System:
## EPSG: 4326
## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
# Test whether data is projected or not
st_is_longlat(nc2)
## [1] TRUE
Looking for map projections used in our country:
library(rgdal)
## rgdal: version: 1.3-4, (SVN revision 766)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
## GDAL binary built with GEOS: FALSE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
## Linking to sp version: 1.3-1
epsg <- make_EPSG()
i <- grep("Germany", epsg$note, ignore.case=TRUE)
class(i)
## [1] "integer"
length(i)
## [1] 2
# first two
print(epsg[i[1:2], "prj4"])
## [1] "+proj=lcc +lat_1=48.66666666666666 +lat_2=53.66666666666666 +lat_0=51 +lon_0=10.5 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## [2] "+proj=lcc +lat_1=48.66666666666666 +lat_2=53.66666666666666 +lat_0=51 +lon_0=10.5 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
Let’s use the first one to reproject our data:
new_crs <- "+proj=lcc +lat_1=48.66666666666666 +lat_2=53.66666666666666
+lat_0=51 +lon_0=10.5 +x_0=0 +y_0=0 +ellps=GRS80
+towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
new_nc2 <- st_transform(nc2, new_crs)
new_nc2$km2 <- st_area(nc2)/1000000
st_crs(new_nc2)
## Coordinate Reference System:
## No EPSG code
## proj4string: "+proj=lcc +lat_1=48.66666666666666 +lat_2=53.66666666666666 +lat_0=51 +lon_0=10.5 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
plot(st_geometry(new_nc2), graticule=TRUE, key.pos = NULL, axes = TRUE)
A variety of CRSs are available within R:
crs_data = rgdal::make_EPSG()
nrow(crs_data)
## [1] 5513
crs_select <- c("4326", "3114", "3333", "3219", "4267")
crs_data[crs_select,]
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
##
## intersect, select, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
wgs84_crs <- filter(crs_data, code == 4326)
wgs84_crs
Note that you can convert sf objects back to either data.frame or one of the old sp classes very easily:
# Convert to data.frame
as.data.frame(nc2) %>% head
as(nc2, "Spatial") %>% summary
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 5.871619 15.03811
## y 47.269859 55.05653
## Is projected: FALSE
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Data attributes:
## ID_0 ISO NAME_0 ID_1
## Min. :86 DEU:42 Germany:42 Min. : 1.000
## 1st Qu.:86 1st Qu.: 2.250
## Median :86 Median : 9.000
## Mean :86 Mean : 7.667
## 3rd Qu.:86 3rd Qu.:11.000
## Max. :86 Max. :16.000
##
## NAME_1 ID_2 NAME_2
## Bayern : 7 Min. : 1.00 Arnsberg : 1
## Niedersachsen : 5 1st Qu.:11.25 Berlin : 1
## Nordrhein-Westfalen: 5 Median :21.50 Brandenburg : 1
## Baden-Württemberg : 4 Mean :21.50 Braunschweig: 1
## Hessen : 3 3rd Qu.:31.75 Bremen : 1
## Rheinland-Pfalz : 3 Max. :42.00 Bremerhaven : 1
## (Other) :15 (Other) :36
## TYPE_2 ENGTYPE_2 NL_NAME_2
## Kreisfreie Stadt : 1 Administrative Region:34 <Null>: 1
## Kreisfreie Städte: 3 City : 4 NA's :41
## Land|Bundesland : 4 State|Federal State : 4
## Regierungsbezirk :34
##
##
##
## VARNAME_2 AREA
## Lunebourg|Luneburgo : 2 Min. : 74.04
## <Null> : 1 1st Qu.: 5303.60
## Brema|Brême : 1 Median : 7679.01
## Brunswick : 1 Mean : 8506.31
## Coblença|Coblence |Coblenza: 1 3rd Qu.: 9922.69
## (Other) :21 Max. :29458.38
## NA's :15
The raster package is built around a number of classes of which the RasterLayer, RasterBrick, and RasterStack classes are the most important. When discussing methods that can operate on all three of these objects, they are referred to as ’Raster*’ objects.
The raster package has functions for creating, reading, manipulating, and writing raster data. The package provides, among other things, general raster data manipulation functions that can easily be used to develop more specific functions. For example, there are functions to read a chunk of raster values from a file or to convert cell numbers to coordinates and back. The package also implements raster algebra and many other functions for raster data manipulation.
A RasterLayer object represents single-layer (variable) raster data. A RasterLayer object always stores a number of fundamental parameters that describe it. These include the number of columns and rows, the spatial extent, and the Coordinate Reference System. In addition, a RasterLayer can store information about the file in which the raster cell values are stored (if there is such a file). A RasterLayer can also hold the raster cell values in memory.
Here we create a RasterLayer from scratch. But note that in most cases where real data is analyzed, these objects are created from a file.
library(raster)
r <- raster(ncol=10, nrow=10, xmn=3, xmx=17, ymn=45, ymx=55)
r
## class : RasterLayer
## dimensions : 10, 10, 100 (nrow, ncol, ncell)
## resolution : 1.4, 1 (x, y)
## extent : 3, 17, 45, 55 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
r[]
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [47] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [70] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [93] NA NA NA NA NA NA NA NA
values(r)
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [47] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [70] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [93] NA NA NA NA NA NA NA NA
#
ncell(r)
## [1] 100
ncol(r)
## [1] 10
values(r) <- 1:ncell(r)
r
## class : RasterLayer
## dimensions : 10, 10, 100 (nrow, ncol, ncell)
## resolution : 1.4, 1 (x, y)
## extent : 3, 17, 45, 55 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 1, 100 (min, max)
# plot raster
# add polygon and points
lat <- runif(8, 48.0, 52.0)
lat
## [1] 51.10978 51.73882 48.84857 50.60670 48.50222 49.06888 49.54446 48.05356
lon <- runif(8, 9.0, 15.0)
lon
## [1] 11.29433 14.21815 11.04209 11.89248 12.59739 11.96125 10.11731 13.96424
lonlat <- cbind(lon, lat)
lonlat
## lon lat
## [1,] 11.29433 51.10978
## [2,] 14.21815 51.73882
## [3,] 11.04209 48.84857
## [4,] 11.89248 50.60670
## [5,] 12.59739 48.50222
## [6,] 11.96125 49.06888
## [7,] 10.11731 49.54446
## [8,] 13.96424 48.05356
pols <- spPolygons(lonlat, crs='+proj=longlat +datum=WGS84')
pols
## class : SpatialPolygons
## features : 1
## extent : 10.11731, 14.21815, 48.05356, 51.73882 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
plot(r)
points(lonlat, col='red', pch=20, cex=3)
plot(pols, border='blue', lwd=2, add=TRUE)
It is quite common to analyze raster data using single-layer objects. However, in many cases multi-variable raster data sets are used. The raster package has two classes for multi-layer data the RasterStack and the RasterBrick. The principal difference between these two classes is that a RasterBrick can only be linked to a single (multi-layer) file. In contrast, a RasterStack can be formed from separate files and/or from a few layers (‘bands’) from a single file.
In fact, a RasterStack is a collection of RasterLayer objects with the same spatial extent and resolution. In essence it is a list of RasterLayer objects. A RasterStack can easily be formed form a collection of files in different locations and these can be mixed with RasterLayer objects that only exist in the RAM memory (not on disk).
A RasterBrick is truly a multi-layered object, and processing a RasterBrick can be more efficient than processing a RasterStack representing the same data. However, it can only refer to a single file. A typical example of such a file would be a multi-band satellite image or the output of a global climate model (with e.g., a time series of temperature values for each day of the year for each raster cell). Methods that operate on RasterStack and RasterBrick objects typically return a RasterBrick object.
Thus, the main difference is that a RasterStack is loose collection of RasterLayer objects that can refer to different files (but must all have the same extent and resolution), whereas a RasterBrick can only point to a single file.
Here is an example how you can make a RasterStack from multiple layers.
######### raster stack and raster brick
r2 <- r * r
r3 <- sqrt(r)
s <- stack(r, r2, r3)
s
## class : RasterStack
## dimensions : 10, 10, 100, 3 (nrow, ncol, ncell, nlayers)
## resolution : 1.4, 1 (x, y)
## extent : 3, 17, 45, 55 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## names : layer.1, layer.2, layer.3
## min values : 1, 1, 1
## max values : 100, 10000, 10
plot(s)
library(raster)
download.file("http://biogeo.ucdavis.edu/data/diva/msk_alt/DEU_msk_alt.zip",
destfile = "./datos/DEU_msk_alt.zip" , mode='wb')
unzip("./datos/DEU_msk_alt.zip", exdir = "./datos")
deu_elev <- raster("./datos/DEU_msk_alt.grd")
deu_elev
## class : RasterLayer
## dimensions : 960, 1116, 1071360 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 5.8, 15.1, 47.2, 55.2 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84
## data source : /Users/ivanlizarazo/Documents/datos/g4d/notebooks/datos/DEU_msk_alt.grd
## names : DEU_msk_alt
## values : -179, 2701 (min, max)
plot(deu_elev, main="Germany - Altitude (m) - Geographic Coordinates",
xlab="Longitude", ylab="Latitude",
xlim=c(5, 15), ylim=c(47, 55))
### several times a transformation is needed
new_deu_elev <- projectRaster(deu_elev, crs=new_crs)
new_deu_elev
## class : RasterLayer
## dimensions : 981, 1228, 1204668 (nrow, ncol, ncell)
## resolution : 582, 926 (x, y)
## extent : -361040.4, 353655.6, -426861.4, 481544.6 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=lcc +lat_1=48.66666666666666 +lat_2=53.66666666666666 +lat_0=51 +lon_0=10.5 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : in memory
## names : DEU_msk_alt
## values : -165.6785, 2569.762 (min, max)
plot(new_deu_elev, main="Germany - Altitude (m) - Projected Coordinates",
xlab="East (m)", ylab="North (m)",
xlim=c(-365000, 355000), ylim=c(-430000, 4900000))
plot(st_geometry(new_nc2), graticule = TRUE, key.pos = NULL, axes = TRUE)
plot(nc2["NAME_1"], key.pos = 1, axes = TRUE, key.width = lcm(1.4), key.length = 1.0)
new_nc2$size = cut(new_nc2$km2, c(0, 1000, 2000, 3000, 4000, 5000, 6000,
7000, 8000, 9000, 10000, 20000, 50000, 60000, 100000))
plot(new_nc2["size"], axes = TRUE, key.pos = 4, pal = sf.colors(14), key.width = lcm(4.5))
plot(st_geometry(new_nc2), col = sf.colors(12, categorical = TRUE), border = 'grey',
axes = TRUE)
plot(st_geometry(st_centroid(new_nc2)), pch = 21, col = 'red', add = TRUE)
## Warning in st_centroid.sf(new_nc2): st_centroid assumes attributes are
## constant over geometries of x
library(ggplot2)
StatSfCoordinates <- ggproto(
"StatSfCoordinates", Stat,
setup_params = function(data, params) {
if (is.null(params$fun.geometry)) {
params$fun.geometry <- sf::st_point_on_surface
}
params
},
compute_group = function(data, scales, fun.geometry) {
points_sfc <- fun.geometry(data$geometry)
coordinates <- sf::st_coordinates(points_sfc)
data <- cbind(data, coordinates)
data
},
default_aes = aes(x = stat(X), y = stat(Y)),
required_aes = c("geometry")
)
ggplot(new_nc2[1:10, ]) +
geom_sf(aes(fill = size)) +
geom_label(aes(geometry = geometry, label = NAME_2), stat = "sf_coordinates")
### see https://yutannihilation.github.io/ggsflabel/reference/geom_sf_label.html
Let’s check what formats are available in R for writing geodata:
subset(rgdal::ogrDrivers(), write == TRUE)
subset(rgdal::gdalDrivers(), create == TRUE)
st_write(nc2, "./datos/ndeu_adm2.shp", driver="ESRI Shapefile") # create to a shapefile
## Writing layer `ndeu_adm2' to data source `./datos/ndeu_adm2.shp' using driver `ESRI Shapefile'
## features: 42
## fields: 12
## geometry type: Multi Polygon
writeRaster(deu_elev, "./datos/nelev_out.tif", format="GTiff" ) # Create a geoTiff file