class: center, middle, inverse, title-slide .title[ # What’s so special about (geo)spatial ] .subtitle[ ## Geospatial tutorial - intro ] .author[ ### JR Ferrer-Paris ] .institute[ ### UNSW codeRs ] .date[ ### 2023-12-13 ] --- layout: true <div class="my-footer"><span>JR Ferrer-Paris / <a href='https://github.com/UNSW-codeRs/geospatial-data-in-R'>geospatial-data-in-R</a></span></div> <!-- this adds the link footer to all slides, depends on my-footer class in css-->
--- class: center,inverse background-image: url("/Users/z3529065/proyectos/codeRs/geospatial-data-in-R/images/Heal-Country-NAIDOC-2021.png") background-position: center background-size: contain --- class: inverse, center, middle # Spatial or geospatial? --- # Spatial data Think of **spatial data** as ordinary data (vectors or matrices) with spatial properties that we will call geometries. $$ \mathrm{spatial} \ \mathrm{data} = \mathrm{data} + \mathrm{geometry} $$ -- The **geometry** part can be interpreted in different ways: - as a regular grid with a known origin, extent and cell size - as discrete geometric locations that define the shape of the spatial object: points, lines, polygons. We call these two main types **rasters** and **vectors**, respectively. --- ## Rasters .center[ <img src="https://kodu.ut.ee/~kmoch/geopython2020/_images/raster_concept.png" width = "60%"/> Source: National Ecological Observatory Network (NEON) ] --- ## Vectors .center[ <img src="https://datacarpentry.org/organization-geospatial/fig/dc-spatial-vector/pnt_line_poly.png" width = "60%"/> Source: National Ecological Observatory Network (NEON) ] --- ## Rasters vs. vectors .pull-left[ **Rasters** - Grid defined by extent and resolution. - Each pixel/cell is associated with a specific location. - Multiple variables are stored as **bands**. - Examples: aerial photographs, satellite images, precipitation maps, elevation maps, landcover maps, etc. - One common file formats for raster data is the [GeoTIFF](https://en.wikipedia.org/wiki/GeoTIFF) format. ] .pull-right[ **Vectors** - Multiple geometries representing points, lines or polygons - Table of attributes links each spatial object (row) with multiple attributes (columns). - Examples of vector data: sampling locations, cities, roads or routes, country boundaries. - Many file formats: - ESRI Shapefiles are very popular, - OGC GeoPackage (GPKG) is an open and standards-based format, - GeoJSON is used in many web applications ] --- # Spatial or geospatial? Think of **spatial data** as ordinary data (vectors or matrices) with spatial properties that we will call geometries. $$ \mathrm{spatial} \ \mathrm{data} = \mathrm{data} + \mathrm{geometry} $$ -- Now let's add a Coordinate Reference System (CRS) and we get **geospatial data**: $$ \mathrm{geospatial} \ \mathrm{data} = \mathrm{data} + \mathrm{geometry} + \mathrm{CRS} $$ -- There are several benefits to this: - CRS translate locations on the earth to planar coordinates and viceversa. - Compare, overlay and combine spatial data using the same CRS - Transform coordinates between alternative CRS --- ### Why do we need a CRS?  --- ## Same data, different projections In order to represent the earth as a 2D map you need to apply some mathematical conversions known as [projections](https://proj.org/en/6.0/operations/projections/index.html). .panelset[ .panel[.panel-name[Plate Carrée] ```r tm_shape(World) + tm_polygons() ``` <!-- --> ] .panel[.panel-name[Robinson] ```r tm_shape(World, projection='+proj=robin') + tm_polygons() ``` <!-- --> ] .panel[.panel-name[Adams World in a Square I] ```r tm_shape(World, projection='+proj=adams_ws1') + tm_polygons() ``` <!-- --> ] .panel[.panel-name[Mercator] ```r tm_shape(World, projection='+proj=merc') + tm_polygons() ``` <!-- --> ] ] --- class: inverse, center, middle # Geospatial data in R --- layout: true # Geospatial data in R --- There is a large group dedicated to developing spatial capabilities in R: [CRAN Task View: Analysis of Spatial Data](https://cran.r-project.org/web/views/Spatial.html) Many of the packages for handling and analysing spatial data use *shared classes*. Two informal organisations curate websites: - [r-spatial](https://github.com/r-spatial) is more generally geo-informatics based, worked on vector packages `sp`, `sf` and `stars` - [rspatial](https://github.com/rspatial) has grown from the `raster` package, now moving towards the modern `terra` package. --- .pull-left[ **Rasters** - some classes in package `sp` (up until 2016) - package `raster` has been used extensively for many years - `terra` is a modern re-implementation of `raster` functionality - `stars` provides functions for spatiotemporal data in the form of dense arrays ] .pull-right[ **Vectors** - Up until 2016 package `sp` package provided shared classes for spatial vector and raster data. - `sf` provides Modern and efficient international standards for spatial vector data are implemented ] --- Visualisation of geospatial data is very important, and there are many options in R. - `ggplot2` includes some functions for handling spatial data as we usually use other kinds of data - `leaflet` uses the external JavaScript library `leaflet.js` for interactive visualisation - `tmap` is used for creating thematic maps that can be static (like a plot) or interactive (based on `leaflet` functions) - `mapview` is a very intuitive solution for quick and useful interactive maps, also based on `leaflet` functions --- Later, we will explore vector data in R with: | Nr. | Tutorial | Title | Packages | Links | |----------|-------|----------|---------| | 1 | learnr | Create a spatial object | `sf`, `mapview` | [shinyapps 1](https://yessl3-unswcoders.shinyapps.io/tutorial-1-create-spatial-object/) / [shinyapps 2](https://ecosphere.shinyapps.io/tutorial-1-create-spatial-object/) | | 2 | learnr | Points, lines and polygons | `sf`, `mapview` | [shinyapps 1](https://yessl3-unswcoders.shinyapps.io/tutorial-2-points-lines-polygons/) / [shinyapps 2]( https://ecosphere.shinyapps.io/tutorial-2-points-lines-polygons/) | | 3 | learnr | Thematic mapping | `sf`, `tmap` | [posit cloud](https://posit.cloud/content/7255037) / [posit connect](https://posit-connect-unsw.intersect.org.au/content/fbb893a7-f657-435a-ab18-4116252cd14c) | --- layout: false class: inverse, center, middle # Geospatial analysis --- layout: true ## Geospatial analysis --- Different types of geospatial analysis involve operations on *data*, *geometries* and *CRS*. -- Some analysis focus on **visualisation**: - **data** and **geometry** used as aesthetic elements in plots - interactive navigation (like your favorite map app!) -- Some analysis perform **spatial operations**: - using **geometries** to calculate distances or areas - using **geometries** to query **data** -- Finally, **spatial modelling** and **spatial prediction** use the **data** and **geometries** to explore processes and relationships. Some examples include: - geostatistics ([Kriging](https://en.wikipedia.org/wiki/Kriging)) analysis - GLMM with spatial correlation structure - Point pattern analysis --- ### Some examples of geospatial analysis I will show a couple of examples, each with several steps. I will ask some questions at the end of each slide. I would like to hear or read your responses before moving to the following step/slide. --- layout: true ## Population in major cities --- Consider data about trends in human population size in major cities of the world. We have spatial data of metropolitan areas. It includes a population times series from 1950 to (forecasted) 2030. All metro areas with over 1 million inhabitants in 2010 are included. .panelset[ .panel[.panel-name[Let's have a glimpse at this dataset] .small-code[ ```r data(metro) glimpse(metro) ## Rows: 436 ## old-style crs object detected; please recreate object with a recent sf::st_crs() ## Columns: 13 ## $ name <chr> "Kabul", "Algiers", "Luanda", "Buenos Aires", "Cordoba", "Ro… ## $ name_long <chr> "Kabul", "El Djazair (Algiers)", "Luanda", "Buenos Aires", … ## $ iso_a3 <chr> "AFG", "DZA", "AGO", "ARG", "ARG", "ARG", "ARM", "AUS", "AUS… ## $ pop1950 <dbl> 170784, 516450, 138413, 5097612, 429249, 554483, 341432, 429… ## $ pop1960 <dbl> 285352, 871636, 219427, 6597634, 605309, 671349, 537759, 571… ## $ pop1970 <dbl> 471891, 1281127, 459225, 8104621, 809794, 816230, 778158, 85… ## $ pop1980 <dbl> 977824, 1621442, 771349, 9422362, 1009521, 953491, 1041587, … ## $ pop1990 <dbl> 1549320, 1797068, 1390240, 10513284, 1200168, 1083819, 11745… ## $ pop2000 <dbl> 2401109, 2140577, 2591388, 12406780, 1347561, 1152387, 11113… ## $ pop2010 <dbl> 3722320, 2432023, 4508434, 14245871, 1459268, 1298073, 10655… ## $ pop2020 <dbl> 5721697, 2835218, 6836849, 15894307, 1562509, 1453814, 10237… ## $ pop2030 <dbl> 8279607, 3404575, 10428756, 16956491, 1718192, 1606993, 1057… ## $ geometry <POINT [°]> POINT (69.17246 34.52889), POINT (3.04197 36.7525), PO… ``` ] ] .panel[.panel-name[Questions] What kind of data is this: - Spatial or GeoSpatial? - Raster or Vector? What is the extent and resolution of the data? ] ] --- For each city we have several measures or estimates of population size for each decade. Let's calculate the growth between 2010 and 2020 and select the city with the largest growth .panelset[ .panel[.panel-name[Code] .small-code[ ```r metro <- metro %>% mutate(growth = (pop2020 - pop2010)/(pop2010 * 10) * 100) metro %>% arrange(desc(growth)) %>% slice(1) ## old-style crs object detected; please recreate object with a recent sf::st_crs() ## Simple feature collection with 1 feature and 13 fields ## Geometry type: POINT ## Dimension: XY ## Bounding box: xmin: 100.5968 ymin: 13.59934 xmax: 100.5968 ymax: 13.59934 ## Geodetic CRS: WGS 84 ## name name_long iso_a3 pop1950 pop1960 pop1970 pop1980 pop1990 ## 1 Samut Prakan Samut Prakan THA 10288 22107 46632 49147 138179 ## pop2000 pop2010 pop2020 pop2030 growth geometry ## 1 388603 1092566 2558160 3139358 13.41424 POINT (100.5968 13.59934) ``` ] ] .panel[.panel-name[Questions] What kind of geospatial analysis is this? > (Visualisation, spatial operations, modelling, something else?) How did we use the *geometry* component? ] ] --- We can visualise the data by combining the `metro` dataset with another dataset of the countries of the world, and use functions in the package `tmap` to create a beautiful thematic map. .panelset[ .panel[.panel-name[Code] ```r tm_shape(World, projection='+proj=robin') + tm_polygons() + tm_text("iso_a3", size = "AREA", col = "gray30", root = 3) + tm_shape(metro) + tm_bubbles(size = "pop2010", col = "growth", border.col = "black", border.alpha = 0.5, breaks = c(-Inf, 0, 2, 4, 6, Inf), palette = "-RdYlGn", title.size = "Metro population (2010)", title.col = "Annual growth rate (%)", id = "name", popup.vars = c("pop2010", "pop2020", "growth")) ``` ] .panel[.panel-name[Plot] .center[ <!-- --> ] ] .panel[.panel-name[Questions] What kind of geospatial analysis is this? > (Visualisation, spatial operations, modelling, something else?) Are we using all components of the geospatial data? - Where did we use the *data* ? - Where did we use the *geometry* ? - Where did we use the *CRS* ? What do the size and colour of the bubbles indicate? ] ] --- You can work through these steps in more detail in the following tutorial. | Nr. | Tutorial | Title | Packages | Links | |----------|-------|----------|---------| | 3 | learnr | Thematic mapping | `sf`, `tmap` | [posit cloud](https://posit.cloud/content/7255037) / [posit connect](https://posit-connect-unsw.intersect.org.au/content/fbb893a7-f657-435a-ab18-4116252cd14c) | --- layout: true ## Biologists in a botanical garden --- Our team is studying if the activity of a frog species in related to vegetation characteristics in a botanical garden (we will call it [JBM](https://iamvenezuela.com/2017/05/jardin-botanico-de-maracaibo-leandro-aristeguieta/#prettyPhoto) for short). They use a paper map of the garden and write down coordinates of each sampling point where they describe characteristics of the vegetation at these points (leaf litter, tree cover, etc). .panelset[ .panel[.panel-name[Let's have a glimpse at their data] ```r glimpse(data_jbm) ## Rows: 74 ## Columns: 8 ## $ Uncertainty <dbl> 14, 15, 15, 16, 15, 16, 15, 15, 15, 15, 16, 16, 15, 16, … ## $ code <chr> "A01", "A02", "A03", "A04", "A05", "A06", "A07", "A08", … ## $ Substrate <chr> "2", "2", "2", "2", "2", "2", "4", "2", "2", "2", "2", "… ## $ Understory <dbl> 3, 2, 2, 2, 2, 2, 3, 4, 3, 3, 3, 2, 2, 3, 3, 3, 2, 1, 3,… ## $ `Leaf Litter` <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,… ## $ `Tree cover` <dbl> 4, 4, 4, 4, 1, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,… ## $ Canopy <dbl> 2, 3, 1, 4, 2, 3, 4, 3, 2, 3, 2, 2, 4, 2, 2, 3, 3, 3, 3,… ## $ geom <POINT> POINT (655.7148 529.6264), POINT (408.8195 512.6097), … ``` ] .panel[.panel-name[Questions] What kind of data is this: - Spatial or GeoSpatial? - Raster or Vector? ] ] --- Correct! This is a spatial object in vector format with point geometries. The data is in special class of R object called `sf` or `simple feature` class, and we can use this code to plot the data. .panelset[ .panel[.panel-name[Code] ```r ggplot(data_jbm) + geom_sf(aes(geometry = geom, size=`Tree cover`, colour=`Tree cover`)) + labs(subtitle="Coordinates in meters", x="x", y="y", size="Tree\ncover\nclass", colour="Tree\ncover\nclass") + guides(color= guide_legend(), size=guide_legend()) ``` ] .panel[.panel-name[Plot] .center[ <!-- --> ] ] .panel[.panel-name[Questions] What kind of geospatial analysis is this? > (Visualisation, spatial operations, modelling, something else?) What is the extent and resolution of this example? The plot aesthetics include the x and y axis, size and colour, but: - Which ones represent the *data*? - Which ones represent the *geometry* component? ] ] --- **Spatial autocorrelation** is a measure of how similar are observations that are close together. Here we use a **semivariogram** to describe this autocorrelation. We use functions from the `gstats` package. .panelset[ .panel[.panel-name[Code] ```r library(gstat) empirical_variogram <- variogram(I(Tree.cover > 2)~1, data_jbm) variogram_model <- fit.variogram(empirical_variogram, vgm(1, "Exp", 70, .1)) plot(empirical_variogram, variogram_model) ``` ] .panel[.panel-name[Plot] <!-- --> ] .panel[.panel-name[Questions] What kind of geospatial analysis is this? > (Visualisation, spatial operations, modelling, something else?) The variables in this plot were derived from the spatial data: - Where did we use the *data* ? - Where did we use the *geometry* component? ] ] --- In Geostatistics the term **Kriging** is used for a family of models that minimise the spatial error and can be used for spatial interpolation and prediction. Here we use **Indicator kriging** to interpolate the probability of `\(\mathrm{Tree\ cover} > 2\)` from out measurement points into a regular grid. .panelset[ .panel[.panel-name[Fit the model] ```r ik = krige(I(Tree.cover > 2)~1, # our target variable as_Spatial(data_jbm), # our spatial data prd.grd, # the regular grid for prediction variogram_model) # the spatial autocorrelation ## [using ordinary kriging] ``` ] .panel[.panel-name[Prediction] .pull-left[ .small-code[ ```r spplot(ik["var1.pred"], main = "indicator kriging predictions", sub = "Z = Tree Cover > 2") ``` <!-- --> ] ] .pull-right[ .small-code[ ```r spplot(ik["var1.var"], main = "indicator kriging variance", sub = "Z = Tree Cover > 2") ``` <!-- --> ] ] ] .panel[.panel-name[Questions] What kind of geospatial analysis is this? What kind of geospatial data was produced in the prediction? ] ] --- You can work through the full example of the analysis in the tutorial | Nr. | Tutorial | Title | Packages | Links | |----------|-------|----------|---------| | 4 | learnr | Geostatistic analysis | `sf`, `sp`, `raster`, `gstat` | [shinyapps 1](https://ecosphere.shinyapps.io/tutorial-4-geostatistic-example/) | --- layout:false class: inverse, middle, center # Hand's on tutorials For this workshop I have prepared several tutorials using the R packages `learnr` and `Rmarkdown`. You can work through each one of them at your own pace. | Nr. | Tutorial | Title | Packages | Links | |----------|-------|----------|---------| | 1 | learnr | Create a spatial object | `sf`, `mapview` | [shinyapps 1](https://yessl3-unswcoders.shinyapps.io/tutorial-1-create-spatial-object/) / [shinyapps 2](https://ecosphere.shinyapps.io/tutorial-1-create-spatial-object/) | | 2 | learnr | Points, lines and polygons | `sf`, `mapview` | [shinyapps 1](https://yessl3-unswcoders.shinyapps.io/tutorial-2-points-lines-polygons/) / [shinyapps 2]( https://ecosphere.shinyapps.io/tutorial-2-points-lines-polygons/) | | 3 | learnr | Thematic mapping | `sf`, `tmap` | [posit cloud](https://posit.cloud/content/7255037) / [posit connect](https://posit-connect-unsw.intersect.org.au/content/fbb893a7-f657-435a-ab18-4116252cd14c) | | 4 | learnr | Geostatistic analysis | `sf`, `sp`, `raster`, `gstat` | [shinyapps 1](https://ecosphere.shinyapps.io/tutorial-4-geostatistic-example/) | | 5 | html | Accessing vector and raster data | `sf`, `terra`, `leaflet` | [rpubs](https://rpubs.com/jrfep/geospatial-tutorial-5) | --- class: center, middle # Thanks! .center[## Dr. José R. Ferrer-Paris <img class="circle" src="https://unsw-coders.netlify.app/author/dr.-jose-r.-ferrer-paris/avatar_hu5b8b6b713305d35fb8bb18275da87db6_26972_270x270_fill_q75_lanczos_center.jpg" width="150px"/> [
@jrfep](http://github.com/jrfep) / [
j.ferrer@unsw.edu.au](mailto:j.ferrer@unsw.edu.au) ] --- This presentation was prepared by José R. Ferrer-Paris [Attribution 4.0 Internacional (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) .panelset[ .panel[.panel-name[Links] Presentation available at: [rpubs.com/jrfep/geospatial-workshop-2023](https://rpubs.com/jrfep/geospatial-workshop-intro-12-2023) Powered by [RStudio](https://posit.co/products/open-source/rstudio/), [Rmarkdown](https://bookdown.org/yihui/rmarkdown/xaringan.html), and [xaringan](https://github.com/yihui/xaringan). Source code available at: [UNSW codeRs @ GitHub](https://github.com/UNSW-codeRs) / [geospatial-data-in-R](https://github.com/UNSW-codeRs/geospatial-data-in-R) ] .panel[.panel-name[Images] Raster and vector concept images from National Ecological Observatory Network (NEON), downloaded from <https://datacarpentry.org/organization-geospatial/> Rotating 3D visualisation of the globe from [Aleš Bezděk website.](https://www.asu.cas.cz/~bezdek/vyzkum/rotating_3d_globe/index.php) Described in: > Bezděk A, Sebera J, 2013. MATLAB script for 3D visualizing geodata on a rotating globe. Computers & geosciences 56, 127–130, http://dx.doi.org/10.1016/j.cageo.2013.03.007. ] .panel[.panel-name[*R* session] .small-code[ ```r sessionInfo() ## R version 4.3.1 (2023-06-16) ## Platform: aarch64-apple-darwin20 (64-bit) ## Running under: macOS Sonoma 14.2 ## ## Matrix products: default ## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0 ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## time zone: Australia/Sydney ## tzcode source: internal ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] gstat_2.1-1 knitr_1.45 leaflet_2.2.1 ## [4] ggplot2_3.4.4 RColorBrewer_1.1-3 magrittr_2.0.3 ## [7] dplyr_1.1.4 plotrix_3.8-4 raster_3.6-26 ## [10] rgdal_1.6-6 sf_1.0-14 sp_2.1-2 ## [13] tmap_3.3-4 fontawesome_0.5.2 xaringanthemer_0.4.2 ## ## loaded via a namespace (and not attached): ## [1] tidyselect_1.2.0 viridisLite_0.4.2 farver_2.1.1 ## [4] fastmap_1.1.1 xaringan_0.28 XML_3.99-0.16 ## [7] digest_0.6.33 lifecycle_1.0.4 terra_1.7-55 ## [10] compiler_4.3.1 rlang_1.1.2 sass_0.4.8 ## [13] tools_4.3.1 utf8_1.2.4 yaml_2.3.8 ## [16] FNN_1.1.3.2 labeling_0.4.3 htmlwidgets_1.6.4 ## [19] classInt_0.4-10 curl_5.2.0 here_1.0.1 ## [22] showtextdb_3.0 abind_1.4-5 KernSmooth_2.23-22 ## [25] withr_2.5.2 purrr_1.0.2 leafsync_0.1.0 ## [28] grid_4.3.1 fansi_1.0.6 sysfonts_0.8.8 ## [31] xts_0.13.1 e1071_1.7-14 leafem_0.2.3 ## [34] colorspace_2.1-0 scales_1.3.0 dichromat_2.0-0.1 ## [37] cli_3.6.2 rmarkdown_2.25 intervals_0.15.4 ## [40] generics_0.1.3 rstudioapi_0.15.0 tmaptools_3.1-1 ## [43] DBI_1.1.3 cachem_1.0.8 proxy_0.4-27 ## [46] stars_0.6-4 parallel_4.3.1 s2_1.1.5 ## [49] base64enc_0.1-3 vctrs_0.6.5 jsonlite_1.8.8 ## [52] crosstalk_1.2.1 jquerylib_0.1.4 units_0.8-5 ## [55] glue_1.6.2 lwgeom_0.2-13 codetools_0.2-19 ## [58] leaflet.providers_2.0.0 xaringanExtra_0.7.0 gtable_0.3.4 ## [61] munsell_0.5.0 tibble_3.2.1 pillar_1.9.0 ## [64] htmltools_0.5.7 showtext_0.9-6 R6_2.5.1 ## [67] wk_0.9.1 rprojroot_2.0.4 evaluate_0.23 ## [70] lattice_0.22-5 highr_0.10 png_0.1-8 ## [73] bslib_0.6.1 class_7.3-22 Rcpp_1.0.11 ## [76] spacetime_1.3-1 whisker_0.4.1 xfun_0.41 ## [79] zoo_1.8-12 pkgconfig_2.0.3 ``` ] ] ]