Read and visualize Anemone Species Occurrence Cube

Author

1 From R Markdown to Quarto

Some changes are needed to convert an R Markdown document to a Quarto document.

The config field execute: contains the settings for the code execution, similar to what we did in the very first chunk of the Rmd document: knitr::opts_chunk$set(echo = TRUE). More about it in chapter Managing Execution of the official Quarto guide.

Setting working-dir: project under execute: would make Quarto behave like RMarkdown with “Knit directory” set to “Project directory”. In this way, all relative paths will start from the root directory of the project. IMPORTANT: the project: execute-dir: project setting typically requires a Quarto project rather than just a regular RStudio project (.Rproj). To make it work in your current setup, we can use here package and its function here() in out code, which will work regardless of the working directory, e.g. here::here("data/20241217/20241217_occurrence_cube_anemone.tsv"), or here::here("data", "20241217", "20241217_occurrence_cube_anemone.tsv").

Crosse-references of sections are done by appending {#sec-label} to the header of the section, and then using @ref(sec-label) in the text. Example, let’s cross-reference the section @ref(sec-static-plots} right now 😄

Did I add an emoji in the previous sentence? Yes, I did! Just add from: markdown+emoji in the header and then type the name of the emoji you want to include encased in colons, e.g. :smile: to get 😄 or :muscle: to get 💪.

2 Introduction

In this document we will:

read occurrence cube data
explore data
preprocess data
visualize data

3 Read and preprocess

Load packages:

Code

library(tidyverse)    # to do datascience
library(INBOtheme)    # to apply INBO style to graphs
library(sf)           # to work with geospatial vector data
library(mapview)      # to make dynamic leaflet maps
library(here)         # to work with paths

3.1 Read data

Read Anemone data from the occurrence cube file 20241217_occurrence_cube_anemone.tsv:

Code

anemone_cube <- readr::read_tsv(
  file = here::here("data/20241217/20241217_occurrence_cube_anemone.tsv"),
  na = ""
)

Rows: 459 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): eeacellcode, class, species
dbl (7): year, classkey, specieskey, occurrences, mincoordinateuncertaintyin...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Read the Belgian grid from the geopackage file 20241217_utm1_be.gpkg, derived from the shapefile as provided by the European Environment Agency:

Code

be_grid <- sf::st_read(
  here::here("./data/20241217/20241217_utm1_be.gpkg")
)

Reading layer `20241217_utm1_be' from data source 
  `C:\Users\damiano_oldoni\Documents\GitHub\coding-club\data\20241217\20241217_utm1_be.gpkg' 
  using driver `GPKG'
Simple feature collection with 51726 features and 3 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3768000 ymin: 2926000 xmax: 4080000 ymax: 3237000
Projected CRS: ETRS89 / ETRS-LAEA

3.2 Explore data

This dataset contain data from 2010 to 2023 related to 4 species and their distribution in Belgium based on a grid of 1 km x 1 km.

Preview with the first 30 rows of the dataset:

Code

head(anemone_cube, n = 30)

3.3 Taxonomic information

Species present in the dataset:

Code

anemone_cube %>% distinct(specieskey, species)

3.4 Temporal information

The data are temporally defined at year level. Years present:

Code

anemone_cube %>% dplyr::distinct(year)

3.5 Geographical information

The geographical information is represented by the eeacellcode column, which contains the identifiers of the grid cells containing at least one occurrence of the species.

The dataset contains 367 unique grid cells.

3.6 Preprocess data

Add geometrical information to the occurrence cube via eeacellcode, which contains the identifiers of the grid cells containing at least one occurrence of the species.

Code

cells_in_cube <- be_grid %>%
  dplyr::filter(CELLCODE %in% unique(anemone_cube$eeacellcode)) %>%
  dplyr::select(-c(EOFORIGIN, NOFORIGIN))
sf_anemone_cube <- cells_in_cube %>%
  dplyr::left_join(anemone_cube, by = c("CELLCODE" = "eeacellcode")) %>%
  dplyr::rename("eeacellcode" = "CELLCODE")

Final (spatial) dataset:

Code

sf_anemone_cube %>% head(n = 30)

4 Data visualization

In this section we will show how the number of occurrences and the number of occupied grid cells vary by year and species. Both static plots and dynamic maps are generated.

4.1 Static plots

Show number of occurrences and number of occupied grid cells. Make a tabbed section out of it. How to do it with Quarto? Use the ::: panel-tabset directive. End it with :::.

Code

n_per_species <- sf_anemone_cube %>%
  dplyr::group_by(species) %>%
  dplyr::summarize(occurrences = sum(occurrences),
                   grid_cells = n_distinct(eeacellcode),
                   .groups = "drop") %>%
  tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                      names_to = "variable",
                      values_to = "n")

ggplot(n_per_species, aes(x = species, y = n)) +
  geom_bar(stat = 'identity') +
  facet_grid(.~variable, scales = "free_y") +
  ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))

Code

n_per_year <- sf_anemone_cube %>%
  dplyr::group_by(year) %>%
  dplyr::summarize(occurrences = sum(occurrences),
                   grid_cells = n_distinct(eeacellcode),
                   .groups = "drop") %>%
  tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                      names_to = "variable",
                      values_to = "n")

ggplot(n_per_year,aes(x = year, y = n)) +
  geom_bar(stat = 'identity') +
  facet_grid(.~variable, scales = "free_y") +
  ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))

Code

n_occs_per_year_species <-
  sf_anemone_cube %>%
  dplyr::group_by(year, species) %>%
  dplyr::summarize(occurrences = sum(occurrences),
                   grid_cells = n_distinct(eeacellcode),
                   .groups = "drop") %>%
  tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                      names_to = "variable",
                      values_to = "n")

ggplot(n_occs_per_year_species,
       aes(x = year, y = n, fill = species)) +
  geom_bar(stat = 'identity', scales = "free_y") +
  facet_grid(.~variable) +
  ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))

Warning in geom_bar(stat = "identity", scales = "free_y"): Ignoring unknown
parameters: `scales`

Warning: The `scale_name` argument of `discrete_scale()` is deprecated as of ggplot2
3.5.0.

4.2 Dynamic maps

4.2.1 Leaflet maps

We show a map with the distribution of Anemone in Belgium. We show the total number of occurrences per grid cell. The color of the grid cells is based on the number of occurrences. The legend shows the color scale and the number of occurrences per grid cell.

Code

n_occs_per_cell <- sf_anemone_cube %>%
  dplyr::group_by(eeacellcode) %>%
  dplyr::summarize(
    occurrences = sum(occurrences),
    min_coordinateuncertaintyinmeters = min(mincoordinateuncertaintyinmeters),
    min_mintemporaluncertainty = min(mintemporaluncertainty),
    .groups = "drop")
map_anemone <- mapview::mapview(n_occs_per_cell,
                                zcol = "occurrences",
                                legend = TRUE
)
map_anemone

5 Notes about Quarto

5.1 Caching

Caching in Quarto is done in the header, under execute:, via cache: true. This is similar to the cache = TRUE option in RMarkdown. More info at https://quarto.org/docs/projects/code-execution.html.

--- title: "Read and visualize _Anemone_ Species Occurrence Cube" author: "DO" number-sections: true format: html: df-print: paged toc: true toc-float: true toc-depth: 3 number-sections: true code-fold: true code-tools: true execute: eval: true echo: true warning: true error: false include: true project: execute-dir: project from: markdown+emoji editor: visual --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # From R Markdown to Quarto Some changes are needed to convert an R Markdown document to a Quarto document. The config field `execute:` contains the settings for the code execution, similar to what we did in the very first chunk of the Rmd document: `knitr::opts_chunk$set(echo = TRUE)`. More about it in chapter [Managing Execution](https://quarto.org/docs/projects/code-execution.html) of the official [Quarto guide](https://quarto.org/docs/guide/). Setting `working-dir: project` under `execute:` would make Quarto behave like RMarkdown with "Knit directory" set to "Project directory". In this way, all relative paths will start from the root directory of the project. **IMPORTANT**: the `project: execute-dir: project` setting typically requires a **Quarto project** rather than just a regular RStudio project (.Rproj). To make it work in your current setup, we can use `here` package and its function `here()` in out code, which will work regardless of the working directory, e.g. `here::here("data/20241217/20241217_occurrence_cube_anemone.tsv")`, or `here::here("data", "20241217", "20241217_occurrence_cube_anemone.tsv")`. Crosse-references of sections are done by appending {#sec-label} to the header of the section, and then using `@ref(sec-label)` in the text. Example, let's cross-reference the section @ref(sec-static-plots} right now :smile: Did I add an emoji in the previous sentence? Yes, I did! Just add `from: markdown+emoji` in the header and then type the name of the emoji you want to include encased in colons, e.g. `:smile:` to get :smile: or `:muscle:` to get :muscle:. # Introduction In this document we will: 1. read occurrence cube data 2. explore data 3. preprocess data 4. visualize data # Read and preprocess Load packages: ```{r load-pkgs, message=FALSE, warning=FALSE} library(tidyverse) # to do datascience library(INBOtheme) # to apply INBO style to graphs library(sf) # to work with geospatial vector data library(mapview) # to make dynamic leaflet maps library(here) # to work with paths ``` ## Read data Read *Anemone* data from the occurrence cube file `20241217_occurrence_cube_anemone.tsv`: ```{r read data} anemone_cube <- readr::read_tsv( file = here::here("data/20241217/20241217_occurrence_cube_anemone.tsv"), na = "" ) ``` Read the Belgian grid from the geopackage file `20241217_utm1_be.gpkg`, derived from the shapefile as provided by the [European Environment Agency](https://www.eea.europa.eu/en): ```{r read grid} be_grid <- sf::st_read( here::here("./data/20241217/20241217_utm1_be.gpkg") ) ``` ## Explore data This dataset contain data from `r min(anemone_cube$year)` to `r max(anemone_cube$year)` related to `r length(unique(anemone_cube$specieskey))` species and their distribution in Belgium based on a grid of 1 km x 1 km. Preview with the first 30 rows of the dataset: ```{r} head(anemone_cube, n = 30) ``` ## Taxonomic information Species present in the dataset: ```{r} anemone_cube %>% distinct(specieskey, species) ``` ## Temporal information The data are temporally defined at year level. Years present: ```{r} anemone_cube %>% dplyr::distinct(year) ``` ## Geographical information The geographical information is represented by the `eeacellcode` column, which contains the identifiers of the grid cells containing at least one occurrence of the species. The dataset contains `r length(unique(anemone_cube$eeacellcode))` unique grid cells. ## Preprocess data Add geometrical information to the occurrence cube via `eeacellcode`, which contains the identifiers of the grid cells containing at least one occurrence of the species. ```{r add geom information} cells_in_cube <- be_grid %>% dplyr::filter(CELLCODE %in% unique(anemone_cube$eeacellcode)) %>% dplyr::select(-c(EOFORIGIN, NOFORIGIN)) sf_anemone_cube <- cells_in_cube %>% dplyr::left_join(anemone_cube, by = c("CELLCODE" = "eeacellcode")) %>% dplyr::rename("eeacellcode" = "CELLCODE") ``` Final (spatial) dataset: ```{r} sf_anemone_cube %>% head(n = 30) ``` # Data visualization In this section we will show how the number of occurrences and the number of occupied grid cells vary by year and species. Both static plots and dynamic maps are generated. ## Static plots {#sec-static-plots} Show number of occurrences and number of occupied grid cells. Make a tabbed section out of it. How to do it with Quarto? Use the `::: panel-tabset` directive. End it with `:::`. ::: panel-tabset ### per species ```{r class.source = "fold-hide"} n_per_species <- sf_anemone_cube %>% dplyr::group_by(species) %>% dplyr::summarize(occurrences = sum(occurrences), grid_cells = n_distinct(eeacellcode), .groups = "drop") %>% tidyr::pivot_longer(cols = c(occurrences, grid_cells), names_to = "variable", values_to = "n") ggplot(n_per_species, aes(x = species, y = n)) + geom_bar(stat = 'identity') + facet_grid(.~variable, scales = "free_y") + ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1)) ``` ### per year ```{r class.source = "fold-hide"} n_per_year <- sf_anemone_cube %>% dplyr::group_by(year) %>% dplyr::summarize(occurrences = sum(occurrences), grid_cells = n_distinct(eeacellcode), .groups = "drop") %>% tidyr::pivot_longer(cols = c(occurrences, grid_cells), names_to = "variable", values_to = "n") ggplot(n_per_year,aes(x = year, y = n)) + geom_bar(stat = 'identity') + facet_grid(.~variable, scales = "free_y") + ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1)) ``` ### per year and province ```{r class.source = "fold-hide"} n_occs_per_year_species <- sf_anemone_cube %>% dplyr::group_by(year, species) %>% dplyr::summarize(occurrences = sum(occurrences), grid_cells = n_distinct(eeacellcode), .groups = "drop") %>% tidyr::pivot_longer(cols = c(occurrences, grid_cells), names_to = "variable", values_to = "n") ggplot(n_occs_per_year_species, aes(x = year, y = n, fill = species)) + geom_bar(stat = 'identity', scales = "free_y") + facet_grid(.~variable) + ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1)) ``` ::: ## Dynamic maps ### Leaflet maps We show a map with the distribution of *Anemone* in Belgium. We show the total number of occurrences per grid cell. The color of the grid cells is based on the number of occurrences. The legend shows the color scale and the number of occurrences per grid cell. ```{r class.source = "fold-hide"} n_occs_per_cell <- sf_anemone_cube %>% dplyr::group_by(eeacellcode) %>% dplyr::summarize( occurrences = sum(occurrences), min_coordinateuncertaintyinmeters = min(mincoordinateuncertaintyinmeters), min_mintemporaluncertainty = min(mintemporaluncertainty), .groups = "drop") map_anemone <- mapview::mapview(n_occs_per_cell, zcol = "occurrences", legend = TRUE ) map_anemone ``` # Notes about Quarto ## Caching Caching in Quarto is done in the header, under `execute:`, via `cache: true`. This is similar to the `cache = TRUE` option in RMarkdown. More info at <https://quarto.org/docs/projects/code-execution.html>.