Read and visualize Anemone Species Occurrence Cube

Author

DO

1 From R Markdown to Quarto

Some changes are needed to convert an R Markdown document to a Quarto document.

The config field execute: contains the settings for code execution, similar to what we did in the very first chunk of the Rmd document with knitr::opts_chunk$set(echo = TRUE). See the chapter Managing Execution of the official Quarto guide for more.

Setting execute-dir: project under project: would make Quarto behave like R Markdown with “Knit directory” set to “Project directory”. In this way, all relative paths start from the root directory of the project. IMPORTANT: the project: execute-dir: project setting typically requires a Quarto project rather than just a regular RStudio project (.Rproj). To make it work in your current setup, we can use the here package and its function here() in our code, which works regardless of the working directory, e.g. here::here("data/20241217/20241217_occurrence_cube_anemone.tsv"), or here::here("data", "20241217", "20241217_occurrence_cube_anemone.tsv").
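A minimal sketch of the two here() call styles mentioned above, assuming the here package is installed. The file does not need to exist for the paths to be built; here() only composes a path below whatever project root it detects.

```r
library(here)

# Both calls build the same path below the project root that here() detects.
path_single <- here::here("data/20241217/20241217_occurrence_cube_anemone.tsv")
path_parts  <- here::here("data", "20241217", "20241217_occurrence_cube_anemone.tsv")

# Compare the two results
identical(path_single, path_parts)

# Show which directory here() treats as the project root
here::here()
```

Passing the path components separately avoids hard-coding a path separator, which can be convenient when the document is shared across operating systems.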

Cross-references to sections are made by appending {#sec-label} to the header of the section, and then using @sec-label in the text (the @ref(sec-label) form is bookdown syntax and is not resolved by Quarto). For example, let’s cross-reference the section @sec-static-plots right now 😄

Did I add an emoji in the previous sentence? Yes, I did! Just add from: markdown+emoji in the header and then type the name of the emoji you want to include encased in colons, e.g. :smile: to get 😄 or :muscle: to get 💪.

2 Introduction

In this document we will:

  1. read occurrence cube data
  2. explore data
  3. preprocess data
  4. visualize data

3 Read and preprocess

Load packages:

Code
library(tidyverse)    # to do datascience
library(INBOtheme)    # to apply INBO style to graphs
library(sf)           # to work with geospatial vector data
library(mapview)      # to make dynamic leaflet maps
library(here)         # to work with paths

3.1 Read data

Read Anemone data from the occurrence cube file 20241217_occurrence_cube_anemone.tsv:

Code
anemone_cube <- readr::read_tsv(
  file = here::here("data/20241217/20241217_occurrence_cube_anemone.tsv"),
  na = ""
)
Rows: 459 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): eeacellcode, class, species
dbl (7): year, classkey, specieskey, occurrences, mincoordinateuncertaintyin...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
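The message above suggests setting show_col_types = FALSE. A minimal sketch of that option follows, using a small temporary file as a hypothetical stand-in for the cube (the column subset, cell codes and species labels are invented for illustration):

```r
library(readr)

# Hypothetical two-row stand-in for the cube file, written to a temporary path
tmp <- tempfile(fileext = ".tsv")
writeLines(
  c("year\teeacellcode\tspecies\toccurrences",
    "2010\tcell_a\tspecies A\t3",
    "2011\tcell_b\tspecies A\t"),   # empty last field, read as NA
  tmp
)

toy_cube <- readr::read_tsv(
  file = tmp,
  na = "",
  show_col_types = FALSE  # suppresses the column specification message
)

# The guessed column types remain available on request
readr::spec(toy_cube)
```

Silencing the message keeps the rendered document shorter; spec() can still be called whenever the guessed types need to be inspected.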

Read the Belgian grid from the geopackage file 20241217_utm1_be.gpkg, derived from the shapefile as provided by the European Environment Agency:

Code
be_grid <- sf::st_read(
  here::here("./data/20241217/20241217_utm1_be.gpkg")
)
Reading layer `20241217_utm1_be' from data source 
  `C:\Users\damiano_oldoni\Documents\GitHub\coding-club\data\20241217\20241217_utm1_be.gpkg' 
  using driver `GPKG'
Simple feature collection with 51726 features and 3 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3768000 ymin: 2926000 xmax: 4080000 ymax: 3237000
Projected CRS: ETRS89 / ETRS-LAEA

3.2 Explore data

This dataset contains data from 2010 to 2023 for 4 species and their distribution in Belgium, based on a grid of 1 km x 1 km.

Preview with the first 30 rows of the dataset:

Code
head(anemone_cube, n = 30)

3.3 Taxonomic information

Species present in the dataset:

Code
anemone_cube %>% distinct(specieskey, species)

3.4 Temporal information

The data are temporally defined at year level. Years present:

Code
anemone_cube %>% dplyr::distinct(year)

3.5 Geographical information

The geographical information is represented by the eeacellcode column, which contains the identifiers of the grid cells containing at least one occurrence of the species.

The dataset contains 367 unique grid cells.
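A count like the one above can be obtained with dplyr::n_distinct(). The sketch below uses a small hypothetical table in place of anemone_cube; on the real data the call would presumably be dplyr::n_distinct(anemone_cube$eeacellcode).

```r
library(dplyr)

# Hypothetical stand-in for anemone_cube: three rows, two distinct cells
toy_cube <- tibble::tibble(
  eeacellcode = c("cell_a", "cell_a", "cell_b"),
  year = c(2010, 2011, 2010)
)

# Number of unique grid cells in the toy cube
n_cells <- dplyr::n_distinct(toy_cube$eeacellcode)
n_cells
```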

3.6 Preprocess data

Add geometrical information to the occurrence cube via eeacellcode, which contains the identifiers of the grid cells containing at least one occurrence of the species.

Code
cells_in_cube <- be_grid %>%
  dplyr::filter(CELLCODE %in% unique(anemone_cube$eeacellcode)) %>%
  dplyr::select(-c(EOFORIGIN, NOFORIGIN))
sf_anemone_cube <- cells_in_cube %>%
  dplyr::left_join(anemone_cube, by = c("CELLCODE" = "eeacellcode")) %>%
  dplyr::rename("eeacellcode" = "CELLCODE")
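The join above can be checked on a toy example. This is only a sketch: the grid and cube below are hypothetical, and points stand in for the 1 km polygons. After the join, each cube row should carry exactly one non-empty geometry.

```r
library(dplyr)
library(sf)

# Hypothetical grid of three cells (points stand in for the 1 km polygons)
toy_grid <- sf::st_sf(
  CELLCODE = c("cell_a", "cell_b", "cell_c"),
  geometry = sf::st_sfc(
    sf::st_point(c(0, 0)), sf::st_point(c(1, 0)), sf::st_point(c(2, 0))
  )
)

# Hypothetical cube: cell_a is occupied in two years, cell_b in one
toy_cube <- tibble::tibble(
  eeacellcode = c("cell_a", "cell_a", "cell_b"),
  year = c(2010, 2011, 2010),
  occurrences = c(3, 1, 2)
)

# Same steps as above: keep occupied cells, join the cube, rename the key
toy_sf_cube <- toy_grid %>%
  dplyr::filter(CELLCODE %in% unique(toy_cube$eeacellcode)) %>%
  dplyr::left_join(toy_cube, by = c("CELLCODE" = "eeacellcode")) %>%
  dplyr::rename("eeacellcode" = "CELLCODE")

# Sanity checks: same number of rows as the cube, and no empty geometries
nrow(toy_sf_cube) == nrow(toy_cube)
any(sf::st_is_empty(toy_sf_cube))
```

Putting the grid on the left side of the join keeps the result an sf object, which is what the mapping step later relies on.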

Final (spatial) dataset:

Code
sf_anemone_cube %>% head(n = 30)

4 Data visualization

In this section we will show how the number of occurrences and the number of occupied grid cells vary by year and species. Both static plots and dynamic maps are generated.

4.1 Static plots

Show number of occurrences and number of occupied grid cells. Make a tabbed section out of it. How to do it with Quarto? Use the ::: panel-tabset directive. End it with :::.

Code
n_per_species <- sf_anemone_cube %>%
  dplyr::group_by(species) %>%
  dplyr::summarize(occurrences = sum(occurrences),
                   grid_cells = n_distinct(eeacellcode),
                   .groups = "drop") %>%
  tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                      names_to = "variable",
                      values_to = "n")

ggplot(n_per_species, aes(x = species, y = n)) +
  geom_bar(stat = 'identity') +
  facet_grid(.~variable, scales = "free_y") +
  ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
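One note on the facets: in facet_grid(), scales = "free_y" lets the y scale vary between rows of panels, so with a single row (. ~ variable) both panels still share one y axis. If independent y axes are wanted, facet_wrap() is one option. The sketch below uses a hypothetical summary table shaped like n_per_species; the values are invented.

```r
library(ggplot2)

# Hypothetical long-format summary, shaped like n_per_species
toy_summary <- data.frame(
  species  = rep(c("species A", "species B"), times = 2),
  variable = rep(c("grid_cells", "occurrences"), each = 2),
  n        = c(12, 30, 150, 900)
)

p <- ggplot(toy_summary, aes(x = species, y = n)) +
  geom_col() +                                  # same as geom_bar(stat = "identity")
  facet_wrap(~variable, scales = "free_y") +    # one y axis per panel
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
p
```

Whether free axes are desirable depends on the message of the plot: they make the smaller variable readable but hide the difference in magnitude between panels.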

Code
n_per_year <- sf_anemone_cube %>%
  dplyr::group_by(year) %>%
  dplyr::summarize(occurrences = sum(occurrences),
                   grid_cells = n_distinct(eeacellcode),
                   .groups = "drop") %>%
  tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                      names_to = "variable",
                      values_to = "n")

ggplot(n_per_year, aes(x = year, y = n)) +
  geom_bar(stat = 'identity') +
  facet_grid(.~variable, scales = "free_y") +
  ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))

Code
n_occs_per_year_species <-
  sf_anemone_cube %>%
  dplyr::group_by(year, species) %>%
  dplyr::summarize(occurrences = sum(occurrences),
                   grid_cells = n_distinct(eeacellcode),
                   .groups = "drop") %>%
  tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                      names_to = "variable",
                      values_to = "n")

ggplot(n_occs_per_year_species,
       aes(x = year, y = n, fill = species)) +
  geom_bar(stat = 'identity') +
  # `scales` is a facet argument, not a geom_bar() one
  facet_grid(.~variable, scales = "free_y") +
  ggplot2::theme(axis.text.x = element_text(angle = 60, hjust = 1))
Warning: The `scale_name` argument of `discrete_scale()` is deprecated as of ggplot2
3.5.0.
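The three summaries in this section differ only in their grouping columns, so they could be produced by one small function. The helper summarise_cube() below is hypothetical (it is not part of the original document or of any package) and is shown on an invented toy cube.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: occurrences and occupied cells per group, long format
summarise_cube <- function(cube, ...) {
  cube %>%
    dplyr::group_by(...) %>%
    dplyr::summarize(occurrences = sum(occurrences),
                     grid_cells = dplyr::n_distinct(eeacellcode),
                     .groups = "drop") %>%
    tidyr::pivot_longer(cols = c(occurrences, grid_cells),
                        names_to = "variable",
                        values_to = "n")
}

# Hypothetical stand-in for the cube
toy_cube <- tibble::tibble(
  eeacellcode = c("cell_a", "cell_a", "cell_b"),
  year = c(2010, 2011, 2010),
  species = c("species A", "species A", "species B"),
  occurrences = c(3, 1, 2)
)

per_species <- summarise_cube(toy_cube, species)
per_year_species <- summarise_cube(toy_cube, year, species)
per_species
```

On an sf object, summarize() also unions the geometries of each group. Since the static plots do not use geometry, passing sf::st_drop_geometry(sf_anemone_cube) to such a helper may be worth considering.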

4.2 Dynamic maps

4.2.1 Leaflet maps

We show a map with the distribution of Anemone in Belgium. We show the total number of occurrences per grid cell. The color of the grid cells is based on the number of occurrences. The legend shows the color scale and the number of occurrences per grid cell.

Code
n_occs_per_cell <- sf_anemone_cube %>%
  dplyr::group_by(eeacellcode) %>%
  dplyr::summarize(
    occurrences = sum(occurrences),
    min_coordinateuncertaintyinmeters = min(mincoordinateuncertaintyinmeters),
    min_mintemporaluncertainty = min(mintemporaluncertainty),
    .groups = "drop")
map_anemone <- mapview::mapview(n_occs_per_cell,
                                zcol = "occurrences",
                                legend = TRUE
)
map_anemone

5 Notes about Quarto

5.1 Caching

Caching in Quarto is done in the header, under execute:, via cache: true. This is similar to the cache = TRUE option in RMarkdown. More info at https://quarto.org/docs/projects/code-execution.html.