How to use the tinyForestR package

Getting started

The package is hosted on Github and is a work in progress. It can be installed by running devtools::install_github("julianflowers/tinyForestR").

The package makes use of a number of Application Programming Interfaces (APIs) some of which require API keys which will need to be applied for separately. This is outlined in the relevant sections of this vignette.

It also uses a range of Python packages to access some datasets (in some cases Python packages are better developed than R). For this reason the first step is to run initialise_tf() to intialise the package.

This:

Loads reticulate which R uses to talk to Python
Sets up a Python virtual environment (VE) called tinyforest, and ensures R uses the correct version of Python by specifying the RETICULATE_PYTHON environment variable. It may be necessary to restart R to make sure R correctly uses this.
You will be asked if you want to remove the tinyforest environment. Say “no” unless you have a previous installation and are having trouble. It will create a clean install of the VE.
Installs a number of Python packages to the tinyforest environment. These include
- earthengine-api which enables access to Google Earth Engine (GEE)
- geemap - a set of added value tools to extract and manipulate GEE data
- osdatahub - access to Ordnance Survey National Geographic Database data.
- OSGridConverter - converts lat-longs to UK Grid references
Imports the relevant modules for use in other packages

Install and initialise

if(!require("tinyForestR"))
#devtools::install_github("julianflowers/tinyForestR", force = FALSE)
if(!require("pacman"))install.packages("pacman")
library(pacman)
library(tinyForestR)

p_load(leaflet.extras2, tidyverse, mapview, sf, ggmap, lubridate, geojsonio)

theme_set(ggthemes::theme_economist())

initialise_tf()
#> virtualenv: tinyforest
#> Using virtual environment 'tinyforest' ...

Load Tiny Forest data

The next step is to load Tiny Forest (TF) data. Because this only exists in a series of web pages the get_tf_data function identifies the relevant pages and iterates over them to extract name, id, location, area, planters, and types of tree planted (as a list column), for those TFs planted at the time of extraction. It does include TFs planted outside the UK. The function takes about 30 seconds to iterate over all the relevant pages.


tf <- tinyForestR::df

tf_df <- tf |>
  unnest("trees") |>
  mutate(year = year(date), 
         month = month(date))

Once the data is loaded we can save it as a csv file and get some high level information on planting, timings, size and so on.

As of 2023-08-01 there are 178 planted TFs.


needs(patchwork)

## annual planting

tf_year <- tf_df |>
  dplyr::select(-trees) |>
  distinct() |>
  count(year) |>
  ggplot(aes(year, n)) +
  geom_col() +
  labs(title = "TFs planted per year")

## area distribution

tf_area <- tf_df |>
  dplyr::select(-trees) |>
  distinct() |>
  ggplot(aes(factor(year), area)) +
  geom_boxplot() +
  labs(title = "TF area year")

tf_trees <- tf_df |>
  group_by(tf_id) |>
  summarise(n_trees = n()) |>
  ggplot() +
  geom_histogram(aes(n_trees)) +
  labs(title = "Distribution of tree species", 
       x = "Number of tree species")

tf_year + tf_area + tf_trees

Locations

It is now straightforward to map locations of TFs using sf and mapview.


library(mapview)

tf_df |>
  dplyr::select(-trees) |>
  distinct() |>
  mutate(year = factor(year)) |>
  sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  mapview::mapview(zcol = "year")

Tree species

We can also look at planting frequency for different tree species.


tf_df |>
  ungroup() |>
  unnest("trees") |>
  count(trees, sort = TRUE) |>
  top_n(25) |>
  ggplot() +
  geom_col(aes(n, reorder(trees, n))) +
  labs(y = "", 
       title = "25 most commonly planted tree species", 
       x = "Number of Tiny Forests") +
 # ggthemes::theme_base() +
  theme(plot.title.position = "plot")


tf_df |>
  ungroup() |>
  unnest("trees") |>
  count(tf_id, trees, sort = TRUE) |>
  pivot_wider(names_from = trees, values_from = n, values_fill = 0) 
#> # A tibble: 178 × 65
#>    tf_id Apple, crab(Malus sylve…¹ Birch, downy(Betula …² Birch, silver(Betula…³
#>    <dbl>                     <int>                  <int>                  <int>
#>  1    85                         1                      1                      1
#>  2    86                         1                      0                      1
#>  3    87                         1                      0                      1
#>  4    88                         1                      1                      1
#>  5    89                         1                      1                      1
#>  6    90                         1                      0                      0
#>  7    91                         1                      1                      1
#>  8    92                         1                      1                      1
#>  9    93                         1                      0                      1
#> 10    94                         0                      0                      1
#> # ℹ 168 more rows
#> # ℹ abbreviated names: ¹`Apple, crab(Malus sylvestris)`,
#> #   ²`Birch, downy(Betula pubescens)`, ³`Birch, silver(Betula pendula)`
#> # ℹ 61 more variables: `Blackthorn(Prunus spinosa)` <int>,
#> #   `Dogwood(Cornus sanguinea)` <int>, `Elder(Sambucus nigra)` <int>,
#> #   `Guelder, rose(Viburnum opulus)` <int>, `Hazel(Corylus avellana)` <int>,
#> #   `Maple, field(Acer campestre)` <int>, …

Biodiversity data


tf1 <- tf_df |>
 dplyr::select(lat, lon, tf_id) |>
  distinct()

i <- 1

lat <- tf1$lat[i]
lon <- tf1$lon[i]

The get_nbn_buffer downloads occurrence data from the NBN Atlas in a set buffer around a given longitude and latitude. For example we can download 10000 records around lat=51.777889, lon=-1.469139 (Witney TF).¹


safe_buff <- safely(tinyForestR::get_nbn_buffer)

nbn_data <- safe_buff(lon, lat, n = 10000)
#> 2.977 sec elapsed

nbn_data$result |>
  head()
#>    kingdom       phylum        classs         order       family      genus
#> 1 Animalia     Chordata          Aves  Anseriformes     Anatidae     Cygnus
#> 2  Plantae Tracheophyta Magnoliopsida    Dipsacales    Adoxaceae   Sambucus
#> 3 Animalia     Chordata          Aves Passeriformes Aegithalidae Aegithalos
#> 4  Plantae Tracheophyta Magnoliopsida       Rosales     Ulmaceae      Ulmus
#> 5 Animalia     Chordata          Aves Passeriformes      Paridae      Parus
#> 6 Animalia     Chordata          Aves Passeriformes Muscicapidae  Erithacus
#>   decimalLatitude decimalLongitude year month
#> 1        51.78298        -1.472293 1968    04
#> 2        51.76903        -1.470282 1996  <NA>
#> 3        51.78298        -1.472293 2019    12
#> 4        51.78316        -1.475597 2017    04
#> 5        51.78298        -1.472293 2012    04
#> 6        51.78298        -1.472293 2015    10
#>                         dataProviderName           speciesGroups
#> 1          British Trust for Ornithology          Animals, Birds
#> 2 Botanical Society of Britain & Ireland Plants, FloweringPlants
#> 3          British Trust for Ornithology          Animals, Birds
#> 4 Botanical Society of Britain & Ireland Plants, FloweringPlants
#> 5          British Trust for Ornithology          Animals, Birds
#> 6          British Trust for Ornithology          Animals, Birds
#>    vernacularName             species
#> 1       Mute Swan         Cygnus olor
#> 2           Elder      Sambucus nigra
#> 3 Long-tailed Tit Aegithalos caudatus
#> 4     English Elm       Ulmus procera
#> 5       Great Tit         Parus major
#> 6           Robin  Erithacus rubecula

Plant diversity using BSBI data

I have also included functions to extract data for the 2020 Botanic Society of Britain and Ireland survey. This is publicly available from for UK National Grid 1k hectads. This requires conversion of lat-longs to UK grids.


grid_ref <- tinyForestR::os_lat_lon_to_grid(lat = lat, lon = lon)
#> Using virtual environment 'tinyforest' ...

grid_ref$grid
#> [1] "SP3672"


bsbi_data <- tinyForestR::get_bsbi_data(grid_ref = grid_ref$grid)

bsbi_data |>
  enframe() |>
  unnest("value") |>
  unnest("value") |>
  # slice(-c(168:173)) |>
  mutate(year = str_extract(value, "20\\d{2}"),
         value = str_remove(value, year),
         count = parse_number(value),
         value = str_remove(value, as.character(count)), 
         value = str_remove(value, "\\d{1,}"), 
         grid = grid_ref$grid, 
         tf_id = tf_df$tf_id[i]) |>
  arrange(value)  |>
  drop_na()
#> # A tibble: 7 × 6
#>   name    value                       year  count grid   tf_id
#>   <chr>   <chr>                       <chr> <dbl> <chr>  <dbl>
#> 1 records "Clinopodium ascendens()  " 2018      1 SP3672    85
#> 2 records "Cochlearia danica()  "     2016      1 SP3672    85
#> 3 records "Crataegus monogyna()  "    2015      1 SP3672    85
#> 4 records "Cupressus × leylandii()  " 2016      1 SP3672    85
#> 5 records "Galeopsis bifida()  "      2015      1 SP3672    85
#> 6 records "Hesperis matronalis()  "   2017      1 SP3672    85
#> 7 records "Oenothera stricta()  "     2017      1 SP3672    85

Rapid calculation of biodiversity metrics

The calc_bd_metrics function takes an output from get_nbn_buffer or get_bsbi_data, converts the data from long to wide format, creates a species matrix for a specified class (for get_nbn_buffer data), and outputs a list containing:

diversity metrics (Simpson)
species richness by month (plotted)
Bray-Curtis matrix
Raup matrix


metrics <- calc_bd_metrics(df = nbn_data$result, class = "Aves")


metrics$metrics |>
  gt::gt()

month	richness	N	ratio	diversity
01	17	60	0.2833333	0.9294444
02	17	58	0.2931034	0.9256837
03	24	74	0.3243243	0.9382761
04	16	56	0.2857143	0.9266582
05	43	104	0.4134615	0.9613536
06	16	57	0.2807018	0.9258233
07	19	53	0.3584906	0.9284443
08	14	41	0.3414634	0.9125521
09	24	59	0.4067797	0.9399598
10	16	48	0.3333333	0.9253472
11	15	60	0.2500000	0.9205556
12	17	58	0.2931034	0.9304400

metrics$plot +
  labs(title = "Monthly species richness for ",
         subtitle = paste("Aves", tf1$tf_id[1]), 
       y = "Richness", 
       x = "Month") +
  theme(plot.title.position = "plot")

Vegatation indices

The package includes a calc_ndvi_buff function to enable the calculation of normalized vegetation index (NDVI) for the buffer area around a given point. It uses Sentinel2 surface reflectance satellite images which are available at 10m resolution and are regularly updated. The function extracts images via the Google Earth Engine API and requires registration and authentication prior to use (see…).

The function returns a list including, image dates, NDVI statistics for the image, an interactive map and a raster. Note it may take few minutes to run.

The code chunk below calculates the NDVI for each image containing the buffer around the Witney TF for 2019 and 2022 and maps them side-by-side. (Note, the function selects only those S2 images with cloud cover < 10%).


lon <- df$lon |> 
  unique()

lat <- df$lat |> 
  unique()

tf_id <- df$tf_id |> 
  unique()

plant_date <- df$date |>
  unique()

i <- 10

test <- calc_ndvi_buff(lon = lon[i], lat = lat[i], dist1 = 1000, dist2 = 100, start_date = as.character(plant_date[2]), end_date = "2023-06-01", cloud_cover = 10, tf_id = tf_id[i])
#> ── rgee 1.1.5 ─────────────────────────────────────── earthengine-api 0.1.361 ── 
#>  ✔ user: not_defined
#>  ✔ Google Drive credentials:
 ✔ Google Drive credentials:  FOUND
#>  ✔ Initializing Google Earth Engine:
 ✔ Initializing Google Earth Engine:  DONE!
#> 
 ✔ Earth Engine account: users/julianflowers12 
#> ──────────────────────────────────────────────────────────────────────────────── 
#> Number of features: Calculating ...
Number of features: 1                     
#> Number of features: Calculating ...
Number of features: 1


test1 <- calc_ndvi_buff(lon = lon[i],  lat =  lat[i], dist1 = 1000, dist2 = 100, start_date = "2017-01-01", end_date = as.character(plant_date[2]), cloud_cover = 10, tf_id = tf_id[i])
#> ── rgee 1.1.5 ─────────────────────────────────────── earthengine-api 0.1.361 ── 
#>  ✔ user: not_defined
#>  ✔ Google Drive credentials:
 ✔ Google Drive credentials:  FOUND
#>  ✔ Initializing Google Earth Engine:
 ✔ Initializing Google Earth Engine:  DONE!
#> 
 ✔ Earth Engine account: users/julianflowers12 
#> ──────────────────────────────────────────────────────────────────────────────── 
#> Number of features: Calculating ...
Number of features: 1                     
#> Number of features: Calculating ...
Number of features: 1


test1$map | test$map


pre <- test1$point_summary |>
  mutate(period = "a-pre", 
         loc = "point") |>
  mutate_at(.vars = 3:5, scale)

post <- test$point_summary |>
  mutate(period = "b-post", 
         loc = "point") |>
  mutate_at(.vars = 3:5, scale)

pre_buff <- test1$buffer_summary |>
  mutate(period = "a-pre", 
         loc = "buffer") |>
  mutate_at(.vars = 3:5, scale)

post_buff <- test$buffer_summary |>
  mutate(period = "b-post", 
         loc = "buffer") |>
  mutate_at(.vars = 3:5, scale)

points <- bind_rows(pre, post)
buffers <- bind_rows(pre_buff, post_buff)

overall <- bind_rows(points, buffers)

overall |>  
  ggplot() +
  geom_boxplot(aes(loc, ndvi_mean, fill = period))


overall |>
  ggplot() +
  geom_col(aes(date, ndvi_median, fill = loc), position = "dodge") +
  geom_smooth(aes(date, ndvi_median, color = loc), method = "loess", span = 0.5) +
  geom_vline(xintercept = as.Date(plant_date[i]), lty = "dotted") +
  geom_hline(yintercept = 0) +
  facet_wrap(~ loc) +
  labs(title = "Trend in scaled NDVI: 100m and 1000m from TF centroid",
       subtitle = paste0("Tiny Forest ", tf_id[i], "; planted: ", plant_date[i]), 
       x = "")

Covariates

TF specific:

Size
Tree diversity
Age
Urban / rural

Area specific

Green infrastructure (conducive to biodiversity)[(WOS:000882918000004?); (WOS:000327304100013?); (WOS:000572856700001?); (WOS:000374275700069?); (implemen?); (jerome2018?); (Van2020?); (Felappi2020?); (Yin2022?); (Anderson2022?)](burgess2015?; strategi?; Jones et al. 2022)

I have included a snapshotwith environmental variables for the TF dataset. This includes the age of the TF, rural-urban classification of the lower super output area containing the TF centroid, the area (m²) of public parks and allotments in the buffer taken from OS Open Greenspace data, and the area (m²) of deciduous woodland, taken from Priority Habitat Inventory data.

Typology of green infrastructure

Jones, Laurence, Sally Anderson, Jeppe Læssøe, Ellen Banzhaf, Anne Jensen, David Neil Bird, James Miller, et al. 2022. “A Typology for Urban Green Infrastructure to Guide Multifunctional Planning of Nature-Based Solutions.” Nature-Based Solutions 2 (December): 100041. https://doi.org/10.1016/j.nbsj.2022.100041.

Note this may sometimes time out depending on traffic on the NBN Atlas webservice↩︎

How to use the `tinyForestR` package

What is the `tinyForestR` package?

Getting started

Install and initialise

Load Tiny Forest data

Locations

Tree species

Biodiversity data

Plant diversity using BSBI data

Rapid calculation of biodiversity metrics

Vegatation indices

Covariates

Typology of green infrastructure

How to use the tinyForestR package

What is the tinyForestR package?

Getting started

Install and initialise

Load Tiny Forest data

Locations

Tree species

Biodiversity data

Plant diversity using BSBI data

Rapid calculation of biodiversity metrics

Vegatation indices

Covariates

Typology of green infrastructure

How to use the `tinyForestR` package

What is the `tinyForestR` package?