How to use the tinyForestR package

What is the tinyForestR package?

tinyForestR is a set of tools designed to extract, manipulate and analyse data relevant to the location of Tiny Forests in the UK.

Specifically it extracts and processes landcover and biodiversity data from a range of sources for a given area around Tiny Forest locations, and provides a set of tools for analysing citizen science data derived directly from Tiny Forests.

Getting started

The package is hosted on Github and is a work in progress. It can be installed by running devtools::install_github("julianflowers/tinyForestR").

The package makes use of a number of Application Programming Interfaces (APIs) some of which require API keys which will need to be applied for separately. This is outlined in the relevant sections of this vignette.

It also uses a range of Python packages to access some datasets (in some cases Python packages are better developed than R). For this reason the first step is to run initialise_tf() to intialise the package.

This:

Install and initialise

if(!require("tinyForestR"))
#devtools::install_github("julianflowers/tinyForestR", force = FALSE)
if(!require("pacman"))install.packages("pacman")
library(pacman)
library(tinyForestR)

p_load(leaflet.extras2, tidyverse, mapview, sf, ggmap, lubridate, geojsonio)

theme_set(ggthemes::theme_economist())

initialise_tf()
#> virtualenv: tinyforest
#> Using virtual environment 'tinyforest' ...

Load Tiny Forest data

The next step is to load Tiny Forest (TF) data. Because this only exists in a series of web pages the get_tf_data function identifies the relevant pages and iterates over them to extract name, id, location, area, planters, and types of tree planted (as a list column), for those TFs planted at the time of extraction. It does include TFs planted outside the UK. The function takes about 30 seconds to iterate over all the relevant pages.


tf <- tinyForestR::df

tf_df <- tf |>
  unnest("trees") |>
  mutate(year = year(date), 
         month = month(date)) 

Once the data is loaded we can save it as a csv file and get some high level information on planting, timings, size and so on.

As of 2023-08-01 there are 178 planted TFs.


needs(patchwork)

## annual planting

tf_year <- tf_df |>
  dplyr::select(-trees) |>
  distinct() |>
  count(year) |>
  ggplot(aes(year, n)) +
  geom_col() +
  labs(title = "TFs planted per year")

## area distribution

tf_area <- tf_df |>
  dplyr::select(-trees) |>
  distinct() |>
  ggplot(aes(factor(year), area)) +
  geom_boxplot() +
  labs(title = "TF area year")

tf_trees <- tf_df |>
  group_by(tf_id) |>
  summarise(n_trees = n()) |>
  ggplot() +
  geom_histogram(aes(n_trees)) +
  labs(title = "Distribution of tree species", 
       x = "Number of tree species")

tf_year + tf_area + tf_trees

Locations

It is now straightforward to map locations of TFs using sf and mapview.


library(mapview)

tf_df |>
  dplyr::select(-trees) |>
  distinct() |>
  mutate(year = factor(year)) |>
  sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  mapview::mapview(zcol = "year")

Tree species

We can also look at planting frequency for different tree species.


tf_df |>
  ungroup() |>
  unnest("trees") |>
  count(trees, sort = TRUE) |>
  top_n(25) |>
  ggplot() +
  geom_col(aes(n, reorder(trees, n))) +
  labs(y = "", 
       title = "25 most commonly planted tree species", 
       x = "Number of Tiny Forests") +
 # ggthemes::theme_base() +
  theme(plot.title.position = "plot")


tf_df |>
  ungroup() |>
  unnest("trees") |>
  count(tf_id, trees, sort = TRUE) |>
  pivot_wider(names_from = trees, values_from = n, values_fill = 0) 
#> # A tibble: 178 × 65
#>    tf_id Apple, crab(Malus sylve…¹ Birch, downy(Betula …² Birch, silver(Betula…³
#>    <dbl>                     <int>                  <int>                  <int>
#>  1    85                         1                      1                      1
#>  2    86                         1                      0                      1
#>  3    87                         1                      0                      1
#>  4    88                         1                      1                      1
#>  5    89                         1                      1                      1
#>  6    90                         1                      0                      0
#>  7    91                         1                      1                      1
#>  8    92                         1                      1                      1
#>  9    93                         1                      0                      1
#> 10    94                         0                      0                      1
#> # ℹ 168 more rows
#> # ℹ abbreviated names: ¹​`Apple, crab(Malus sylvestris)`,
#> #   ²​`Birch, downy(Betula pubescens)`, ³​`Birch, silver(Betula pendula)`
#> # ℹ 61 more variables: `Blackthorn(Prunus spinosa)` <int>,
#> #   `Dogwood(Cornus sanguinea)` <int>, `Elder(Sambucus nigra)` <int>,
#> #   `Guelder, rose(Viburnum opulus)` <int>, `Hazel(Corylus avellana)` <int>,
#> #   `Maple, field(Acer campestre)` <int>, …

Biodiversity data


tf1 <- tf_df |>
 dplyr::select(lat, lon, tf_id) |>
  distinct()

i <- 1

lat <- tf1$lat[i]
lon <- tf1$lon[i]

The get_nbn_buffer downloads occurrence data from the NBN Atlas in a set buffer around a given longitude and latitude. For example we can download 10000 records around lat=51.777889, lon=-1.469139 (Witney TF).1


safe_buff <- safely(tinyForestR::get_nbn_buffer)

nbn_data <- safe_buff(lon, lat, n = 10000)
#> 2.977 sec elapsed

nbn_data$result |>
  head()
#>    kingdom       phylum        classs         order       family      genus
#> 1 Animalia     Chordata          Aves  Anseriformes     Anatidae     Cygnus
#> 2  Plantae Tracheophyta Magnoliopsida    Dipsacales    Adoxaceae   Sambucus
#> 3 Animalia     Chordata          Aves Passeriformes Aegithalidae Aegithalos
#> 4  Plantae Tracheophyta Magnoliopsida       Rosales     Ulmaceae      Ulmus
#> 5 Animalia     Chordata          Aves Passeriformes      Paridae      Parus
#> 6 Animalia     Chordata          Aves Passeriformes Muscicapidae  Erithacus
#>   decimalLatitude decimalLongitude year month
#> 1        51.78298        -1.472293 1968    04
#> 2        51.76903        -1.470282 1996  <NA>
#> 3        51.78298        -1.472293 2019    12
#> 4        51.78316        -1.475597 2017    04
#> 5        51.78298        -1.472293 2012    04
#> 6        51.78298        -1.472293 2015    10
#>                         dataProviderName           speciesGroups
#> 1          British Trust for Ornithology          Animals, Birds
#> 2 Botanical Society of Britain & Ireland Plants, FloweringPlants
#> 3          British Trust for Ornithology          Animals, Birds
#> 4 Botanical Society of Britain & Ireland Plants, FloweringPlants
#> 5          British Trust for Ornithology          Animals, Birds
#> 6          British Trust for Ornithology          Animals, Birds
#>    vernacularName             species
#> 1       Mute Swan         Cygnus olor
#> 2           Elder      Sambucus nigra
#> 3 Long-tailed Tit Aegithalos caudatus
#> 4     English Elm       Ulmus procera
#> 5       Great Tit         Parus major
#> 6           Robin  Erithacus rubecula

Plant diversity using BSBI data

I have also included functions to extract data for the 2020 Botanic Society of Britain and Ireland survey. This is publicly available from for UK National Grid 1k hectads. This requires conversion of lat-longs to UK grids.


grid_ref <- tinyForestR::os_lat_lon_to_grid(lat = lat, lon = lon)
#> Using virtual environment 'tinyforest' ...

grid_ref$grid
#> [1] "SP3672"

bsbi_data <- tinyForestR::get_bsbi_data(grid_ref = grid_ref$grid)

bsbi_data |>
  enframe() |>
  unnest("value") |>
  unnest("value") |>
  # slice(-c(168:173)) |>
  mutate(year = str_extract(value, "20\\d{2}"),
         value = str_remove(value, year),
         count = parse_number(value),
         value = str_remove(value, as.character(count)), 
         value = str_remove(value, "\\d{1,}"), 
         grid = grid_ref$grid, 
         tf_id = tf_df$tf_id[i]) |>
  arrange(value)  |>
  drop_na()
#> # A tibble: 7 × 6
#>   name    value                       year  count grid   tf_id
#>   <chr>   <chr>                       <chr> <dbl> <chr>  <dbl>
#> 1 records "Clinopodium ascendens()  " 2018      1 SP3672    85
#> 2 records "Cochlearia danica()  "     2016      1 SP3672    85
#> 3 records "Crataegus monogyna()  "    2015      1 SP3672    85
#> 4 records "Cupressus × leylandii()  " 2016      1 SP3672    85
#> 5 records "Galeopsis bifida()  "      2015      1 SP3672    85
#> 6 records "Hesperis matronalis()  "   2017      1 SP3672    85
#> 7 records "Oenothera stricta()  "     2017      1 SP3672    85

Rapid calculation of biodiversity metrics

The calc_bd_metrics function takes an output from get_nbn_buffer or get_bsbi_data, converts the data from long to wide format, creates a species matrix for a specified class (for get_nbn_buffer data), and outputs a list containing:


metrics <- calc_bd_metrics(df = nbn_data$result, class = "Aves")


metrics$metrics |>
  gt::gt()
month richness N ratio diversity
01 17 60 0.2833333 0.9294444
02 17 58 0.2931034 0.9256837
03 24 74 0.3243243 0.9382761
04 16 56 0.2857143 0.9266582
05 43 104 0.4134615 0.9613536
06 16 57 0.2807018 0.9258233
07 19 53 0.3584906 0.9284443
08 14 41 0.3414634 0.9125521
09 24 59 0.4067797 0.9399598
10 16 48 0.3333333 0.9253472
11 15 60 0.2500000 0.9205556
12 17 58 0.2931034 0.9304400
metrics$plot +
  labs(title = "Monthly species richness for ",
         subtitle = paste("Aves", tf1$tf_id[1]), 
       y = "Richness", 
       x = "Month") +
  theme(plot.title.position = "plot")

Vegatation indices

The package includes a calc_ndvi_buff function to enable the calculation of normalized vegetation index (NDVI) for the buffer area around a given point. It uses Sentinel2 surface reflectance satellite images which are available at 10m resolution and are regularly updated. The function extracts images via the Google Earth Engine API and requires registration and authentication prior to use (see…).

The function returns a list including, image dates, NDVI statistics for the image, an interactive map and a raster. Note it may take few minutes to run.

The code chunk below calculates the NDVI for each image containing the buffer around the Witney TF for 2019 and 2022 and maps them side-by-side. (Note, the function selects only those S2 images with cloud cover < 10%).


lon <- df$lon |> 
  unique()

lat <- df$lat |> 
  unique()

tf_id <- df$tf_id |> 
  unique()

plant_date <- df$date |>
  unique()

i <- 10

test <- calc_ndvi_buff(lon = lon[i], lat = lat[i], dist1 = 1000, dist2 = 100, start_date = as.character(plant_date[2]), end_date = "2023-06-01", cloud_cover = 10, tf_id = tf_id[i])
#> ── rgee 1.1.5 ─────────────────────────────────────── earthengine-api 0.1.361 ── 
#>  ✔ user: not_defined
#>  ✔ Google Drive credentials:
 ✔ Google Drive credentials:  FOUND
#>  ✔ Initializing Google Earth Engine:
 ✔ Initializing Google Earth Engine:  DONE!
#> 
 ✔ Earth Engine account: users/julianflowers12 
#> ──────────────────────────────────────────────────────────────────────────────── 
#> Number of features: Calculating ...
Number of features: 1                     
#> Number of features: Calculating ...
Number of features: 1


test1 <- calc_ndvi_buff(lon = lon[i],  lat =  lat[i], dist1 = 1000, dist2 = 100, start_date = "2017-01-01", end_date = as.character(plant_date[2]), cloud_cover = 10, tf_id = tf_id[i])
#> ── rgee 1.1.5 ─────────────────────────────────────── earthengine-api 0.1.361 ── 
#>  ✔ user: not_defined
#>  ✔ Google Drive credentials:
 ✔ Google Drive credentials:  FOUND
#>  ✔ Initializing Google Earth Engine:
 ✔ Initializing Google Earth Engine:  DONE!
#> 
 ✔ Earth Engine account: users/julianflowers12 
#> ──────────────────────────────────────────────────────────────────────────────── 
#> Number of features: Calculating ...
Number of features: 1                     
#> Number of features: Calculating ...
Number of features: 1


test1$map | test$map


pre <- test1$point_summary |>
  mutate(period = "a-pre", 
         loc = "point") |>
  mutate_at(.vars = 3:5, scale)

post <- test$point_summary |>
  mutate(period = "b-post", 
         loc = "point") |>
  mutate_at(.vars = 3:5, scale)

pre_buff <- test1$buffer_summary |>
  mutate(period = "a-pre", 
         loc = "buffer") |>
  mutate_at(.vars = 3:5, scale)

post_buff <- test$buffer_summary |>
  mutate(period = "b-post", 
         loc = "buffer") |>
  mutate_at(.vars = 3:5, scale)

points <- bind_rows(pre, post)
buffers <- bind_rows(pre_buff, post_buff)

overall <- bind_rows(points, buffers)

overall |>  
  ggplot() +
  geom_boxplot(aes(loc, ndvi_mean, fill = period)) 


overall |>
  ggplot() +
  geom_col(aes(date, ndvi_median, fill = loc), position = "dodge") +
  geom_smooth(aes(date, ndvi_median, color = loc), method = "loess", span = 0.5) +
  geom_vline(xintercept = as.Date(plant_date[i]), lty = "dotted") +
  geom_hline(yintercept = 0) +
  facet_wrap(~ loc) +
  labs(title = "Trend in scaled NDVI: 100m and 1000m from TF centroid",
       subtitle = paste0("Tiny Forest ", tf_id[i], "; planted: ", plant_date[i]), 
       x = "")

Covariates

TF specific:

Area specific

I have included a snapshotwith environmental variables for the TF dataset. This includes the age of the TF, rural-urban classification of the lower super output area containing the TF centroid, the area (m2) of public parks and allotments in the buffer taken from OS Open Greenspace data, and the area (m2) of deciduous woodland, taken from Priority Habitat Inventory data.

Typology of green infrastructure

Jones, Laurence, Sally Anderson, Jeppe Læssøe, Ellen Banzhaf, Anne Jensen, David Neil Bird, James Miller, et al. 2022. “A Typology for Urban Green Infrastructure to Guide Multifunctional Planning of Nature-Based Solutions.” Nature-Based Solutions 2 (December): 100041. https://doi.org/10.1016/j.nbsj.2022.100041.

  1. Note this may sometimes time out depending on traffic on the NBN Atlas webservice↩︎