Upgrading ForIt design

Species codes are essential keys

The functions available in the package estimate volume and phytomass for a given species or for a group of similar species. An unequivocal species identification procedure is hence essential to access the system.
“European and Mediterranean Plant Protection Organization” EPPO offers a coding system accessible also via REST API services, a basic building block for the develoment of a procedure for species codes selection, dynamically based on EPPO services.
ForIt ver 2 will adopt EPPO codes and rely on its web services to match species names with avaliable volume and weight estimation functions.

The package will include a subset of EPPO species DB, limited to the species named in the reference work for INFC volume estimation (Tabacchi et al., 2011) and to a subset of connected attributes, to provide a local key to access the available functions.

This set is obtained using EPPO on line tools - Batch processing. Quick and simple login is required to access the tools. Uploading an Excel file with a column containing species names, corresponding codes can be found in the successive column of the Execel file one can download after processing.

Tabacchi, G., L. Di Cosmo, Patrizia Gasparini, e S. Morelli. 2011. Stima del volume e della fitomassa delle principali specie forestali italiane. Equazioni di previsione, tavole del volume e tavole della fitomassa arborea epigea. CRA.

library(tidyverse)
library(readxl)
library(writexl)
require(Matrix)

library(ForIT)
data("INFCstats")
data("INFCdomain")

INFCstats %>% 
  dplyr::select(specie) %>% 
  unique %>%
  write_xlsx(path = "INFC_bare_species_list.xlsx") 

Batch processing with EPPO] the list obtained from ForIt ver.1 data, two names are not recognised.

INFCstats %>% 
  dplyr::select(specie) %>% 
  unique %>%
  mutate(specie2 = as.character(specie)) %>%
# ci sono 2 correzioni da apportare: angustiGolia e togliere spp.
  mutate(specie2 = replace(specie2, specie2 == "Fraxinus angustigolia", "Fraxinus angustifolia")) %>%
  mutate(specie2 = replace(specie2, specie2 == "Eucalyptus spp.", "Eucalyptus")) %>%
  write_xlsx(path = "INFC_corr_species_list.xlsx") 

Using the ‘Batch processing’ EPPO tool, EPPOCodes have been coupled with species (specie2) names producing EPPO_01.xlsx file Further processing the latter, following attributes have been added * PrefName,
* Synonyms_la, # arguable info!!
* the Taxonomy columns,
producing EPPO_04.xlsx.
The last step sligthly corrupts the table: what I suppose should be the ‘Subclass’ info (only defined for some classes), ends up in the first column, substituting only partially its content.
The original content of the first column, key connecting this table with the version 1 tables, has to be recovered joining, on EPPOCode, EPPO_01.xlsx and EPPO_04.xlsx.
Modifications with respect to its original content have to be moved to the Subclass column (My hypotesis!!)

INFC_EPPO_db <- read_xlsx("EPPO_01.xlsx") %>%
  dplyr::select(specie, EPPOCode) %>%
  full_join(unique(dplyr::select(INFCstats, specie, pag))) %>%
  rename(Species_ForIt1 = specie) %>%
  inner_join(read_xlsx("EPPO_04.xlsx"), by = c("EPPOCode" = "EPPOCode")) %>%
  mutate(Subclass = ifelse(Species_ForIt1 == specie, Subclass, specie)) %>%
  dplyr::select(-specie, -specie2)
## Joining, by = "specie"
## Warning: Column `specie` joining character vector and factor, coercing into
## character vector
# just for a quick check
INFC_EPPO_db %>% dplyr::select(1:5) %>% filter(Species_ForIt1 != PrefName)
## # A tibble: 4 x 5
##   Species_ForIt1   EPPOCode   pag PrefName      Synonyms_la               
##   <chr>            <chr>    <int> <chr>         <chr>                     
## 1 Eucalyptus spp.  1EUCG      287 Eucalyptus    <NA>                      
## 2 Fraxinus angust~ FRXAN      315 Fraxinus ang~ Fraxinus angustifolia sub~
## 3 Pinus laricio    PIUNL      117 Pinus nigra ~ Pinus laricio;Pinus laric~
## 4 Populus canesce~ POPCN      399 Populus x ca~ Populus alba x P. tremula~

Catalog of INFC functions

There are 26 distinct function groups, estimating stem volume and 4 tree biomass dryweigth components (see below).
The five enities are estimated based on a common pool of samples for each group sharing also model structure.
n_par structure
2 y = b_0 + b_1 * d^2 * h
3 y = b_0 + b_1 * d^2 * h + b_2 * d
Groups are identified through the page number (pag) of the reference publication.

INFCcatalog <- INFCstats %>%
  dplyr::select(pag, n_oss, n_par) %>%
  unique

The key to the individual functions is given combinig the ‘page number’ with the acronym of the estimated entity.
For each function, beside parameter values, also the var-covar matrix as well as the standard error have been reported in the publication and hence acqired in the data base.

(estimated_entities <- tribble(
  ~entity, ~entity_definition,
  "vol", "volume of the stem and large branches [dm^3]",
  "dw1", "phytomass of the stem and large braches [kg]", 
  "dw2", "phytomass of the small branches [kg]",
  "dw3", "phytomass of the stump [kg]", 
  "dw4", "phytomass of the whole tree [kg]"))
## # A tibble: 5 x 2
##   entity entity_definition                           
##   <chr>  <chr>                                       
## 1 vol    volume of the stem and large branches [dm^3]
## 2 dw1    phytomass of the stem and large braches [kg]
## 3 dw2    phytomass of the small branches [kg]        
## 4 dw3    phytomass of the stump [kg]                 
## 5 dw4    phytomass of the whole tree [kg]
# assemble b-coefficients matix (it is just an array!)
assemble_bm <- function(n, b0, b1, b2) {
  if (n == 2)   x <- array(c(b0, b1), dim = c(n, 1)) else
    if (n == 3) x <- array(c(b0, b1, b2), dim = c(n, 1)) else
                x <- NULL
}

# assemble variance-covariance-matix (only the upper triangle is required)
assemble_vcm <- function(n, m1, m3, m4, m5, m7, m8, m9) {
  if (n == 2)   x <- c(m1, m3, m4) else
    if (n == 3) x <- c(m1, m4, m7, m5, m8, m9) else
      x <- NULL
  return(new("dspMatrix", Dim = as.integer(c(n,n)), x = x))
}

INFCparam <- INFCstats %>%
  dplyr::select(pag, mod, b.1., b.2., b.3., mvc.1., mvc.3., mvc.4., mvc.5., mvc.7., mvc.8., mvc.9., saq) %>%
  unique
# There is an error: there are  pag+mod key replicates
INFCparam %>%
  group_by(pag, mod) %>%
  summarise(c = n()) %>%
  filter(c !=1) %>%
  inner_join(INFCparam)
## Joining, by = c("pag", "mod")
## # A tibble: 2 x 14
## # Groups:   pag [?]
##     pag mod       c  b.1.    b.2.  b.3. mvc.1.   mvc.3.  mvc.4. mvc.5.
##   <int> <fct> <int> <dbl>   <dbl> <dbl>  <dbl>    <dbl>   <dbl>  <dbl>
## 1   231 dw2       2  5.33 0.00556     0  0.332 -1.59e-4 2.80e-7      0
## 2   231 dw2       2  5.33 0.00556     0  0.332 -1.59e-4 2.80e-7      0
## # ... with 4 more variables: mvc.7. <dbl>, mvc.8. <dbl>, mvc.9. <dbl>,
## #   saq <dbl>
INFCparam <- INFCparam %>%
# correction of error above
        filter(!(pag == 231 & mod == "dw2" & saq == 0.00000028)) %>%
  add_column(entity = as.character(.$mod), .after = 1) %>%
  mutate(entity = replace(entity, entity == "v", "vol")) %>%
  dplyr::select(-mod) %>%
  inner_join(INFCcatalog) %>%
  mutate(bm = pmap(list(n_par, b.1., b.2., b.3.), assemble_bm),
         vcm = pmap(list(n_par, mvc.1., mvc.3., mvc.4., mvc.5., mvc.7., mvc.8., mvc.9.), assemble_vcm)) %>%
  dplyr::select(pag, entity, saq, bm, vcm)
## Joining, by = "pag"

Domanin of applicability

Interpolation functions produce unreliable extrapolations. It is quite important to verify, for every single estimation, if, at least, input values fall within the domain defined by the calibration sample extent. Actually original calibration dataset is not available, otherwise the density distribution of the observations could be used as a base for this evaluation. In the reference work the only information available concerning applicabilty domain of the functions proposed is given by the extent and limits of the tabulated versions of the functions. Tables charateristics are constant for each species (or species group): minimum amd maximum diameter and height are the same and the distribution of the table cells for which no value is calculated is identical.

library(raster)
library(rgeos)
outOfRangeDistance <- function(mat) {
#  mat[!mat] <- NA
#  mdrmat <- raster(vals = mat, crs = '+proj=utm +zone=12 +datum=WGS84',
  rst <- raster(vals = mat, crs = '+proj=utm +zone=12 +datum=WGS84',
                 nrows = dim(mat)[1],
                 ncols = dim(mat)[2],
                 xmn = range(as.integer(dimnames(mat)[[2]]))[1],
                 xmx = range(as.integer(dimnames(mat)[[1]]))[2],
                 ymn = range(as.integer(dimnames(mat)[[1]]))[1],
                 ymx = range(as.integer(dimnames(mat)[[1]]))[2])
  domLim <- rasterToPolygons(rst, fun = function(x) x==T, dissolve = T)
#    distance %>%
#    as.matrix
#  mdrmat[mdrmat == 0] <- NA
#  dimnames(mdrmat) <- dimnames(mat)
  return(domLim)
}

assemble_dm <- function(d, h) {
# Present domain as a lgl matrix
  d_range <- range(as.numeric(d))
  h_range <- range(as.numeric(h))
  dom <- matrix(F, 
                nrow = d_range[2] - d_range[1] +1, 
                ncol = h_range[2] - h_range[1] +1,
                dimnames = list(d = c(d_range[1]:d_range[2]),
                                h = c(h_range[1]:h_range[2]))
  )
  for (i in 1:length(d)) dom[d[i],h[i]] <- T
  return(outOfRangeDistance(dom))
}

INFCdomains <- INFCdomain %>%
  mutate(spg = map_chr(key, ~str_split(., "-")[[1]][1]), 
         dbh   = map_chr(key, ~str_split(., "-")[[1]][2]), 
         h_tot = map_chr(key, ~str_split(., "-")[[1]][3])) %>%
  dplyr::select(-key, -in.range) %>%
  left_join(mutate(unique(dplyr::select(INFCstats, spg, pag)), 
                   spg = as.character(spg))) 
## Joining, by = "spg"
# Some species codes mismatch:
# compare
INFCdomains %>% dplyr::select(spg, pag) %>% unique %>% filter(is.na(pag))
##    spg pag
## 1 Cusp  NA
## 2 Euoc  NA
## 3 Pips  NA
# with
INFCstats %>%  dplyr::select(spg, pag, specie) %>% unique %>%
  filter(str_detect(spg, "^(?:Cu|Eu|Pi)"))
##     spg pag                 specie
## 1  Cuar  47    Cupressus arizonica
## 2  Cuse  47 Cupressus sempervirens
## 3  Eusp 287        Eucalyptus spp.
## 4  Piab  75            Picea abies
## 5  Pice  89           Pinus cembra
## 6  Piha 103       Pinus halepensis
## 7  Pila 117          Pinus laricio
## 8  Pini 131            Pinus nigra
## 9  Pipi 145         Pinus pinaster
## 10 Pipi 159            Pinus pinea
## 11 Pira 187          Pinus radiata
## 12 Pist 187          Pinus strobus
## 13 Pisy 173       Pinus sylvestris
# and hence correct
INFCdomains <- INFCdomains %>%
  mutate(
    pag = case_when(spg == "Cusp" ~  47L,
                    spg == "Euoc" ~ 287L,
                    spg == "Pips" ~ 145L,
                    TRUE ~ pag))
INFCdomains %>% dplyr::select(spg, pag) %>% unique %>% filter(is.na(pag))
## [1] spg pag
## <0 rows> (or 0-length row.names)
INFCdomains %>% dplyr::select(pag) %>% unique %>% nrow 
## [1] 26
# If 'pag', contains no NA and all 26 entry pages are considered, then we can proceed
INFCdomains <- INFCdomains %>%
  unique %>%
  group_by(pag) %>%
  nest() %>%
  mutate(dom = map(data, ~assemble_dm(.$dbh, .$h_tot))) #%>%
#  dplyr::select(-data) ## Dominio in ForIT-1, da eliminare conclusa la fase di test

Visual check of the domains

# for domains represented as poligons
# st_distance(x, y, ..., dist_fun, by_element = FALSE, which = "distance",  par = 0, tolerance = 0)



## OLD for Domains represented by matrices
# library(magrittr)
# plotDom <- function(x, t) {
#   image(as.integer(dimnames(x)[[1]]), 
#       as.integer(dimnames(x)[[2]]), 
#       x, 
#       ylim = rev(range(as.integer(dimnames(x)[[2]]))),
#       xlab = 'h_tot',
#       ylab = 'dbh',
#       col = (terrain.colors(12)),
#       main = paste("Page:", pag)
#   )
# }

# INFCdomains %$% map2(dom, pag, plotDom)
rmarkdown::render("PreparingUpgrade.Rmd")
markdown::rpubsUpload("ForIt CRAN package - Preparing Upgrade", "PreparingUpgrade.html",
                      id = "https://api.rpubs.com/api/v1/document/440006/6be3bb33c3004b5cad3cc32d8baae593")
# continueUrl
# [1] "http://rpubs.com/publish/claim/440006/37856ca880654cfcaefc313d451bf664"
# http://rpubs.com/scotti/440006