Summary

The R programming language and environment can be really useful for geographic data analysis and modeling due to its capacity of reading, creating, manipulating and analyzing of spatial data with general raster data manipulation functions (Hijmas, 2020; Hijmas, 2021), making static and interactive maps, applying geocomputation to solve real-world problems, etc. (Lovelace, Nowosad & Muenchow, 2021). We can handle several research objectives related with remote sensing, spatial ecology, land use and land cover issues, agriculture, urban expansion, climate change, biological invasions, etc., that involves multitemporal information either in raster or shape data. In this case, we want to get habitat information from land cover data of the continental part of Colombia, as an external input data to improve species distributions models (SDM). The land cover product used here is CORINE Land Cover adjusted to Colombia, and the output result is a habitat map as defined in the International Union for Conservation of Nature (IUCN) habitat classification scheme (Jung et al., 2020). The habitat map had to be reclassified to set links between CORINE land covers and IUCN habitats.

Introduction

The patterns that shape the spatial and temporal distribution of the species is one of the most fundamental pieces of ecological information (Beale & Lennon, 2012). Species distributions models (SDMs) explore the relationship between geographical occurrences of species and corresponding environmental variables (Naimi & Araujo, 2016) and essentially are based in spatial (and some in temporal) interpolation and extrapolation for the present or into the future. The accuracy of each model will depend on how can handle the different processes that a species have to occur within an ecological community like dispersal, environmental tolerance, species interactions, biogeographical history, etc.(Li et al., 2020).

Species distribution modeling has grown rapidly in the last years and with these the software and tools available to display this analysis (specially MaxEnt standalone applications and R programming packages like sdm, zoon R, phyr, etc), and select the most reliable variables for efficient calibrations (kuenm, ENMTools, ENMeval, MaxentVariableSelection, etc) (Golding et al., 2017; Cobos et al., 2019a; 2019b). Each model with its combination of parameters can identify uncertainty in different levels to see where future improvements can be made to diminish overprediction or subprediction on species distribution. One of the ways to identify sources of uncertainty is through acknowledging the species habitat.

According to the definition of Kearney (2006), habitat is a description of a physical place, at a particular scale of space and time, where an organism either actually or potentially lives. So we have an approximation of the habitat if we know the land covers within a particular area of interest, and this combined with sufficient ecological information of a specie could be an interesting input to evaluate a candidate model. This work flow is the second of two documents indicating the process to homologate land covers with habitats. This document will show the homologation between shape land cover data from CORINE Land Cover Colombia and habitat classification scheme of International Union for Conservation of Nature (IUCN).

Package overview

The functions used here are from different packages to read, manipulate and write geographical data in shape and raster format. This libraries and functions are as follows:

  • library(sf): refers to a formal standard that describes objects in the real world as a computer representation
  • read_sf() read shapefiles with a format easily to read for fasterize() function
  • library(dplyr) grammar of data manipulation
  • mutate() create, modifiy and delete columns, necessary for the reclassification of the shapefile
  • library(fasterize) fast polygon to raster conversion, improve version of rasterize() function
  • library(raster)
  • raster() load the raster document with land cover data.
  • reclassify() the land cover data to make the homologation.
  • projectRaster() change the projection of the input raster into another raster of interest.
  • resample() transfers values between non matching Raster objects in terms of origin and resolution.
  • overlay() create a new raster based on two or more raster objects (Similar to raster calculator in SIG software).
  • rasterToPoints() Raster to point conversion.
  • rasterFromXYZ() Create a Raster object with x,y and z values, being x and y the spatial coordinates.
  • writeRaster() write an entire Raster object to a file.

CORINE Land Cover Colombia

As well as ESA’s products are good references of coverage at a global level, in the case of Colombia there is the National Legend of Land Cover, CORINE Land Cover methodology adapted for Colombia scale 1: 100,000 which consolidates the characterization of the natural and anthropized covers present in the Colombian territory and has been an input for the study of various national processes such as ecosystem maps, land use conflict, watershed and territory management, monitoring of forest deforestation, forest inventories, etc. (IDEAM, 2010). This coverage map can be downloaded for free from the IDEAM website http://www.ideam.gov.co/web/ecosistemas/coberturas-nacionales. The same homologation exercise between covers and habitats will be carried out as was presented in the first document with the ESA product, only this time it will be based on Shapefile data.

corine <- read_sf("CbTr_2010_2012.shp")

Shapefile preparation (Optional)

Variables from the Shapefile that were added in other routines will be removed. This step does not need to be replicated.

colnames(corine)
##  [1] "OBJECTID_1" "OBJECTID"   "APOYO"      "CAMBIO"     "CONFIABILI"
##  [6] "INSUMO"     "CODIGO"     "LEYENDA3N"  "Area_ha"    "Cod"       
## [11] "Categoria"  "Cod_Finer"  "Cod_ESA"    "Cod_Coper"  "geometry"
eliminar <- c("OBJECTID_1", "APOYO", "CAMBIO", "CONFIABILI", "INSUMO", "Cod_ESA", "Cod_Coper", "Corine_hab")
corine_2 <- corine[, !(names(corine) %in% eliminar)]

Homologation

Here we present the proposal of homologation between CORINE Land Cover and IUCN Habitat classification. For the most of the habitat classes we used the third level of Corine´s legend, but some important habitats are best differentiated with fourth levels or higher land cover codes. That is why in the next code we used in some cases the “LEYENDA3N” variable and in others the “CODE” variable of the input shapefile.

corine_2 <- corine_2 %>%
  mutate(
    habitat = case_when(
      LEYENDA3N %in% "1.1.1. Tejido urbano continuo" ~ 1404,
      LEYENDA3N %in% "1.1.2. Tejido urbano discontinuo" ~ 1404,
      LEYENDA3N %in% "1.2.1. Zonas industriales o comerciales" ~ 1404,
      LEYENDA3N %in% "1.2.2. Red vial, ferroviaria y terrenos asociados" ~ 1404,
      LEYENDA3N %in% "1.2.3. Zonas portuarias" ~ 1404,
      LEYENDA3N %in% "1.2.4. Aeropuertos" ~ 1404,
      LEYENDA3N %in% "1.2.5. Obras hidraulicas" ~ 1404,
      LEYENDA3N %in% "1.3.1. Zonas de extraccion minera" ~ 1404,
      LEYENDA3N %in% "1.3.2. Zona de disposicion de residuos" ~ 1404,
      LEYENDA3N %in% "1.4.1. Zonas verdes urbanas" ~ 1404,
      LEYENDA3N %in% "1.4.2. Instalaciones recreativas" ~ 1404,
      LEYENDA3N %in% "2.1.1. Otros cultivos transitorios" ~ 1401,
      LEYENDA3N %in% "2.1.2. Cereales" ~ 1401,
      LEYENDA3N %in% "2.1.3. Oleaginosas y leguminosas" ~ 1401,
      LEYENDA3N %in% "2.1.4. Hortalizas" ~ 1401,
      LEYENDA3N %in% "2.1.5. Tuberculos" ~ 1401,
      LEYENDA3N %in% "2.2.1. Cultivos permanentes herbaceos" ~ 1401,
      LEYENDA3N %in% "2.2.2. Cultivos permanentes arbustivos" ~ 1401,
      LEYENDA3N %in% "2.2.3. Cultivos permanentes arboreos" ~ 1401,
      LEYENDA3N %in% "2.2.4. Cultivos agroforestales" ~ 1401,
      LEYENDA3N %in% "2.2.5. Cultivos confinados" ~ 1401,
      LEYENDA3N %in% "2.3.1. Pastos limpios" ~ 1402,
      LEYENDA3N %in% "2.3.2. Pastos arbolados" ~ 1402,
      LEYENDA3N %in% "2.3.3. Pastos enmalezados" ~ 1402,
      LEYENDA3N %in% "2.4.1. Mosaico de cultivos" ~ 1401,
      LEYENDA3N %in% "2.4.2. Mosaico de pastos y cultivos" ~ 1401,
      LEYENDA3N %in% "2.4.3. Mosaico de cultivos, pastos y espacios naturales" ~ 1401,
      LEYENDA3N %in% "2.4.4. Mosaico de pastos con espacios naturales" ~ 1402,
      LEYENDA3N %in% "2.4.5. Mosaico de cultivos con espacios naturales" ~ 1401,
      CODIGO %in% 311 ~ 105,
      CODIGO %in% 3111 ~ 105,
      CODIGO %in% 31111 ~ 105,
      CODIGO %in% 31112 ~ 107,
      CODIGO %in% 311121 ~ 107,
      CODIGO %in% 311122 ~ 107,
      CODIGO %in% 311123 ~ 107,
      CODIGO %in% 3112 ~ 105,
      CODIGO %in% 31121 ~ 105,
      CODIGO %in% 31122 ~ 107,
      CODIGO %in% 312 ~ 105,
      CODIGO %in% 3121 ~ 105,
      CODIGO %in% 31211 ~ 105,
      CODIGO %in% 31212 ~ 107,
      CODIGO %in% 3122 ~ 105,
      CODIGO %in% 31221 ~ 105,
      CODIGO %in% 31222 ~ 107,
      LEYENDA3N %in% "3.1.3. Bosque fragmentado" ~ 105,
      LEYENDA3N %in% "3.1.4. Bosque de galeria y ripario" ~ 107,
      LEYENDA3N %in% "3.1.5. Plantacion forestal" ~ 1401,
      CODIGO %in% 321 ~ 405,
      CODIGO %in% 3211 ~ 405,
      CODIGO %in% 32111 ~ 405,
      CODIGO %in% 321111 ~ 405,
      CODIGO %in% 321112 ~ 405,
      CODIGO %in% 321113 ~ 405,
      CODIGO %in% 32112 ~ 406,
      CODIGO %in% 321121 ~ 406,
      CODIGO %in% 321122 ~ 406,
      CODIGO %in% 321123 ~ 406,
      CODIGO %in% 3212 ~ 405,
      CODIGO %in% 32121 ~ 405,
      CODIGO %in% 32122 ~ 405,
      LEYENDA3N %in% "3.2.2. Arbustal" ~ 305,
      LEYENDA3N %in% "3.2.3. Vegetacion secundaria o en transicion" ~ 1406,
      LEYENDA3N %in% "3.3.1. Zonas arenosas naturales" ~ 600,
      LEYENDA3N %in% "3.3.2. Afloramientos rocosos" ~ 600,
      LEYENDA3N %in% "3.3.3. Tierras desnudas y degradadas" ~ 600,
      LEYENDA3N %in% "3.3.4. Zonas quemadas" ~ 600,
      LEYENDA3N %in% "3.3.5. Zonas glaciares y nivales" ~ 600,
      LEYENDA3N %in% "4.1.1. Zonas Pantanosas" ~ 600,
      LEYENDA3N %in% "4.1.2. Turberas" ~ 600,
      LEYENDA3N %in% "4.1.3. Vegetacion acuatica sobre cuerpos de agua" ~ 600,
      LEYENDA3N %in% "4.2.1. Pantanos costeros" ~ 1300,
      LEYENDA3N %in% "4.2.2. Salitral" ~ 1300,
      LEYENDA3N %in% "4.2.3. Sedimentos expuestos en bajamar" ~ 1300,
      LEYENDA3N %in% "5.1.1. Rios (50 m)" ~ 500,
      LEYENDA3N %in% "5.1.2. Lagunas, lagos y cienagas naturales" ~ 500,
      LEYENDA3N %in% "5.1.3. Canales" ~ 1500,
      LEYENDA3N %in% "5.1.4. Cuerpos de agua artificiales" ~ 1500,
      LEYENDA3N %in% "5.2.1. Lagunas costeras" ~ 1300,
      LEYENDA3N %in% "5.2.2. Mares y oceanos" ~ 900,
      LEYENDA3N %in% "5.2.3. Estanques para acuicultura marina" ~ 1500,
      TRUE ~ 1700
    )
  )

Land Cover rasterization with fasterize function

As an input for uncertainty protocols in species distribution models, habitat information from land covers can be more easily manipulated in raster format. Since a considerably large area is being shaped to raster, it is recommended to use the fasterize function, which is an enhancement to the rasterize function, processing the supplied shapefile information about a hundred times faster. In the raster_template attribute of the fasterize function, the habitat map built with the ESA land cover data will be located, or any other raster with the desired extent, dimensions and resolution can be placed.

raster_template <- raster("Habitats_ESA.tif") # If you want to upload a raster file
#raster_template <- esa_hab #If you already have your raster template
clc_raster <- fasterize(corine_2, raster_template, field = "habitat", fun = "sum")
plot(clc_raster, main= "Homologación Coberturas-Hábitats CORINE")

Add altitude variable

The IUCN has habitats with a very particular geographic delimitation for Colombia, but it is not possible to assign a coverage category to them. These habitats include 1.9. Forest - Subtropical/Tropical moist montane, 3.7 Shrubland - High altitude and 4.7 Grassland - High altitude. Many species use these habitats in Colombia, so they must be differentiated; For this, a DEM of 90 meters will be used that will allow the delimitation of these habitats. To do this correctly, the DEM must be reprojected and have the same extent, resolution and dimensions as the ESA map.

dem_colombia <- raster("dem_colombia3.tif")
dem_proj <- projectRaster(dem_colombia, crs = crs(clc_raster))
dem_al <- resample(dem_proj, clc_raster, method= "ngb")

It was proposed to establish a limit of 500 meters to differentiate lowland from montane habitats.

corine_hab <- overlay(dem_al, clc_raster, 
                       fun= function(altura, cover){
  ifelse((altura >= 500) & (cover == 105),109,
         ifelse((altura >= 500) & (cover == 305),307,
                ifelse((altura >= 500) & (cover == 405),407,cover)))
})

plot(corine_hab, main= "Habitats_CORINE_300m") #Visualmente se ve igual al anterior pero es por la disposición de los códigos

Map visualization

This is a representation with colors more suited to the kinds of habitats that can be generated. However, as it requires an additional transformation of the data, it is suggested to work with the map corine_hab.

dfc <- data.frame(rasterToPoints(corine_hab))
raster_testc <- rasterFromXYZ(dfc[1:3])
cut <- sort(unique(corine_hab))
mypal <- c("darkcyan", "darkgreen", "darkorchid", "red", "chartreuse", "aquamarine3", "cyan", "blue3", "darkorange", "black", "darkkhaki", "hotpink", "blanchedalmond", "black")
plot(raster_testc,
     main= "Habitats_CORINE_300m",
     col= mypal,
     breaks= cut)

Create raster file

This habitat map built from ESA covers can be an important input in updating and refining routines for the construction of species distribution models. For this reason, it is important to save a copy in .tif format in the working directory.

#writeRaster(corine_hab, "Habitats_CORINE.tif", overwrite= T)

Confusion matrix between land cover products

This routine corresponds to the second one referring to homologation between land covers and habitats. Two homologation proposals have been built from ESA and CORINE products, and one way to compare them is through a confusion matrix. In this way, the degree of similarity that these two products have when studying the habitat of a species will be determined.

classes_esa <- sort(unique(raster_template))
classes_corine <- sort(unique(corine_hab))
classes_habitats <- sort(unique(c(classes_esa, classes_corine)))
rf1_corine <- factor(corine_hab[],levels= classes_habitats)
rf2_esa <- factor(raster_template[],levels= classes_habitats)
confusionMatrix(rf1_corine,rf2_esa)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction     101     105     107     109     305     307     405     406
##       101        0       0       0       0       0       0       0       0
##       105       33 4241130   20644    4826    8771      13   12251    4013
##       107       30  477756   65827   22665    4526    8577  111529   11402
##       109       34    4803    2422  953892      14    7803      47     513
##       305        0   50693    2722     291   18166      71   21783     499
##       307        9     284     237   62124      75   23835       8     149
##       405       31   67990    3321      50   17238      78  561377   13337
##       406       13   26024   19042      34    2437      27  283164  163376
##       407        5      70     179   34066     109   80282      32    2663
##       500       12   28897    9271    3772    2596     618    4828   15039
##       600       24   18426    9242    1178   16557    3429   36898   26368
##       900        0      41      67       0       0       0       2      14
##       1300      92      93    1602       0      90       0     129     614
##       1401      19  150220   11875  333998   14546   80739   73783    4000
##       1402      13  288453    9771  369901   96394  166728  195642    9903
##       1404      13    3072     310    1491    1735    1286    2656     980
##       1406       4  176960    6439  147996   11177   11829   10149     888
##       1500       0     338     254     359      77      36      34     160
##       1700       0     721     234    6783      14     324       0      40
##           Reference
## Prediction     407     500     600     900    1300    1401    1402    1404
##       101        0       0       0       0       0       0       0       0
##       105       37    1643    1070       0       0   12364    3974      50
##       107      742    6041     223       0       0   21631    3965      68
##       109     4356      30     208       0       0   10180    2209      58
##       305        6     385   17219       0       0   22214    2700      90
##       307     9364      13     103       0       0    2064      81      55
##       405       20     180    3784       0       0    1192     347      66
##       406        0    2820     320       0       0   15096    1098     156
##       407    83249      41    1490       0       0    3187     283     237
##       500      766   95311     166       0       0    7829     764     243
##       600     6673   10115   27732       0       0   16558    1977     365
##       900        0     127       2       0       0       2       0       0
##       1300       0    2938    2428       0       0      19       0      19
##       1401   20705    1891    1566       0       0  157569   27408    1364
##       1402   32363    3278     406       0       0  523120  196295    1575
##       1404    1385     348     604       0       0    8621     705   19866
##       1406    2200    1212      37       0       0   24725    5991      56
##       1500      21    3340      34       0       0     113      11      46
##       1700     110       0      46       0       0     317     112       4
##           Reference
## Prediction    1406    1500    1700
##       101        0       0       0
##       105    20709       0       0
##       107    57830       0       0
##       109    11049       0       0
##       305    25455       0       0
##       307     7894       0       0
##       405    94165       0       0
##       406    31575       0       0
##       407    11049       0       7
##       500     6145       0       0
##       600    20809       0     692
##       900        2       0       0
##       1300      32       0       0
##       1401  108115       0       0
##       1402  255912       0       0
##       1404    4255       0       0
##       1406   21889       0       0
##       1500     117       0       0
##       1700     227       0       0
## 
## Overall Statistics
##                                          
##                Accuracy : 0.5563         
##                  95% CI : (0.556, 0.5566)
##     No Information Rate : 0.4645         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.4426         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: 101 Class: 105 Class: 107 Class: 109 Class: 305
## Sensitivity           0.000e+00     0.7661   0.402713    0.49083   0.093388
## Specificity           1.000e+00     0.9858   0.938152    0.99562   0.987706
## Pos Pred Value              NaN     0.9791   0.083030    0.95617   0.111933
## Neg Pred Value        1.000e+00     0.8293   0.991224    0.90938   0.984998
## Prevalence            2.786e-05     0.4645   0.013716    0.16307   0.016322
## Detection Rate        0.000e+00     0.3559   0.005523    0.08004   0.001524
## Detection Prevalence  0.000e+00     0.3635   0.066524    0.08371   0.013618
## Balanced Accuracy     5.000e-01     0.8760   0.670432    0.74322   0.540547
##                      Class: 307 Class: 405 Class: 406 Class: 407 Class: 500
## Sensitivity            0.061801    0.42713    0.64332   0.513892   0.734784
## Specificity            0.992850    0.98097    0.96727   0.988627   0.993133
## Pos Pred Value         0.224234    0.73558    0.29967   0.383726   0.540750
## Neg Pred Value         0.969365    0.93250    0.99204   0.993270   0.997070
## Prevalence             0.032361    0.11028    0.02131   0.013593   0.010884
## Detection Rate         0.002000    0.04710    0.01371   0.006985   0.007997
## Detection Prevalence   0.008919    0.06404    0.04575   0.018204   0.014789
## Balanced Accuracy      0.527325    0.70405    0.80529   0.751260   0.863958
##                      Class: 600 Class: 900 Class: 1300 Class: 1401 Class: 1402
## Sensitivity            0.482816         NA          NA     0.19058     0.79177
## Specificity            0.985725  1.000e+00    0.999324     0.92514     0.83261
## Pos Pred Value         0.140741         NA          NA     0.15952     0.09131
## Neg Pred Value         0.997466         NA          NA     0.93877     0.99471
## Prevalence             0.004820  0.000e+00    0.000000     0.06938     0.02080
## Detection Rate         0.002327  0.000e+00    0.000000     0.01322     0.01647
## Detection Prevalence   0.016534  2.156e-05    0.000676     0.08288     0.18038
## Balanced Accuracy      0.734270         NA          NA     0.55786     0.81219
##                      Class: 1404 Class: 1406 Class: 1500 Class: 1700
## Sensitivity             0.816926    0.032321          NA   0.000e+00
## Specificity             0.997691    0.964445   0.9995855   9.993e-01
## Pos Pred Value          0.419760    0.051925          NA   0.000e+00
## Neg Pred Value          0.999625    0.942995          NA   9.999e-01
## Prevalence              0.002040    0.056825   0.0000000   5.865e-05
## Detection Rate          0.001667    0.001837   0.0000000   0.000e+00
## Detection Prevalence    0.003971    0.035372   0.0004145   7.495e-04
## Balanced Accuracy       0.907308    0.498383          NA   4.996e-01

It is observed that the two land cover products have a similarity of only 55.63%. This difference must be taken into account when choosing a land cover product as an approximation to the habitat of the species, depending on whether what is desired is a better discrimination between habitats (Offered by CORINE), or if monitoring is preferred at different instants of time (Offered by ESA and its annual update).

References

Beale, C. M. & Lennon, J. J. (2012). Incorporating uncertainty in predictive species distribution modelling. Philosophical Transactions of the royal society B, 367, 247-258.

Cobos, M. E., Peterson, A. T., Barve, N. & Osorio-Olvera, L. (2019a). kuenm: an R package for detailed development of ecological niche models using Maxent. Bioinformatics and genomics. PeerJ 7:e6281 https://doi.org/10.7717/peerj.6281

Cobos, M. E., Peterson, A. E., Osorio-Olvera, L. & Jiménez-García, D. (2019b). An exhaustive analysis of heuristic methods for variable selection in ecological niche modeling and species distribution modeling. Ecological Informatics, 53, 1-10.

Golding, N., August, T. A., Lucas, T. C. D., Gavaghan, D. J., van Loon, E. E. & McInerny, G. (2017). The zoon r package for reproducible and shareable species distribution modelling. Methods in ecology and evolution, 9, 260-268.

Hijmas, R. (2020). Raster package v.3.4-5. Available in https://www.rdocumentation.org/packages/raster/versions/3.4-5

Hijmas, R. (2021). The raster package. Available in https://rspatial.org/raster/pkg/RasterPackage.pdf

IDEAM (2010). Leyenda Nacional de Coberturas de la Tierra. Metodología CORINE Land Cover, adaptada para Colombia Escala 1:100.000. Instituto de Hidrología, Metereología y Estudios Ambientales. Bogotá, D.C., 72 p.

Jung, M., Dahal, P. R., Butchart, S. H. M., et al. (2020). A global map of terrestrial habitat types. Nature - Scientific data, 7:256, 1-8

Kearney, M. (2006). Habitat, environment and niche: what are we modelling?. OIKOS, 115, 186-191.

Li, D., Dinnage, R., Nell, L. A., Helmus, M. R. & Ives, A. R. (2020). phyr: An r package for phylogenetic species‐distribution modelling in ecological communities. Methods in ecology and evolution, 11, 1455-1463.

Lovelace, R., Nowosad, J. & Muenchow, J. (2021). Geocomputation with R. The R series. CRC Press Taylor & Francis Group. Available in: https://geocompr.robinlovelace.net/

Naimi, B. & Araújo, M. B. (2016). sdm: a reproducible and extensible R platform for species distribution modelling. Ecography, 39, 368-375.