Introduction

This document serves as a detailed guide through LULCC and associated R-packages for analysing land use change. Let's start by installing the packages. In addition to LULCC we recommend that you also install the packages below, since some dependencies of LULCC do not seem to be installed automatically but are required for the example code to work:

install.packages("lulcc")
install.packages("gsubfn")
install.packages("caret")
install.packages("Hmisc")
install.packages("randomForest")

Package overviews

LULCC

LULCC (Land Use and Land Cover Change) is a package designed for modelling change in land use and land cover. LULCC is not specifically designed for ecology or biodiversity tasks, but for all kinds of land use/land cover change.

To model land use change we require multiple inputs:

  1. At least one classified land use map (every pixel may only hold one value)
  2. Raster files with predictor variables (e.g. slope, temperature, socio-economic data etc.).

In theory LULCC would work with only one input map and nothing else, no subsequent timesteps or predictor variables. However, the modelled change would then be based solely on the input map and the selected algorithm (bad practice). With the aforementioned additional inputs we can use additional information (timesteps) and expert knowledge (by selecting explanatory variables) to make our model more accurate! So: think about what factors might affect your land use changes before you start modelling.

The choice for a specific model is very much dependent on the nature of the most important land-use conversions taking place within the study area and the scenarios that need to be considered. (Verburg et al. 2002)

NLMR

NLMR (Neutral Landscape Models in R) is a package designed to create imaginary landscapes with algorithms, for ecological applications. In ecology, a neutral landscape model serves as a 'null hypothesis', in contrast to actual real-world landscapes. They are used to test, for example:

  • Landscape connectivity
  • Evaluation of consequences of habitat fragmentation for population subdivision
  • Occurrence of extinction thresholds

If this doesn't sound too familiar, don't worry. We aren't ecologists and this is somewhat outside our wheelhouse. Until we do a deep dive into ecology, the most important thing we have to know about NLMR is that we can create modelled landscapes without any input data.

landscapetools

This small package was designed alongside NLMR and holds basic raster-based functions like classification, binary classification and more, with a focus on ecology applications.

Plum Island example

LULCC contains two datasets to test its functionality. We will go through the Plum Island (pie) dataset to get familiar with the LULCC workflow. You can find the same example on GitHub, but since the functions used and the underlying models are very complex, we try to give a more thorough and newbie-friendly explanation.

Let's start by loading the package and creating an ObsLulcRasterStack object from the test data:

library(lulcc)
#> Loading required package: raster
#> Loading required package: sp
data(pie)
obs <- ObsLulcRasterStack(x=pie,
                          pattern="lu",
                          categories=c(1,2,3),
                          labels=c("Forest","Built","Other"),
                          t=c(0,6,14))

categories should be a numeric vector that represents the codes for your land use classes, labels are the corresponding labels. t is a numeric vector that represents the timesteps of your land use maps. In this example we have one map each for 1985, 1991 and 1999. The numbers in t are the years passed since your first timestep. (Yes, this is the first of many weird data/naming conventions)
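
If you are working with your own data instead of the pie dataset, the same object can be built from your own rasters. The sketch below is hypothetical: the file and layer names are made up, and we assume that ObsLulcRasterStack() also accepts a RasterStack; make sure the layer order corresponds to the order of the timesteps in t.

# Hypothetical sketch for your own data (file and layer names are made up)
lu.maps <- raster::stack("landuse_1985.tif",
                         "landuse_1991.tif",
                         "landuse_1999.tif")
names(lu.maps) <- c("lu_own_01", "lu_own_02", "lu_own_03")  # must contain "lu"
obs.own <- ObsLulcRasterStack(x=lu.maps,
                              pattern="lu",
                              categories=c(1,2,3),
                              labels=c("Forest","Built","Other"),
                              t=c(0,6,14))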

As a first view of how our land use classes have changed from 1985 to 1991 we can generate a transition matrix like this:

crossTabulate(obs, times=c(0,6))
#>        Forest Built Other
#> Forest  46672  1926   415
#> Built       0 37085    37
#> Other     359  1339 25730

This is relatively straightforward since we know it from remote sensing methodology. The rows are the t=0 (1985) classes, the columns are the t=6 (1991) classes, so the main change that occurred was from forest to built.
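
If you prefer relative numbers, you can normalize the rows of this matrix (a quick sketch, assuming crossTabulate() returns a plain table/matrix):

# Row-wise proportions: how much of each 1985 class persisted or changed
round(prop.table(as.matrix(crossTabulate(obs, times=c(0,6))), margin=1), 3)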

Explanatory variables

The next step is to include explanatory variables that attempt to explain the observed land use classes:

ef <- ExpVarRasterList(x=pie, pattern="ef")

Included in the example data are elevation, slope and distance to built land. In reality, as we can see in Verburg et al. (2002), many more variables should probably be included, depending on which land use classes we want to observe. For the example data, loading and using explanatory variables is easy, but we will see later that many problems can arise once you work with your own data.

To better understand the explanatory variables we can plot them:

plot(pie$ef_001, main = "ef_001")
plot(pie$ef_002, main = "ef_002")
plot(pie$ef_003, main = "ef_003")

One oddity of LULCC lies in the naming convention for explanatory variables. In the code above we see that the variables are named, for example, ef_001. This is because the names have to follow a distinct pattern, otherwise the LULCC functions are not able to properly read and apply them. The pattern is always:

  • xx: (two) characters that follow a pattern, to differentiate between land use data and explanatory variables (we recommend keeping it to exactly two characters, better safe than sorry!)
  • _
  • 001: three numbers, to indicate the variable (e.g. 001 for slope, 002 for elevation, etc.)
  • _
  • 01: two numbers, to indicate the timestep. Attention! Unlike before, you do not use the difference in years but simply number the timesteps chronologically starting at 01. In our example this would be 01 for 1985, 02 for 1991 and 03 for 1999, but only if explanatory variables are available for more than one timestep. If not, don't include the last underscore and two numbers!

In the end your explanatory variables should look like this: ef_003_02 or ef_001
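
To make this concrete, here is a hypothetical sketch of how you might name your own layers so that ExpVarRasterList() can find them. The file names are made up, and we assume the function also accepts a RasterStack:

# Hypothetical naming sketch (file names are made up)
my.ef <- raster::stack("slope.tif", "elevation.tif",
                       "dist_built_1985.tif", "dist_built_1991.tif")
names(my.ef) <- c("ef_001",      # slope, static (no timestep suffix)
                  "ef_002",      # elevation, static
                  "ef_003_01",   # distance to built land, first timestep
                  "ef_003_02")   # distance to built land, second timestep
ef.own <- ExpVarRasterList(x=my.ef, pattern="ef")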

Model training

We then separate our land use maps into training and testing partitions, and extract the cell values:

part <- partition(x=obs[[1]],
                  size=0.3, spatial=TRUE)

# extract training data
train.data <- getPredictiveModelInputData(obs=obs,
                                          ef=ef,
                                          cells=part[["train"]],
                                          t=0)

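# extract testing data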
test.data <- getPredictiveModelInputData(obs=obs,
                                         ef=ef,
                                         cells=part[["test"]],
                                         t=0)

Note that getPredictiveModelInputData() extracts both class values and the values of explanatory variables. In our case we will have 6 values per pixel.
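
A quick look at the first rows confirms this (the class columns plus the three explanatory variables):

head(train.data)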

The next step is constructing our models (as mentioned on the GitHub page, one model is required for each land use class):

forms <- list(Built~ef_001+ef_002+ef_003,
              Forest~ef_001+ef_002,
              Other~ef_001+ef_002)

Here we construct a list of our land use categories and specify which of our explanatory variables we want to use to predict the respective class. "ef_003" for example is our variable "distance to built area", and in the list we see that we only use it to predict built area (because we assume spatial autocorrelation).

Spatial autocorrelation is also the reason why we partitioned our data in the previous step. Logistic regression assumes the data to be independent and identically distributed, which is violated by spatial autocorrelation. Partitioning works against this by selecting a random subset of the data (Moulds et al. 2015).

We can then generate our predictive models:

glm.models <- glmModels(formula=forms,
                        family=binomial,
                        data=train.data,
                        obs=obs)
rpart.models <- rpartModels(formula=forms,
                            data=train.data,
                            obs=obs)

LULCC supports generalized linear models (glmModels), recursive partitioning (rpartModels) and random forests (randomForestModels). randomForestModels may take some time to compute, depending on the size of your dataset. We won't run the random forest models for now.

rf.models <- randomForestModels(formula=forms,
                                data=train.data,
                                obs=obs)

Predictive modeling

Once the models are fitted we can use them to predict over our whole dataset:

all.data <- as.data.frame(x=ef, obs=obs, cells=part[["all"]])

# GLM
probmaps <- predict(object=glm.models,
                    newdata=all.data,
                    data.frame=TRUE)
points <- rasterToPoints(obs[[1]], spatial=TRUE)
probmaps <- SpatialPointsDataFrame(points, probmaps)
probmaps <- rasterize(x=probmaps, y=obs[[1]],
                      field=names(probmaps))
rasterVis::levelplot(probmaps)

We can execute the above code the same way with our rpart model by changing the model passed to predict():

all.data <- as.data.frame(x=ef, obs=obs, cells=part[["all"]])

# rpart
probmaps <- predict(object=rpart.models,
                    newdata=all.data,
                    data.frame=TRUE)
points <- rasterToPoints(obs[[1]], spatial=TRUE)
probmaps <- SpatialPointsDataFrame(points, probmaps)
probmaps <- rasterize(x=probmaps, y=obs[[1]],
                      field=names(probmaps))
rasterVis::levelplot(probmaps)

Attention! When working with your own dataset you may encounter an error because rasterize() tries to allocate very large vectors (950 GB in one case).
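
If you run into this, one possible workaround (a sketch we have not benchmarked) is to skip rasterize() entirely and write the predicted probabilities straight into empty raster templates via the cell numbers of the points:

# Workaround sketch: 'probmaps' is the data frame returned by predict()
# (before it is overwritten) and 'points' the SpatialPoints object from
# rasterToPoints(), as in the code above.
cells <- raster::cellFromXY(obs[[1]], points)      # cell numbers of the points
prob.stack <- raster::stack(lapply(names(probmaps), function(cls) {
  r <- raster::raster(obs[[1]])                    # empty template, same geometry
  r[cells] <- probmaps[[cls]]                      # fill only the predicted cells
  names(r) <- cls
  r
}))
rasterVis::levelplot(prob.stack)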

Model performance assessment

Assessing model performance in LULCC is done with the ROCR R-package:

glm.pred <- PredictionList(models=glm.models,
                           newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
                            measure="rch")
rpart.pred <- PredictionList(models=rpart.models,
                             newdata=test.data)
rpart.perf <- PerformanceList(pred=rpart.pred,
                              measure="rch")
plot(list(glm=glm.perf,
          rpart=rpart.perf))

ROC stands for “receiver operating characteristic” and is a graphical approach for analyzing the performance of a classifier, based on the ratio of true-positive and false-positive classifications. In LULCC terminology, correctly simulated change is referred to as a hit, while change simulated as change to the wrong category is referred to as a wrong hit (see also the Validation section).

AUC stands for “area under the curve” and is the primary measure of model fit here. A value of 1 indicates a perfect fit, while a value of 0.5 indicates a fit no better than random.
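
If you want the AUC as a single number rather than the curve, one option is to call ROCR directly on the test data (a sketch; we assume the columns returned by predict() are named after the classes):

# AUC for the "Built" model, computed with ROCR directly
p <- predict(object=glm.models, newdata=test.data, data.frame=TRUE)
built.pred <- ROCR::prediction(predictions=p$Built, labels=test.data$Built)
ROCR::performance(built.pred, measure="auc")@y.values[[1]]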

Assessment of gain prediction

Attention! The above examples only assessed model accuracy based on one timestep. We can also test how well the model performs in predicting cells where gain occurs between two timepoints, but only if we have a second land use map from a later point in time.

To model gain we want to exclude cells that in 1985 already belong to the “built” class (class 2), since they are not candidates for gain. We therefore create a data partition that omits all points belonging to class 2 in 1985 like this:

part <- rasterToPoints(obs[[1]],
                        fun=function(x) x != 2,
                        spatial=TRUE)

We can then use this as test.data:

test.data <- getPredictiveModelInputData(obs=obs,
                                        ef=ef,
                                        cells=part,
                                        t=6)

After that we use the same functions as before for the assessment. This can be repeated for any class and any timestep by changing the respective variables:

glm.pred <- PredictionList(models=glm.models[[2]],
                           newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
                            measure="rch")
plot(list(glm=glm.perf),
     main = "Gain prediction 1985 to 1991")

If we wanted to assess the same change, but from 1991 to 1999 we can do:

part <- rasterToPoints(obs[[2]],
                        fun=function(x) x != 2,
                        spatial=TRUE)
test.data <- getPredictiveModelInputData(obs=obs,
                                        ef=ef,
                                        cells=part,
                                        t=14)
glm.pred <- PredictionList(models=glm.models[[3]],
                           newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
                            measure="rch")
plot(list(glm=glm.perf),
     main = "Gain prediction 1991 to 1999")

We can see that in order to change which years we look at, we have to change

  • the obs[[]] index of rasterToPoints() to our initial (first) timestep
  • the t argument of getPredictiveModelInputData() to the “year-code” of our second timestep (14 for 1999 in this case)
  • the glm.models[[]] index of PredictionList() to the subset of our second timestep

Always be careful and double check, the way data is handled in LULCC can get confusing very quickly!

Allocation

Allocation in LULCC is basically the process of giving our models a spatial dimension. As stated in Moulds et al. 2015:

“Spatially explicit land use change models are normally driven by non-spatial estimates of either the total number of cells occupied by each category at each time point or the number of transitions among the various categories during each time interval”

This simply means that the model contains a non-spatial and a spatial module. The non-spatial module is also called “demand”, because we look at how the total area of a class changes between two timesteps:

# obtain demand scenario
dmd <- approxExtrapDemand(obs=obs, tout=0:14)

Additionally, from the documentation:

Some routines are coupled to complex economic models that predict future or past land use demand based on economic considerations; however, linear extrapolation of trends remains a useful technique.
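
Before allocating, it can be useful to inspect the demand scenario (a quick sketch; we assume the columns of dmd follow the order of our land use categories):

head(dmd)    # one row per timestep, one column per class (number of cells)
matplot(0:14, dmd, type="l", lty=1, col=1:3,
        xlab="timestep", ylab="demand (cells)")
legend("topleft", legend=c("Forest","Built","Other"), lty=1, col=1:3)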

To allocate, that is, to add the spatial dimension, we first create a filter matrix with weights (values) of 1 as shown below:

w <- matrix(data=1, nrow=3, ncol=3)
nb <- NeighbRasterStack(x=obs[[1]], weights=w,
                        categories=c(1,2,3))

This matrix acts as a moving window that NeighbRasterStack() passes over each cell. By default the fraction of non-NA cells of each land use class within the 3x3 window is calculated. This is how the relation between our target cell (and its value) and the surrounding cells (and their respective values) is established.
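
The result can be inspected like any other raster stack, for example the layer for the “Built” class (a sketch; we assume nb can be indexed like a RasterStack and that the layers follow the order given in categories):

plot(nb[[2]], main="Fraction of built neighbours (3x3 window)")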

Short detour: Decision rules

Decision rules in LULCC are rules that prohibit certain land use changes from being considered by the models (a small sketch of how such a rule can be encoded follows the list). They are:

  1. LUC prohibition: Under the assumption that some land use change scenarios are very unlikely, some transitions are illegal in LULCC models, for example the transition from urban areas to agriculture.
  2. Time-dependent prohibition: Under the assumption that processes take a certain amount of time in the real world, some transitions are illegal in LULCC models if the change happens too fast. An example is a transition from shrubland to forest within only a year; here a minimum number of timesteps can be required.
    • Under the same assumption, certain transitions can also be prohibited once a maximum number of timesteps is exceeded
  3. Neighbourhood prohibition: When a cell transitions to a land use class and that cell is not within a (user-defined) neighbourhood of cells already belonging to that class, the transition is considered illegal (spatial autocorrelation).
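
A minimal sketch of how rule 1 could be encoded as a rules matrix (our assumption: element [i, j] describes the transition from class i to class j, with 1 = allowed and 0 = prohibited; see ?allow for the exact encoding, including the time-dependent rules):

rules <- matrix(data=1, nrow=3, ncol=3)   # start from "everything allowed"
rules[2, 1] <- 0                          # forbid Built (2) -> Forest (1)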

Back to allocation:

Now the CLUE-S model comes into play, to simulate the location of land use change:

clues.rules <- matrix(data=1, nrow=3, ncol=3)
clues.parms <- list(jitter.f=0.0002,
                    scale.f=0.000001,
                    max.iter=1000,
                    max.diff=50,
                    ave.diff=50)
clues.model <- CluesModel(obs=obs,
                          ef=ef,
                          models=glm.models,
                          time=0:14,
                          demand=dmd,
                          elas=c(0.2,0.2,0.2),
                          rules=clues.rules,
                          params=clues.parms)

One task of the CLUE-S model is to meet the (area) demand for each class. The total demand across all categories (land use classes) may be more than 100% of the available area. CLUE-S therefore uses, among others, the parameters provided in clues.parms to calculate competition between the land use classes.

The process is as follows (a toy sketch of this loop follows the list):

  1. Each cell is assigned the land use class of the highest probability, based on our explanatory variables
  2. The total area for each land use class is calculated and compared to the demand
  3. If the allocated area of a land use class is not equal to the demand, the suitability (probability) will be increased or decreased. The amount of increase or decrease depends on the difference between allocated area and demand
  4. If the area of a land use class is equal to the demand of that class, suitability remains unchanged
  5. This is repeated until the demand for all land use classes is met (within a user-defined tolerance)
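
The following toy function illustrates this loop. It is not the lulcc implementation, just the idea: suit is a matrix of suitabilities (one row per cell, one column per class) and demand the target number of cells per class.

allocate_toy <- function(suit, demand, scale.f=0.001, max.iter=1000, tol=5) {
  adj <- rep(0, ncol(suit))                       # per-class adjustment term
  for (i in seq_len(max.iter)) {
    alloc <- max.col(sweep(suit, 2, adj, "+"))    # 1. assign most suitable class
    area  <- tabulate(alloc, nbins=ncol(suit))    # 2. allocated area per class
    diff  <- demand - area                        # 3./4. compare with demand
    if (all(abs(diff) <= tol)) break              # 5. stop within tolerance
    adj <- adj + scale.f * diff                   # raise/lower suitability
  }
  alloc                                           # vector of class indices per cell
}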

Note: For a detailed explanation of the clues.parms list, see the help page of CluesModel() (?CluesModel)

The actual allocation then requires only a single function call:

clues.model <- allocate(clues.model)

Validation

Validation of the model performance in LULCC is done via an approach developed by Pontius et al. (2011), where three maps are compared:

  1. Reference map for timestep 1
  2. Reference map for timestep 2
  3. Simulated (modeled) map for timestep 2

We create a three-dimensional contingency table like this:

clues.tabs <- ThreeMapComparison(x=clues.model,
                                 factors=2^(1:8),
                                 timestep=14)

Note: The factors argument refers to the aggregation factors, i.e. the increasingly coarse resolutions (pixel sizes) at which the model shall be tested.

These tables already allow us to locate agreement and disagreement sources, mostly by comparing the reference and simulated maps of timestep 2. We can then ask questions like:

  • Did the model allocate change where in reality no change happened? (false alarms)
  • Did the model allocate no change where change actually happened? (misses)
  • Did the model allocate a land use change correctly, but to the wrong class? (wrong hits)
  • Did the model allocate land use change correctly, and also to the correct class? (hits)

The visualization of these questions, at different spatial resolutions (pixel sizes), is possible with the AgreementBudget() function:

clues.agr <- AgreementBudget(x=clues.tabs)
plot(clues.agr, from=1, to=2)

Note: The terminology deviates from terms you might already be familiar with in the context of classification validation. Usually the four categories are referred to as:

  • true positive (hits)
  • false positive (false alarms)
  • true negative (wrong hits)
  • false negative (misses)

True and false here refer to whether the prediction matches reality, while positive and negative refer to the class for which the metric is calculated (Sefrin et al. 2020)

We can also compute a figure of merit:

clues.fom <- FigureOfMerit(x=clues.agr)
plot(clues.fom, from=1, to=2)

Note: (from the documentation)

It is useful to calculate the figure of merit at three levels: (1) considering all possible transitions from all land use categories, (2) considering all transitions from specific land use categories and (3) considering a specific transition from one land use category to another.

NLMR & landscapetools

As stated in the beginning, we also want to shed some light on the NLMR package. Originally we had intended to offer a workflow that combines NLMR, landscapetools and LULCC as follows:

  1. Create (multiple) neutral landscape models in NLMR
  2. Classify/preprocess them with landscapetools
  3. Analyse the (fictional) change between (fictional) landscapes with LULCC

While this works in theory, you will run into difficulties because NLMR cannot generate rasters with a spatial (coordinate) reference system. The coordinates of NLMR-generated rasters are abstract.

We will however briefly show you the main functionalities of NLMR and landscapetools. Let's start by installing the packages. You need to have Rtools installed for your R version. The remotes package is required because we install NLMR from GitHub instead of CRAN. igraph is a possibly broken dependency of NLMR and should be installed from CRAN:

install.packages("remotes")
install.packages("igraph")

remotes::install_github("cran/RandomFieldsUtils")
remotes::install_github("cran/RandomFields")
remotes::install_github("ropensci/NLMR")

Most of the following packages you should already have installed:

library(remotes)
library(terra)
library(landscapetools)
library(NLMR)
library(lulcc)
library(raster)
library(rasterVis)

NLMR contains 15 algorithms to generate landscapes. For descriptions and the underlying literature see the NLMR GitHub page. Since they are supposed to model neutral landscapes, they are in no way related to biophysical parameters or factors that would influence a landscape in the real world, and are based solely on the respective algorithm. Let's generate three landscapes and look at what different algorithms are capable of producing:

random_fbm <- NLMR::nlm_fbm(ncol = 200,
                            nrow = 200,
                            resolution = 1,
                            fract_dim = 1,
                            user_seed = 256)
plot(random_fbm, main = "random FBM")

random_neigh <- NLMR::nlm_neigh(ncol = 200,
                nrow = 200,
                resolution = 1,
                p_neigh = 0.9,
                p_empty = 0.1,
                categories = 5,
                neighbourhood = 8,
                rescale = FALSE)
plot(random_neigh, main = "neighbourhood characteristics")

random_cluster <- NLMR::nlm_randomcluster(nrow = 100,
                                      ncol = 100,
                                      p    = 0.5,
                                      ai   = c(0.3, 0.6, 0.1),
                                      rescale = FALSE)
plot(random_cluster, main = "random cluster")

We can see that fractional Brownian motion (FBM) creates a landscape that looks quite real and believable, while the algorithm based on neighbourhood characteristics shows seemingly no spatial autocorrelation. It does however offer the option to produce an already classified raster.

This is a glimpse into how different, sometimes completely abstract, neutral landscapes still have merit to exist when it comes to testing ecological relations or models. We encourage you to play around with the different algorithms and their parameters!

landscapetools is a small package that was developed in tandem with NLMR until the developers decided to split the two. landscapetools' functionality is therefore heavily geared towards NLMR-generated data. When looking at the landscapetools GitHub page it becomes apparent that it is a small package with limited but useful functionality for handling NLMR-generated raster data. Feel free to look at all the landscapetools functions, there aren't that many :)

Let's keep it simple. First we generate a second raster with fractional Brownian motion, using a different seed so we can simulate change. We then classify the two rasters with landscapetools' util_classify():

random_fbm2 <- NLMR::nlm_fbm(ncol = 200,
                             nrow = 200,
                             resolution = 1,
                             fract_dim = 1,
                             user_seed = 7)
fbm_class <- util_classify(random_fbm, n = 5)
fbm_class2 <- util_classify(random_fbm2, n = 4)
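
One practical hint before the homework: if you want to feed these rasters into LULCC, you will probably have to fake a spatial reference first, since NLMR output has none. The extent and CRS below are arbitrary choices (a sketch, not the only way to do it):

raster::extent(fbm_class)  <- raster::extent(0, 200, 0, 200)
raster::extent(fbm_class2) <- raster::extent(0, 200, 0, 200)
raster::crs(fbm_class)  <- "+proj=utm +zone=32 +datum=WGS84 +units=m"
raster::crs(fbm_class2) <- "+proj=utm +zone=32 +datum=WGS84 +units=m"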

Homework <3

Now we have two classified rasters, land use maps of fictional landscapes. Try to apply what you learned about LULCC today and analyze what change happened between the two, pretending they are the same area but at two different times. What problems could arise when working with this data?

Good luck, and thank you for your attention so far!

Max & Lena

Literature