This document serves as a detailed guide through LULCC and associated R packages for analysing land use change. Let's start by installing the packages. In addition to LULCC we recommend installing the packages below, since some dependencies of LULCC do not seem to install automatically but are required for the example code to work:
install.packages("lulcc")
install.packages("gsubfn")
install.packages("caret")
install.packages("Hmisc")
install.packages("randomForest")
LULCC (Land Use and Land Cover Change) is a package designed for change detection of land use and land cover. LULCC is not specifically designed for ecology or biodiversity tasks, but for all kinds of land use/land cover changes.
To model land use change we require multiple inputs: at least one land use map, ideally one or more maps from subsequent timesteps, and explanatory (predictor) variables.
In theory LULCC would work with only one input map and nothing else, no subsequent timesteps or predictor variables. This however means the modelled change is based solely on the input and the selected algorithm (bad practice). With the aforementioned additional inputs we can use additional information (timesteps) and expert knowledge (through the selection of explanatory variables) to make our model more accurate! So: think about what factors might affect your land use changes before you start modelling.
The choice for a specific model is very much dependent on the nature of the most important land-use conversions taking place within the study area and the scenarios that need to be considered. (Verburg et al. 2002)
NLMR (Neutral Landscape Models in R) is a package designed to algorithmically create imaginary landscapes for ecological applications. In ecology, a neutral landscape model serves as a 'null hypothesis', in contrast to actual real-world landscapes. They are used, for example, to test:
* landscape connectivity
* the consequences of habitat fragmentation for population subdivision
* the occurrence of extinction thresholds
If this doesn't sound too familiar, don't worry. We aren't ecologists and this is somewhat outside our area of expertise. Until we do a deep dive into ecology, the most important thing to know about NLMR is that we can create modelled landscapes without any input data.
This small package (landscapetools) was designed alongside NLMR and holds basic raster-based functions like classification, binary classification and more, with a focus on ecological applications.
LULCC contains two datasets to demonstrate its functionality. We will go through the Plum Island Ecosystems (pie) dataset to get familiar with the workflow of LULCC. You can find the same example on GitHub, but since the functions used and the underlying models are very complex, we will try to give a more thorough and newbie-friendly explanation.
Let's start by loading the package and creating an ObsLulcRasterStack object from the test data:
library(lulcc)
#> Loading required package: raster
#> Loading required package: sp
data(pie)
obs <- ObsLulcRasterStack(x=pie,
pattern="lu",
categories=c(1,2,3),
labels=c("Forest","Built","Other"),
t=c(0,6,14))
categories should be a numeric vector that represents the codes for your land use classes; labels are the corresponding labels. t is a numeric vector that represents the timesteps of your land use maps. In this example we have one map each for 1985, 1991 and 1999. The numbers in t are the years elapsed since your first timestep. (Yes, this is the first of many weird data/naming conventions.)
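If it helps, you can derive t directly from your acquisition years, as in this minimal sketch (the variable names are our own):
# derive the t vector from the actual acquisition years
years <- c(1985, 1991, 1999)
t <- years - years[1]  # gives 0 6 14, the values passed to ObsLulcRasterStack()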
To get a first look at how our land use classes changed from 1985 to 1991, we can generate a transition matrix like this:
crossTabulate(obs, times=c(0,6))
#> Forest Built Other
#> Forest 46672 1926 415
#> Built 0 37085 37
#> Other 359 1339 25730
This is relatively straightforward since we know it from remote sensing methodology. On the left are the t=0 (1985) classes, at the top are the t=6 (1991) classes, so the main change that occurred was from forest to built.
The next step is to include predictive variables that attempt to explain the observed land use classes:
ef <- ExpVarRasterList(x=pie, pattern="ef")
Included in the example data are elevation, slope and distance to built land. In reality, as we can see in Verburg et al. (2002), many more variables should probably be included, depending on which land use classes we want to observe. For the example data, loading and using explanatory variables is easy, but we will see later that many problems can arise once you work with your own data.
To better understand the explanatory variables we can plot them:
plot(pie$ef_001, main = "ef_001")
plot(pie$ef_002, main = "ef_002")
plot(pie$ef_003, main = "ef_003")
One oddity of LULCC lies in the naming convention of explanatory variables. In the code above we see that the variables are named ef_001, for example. This is because the names have to follow a distinct pattern, otherwise the LULCC functions are not able to properly read and apply them. The pattern is always:
* xx: (two) characters that follow a pattern, to differentiate between land use data and explanatory variables (we recommend keeping it to exactly two characters, better safe than sorry!)
* _001: three numbers, to indicate the variable (e.g. 001 for slope, 002 for elevation, etc.)
* _01: two numbers, to indicate the timestep. Attention! Unlike before, you should not take the difference in years, but number the timesteps chronologically from one upwards. In our example this would be 01 for 1985, 02 for 1991 and 03 for 1999, but only if explanatory variables are available for more than one timestep. If not, don't include the last underscore and two numbers!
In the end your explanatory variables should look like this: ef_003_02 or ef_001
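Assuming you have loaded your own explanatory variables into a RasterStack called my.stack (a hypothetical name), a minimal sketch for renaming the layers to match the pattern could look like this:
# rename layers to the required pattern before building the ExpVarRasterList
names(my.stack) <- c("ef_001", "ef_002", "ef_003")
ef <- ExpVarRasterList(x=my.stack, pattern="ef")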
We then separate our land use maps into training and testing partitions, and extract the cell values:
part <- partition(x=obs[[1]],
size=0.3, spatial=TRUE)
# extract training data
train.data <- getPredictiveModelInputData(obs=obs,
ef=ef,
cells=part[["train"]],
t=0)
test.data <- getPredictiveModelInputData(obs=obs,
ef=ef,
cells=part[["test"]])
Note that getPredictiveModelInputData() extracts both the class values and the values of the explanatory variables. In our case we will have six values per pixel (three land use classes plus three explanatory variables).
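You can convince yourself of this by inspecting the extracted data frame:
# first rows of the training data: one column per land use class,
# one column per explanatory variable
head(train.data)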
The next step is constructing our models (as mentioned on the GitHub page, one model is required for each land use class):
forms <- list(Built~ef_001+ef_002+ef_003,
Forest~ef_001+ef_002,
Other~ef_001+ef_002)
Here we construct a list of our land use categories and which of our explanatory variables we want to use to predict the respective class. "ef_003", for example, is our variable "distance to built area", and in the list we see that we only use this to predict built area (because we assume spatial autocorrelation).
Spatial autocorrelation is also the reason why we partitioned our data in the step before. Logistic regression assumes the data to be independent and identically distributed, which is violated by spatial autocorrelation. Partitioning works against this by selecting a random subset of the data (Moulds et al. 2015).
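As a quick sanity check of the partition (a sketch; part is a list with the "train", "test" and "all" subsets):
# number of cells/points in each subset
sapply(part, length)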
We can then generate our predictive models:
glm.models <- glmModels(formula=forms,
family=binomial,
data=train.data,
obs=obs)
rpart.models <- rpartModels(formula=forms,
data=train.data,
obs=obs)
LULCC supports generalized linear models (glmModels), recursive partitioning (rpartModels) and random forests (randomForestModels). randomForest may take some time to compute, depending on the size of your dataset; we won't run the randomForest models for now.
rf.models <- randomForestModels(formula=forms,
data=train.data,
obs=obs)
Once the models are fitted we can use them to predict over our whole dataset:
all.data <- as.data.frame(x=ef, obs=obs, cells=part[["all"]])
# GLM
probmaps <- predict(object=glm.models,
newdata=all.data,
data.frame=TRUE)
points <- rasterToPoints(obs[[1]], spatial=TRUE)
probmaps <- SpatialPointsDataFrame(points, probmaps)
probmaps <- rasterize(x=probmaps, y=obs[[1]],
field=names(probmaps))
rasterVis::levelplot(probmaps)
We can execute the above code in the same way with our rpart model by changing the predict() input model:
all.data <- as.data.frame(x=ef, obs=obs, cells=part[["all"]])
# rpart
probmaps <- predict(object=rpart.models,
newdata=all.data,
data.frame=TRUE)
points <- rasterToPoints(obs[[1]], spatial=TRUE)
probmaps <- SpatialPointsDataFrame(points, probmaps)
probmaps <- rasterize(x=probmaps, y=obs[[1]],
field=names(probmaps))
rasterVis::levelplot(probmaps)
Attention! When working with your own dataset you may encounter an error caused by overly large vector sizes in the rasterize() function (950 GB in one case).
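One workaround that may help in such a case (an untested sketch, not an official lulcc recipe) is to replace the single rasterize() call with a loop that rasterizes the probability columns one at a time:
# assuming probmaps is still the SpatialPointsDataFrame from above,
# rasterize each class probability separately to keep intermediate vectors smaller
prob.list <- lapply(names(probmaps), function(f)
  rasterize(x=probmaps, y=obs[[1]], field=f))
probmaps <- raster::stack(prob.list)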
Assessing model performance in LULCC is done with the
ROCR R-package:
glm.pred <- PredictionList(models=glm.models,
newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
measure="rch")
rpart.pred <- PredictionList(models=rpart.models,
newdata=test.data)
rpart.perf <- PerformanceList(pred=rpart.pred,
measure="rch")
plot(list(glm=glm.perf,
rpart=rpart.perf))
ROC stands for "receiver operating characteristic" and is a graphical approach for analyzing the performance of a classifier, using the ratio of true-positive to false-positive classifications. In LULCC true positives are referred to as hits and indicate change simulated correctly, while false positives are referred to as wrong hits and indicate change simulated as change to the wrong category (see also the Validation section).
AUC stands for "area under the curve" and is the primary measure of fit. A value of 1 indicates a perfect fit, while a value of 0.5 indicates a completely random fit.
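Since LULCC delegates this to ROCR, the AUC idea can be reproduced with a small ROCR toy example (the scores and labels are made up):
library(ROCR)
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)  # predicted probabilities
labels <- c(1, 1, 0, 1, 0, 0)              # observed outcomes
pred <- prediction(scores, labels)
performance(pred, measure="auc")@y.values[[1]]  # AUC between 0 and 1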
Attention! The above examples only assessed model accuracy based on one timestep. We can also test how well the model performs in predicting cells where gain occurs between two timepoints, but only if we have a second land use map from a later point in time.
To model gain we want to exclude cells in 1985 that are not candidates for gain in the "built" class (class 2). We therefore create a data partition that omits all points already belonging to class 2 in 1985, like this:
part <- rasterToPoints(obs[[1]],
fun=function(x) x != 2,
spatial=TRUE)
We can then use this as test.data:
test.data <- getPredictiveModelInputData(obs=obs,
ef=ef,
cells=part,
t=6)
After that we use the same functions as before for the assessment. This can be repeated for any class and any timestep by changing the respective variables:
glm.pred <- PredictionList(models=glm.models[[2]],
newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
measure="rch")
plot(list(glm=glm.perf),
main = "Gain prediction 1985 to 1991")
If we wanted to assess the same change, but from 1991 to 1999 we can do:
part <- rasterToPoints(obs[[2]],
fun=function(x) x != 2,
spatial=TRUE)
test.data <- getPredictiveModelInputData(obs=obs,
ef=ef,
cells=part,
t=14)
glm.pred <- PredictionList(models=glm.models[[2]],
newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
measure="rch")
plot(list(glm=glm.perf),
main = "Gain prediction 1991 to 1999")
We can see that in order to alter the years we look at, we have to change:
* the obs[[]] index of rasterToPoints() to our initial (first) timestep
* the t argument of getPredictiveModelInputData() to the "year-code" of our second timestep (14 for 1999 in this case)
* the glm.models[[]] index of PredictionList() to the model of the land use class we are interested in (it stays [[2]] here, since we are still looking at gain in the built class)
Always be careful and double check; the way data is handled in LULCC can get confusing very quickly!
Allocation in LULCC is basically the process of giving our models a spatial dimension. As stated in Moulds et al. 2015:
"Spatially explicit land use change models are normally driven by non-spatial estimates of either the total number of cells occupied by each category at each time point or the number of transitions among the various categories during each time interval"
This simply means that the model contains a non-spatial and a spatial module. The non-spatial module is also called “demand”, because we look at how the total area of a class changes between two timesteps:
# obtain demand scenario
dmd <- approxExtrapDemand(obs=obs, tout=0:14)
Additionally, from the documentation:
Some routines are coupled to complex economic models that predict future or past land use demand based on economic considerations; however, linear extrapolation of trends remains a useful technique.
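You can inspect the interpolated demand directly; it should contain one row per timestep and one column per land use class:
# demand (number of cells per class) for the first few timesteps
dmd[1:3, ]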
To allocate, that is to add the spatial dimension, we first create a filter matrix with weights (values) of 1 as shown below:
w <- matrix(data=1, nrow=3, ncol=3)
nb <- NeighbRasterStack(x=obs[[1]], weights=w,
categories=c(1,2,3))
This matrix will act as a moving window and, by using NeighbRasterStack(), pass over each cell. By default the fraction of non-NA cells within the 3x3 window belonging to each land use class is calculated. This is how the relation between our target cell (and its value) and the surrounding cells (and their respective values) is established.
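To see what NeighbRasterStack() produced, you can plot the neighbourhood maps (assuming the object behaves like a regular RasterStack, which lulcc classes typically extend):
# one layer per land use class, values are neighbourhood fractions
plot(nb)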
Decision rules in LULCC are rules that prohibit certain land use changes from being considered by the models. They are supplied as a matrix (the rules argument below); a matrix filled with ones, as in this example, allows every transition.
Now the CLUE-S model comes into play, to simulate the location of land use change:
clues.rules <- matrix(data=1, nrow=3, ncol=3)
clues.parms <- list(jitter.f=0.0002,
scale.f=0.000001,
max.iter=1000,
max.diff=50,
ave.diff=50)
clues.model <- CluesModel(obs=obs,
ef=ef,
models=glm.models,
time=0:14,
demand=dmd,
elas=c(0.2,0.2,0.2),
rules=clues.rules,
params=clues.parms)
One task of the CLUE-S model is to meet the (area) demand for each class. The total demand across all categories (land use classes) may be more than 100% of the available area. CLUE-S therefore uses, among other things, the parameters provided in clues.parms to calculate competition between the land use classes.
The process is, roughly, as follows: for each cell a suitability for every class is computed from the regression models and neighbourhood maps, the cells are allocated to classes, the resulting areas are compared with the demand, and the class-specific suitabilities are adjusted iteratively (controlled by max.iter, max.diff and ave.diff) until the allocation matches the demand closely enough.
Note: For a detailed explanation of the clues.parms list, see the help page of CluesModel() (?CluesModel).
The actual allocation then only requires a single function call:
clues.model <- allocate(clues.model)
Validation of the model performance in LULCC is done via an approach developed by Pontius et al. (2011), where three maps are compared: the reference (observed) map at the first timestep, the reference map at the second timestep, and the simulated map at the second timestep.
We create a three-dimensional contingency table like this:
clues.tabs <- ThreeMapComparison(x=clues.model,
factors=2^(1:8),
timestep=14)
Note: factors refers to the aggregation factors, i.e. the resolutions (pixel sizes) at which the model shall be tested; 2^(1:8) gives the factors 2, 4, 8, 16, 32, 64, 128 and 256.
These tables already allow us to locate sources of agreement and disagreement, mostly by comparing the reference and simulated maps of timestep 2. We can then ask questions like: how much observed change was simulated correctly, how much observed change was missed, and how much change was simulated where none was observed?
The visualization of these questions at different spatial resolutions (pixel sizes) is possible with the AgreementBudget() function:
clues.agr <- AgreementBudget(x=clues.tabs)
plot(clues.agr, from=1, to=2)
Note: The terminology deviates from terms you might already be familiar with in the context of classification validation. Usually the four categories are referred to as true positives, false positives, true negatives and false negatives.
True and false here refer to the correctness of the prediction, while positive and negative refer to the class for which the metric is calculated (Sefrin et al. 2020)
We can also compute a figure of merit:
clues.fom <- FigureOfMerit(x=clues.agr)
plot(clues.fom, from=1, to=2)
Note: (from the documentation)
It is useful to calculate the figure of merit at three levels: (1) considering all possible transitions from all land use categories, (2) considering all transitions from specific land use categories and (3) considering a specific transition from one land use category to another.
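For reference, the figure of merit itself is commonly defined (following Pontius et al.) as: hits / (hits + misses + wrong hits + false alarms), so a value of 1 means every observed change was simulated correctly.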
As stated in the beginning, we also want to shine some light on the NLMR package. Originally we had intended to offer a workflow that combines NLMR, landscapetools and LULCC into one: generate neutral landscapes with NLMR, classify them into land use maps with landscapetools, and analyse the change between them with LULCC.
While in theory this works, you will run into difficulties because NLMR cannot generate rasters with a spatial (coordinate) reference system; the coordinates of NLMR-generated rasters are abstract.
We will however briefly show you the main functionalities of NLMR and landscapetools. Let's start by installing the packages. You need to have Rtools installed for your R version. The remotes package is required because we install NLMR from GitHub instead of CRAN. igraph is a possibly broken dependency of NLMR and should be installed from CRAN:
install.packages("remotes")
install.packages("igraph")
remotes::install_github("cran/RandomFieldsUtils")
remotes::install_github("cran/RandomFields")
remotes::install_github("ropensci/NLMR")
You should already have most of the following packages installed:
library(remotes)
library(terra)
library(landscapetools)
library(NLMR)
library(lulcc)
library(raster)
library(rasterVis)
NLMR contains 15 algorithms to generate landscapes. For descriptions and the underlying literature see the NLMR GitHub page. Since they are supposed to model neutral landscapes, they are in no way related to biophysical parameters or factors that would influence a landscape in the real world, and are based solely on the respective algorithm. Let's generate three landscapes and look at what different algorithms are capable of producing:
random_fbm <- NLMR::nlm_fbm(ncol = 200,
nrow = 200,
resolution = 1,
fract_dim = 1,
user_seed = 256)
plot(random_fbm, main = "random FBM")
random_neigh <- NLMR::nlm_neigh(ncol = 200,
nrow = 200,
resolution = 1,
p_neigh = 0.9,
p_empty = 0.1,
categories = 5,
neighbourhood = 8,
rescale = F)
plot(random_neigh, main = "neighbourhood characteristics")
random_cluster <- NLMR::nlm_randomcluster(nrow = 100,
ncol = 100,
p = 0.5,
ai = c(0.3, 0.6, 0.1),
rescale = FALSE)
plot(random_cluster, main = "random cluster")
We can see that fractional Brownian motion (FBM) creates a landscape that looks quite real and believable, while the algorithm based on neighbourhood characteristics shows seemingly no spatial autocorrelation. It does however offer the option to produce an already classified raster.
This is a glimpse into how different, sometimes completely abstract, neutral landscapes still have merit to exist when it comes to testing ecological relations or models. We encourage you to play around with the different algorithms and their parameters!
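For example (a sketch; the parameter value is arbitrary), lowering the fractal dimension produces a much rougher FBM landscape:
random_fbm_rough <- NLMR::nlm_fbm(ncol = 200,
nrow = 200,
resolution = 1,
fract_dim = 0.4,
user_seed = 256)
plot(random_fbm_rough, main = "rougher FBM")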
landscapetools is a small package that was developed in tandem with NLMR until the developers decided to split the two. landscapetools' functionality is therefore heavily geared towards NLMR-generated data. When looking at the landscapetools GitHub page it becomes apparent that it is a small package with limited but useful functionality for handling NLMR-generated raster data. Feel free to look at all the landscapetools functions, there aren't that many :)
Let's keep it simple. First we generate a second raster with fractional Brownian motion, using a different seed so we can simulate change. We then classify the two rasters with landscapetools' util_classify():
random_fbm2 <- NLMR::nlm_fbm(ncol = 200,
nrow = 200,
resolution = 1,
fract_dim = 1,
user_seed = 7)
fbm_class <- util_classify(random_fbm, n = 5)
fbm_class2 <- util_classify(random_fbm2, n = 4)
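Plotting the two classified maps side by side is a good starting point:
plot(fbm_class, main = "timestep 1 (5 classes)")
plot(fbm_class2, main = "timestep 2 (4 classes)")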
Now we have two classified rasters, land use maps of fictional landscapes. Try to apply what you learned about LULCC today and analyze what change happened between the two, pretending they are the same area but at two different times. What problems could arise when working with this data?
Good luck, and thank you for your attention so far!
Max & Lena