Machine Learning for Land Cover Classification

Objectives

To learn,how to build models usimg machine learning and how to use such models for spatial predictions(eg.Landcover Classification).
To learn how to evaluate such models using a test data.
To familiarize onself with the caret package in R,and the classifier algorithm Random Forest.

Overview

This tutorial seeks to demonstrate the basic workflow of building,testing,tuning and deploying models for landcover classification.The data being used is a(Landsat-8 OLI & TIRS) multispectral image.The image is a coverage of Binduri Distrcit,Bawku Muncipal and Pusiga District in the North-East Region of Ghana . The Caret package in R, provides a standard platorm to use a variety of Classifier Algorithm.However, In this tutorial we would only be particular about the Random Forest Algorithm which have gain tremendous use in spatial predictions in the area of Remote Sensing. We would build a model using the Random Forest algorithm, and use the model to make spatial predictions on the landsat 8 image collected over the area of Binduri,bawku and Pusiga. The 10 LandCover classes digitized(roi) within QGIS would serve as the response parameter, whereas the various band of landsat image, in this case, the blue,green,red,nir(near-infrared),swir1(shortwave infrared 1),swir2(shortwave infrared 2)bands in addition ndvi(Normalized Diffrence Vegetation Index),dem(Digital Elevation Model) would serve as the predictor variables.

Let’s load the following libraries.

#rm(list=ls(all=TRUE))
library(sf)
library(dplyr)
library(raster)
library(tmap)
library(caret)

Let’s load our raster image file using the stack()function. NB: remember to set your working directory using setwd().

setwd("C:/MachineLearning")
bawkuStack <- raster::stack("binduri_bawku_pusiga.grd")

print(bawkuStack) #Lets print-out to know more about the structure of image data
class      : RasterStack 
dimensions : 1389, 1561, 2168229, 8  (nrow, ncol, ncell, nlayers)
resolution : 30, 30  (x, y)
extent     : 783915, 830745, 1194855, 1236525  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs 
names      :         blue,        green,          red,          nir,        swir1,        swir2,         ndvi,          dem 
min values :   0.02920927,   0.03624698,   0.03759349,   0.08096451,   0.09161354,   0.05913325,  -0.13536817, 159.00000000 
max values :    0.1095205,    0.1644335,    0.2353656,    0.3561705,    0.5407009,    0.6490462,    0.7583994,  384.0000000

This a landsat 8 satellite image covering the binduri,Bawku(municipal)and Pusiga district of Ghana downloaded from the USGS staellite data platform.It has a Path and Row of (194,052)respectively.The regions of interest(roi) were collected based on the landsat, and also Google Satellite in QGIS with expert knowledge of the site and terrain.A total of 392 ploygons were collected(digitized) and grouped into 10 LandCover Classes.

Lets plot a 5,4,3 false colour composite of our area of study(imageLayer) using the plotRGB() function.

par(col.axis="white",tck=0)
raster::plotRGB(bawkuStack,
        r=5,g=4,b=3,
        stretch="hist",
        axes=TRUE,
        main= "False Colour composite (5,4,3) of the Study Area")
box(col="white")

Let’s load our digitized polygons(Shapefiles) using the read_sf() function.

roiTraining <- sf::read_sf("trainingRegions.shp") 

#Lets convert Class to factor(to assign levels to the response)
roiTraining$Class <- as.factor(roiTraining$Class)


unique(roiTraining$Class) #This prints out the various unique classes
 [1] River                         Dam                          
 [3] River_DriedUp                 Settlements                  
 [5] Tree_Shrub_Savanna            Agriculture_shallow_recession
 [7] Wooded_Savanna                Agriculture                  
 [9] Burntland                     Plantation                   
10 Levels: Agriculture Agriculture_shallow_recession Burntland ... Wooded_Savanna

The output shows that there are indeed 10 unique classes.

It’s also very important to make sure our raster layer(bawkuStack) and the sf objects(roiTraining) have the same coordinate refrence system.

cat("CRS of bawkuStack\n"); bawkuStack@crs;
CRS of bawkuStack
CRS arguments:
 +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs 

cat("\n") #just to create a break

#Intentionally used crs for the sf object insted of st_crs.
cat("CRS of roiTraining\n"); crs(roiTraining) 
CRS of roiTraining
CRS arguments:
 +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs

The two object have the same coordinate refrence system.

Extraction of RasterLayer values

We would extract the raster values at the various polygon locations using the extract function raster::extract().This are the values that would be used to train our models.

extract <- raster::extract(bawkuStack,roiTraining,df=TRUE)

head(extract)
  ID       blue     green       red       nir     swir1      swir2       ndvi
1  1 0.07858848 0.1283467 0.1681415 0.1350957 0.1307055 0.09721603 -0.1089767
2  1 0.07726602 0.1278760 0.1668638 0.1349612 0.1303917 0.09587111 -0.1056989
3  1 0.07650393 0.1269571 0.1647343 0.1332352 0.1289123 0.09392104 -0.1057123
4  1 0.07621255 0.1269122 0.1649809 0.1331904 0.1315349 0.09338309 -0.1066181
5  1 0.07587632 0.1264640 0.1650705 0.1325180 0.1281054 0.09304685 -0.1093876
6  1 0.07497974 0.1250967 0.1639049 0.1322939 0.1270743 0.09302444 -0.1067225
  dem
1 174
2 171
3 172
4 174
5 172
6 171

Good, now we have a dataframe of values for the predictors.Now we need to find way to join our response variable(Classes) to this dataframe.We can easily do that using the dplyr inner_join() function .We then set the geometry column to NULL.

extractMerged <- dplyr::inner_join(extract,roiTraining,by= c("ID"="id")) 

extractMerged$geometry <- NULL

Data Partitioning

At this stage we partition our dataset into two, thats a train and test dataset. We train the model with the train dataset and evaluate the model perfomance using the test data.

We would use the caret function createDataPartition() to split data into training and testing sets.

set.seed(123) 
trainIndex <- caret::createDataPartition(extractMerged$ID,list = FALSE,p=0.7)
trainData <- extractMerged[trainIndex,]  # 70% for training Data
testData <- extractMerged[-trainIndex,] # 30% for testing Data

Let’s now look at the count of the various Classes of the {trainData}.

classCount <- trainData %>%
  dplyr::group_by(Class) %>% 
  count()

print(classCount)
# A tibble: 10 x 2
# Groups:   Class [10]
   Class                             n
   <fct>                         <int>
 1 Agriculture                     394
 2 Agriculture_shallow_recession    93
 3 Burntland                       358
 4 Dam                              90
 5 Plantation                      378
 6 River                            42
 7 River_DriedUp                    47
 8 Settlements                     479
 9 Tree_Shrub_Savanna              405
10 Wooded_Savanna                  118

Proportion of the response Classes that makes up the training data.

(prop.table(table(trainData$Class))*100)

                  Agriculture Agriculture_shallow_recession 
                    16.389351                      3.868552 
                    Burntland                           Dam 
                    14.891847                      3.743760 
                   Plantation                         River 
                    15.723794                      1.747088 
                River_DriedUp                   Settlements 
                     1.955075                     19.925125 
           Tree_Shrub_Savanna                Wooded_Savanna 
                    16.846922                      4.908486

Clearly an imbalance data but not that bad, very common in geospatial studies in relation to land cover analysis.

We would also again look at the scale of the values for the vraious predictors.We print first six(6) rows of the trainData.

print(head(trainData))
   ID       blue     green       red       nir     swir1      swir2       ndvi
2   1 0.07726602 0.1278760 0.1668638 0.1349612 0.1303917 0.09587111 -0.1056989
5   1 0.07587632 0.1264640 0.1650705 0.1325180 0.1281054 0.09304685 -0.1093876
6   1 0.07497974 0.1250967 0.1639049 0.1322939 0.1270743 0.09302444 -0.1067225
8   2 0.07229001 0.1224518 0.1606995 0.1285058 0.1171668 0.08349814 -0.1113178
10  2 0.07047442 0.1209501 0.1583459 0.1257488 0.1150149 0.08125669 -0.1147402
11  2 0.06955542 0.1193138 0.1561940 0.1242695 0.1142080 0.08006869 -0.1138277
   dem Class
2  171 River
5  172 River
6  171 River
8  170 River
10 173 River
11 173 River

We can see clearly the different scale of measurement,especially for the dem and also ndvi.However,scale diffrence is not a issue for tree based classifier(Decision Tree,Random forest,Bagging etc.).

Model Training

At this satge, we are ready to build our model.As was stated in the objectives, we would be developing the model using Random Forest algorithm.

Random Forest

Random Forest basically consist of an ensemble of trees. It also uses random Sampling with replacement(bootsrap samples).It can be used for both regression and classification.Rgression when the response variable is numeric and classification when the response variable is a factor(categorical).Each tree within the ensemble(forest) is considered as a model on its own, and at each terminal node, a random number of the predictor variables(mtry) are sampled for splitting.This split the new sample into the various class.Since each tree within the ensemble has equal weight, the total votes in each class is aggregated from each tree to identify the class with the highest votes.

From the breif explaination above, we have 2 hyperparamters we can tune.

The number of trees(ntree).
The number of predictor variable randomly sampled as candidates for splitting at each terminal node(mtry).The default is the square root of the predictors.

NB:Random Forest does quite well in handling multicollinearity.

Let’s define our response and predictor variables.

respVar <- c("Class")

predVar <- c("blue","green","red","nir","swir1","swir2","ndvi","dem")

After building the model,we would use test data(not seen by the model*) to provide unbiased evaluation of the model. We then use Kappa Index as the main performance metric to choose the best model.

It is recommended to use ntree >= 500.

set.seed(124)
cvControl <- caret::trainControl(method = 'cv',
                                 number = 10,
                                 savePredictions = TRUE,
                                 verboseIter = FALSE)

# Train model using random Forest algorithm
set.seed(124)
rfModel <- caret::train(trainData[,predVar],
                        trainData[,respVar],
                        method="rf",
                        metric = "Kappa",
                        ntree= 500,
                        trControl= cvControl,
                        tuneLength=6,
                        importance=TRUE)

Cross-validation results for rfModel.

print(rfModel)
Random Forest 

2404 samples
   8 predictor
  10 classes: 'Agriculture', 'Agriculture_shallow_recession', 'Burntland', 'Dam', 'Plantation', 'River', 'River_DriedUp', 'Settlements', 'Tree_Shrub_Savanna', 'Wooded_Savanna' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 2164, 2165, 2162, 2164, 2166, 2163, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
  2     0.9708929  0.9657977
  3     0.9725665  0.9677642
  4     0.9713165  0.9663031
  5     0.9721551  0.9672991
  6     0.9708981  0.9658202
  8     0.9659013  0.9599479

Kappa was used to select the optimal model using the largest value.
The final value used for the model was mtry = 3.

Cross-validation results shows a high Kappa Index of 0.9725665 at mtry = 3.This depicts a high concordance between the observed and predicted classes.

Model Evaluation using Test Data

We would evaluate the model using the test data.We would look at the confusion matrix, which would indicate to us how well our model fared in predicting the various classes in the test data.We would also compare the overall acurracy to the No Information Rate, this would indicate to us whether our model is worth using.

Evaluation of the rfModel model using the test data.

rfPredict <- predict(rfModel,testData)
confusionMatrix(rfPredict,testData$Class)
Confusion Matrix and Statistics

                               Reference
Prediction                      Agriculture Agriculture_shallow_recession
  Agriculture                           183                             2
  Agriculture_shallow_recession           0                            43
  Burntland                               1                             0
  Dam                                     0                             0
  Plantation                              0                             0
  River                                   0                             0
  River_DriedUp                           2                             0
  Settlements                             0                             0
  Tree_Shrub_Savanna                      0                             0
  Wooded_Savanna                          0                             2
                               Reference
Prediction                      Burntland Dam Plantation River River_DriedUp
  Agriculture                           0   0          0     0             0
  Agriculture_shallow_recession         0   0          0     0             0
  Burntland                           127   0          0     0             0
  Dam                                   0  34          0     0             0
  Plantation                            0   0        165     0             0
  River                                 0   0          0    31             0
  River_DriedUp                         0   0          0     0            17
  Settlements                           1   0          0     0             0
  Tree_Shrub_Savanna                    0   0          0     0             0
  Wooded_Savanna                        0   0          0     0             0
                               Reference
Prediction                      Settlements Tree_Shrub_Savanna Wooded_Savanna
  Agriculture                             1                  0              0
  Agriculture_shallow_recession           0                  0              5
  Burntland                               0                  0              0
  Dam                                     0                  0              0
  Plantation                              0                  1              0
  River                                   0                  0              0
  River_DriedUp                           0                  0              0
  Settlements                           197                  0              0
  Tree_Shrub_Savanna                      0                163              6
  Wooded_Savanna                          0                  1             46

Overall Statistics
                                          
               Accuracy : 0.9786          
                 95% CI : (0.9678, 0.9865)
    No Information Rate : 0.1926          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.975           
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: Agriculture Class: Agriculture_shallow_recession
Sensitivity                      0.9839                              0.91489
Specificity                      0.9964                              0.99490
Pos Pred Value                   0.9839                              0.89583
Neg Pred Value                   0.9964                              0.99592
Prevalence                       0.1809                              0.04572
Detection Rate                   0.1780                              0.04183
Detection Prevalence             0.1809                              0.04669
Balanced Accuracy                0.9902                              0.95490
                     Class: Burntland Class: Dam Class: Plantation Class: River
Sensitivity                    0.9922    1.00000            1.0000      1.00000
Specificity                    0.9989    1.00000            0.9988      1.00000
Pos Pred Value                 0.9922    1.00000            0.9940      1.00000
Neg Pred Value                 0.9989    1.00000            1.0000      1.00000
Prevalence                     0.1245    0.03307            0.1605      0.03016
Detection Rate                 0.1235    0.03307            0.1605      0.03016
Detection Prevalence           0.1245    0.03307            0.1615      0.03016
Balanced Accuracy              0.9955    1.00000            0.9994      1.00000
                     Class: River_DriedUp Class: Settlements
Sensitivity                       1.00000             0.9949
Specificity                       0.99802             0.9988
Pos Pred Value                    0.89474             0.9949
Neg Pred Value                    1.00000             0.9988
Prevalence                        0.01654             0.1926
Detection Rate                    0.01654             0.1916
Detection Prevalence              0.01848             0.1926
Balanced Accuracy                 0.99901             0.9969
                     Class: Tree_Shrub_Savanna Class: Wooded_Savanna
Sensitivity                             0.9879               0.80702
Specificity                             0.9930               0.99691
Pos Pred Value                          0.9645               0.93878
Neg Pred Value                          0.9977               0.98876
Prevalence                              0.1605               0.05545
Detection Rate                          0.1586               0.04475
Detection Prevalence                    0.1644               0.04767
Balanced Accuracy                       0.9905               0.90196

Kappa Index of 0.975 with 95% confidence interval(0.9678, 0.9865) indicates high concordance between the observed and the predicted classes.Also, an Accuracy value 0.9786 is very encouraging, compared to the No Information Rate of 0.1926.We can also see high values of sensitivity and specificity for the various classes of the response variable.

Feature Selection(recursive feature elimination)

In machine learning, we are usually face with the problem of large numbers of predictors,some of this predictors have little to no contribution to the performance of the model.Given such situations, there is the need to find a way to drop some of this non-informative features(predictors) mainly to increase the performance metric(be it Kappa Index or Accuracy) or to find a subset of features(predictors) that reduces the complexity of the model.

NB:Multicollinearity does not affect the performance of Random Forest models.However the interpretability of variable importance becomes a problem

RFE: It begins with all predictors(backward selection), and works it way down by a given number of iterations untill a subset(combination of predictors) of optimal model performance is achieved.

In the practical field of devloping models, large number of features(predictors) would increase

Cost(cost of collecting/measuring information on the predictors)
Computation time.
Complexity of the model.

NB: This is just a demonstration, we are not burden with large number of features(predictors). Our interest here, is whether there is a subset that could reduce the model complexity and yet increase the Kappa Index. There could also be a trade-off, where a reduction in complexity would lead to a minimal reduction in the performance metric(Kappa Index).

Subset sequence and recursive feature control

set.seed(124)
indexRfe <- createMultiFolds(trainData$Class, times=5) 

##A sequence indicating how we want the predictor variables to be subseted
predSeq <-seq(from=1, to =length(predVar),by=2)

#RFE control function
rfeCont <- rfeControl(method="cv",
                      number = 10,
                      verbose=FALSE,
                      functions=rfFuncs,
                      index=indexRfe)

Recursive feature elimination(ntree=500)

#lets pre-process our training data before feeding into the rfe algorithm
set.seed(124)
rfModelRfe <- caret::rfe(trainData[,predVar],
                         trainData[,respVar],
                         sizes = predSeq,
                         metric = "Kappa",
                         ntree=500,
                         method="rf",
                         rfeControl = rfeCont)

Cross-validation results for the recursive feature elimination.

print(rfModelRfe)

Recursive feature selection

Outer resampling method: Cross-Validated (10 fold) 

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
         1   0.4451 0.3468   0.028206 0.03219         
         3   0.9569 0.9494   0.011752 0.01379         
         5   0.9698 0.9645   0.009200 0.01078         
         7   0.9695 0.9641   0.010353 0.01215         
         8   0.9724 0.9676   0.009196 0.01079        *

The top 5 variables (out of 8):
   nir, dem, ndvi, swir1, green

From running the Recursive feature Elimination(rfe), the top five variables were given as nir, dem, ndvi, swir1, green.The cross-validation assessment indicates that all eight(8) features were selected at a Kappa Index of 0.9676.

Evaluation of the testData with the recursive feature elimination model.

rfElemPredict <- predict(rfModelRfe,testData)

confusionMatrix(rfElemPredict$pred,testData$Class)
Confusion Matrix and Statistics

                               Reference
Prediction                      Agriculture Agriculture_shallow_recession
  Agriculture                           184                             2
  Agriculture_shallow_recession           0                            43
  Burntland                               1                             0
  Dam                                     0                             0
  Plantation                              0                             0
  River                                   0                             0
  River_DriedUp                           1                             0
  Settlements                             0                             0
  Tree_Shrub_Savanna                      0                             0
  Wooded_Savanna                          0                             2
                               Reference
Prediction                      Burntland Dam Plantation River River_DriedUp
  Agriculture                           0   0          0     0             0
  Agriculture_shallow_recession         0   0          0     0             0
  Burntland                           127   0          0     0             0
  Dam                                   0  34          0     0             0
  Plantation                            0   0        165     0             0
  River                                 0   0          0    31             0
  River_DriedUp                         0   0          0     0            17
  Settlements                           1   0          0     0             0
  Tree_Shrub_Savanna                    0   0          0     0             0
  Wooded_Savanna                        0   0          0     0             0
                               Reference
Prediction                      Settlements Tree_Shrub_Savanna Wooded_Savanna
  Agriculture                             2                  0              0
  Agriculture_shallow_recession           0                  1              5
  Burntland                               0                  0              0
  Dam                                     0                  0              0
  Plantation                              0                  0              0
  River                                   0                  0              0
  River_DriedUp                           0                  0              0
  Settlements                           195                  0              0
  Tree_Shrub_Savanna                      1                163              5
  Wooded_Savanna                          0                  1             47

Overall Statistics
                                          
               Accuracy : 0.9786          
                 95% CI : (0.9678, 0.9865)
    No Information Rate : 0.1926          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.975           
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: Agriculture Class: Agriculture_shallow_recession
Sensitivity                      0.9892                              0.91489
Specificity                      0.9952                              0.99388
Pos Pred Value                   0.9787                              0.87755
Neg Pred Value                   0.9976                              0.99591
Prevalence                       0.1809                              0.04572
Detection Rate                   0.1790                              0.04183
Detection Prevalence             0.1829                              0.04767
Balanced Accuracy                0.9922                              0.95439
                     Class: Burntland Class: Dam Class: Plantation Class: River
Sensitivity                    0.9922    1.00000            1.0000      1.00000
Specificity                    0.9989    1.00000            1.0000      1.00000
Pos Pred Value                 0.9922    1.00000            1.0000      1.00000
Neg Pred Value                 0.9989    1.00000            1.0000      1.00000
Prevalence                     0.1245    0.03307            0.1605      0.03016
Detection Rate                 0.1235    0.03307            0.1605      0.03016
Detection Prevalence           0.1245    0.03307            0.1605      0.03016
Balanced Accuracy              0.9955    1.00000            1.0000      1.00000
                     Class: River_DriedUp Class: Settlements
Sensitivity                       1.00000             0.9848
Specificity                       0.99901             0.9988
Pos Pred Value                    0.94444             0.9949
Neg Pred Value                    1.00000             0.9964
Prevalence                        0.01654             0.1926
Detection Rate                    0.01654             0.1897
Detection Prevalence              0.01751             0.1907
Balanced Accuracy                 0.99951             0.9918
                     Class: Tree_Shrub_Savanna Class: Wooded_Savanna
Sensitivity                             0.9879               0.82456
Specificity                             0.9930               0.99691
Pos Pred Value                          0.9645               0.94000
Neg Pred Value                          0.9977               0.98978
Prevalence                              0.1605               0.05545
Detection Rate                          0.1586               0.04572
Detection Prevalence                    0.1644               0.04864
Balanced Accuracy                       0.9905               0.91074

This is not different from the results of the initial model rfModel.

We would still proceed to making spatial predictions with the initial model rfModel.This is because rfModelRfe(recursive feature model) had no improvement(reduction) in it’s complexity, nor improvement in it’s perfomance metric(Kappa Index) in refrence to the intial model rfModel.

Spatial Prediction

We would make spatial predictions of the stack image using the initial model rfModel. We would use theraster::predict() function.

bawkuPredictions<- raster::predict(bawkuStack,rfModel)

#Lets define the plalette(mainly be using Hexadecimal colours).
pal <-c('#ffffb3','#d8b365','#404040','#6a51a3','#018571','#2171b5','#bdd7e7',
        '#cb181d','#1a9641','#a6d96a')

tm_shape(bawkuPredictions)+
  tm_raster(style = "cat",labels = c("Agriculture",
                                     "Agriculture in shallows and recession","Burntland",
                                     "Dam","Plantation","River","River(dried-up)",
                                     "Settlements","Shrub and tree savanna","Wooded savanna"
                                     ),
            palette = pal,
            title = "Legend")+
  tm_layout(main.title= "Land Use Land Cover Map of Binduri,Bawku and Pusiga for April 2021",
            main.title.size =.9 )+
  tm_layout(legend.outside = TRUE,
            legend.outside.position = "right",
            legend.title.size = .8)

Machine Learning for Land Cover Classification