Machine Learning for Land Cover Classification
Objectives
To learn,how to build models usimg machine learning and how to use such models for spatial predictions(eg.Landcover Classification).
To learn how to evaluate such models using a test data.
To familiarize onself with the caret package in R,and the classifier algorithm Random Forest.
Overview
This tutorial seeks to demonstrate the basic workflow of building,testing,tuning and deploying models for landcover classification.The data being used is a(Landsat-8 OLI & TIRS) multispectral image.The image is a coverage of Binduri Distrcit,Bawku Muncipal and Pusiga District in the North-East Region of Ghana . The Caret package in R, provides a standard platorm to use a variety of Classifier Algorithm.However, In this tutorial we would only be particular about the Random Forest Algorithm which have gain tremendous use in spatial predictions in the area of Remote Sensing. We would build a model using the Random Forest algorithm, and use the model to make spatial predictions on the landsat 8 image collected over the area of Binduri,bawku and Pusiga. The 10 LandCover classes digitized(roi) within QGIS would serve as the response parameter, whereas the various band of landsat image, in this case, the blue,green,red,nir(near-infrared),swir1(shortwave infrared 1),swir2(shortwave infrared 2)bands in addition ndvi(Normalized Diffrence Vegetation Index),dem(Digital Elevation Model) would serve as the predictor variables.
Let’s load the following libraries.
Let’s load our raster image file using the stack()function. NB: remember to set your working directory using setwd().
setwd("C:/MachineLearning")
bawkuStack <- raster::stack("binduri_bawku_pusiga.grd")
print(bawkuStack) #Lets print-out to know more about the structure of image data
class : RasterStack
dimensions : 1389, 1561, 2168229, 8 (nrow, ncol, ncell, nlayers)
resolution : 30, 30 (x, y)
extent : 783915, 830745, 1194855, 1236525 (xmin, xmax, ymin, ymax)
crs : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs
names : blue, green, red, nir, swir1, swir2, ndvi, dem
min values : 0.02920927, 0.03624698, 0.03759349, 0.08096451, 0.09161354, 0.05913325, -0.13536817, 159.00000000
max values : 0.1095205, 0.1644335, 0.2353656, 0.3561705, 0.5407009, 0.6490462, 0.7583994, 384.0000000 This a landsat 8 satellite image covering the binduri,Bawku(municipal)and Pusiga district of Ghana downloaded from the USGS staellite data platform.It has a Path and Row of (194,052)respectively.The regions of interest(roi) were collected based on the landsat, and also Google Satellite in QGIS with expert knowledge of the site and terrain.A total of 392 ploygons were collected(digitized) and grouped into 10 LandCover Classes.
Lets plot a 5,4,3 false colour composite of our area of study(imageLayer) using the plotRGB() function.
par(col.axis="white",tck=0)
raster::plotRGB(bawkuStack,
r=5,g=4,b=3,
stretch="hist",
axes=TRUE,
main= "False Colour composite (5,4,3) of the Study Area")
box(col="white")Let’s load our digitized polygons(Shapefiles) using the read_sf() function.
roiTraining <- sf::read_sf("trainingRegions.shp")
#Lets convert Class to factor(to assign levels to the response)
roiTraining$Class <- as.factor(roiTraining$Class)
unique(roiTraining$Class) #This prints out the various unique classes
[1] River Dam
[3] River_DriedUp Settlements
[5] Tree_Shrub_Savanna Agriculture_shallow_recession
[7] Wooded_Savanna Agriculture
[9] Burntland Plantation
10 Levels: Agriculture Agriculture_shallow_recession Burntland ... Wooded_SavannaThe output shows that there are indeed 10 unique classes.
It’s also very important to make sure our raster layer(bawkuStack) and the sf objects(roiTraining) have the same coordinate refrence system.
cat("CRS of bawkuStack\n"); bawkuStack@crs;
CRS of bawkuStack
CRS arguments:
+proj=utm +zone=30 +datum=WGS84 +units=m +no_defs
cat("\n") #just to create a break
#Intentionally used crs for the sf object insted of st_crs.
cat("CRS of roiTraining\n"); crs(roiTraining)
CRS of roiTraining
CRS arguments:
+proj=utm +zone=30 +datum=WGS84 +units=m +no_defs The two object have the same coordinate refrence system.
Extraction of RasterLayer values
We would extract the raster values at the various polygon locations using the extract function raster::extract().This are the values that would be used to train our models.
extract <- raster::extract(bawkuStack,roiTraining,df=TRUE)
head(extract)
ID blue green red nir swir1 swir2 ndvi
1 1 0.07858848 0.1283467 0.1681415 0.1350957 0.1307055 0.09721603 -0.1089767
2 1 0.07726602 0.1278760 0.1668638 0.1349612 0.1303917 0.09587111 -0.1056989
3 1 0.07650393 0.1269571 0.1647343 0.1332352 0.1289123 0.09392104 -0.1057123
4 1 0.07621255 0.1269122 0.1649809 0.1331904 0.1315349 0.09338309 -0.1066181
5 1 0.07587632 0.1264640 0.1650705 0.1325180 0.1281054 0.09304685 -0.1093876
6 1 0.07497974 0.1250967 0.1639049 0.1322939 0.1270743 0.09302444 -0.1067225
dem
1 174
2 171
3 172
4 174
5 172
6 171Good, now we have a dataframe of values for the predictors.Now we need to find way to join our response variable(Classes) to this dataframe.We can easily do that using the dplyr inner_join() function .We then set the geometry column to NULL.
Data Partitioning
At this stage we partition our dataset into two, thats a train and test dataset. We train the model with the train dataset and evaluate the model perfomance using the test data.
We would use the caret function createDataPartition() to split data into training and testing sets.
set.seed(123)
trainIndex <- caret::createDataPartition(extractMerged$ID,list = FALSE,p=0.7)
trainData <- extractMerged[trainIndex,] # 70% for training Data
testData <- extractMerged[-trainIndex,] # 30% for testing DataLet’s now look at the count of the various Classes of the {trainData}.
classCount <- trainData %>%
dplyr::group_by(Class) %>%
count()
print(classCount)
# A tibble: 10 x 2
# Groups: Class [10]
Class n
<fct> <int>
1 Agriculture 394
2 Agriculture_shallow_recession 93
3 Burntland 358
4 Dam 90
5 Plantation 378
6 River 42
7 River_DriedUp 47
8 Settlements 479
9 Tree_Shrub_Savanna 405
10 Wooded_Savanna 118Proportion of the response Classes that makes up the training data.
(prop.table(table(trainData$Class))*100)
Agriculture Agriculture_shallow_recession
16.389351 3.868552
Burntland Dam
14.891847 3.743760
Plantation River
15.723794 1.747088
River_DriedUp Settlements
1.955075 19.925125
Tree_Shrub_Savanna Wooded_Savanna
16.846922 4.908486 Clearly an imbalance data but not that bad, very common in geospatial studies in relation to land cover analysis.
We would also again look at the scale of the values for the vraious predictors.We print first six(6) rows of the trainData.
print(head(trainData))
ID blue green red nir swir1 swir2 ndvi
2 1 0.07726602 0.1278760 0.1668638 0.1349612 0.1303917 0.09587111 -0.1056989
5 1 0.07587632 0.1264640 0.1650705 0.1325180 0.1281054 0.09304685 -0.1093876
6 1 0.07497974 0.1250967 0.1639049 0.1322939 0.1270743 0.09302444 -0.1067225
8 2 0.07229001 0.1224518 0.1606995 0.1285058 0.1171668 0.08349814 -0.1113178
10 2 0.07047442 0.1209501 0.1583459 0.1257488 0.1150149 0.08125669 -0.1147402
11 2 0.06955542 0.1193138 0.1561940 0.1242695 0.1142080 0.08006869 -0.1138277
dem Class
2 171 River
5 172 River
6 171 River
8 170 River
10 173 River
11 173 RiverWe can see clearly the different scale of measurement,especially for the dem and also ndvi.However,scale diffrence is not a issue for tree based classifier(Decision Tree,Random forest,Bagging etc.).
Model Training
At this satge, we are ready to build our model.As was stated in the objectives, we would be developing the model using Random Forest algorithm.
Random Forest
Random Forest basically consist of an ensemble of trees. It also uses random Sampling with replacement(bootsrap samples).It can be used for both regression and classification.Rgression when the response variable is numeric and classification when the response variable is a factor(categorical).Each tree within the ensemble(forest) is considered as a model on its own, and at each terminal node, a random number of the predictor variables(mtry) are sampled for splitting.This split the new sample into the various class.Since each tree within the ensemble has equal weight, the total votes in each class is aggregated from each tree to identify the class with the highest votes.
From the breif explaination above, we have 2 hyperparamters we can tune.
The number of trees(ntree).
The number of predictor variable randomly sampled as candidates for splitting at each terminal node(mtry).The default is the square root of the predictors.
NB:Random Forest does quite well in handling multicollinearity.
Let’s define our response and predictor variables.
After building the model,we would use test data(not seen by the model*) to provide unbiased evaluation of the model. We then use Kappa Index as the main performance metric to choose the best model.
It is recommended to use ntree >= 500.
set.seed(124)
cvControl <- caret::trainControl(method = 'cv',
number = 10,
savePredictions = TRUE,
verboseIter = FALSE)
# Train model using random Forest algorithm
set.seed(124)
rfModel <- caret::train(trainData[,predVar],
trainData[,respVar],
method="rf",
metric = "Kappa",
ntree= 500,
trControl= cvControl,
tuneLength=6,
importance=TRUE)Cross-validation results for rfModel.
print(rfModel)
Random Forest
2404 samples
8 predictor
10 classes: 'Agriculture', 'Agriculture_shallow_recession', 'Burntland', 'Dam', 'Plantation', 'River', 'River_DriedUp', 'Settlements', 'Tree_Shrub_Savanna', 'Wooded_Savanna'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 2164, 2165, 2162, 2164, 2166, 2163, ...
Resampling results across tuning parameters:
mtry Accuracy Kappa
2 0.9708929 0.9657977
3 0.9725665 0.9677642
4 0.9713165 0.9663031
5 0.9721551 0.9672991
6 0.9708981 0.9658202
8 0.9659013 0.9599479
Kappa was used to select the optimal model using the largest value.
The final value used for the model was mtry = 3.Cross-validation results shows a high Kappa Index of 0.9725665 at mtry = 3.This depicts a high concordance between the observed and predicted classes.
Model Evaluation using Test Data
We would evaluate the model using the test data.We would look at the confusion matrix, which would indicate to us how well our model fared in predicting the various classes in the test data.We would also compare the overall acurracy to the No Information Rate, this would indicate to us whether our model is worth using.
Evaluation of the rfModel model using the test data.
rfPredict <- predict(rfModel,testData)
confusionMatrix(rfPredict,testData$Class)
Confusion Matrix and Statistics
Reference
Prediction Agriculture Agriculture_shallow_recession
Agriculture 183 2
Agriculture_shallow_recession 0 43
Burntland 1 0
Dam 0 0
Plantation 0 0
River 0 0
River_DriedUp 2 0
Settlements 0 0
Tree_Shrub_Savanna 0 0
Wooded_Savanna 0 2
Reference
Prediction Burntland Dam Plantation River River_DriedUp
Agriculture 0 0 0 0 0
Agriculture_shallow_recession 0 0 0 0 0
Burntland 127 0 0 0 0
Dam 0 34 0 0 0
Plantation 0 0 165 0 0
River 0 0 0 31 0
River_DriedUp 0 0 0 0 17
Settlements 1 0 0 0 0
Tree_Shrub_Savanna 0 0 0 0 0
Wooded_Savanna 0 0 0 0 0
Reference
Prediction Settlements Tree_Shrub_Savanna Wooded_Savanna
Agriculture 1 0 0
Agriculture_shallow_recession 0 0 5
Burntland 0 0 0
Dam 0 0 0
Plantation 0 1 0
River 0 0 0
River_DriedUp 0 0 0
Settlements 197 0 0
Tree_Shrub_Savanna 0 163 6
Wooded_Savanna 0 1 46
Overall Statistics
Accuracy : 0.9786
95% CI : (0.9678, 0.9865)
No Information Rate : 0.1926
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.975
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: Agriculture Class: Agriculture_shallow_recession
Sensitivity 0.9839 0.91489
Specificity 0.9964 0.99490
Pos Pred Value 0.9839 0.89583
Neg Pred Value 0.9964 0.99592
Prevalence 0.1809 0.04572
Detection Rate 0.1780 0.04183
Detection Prevalence 0.1809 0.04669
Balanced Accuracy 0.9902 0.95490
Class: Burntland Class: Dam Class: Plantation Class: River
Sensitivity 0.9922 1.00000 1.0000 1.00000
Specificity 0.9989 1.00000 0.9988 1.00000
Pos Pred Value 0.9922 1.00000 0.9940 1.00000
Neg Pred Value 0.9989 1.00000 1.0000 1.00000
Prevalence 0.1245 0.03307 0.1605 0.03016
Detection Rate 0.1235 0.03307 0.1605 0.03016
Detection Prevalence 0.1245 0.03307 0.1615 0.03016
Balanced Accuracy 0.9955 1.00000 0.9994 1.00000
Class: River_DriedUp Class: Settlements
Sensitivity 1.00000 0.9949
Specificity 0.99802 0.9988
Pos Pred Value 0.89474 0.9949
Neg Pred Value 1.00000 0.9988
Prevalence 0.01654 0.1926
Detection Rate 0.01654 0.1916
Detection Prevalence 0.01848 0.1926
Balanced Accuracy 0.99901 0.9969
Class: Tree_Shrub_Savanna Class: Wooded_Savanna
Sensitivity 0.9879 0.80702
Specificity 0.9930 0.99691
Pos Pred Value 0.9645 0.93878
Neg Pred Value 0.9977 0.98876
Prevalence 0.1605 0.05545
Detection Rate 0.1586 0.04475
Detection Prevalence 0.1644 0.04767
Balanced Accuracy 0.9905 0.90196Kappa Index of 0.975 with 95% confidence interval(0.9678, 0.9865) indicates high concordance between the observed and the predicted classes.Also, an Accuracy value 0.9786 is very encouraging, compared to the No Information Rate of 0.1926.We can also see high values of sensitivity and specificity for the various classes of the response variable.
Feature Selection(recursive feature elimination)
In machine learning, we are usually face with the problem of large numbers of predictors,some of this predictors have little to no contribution to the performance of the model.Given such situations, there is the need to find a way to drop some of this non-informative features(predictors) mainly to increase the performance metric(be it Kappa Index or Accuracy) or to find a subset of features(predictors) that reduces the complexity of the model.
NB:Multicollinearity does not affect the performance of Random Forest models.However the interpretability of variable importance becomes a problem
RFE: It begins with all predictors(backward selection), and works it way down by a given number of iterations untill a subset(combination of predictors) of optimal model performance is achieved.
In the practical field of devloping models, large number of features(predictors) would increase
- Cost(cost of collecting/measuring information on the predictors)
- Computation time.
- Complexity of the model.
NB: This is just a demonstration, we are not burden with large number of features(predictors). Our interest here, is whether there is a subset that could reduce the model complexity and yet increase the Kappa Index. There could also be a trade-off, where a reduction in complexity would lead to a minimal reduction in the performance metric(Kappa Index).
Subset sequence and recursive feature control
set.seed(124)
indexRfe <- createMultiFolds(trainData$Class, times=5)
##A sequence indicating how we want the predictor variables to be subseted
predSeq <-seq(from=1, to =length(predVar),by=2)
#RFE control function
rfeCont <- rfeControl(method="cv",
number = 10,
verbose=FALSE,
functions=rfFuncs,
index=indexRfe)Recursive feature elimination(ntree=500)
#lets pre-process our training data before feeding into the rfe algorithm
set.seed(124)
rfModelRfe <- caret::rfe(trainData[,predVar],
trainData[,respVar],
sizes = predSeq,
metric = "Kappa",
ntree=500,
method="rf",
rfeControl = rfeCont)Cross-validation results for the recursive feature elimination.
print(rfModelRfe)
Recursive feature selection
Outer resampling method: Cross-Validated (10 fold)
Resampling performance over subset size:
Variables Accuracy Kappa AccuracySD KappaSD Selected
1 0.4451 0.3468 0.028206 0.03219
3 0.9569 0.9494 0.011752 0.01379
5 0.9698 0.9645 0.009200 0.01078
7 0.9695 0.9641 0.010353 0.01215
8 0.9724 0.9676 0.009196 0.01079 *
The top 5 variables (out of 8):
nir, dem, ndvi, swir1, greenFrom running the Recursive feature Elimination(rfe), the top five variables were given as nir, dem, ndvi, swir1, green.The cross-validation assessment indicates that all eight(8) features were selected at a Kappa Index of 0.9676.
Evaluation of the testData with the recursive feature elimination model.
rfElemPredict <- predict(rfModelRfe,testData)
confusionMatrix(rfElemPredict$pred,testData$Class)
Confusion Matrix and Statistics
Reference
Prediction Agriculture Agriculture_shallow_recession
Agriculture 184 2
Agriculture_shallow_recession 0 43
Burntland 1 0
Dam 0 0
Plantation 0 0
River 0 0
River_DriedUp 1 0
Settlements 0 0
Tree_Shrub_Savanna 0 0
Wooded_Savanna 0 2
Reference
Prediction Burntland Dam Plantation River River_DriedUp
Agriculture 0 0 0 0 0
Agriculture_shallow_recession 0 0 0 0 0
Burntland 127 0 0 0 0
Dam 0 34 0 0 0
Plantation 0 0 165 0 0
River 0 0 0 31 0
River_DriedUp 0 0 0 0 17
Settlements 1 0 0 0 0
Tree_Shrub_Savanna 0 0 0 0 0
Wooded_Savanna 0 0 0 0 0
Reference
Prediction Settlements Tree_Shrub_Savanna Wooded_Savanna
Agriculture 2 0 0
Agriculture_shallow_recession 0 1 5
Burntland 0 0 0
Dam 0 0 0
Plantation 0 0 0
River 0 0 0
River_DriedUp 0 0 0
Settlements 195 0 0
Tree_Shrub_Savanna 1 163 5
Wooded_Savanna 0 1 47
Overall Statistics
Accuracy : 0.9786
95% CI : (0.9678, 0.9865)
No Information Rate : 0.1926
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.975
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: Agriculture Class: Agriculture_shallow_recession
Sensitivity 0.9892 0.91489
Specificity 0.9952 0.99388
Pos Pred Value 0.9787 0.87755
Neg Pred Value 0.9976 0.99591
Prevalence 0.1809 0.04572
Detection Rate 0.1790 0.04183
Detection Prevalence 0.1829 0.04767
Balanced Accuracy 0.9922 0.95439
Class: Burntland Class: Dam Class: Plantation Class: River
Sensitivity 0.9922 1.00000 1.0000 1.00000
Specificity 0.9989 1.00000 1.0000 1.00000
Pos Pred Value 0.9922 1.00000 1.0000 1.00000
Neg Pred Value 0.9989 1.00000 1.0000 1.00000
Prevalence 0.1245 0.03307 0.1605 0.03016
Detection Rate 0.1235 0.03307 0.1605 0.03016
Detection Prevalence 0.1245 0.03307 0.1605 0.03016
Balanced Accuracy 0.9955 1.00000 1.0000 1.00000
Class: River_DriedUp Class: Settlements
Sensitivity 1.00000 0.9848
Specificity 0.99901 0.9988
Pos Pred Value 0.94444 0.9949
Neg Pred Value 1.00000 0.9964
Prevalence 0.01654 0.1926
Detection Rate 0.01654 0.1897
Detection Prevalence 0.01751 0.1907
Balanced Accuracy 0.99951 0.9918
Class: Tree_Shrub_Savanna Class: Wooded_Savanna
Sensitivity 0.9879 0.82456
Specificity 0.9930 0.99691
Pos Pred Value 0.9645 0.94000
Neg Pred Value 0.9977 0.98978
Prevalence 0.1605 0.05545
Detection Rate 0.1586 0.04572
Detection Prevalence 0.1644 0.04864
Balanced Accuracy 0.9905 0.91074This is not different from the results of the initial model rfModel.
We would still proceed to making spatial predictions with the initial model rfModel.This is because rfModelRfe(recursive feature model) had no improvement(reduction) in it’s complexity, nor improvement in it’s perfomance metric(Kappa Index) in refrence to the intial model rfModel.
Spatial Prediction
We would make spatial predictions of the stack image using the initial model rfModel. We would use theraster::predict() function.
#Lets define the plalette(mainly be using Hexadecimal colours).
pal <-c('#ffffb3','#d8b365','#404040','#6a51a3','#018571','#2171b5','#bdd7e7',
'#cb181d','#1a9641','#a6d96a')
tm_shape(bawkuPredictions)+
tm_raster(style = "cat",labels = c("Agriculture",
"Agriculture in shallows and recession","Burntland",
"Dam","Plantation","River","River(dried-up)",
"Settlements","Shrub and tree savanna","Wooded savanna"
),
palette = pal,
title = "Legend")+
tm_layout(main.title= "Land Use Land Cover Map of Binduri,Bawku and Pusiga for April 2021",
main.title.size =.9 )+
tm_layout(legend.outside = TRUE,
legend.outside.position = "right",
legend.title.size = .8)
Further Reading
Kuhn,M., and Johnson, K. (2013). Applied Predictive Modeling.1st edn. New York: Springer.
Hastie,T.,Tibshirani,R., and Friedman,J.H. (2009). The Elements of Statistical Learning
:Data Mining,Inference and Prediction.2nd edn. New York: Springer.