This notebook illustrates how to assess thematic accuracy of land cover classification. It aims at helping Percepcion Remota students at UNAL to get started with remote sensing image analysis in R.
I will use a classified image obtained in a previous notebook. The classification covers the area known as Montes de Maria, in the Colombian Caribbean region.
I will also use an existing 2010-2012 land cover dataset, obtained by IDEAM, as reference data to evaluate thematic accuracy of the results.
Let’s use the terra library to create a raster object.
rm(list=ls())
library(sp)
library(terra)
clas13 <- rast("./landsat/montes_LC_MAX.TIF")
Let’s check what is clas13:
clas13
## class : SpatRaster
## dimensions : 3392, 3409, 1 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : 422745, 525015, 1019775, 1121535 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : montes_LC_MAX.TIF
## name : lyr.1
## min value : 1
## max value : 8
unique(clas13$lyr.1)
## lyr.1
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
## [5,] 5
## [6,] 6
## [7,] 7
## [8,] 8
Let’s remember the meaning of the land cover categories: - 1. Urbano - 2. Erial - 3. Cultivo - 4. Pasto - 5. Bosque - 6. Arbustal - 7. Agua - 8. NoData
Let’s plot the classified image:
library(leaflet)
library(RColorBrewer)
pal <- colorFactor(c('#fc4e2a','#f7fcb9' , '#99d8c9','#006837','#41ae76', '#d9f0a3',
'#addd8e', '#2b8cbe','white'),
c(1, 2, 3, 4, 5, 6, 7, 8), na.color = "transparent")
library(raster)
rclas13 <- raster(clas13)
leaflet() %>% addTiles() %>%
addRasterImage(rclas13, colors = pal, opacity = 1.0, method="ngb") %>%
addLegend(pal = pal, values = c(1,2,3,4,5,6,7,8),
title = "Montes de María - CART-based land cover in 2013")
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
## +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
## +wktext +no_defs +type=crs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
## +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
## +wktext +no_defs +type=crs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition
Let’s read the data:
ref_data <- vect("./shapes/MontesLC_UTM18N.shp")
What is ref_data?
ref_data
## class : SpatVector
## geometry : polygons
## dimensions : 3426, 5 (geometries, attributes)
## extent : 422788.8, 524926.8, 1019785, 1121539 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## names : OBJECTID CODIGO LEYENDA3N LC Code
## type : <num> <num> <chr> <num> <num>
## values : 1.599e+05 3.112e+04 3.1.1. Bosque denso 311 5
## 1.599e+05 233 2.3.3. Pastos enmal~ 233 4
## 1.612e+05 244 2.4.4. Mosaico de p~ 244 4
Note that the Code attribute holds the land cover codes corresponding to the generalized land cover categories used in our classification experiment.
unique(ref_data$Code)
## [1] 5 4 7 1 6 2 3
Let’s obtain a stratified random sample from the reference data. Note that, in real-life projects, this sample is usually obtained through field-work.
set.seed(15)
# generate stratified random samples from the polygons
# change size parameter as wanted
ptsamp <- spatSample(ref_data, size=350, method="random", strata=
ref_data$Code, chess="")
What is ptsamp?
ptsamp
## class : SpatVector
## geometry : points
## dimensions : 2124, 1 (geometries, attributes)
## extent : 424350.1, 524836, 1021458, 1120819 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## names : pol.id
## type : <num>
## values : 1
## 1
## 1
Note that the attribute pol.id holds the reference land cover codes.
ptsamp$reference <- as.factor(ptsamp$pol.id)
Now we can extract the cell values from the classified images. The extract function from terra seems appropriate. Let’s review the documentation for this function:
Extract values from a SpatRaster for a set of locations. The locations can be a SpatVector (points, lines, polygons), a matrix with (x, y) or (longitude, latitude – in that order!) coordinates, or a vector with cell numbers.
prediction <- extract(clas13, ptsamp)
ptsamp$prediction <- as.factor(prediction$lyr.1)
# Quick check for the extracted values
ptsamp$prediction[900:915]
## [1] "6" "6" "5" "8" "4" "4" "3" "6" "4" "4" "6" "1" "5" "4" "3" "6"
ptsamp$reference[900:915]
## [1] "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4"
(error_matrix <- table(ptsamp$prediction, ptsamp$reference))
##
## 1 2 3 4 5 6 7
## 1 96 10 12 7 0 0 7
## 2 9 35 4 9 1 3 12
## 3 13 6 156 35 30 26 30
## 4 15 56 80 158 45 56 37
## 5 2 10 42 40 149 91 32
## 6 9 10 28 87 89 165 27
## 7 7 11 20 8 34 7 205
## 8 81 4 8 6 2 2 0
error_matrix <- error_matrix[-8,]
error_matrix
##
## 1 2 3 4 5 6 7
## 1 96 10 12 7 0 0 7
## 2 9 35 4 9 1 3 12
## 3 13 6 156 35 30 26 30
## 4 15 56 80 158 45 56 37
## 5 2 10 42 40 149 91 32
## 6 9 10 28 87 89 165 27
## 7 7 11 20 8 34 7 205
Let’s compute the overall accuracy and the “kappa” statistic.
Overall accuracy:
# number of total cases/samples
n <- sum(error_matrix)
n
## [1] 2021
# number of correctly classified cases per class
diag <- diag(error_matrix)
# Overall Accuracy
(OA <- sum(diag) / n)
## [1] 0.4769916
Kappa Index:
# observed (true) cases per class
rowsums <- apply(error_matrix, 1, sum)
p <- rowsums / n
# predicted cases per class
colsums <- apply(error_matrix, 2, sum)
q <- colsums / n
expAccuracy <- sum(p*q)
kappa <- (OA - expAccuracy) / (1 - expAccuracy)
kappa
## [1] 0.3763742
[1] 0.2173659
Producer and user accuracy:
# Producer accuracy
# uncomment lines to run them
PA <- diag / colsums
# User accuracy
UA <- diag / rowsums
(outAcc <- data.frame(producerAccuracy = PA, userAccuracy = UA))
Now, let’s compute per-class precision, recall, and F-1.
precision = diag / colsums
recall = diag / rowsums
f1 = 2 * precision * recall / (precision + recall)
data.frame(precision, recall, f1)
The per-class metrics can be averaged over all the classes resulting in macro-averaged precision, recall and F-1.
(macroPrecision = mean(precision))
## [1] 0.4704058
(macroRecall = mean(recall))
## [1] 0.5134241
(macroF1 = mean(f1))
## [1] 0.4838887
In your technical report, you should describe the relevance of these accuracy metrics, identify the categories with higher thematic errors and discuss further processing tasks (e.g. improve land cover classification).
Let’s quote here a final advice to remote sensing practitioners from Foody (2002): Thus, in addition to providing basic accuracy assessment components such as a quantitative metric of accuracy and a confusion matrix, if appropriate, it may be helpful if the information on issues such as the sampling design used to acquire the testing set, the confidence in the ground data labels, the classification protocols and lineage of the data sets used were provided.
Cite as: Lizarazo, I., 2021. A tutorial on accuracy assessment of land cover classification using R. Available at: https://rpubs.com/ials2un/th_accuracy
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] raster_3.4-5 RColorBrewer_1.1-2 leaflet_2.0.4.1 terra_1.0-10
## [5] sp_1.4-5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 knitr_1.31 magrittr_2.0.1 munsell_0.5.0
## [5] colorspace_2.0-0 lattice_0.20-41 R6_2.5.0 rlang_0.4.10
## [9] stringr_1.4.0 tools_4.0.4 rgdal_1.5-23 grid_4.0.4
## [13] xfun_0.21 png_0.1-7 jquerylib_0.1.3 htmltools_0.5.1.1
## [17] crosstalk_1.1.1 yaml_2.2.1 digest_0.6.27 lifecycle_1.0.0
## [21] farver_2.0.3 vctrs_0.3.6 base64enc_0.1-3 sass_0.3.1
## [25] htmlwidgets_1.5.3 codetools_0.2-18 evaluate_0.14 rmarkdown_2.7
## [29] stringi_1.5.3 compiler_4.0.4 bslib_0.2.4 scales_1.1.1
## [33] jsonlite_1.7.2