A tutorial on accuracy assessment of land cover classification

1. Introduction

This notebook illustrates how to assess thematic accuracy of land cover classification. It aims at helping Percepcion Remota students at UNAL to get started with remote sensing image analysis in R.

I will use a classified image obtained in a previous notebook. The classification covers the area known as Montes de Maria, in the Colombian Caribbean region.

I will also use an existing 2010-2012 land cover dataset, obtained by IDEAM, as reference data to evaluate thematic accuracy of the results.

2. Reading input data

2.1 Reading the classified image

Let’s use the terra library to create a raster object.

rm(list=ls())
library(sp)
library(terra)
clas13 <- rast("./landsat/montes_LC_MAX.TIF")

Let’s check what is clas13:

clas13

## class       : SpatRaster 
## dimensions  : 3392, 3409, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 422745, 525015, 1019775, 1121535  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs 
## source      : montes_LC_MAX.TIF 
## name        : lyr.1 
## min value   :     1 
## max value   :     8

unique(clas13$lyr.1)

##      lyr.1
## [1,]     1
## [2,]     2
## [3,]     3
## [4,]     4
## [5,]     5
## [6,]     6
## [7,]     7
## [8,]     8

Let’s remember the meaning of the land cover categories: - 1. Urbano - 2. Erial - 3. Cultivo - 4. Pasto - 5. Bosque - 6. Arbustal - 7. Agua - 8. NoData

Let’s plot the classified image:

library(leaflet)
library(RColorBrewer)
pal <- colorFactor(c('#fc4e2a','#f7fcb9' , '#99d8c9','#006837','#41ae76', '#d9f0a3',
                  '#addd8e', '#2b8cbe','white'), 
    c(1, 2, 3, 4, 5, 6, 7, 8), na.color = "transparent")

library(raster)
rclas13 <- raster(clas13)

leaflet() %>% addTiles() %>%
  addRasterImage(rclas13, colors = pal, opacity = 1.0, method="ngb") %>%
  addLegend(pal = pal, values = c(1,2,3,4,5,6,7,8),
    title = "Montes de María - CART-based  land cover in 2013")

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
## +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
## +wktext +no_defs +type=crs

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
## +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
## +wktext +no_defs +type=crs

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition

2.2 Reading the reference data

Let’s read the data:

ref_data <- vect("./shapes/MontesLC_UTM18N.shp")

What is ref_data?

ref_data

##  class       : SpatVector 
##  geometry    : polygons 
##  dimensions  : 3426, 5  (geometries, attributes)
##  extent      : 422788.8, 524926.8, 1019785, 1121539  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs 
##  names       :  OBJECTID    CODIGO            LEYENDA3N    LC  Code
##  type        :     <num>     <num>                <chr> <num> <num>
##  values      : 1.599e+05 3.112e+04  3.1.1. Bosque denso   311     5
##                1.599e+05       233 2.3.3. Pastos enmal~   233     4
##                1.612e+05       244 2.4.4. Mosaico de p~   244     4

Note that the Code attribute holds the land cover codes corresponding to the generalized land cover categories used in our classification experiment.

unique(ref_data$Code)

## [1] 5 4 7 1 6 2 3

3. Data processing

3.1 Getting validation data

Let’s obtain a stratified random sample from the reference data. Note that, in real-life projects, this sample is usually obtained through field-work.

set.seed(15)
# generate stratified random samples from the polygons
# change size parameter as wanted
ptsamp <- spatSample(ref_data, size=350, method="random", strata=
                        ref_data$Code, chess="")

What is ptsamp?

ptsamp

##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 2124, 1  (geometries, attributes)
##  extent      : 424350.1, 524836, 1021458, 1120819  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs 
##  names       : pol.id
##  type        :  <num>
##  values      :      1
##                     1
##                     1

Note that the attribute pol.id holds the reference land cover codes.

ptsamp$reference <-  as.factor(ptsamp$pol.id)

Now we can extract the cell values from the classified images. The extract function from terra seems appropriate. Let’s review the documentation for this function:

Extract values from a SpatRaster for a set of locations. The locations can be a SpatVector (points, lines, polygons), a matrix with (x, y) or (longitude, latitude – in that order!) coordinates, or a vector with cell numbers.

prediction <- extract(clas13, ptsamp)

ptsamp$prediction <- as.factor(prediction$lyr.1)

# Quick check for the extracted values
ptsamp$prediction[900:915]

##  [1] "6" "6" "5" "8" "4" "4" "3" "6" "4" "4" "6" "1" "5" "4" "3" "6"

ptsamp$reference[900:915]

##  [1] "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4" "4"

3.2 Creating the confusion matrix

(error_matrix <- table(ptsamp$prediction, ptsamp$reference))

##    
##       1   2   3   4   5   6   7
##   1  96  10  12   7   0   0   7
##   2   9  35   4   9   1   3  12
##   3  13   6 156  35  30  26  30
##   4  15  56  80 158  45  56  37
##   5   2  10  42  40 149  91  32
##   6   9  10  28  87  89 165  27
##   7   7  11  20   8  34   7 205
##   8  81   4   8   6   2   2   0

error_matrix <- error_matrix[-8,]

error_matrix

##    
##       1   2   3   4   5   6   7
##   1  96  10  12   7   0   0   7
##   2   9  35   4   9   1   3  12
##   3  13   6 156  35  30  26  30
##   4  15  56  80 158  45  56  37
##   5   2  10  42  40 149  91  32
##   6   9  10  28  87  89 165  27
##   7   7  11  20   8  34   7 205

3.3 Calculation of accuracy metrics

Let’s compute the overall accuracy and the “kappa” statistic.

Overall accuracy:

# number of total cases/samples
n <- sum(error_matrix)
n

## [1] 2021

# number of correctly classified cases per class
diag <- diag(error_matrix)
# Overall Accuracy
(OA <- sum(diag) / n)

## [1] 0.4769916

Kappa Index:

# observed (true) cases per class
rowsums <- apply(error_matrix, 1, sum)
p <- rowsums / n
# predicted cases per class
colsums <- apply(error_matrix, 2, sum)
q <- colsums / n
expAccuracy <- sum(p*q)
kappa <- (OA - expAccuracy) / (1 - expAccuracy)
kappa

## [1] 0.3763742

[1] 0.2173659

Producer and user accuracy:

# Producer accuracy
# uncomment lines to run them
PA <- diag / colsums
# User accuracy
UA <- diag / rowsums
(outAcc <- data.frame(producerAccuracy = PA, userAccuracy = UA))

Now, let’s compute per-class precision, recall, and F-1.

precision = diag / colsums 
recall = diag / rowsums 
f1 = 2 * precision * recall / (precision + recall)

data.frame(precision, recall, f1)

The per-class metrics can be averaged over all the classes resulting in macro-averaged precision, recall and F-1.

(macroPrecision = mean(precision))

## [1] 0.4704058

(macroRecall = mean(recall))

## [1] 0.5134241

(macroF1 = mean(f1))

## [1] 0.4838887

In your technical report, you should describe the relevance of these accuracy metrics, identify the categories with higher thematic errors and discuss further processing tasks (e.g. improve land cover classification).

Let’s quote here a final advice to remote sensing practitioners from Foody (2002): Thus, in addition to providing basic accuracy assessment components such as a quantitative metric of accuracy and a confusion matrix, if appropriate, it may be helpful if the information on issues such as the sampling design used to acquire the testing set, the confidence in the ground data labels, the classification protocols and lineage of the data sets used were provided.

4. Reproducibility

Cite as: Lizarazo, I., 2021. A tutorial on accuracy assessment of land cover classification using R. Available at: https://rpubs.com/ials2un/th_accuracy

sessionInfo()

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] raster_3.4-5       RColorBrewer_1.1-2 leaflet_2.0.4.1    terra_1.0-10      
## [5] sp_1.4-5          
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        knitr_1.31        magrittr_2.0.1    munsell_0.5.0    
##  [5] colorspace_2.0-0  lattice_0.20-41   R6_2.5.0          rlang_0.4.10     
##  [9] stringr_1.4.0     tools_4.0.4       rgdal_1.5-23      grid_4.0.4       
## [13] xfun_0.21         png_0.1-7         jquerylib_0.1.3   htmltools_0.5.1.1
## [17] crosstalk_1.1.1   yaml_2.2.1        digest_0.6.27     lifecycle_1.0.0  
## [21] farver_2.0.3      vctrs_0.3.6       base64enc_0.1-3   sass_0.3.1       
## [25] htmlwidgets_1.5.3 codetools_0.2-18  evaluate_0.14     rmarkdown_2.7    
## [29] stringi_1.5.3     compiler_4.0.4    bslib_0.2.4       scales_1.1.1     
## [33] jsonlite_1.7.2