RMSE comparison

Preprocessing

The preprocessing chain consists of loading the data and preparing a raster brick.

library(raster)

## Loading required package: sp

First, all data is loaded into memory.

load("data/GewataB1.rda")
load("data/GewataB2.rda")
load("data/GewataB3.rda")
load("data/GewataB4.rda")
load("data/GewataB5.rda")
load("data/GewataB7.rda")
load("data/vcfGewata.rda")

The preprocessed data gets sent to the regression model. By using step, we derive some information on which bands contribute to the accuracy of the model.

vcfGewata[vcfGewata > 100] <- NA
alldata <- brick(GewataB1, GewataB2, GewataB3, GewataB4, GewataB5, GewataB7, vcfGewata)
names(alldata) <- c("band1", "band2", "band3", "band4", "band5", "band7", "VCF")

Exploring the data

The following plots explore the relations between the input data.

pairs(alldata[, 3:5])

hist(alldata, xlim = c(0, 3000), ylim = c(0, 750000))#, breaks = seq(0, 500, by = 100))

Modeling

Next, we extract and calibrate the model.

## Extract all data to a data.frame
df <- as.data.frame(getValues(alldata))

## Calibrate model
model <- lm(VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + df$band5 + df$band7, data = df)
step(model) #more statistics of the model

## Start:  AIC=7784664
## VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + df$band5 + 
##     df$band7
## 
##            Df Sum of Sq       RSS     AIC
## - df$band7  1         6 133937265 7784662
## <none>                  133937259 7784664
## - df$band3  1     16420 133953679 7784884
## - df$band5  1   6021286 139958546 7864181
## - df$band1  1  17916072 151853332 8011681
## - df$band4  1  26255617 160192877 8108358
## - df$band2  1  30302856 164240115 8153476
## 
## Step:  AIC=7784662
## VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + df$band5
## 
##            Df Sum of Sq       RSS     AIC
## <none>                  133937265 7784662
## - df$band3  1     21399 133958664 7784949
## - df$band5  1  17210900 151148165 8003262
## - df$band1  1  18173224 152110489 8014738
## - df$band2  1  31342888 165280153 8164889
## - df$band4  1  34235262 168172528 8196260

## 
## Call:
## lm(formula = VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + 
##     df$band5, data = df)
## 
## Coefficients:
## (Intercept)     df$band1     df$band2     df$band3     df$band4  
##   85.495798     0.092905    -0.150505    -0.002504     0.016608  
##    df$band5  
##   -0.020044

summary(model)$r.squared

## [1] 0.8582281

## Use the model to predict tree cover
treeCoverMap <- predict(model,df)

predictedRaster <- alldata$band1
predictedRaster$predictedVCF <- treeCoverMap
predictedRaster <- predictedRaster$predictedVCF
predictedRaster[predictedRaster < 0] <- NA #Remove all invalid values

Checking the model result

As can be seen, the correlation coefficient that is achieved is highly significant. Now let’s see the difference between the model and the original VCF.

opar <- par(mfrow=c(1, 2)) 
plot(predictedRaster, main="Predicted VCF")
plot(alldata$VCF, main="Original VCF")

par(opar)
compareRaster <- predictedRaster - alldata$VCF
plot(compareRaster, main="Comparison between predicted and original VCF")

In the histogram, values close to 0 represent a higher accuracy:

hist(compareRaster)

RMSE

Finally, we inspect the RMSE

predictedRasterDF <- as.data.frame(predictedRaster) #we need a DataFrame to operate with its values
sqrt(mean((predictedRasterDF$predictedVCF-df$VCF)^2, na.rm = TRUE ))

## [1] 8.356825

As can be seen, the RMS indicates that the computed linear model corresponds reasonably well with the calculated land cover.

Comparison with other land use classes

Lastly, we inspect the RMSE when calculated with other land use classes.

rm(model) # Delete the model (500mb in RAM)
dev.off() # Clear all existing pots to free memory

## null device 
##           1

load("data/trainingPoly.rda")
trainingPoly@data$Code <- as.numeric(trainingPoly@data$Class)
trainingPoly@data$Class #1 is cropland, 2 is forest and 3 is wetland

##  [1] wetland  wetland  wetland  wetland  wetland  forest   forest  
##  [8] forest   forest   forest   cropland cropland cropland cropland
## [15] cropland cropland
## Levels: cropland forest wetland

classes <- rasterize(trainingPoly,predictedRaster, field='Code')

predictedRasterBrick <- brick(predictedRaster,alldata$VCF)
zonalStatistics<-zonal(predictedRasterBrick, classes, fun='mean')

zonalStatisticsDF <- as.data.frame(zonalStatistics)

rmseComparison <- (sqrt((zonalStatisticsDF$predictedVCF-zonalStatisticsDF$VCF)^2))
r<- c("RMSE per class:", "\nCropland:", rmseComparison[1],"\nForest:", rmseComparison[2],"\nWetland", rmseComparison[3])
#1 is cropland, 2 is forest and 3 is wetland

RMSE per class:

Cropland: 2.51838559231991

Forest: 2.31564664554675

Wetland 7.23413643448576