The preprocessing chain consists of loading the data and preparing a raster brick.
library(raster)
## Loading required package: sp
First, all data is loaded into memory.
load("data/GewataB1.rda")
load("data/GewataB2.rda")
load("data/GewataB3.rda")
load("data/GewataB4.rda")
load("data/GewataB5.rda")
load("data/GewataB7.rda")
load("data/vcfGewata.rda")
The preprocessed data gets sent to the regression model. By using step, we derive some information on which bands contribute to the accuracy of the model.
vcfGewata[vcfGewata > 100] <- NA
alldata <- brick(GewataB1, GewataB2, GewataB3, GewataB4, GewataB5, GewataB7, vcfGewata)
names(alldata) <- c("band1", "band2", "band3", "band4", "band5", "band7", "VCF")
The following plots explore the relations between the input data.
pairs(alldata[, 3:5])
hist(alldata, xlim = c(0, 3000), ylim = c(0, 750000))#, breaks = seq(0, 500, by = 100))
Next, we extract and calibrate the model.
## Extract all data to a data.frame
df <- as.data.frame(getValues(alldata))
## Calibrate model
model <- lm(VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + df$band5 + df$band7, data = df)
step(model) #more statistics of the model
## Start: AIC=7784664
## VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + df$band5 +
## df$band7
##
## Df Sum of Sq RSS AIC
## - df$band7 1 6 133937265 7784662
## <none> 133937259 7784664
## - df$band3 1 16420 133953679 7784884
## - df$band5 1 6021286 139958546 7864181
## - df$band1 1 17916072 151853332 8011681
## - df$band4 1 26255617 160192877 8108358
## - df$band2 1 30302856 164240115 8153476
##
## Step: AIC=7784662
## VCF ~ df$band1 + df$band2 + df$band3 + df$band4 + df$band5
##
## Df Sum of Sq RSS AIC
## <none> 133937265 7784662
## - df$band3 1 21399 133958664 7784949
## - df$band5 1 17210900 151148165 8003262
## - df$band1 1 18173224 152110489 8014738
## - df$band2 1 31342888 165280153 8164889
## - df$band4 1 34235262 168172528 8196260
##
## Call:
## lm(formula = VCF ~ df$band1 + df$band2 + df$band3 + df$band4 +
## df$band5, data = df)
##
## Coefficients:
## (Intercept) df$band1 df$band2 df$band3 df$band4
## 85.495798 0.092905 -0.150505 -0.002504 0.016608
## df$band5
## -0.020044
summary(model)$r.squared
## [1] 0.8582281
## Use the model to predict tree cover
treeCoverMap <- predict(model,df)
predictedRaster <- alldata$band1
predictedRaster$predictedVCF <- treeCoverMap
predictedRaster <- predictedRaster$predictedVCF
predictedRaster[predictedRaster < 0] <- NA #Remove all invalid values
As can be seen, the correlation coefficient that is achieved is highly significant. Now let’s see the difference between the model and the original VCF.
opar <- par(mfrow=c(1, 2))
plot(predictedRaster, main="Predicted VCF")
plot(alldata$VCF, main="Original VCF")
par(opar)
compareRaster <- predictedRaster - alldata$VCF
plot(compareRaster, main="Comparison between predicted and original VCF")
In the histogram, values close to 0 represent a higher accuracy:
hist(compareRaster)
Finally, we inspect the RMSE
predictedRasterDF <- as.data.frame(predictedRaster) #we need a DataFrame to operate with its values
sqrt(mean((predictedRasterDF$predictedVCF-df$VCF)^2, na.rm = TRUE ))
## [1] 8.356825
As can be seen, the RMS indicates that the computed linear model corresponds reasonably well with the calculated land cover.
Lastly, we inspect the RMSE when calculated with other land use classes.
rm(model) # Delete the model (500mb in RAM)
dev.off() # Clear all existing pots to free memory
## null device
## 1
load("data/trainingPoly.rda")
trainingPoly@data$Code <- as.numeric(trainingPoly@data$Class)
trainingPoly@data$Class #1 is cropland, 2 is forest and 3 is wetland
## [1] wetland wetland wetland wetland wetland forest forest
## [8] forest forest forest cropland cropland cropland cropland
## [15] cropland cropland
## Levels: cropland forest wetland
classes <- rasterize(trainingPoly,predictedRaster, field='Code')
predictedRasterBrick <- brick(predictedRaster,alldata$VCF)
zonalStatistics<-zonal(predictedRasterBrick, classes, fun='mean')
zonalStatisticsDF <- as.data.frame(zonalStatistics)
rmseComparison <- (sqrt((zonalStatisticsDF$predictedVCF-zonalStatisticsDF$VCF)^2))
r<- c("RMSE per class:", "\nCropland:", rmseComparison[1],"\nForest:", rmseComparison[2],"\nWetland", rmseComparison[3])
#1 is cropland, 2 is forest and 3 is wetland
RMSE per class:
Cropland: 2.51838559231991
Forest: 2.31564664554675
Wetland 7.23413643448576