Cargo Librerias
library(raster)
## Loading required package: sp
Espacio de Trabajo
setwd("~/Curso_R/git/Raster_Avanzado")
Descargo Bandas para calcular NDVI
#download.file(url = "https://github.com/GeoScripting-WUR/AdvancedRasterAnalysis/raw/gh-pages/data/GewataB2.rda", destfile = "data/GewataB2.rda", method = "wget")
#download.file(url = "https://github.com/GeoScripting-WUR/AdvancedRasterAnalysis/raw/gh-pages/data/GewataB3.rda", destfile = "data/GewataB3.rda", method = "wget")
#download.file(url = "https://github.com/GeoScripting-WUR/AdvancedRasterAnalysis/raw/gh-pages/data/GewataB4.rda", destfile = "data/GewataB4.rda", method = "wget")
Cargo Bandas al proyecto
load("data/GewataB2.rda")
load("data/GewataB3.rda")
load("data/GewataB4.rda")
Trabajo con las bandas de landsat
#info basica
GewataB2
## class : RasterLayer
## dimensions : 1177, 1548, 1821996 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 808755, 855195, 817635, 852945 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs
## data source : in memory
## names : gewataB2
## values : 127, 2220 (min, max)
#Estadistica basica
cellStats(GewataB2, stat = max)
## [1] 2220
cellStats(GewataB2, stat = mean)
## [1] 502.9058
# o tambien
maxValue(GewataB2)
## [1] 2220
# o maximo valor de las tres
max(c(maxValue(GewataB2),maxValue(GewataB3),maxValue(GewataB4)))
## [1] 4903
# summary es el mas usado para una vista rapida
summary(GewataB2)
## gewataB2
## Min. 127
## 1st Qu. 410
## Median 469
## 3rd Qu. 594
## Max. 2220
## NA's 0
Layer Stack de las bandas
gewata <- raster::brick(GewataB2, GewataB3, GewataB4)
# Acomodo salida grafica - no hizo falta - argandando la zona de representacion se resolvio
raster::hist(gewata)
# hay que acomodar las escalar en la grafica
par(mfrow=c(1,1))
Histogramas de las bandas con escalas similares
Aunque tambien puedo usar la funcion pairs(gewata) que nos muestra mas informacion ademas de los histogramas
Question: Given what we know about the location of these bands along the EM spectrum, how could these scatterplots be explained? ETM+ band 4 (nearly equivalent to band 5 in the Landsat 8 OLI sensor) is situated in the near infrared (NIR) region of the EM spectrum and is often used to describe vegetation-related features.
We observe a strong correlation between two of the Landsat bands of the gewata subset, but a very different distribution of values in band 4 (NIR). This distribution stems from the fact that vegetation reflects very highly in the NIR range, compared to the visual range of the EM spectrum. A commonly used metric for assessing vegetation dynamics, the normalized difference vegetation index (NDVI), explained in the previous lesson, takes advantage of this fact and is computed from Landsat bands 3 (visible red) and 4 (near infra-red).
Construyo el NDVI
# puedo hacerlo con calc o con overlay
NDVI <- overlay(GewataB4, GewataB3, fun=function(IRC,R){(IRC-R)/(IRC+R)})
writeRaster(NDVI, filename="data/NDVI.grd", datatype="FLT4S", overwrite=T)
## class : RasterLayer
## dimensions : 1177, 1548, 1821996 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 808755, 855195, 817635, 852945 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs
## data source : /home/hernan/Curso_R/git/Raster_Avanzado/data/NDVI.grd
## names : layer
## values : 0.1378273, 0.8975694 (min, max)
#load("data/NDVI.grd")
Visualizamos el resultado
CLASIFICACION DE RASTERS: Una de las mas importantes tareas en analisis de sensores remotos es la clasificacion de imagenes. En este proceso se cuenta con informacion de varias bandas (con la posibilidad de incluir otras bandas como NDVI o componentes principales). Exploraremos dos metodos - Supervisado: con clasificacion RandomFores - No Supervisado: con K-means
SUPERVISADO - RandomForest Este algoritmo es un metodo ensamblado de aprendisaje, usado para clasificacion y regresion. Trabaja con un muestreo aleatorio desde un dataset de entrenamiento y construye un arbol de clasificacion para cada dataset. Formado por ramas y hojas.
Las ramas contienen las reglas de decision definidas a menudo por umbrales del dataset. Las hojas representan las diferentes clases. En el siguiente link hay una descripcion mas detallada http://www.slideshare.net/0xdata/jan-vitek-distributedrandomforest522013
| Uno de los mayores avances de RandomForest es Out of the Bag (OOB), Error estimado y valor de performance alcanzado. Tambien valores predichos para cada clase del dataset de entrenamiento y una matriz de confusion |
Para correr este metodo tenemos que preparar los datos. Compuestos por las bandas de 2 a 4 de una imagen ETM+ del 2001 y el NDVI calculado. Ademas hay un archivo de vegetacion continua (VCF) producido para el mismo periodo. Sobre VCF se puede consultar mas informacion aca http://glcf.umd.edu/data/landsatTreecover/ Presenta una estimacion de cobertura de arboles en % y puede ser incluido como una covarianza en la clasificacion
# cargo VCF
load("data/vcfGewata.rda")
vcfGewata
## class : RasterLayer
## dimensions : 1177, 1548, 1821996 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 808755, 855195, 817635, 852945 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=36 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : vcf2000Gewata
## values : 0, 254 (min, max)
summary(vcfGewata)
## vcf2000Gewata
## Min. 0
## 1st Qu. 32
## Median 64
## 3rd Qu. 75
## Max. 254
## NA's 8289
Grafico VCF
Histograma de VCF
Los valores mayores a 100 corresponden a flags como agua, nubes, etc y deben ir a NA
vcfGewata[vcfGewata > 100] <- NA
plot(vcfGewata)
summary(vcfGewata)
## vcf2000Gewata
## Min. 0
## 1st Qu. 32
## Median 64
## 3rd Qu. 75
## Max. 100
## NA's 13712
Ahora para terminar de preparar los datos, juntamos todas las bandas (lamadas covarianzas) en un nuevo raster:brik. Ademas, dependiendo de los formatos de los datos podriamos tener que ajustar la escala de estos
gewata <- calc(gewata, fun=function(x) x / 10000) #continuo con datos enteros
# make a new raster brick of covariates by adding NDVI and VCF layers
covs <- addLayer(gewata, NDVI, vcfGewata)
Grafico de covs
Renombro correctamente para no tener dudas del procediemiento
names(covs) <- c("band2", "band3", "band4", "NDVI", "VCF")
plot(covs)
Ahora vamos a cargar poligonos a clasificar. Para otro proposito puede servir generar mascara forestal de 2001
load("data/trainingPoly.rda")
# superimpose training polygons onto ndvi plot
plot(NDVI)
plot(trainingPoly, add = TRUE)
Las Clases de nuestra capa de entrenamiento deben ser numericas Transformamos
trainingPoly@data
## OBJECTID Class
## 0 1 wetland
## 1 2 wetland
## 2 3 wetland
## 3 4 wetland
## 4 5 wetland
## 5 6 forest
## 6 7 forest
## 7 8 forest
## 8 9 forest
## 9 10 forest
## 10 11 cropland
## 11 12 cropland
## 12 13 cropland
## 13 14 cropland
## 14 15 cropland
## 15 16 cropland
trainingPoly@data$Class
## [1] wetland wetland wetland wetland wetland forest forest
## [8] forest forest forest cropland cropland cropland cropland
## [15] cropland cropland
## Levels: cropland forest wetland
str(trainingPoly@data$Class)
## Factor w/ 3 levels "cropland","forest",..: 3 3 3 3 3 2 2 2 2 2 ...
# we can convert to integer by using the as.numeric() function,
# which takes the factor levels
trainingPoly@data$Code <- as.numeric(trainingPoly@data$Class)
trainingPoly@data
## OBJECTID Class Code
## 0 1 wetland 3
## 1 2 wetland 3
## 2 3 wetland 3
## 3 4 wetland 3
## 4 5 wetland 3
## 5 6 forest 2
## 6 7 forest 2
## 7 8 forest 2
## 8 9 forest 2
## 9 10 forest 2
## 10 11 cropland 1
## 11 12 cropland 1
## 12 13 cropland 1
## 13 14 cropland 1
## 14 15 cropland 1
## 15 16 cropland 1
Ahora necesito rasterizar el vector de entrenamiento y uso la funcion rasterize()
classes <- rasterize(trainingPoly, NDVI, field="Code", progress="text")
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, cnt, value = p@polygons[[i]]@Polygons[[j]]):
## implicit list embedding of S4 objects is deprecated
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========= | 15%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|==================== | 32%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|=================================== | 55%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|============================================== | 72%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|=========================================================== | 92%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================= | 95%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
##
# el parametro progress puede resultar muy util
Grafico resultados
Perfecto hasta aca’ Ahora vamos a enmascarar el paquete de raste cov con classes
covmasked <- mask(covs, classes)
plot(covmasked)
| AHORA SI TENEMOS EL DATASET DE ENTRENAMIENTO Nombro classes a la capa classes y la agrego al layer stack |
| ```r names(classes) <- “class” trainingbrick <- addLayer(covmasked, classes) |
| # grafico el training dataset plot(trainingbrick) ``` |
RandomFores necesita un data.frame como entrada Transformamos
# extract all values into a matrix
valuetable <- getValues(trainingbrick)
#omito los NA
valuetable <- na.omit(valuetable)
# ahora si el data.frame
valuetable <- as.data.frame(valuetable)
head(valuetable, n = 10)
## band2 band3 band4 NDVI VCF class
## 1 0.0354 0.0281 0.2527 0.7998575 77 2
## 2 0.0417 0.0301 0.2816 0.8068656 74 2
## 3 0.0418 0.0282 0.2857 0.8203249 72 2
## 4 0.0397 0.0282 0.2651 0.8077054 71 2
## 5 0.0355 0.0263 0.2237 0.7896000 77 2
## 6 0.0396 0.0281 0.2693 0.8110289 75 2
## 7 0.0375 0.0300 0.2817 0.8075072 76 2
## 8 0.0396 0.0263 0.2610 0.8169161 76 2
## 9 0.0354 0.0263 0.2320 0.7963608 76 2
## 10 0.0333 0.0263 0.2113 0.7786195 73 2
tail(valuetable, n = 10)
## band2 band3 band4 NDVI VCF class
## 36211 0.0451 0.0293 0.2984 0.8211779 76 2
## 36212 0.0406 0.0275 0.2561 0.8060649 76 2
## 36213 0.0361 0.0293 0.2179 0.7629450 75 2
## 36214 0.0406 0.0313 0.2222 0.7530572 74 2
## 36215 0.0405 0.0313 0.2222 0.7530572 73 2
## 36216 0.0406 0.0293 0.2646 0.8006125 79 2
## 36217 0.0429 0.0293 0.2774 0.8089338 70 2
## 36218 0.0451 0.0333 0.2900 0.7939994 77 2
## 36219 0.0406 0.0293 0.2689 0.8034876 81 2
## 36220 0.0429 0.0293 0.2434 0.7851118 73 2
# Convertimos en factor la columna classes
valuetable$class <- factor(valuetable$class, levels = c(1:3))
Para comprender mejor grafiquemos las tres clases del entrenamiento para NDVI
val_crop <- subset(valuetable, class == 1)
val_forest <- subset(valuetable, class == 2)
val_wetland <- subset(valuetable, class == 3)
# 1. NDVI
par(mfrow = c(3, 1))
hist(val_crop$NDVI, main = "cropland", xlab = "NDVI", xlim = c(0, 1), ylim = c(0, 4000), col = "orange")
hist(val_forest$NDVI, main = "forest", xlab = "NDVI", xlim = c(0, 1), ylim = c(0, 4000), col = "dark green")
hist(val_wetland$NDVI, main = "wetland", xlab = "NDVI", xlim = c(0, 1), ylim = c(0, 4000), col = "light blue")
Ahora para VCF
par(mfrow = c(1, 1))
# 2. VCF
par(mfrow = c(3, 1))
hist(val_crop$VCF, main = "cropland", xlab = "% tree cover", xlim = c(0, 100), ylim = c(0, 7500), col = "orange")
hist(val_forest$VCF, main = "forest", xlab = "% tree cover", xlim = c(0, 100), ylim = c(0, 7500), col = "dark green")
hist(val_wetland$VCF, main = "wetland", xlab = "% tree cover", xlim = c(0, 100), ylim = c(0, 7500), col = "light blue")
Ahora un scatterplots para dos bandas
par(mfrow = c(1, 1))
# 3. Bands 3 and 4 (scatterplots)
plot(band4 ~ band3, data = val_crop, pch = ".", col = "orange", xlim = c(0, 0.2), ylim = c(0, 0.5))
points(band4 ~ band3, data = val_forest, pch = ".", col = "dark green")
points(band4 ~ band3, data = val_wetland, pch = ".", col = "light blue")
legend("topright", legend=c("cropland", "forest", "wetland"), fill=c("orange", "dark green", "light blue"), bg="white")
When performing this classification on large datasets and with a large amount of training data, now may be a good time to save this table using the write.csv() command,
CORREMOS EL MODELO RandomForest
# construct a random forest model
# covariates (x) are found in columns 1 to 5 of valuetable
# training classes (y) are found in the 'class' column of valuetable
# caution: this step takes fairly long!
# but can be shortened by setting importance=FALSE
library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
modelRF <- randomForest(x=valuetable[ ,c(1:5)], y=valuetable$class, importance = F)
save(modelRF, file = "modelRF.rda")
Analizamos el resultado y el tipo de objeto
# inspect the structure and element names of the resulting model
modelRF
##
## Call:
## randomForest(x = valuetable[, c(1:5)], y = valuetable$class, importance = F)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 3.16%
## Confusion matrix:
## 1 2 3 class.error
## 1 15174 3 526 3.368783e-02
## 2 0 13718 1 7.289161e-05
## 3 615 0 6183 9.046778e-02
class(modelRF)
## [1] "randomForest"
str(modelRF)
## List of 18
## $ call : language randomForest(x = valuetable[, c(1:5)], y = valuetable$class, importance = F)
## $ type : chr "classification"
## $ predicted : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
## ..- attr(*, "names")= chr [1:36220] "1" "2" "3" "4" ...
## $ err.rate : num [1:500, 1:4] 0.0521 0.0519 0.0517 0.0504 0.0483 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:4] "OOB" "1" "2" "3"
## $ confusion : num [1:3, 1:4] 15174 0 615 3 13718 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "1" "2" "3"
## .. ..$ : chr [1:4] "1" "2" "3" "class.error"
## $ votes : matrix [1:36220, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:36220] "1" "2" "3" "4" ...
## .. ..$ : chr [1:3] "1" "2" "3"
## ..- attr(*, "class")= chr [1:2] "matrix" "votes"
## $ oob.times : num [1:36220] 197 182 183 188 184 181 186 191 191 192 ...
## $ classes : chr [1:3] "1" "2" "3"
## $ importance : num [1:5, 1] 5863 8154 2499 3172 3248
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:5] "band2" "band3" "band4" "NDVI" ...
## .. ..$ : chr "MeanDecreaseGini"
## $ importanceSD : NULL
## $ localImportance: NULL
## $ proximity : NULL
## $ ntree : num 500
## $ mtry : num 2
## $ forest :List of 14
## ..$ ndbigtree : int [1:500] 2433 2259 2387 2257 2229 2313 2359 2229 2267 2257 ...
## ..$ nodestatus: int [1:2557, 1:500] 1 1 1 1 1 1 1 1 -1 1 ...
## ..$ bestvar : int [1:2557, 1:500] 2 2 2 4 3 5 5 4 0 3 ...
## ..$ treemap : int [1:2557, 1:2, 1:500] 2 4 6 8 10 12 14 16 0 18 ...
## ..$ nodepred : int [1:2557, 1:500] 0 0 0 0 0 0 0 0 2 0 ...
## ..$ xbestsplit: num [1:2557, 1:500] 0.0377 0.036 0.0662 0.7551 0.216 ...
## ..$ pid : num [1:3] 1 1 1
## ..$ cutoff : num [1:3] 0.333 0.333 0.333
## ..$ ncat : Named int [1:5] 1 1 1 1 1
## .. ..- attr(*, "names")= chr [1:5] "band2" "band3" "band4" "NDVI" ...
## ..$ maxcat : int 1
## ..$ nrnodes : int 2557
## ..$ ntree : num 500
## ..$ nclass : int 3
## ..$ xlevels :List of 5
## .. ..$ band2: num 0
## .. ..$ band3: num 0
## .. ..$ band4: num 0
## .. ..$ NDVI : num 0
## .. ..$ VCF : num 0
## $ y : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
## $ test : NULL
## $ inbag : NULL
## - attr(*, "class")= chr "randomForest"
names(modelRF)
## [1] "call" "type" "predicted"
## [4] "err.rate" "confusion" "votes"
## [7] "oob.times" "classes" "importance"
## [10] "importanceSD" "localImportance" "proximity"
## [13] "ntree" "mtry" "forest"
## [16] "y" "test" "inbag"
# inspect the confusion matrix of the OOB error assessment
modelRF$confusion
## 1 2 3 class.error
## 1 15174 3 526 3.368783e-02
## 2 0 13718 1 7.289161e-05
## 3 615 0 6183 9.046778e-02
# to make the confusion matrix more readable
colnames(modelRF$confusion) <- c("cropland", "forest", "wetland", "class.error")
rownames(modelRF$confusion) <- c("cropland", "forest", "wetland")
modelRF$confusion
## cropland forest wetland class.error
## cropland 15174 3 526 3.368783e-02
## forest 0 13718 1 7.289161e-05
## wetland 615 0 6183 9.046778e-02
## MeanDecreaseGini
## band2 5862.983
## band3 8153.737
## band4 2498.785
## NDVI 3171.958
## VCF 3247.638
Las figuras arriba muestran la importancia de las variables para random forest The figure above shows the variable importance plots for a Random Forest model showing the mean decrease in accuracy (left) and the decrease in Gini Impurity Coefficient (right) for each variable.
These two plots give two different reports on variable importance (see ?importance()).
First, the mean decrease in accuracy indicates the amount by which the classification accuracy decreased based on the OOB assessment. Second, the Gini impurity coefficient gives a measure of class homogeneity. More specifically, the decrease in the Gini impurity coefficient when including a particular variable is shown in the plot. From Wikipedia: “Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset”.
In this case, it seems that Gewata bands 3 and 4 have the highest impact on accuracy, while bands 3 and 2 score highest with the Gini impurity criterion. For especially large datasets, it may be helpful to know this information, and leave out less important variables for subsequent runs of the randomForest() function.
AHORA APLICAMOS EL MODELO AL RESTO DE LA IMAGEN notemos que el nombre de las capas del input.brick (covs) es igual al nombre de las columnas de la tabla de entrenamiento. Usaremos raster::predict() para predecir la clase Este modelo es derivado de una regresion lineal.
names(covs)
## [1] "band2" "band3" "band4" "NDVI" "VCF"
names(valuetable)
## [1] "band2" "band3" "band4" "NDVI" "VCF" "class"
#Predice la cobertura usando RF model
predLC <- predict(covs, model=modelRF, na.rm=T)
writeRaster(predLC, filename = "data/predLC.grd", datatype="INT1U", overwrite=T)
## class : RasterLayer
## dimensions : 1177, 1548, 1821996 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 808755, 855195, 817635, 852945 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs
## data source : /home/hernan/Curso_R/git/Raster_Avanzado/data/predLC.grd
## names : layer
## values : 1, 3 (min, max)
## attributes :
## ID value
## 1 1
## 2 2
## 3 3
# plot the results
# recall: 1 = cropland, 2 = forest, 3 = wetland
cols <- c("orange", "dark green", "light blue")
Graficamos
NO SUPERVISADO - K-Means __________________________________________ K-means va a agrupar los valores en clases. Sin datos de entrenamiento.
valuetable <- raster::getValues(covs)
head(valuetable)
## band2 band3 band4 NDVI VCF
## [1,] 0.0519 0.0520 0.2479 0.6532177 49
## [2,] 0.0563 0.0577 0.2310 0.6002771 37
## [3,] 0.0585 0.0674 0.2140 0.5209666 28
## [4,] 0.0563 0.0694 0.2054 0.4949054 28
## [5,] 0.0651 0.0771 0.2096 0.4621556 26
## [6,] 0.0739 0.0943 0.2351 0.4274438 19
# ahora construimos el objeto Kmeans
km <- kmeans(na.omit(valuetable), centers = 3, iter.max = 100, nstart = 10 )
head(km$cluster)
## [1] 1 2 2 2 2 2
unique(km$cluster)
## [1] 1 2 3
Como kmeans no sabe donde estan los valores NA, que generan ploblemas, necesitamos hacer una mascara raster de estos
# create a blank raster with default values of 0
rNA <- setValues(raster(covs), 0)
# loop through layers of covs
# assign a 1 to rNA wherever an NA is enountered in covs
for(i in 1:nlayers(covs)){
rNA[is.na(covs[[i]])] <- 1
}
# convert rNA to an integer vector
rNA <- getValues(rNA)
Ahora vamos a asignar el valor donde corresponde en el dataframe valuetable
# convert valuetable to a data.frame
valuetable <- as.data.frame(valuetable)
# if rNA is a 0, assign the cluster value at that position
valuetable$class[rNA==0] <- km$cluster
# if rNA is a 1, assign an NA at that position
valuetable$class[rNA==1] <- NA
plots, for example:
MarkDown Description: For more details on using R Markdown see http://rmarkdown.rstudio.com.