ESTIMACIÓN DE RENDIMIENTO DE ARROZ CON RANDOM FOREST

RESUMEN

“La altillanura colombiana presenta gran variabilidad espacio temporal de los de los rendimientos de arroz debido a la variabilidad de los suelos, el manejo y el clima. De allí la importancia que ésta se puede cuantificar, mapear georreferenciadamente de manera eficiente, mediante el sensoramiento remoto y proximal de suelo y los cultivos, que nos permitan gestionar la variabilidad de los agro ecosistemas” (Gomez. J et al, 2014)

INTRODUCCIÓN

Tener modelos de predicciones precisas y confiables sobre el rendimiento de los cultivos (producción), derivadas de modelos matemáticos, proporciona a los agricultores y a los asistentes técnicos una herramienta de apoyo que garantice la adecuación de políticas de suministro de alimentos, precios de importación y exportación de alimentos. La regresión lineal múltiple trata de ajustar modelos lineales o linealizables entre una variable dependiente y más de una variable independiente. En este tipo de modelos es importante testar la heterocedasticidad, la multicolinealidad y la especificación. En este Informe tratare de introducir un modelo para la estimación de rendimiento de cosecha en arroz (basado en el rpubs de Ing. Ivan Lizaraso) con variables de química-física de suelos, además que se aplicaran índices espectrales como el NDVI (Índice de vegetación diferencial normalizada), ya que el NDVI responde a cambios en la biomasa en la banda del verde , contenido de clorofila, y estrés hídrico en el dosel de la planta, es un índice sencillo y fácil de implementar, puede ser bueno para la estimación de propiedades superficiales cuando el cultivo no es muy denso.

Área de estudio

Éste estudió se llevó a cabo en el lote (Carlo I, finca Versalles), ubicada en el km 5 de la vía Cabuyaro, municipio de Puerto Lopez Meta, el suelo es un oxisol Typic Hapludox, paisaje altillanura plana. Los datos fueron tomados después de la primera cosecha del cultivo de arroz en el semestre A del año 2013, en etapa de cosecha, por lo que los datos que serán utilizados para predecir serán para el año 2014, del cual se tiene datos de cosecha en el segundo semestre del 2014. La recolección de muestras de suelo y de rendimiento se hizo bajo el método de grilla rígida de 70 x 70, se tomaron 3 muestras por celda. Se cosecho a mano en cada celda de 4 surcos por 5 metros de largo llevándolo a una resolución de pixel de 20 m

## Loading required package: sp

## Loading required package: lattice

## Loading required package: latticeExtra

## Loading required package: RColorBrewer

## Loading required package: leaflet

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x 1140085.9 1141022.5
## y  951959.7  952676.9
## Is projected: TRUE 
## proj4string :
## [+proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666
## +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +units=m +no_defs]
## Data attributes:
##      Name                Area      
##  Length:1           Min.   :37.91  
##  Class :character   1st Qu.:37.91  
##  Mode  :character   Median :37.91  
##                     Mean   :37.91  
##                     3rd Qu.:37.91  
##                     Max.   :37.91

analicemos la normalidad de los datos de rendimiento mediante el histograma. y mediante la visualizacion de distribucion espacial, observar si los datos tiene dependencia espacial

Como se puede observar en la distribución espacial, es posible ver la variabilidad del rendimiento de la produccion de arroz, teniendo valores min: 1,966 Ton/Ha y max:3,75 Ton/Ha,para el productor en los llanos, rendimientos mayores a 3.50 Ton/Ha es una producción regula y mayores a 4 Ton/ha es una buena producción. Estos valores de rendimiento puede verse influenciados por la topografia del terreno, la caracteristicas de los suelos, la variabilidad climatica y el manejo de la aplicacion de insumos.

POr lo anteriro para la estimacion de rendimiento, se trabajara con variables que influyan en la calidad de rendimeinto como suelos (DA, Porosidad, Magnecio,Calcio, Materi organica, Agua disponoble en suelo) y topografía. Observemos el comportamiento topografico de la zona de estudio, el raster que se esta trabajando del DEM tiene una resolucion de 3,5cm.

## class       : RasterLayer 
## dimensions  : 251, 273, 68523  (nrow, ncol, ncell)
## resolution  : 3.588572, 3.588572  (x, y)
## extent      : 1140052, 1141032, 951779.3, 952680.1  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +units=m +no_defs 
## data source : D:\Estimacion RTDO_ PRAB\topRasPto.tif 
## names       : topRasPto 
## values      : 216.7254, 241.5484  (min, max)

## rgdal: version: 1.2-15, (SVN revision 691)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.0, released 2017/04/28
##  Path to GDAL shared files: C:/Users/EQUIPO1/Documents/R/win-library/3.4/sf/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/EQUIPO1/Documents/R/win-library/3.4/sf/proj
##  Linking to sp version: 1.2-5

Computando variables topográficas

Usemos la biblioteca dynatopmodel para calcular los valores del área de pendiente ascendente.

## class       : RasterLayer 
## dimensions  : 251, 273, 68523  (nrow, ncol, ncell)
## resolution  : 3.588572, 3.588572  (x, y)
## extent      : 1140052, 1141032, 951779.3, 952680.1  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +units=m +no_defs 
## data source : in memory
## names       : aspect 
## values      : 2.711938e-05, 6.283094  (min, max)

Al calcular la pendiente y orientación del terreno, nos permite tener una mejor percepción morfológica del área de estudio. observando que la zona azul tiene una pendiente baja, haciendo asi que en esa área se forma un bajo en el terreno (estas variables topograficas seran utilizadas para el entreno del modelo de predicción).

El NDVI sera una de las variables predictorias para el estudio, este indice fue derivado de las bandas RED y NIR de una camara cannon 100 en plataforma dron a una altura de 80m. El NDVI es un indicador gráfico simple que se puede usar para definir el verdor, densidad relativa y el estado de la vegetación (Powell et al. 2010). Este indice es calculado utilizando la ecuacion:

\[NDVI=\frac{NIR-RED}{NIR+RED}\\\]

## llamar capas con los que se entrenara el modelo 

NDVI=raster("NDVI_new_mask.tif")
Ca1=raster ("VCA1.tif")
Mg1 = raster ("VMG1.tif")
MO1 = raster ("VMO1.tif")
P1 = raster ("VP1.tif")
DA = raster ("DA1.tif")
ADS = raster ("ADS1.tif")
POR = raster ("POR1.tif")

########Ajustar extencion y resolucíon de las capas realizando un remuestreo
## a cada raster igualandolos a la resolucion y extencion del DEM  

n_NDVI <- resample(NDVI, dem1, method='bilinear')
n_Ca1 <- resample(Ca1, dem1, method='bilinear')
n_Mg1 <- resample(Mg1, dem1, method='bilinear')
n_MO1 <- resample(MO1, dem1, method='bilinear')
n_P1 <- resample(P1, dem1, method='bilinear')
n_DA <- resample(DA, dem1, method='bilinear')
n_ADS <- resample(ADS, dem1, method='bilinear')
n_POR <- resample(POR, dem1, method='bilinear')

topo1$NDVI<-n_NDVI
topo1$Ca<-n_Ca1
topo1$Mg<-n_Mg1
topo1$MO<-n_MO1
topo1$P<-n_P1
topo1$DA<-n_DA
topo1$ADS<-n_ADS
topo1$POR<-n_POR

La propiedades fisico_quimicos del suelo estan directamente relacionadas con la variabilidad climatica, estas propiedades presentan gran variabilidad espacial (como se vera en los siguientes mapas), segundo Florinsky (2012), mucha de esa variabilidad puede ser explicado por atributos mofológicos derivados del modelo digital de elevacion DEM (realizado en el parrafo anterior) .

library(rasterVis)

Ver<- shapefile("PoligonoVer.shp")

hist(NDVI)

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(NDVI, margin=F, par.settings=mapTheme, at = do.breaks(c(0,0.5),6),
                 main="NDVI RENDIMIENTO ARROZ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(Ca1)

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 6% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(Ca1, margin=F, par.settings=mapTheme, at = do.breaks(c(1,2),4),
                 main="Calcio (10cm) ARROZ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(Mg1)

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 6% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(Mg1, margin=F, par.settings=mapTheme, at = do.breaks(c(1,2),4),
                 main="Magnecio (10cm) ARROZ ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(MO1)

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 6% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(MO1, margin=F, par.settings=mapTheme, at = do.breaks(c(1.4,5),4),
                 main="Materia Organica (10 cm) ARROZ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(P1)

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 6% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(P1, margin=F, par.settings=mapTheme, at = do.breaks(c(41,90),4),
                 main="Fosforo (10cm) ARROZ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(DA)

## Warning in sampleInt(length(x), size): size changed to n because it cannot
## be larger than n when replace is FALSE

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 57% of the raster cells were used (of which 0% were NA). 99995 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(DA, margin=F, par.settings=mapTheme, at = do.breaks(c(1,2),4),
                 main="Densidad aparente (10cm) ARROZ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(ADS)

## Warning in sampleInt(length(x), size): size changed to n because it cannot
## be larger than n when replace is FALSE

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 57% of the raster cells were used (of which 0% were NA). 99997 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(ADS, margin=F, par.settings=mapTheme, at = do.breaks(c(2.5,7),4),
                 main="Agua disponible en suelo (10cm) ARROZ ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

hist(POR)

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 57% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlGn"))
plt <- levelplot(POR, margin=F, par.settings=mapTheme, at = do.breaks(c(37,55),4),
                 main="POROSIDAD (10cm) ARROZ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

#generar data frame
RDTO_pts <- RDTO_rep

##Comience a extraer los valores de esos píxeles particulares en los que se 
#encuentran nuestros diez puntos muestreados aleatoriamente.

data1 <- data.frame(RDTO_pts$X,RDTO_pts$Y,RDTO_pts$VRDTOAZ201,
                    extract(topo1,RDTO_pts))

RANDOM FOREST

Bosques al azar se generan mediante la elaboración de muestras bootstrap de los datos originales, y se construye un árbol de cada muestra bootstrap. Construiremos un árbol introduciendo divisiones binarias basándonos en las covariables. Para reducir la correlación entre los árboles, no se usaran todas las covariables. Sólo un subconjunto de covariables de tamaño predefinido mtry es seleccionado al azar en cada nodo. Cuando se detiene el proceso de tree-building la muestra del tamaño de un nodo será un umbral predefinido. Sin embargo, las propiedades de convergencia de los bosques al azar dependen de un equilibrio entre tamaño de nodo terminal y el tamaño de la muestra.

Asi que, antes de construir nuestro modelo, vamos a separar nuestros datos en pruebas y juegos de entrenamiento.

Pondremos el 40% de las observaciones en el conjunto de datos original en el train y el 60% restante de las observaciones en la prueba.

#Sample Indexes
indexes = sample(1:nrow(data1), size=0.4*nrow(data1))

# Split data
test = data1[indexes,]
dim(test)  # 80 35

## [1] 86 16

train = data1[-indexes,]
dim(train) # 120 35

## [1] 130  16

ntrain <- train[(names(train))] #remove columns "AREA" and "PERIMETER"
ntrain[1,3:16]

##   RDTO_pts.VRDTOAZ201     dem1        a     atb    slope    aspect
## 2              2.9179 239.6604 6.087454 9.15311 1.177352 0.4982586
##        NDVI       Ca       Mg       MO        P       DA      ADS      POR
## 2 0.3213523 1.614668 1.510063 2.132794 58.57316 1.304463 4.544449 48.85262

####################################################################
## Random Forest prediction of Surface Soil Moisture
set.seed(42)
library(randomForest)

## randomForest 4.6-12

## Type rfNews() to see new features/changes/bug fixes.

xx <- data.frame(ntrain$RDTO_pts.VRDTOAZ201, ntrain$dem1, ntrain$a, ntrain$atb, ntrain$slope, ntrain$aspect,
                 ntrain$NDVI, ntrain$Ca, ntrain$Mg, ntrain$MO, ntrain$P, ntrain$DA, ntrain$ADS,ntrain$POR) 
xx <- xx[complete.cases(xx),]
y <- xx[1]
x <- xx[-1] 

####Podemos ver que se construyeron 500 árboles, y el modelo muestreados al azar 5 ##predictores en cada división. También se muestra una matriz que contiene la ##predicción vs reales, así como el error de clasificación para cada clase. Vamos #a probar el modelo en el conjunto de datos de prueba

fit1 <- randomForest(RDTO_pts.VRDTOAZ201 ~ dem1 + a + atb + slope + aspect + NDVI + Ca +
                     Mg + MO + P + DA + ADS,data=train,ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit1) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ dem1 + a + atb +      slope + aspect + NDVI + Ca + Mg + MO + P + DA + ADS, data = train,      ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##           Mean of squared residuals: 0.03943014
##                     % Var explained: 73.02

importance(fit1) # importance of each predictor

##          %IncMSE IncNodePurity
## dem1   20.501429     2.9901473
## a       5.833015     0.6379168
## atb     5.811206     0.4513258
## slope  10.873904     1.3387443
## aspect  9.904091     0.5349680
## NDVI   14.896905     2.7606670
## Ca     12.913845     0.7464813
## Mg     20.429186     4.9856882
## MO     16.147241     2.0813935
## P       6.556317     0.5638588
## DA      8.099211     0.5492669
## ADS     9.967893     0.8164587

par(mar=c(1,1,1,1))
varImpPlot(fit1,type=2)

rf1 <- predict(fit1, test)
summary(rf1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   2.118   2.378   2.624   2.647   2.856   3.403       1

compare1 <- cbind (actual=test$VRDTOAZ201, rf1) 
mean (apply(compare1, 1, min, na.rm=T)/apply(compare1, 1, max, na.rm=T)) # calculate accuracy

## Warning in FUN(newX[, i], ...): ningún argumento finito para min;
## retornando Inf

## Warning in FUN(newX[, i], ...): ningun argumento finito para max;
## retornando -Inf

## [1] NaN

# 
######################################################################
############## RF implemented in party library

library(party)

## Loading required package: grid

## Loading required package: mvtnorm

## Loading required package: modeltools

## Loading required package: stats4

## Loading required package: strucchange

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

cf <- cforest(RDTO_pts.VRDTOAZ201 ~ dem1 + a + atb + slope + aspect + NDVI + Ca +
                Mg + MO + P + DA + ADS, data = train) 
pt <- party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input"))) 
pt

## 1) Mg <= 1.393812; criterion = 1, statistic = 24.045
##   2) slope <= 3.592305; criterion = 0.974, statistic = 4.987
##     3)*  weights = 0 
##   2) slope > 3.592305
##     4) Ca <= 1.370545; criterion = 0.889, statistic = 2.536
##       5)*  weights = 0 
##     4) Ca > 1.370545
##       6) MO <= 1.195528; criterion = 0.975, statistic = 5.054
##         7) NDVI <= 0.2180964; criterion = 0.796, statistic = 1.613
##           8)*  weights = 0 
##         7) NDVI > 0.2180964
##           9)*  weights = 0 
##       6) MO > 1.195528
##         10)*  weights = 0 
## 1) Mg > 1.393812
##   11) Mg <= 1.494349; criterion = 1, statistic = 13.267
##     12) slope <= 3.134843; criterion = 0.902, statistic = 2.745
##       13)*  weights = 0 
##     12) slope > 3.134843
##       14)*  weights = 0 
##   11) Mg > 1.494349
##     15)*  weights = 0

nt <- new("BinaryTree") 
nt@tree <- pt 
nt@data <- cf@data 
nt@responses <- cf@responses 
nt

## 
##   Conditional inference tree with 0 terminal nodes
## 
## Response:  RDTO_pts.VRDTOAZ201 
## Inputs:  dem1, a, atb, slope, aspect, NDVI, Ca, Mg, MO, P, DA, ADS 
## Number of observations:  130 
## 
## 1) Mg <= 1.393812; criterion = 1, statistic = 24.045
##   2) slope <= 3.592305; criterion = 0.974, statistic = 4.987
##     3)*  weights = 0 
##   2) slope > 3.592305
##     4) Ca <= 1.370545; criterion = 0.889, statistic = 2.536
##       5)*  weights = 0 
##     4) Ca > 1.370545
##       6) MO <= 1.195528; criterion = 0.975, statistic = 5.054
##         7) NDVI <= 0.2180964; criterion = 0.796, statistic = 1.613
##           8)*  weights = 0 
##         7) NDVI > 0.2180964
##           9)*  weights = 0 
##       6) MO > 1.195528
##         10)*  weights = 0 
## 1) Mg > 1.393812
##   11) Mg <= 1.494349; criterion = 1, statistic = 13.267
##     12) slope <= 3.134843; criterion = 0.902, statistic = 2.745
##       13)*  weights = 0 
##     12) slope > 3.134843
##       14)*  weights = 0 
##   11) Mg > 1.494349
##     15)*  weights = 0

plot(nt, type="simple", main = "A Tree from the RDTO ARROZ Random Forest")

#############################################################################################
#modelo de prediccion con todos los datos en todo el area de estudio 


r_RDTO <- predict(topo1, fit1, filename="rf_RDTO.tif", overwrite=T)
mapTheme <- rasterTheme(region=brewer.pal(4,"RdYlGn"))
plt <- levelplot(r_RDTO, margin=F, par.settings=mapTheme, at = do.breaks(c(1.9,4),6), main="Rendimiento Arroz con todas las variables")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

########################################################
#modelo solo con NDVI 
set.seed(42)
fit2 <- randomForest(RDTO_pts.VRDTOAZ201 ~ NDVI,   data=train, 
                     ntrees=5000, mtry=1, na.action=na.omit, importance=TRUE)
print(fit2) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ NDVI, data = train,      ntrees = 5000, mtry = 1, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.09469855
##                     % Var explained: 35.21

rf2 <- predict(fit2,test)
summary(rf2)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.138   2.424   2.617   2.649   2.816   3.511

hist(rf2)

compare2 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf2) 
mean (apply(compare2, 1, min, na.rm=T)/apply(compare2, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9177401

#### Estimacion en todo el area de estudio

r_RDTO2 <- predict(topo1,fit2, filename="rf_NDVI.tif", overwrite=T)
hist(r_RDTO2)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO2, margin=F, par.settings=mapTheme, at = do.breaks(c(1.9,4),4), 
                 main="Estimación RDTO Arroz con  NDVI ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

##########################################################################
######Modelo predictivo con variables topograficas
set.seed(42)
fit3 <- randomForest(RDTO_pts.VRDTOAZ201 ~ dem1 + a +atb + aspect + slope ,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit3) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ dem1 + a + atb +      aspect + slope, data = train, ntrees = 5000, mtry = 5, importance = TRUE,      na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##           Mean of squared residuals: 0.09499911
##                     % Var explained: 35.01

rf3 <- predict(fit3,test)
summary(rf3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   2.227   2.442   2.571   2.648   2.915   3.148       1

hist(rf3)

compare3 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf3) 
mean (apply(compare3, 1, min, na.rm=T)/apply(compare3, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9230083

#### estimation for the whole area

r_RDTO3 <- predict(topo1,fit3, filename="rf_topo.tif", overwrite=T)
hist(r_RDTO3)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO3, margin=F, par.settings=mapTheme, at = do.breaks(c(1.9,4),4), 
                 main="prediccion RDTO en arroz con topografia ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

####################################################################
#modelo solo con Agua Disponible 
set.seed(42)
fit4 <- randomForest(RDTO_pts.VRDTOAZ201 ~ ADS + a + atb,   data=train, 
                     ntrees=5000, mtry=1, na.action=na.omit, importance=TRUE)
print(fit4) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ ADS + a + atb, data = train,      ntrees = 5000, mtry = 1, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.07755439
##                     % Var explained: 46.94

rf4 <- predict(fit4,test)
summary(rf4)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.284   2.453   2.571   2.609   2.706   3.242

hist(rf4)

compare4 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf4) 
mean (apply(compare4, 1, min, na.rm=T)/apply(compare4, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9096488

#### Estimacion en todo el area de estudio

r_RDTO4 <- predict(topo1,fit4, filename="rf_ADS.tif", overwrite=T)
hist(r_RDTO4)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO4, margin=F, par.settings=mapTheme, at = do.breaks(c(1.9,4),4), 
                 main="Estimación RDTO Arroz respecto  Agua Disponoble en Suelo ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

################################################
set.seed(42)
fit6 <- randomForest(RDTO_pts.VRDTOAZ201 ~ DA+POR,   data=train, 
                     ntrees=5000, mtry=2, na.action=na.omit, importance=TRUE)
print(fit6) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ DA + POR, data = train,      ntrees = 5000, mtry = 2, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##           Mean of squared residuals: 0.1403997
##                     % Var explained: 3.94

rf6 <- predict(fit6,test)
summary(rf6)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.265   2.441   2.593   2.649   2.838   3.267

hist(rf6)

compare6 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf6) 
mean (apply(compare6, 1, min, na.rm=T)/apply(compare6, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9064451

#### Estimacion en todo el area de estudio

r_RDTO6 <- predict(topo1,fit6, filename="rf_FisicaSuelo.tif", overwrite=T)
hist(r_RDTO6)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO6, margin=F, par.settings=mapTheme, at = do.breaks(c(1.9,4),4), 
                 main="Estimación RDTO Arroz a partir de porosidad ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

#modelo solo Materia organica 
set.seed(42)
fit7 <- randomForest(RDTO_pts.VRDTOAZ201 ~ MO,   data=train, 
                     ntrees=5000, mtry=2, na.action=na.omit, importance=TRUE)

## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range

print(fit7) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ MO, data = train,      ntrees = 5000, mtry = 2, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.1091893
##                     % Var explained: 25.3

rf7 <- predict(fit7,test)
summary(rf7)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.005   2.425   2.666   2.677   2.909   3.353

hist(rf7)

compare7 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf7) 
mean (apply(compare7, 1, min, na.rm=T)/apply(compare7, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9141911

#### Estimacion en todo el area de estudio

r_RDTO7 <- predict(topo1,fit7, filename="rf_MateriaOrg.tif", overwrite=T)
hist(r_RDTO7)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO7, margin=F, par.settings=mapTheme, at = do.breaks(c(0,5),6), 
                 main="Estimación RDTO Arroz a partir de Materia Organica ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

##########################################################################
#modelopara Calcio
set.seed(42)
fit8 <- randomForest(RDTO_pts.VRDTOAZ201 ~ Ca,   data=train, 
                     ntrees=5000, mtry=2, na.action=na.omit, importance=TRUE)

## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range

print(fit8) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ Ca, data = train,      ntrees = 5000, mtry = 2, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.1776924
##                     % Var explained: -21.57

rf8 <- predict(fit8,test)
summary(rf8)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.206   2.415   2.639   2.673   2.938   3.414

hist(rf8)

compare8 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf8) 
mean (apply(compare8, 1, min, na.rm=T)/apply(compare8, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.8885065

#### Estimacion en todo el area de estudio

r_RDTO8 <- predict(topo1,fit8, filename="rf_Calcio.tif", overwrite=T)
hist(r_RDTO8)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO8, margin=F, par.settings=mapTheme, at = do.breaks(c(0,5),6), 
                 main="Estimación RDTO Arroz a partir de Calcio ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

###################################################
#modelopara Magnecio
set.seed(42)
fit9 <- randomForest(RDTO_pts.VRDTOAZ201 ~ Mg,   data=train, 
                     ntrees=5000, mtry=2, na.action=na.omit, importance=TRUE)

## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range

print(fit9) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ Mg, data = train,      ntrees = 5000, mtry = 2, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.08289887
##                     % Var explained: 43.28

rf9 <- predict(fit9,test)
summary(rf9)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.093   2.373   2.577   2.655   2.940   3.571

hist(rf9)

compare9 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf9) 
mean (apply(compare9, 1, min, na.rm=T)/apply(compare9, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9161217

#### Estimacion en todo el area de estudio

r_RDTO9 <- predict(topo1,fit9, filename="rf_Magnecio.tif", overwrite=T)
hist(r_RDTO9)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO9, margin=F, par.settings=mapTheme, at = do.breaks(c(0,5),6), 
                 main="Estimación RDTO Arroz a partir de Magnesio ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

###################################################
#modelopara Magnecio, NDVI
set.seed(42)
fit10 <- randomForest(RDTO_pts.VRDTOAZ201 ~ Mg + NDVI,   data=train, 
                     ntrees=5000, mtry=2, na.action=na.omit, importance=TRUE)
print(fit10) # view results

## 
## Call:
##  randomForest(formula = RDTO_pts.VRDTOAZ201 ~ Mg + NDVI, data = train,      ntrees = 5000, mtry = 2, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##           Mean of squared residuals: 0.06766813
##                     % Var explained: 53.7

rf10 <- predict(fit10,test)
summary(rf10)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.135   2.396   2.621   2.660   2.892   3.578

hist(rf10)

compare10 <- cbind (actual=test$RDTO_pts.VRDTOAZ201, rf10) 
mean (apply(compare10, 1, min, na.rm=T)/apply(compare10, 1, max, na.rm=T)) # calculate accuracy

## [1] 0.9313555

#### Estimacion en todo el area de estudio

r_RDTO10 <- predict(topo1,fit10, filename="rf_Magnecio_NDVI.tif", overwrite=T)
hist(r_RDTO10)

mapTheme <- rasterTheme(region=brewer.pal(6,"Spectral"))
plt <- levelplot(r_RDTO10, margin=F, par.settings=mapTheme, at = do.breaks(c(1.9,4),4), 
                 main="Estimación RDTO Arroz a partir de Magnesio y NDVI ")
plt + layer(sp.lines(Ver, col="red", lwd=0.5))

se genera varios modelos de ajuste, relacionando las diferentes variables. Se relacionara cada variable, analizando los datos de nuestra muestra (variables de suelo y topografia) que se asemejen a los datos reales (rendimiento). Aplicaremos correlacion de Pearson

## $`pearson correlation coefficient`
##                    rf_RDTO   rf_NDVI   rf_topo    rf_ADS rf_FisicaSuelo
## rf_RDTO          1.0000000 0.8044078 0.8518235 0.8001296      0.6306178
## rf_NDVI          0.8044078 1.0000000 0.6395873 0.6876340      0.5364581
## rf_topo          0.8518235 0.6395873 1.0000000 0.7483346      0.5111675
## rf_ADS           0.8001296 0.6876340 0.7483346 1.0000000      0.5247401
## rf_FisicaSuelo   0.6306178 0.5364581 0.5111675 0.5247401      1.0000000
## rf_MateriaOrg    0.7216269 0.6552861 0.5320005 0.5739929      0.5824581
## rf_Calcio        0.3908343 0.2524849 0.3369290 0.2641205      0.3035571
## rf_Magnecio      0.8988710 0.7429772 0.7035800 0.7287540      0.6111609
## rf_Magnecio_NDVI 0.9447426 0.8167105 0.7368602 0.7461704      0.6285152
##                  rf_MateriaOrg rf_Calcio rf_Magnecio rf_Magnecio_NDVI
## rf_RDTO              0.7216269 0.3908343   0.8988710        0.9447426
## rf_NDVI              0.6552861 0.2524849   0.7429772        0.8167105
## rf_topo              0.5320005 0.3369290   0.7035800        0.7368602
## rf_ADS               0.5739929 0.2641205   0.7287540        0.7461704
## rf_FisicaSuelo       0.5824581 0.3035571   0.6111609        0.6285152
## rf_MateriaOrg        1.0000000 0.2653211   0.6702198        0.7027544
## rf_Calcio            0.2653211 1.0000000   0.3619175        0.3656543
## rf_Magnecio          0.6702198 0.3619175   1.0000000        0.9616429
## rf_Magnecio_NDVI     0.7027544 0.3656543   0.9616429        1.0000000
## 
## $mean
##          rf_RDTO          rf_NDVI          rf_topo           rf_ADS 
##         2.650246         2.659714         2.643470         2.636416 
##   rf_FisicaSuelo    rf_MateriaOrg        rf_Calcio      rf_Magnecio 
##         2.659298         2.649219         2.624803         2.679200 
## rf_Magnecio_NDVI 
##         2.673700

CONCLUSIÓN

Analizando los datos de rendimiento utilizados en la implementación del modelo, se observa que existe una alta dependencia con el Magnecio seguido por la influencia de las caracteristicas topograficas del terreno. Durante la corrida de laetapa de ajuste el NDVI tambien se categorizo como variable influyente en el modelo, pero este no es del todo confiable al tener encuenta solo los datos tomados por indice, ya que me representa áreas donde se produjo el mayor rendimeinto durante la cosecha de arroz 2014

Referencias Bibliograficas

Gomez.J & Bernal. J et al(2014). VARIABILIDAD ESPACIAL DE LA CONDUCTIVIDAD ELECTRICA APARENTE Y SU RELACION CON LOS ATRIBUTOS DEL SUELO EN UN OXISOL DE LA ALTILLANURA. 2.rpubs Ivan Lizarazo.(2017). Soil Moisture from remotely sensed data.https://rpubs.com/ials2un/ssm. [consulta 16 de noviembre 2017]
Barge.S et al(2016).Digital mapping of sand, clay, and soil carbon by Random Forest models under different spatial resolutions.
Rueda-Ayala. V et al (2015). DEVELOPMENT OF YIELD PREDICTION MODELS IN THE MAIZE CROP USING SPECTRAL DATA FOR PRECISION AGRICULTURE APPLICATIONS
Gouri. B et al (2017). Soil organic carbon mapping using remote sensing techniques and multivariate regression model