ICTA

Use R!

R Markdown

Sobre Programación básica en R, puede ser consultada Aqui

Paquete que utilizan datos de la estadística espacial en software R Analysis of Spatial Data

Organización de datos espaciales

¿Es posible agregar covariables al modelo?

Al agregar covariables se puede mejorar considerablemente la exactitud del modelo y se puede afectar significativamente los resultados del análisis final. Incluyendo una covariable en el modelo se puede reducir el error en el modelo para incrementar la potencia de las pruebas de los factores.

Sobre Programación básica en R, puede ser consultada Aqui

Comienza cargando las librerías y preparando los datos a utilizar.

#El análisis de los datos Geoestadísticos
library(geoR)

## --------------------------------------------------------------
##  Analysis of Geostatistical Data
##  For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
##  geoR version 1.7-5.2 (built on 2016-05-02) is now loaded
## --------------------------------------------------------------

#Estadística descriptiva
library(MASS)  
#Estadística descriptiva
library(moments) 
#Obtener gráficos de dispersión 3d
library(scatterplot3d)

Descripción

Este conjunto de datos contiene el contenido de calcio medido en muestras de suelo tomadas de la capa de 0-20 cm en 178 ubicaciones dentro de una determinada área de estudio dividida en tres subáreas. La elevación en cada ubicación también se registró.

La primera región suele estar inundada durante la temporada de lluvias y no se usa como área experimental. Los niveles de calcio representarían el contenido natural en la región. La segunda región ha recibido fertilizantes hace un tiempo y está típicamente ocupada por campos de arroz. La tercera región ha recibido fertilizantes recientemente y se usa con frecuencia como área experimental.

Estos datos están disponibles en el paquete geoR (Paulo J. Ribeiro Jr and Peter J. Diggle (2016)).

Para realizar los análisis exploratorios a seguir, utilizaremos el paquete geoR e inicialmente necesitaremos crear un objeto del tipo geodata, como sigue.

Importacion del datos para el R

suelo = read.table("regiones.txt", header=TRUE)
# Leyendo las seis primeras líneas
head(suelo)

##   este norte altitud ca20 mg20 ctc20 ca40 mg40 ctc40 muestra region
## 1 5710  4829    6.10   52   18 106.0   40   16  86.3      a1     R3
## 2 5727  4875    6.05   57   20 131.1   49   21 123.1      a2     R3
## 3 5745  4922    6.30   72   22 114.6   63   22 101.7      a3     R3
## 4 5764  4969    6.60   74   11 114.4   74   12  95.6      a4     R3
## 5 5781  5015    6.60   68   34 124.4   44   36 106.3      a5     R3
## 6 5799  5062    5.75   45   27 132.9   27   22 105.3      a6     R3

# Mostrar de forma compacta la estructura de un objeto R arbitrario
str(camg)

## 'data.frame':    178 obs. of  10 variables:
##  $ east     : int  5710 5727 5745 5764 5781 5799 5817 5837 5873 5907 ...
##  $ north    : int  4829 4875 4922 4969 5015 5062 5109 5161 5254 5347 ...
##  $ elevation: num  6.1 6.05 6.3 6.6 6.6 5.75 5.35 5.1 5 5.2 ...
##  $ region   : Factor w/ 3 levels "1","2","3": 3 3 3 3 3 3 3 3 2 2 ...
##  $ ca020    : int  52 57 72 74 68 45 47 49 38 53 ...
##  $ mg020    : int  18 20 22 11 34 27 27 30 25 26 ...
##  $ ctc020   : num  106 131 115 114 124 ...
##  $ ca2040   : int  40 49 63 74 44 27 35 35 29 23 ...
##  $ mg2040   : int  16 21 22 12 36 22 25 25 28 18 ...
##  $ ctc2040  : num  86.3 123.1 101.7 95.6 106.3 ...

Eligiendo una variable para análisis y dos como covariables

par.ori <- par(no.readonly=T)
ctc40 <- as.geodata(suelo, coords.col=1:2, data.col=9, covar.col=c(3, 11))
summary(ctc40)

## Number of data points: 179 
## 
## Coordinates summary
##     este norte
## min 4957  4829
## max 5961  5720
## 
## Distance summary
##        min        max 
##   43.01163 1138.11774 
## 
## Data summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  73.100 114.500 128.000 127.257 139.100 212.400 
## 
## Covariates summary
##     altitud      region  
##  Min.   :3.300   R1: 14  
##  1st Qu.:5.200   R2: 49  
##  Median :5.650   R3:116  
##  Mean   :5.513           
##  3rd Qu.:6.000           
##  Max.   :6.600

Comprobando posible relación con covariables

with(suelo, plot(ctc40 ~ altitud)); 
with(suelo, lines(lowess(ctc40 ~ altitud)))

with(suelo, plot(ctc40 ~ region))

Gráficos con la inclusión de diferentes covariables

plot(ctc40, low=T)

plot(ctc40, low=T, trend=~region)

plot(ctc40, low=T, trend=~altitud)

plot(ctc40, low=T, trend=~altitud+region)

Con el fin de hacer constante la variabilidad en los datos, se ha evaludado la posibilidad de transformarlos usando la familia de transformaciones potenciales de Box-Cox.

par(mfrow=c(2,2), mar=c(3,3,0, 0.5))
boxcox(ctc40)
boxcox(ctc40, trend=~altitud)
boxcox(ctc40, trend=~region)
boxcox(ctc40, trend=~region+altitud)

Variograma para diferentes modelos

par(mfrow=c(2,2), mar=c(3,3,0.3, 0.3), mgp=c(1.8, .7, 0))
plot(variog(ctc40, max.dist=900))

## variog: computing omnidirectional variogram

plot(variog(ctc40, max.dist=900, trend=~region))

## variog: computing omnidirectional variogram

plot(variog(ctc40, max.dist=900, trend=~altitud))

## variog: computing omnidirectional variogram

plot(variog(ctc40, max.dist=900, trend=~region+altitud))

## variog: computing omnidirectional variogram

Identificando posibles datos atípicos/influyentes

par(par.ori)
with(suelo, boxplot(ctc40))

points(ctc40)
points(suelo[with(suelo, which(ctc40 > 200)), 1:2], col=2, pch=19)

Creando un nuevo objeto excluyendo observaciones atípicas

ctc40.1 <- subset(ctc40, data < 200)
class(ctc40.1)

## [1] "geodata"

summary(ctc40.1)

## Number of data points: 178 
## 
## Coordinates summary
##     este norte
## min 4957  4829
## max 5961  5673
## 
## Distance summary
##        min        max 
##   43.01163 1138.11774 
## 
## Data summary
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  73.1000 114.4000 128.0000 126.7787 138.6750 183.2000 
## 
## Covariates summary
##     altitud      region  
##  Min.   :3.400   R1: 13  
##  1st Qu.:5.200   R2: 49  
##  Median :5.650   R3:116  
##  Mean   :5.525           
##  3rd Qu.:6.000           
##  Max.   :6.600

Rehacer variogramas sin datos atípicos

par(mfrow=c(2,2), mar=c(3,3,0.3, 0.3), mgp=c(1.8, 0.7, 0))
plot(variog(ctc40.1, max.dist=900))

## variog: computing omnidirectional variogram

plot(variog(ctc40.1, max.dist=900, trend=~region))

## variog: computing omnidirectional variogram

plot(variog(ctc40.1, max.dist=900, trend=~altitud))

## variog: computing omnidirectional variogram

plot(variog(ctc40.1, max.dist=900, trend=~region+altitud))

## variog: computing omnidirectional variogram

diferentes funciones de correlación y modelos para media (covariables)

ml6 <- likfit(ctc40.1,  ini=c(600, 200), nug=100)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml7 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, kappa=1.5)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml8 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, kappa=2.5)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml9 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, trend=~region)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml10 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, kappa=1.5, trend=~region)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml11 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, kappa=2.5, trend=~region)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml12 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, trend=~region+altitud)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml13 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, kappa=1.5, trend=~region+altitud)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ml14 <- likfit(ctc40.1,  ini=c(600, 200), nug=100, kappa=2.5, trend=~region+altitud)

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

AIC_modelo1=round(data.frame(ml6$AIC,ml7$AIC,ml8$AIC,ml9$AIC,ml10$AIC,
                             ml11$AIC,
                       ml12$AIC,ml13$AIC,ml14$AIC),2);AIC_modelo1

##   ml6.AIC ml7.AIC ml8.AIC ml9.AIC ml10.AIC ml11.AIC ml12.AIC ml13.AIC
## 1 1570.36 1569.35 1569.33 1561.91  1563.54  1565.64  1556.19  1556.74
##   ml14.AIC
## 1  1556.87

plot(variog(ctc40.1, max.dist=900, kappa=1.5, trend=~region+altitud))

## variog: computing omnidirectional variogram

lines.variomodel(ml13,col="black")
legend("bottomright", c("matérn k=1.5"), lty=1, col=c("black"))

title("Ajuste del variograma con función de correlación Matern")

Predicción espacial (kriging)

## Predicción espacial
## tomando los bordes de la región como un todo de otro objeto:
data(ca20)
ctc40$borders <- ca20$borders
plot(ctc40)

## malla de puntos de predicción
args(pred_grid)

## function (coords, y.coords = NULL, ..., y.by = NULL, y.length.out = NULL, 
##     y.along.with = NULL) 
## NULL

gr <- pred_grid(ctc40$borders, by=20)

par(par.ori)
points(ctc40)
points(gr, col=2, pch=19, cex=0.3)

## construyendo las covariables en las ubicaciones de predicción
## regiones
ctc40$reg1 <- ca20$reg1
ctc40$reg1

##   east north
## 1 5590  5690
## 2 5340  5800
## 3 5220  5700
## 4 5250  5370
## 5 5350  5370
## 6 5450  5500
## 7 5510  5600
## 8 5590  5690

ctc40$reg2 <- ca20$reg2
ctc40$reg2

##   east north
## 1 5990  5100
## 2 5590  5300
## 3 5350  5370
## 4 5450  5500
## 5 5510  5600
## 6 5590  5690
## 7 5800  5690
## 8 5990  5690

ctc40$reg3 <- ca20$reg3
ctc40$reg3

##    east north
## 1  5990  5100
## 2  5590  5300
## 3  5350  5370
## 4  5150  5370
## 5  4920  5000
## 6  4920  4900
## 7  5150  4920
## 8  5350  4900
## 9  5590  4800
## 10 5780  4800

points(ctc40)
points(gr, col=2, pch=19, cex=0.3)
polygon(ctc40$reg1)
polygon(ctc40$reg2)
polygon(ctc40$reg3)

gr <- pred_grid(ctc40$borders, by=20)

par(par.ori)
points(ctc40)
points(gr, col=2, pch=19, cex=0.3)
gr0 <- gr[.geoR_inout(gr, ctc40$borders),]
points(gr0, col=4, pch=19, cex=0.3)

dim(gr)

## [1] 2754    2

dim(gr0)

## [1] 1858    2

cov.regiao <- numeric(nrow(gr0))
cov.regiao[.geoR_inout(gr0, ctc40$reg1)] <- "R1"
cov.regiao[.geoR_inout(gr0, ctc40$reg2)] <- "R2"
cov.regiao[.geoR_inout(gr0, ctc40$reg3)] <- "R3"
cov.regiao <- as.factor(cov.regiao)
table(cov.regiao)

## cov.regiao
##   R1   R2   R3 
##  239  608 1011

## interpolando altitud utilizando splines (paquete akima)
#install.packages("akima")
require(akima)

## Loading required package: akima

head(camg)

##   east north elevation region ca020 mg020 ctc020 ca2040 mg2040 ctc2040
## 1 5710  4829      6.10      3    52    18  106.0     40     16    86.3
## 2 5727  4875      6.05      3    57    20  131.1     49     21   123.1
## 3 5745  4922      6.30      3    72    22  114.6     63     22   101.7
## 4 5764  4969      6.60      3    74    11  114.4     74     12    95.6
## 5 5781  5015      6.60      3    68    34  124.4     44     36   106.3
## 6 5799  5062      5.75      3    45    27  132.9     27     22   105.3

cov.altitude <- interpp(camg[,1], camg[,2], camg[,3], gr0[,1], gr0[,2], linear=F, extrap=T)$z

## Warning in interpp(camg[, 1], camg[, 2], camg[, 3], gr0[, 1], gr0[, 2], :
## success: collinearities reduced through jitter

#cov.altitude

KC <- krige.control(type="OK",
                    obj.model=ml13,
                    trend.d=~regiao+altitude,
                    trend.l=~cov.regiao+cov.altitude)

length(cov.regiao)

## [1] 1858

length(cov.altitude)

## [1] 1858

gr = pred_grid(ctc40$borders, by=20)

s.out = output.control(n.predictive = 1000, n.post=1000, quant=0.95,                             thres=150)

ctc40.kc = krige.conv(ctc40, loc=gr, krige=krige.control(obj=ml13),                             output = s.out)

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with constant mean

## Warning in krige.conv(ctc40, loc = gr, krige = krige.control(obj =
## ml13), : .Random.seed not initialised. Creating it with runif(1)

## krige.conv: sampling from the predictive distribution (conditional simulations)
## krige.conv: Kriging performed using global neighbourhood

image(ctc40.kc, gr,
      ctc40$borders,
      col=terrain.colors(200),
      x.leg=c(4950, 5450), y.leg=c(4770, 4830))
polygon(ctc40$reg1)
polygon(ctc40$reg2)

## explorando más los resultados:
image(ctc40.kc, loc=gr,
      bor=ctc40$borders, col=terrain.colors(21),
      val=sqrt(ctc40.kc$krige.var),
      x.leg=c(4950, 5450), y.leg=c(4770, 4830))
polygon(ctc40$reg1)
polygon(ctc40$reg2)

## mapa de probabilidades: P(S(x)|y > 150)
image(ctc40.kc, loc=gr, bor=ctc40$borders, col=terrain.colors(21), 
      val=1-ctc40.kc$prob, x.leg=c(4950, 5450), y.leg=c(4770, 4830))
polygon(ctc40$reg1)
polygon(ctc40$reg2)

## quantiles
names(ctc40.kc$quant)

## NULL

## otros resultados

names(ctc40.kc)

##  [1] "predict"                   "krige.var"                
##  [3] "beta.est"                  "distribution"             
##  [5] "simulations"               "mean.simulations"         
##  [7] "variance.simulations"      "quantiles.simulations"    
##  [9] "probabilities.simulations" "sim.means"                
## [11] ".Random.seed"              "message"                  
## [13] "call"

dim(ctc40.kc$simul)

## [1] 1858 1000

length(ctc40.kc$mean.sim)

## [1] 1858

length(ctc40.kc$sim.means)

## [1] 1000

## histograma de media general
hist(ctc40.kc$sim.means)

## histograma de la proporción del área superior a cierto valor (150)
p150 <- apply(ctc40.kc$simul, 2, function(x) mean(x > 150))
hist(p150, prob=T); lines(density(p150))

quantile(p150, probs=c(0.025, 0.975))

##      2.5%     97.5% 
## 0.1210980 0.1921421

## evaluación del error de Monte Carlo (simulación)
names(ctc40.kc)

##  [1] "predict"                   "krige.var"                
##  [3] "beta.est"                  "distribution"             
##  [5] "simulations"               "mean.simulations"         
##  [7] "variance.simulations"      "quantiles.simulations"    
##  [9] "probabilities.simulations" "sim.means"                
## [11] ".Random.seed"              "message"                  
## [13] "call"

with(ctc40.kc, summary(mean.simulations - predict))

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1.911833 -0.393825 -0.007700 -0.007421  0.374069  1.902357

Referencia

R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Paulo J. Ribeiro Jr and Peter J. Diggle (2016). geoR: Analysis of Geostatistical Data. R package version 1.7-5.2. https://CRAN.R-project.org/package=geoR

Gracias!

Geoestadística aplicada a suelos, Agronomía y ciencias afines: parte2

Olinda R. A.

Octubre, 2018

Universidad de San Carlos de Guatemala

Instituto de Ciencia y Tecnologías Agrícolas