Use R!
Sobre Programación básica en R, puede ser consultada Aqui
Paquete que utilizan datos de la estadÃstica espacial en software R Analysis of Spatial Data
Organización de datos espaciales
Al agregar covariables se puede mejorar considerablemente la exactitud del modelo y se puede afectar significativamente los resultados del análisis final. Incluyendo una covariable en el modelo se puede reducir el error en el modelo para incrementar la potencia de las pruebas de los factores.
Sobre Programación básica en R, puede ser consultada Aqui
Comienza cargando las librerÃas y preparando los datos a utilizar.
#El análisis de los datos GeoestadÃsticos
library(geoR)
## --------------------------------------------------------------
## Analysis of Geostatistical Data
## For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
## geoR version 1.7-5.2 (built on 2016-05-02) is now loaded
## --------------------------------------------------------------
#EstadÃstica descriptiva
library(MASS)
#EstadÃstica descriptiva
library(moments)
#Obtener gráficos de dispersión 3d
library(scatterplot3d)
Este conjunto de datos contiene el contenido de calcio medido en muestras de suelo tomadas de la capa de 0-20 cm en 178 ubicaciones dentro de una determinada área de estudio dividida en tres subáreas. La elevación en cada ubicación también se registró.
La primera región suele estar inundada durante la temporada de lluvias y no se usa como área experimental. Los niveles de calcio representarÃan el contenido natural en la región. La segunda región ha recibido fertilizantes hace un tiempo y está tÃpicamente ocupada por campos de arroz. La tercera región ha recibido fertilizantes recientemente y se usa con frecuencia como área experimental.
Estos datos están disponibles en el paquete geoR (Paulo J. Ribeiro Jr and Peter J. Diggle (2016)).
Para realizar los análisis exploratorios a seguir, utilizaremos el paquete geoR e inicialmente necesitaremos crear un objeto del tipo geodata, como sigue.
Importacion del datos para el R
suelo = read.table("regiones.txt", header=TRUE)
# Leyendo las seis primeras lÃneas
head(suelo)
## este norte altitud ca20 mg20 ctc20 ca40 mg40 ctc40 muestra region
## 1 5710 4829 6.10 52 18 106.0 40 16 86.3 a1 R3
## 2 5727 4875 6.05 57 20 131.1 49 21 123.1 a2 R3
## 3 5745 4922 6.30 72 22 114.6 63 22 101.7 a3 R3
## 4 5764 4969 6.60 74 11 114.4 74 12 95.6 a4 R3
## 5 5781 5015 6.60 68 34 124.4 44 36 106.3 a5 R3
## 6 5799 5062 5.75 45 27 132.9 27 22 105.3 a6 R3
# Mostrar de forma compacta la estructura de un objeto R arbitrario
str(camg)
## 'data.frame': 178 obs. of 10 variables:
## $ east : int 5710 5727 5745 5764 5781 5799 5817 5837 5873 5907 ...
## $ north : int 4829 4875 4922 4969 5015 5062 5109 5161 5254 5347 ...
## $ elevation: num 6.1 6.05 6.3 6.6 6.6 5.75 5.35 5.1 5 5.2 ...
## $ region : Factor w/ 3 levels "1","2","3": 3 3 3 3 3 3 3 3 2 2 ...
## $ ca020 : int 52 57 72 74 68 45 47 49 38 53 ...
## $ mg020 : int 18 20 22 11 34 27 27 30 25 26 ...
## $ ctc020 : num 106 131 115 114 124 ...
## $ ca2040 : int 40 49 63 74 44 27 35 35 29 23 ...
## $ mg2040 : int 16 21 22 12 36 22 25 25 28 18 ...
## $ ctc2040 : num 86.3 123.1 101.7 95.6 106.3 ...
par.ori <- par(no.readonly=T)
ctc40 <- as.geodata(suelo, coords.col=1:2, data.col=9, covar.col=c(3, 11))
summary(ctc40)
## Number of data points: 179
##
## Coordinates summary
## este norte
## min 4957 4829
## max 5961 5720
##
## Distance summary
## min max
## 43.01163 1138.11774
##
## Data summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 73.100 114.500 128.000 127.257 139.100 212.400
##
## Covariates summary
## altitud region
## Min. :3.300 R1: 14
## 1st Qu.:5.200 R2: 49
## Median :5.650 R3:116
## Mean :5.513
## 3rd Qu.:6.000
## Max. :6.600
with(suelo, plot(ctc40 ~ altitud));
with(suelo, lines(lowess(ctc40 ~ altitud)))
with(suelo, plot(ctc40 ~ region))
Gráficos con la inclusión de diferentes covariables
plot(ctc40, low=T)
plot(ctc40, low=T, trend=~region)
plot(ctc40, low=T, trend=~altitud)
plot(ctc40, low=T, trend=~altitud+region)
Con el fin de hacer constante la variabilidad en los datos, se ha evaludado la posibilidad de transformarlos usando la familia de transformaciones potenciales de Box-Cox.
par(mfrow=c(2,2), mar=c(3,3,0, 0.5))
boxcox(ctc40)
boxcox(ctc40, trend=~altitud)
boxcox(ctc40, trend=~region)
boxcox(ctc40, trend=~region+altitud)
Variograma para diferentes modelos
par(mfrow=c(2,2), mar=c(3,3,0.3, 0.3), mgp=c(1.8, .7, 0))
plot(variog(ctc40, max.dist=900))
## variog: computing omnidirectional variogram
plot(variog(ctc40, max.dist=900, trend=~region))
## variog: computing omnidirectional variogram
plot(variog(ctc40, max.dist=900, trend=~altitud))
## variog: computing omnidirectional variogram
plot(variog(ctc40, max.dist=900, trend=~region+altitud))
## variog: computing omnidirectional variogram
Identificando posibles datos atÃpicos/influyentes
par(par.ori)
with(suelo, boxplot(ctc40))
points(ctc40)
points(suelo[with(suelo, which(ctc40 > 200)), 1:2], col=2, pch=19)
Creando un nuevo objeto excluyendo observaciones atÃpicas
ctc40.1 <- subset(ctc40, data < 200)
class(ctc40.1)
## [1] "geodata"
summary(ctc40.1)
## Number of data points: 178
##
## Coordinates summary
## este norte
## min 4957 4829
## max 5961 5673
##
## Distance summary
## min max
## 43.01163 1138.11774
##
## Data summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 73.1000 114.4000 128.0000 126.7787 138.6750 183.2000
##
## Covariates summary
## altitud region
## Min. :3.400 R1: 13
## 1st Qu.:5.200 R2: 49
## Median :5.650 R3:116
## Mean :5.525
## 3rd Qu.:6.000
## Max. :6.600
Rehacer variogramas sin datos atÃpicos
par(mfrow=c(2,2), mar=c(3,3,0.3, 0.3), mgp=c(1.8, 0.7, 0))
plot(variog(ctc40.1, max.dist=900))
## variog: computing omnidirectional variogram
plot(variog(ctc40.1, max.dist=900, trend=~region))
## variog: computing omnidirectional variogram
plot(variog(ctc40.1, max.dist=900, trend=~altitud))
## variog: computing omnidirectional variogram
plot(variog(ctc40.1, max.dist=900, trend=~region+altitud))
## variog: computing omnidirectional variogram
diferentes funciones de correlación y modelos para media (covariables)
ml6 <- likfit(ctc40.1, ini=c(600, 200), nug=100)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml7 <- likfit(ctc40.1, ini=c(600, 200), nug=100, kappa=1.5)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml8 <- likfit(ctc40.1, ini=c(600, 200), nug=100, kappa=2.5)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml9 <- likfit(ctc40.1, ini=c(600, 200), nug=100, trend=~region)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml10 <- likfit(ctc40.1, ini=c(600, 200), nug=100, kappa=1.5, trend=~region)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml11 <- likfit(ctc40.1, ini=c(600, 200), nug=100, kappa=2.5, trend=~region)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml12 <- likfit(ctc40.1, ini=c(600, 200), nug=100, trend=~region+altitud)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml13 <- likfit(ctc40.1, ini=c(600, 200), nug=100, kappa=1.5, trend=~region+altitud)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
ml14 <- likfit(ctc40.1, ini=c(600, 200), nug=100, kappa=2.5, trend=~region+altitud)
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
AIC_modelo1=round(data.frame(ml6$AIC,ml7$AIC,ml8$AIC,ml9$AIC,ml10$AIC,
ml11$AIC,
ml12$AIC,ml13$AIC,ml14$AIC),2);AIC_modelo1
## ml6.AIC ml7.AIC ml8.AIC ml9.AIC ml10.AIC ml11.AIC ml12.AIC ml13.AIC
## 1 1570.36 1569.35 1569.33 1561.91 1563.54 1565.64 1556.19 1556.74
## ml14.AIC
## 1 1556.87
plot(variog(ctc40.1, max.dist=900, kappa=1.5, trend=~region+altitud))
## variog: computing omnidirectional variogram
lines.variomodel(ml13,col="black")
legend("bottomright", c("matérn k=1.5"), lty=1, col=c("black"))
title("Ajuste del variograma con función de correlación Matern")
## Predicción espacial
## tomando los bordes de la región como un todo de otro objeto:
data(ca20)
ctc40$borders <- ca20$borders
plot(ctc40)
## malla de puntos de predicción
args(pred_grid)
## function (coords, y.coords = NULL, ..., y.by = NULL, y.length.out = NULL,
## y.along.with = NULL)
## NULL
gr <- pred_grid(ctc40$borders, by=20)
par(par.ori)
points(ctc40)
points(gr, col=2, pch=19, cex=0.3)
## construyendo las covariables en las ubicaciones de predicción
## regiones
ctc40$reg1 <- ca20$reg1
ctc40$reg1
## east north
## 1 5590 5690
## 2 5340 5800
## 3 5220 5700
## 4 5250 5370
## 5 5350 5370
## 6 5450 5500
## 7 5510 5600
## 8 5590 5690
ctc40$reg2 <- ca20$reg2
ctc40$reg2
## east north
## 1 5990 5100
## 2 5590 5300
## 3 5350 5370
## 4 5450 5500
## 5 5510 5600
## 6 5590 5690
## 7 5800 5690
## 8 5990 5690
ctc40$reg3 <- ca20$reg3
ctc40$reg3
## east north
## 1 5990 5100
## 2 5590 5300
## 3 5350 5370
## 4 5150 5370
## 5 4920 5000
## 6 4920 4900
## 7 5150 4920
## 8 5350 4900
## 9 5590 4800
## 10 5780 4800
points(ctc40)
points(gr, col=2, pch=19, cex=0.3)
polygon(ctc40$reg1)
polygon(ctc40$reg2)
polygon(ctc40$reg3)
gr <- pred_grid(ctc40$borders, by=20)
par(par.ori)
points(ctc40)
points(gr, col=2, pch=19, cex=0.3)
gr0 <- gr[.geoR_inout(gr, ctc40$borders),]
points(gr0, col=4, pch=19, cex=0.3)
dim(gr)
## [1] 2754 2
dim(gr0)
## [1] 1858 2
cov.regiao <- numeric(nrow(gr0))
cov.regiao[.geoR_inout(gr0, ctc40$reg1)] <- "R1"
cov.regiao[.geoR_inout(gr0, ctc40$reg2)] <- "R2"
cov.regiao[.geoR_inout(gr0, ctc40$reg3)] <- "R3"
cov.regiao <- as.factor(cov.regiao)
table(cov.regiao)
## cov.regiao
## R1 R2 R3
## 239 608 1011
## interpolando altitud utilizando splines (paquete akima)
#install.packages("akima")
require(akima)
## Loading required package: akima
head(camg)
## east north elevation region ca020 mg020 ctc020 ca2040 mg2040 ctc2040
## 1 5710 4829 6.10 3 52 18 106.0 40 16 86.3
## 2 5727 4875 6.05 3 57 20 131.1 49 21 123.1
## 3 5745 4922 6.30 3 72 22 114.6 63 22 101.7
## 4 5764 4969 6.60 3 74 11 114.4 74 12 95.6
## 5 5781 5015 6.60 3 68 34 124.4 44 36 106.3
## 6 5799 5062 5.75 3 45 27 132.9 27 22 105.3
cov.altitude <- interpp(camg[,1], camg[,2], camg[,3], gr0[,1], gr0[,2], linear=F, extrap=T)$z
## Warning in interpp(camg[, 1], camg[, 2], camg[, 3], gr0[, 1], gr0[, 2], :
## success: collinearities reduced through jitter
#cov.altitude
KC <- krige.control(type="OK",
obj.model=ml13,
trend.d=~regiao+altitude,
trend.l=~cov.regiao+cov.altitude)
length(cov.regiao)
## [1] 1858
length(cov.altitude)
## [1] 1858
gr = pred_grid(ctc40$borders, by=20)
s.out = output.control(n.predictive = 1000, n.post=1000, quant=0.95, thres=150)
ctc40.kc = krige.conv(ctc40, loc=gr, krige=krige.control(obj=ml13), output = s.out)
## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with constant mean
## Warning in krige.conv(ctc40, loc = gr, krige = krige.control(obj =
## ml13), : .Random.seed not initialised. Creating it with runif(1)
## krige.conv: sampling from the predictive distribution (conditional simulations)
## krige.conv: Kriging performed using global neighbourhood
image(ctc40.kc, gr,
ctc40$borders,
col=terrain.colors(200),
x.leg=c(4950, 5450), y.leg=c(4770, 4830))
polygon(ctc40$reg1)
polygon(ctc40$reg2)
## explorando más los resultados:
image(ctc40.kc, loc=gr,
bor=ctc40$borders, col=terrain.colors(21),
val=sqrt(ctc40.kc$krige.var),
x.leg=c(4950, 5450), y.leg=c(4770, 4830))
polygon(ctc40$reg1)
polygon(ctc40$reg2)
## mapa de probabilidades: P(S(x)|y > 150)
image(ctc40.kc, loc=gr, bor=ctc40$borders, col=terrain.colors(21),
val=1-ctc40.kc$prob, x.leg=c(4950, 5450), y.leg=c(4770, 4830))
polygon(ctc40$reg1)
polygon(ctc40$reg2)
## quantiles
names(ctc40.kc$quant)
## NULL
## otros resultados
names(ctc40.kc)
## [1] "predict" "krige.var"
## [3] "beta.est" "distribution"
## [5] "simulations" "mean.simulations"
## [7] "variance.simulations" "quantiles.simulations"
## [9] "probabilities.simulations" "sim.means"
## [11] ".Random.seed" "message"
## [13] "call"
dim(ctc40.kc$simul)
## [1] 1858 1000
length(ctc40.kc$mean.sim)
## [1] 1858
length(ctc40.kc$sim.means)
## [1] 1000
## histograma de media general
hist(ctc40.kc$sim.means)
## histograma de la proporción del área superior a cierto valor (150)
p150 <- apply(ctc40.kc$simul, 2, function(x) mean(x > 150))
hist(p150, prob=T); lines(density(p150))
quantile(p150, probs=c(0.025, 0.975))
## 2.5% 97.5%
## 0.1210980 0.1921421
## evaluación del error de Monte Carlo (simulación)
names(ctc40.kc)
## [1] "predict" "krige.var"
## [3] "beta.est" "distribution"
## [5] "simulations" "mean.simulations"
## [7] "variance.simulations" "quantiles.simulations"
## [9] "probabilities.simulations" "sim.means"
## [11] ".Random.seed" "message"
## [13] "call"
with(ctc40.kc, summary(mean.simulations - predict))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.911833 -0.393825 -0.007700 -0.007421 0.374069 1.902357
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Paulo J. Ribeiro Jr and Peter J. Diggle (2016). geoR: Analysis of Geostatistical Data. R package version 1.7-5.2. https://CRAN.R-project.org/package=geoR
Gracias!