Análise de dados do solo para variável Cálcio(Ca) no estado do Piauí

Gabriel Messias Santana Peixoto

Para começarmos a nossa ánalise iremos detalhar o que é o cálcio:

O cálcio é um elemento químico, símbolo Ca, de número atómico 20 e massa atómica 40u. É um metal da família dos alcalino-terrosos, pertencente ao grupo 2 da classificação periódica dos elementos químicos.

Carregando os pacotes necessários para iniciar os processos

library(geoR)
library(MASS)
library(moments)
library(fBasics)
library(akima)
library(readxl)
library(skimr)
library(terra)
library(rmdformats)
library(knitr)

Carregando os nossos dados

solo <- read_excel("Dados.xlsx")
geosolo <- as.geodata(solo, coords.col=1:2, data.col=7)
bord = read.table("bord.txt")
geosolo$borders <- with(bord, cbind(V1,V2))

Iremos agora ter um visão das primeiras 5 linhas da nossa base de dados e puxar as informações do nosso objeto o “geosolo”.

head(solo)

## # A tibble: 6 × 10
##     X...1    Y...2 X...3 Y...4  X_km  Y_km    Ca Carbono    pH    Pr
##     <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
## 1 468785. 9157565.  85.5  147. -7.62  56.7 0       1.50   4.22  1.08
## 2 460290. 9136197.  85.5  147. -7.81  56.6 0       0.908  4.4   0.5 
## 3 733695. 9071127.  85.5  147. -8.40  59.1 0       0.832  3.8   1.18
## 4 789261. 8962017.  85.5  147. -9.38  59.6 0.532   0.397  5.16 34.1 
## 5 783997. 8965400.  85.5  147. -9.35  59.6 1.72    0.457  5.45 24.2 
## 6 772256. 8981080.  85.5  147. -9.21  59.5 0.835   0.525  5.43 32.7

summary(geosolo)

## Number of data points: 243 
## 
## Coordinates summary
##        X...1   Y...2
## min 420797.9 8806076
## max 985985.9 9686916
## 
## Distance summary
##         min         max 
##    120.1225 968602.3272 
## 
## Borders summary
##            V1      V2
## min  390646.6 8791849
## max 1012045.0 9696707
## 
## Data summary
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##  0.000000  0.000000  0.402000  1.644601  1.896000 22.890000

Podemos notar que nossas coordenadas minimas e máximas são para o X[420797.9;985985.9], para o Y[8806076;9686916], temos uma disntância mínima de 120.1225 e máxima de 968602.3272. A mediana da nossa variável é 0.4020 e a média 1.6446.

Tendo a primeira visualização gráfica do nosso geodado

plot(geosolo, low=T)

No primeiro gráfico temos a relação da nossa variável com os eixos X & Y. No segundo gráfico temos a relação em função da coordenadas Y, é possível observar que não há um efeito de tendência no mesmo. No terceiro temos a relação em função da cordenada X podemos notar um efeito de têndencia crescente nas distâncias finais do gráfico. O quarto gráfico é responsável por nos representar o histograma e a densidade dos dados, é possível notar um grande agrupamento em torno do 0.

Testando agora com diferentes níveis de tendência para o comportamento dos nossos dados

Para observar o comportamento dos dados e notarmos se há uma melhora ou não no comportamento dos mesmos, iremos aplicar diferntes níveis de têndencia e observar o seu comportamento.

plot(geosolo, low=T)

plot(geosolo, low=T, trend="1st")

plot(geosolo, low=T, trend="2nd")

Podemos notar pela análise gráfica que o efeito que melhor se encaixa nos dados é o sem nível de tendência, estão tentaremos seguir a análise com o mesmo.

Iremos plotar agora o nosso semivariograma

v1 <- variog(geosolo, max.dist=322000)

## variog: computing omnidirectional variogram

plot(v1)

Podemos notar que há depêndencia espacial em determinados intervalos de distância, pela variabilidade dos pontos do semivariograma

### Envelope espacial para a checagem da dependencia espacial

enve.var = variog.mc.env(geosolo, obj.v=v1, nsim=100)

## variog.env: generating 100 simulations by permutating data values
## variog.env: computing the empirical variogram for the 100 simulations
## variog.env: computing the envelops

plot(v1, env=enve.var, main="Envelope Espacial")

Observa-se após análise do envelope espacial que há depêndencia espacial entre os dados

Chute inicial

Construindo os nossos modelos com os parâmetros estimados, usaremos um chute apos ajuste visual via eyefit, sigmasq = “3.43”, phi = 108783.16, tausq = 5.28.

Sendo assim a construção dos modelos se dará por:

ca.ml0 <- list()
ca.ml0$fit1 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "matern")

## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ca.ml0$fit2 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "exponential")

## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ca.ml0$fit3 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "gaussian")

## kappa not used for the gaussian correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ca.ml0$fit4 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "spherical")

## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ca.ml0$fit5 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "circular")

## kappa not used for the circular correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ca.ml0$fit6 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "cubic")

## kappa not used for the cubic correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

ca.ml0$fit7 <- likfit(geosolo,  ini=c(3.43, 108783.16), nug=5.28,cov.model = "wave")

## kappa not used for the wave correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

Checando o melhor AIC

sapply(ca.ml0, AIC)

##     fit1     fit2     fit3     fit4     fit5     fit6     fit7 
## 1203.402 1203.402 1205.329 1204.257 1205.352 1202.516 1205.604

Podemos observar que o melhor modelo será o fit6, o modelo “circular” que detem um AIC de 1202.516, este será o modelo escolhido para seguir o trabalho.

AIC & BIC Espacial

AICsp = sapply(ca.ml0, AIC)
BICsp = c(ca.ml0$fit1$BIC,ca.ml0$fit2$BIC,ca.ml0$fit3$BIC,ca.ml0$fit4$BIC,ca.ml0$fit5$AIC,ca.ml0$fit6$AIC)
AIC_BIC_spatial = rbind(AICsp, BICsp)

kable(AIC_BIC_spatial, main = "AIC e BIC espacial dos nossos modelos testados")

	fit1	fit2	fit3	fit4	fit5	fit6	fit7
AICsp	1203.402	1203.402	1205.329	1204.257	1205.352	1202.516	1205.604
BICsp	1217.375	1217.375	1219.301	1218.229	1205.352	1202.516	1217.375

Diante do exposto é notável que o modelo fit6, apresenta um AIC & BIC equivalentes.

AIC & BIC Não espacial

AICnsp = c(ca.ml0$fit1$nospatial$AIC.ns,ca.ml0$fit2$nospatial$AIC.ns,ca.ml0$fit3$nospatial$AIC.ns,ca.ml0$fit4$nospatial$AIC.ns,ca.ml0$fit5$nospatial$AIC.ns,ca.ml0$fit6$nospatial$AIC.ns)


BICnsp = c(ca.ml0$fit1$nospatial$BIC.ns,ca.ml0$fit2$nospatial$BIC.ns,ca.ml0$fit3$nospatial$BIC.ns,ca.ml0$fit4$nospatial$BIC.ns,ca.ml0$fit5$nospatial$BIC.ns,ca.ml0$fit6$nospatial$BIC.ns) 
AIC_BIC_nspatial = rbind(AICnsp, BICnsp)

kable(AIC_BIC_nspatial, main = "AIC e BIC não espacial dos nossos modelos testados")

AICnsp	1227.475	1227.475	1227.475	1227.475	1227.475	1227.475
BICnsp	1234.461	1234.461	1234.461	1234.461	1234.461	1234.461

É notável que tanto o AIC quanto BIC dos modelos não espaciais são iguais

Checando agora a distribuição dos pontos por quintis

points(geosolo, pt.div="data.proportional", xlab="Coord X", ylab="Coord Y",x.leg=250000, y.leg=9650000, trend = "cte")

Podemos notar a distribuição previa dos pontos é possível notar uma maior concentração dos dados na região leste e norte do estado.

inciando as visualizações gráficas

gr <- pred_grid(geosolo$borders, by=12500)
gr0 <- gr[.geoR_inout(gr, geosolo$borders),]

KC <- krige.control(type="SK",obj.model = ca.ml0$fit6)
geosolo.kc <- krige.conv(geosolo, loc=gr0, krige=KC)

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with constant mean
## krige.conv: Kriging performed using global neighbourhood

Checando a intensidade do mapa

image(geosolo.kc, gr,
      geosolo$borders,
      col=terrain.colors(200),
      x.leg=c(858233, 1200000), y.leg=c(8859000, 8910972),cex=0.7)

Podemos notar uma maior intensidade na região leste do estado, assim também como uma concentração na região norte e um parte da oeste.

Utilizando a raiz da variância que encontramos na kringagem

image(geosolo.kc, loc=gr,
      bor=geosolo$borders, col=terrain.colors(21),
      val=sqrt(geosolo.kc$krige.var),
      x.leg=c(858233, 1250000), y.leg=c(8859000, 8910972),cex=0.6)

Já nesse caso podemos notar uma forte concentração chegando na região sul do estado assim como na nordeste e na sudoeste.

iniciando a simulação

OC <- output.control(thres=0.402,
                     quantile=c(0.1, 0.9))

geosolo.kc <- krige.conv(geosolo, loc=gr0, krige=KC, out=OC)

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with constant mean
## krige.conv: sampling from the predictive distribution (conditional simulations)
## krige.conv: Kriging performed using global neighbourhood

Chegando as regiões onde há probabilidade de haver uma concentração maior que a mediana (0.402)

image(geosolo.kc, loc=gr, bor=geosolo$borders, col=terrain.colors(21), 
      val=1-geosolo.kc$prob,x.leg=c(858233, 1200000), y.leg=c(8859000, 8910972),cex=0.7)

É possível notar que há uma alta probabilidade na região leste, assim também como nas bordas da região norte e oeste.

Plotando o gráfico do 90° percentil

image(geosolo.kc, loc=gr, bor=geosolo$borders, col=terrain.colors(21), 
      val=geosolo.kc$quantile$q90,x.leg=c(858233, 1200000), y.leg=c(8859000, 8910972),cex=0.7)

Podemos então ter a visualização dos dados chamados de “baixos confiáveis”, que são os valores que representam 10% dos dados com os valores mais altos, é possível notar sua concentração também nas regiões leste e nas bordas da oeste e norte.

Plotando o gráfico para o decentil

image(geosolo.kc, loc=gr, bor=geosolo$borders, col=terrain.colors(21), 
      val=geosolo.kc$quantile$q10,x.leg=c(858233, 1200000), y.leg=c(8859000, 8910972),cex=0.7)

Aqui temos a visualização para os “altos confiáiveis” que representam cerca de 90% dos dados que variam de -2 a 2. Também é possível notar sua concentração também nas regiões leste e nas bordas da oeste e norte.

Histograma da nossa simulação

p091 <- apply(geosolo.kc$simul, 2, function(x) mean(x >1.644601  ))
hist(p091, prob=T); lines(density(p091))

# Conclusão

Diante das ánalises vistas, podemos notar uma forte concentração de cálcio na região Leste do estado do Piauí e no entorno das bordas das regiões Norte e Oeste. Foi possível notar pela ánalise do nosso semivariograma, que dependendo do intervalo estudado há depêndencia espacial nos dados. Diante disso, é notável que a região do estado com maior concentração é a região leste.