With extra packages for plotting
require(rjags)
require(coda)
require(DT)
require(tidyverse)
require(car)
#install.packages("MCMCpack")
require(MCMCpack)
#install.packages("bayesplot")
require(bayesplot)
#install.packages("ggrepel")
require(grid)
Simple linear regression model to investigate the effect of different amounts of fertilizer on grass yield in calcareous soil.
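In symbols, the model specified in the JAGS code below is

$$y_i \sim N(a + b\,x_i,\ \sigma^2), \qquad i = 1, \dots, n,$$

with vague priors $a \sim N(0, 10^6)$, $b \sim N(0, 10^6)$ and $\tau = 1/\sigma^2 \sim \mathrm{Gamma}(0.1, 0.1)$; note that in JAGS, dnorm is parameterized by the precision $\tau$ rather than the variance.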
Copy the data into a .txt file on your computer.
Reading the data & graphical display
grama = read.table("exemplo2_grama.txt", sep = ";", header = TRUE)
datatable(grama)
reta = lm(y ~ x,data=grama)
a=round(reta$coefficients[1],4)
b=round(reta$coefficients[2],4)
paste0("intercepto (a) = ",a)
## [1] "intercepto (a) = 51.9333"
paste0(" coef. angular (b) = ",b)
## [1] " coef. angular (b) = 0.8114"
# to add an annotation to the plot
grob <- grobTree(textGrob(paste0("equation of the line: f(x)=",a,"+",b,"x"), x=0.1, y=0.95, hjust=0,
                 gp=gpar(col="red", fontsize=13, fontface="italic")))
grama %>%
ggplot(aes(x=x,y=y)) +
geom_point()+
geom_smooth(method="lm") +
annotation_custom(grob)
## `geom_smooth()` using formula 'y ~ x'
dados=list("n"=nrow(grama),"x"=grama$x,"y"=grama$y)
iniciais=list(
list(a=1,b=1,tau=0.5),
list(a=2,b=2,tau=1)
)
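Alternatively, jags.model also accepts a function that returns a list of initial values, which draws a different random starting point for each chain; a minimal sketch (the distributions used here are arbitrary choices, not part of the original analysis):

iniciais_fun = function() {
  # random starting values; the spread of these draws is an arbitrary choice
  list(a = rnorm(1, 0, 10), b = rnorm(1, 0, 1), tau = runif(1, 0.1, 1))
}
# modelo = jags.model("modelo4.txt", data = dados, inits = iniciais_fun, n.chains = 2)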
cat("
model{
for (i in 1 : n) {
y[i] ~ dnorm(a+b*x[i], tau)
y_pred[i] ~ dnorm(a+b*x[i], tau)
}
a ~ dnorm(0,0.000001)
b ~ dnorm(0,0.000001)
tau ~ dgamma(0.1,0.1)
sigma2 = 1/tau
}
",file="modelo4.txt")
modelo=jags.model(file="modelo4.txt",data=dados,inits=iniciais,n.chains=2, n.adapt=1000)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 10
## Unobserved stochastic nodes: 13
## Total graph size: 59
##
## Initializing model
Define the quantities to be monitored: some come directly from the sampled parameters, and others are functions of the parameters.
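Optionally, a burn-in period could be discarded before monitoring, by advancing the chains without storing draws; a minimal sketch (the number of iterations is arbitrary, and this step was not run in the original analysis):

# optional burn-in: update the chains without recording draws
update(modelo, n.iter = 1000)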
params = c("a","b","sigma2","y_pred")
samps = coda.samples( modelo, params, n.iter=50000,thin=5)
summary(samps)
##
## Iterations = 5:50000
## Thinning interval = 5
## Number of chains = 2
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## a 51.952 14.95920 0.1057775 0.1462206
## b 0.811 0.09664 0.0006833 0.0009408
## sigma2 468.030 326.65826 2.3098227 2.3835688
## y_pred[1] 72.070 25.20583 0.1782321 0.1910323
## y_pred[2] 92.445 24.00072 0.1697107 0.1836772
## y_pred[3] 112.659 23.53390 0.1664098 0.1723310
## y_pred[4] 132.845 23.13299 0.1635750 0.1679575
## y_pred[5] 153.443 22.87300 0.1617366 0.1642920
## y_pred[6] 173.480 22.43541 0.1586423 0.1568106
## y_pred[7] 193.831 22.89086 0.1618629 0.1618665
## y_pred[8] 214.220 23.64262 0.1671786 0.1691019
## y_pred[9] 234.370 24.06942 0.1701965 0.1737490
## y_pred[10] 254.737 24.94151 0.1763631 0.1853729
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## a 22.2212 42.8420 51.8911 61.13 81.965
## b 0.6194 0.7523 0.8104 0.87 1.003
## sigma2 162.2155 277.0328 384.7963 549.94 1287.389
## y_pred[1] 22.3474 56.5451 72.0882 87.54 121.669
## y_pred[2] 44.6449 77.5161 92.5017 107.18 140.463
## y_pred[3] 66.2603 98.0224 112.5979 127.19 159.833
## y_pred[4] 86.9307 118.9984 132.7779 146.73 179.816
## y_pred[5] 107.7808 139.5451 153.0812 167.46 199.687
## y_pred[6] 128.7315 159.7183 173.5052 187.31 218.394
## y_pred[7] 148.3430 179.7279 193.7303 207.89 239.031
## y_pred[8] 167.0889 199.7429 214.2945 228.71 260.956
## y_pred[9] 186.2481 219.5696 234.3146 249.28 282.635
## y_pred[10] 205.2872 239.3108 254.7807 269.87 304.523
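The y_pred quantiles above can be turned into a 95% posterior predictive band over the data; a minimal sketch, assuming grama is ordered by x as it appears to be (y_pred[i] is built from x[i], so the rows line up by construction):

# extract posterior predictive quantiles and overlay a 95% band on the data
q = summary(samps)$quantiles
pred = q[grep("^y_pred\\[", rownames(q)), ]
banda = data.frame(x = grama$x, inf = pred[,"2.5%"], med = pred[,"50%"], sup = pred[,"97.5%"])
grama %>%
ggplot(aes(x=x,y=y)) +
geom_ribbon(data=banda, aes(x=x, ymin=inf, ymax=sup), inherit.aes=FALSE, alpha=0.2) +
geom_line(data=banda, aes(x=x, y=med), inherit.aes=FALSE, colour="blue") +
geom_point()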
Stacking the two chains into a single matrix, using the as.mcmc.list function from the coda package
samps2=rbind(as.mcmc.list(samps)[[1]],as.mcmc.list(samps)[[2]])
summary(samps2)
## a b sigma2 y_pred[1]
## Min. :-110.99 Min. :0.01802 Min. : 77.04 Min. :-140.68
## 1st Qu.: 42.84 1st Qu.:0.75233 1st Qu.: 277.03 1st Qu.: 56.55
## Median : 51.89 Median :0.81038 Median : 384.80 Median : 72.09
## Mean : 51.95 Mean :0.81104 Mean : 468.03 Mean : 72.07
## 3rd Qu.: 61.13 3rd Qu.:0.86999 3rd Qu.: 549.94 3rd Qu.: 87.54
## Max. : 181.11 Max. :1.74082 Max. :7830.03 Max. : 315.59
## y_pred[2] y_pred[3] y_pred[4] y_pred[5]
## Min. :-59.46 Min. :-50.90 Min. : -8.362 Min. :-12.17
## 1st Qu.: 77.52 1st Qu.: 98.02 1st Qu.:118.998 1st Qu.:139.55
## Median : 92.50 Median :112.60 Median :132.778 Median :153.08
## Mean : 92.45 Mean :112.66 Mean :132.845 Mean :153.44
## 3rd Qu.:107.18 3rd Qu.:127.19 3rd Qu.:146.735 3rd Qu.:167.46
## Max. :226.03 Max. :273.93 Max. :256.648 Max. :308.53
## y_pred[6] y_pred[7] y_pred[8] y_pred[9]
## Min. : 31.46 Min. : 66.53 Min. : 73.85 Min. : 62.05
## 1st Qu.:159.72 1st Qu.:179.73 1st Qu.:199.74 1st Qu.:219.57
## Median :173.51 Median :193.73 Median :214.29 Median :234.31
## Mean :173.48 Mean :193.83 Mean :214.22 Mean :234.37
## 3rd Qu.:187.31 3rd Qu.:207.89 3rd Qu.:228.71 3rd Qu.:249.28
## Max. :298.91 Max. :368.24 Max. :394.02 Max. :378.15
## y_pred[10]
## Min. : 65.28
## 1st Qu.:239.31
## Median :254.78
## Mean :254.74
## 3rd Qu.:269.87
## Max. :411.55
summary(samps2[[1]]) # note: samps2 is a matrix, so [[1]] returns only its first element (the first draw of a)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 26.41 26.41 26.41 26.41 26.41 26.41
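Since samps2 is a matrix of posterior draws, a single parameter is summarized by selecting its column by name; a minimal sketch:

# column-wise summaries of individual parameters
summary(samps2[, "a"])
summary(samps2[, "b"])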
summary(reta)
##
## Call:
## lm(formula = y ~ x, data = grama)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.79 -11.07 -5.00 12.00 29.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.93333 12.97904 4.001 0.00394 **
## x 0.81139 0.08367 9.697 1.07e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19 on 8 degrees of freedom
## Multiple R-squared: 0.9216, Adjusted R-squared: 0.9118
## F-statistic: 94.04 on 1 and 8 DF, p-value: 1.067e-05
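As a quick side-by-side check (not part of the original output), the posterior means of a and b can be placed next to the least-squares estimates:

# comparison of posterior means with the lm() estimates
post = summary(samps)$statistics[c("a", "b"), "Mean"]
cbind(bayes = post, lsq = coef(reta))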
Gelman & Rubin diagnostic, via the coda package: potential scale reduction factors close to 1 indicate that the chains have converged.
References in the literature:
[1] Gelman, A. and Rubin, D. B. (1992) Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.
[2] Brooks, S. P. and Gelman, A. (1998) General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434-455.
gelman.diag(samps, confidence = 0.95, transform=FALSE)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## a 1.00 1.00
## b 1.00 1.00
## sigma2 1.01 1.01
## y_pred[1] 1.00 1.00
## y_pred[2] 1.00 1.00
## y_pred[3] 1.00 1.00
## y_pred[4] 1.00 1.00
## y_pred[5] 1.00 1.00
## y_pred[6] 1.00 1.00
## y_pred[7] 1.00 1.00
## y_pred[8] 1.00 1.00
## y_pred[9] 1.00 1.00
## y_pred[10] 1.00 1.01
##
## Multivariate psrf
##
## 1
gelman.plot(samps)
Trace and density plots via the bayesplot package
mcmc_trace(samps)
mcmc_dens(samps)
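As a complement, bayesplot can also display the posteriors as interval plots; a minimal sketch for the main parameters:

# posterior medians and credible intervals (bayesplot)
mcmc_intervals(samps, pars = c("a", "b", "sigma2"))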
Autocorrelation plots from the coda package
autocorr.plot(samps[[1]],col=1)
autocorr.plot(samps[[2]],col=2)
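As a complement to the autocorrelation plots (not part of the original analysis), coda's effectiveSize() reports the effective number of independent draws per monitored node:

# effective sample size of the combined chains (coda)
effectiveSize(samps)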
Conclusions: with Bayesian inference and the rjags package, it is possible to establish the linear relationship between the amount of fertilizer and the grass yield. Convergence of the algorithm was verified.
Pending items: