Loading the required packages, with a few extras for plotting:
require(rjags)
require(coda)
require(DT)
require(tidyverse)
require(car)
#install.packages("MCMCpack")
require(MCMCpack)
#install.packages("bayesplot")
require(bayesplot)
#install.packages("ggrepel")
require(grid)
Simple linear regression model to investigate the effect of different amounts of fertilizer on grass yield in calcareous soil.
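In symbols, this is the straight-line model used throughout the example (a restatement, with $y_i$ denoting the grass yield and $x_i$ the amount of fertilizer for observation $i$):

$$
y_i = a + b\,x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad i = 1, \dots, n.
$$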
You can copy the data into a .txt file on your computer; here they are read directly from GitHub.
Reading the data & graphical representation
grama = read.table("https://raw.githubusercontent.com/liamorita/estatisticabayesiana/master/exemplo2_grama.txt", sep = ";", header = TRUE)
datatable(grama)
reta = lm(y ~ x,data=grama)
reta
##
## Call:
## lm(formula = y ~ x, data = grama)
##
## Coefficients:
## (Intercept) x
## 51.9333 0.8114
a=round(reta$coefficients[1],4)
b=round(reta$coefficients[2],4)
paste0("intercepto (a) = ",a)
## [1] "intercepto (a) = 51.9333"
paste0(" coef. angular (b) = ",b)
## [1] " coef. angular (b) = 0.8114"
# to annotate the plot with the fitted equation
grob <- grobTree(textGrob(paste0("equação da reta: f(x)=",a,"+",b,"x"), x=0.1, y=0.95, hjust=0,
gp=gpar(col="red", fontsize=13, fontface="italic")))
grama %>%
ggplot(aes(x=x,y=y)) +
geom_point()+
geom_smooth(method="lm") +
annotation_custom(grob)
## `geom_smooth()` using formula 'y ~ x'
dados = list("n" = nrow(grama), "x" = grama$x, "y" = grama$y)  # data passed to JAGS
iniciais = list(  # initial values, one list per chain
  list(a = 1, b = 1, tau = 0.5),
  list(a = 2, b = 2, tau = 1)
)
cat("
model{
for (i in 1 : n) {
y[i] ~ dnorm(a+b*x[i], tau)
y_pred[i] ~ dnorm(a+b*x[i], tau)
}
a ~ dnorm(0,0.000001)
b ~ dnorm(0,0.000001)
tau ~ dgamma(0.1,0.1)
sigma2 = 1/tau
}
",file="modelo4.txt")
modelo=jags.model(file="modelo4.txt",data=dados,inits=iniciais,n.chains=2, n.adapt=1000)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 10
## Unobserved stochastic nodes: 13
## Total graph size: 59
##
## Initializing model
Choosing the quantities to monitor: some come directly from the sampled parameters, while others are functions of them (here, sigma2 = 1/tau and the predictions y_pred).
params = c("a","b","sigma2","y_pred")
samps = coda.samples( modelo, params, n.iter=50000,thin=5)
summary(samps)
##
## Iterations = 5:50000
## Thinning interval = 5
## Number of chains = 2
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## a 51.787 14.67680 0.103781 0.141416
## b 0.812 0.09517 0.000673 0.000917
## sigma2 468.697 319.12344 2.256543 2.357885
## y_pred[1] 72.194 24.85713 0.175766 0.189164
## y_pred[2] 92.295 24.02229 0.169863 0.179505
## y_pred[3] 112.781 23.42258 0.165623 0.174094
## y_pred[4] 132.957 22.97878 0.162485 0.163920
## y_pred[5] 153.069 22.77523 0.161045 0.161043
## y_pred[6] 173.567 22.80127 0.161229 0.159933
## y_pred[7] 193.803 23.01051 0.162709 0.162712
## y_pred[8] 214.182 23.58043 0.166739 0.170276
## y_pred[9] 234.422 24.33244 0.172056 0.177768
## y_pred[10] 254.774 25.23661 0.178450 0.184487
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## a 22.024 42.7869 51.8534 60.7411 80.796
## b 0.626 0.7538 0.8118 0.8703 1.003
## sigma2 163.170 278.5111 386.2110 556.5112 1271.957
## y_pred[1] 22.802 56.8686 72.1234 87.4405 121.107
## y_pred[2] 44.546 77.3585 92.4620 107.2093 140.334
## y_pred[3] 66.479 98.4016 112.7332 127.2629 159.453
## y_pred[4] 86.842 118.9792 132.8797 147.1140 179.050
## y_pred[5] 107.759 139.0405 153.2543 167.0917 198.680
## y_pred[6] 128.704 159.6002 173.5131 187.4689 218.883
## y_pred[7] 148.247 179.8788 193.9072 207.8972 239.527
## y_pred[8] 166.949 199.8503 214.0168 228.5483 261.344
## y_pred[9] 185.863 219.6057 234.3517 249.1862 282.818
## y_pred[10] 205.624 239.3317 254.5465 270.1967 305.025
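The quantiles above give equal-tail credible intervals; as a complementary check (not in the original script, but a standard coda function), highest posterior density intervals can be obtained directly from the samples:
HPDinterval(samps, prob = 0.95)   # 95% HPD limits, returned as one matrix per chain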
Stacking the two chains into a single matrix, using the as.mcmc.list function from the coda package:
samps2=rbind(as.mcmc.list(samps)[[1]],as.mcmc.list(samps)[[2]])
summary(samps2)
## a b sigma2 y_pred[1]
## Min. :-111.16 Min. :0.0765 Min. : 86.69 Min. :-128.07
## 1st Qu.: 42.79 1st Qu.:0.7538 1st Qu.: 278.51 1st Qu.: 56.87
## Median : 51.85 Median :0.8118 Median : 386.21 Median : 72.12
## Mean : 51.79 Mean :0.8120 Mean : 468.70 Mean : 72.19
## 3rd Qu.: 60.74 3rd Qu.:0.8703 3rd Qu.: 556.51 3rd Qu.: 87.44
## Max. : 155.85 Max. :1.7425 Max. :6467.55 Max. : 235.46
## y_pred[2] y_pred[3] y_pred[4] y_pred[5]
## Min. :-76.10 Min. :-12.75 Min. :-11.52 Min. :-29.41
## 1st Qu.: 77.36 1st Qu.: 98.40 1st Qu.:118.98 1st Qu.:139.04
## Median : 92.46 Median :112.73 Median :132.88 Median :153.25
## Mean : 92.29 Mean :112.78 Mean :132.96 Mean :153.07
## 3rd Qu.:107.21 3rd Qu.:127.26 3rd Qu.:147.11 3rd Qu.:167.09
## Max. :234.47 Max. :242.83 Max. :262.80 Max. :294.75
## y_pred[6] y_pred[7] y_pred[8] y_pred[9]
## Min. :-43.37 Min. :-50.13 Min. : -4.846 Min. : 57.26
## 1st Qu.:159.60 1st Qu.:179.88 1st Qu.:199.850 1st Qu.:219.61
## Median :173.51 Median :193.91 Median :214.017 Median :234.35
## Mean :173.57 Mean :193.80 Mean :214.183 Mean :234.42
## 3rd Qu.:187.47 3rd Qu.:207.90 3rd Qu.:228.548 3rd Qu.:249.19
## Max. :323.55 Max. :319.39 Max. :380.120 Max. :383.25
## y_pred[10]
## Min. : 19.61
## 1st Qu.:239.33
## Median :254.55
## Mean :254.77
## 3rd Qu.:270.20
## Max. :409.90
summary(samps2[,1])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -111.16 42.79 51.85 51.79 60.74 155.85
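Since the predictions y_pred[i] were monitored, a short sketch (using only the objects grama and samps2 created above) overlays their posterior means and 95% credible intervals on the observed data:
pred_cols <- grep("^y_pred", colnames(samps2))   # columns holding the y_pred samples
pred_summ <- data.frame(
  x = grama$x,
  y = grama$y,
  pred_mean = colMeans(samps2[, pred_cols]),     # posterior mean of each y_pred[i]
  lower = apply(samps2[, pred_cols], 2, quantile, probs = 0.025),
  upper = apply(samps2[, pred_cols], 2, quantile, probs = 0.975)
)
ggplot(pred_summ, aes(x = x)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +   # 95% predictive band
  geom_line(aes(y = pred_mean)) +
  geom_point(aes(y = y))
For comparison, the classical least-squares fit is summarized next.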
summary(reta)
##
## Call:
## lm(formula = y ~ x, data = grama)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.79 -11.07 -5.00 12.00 29.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.93333 12.97904 4.001 0.00394 **
## x 0.81139 0.08367 9.697 1.07e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19 on 8 degrees of freedom
## Multiple R-squared: 0.9216, Adjusted R-squared: 0.9118
## F-statistic: 94.04 on 1 and 8 DF, p-value: 1.067e-05
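To put the two fits side by side numerically, a small sketch (again using only objects already created above) compares the posterior means and 95% credible intervals with the least-squares estimates and 95% confidence intervals:
bayes_ci <- t(apply(samps2[, c("a", "b")], 2,
  function(s) c(mean = mean(s), quantile(s, probs = c(0.025, 0.975)))))
bayes_ci                                     # posterior mean and 95% credible interval
cbind(estimate = coef(reta), confint(reta))  # least-squares estimate and 95% confidence interval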
Gelman & Rubin diagnostic, computed with the coda package.
References in the literature:
[1] Gelman, A. and Rubin, D.B. (1992) Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.
[2] Brooks, S.P. and Gelman, A. (1998) General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434-455.
gelman.diag(samps, confidence = 0.95, transform=FALSE)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## a 1 1
## b 1 1
## sigma2 1 1
## y_pred[1] 1 1
## y_pred[2] 1 1
## y_pred[3] 1 1
## y_pred[4] 1 1
## y_pred[5] 1 1
## y_pred[6] 1 1
## y_pred[7] 1 1
## y_pred[8] 1 1
## y_pred[9] 1 1
## y_pred[10] 1 1
##
## Multivariate psrf
##
## 1
gelman.plot(samps)
Trace plots and densities with the bayesplot package
mcmc_trace(samps)
mcmc_dens(samps)
Autocorrelation plots from the coda package
autocorr.plot(samps[[1]],col=1)
autocorr.plot(samps[[2]],col=2)
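As a numerical complement to the autocorrelation plots (not part of the original script, but standard coda functions), the effective sample size and lag autocorrelations can also be inspected:
effectiveSize(samps)   # effective number of independent draws for each monitored quantity
autocorr.diag(samps)   # autocorrelation at a few default lags for the monitored quantities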
Conclusions: With Bayesian inference and the rjags package, it is possible to establish the linear relationship between the amount of fertilizer and grass yield, and convergence of the algorithm was verified.
Pending items: