<-bayesx(Y~X1+X2+sx(region,bs="mrf"),family="binomial",
slmdata=mydata,
map="region.graph")
uses a normal prior for the \(\beta\)s and an Inverse Gamma, \(IG(a,b)\) for precision parameter \(\tau^{2}\). \(a=b=0.001\) is assumed
The variance parameter \(tau^{2}_{j}\) is equivalent to the inverse smoothing parameter in a frequentist approach and controls the trade off between flexibility and smoothness.
Markov random fields (Fahrmeir et al. 2004): Defines a Markov random field prior for a spatial covariate, where geographical information is provided by a map object in boundary or graph file format.
mrf prior is used for the spatial effects and an MCMC algorithm implemented for sampling on the parameters
BayesX model formula
\[logit(P(Y_i=1))=\beta_0+\beta_1x_{i1}+...+\beta_px_{ip}+f_{spat}(s_i)\]
\[f_{spat}(s_i) \sim MRF ~PRIOR\]
Syntax
INLA model
\[logit(P(Y_i=1))=\beta_0+\beta_1x_{i1}+...+\beta_px_{ip}+u(s_i)\]
\[u(s_i) \sim ICAR ~or~ SPDE\]
Syntax
<-Y~X1+X2+f(region,model="besag",graph="region.graph")
formula
<-bayesx(formula,family="binomial",data=mydata)
result
Installing R packages
Part 1 - Installations (ignore this part and move straight to Part 2 and choose yes on updating all)
#renv::init(repos = "https://packagemanager.posit.co/cran/2023-10-13")
#install.packages("maptools")
<- c("methods",
pkg "kfigr",
"bookdown",
"markdown",
"bamlss",
"ape",
"BayesX",
"R2BayesX",
"bayespref",
"R2WinBUGS",
"BayesXsrc",
"spikeSlabGAM",
"sp", "leaflet", "raster", "spdep", "rgdal","akima", "httpuv", "e1071", "XML", "foreign",
"dplyr", "Hmisc", "readr", "haven", "data.table", "car", "tidyr", "aod", "ggplot2", "maptools",
"rlang", "tibble", "mcmc", "quantreg", "mvtnorm", "deldir", "expm", "purrr", "backports", "sf",
"tmaptools", "digest", "mime", "tmap", "RColorBrewer", "reshape2", "formatR", "spacetime", "plm","maptools")
::p_load(pkg) pacman
Installing INLA - 2024 August seems to be a tough one unless one uses this binary type use of this website https://www.r-inla.org/download-install
Windows will not update a package already loaded, so then you have to remove [use remove.packages] the package and install it from scratch. There are two suggested packages, ‘graph’ and ‘Rgraphviz’, that are on Bioconductor, and you can install those with:
Part 2 - Installations
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
::install(c("graph", "Rgraphviz"), dep=TRUE)
BiocManager
install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/testing"), type="binary", dep=TRUE)
#Update all/some/none? [a/s/n]:
Calling in the libraries
# Loading key packages for this session
library(INLA)
library(spdep)
library(stringdist)
#library(rgeos)
library(sf)
library(SpatialEpi)
#library(rgdal)
library(raster)
library(tidyverse)
library(dplyr)
The Moran’s I statistic
- Checking for spatial autocorrelation
- p$percent<-c(21.9, 24.6, 15.5, 15.5, 13.4, 16.2, 21.2, 25.5, 28.9) is bringing in provincial percentages
<- shapefile("ZAF_adm/ZAF_adm1_1.shp")
p
$percent<-c(21.9, 24.6, 15.5, 15.5, 13.4, 16.2, 21.2, 25.5, 28.9)
p<- poly2nb(p, row.names=p$PR_CODE)
w class(w)
## [1] "nb"
data.frame(p)
ID_0 | ISO | NAME_0 | ID_1 | NAME_1 | TYPE_1 | ENGTYPE_1 | NL_NAME_1 | VARNAME_1 | percent |
---|---|---|---|---|---|---|---|---|---|
211 | ZAF | South Africa | 1 | Eastern Cape | Provinsie | Province | NA | Oos-Kaap | 21.9 |
211 | ZAF | South Africa | 2 | Free State | Provinsie | Province | NA | Orange Free State|Vrystaat | 24.6 |
211 | ZAF | South Africa | 3 | Gauteng | Provinsie | Province | NA | Pretoria/Witwatersrand/Vaal | 15.5 |
211 | ZAF | South Africa | 4 | KwaZulu-Natal | Provinsie | Province | NA | Natal and Zululand | 15.5 |
211 | ZAF | South Africa | 5 | Limpopo | Provinsie | Province | NA | Noordelike Provinsie|Northern Transvaal|Northern Province | 13.4 |
211 | ZAF | South Africa | 6 | Mpumalanga | Provinsie | Province | NA | Eastern Transvaal | 16.2 |
211 | ZAF | South Africa | 7 | North West | Provinsie | Province | NA | North-West|Noordwes | 21.2 |
211 | ZAF | South Africa | 8 | Northern Cape | Provinsie | Province | NA | Noord-Kaap | 25.5 |
211 | ZAF | South Africa | 9 | Western Cape | Provinsie | Province | NA | Wes-Kaap | 28.9 |
summary(w)
## Neighbour list object:
## Number of regions: 9
## Number of nonzero links: 34
## Percentage nonzero weights: 41.97531
## Average number of links: 3.777778
## Link number distribution:
##
## 2 3 4 6
## 1 2 5 1
## 1 least connected region:
## 9 with 2 links
## 1 most connected region:
## 2 with 6 links
Plotting
#display.brewer.all()
<- st_as_sf(p)
p library(tmap)
tm_shape(p) +
tm_polygons()+
tm_fill("percent", style = "quantile", n=5, palette = "Reds", title = "Hypertension prevalence") +
tm_borders(alpha = .4)+
tm_compass() +
tm_text("NAME_1") +
tm_layout(title = "South AFRICA, DHIS2016",
legend.text.size = 0.5,
legend.title.size =0.5 ,
legend.position = c("right", "bottom"),
frame = FALSE)
Adjacency matrix
- determining which province is closer to which
- depicting whether one is a neighbor or not
<- nb2mat(w, style='B')
wm
wm## 1 2 3 4 5 6 7 8 9
## 1 0 1 0 1 0 0 0 1 1
## 2 1 0 1 1 0 1 1 1 0
## 3 0 1 0 0 1 1 1 0 0
## 4 1 1 0 0 0 1 0 0 0
## 5 0 0 1 0 0 1 1 0 0
## 6 0 1 1 1 1 0 0 0 0
## 7 0 1 1 0 1 0 0 1 0
## 8 1 1 0 0 0 0 1 0 1
## 9 1 0 0 0 0 0 0 1 0
## attr(,"call")
## nb2mat(neighbours = w, style = "B")
<- length(p)
n
<- nb2listw(w, style='B')
ww
ww## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 9
## Number of nonzero links: 34
## Percentage nonzero weights: 41.97531
## Average number of links: 3.777778
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 9 81 34 68 552
moran(p$percent, ww, n=length(ww$neighbours), S0=Szero(ww))
## $I
## [1] 0.2909315
##
## $K
## [1] 1.679342
Szero(ww)
## [1] 34
moran.test(p$percent, ww, randomisation=FALSE)
##
## Moran I test under normality
##
## data: p$percent
## weights: ww
##
## Moran I statistic standard deviate = 2.4985, p-value = 0.006237
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.2909315 -0.1250000 0.0277141
# Moran's I global test
moran.mc(p$percent, ww, nsim=99)
##
## Monte-Carlo simulation of Moran I
##
## data: p$percent
## weights: ww
## number of simulations + 1: 100
##
## statistic = 0.29093, observed rank = 96, p-value = 0.04
## alternative hypothesis: greater
- Index - indicate random or presence of clustering
BayesX CAR model for hypertension in South Africa
library(sf)
<-st_read("ZAF_adm/ZAF_adm1_1.shp")
SA_shp## Reading layer `ZAF_adm1_1' from data source
## `C:\Users\r1953\Downloads\Spatial Tutorial\ZAF_adm\ZAF_adm1_1.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 16.45802 ymin: -34.83514 xmax: 32.89125 ymax: -22.12661
## Geodetic CRS: WGS 84
plot(SA_shp)
- Getting the SA shape file using rgdal package
SA_shp<-readOGR(dsn=“ZAF_adm”,layer=“ZAF_adm1_1”)
library(R2BayesX)
library(BayesX)
#Getting the SA shape file using rgdal package - uncomment the commands below
#SA_shp<-readOGR(dsn="ZAF_adm",layer="ZAF_adm1_1")
#plot(SA_shp)
#setting library for bnd file using bayesX
## read shapefile into bnd object
<- file.path("ZAF_adm" , "ZAF_adm1_1")
shpname2 <- BayesX::shp2bnd(shpname = shpname2, regionnames = "ID_1", check.is.in = F)
SA_bnd_prov ## Reading map ... finished
## Note: map consists originally of 53 polygons
## Note: map consists of 9 regions
::drawmap(map=SA_bnd_prov) BayesX
<-BayesX::bnd2gra(SA_bnd_prov)
sagra## Start neighbor search ...
## progress: 50%
## progress: 100%
## Neighbor search finished.
- shp2bnd - shapefile to a boundary file
- bnd2gra - boundary to graph
Loading Hypertension dataset
library(haven)
<- haven::read_dta("hypertensiondat.dta")
hypertensiondata head(hypertensiondata, 3)
current_age | age_cat | region | type_residence | residence | education_level | wealth_level | hypertension | sex | hypertension2 | caseid | province | diabetes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
45 | 7 | 7 | 1 | 1 | 2 | 2 | 0 | 0 | 0 | 1 | 3 | 0 |
60 | 10 | 7 | 1 | 1 | 2 | 2 | 0 | 0 | 0 | 2 | 3 | 0 |
30 | 4 | 7 | 1 | 1 | 2 | 3 | 0 | 0 | 0 | 3 | 3 | 0 |
recoding data to align with shapefile regions codes and names
<-hypertensiondata |>
hypertensiondatamutate(province =factor(province,
levels=c(1,2,3,4,5,6,7,8,9),
labels=c("Eastern cape",
"Freestate",
"Gauteng","Kwazulu natal",
"Limpopo", "Mpumalanga",
"North west",
"Northern cape",
"Western cape")),
residence = factor(residence,
levels=c(1,2),
labels=c("Urban", "Rural")),
education_level=factor(education_level,
levels = c(0,1,2,3),
labels = c("no education", "primary", "secondary","higher")),
wealth_level= factor(wealth_level,
levels = c(1,2,3,4,5),
labels = c("poorest","poor","middle","richer","richest")),
sex=factor(sex, levels = c(0,1), labels = c("male","female")))
Logistic regression model using Statamarkdown
#install.packages("Statamarkdown")
library(Statamarkdown)
cduse "C:\Users\a0006892\Downloads\CAR_models-master 2024\CAR_2024\hypertensiondat.dta", clear
describe
tab1 hypertension2
logistic hypertension2 i.sex i.residence current_age i.wealth_level
histogram(diabetes)
*
## C:\Users\r1953\Downloads\Spatial Tutorial
##
## file C:\Users\a0006892\Downloads\CAR_models-masternot found
## 2024\CAR_2024\hypertensiondat.dta r(601);
##
## r(601); ##
Running Bayesx model
library(colorspace)
library(R2BayesX)
<- bayesx(hypertension2 ~ sex + residence + wealth_level + sx(current_age) + sx(province, bs="mrf", map=sagra)
imodel + sx(province, bs="re"), family = "binomial", method = "MCMC", iterations = 12000, burnin = 2000,
step = 10, weights = NULL,subset = NULL, offset = NULL, na.action = NULL, seed = 12345,
predict = TRUE, data = hypertensiondata)
summary(imodel)
## Call:
## bayesx(formula = hypertension2 ~ sex + residence + wealth_level +
## sx(current_age) + sx(province, bs = "mrf", map = sagra) +
## sx(province, bs = "re"), data = hypertensiondata, weights = NULL,
## subset = NULL, offset = NULL, na.action = NULL, family = "binomial",
## method = "MCMC", iterations = 12000, burnin = 2000, step = 10,
## seed = 12345, predict = TRUE)
##
## Fixed effects estimation results:
##
## Parametric coefficients:
## Mean Sd 2.5% 50% 97.5%
## (Intercept) -1.7710 0.1226 -2.0097 -1.7741 -1.5310
## sexfemale 0.6363 0.0605 0.5228 0.6347 0.7545
## residenceRural -0.1413 0.0695 -0.2744 -0.1431 -0.0066
## wealth_levelpoor 0.4170 0.1050 0.2171 0.4170 0.6200
## wealth_levelmiddle 0.3824 0.0995 0.1836 0.3829 0.5747
## wealth_levelricher 0.4000 0.1026 0.1938 0.4042 0.5951
## wealth_levelrichest 0.4161 0.1009 0.2231 0.4150 0.6087
##
## Smooth terms variances:
## Mean Sd 2.5% 50% 97.5% Min Max
## sx(current_age) 0.0153 0.0155 0.0027 0.0102 0.0621 0.0016 0.1353
## sx(province):mrf 0.2329 0.3008 0.0015 0.1622 0.9720 0.0002 4.6444
##
## Random effects variances:
## Mean Sd 2.5% 50% 97.5% Min Max
## sx(province):re 0.0508 0.0727 0.0008 0.0240 0.2491 0.0003 0.8077
##
## N = 10294 burnin = 2000 method = MCMC family = binomial
## iterations = 12000 step = 10
Getting Odds Ratios
exp(coef(imodel))
Mean | Sd | 2.5% | 50% | 97.5% | |
---|---|---|---|---|---|
(Intercept) | 0.1701610 | 1.130458 | 0.1340302 | 0.1696428 | 0.2163214 |
sexfemale | 1.8895392 | 1.062383 | 1.6867760 | 1.8863675 | 2.1264544 |
residenceRural | 0.8682583 | 1.071962 | 0.7600394 | 0.8666787 | 0.9933815 |
wealth_levelpoor | 1.5174511 | 1.110705 | 1.2424249 | 1.5173661 | 1.8589355 |
wealth_levelmiddle | 1.4657822 | 1.104662 | 1.2014895 | 1.4664742 | 1.7765531 |
wealth_levelricher | 1.4918516 | 1.108018 | 1.2138778 | 1.4980571 | 1.8131760 |
wealth_levelrichest | 1.5160678 | 1.106205 | 1.2498856 | 1.5143526 | 1.8380496 |
#wald.test(b = coef(imodel), Sigma = vcov(imodel), Terms=3:4)
## odds ratios and 95% CI
exp(cbind(OR = coef(imodel), confint(imodel)))
OR.Mean | OR.Sd | OR.2.5% | OR.50% | OR.97.5% | 2.5% | 97.5% | |
---|---|---|---|---|---|---|---|
(Intercept) | 0.1701610 | 1.130458 | 0.1340302 | 0.1696428 | 0.2163214 | 0.1341410 | 0.2161817 |
sexfemale | 1.8895392 | 1.062383 | 1.6867760 | 1.8863675 | 2.1264544 | 1.6868416 | 2.1254891 |
residenceRural | 0.8682583 | 1.071962 | 0.7600394 | 0.8666787 | 0.9933815 | 0.7612302 | 0.9933567 |
wealth_levelpoor | 1.5174511 | 1.110705 | 1.2424249 | 1.5173661 | 1.8589355 | 1.2425264 | 1.8575973 |
wealth_levelmiddle | 1.4657822 | 1.104662 | 1.2014895 | 1.4664742 | 1.7765531 | 1.2021311 | 1.7761734 |
wealth_levelricher | 1.4918516 | 1.108018 | 1.2138778 | 1.4980571 | 1.8131760 | 1.2147972 | 1.8119689 |
wealth_levelrichest | 1.5160678 | 1.106205 | 1.2498856 | 1.5143526 | 1.8380496 | 1.2501492 | 1.8338462 |
Plotting some results
par(mar=c(1,1,1,1))
plot(imodel, term = "sx(current_age)")
Visualization of posterior mean of structured and unstructured spatial effects
plot all spatial effects
plot(imodel, term = c("sx(province):mrf", "sx(province):re"))
par(mar=c(1,1,1,1))
Plotting autocorrelation for age variance samples
plot(imodel, term = "sx(current_age)", which = "var-samples", acf = TRUE)
par(mar=c(1,1,1,1))
#plot(imodel, which = "max-acf")
<- samples(imodel, term = "linear-samples")
zs par(mar=c(1,1,1,1))
plot(zs)
extract model residuals and plot histogram
#par(mar=c(1,1,1,1))
hist(residuals(imodel))
Plot structured spatial effects maps etc
plot(imodel, term = "sx(province):mrf", map = SA_bnd_prov, main = "Structured spatial effect")
- Plot unstructured spatial effects
plot(imodel, term = "sx(province):re", map = SA_bnd_prov, main = "Unstructured spatial effect")
- Plot both structured and unstructured effects
plot(imodel, term = "sx(province):total", map = SA_bnd_prov, main = "Total spatial effect", digits = 4)
- Map the fitted spatial term only
$spatial <- fitted(imodel, term = "sx(province):mrf")[, "Mean"]
SA_shpplot(SA_shp, "spatial", main="fitted logit values")
Running same model with INLA
#install.packages("INLA", repos=c(getOption("repos"), INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)
#SA_OGR <- rgdal::readOGR("ZAF_adm/ZAF_adm1_1.shp")
<- sf::st_read("ZAF_adm/ZAF_adm1_1.shp")
SA_OGR ## Reading layer `ZAF_adm1_1' from data source
## `C:\Users\r1953\Downloads\Spatial Tutorial\ZAF_adm\ZAF_adm1_1.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 16.45802 ymin: -34.83514 xmax: 32.89125 ymax: -22.12661
## Geodetic CRS: WGS 84
plot(SA_OGR, border = "blue", axes = TRUE, las = 1)
#SA_valid <- data.table::data.table(valid = gIsValid(SA_OGR, byid = TRUE))
<- data.table::data.table(SA_OGR, byid = TRUE)
SA_valid sum(SA_valid$valid == F)
## [1] 0
#install.packages("https://cran.r-project.org/src/contrib/Archive/maptools/maptools_1.1-6.tar.gz", repos = NULL, type = "source")
## saving the Ssplus map in nam_shp folder
#maptools::sp2WB(map = as(SA_shp, "SpatialPolygons"), filename = "ZAF_adm/sa_spus")
library(maptools)
<- readShapePoly("SA_shp.shp",IDvar="OBJECTID",proj4string=CRS("+proj=longlat +ellps=clrk66"))
SA_shp_s ## Error in getinfo.shape(filen): Error opening SHP file
plot(SA_shp_s, border="blue", axes=TRUE, las=1)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'SA_shp_s' not found
#tf <- tempfile()
sp2WB(as(SA_shp_s, "SpatialPolygons"), filename="splus_911")
## Error: object 'SA_shp_s' not found
## create the nam_nb map using poly2nb function
<- spdep::poly2nb(SA_shp)
SA_nb
<- sapply(SA_nb, length)
num <- unlist(SA_nb)
adj
<- length(unlist(SA_nb))
sumNumNeigh
# Generate an adjacency
# Spasial model in INLA
nb2INLA("ZAF_adm/sa_inla.graph", SA_nb)
<- paste(getwd(), "/ZAF_adm/sa_inla.graph", sep = "")
sa_adj
#generate adjacency
<- inla.read.graph(filename = "ZAF_adm/sa_inla.graph")
H image(inla.graph2matrix(H), xlab = "", ylab = "")
add ID_1 to the data
<- readr::read_csv("ZAF_adm/ZAF_adm1_1`.csv")
hyp_id <- hypertensiondata |>
hypertensiondata mutate(NAME_1=province) |>
left_join(hyp_id |>
::select(NAME_0,ID_1,NAME_1) , by="NAME_1") dplyr
fitting CAR model
#image(inla.graph2matrix(H), xlab = "", ylab = "")
<- hypertension2 ~ 1 + as.factor(sex) + as.factor(residence) + current_age + as.factor(wealth_level) +
formula_inla ##our CAR specification
f(ID_1, model="bym",graph=sa_adj, scale.model=TRUE, ## spefiying the priors for the unstri and str
hyper=list(prec.unstruct=list(prior="loggamma",param=c(1,0.001)),
prec.spatial=list(prior="loggamma",param=c(1,0.001))))
<- inla(formula_inla,family="binomial", data=hypertensiondata, control.compute=list(dic=TRUE))
model_inla summary(model_inla)
## Time used:
## Pre = 0.859, Running = 2.08, Post = 0.266, Total = 3.21
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) -4.971 0.124 -5.214 -4.971 -4.729
## as.factor(sex)female 0.629 0.062 0.508 0.629 0.750
## as.factor(residence)Rural -0.260 0.063 -0.385 -0.260 -0.136
## current_age 0.067 0.002 0.064 0.067 0.070
## as.factor(wealth_level)poor 0.407 0.101 0.209 0.407 0.605
## as.factor(wealth_level)middle 0.402 0.100 0.206 0.402 0.598
## as.factor(wealth_level)richer 0.435 0.099 0.241 0.435 0.630
## as.factor(wealth_level)richest 0.462 0.098 0.270 0.462 0.654
## mode kld
## (Intercept) -4.971 0
## as.factor(sex)female 0.629 0
## as.factor(residence)Rural -0.260 0
## current_age 0.067 0
## as.factor(wealth_level)poor 0.407 0
## as.factor(wealth_level)middle 0.402 0
## as.factor(wealth_level)richer 0.435 0
## as.factor(wealth_level)richest 0.462 0
##
## Random effects:
## Name Model
## ID_1 BYM model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for ID_1 (iid component) 1097.78 1205.74 72.58 719.41
## Precision for ID_1 (spatial component) 10.32 7.50 2.00 8.39
## 0.975quant mode
## Precision for ID_1 (iid component) 4299.03 197.59
## Precision for ID_1 (spatial component) 29.92 5.20
##
## Deviance Information Criterion (DIC) ...............: 7813.98
## Deviance Information Criterion (DIC, saturated) ....: 7799.09
## Effective number of parameters .....................: 10.84
##
## Marginal log-Likelihood: -3947.62
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
#summary of fixed effects
round(model_inla$summary.fixed, 3)
mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld | |
---|---|---|---|---|---|---|---|
(Intercept) | -4.971 | 0.124 | -5.214 | -4.971 | -4.729 | -4.971 | 0 |
as.factor(sex)female | 0.629 | 0.062 | 0.508 | 0.629 | 0.750 | 0.629 | 0 |
as.factor(residence)Rural | -0.260 | 0.063 | -0.385 | -0.260 | -0.136 | -0.260 | 0 |
current_age | 0.067 | 0.002 | 0.064 | 0.067 | 0.070 | 0.067 | 0 |
as.factor(wealth_level)poor | 0.407 | 0.101 | 0.209 | 0.407 | 0.605 | 0.407 | 0 |
as.factor(wealth_level)middle | 0.402 | 0.100 | 0.206 | 0.402 | 0.598 | 0.402 | 0 |
as.factor(wealth_level)richer | 0.435 | 0.099 | 0.241 | 0.435 | 0.630 | 0.435 | 0 |
as.factor(wealth_level)richest | 0.462 | 0.098 | 0.270 | 0.462 | 0.654 | 0.462 | 0 |
#summary of random effects
round(head(model_inla$summary.random$ID_1), 3)
ID | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
---|---|---|---|---|---|---|---|
1 | 0.328 | 0.328 | -0.329 | 0.325 | 0.995 | 0.325 | 0.001 |
2 | 0.086 | 0.288 | -0.496 | 0.085 | 0.670 | 0.085 | 0.001 |
3 | -0.352 | 0.100 | -0.549 | -0.351 | -0.156 | -0.351 | 0.000 |
4 | 0.194 | 0.404 | -0.614 | 0.191 | 1.016 | 0.192 | 0.001 |
5 | -0.777 | 0.103 | -0.980 | -0.777 | -0.575 | -0.777 | 0.000 |
6 | -0.144 | 0.093 | -0.326 | -0.144 | 0.037 | -0.144 | 0.000 |
Plotting posterior graphs from INLA
Try this out in your own time….
Count Data at location level
For this we will use the COVID19 data
We use the Poisson log normal model for this analysis
Three steps for the lognormal model
Stage 1: The likelihood \[Y_{i}|\theta_{i}\sim Poisson(E_{i}\theta_{i})\] with \(i=1,2,\dots, n\) with \[log(\theta_{i})=\beta_{0}+\beta_{1}X_{i} + v_{i}\] where \(X_{i}\) is the covariate measured at county level (areal level)
Stage 2: The random effects (prior distribution) is \[v_{i}|\sigma^{2}_{i}\sim N(0,\sigma^{2}_{v})\]
stage 3: The hyperprior on the hyperparameters \(\beta_{0}, ~\beta_{1},~ \sigma^{2}_{v}\) is given by \[p(\beta_{0},\beta_{1}, \sigma^{2}_{v})=p(\beta_{0})p(\beta_{1})p(\sigma^{2}_{v})\]
priors are assumed independent and for ease of Bayesian implementation, we work with the precision \(\tau_{v}=\sigma^{-2}_{v}\) rather than \(\sigma^{2}_{v}\).
Lognormal spatial model with covariates
We now add spatial (ICAR) random effects to the model. and for this we need a graph file as before
INLA specification of the model
\[log(\theta_{i})=\beta_{0}+\beta_{1}X_{i}+u_{i}+v_{i}\] with \(u_{i}\) structured spatial effects with a CAR prior and \(v_{i}\) the unstructured random effects with an iid normal prior.
- Data exploration and computing standardized mortality rate
Modelling corona virus deaths in South Africa
Hypothesis: COVID deaths in South Africa have a significant spatial association
<- shapefile("ZAF_adm/district2.shp")
sadistricts
#attach(sadistricts)
Describe patterns of deaths with hypertension prevalence (assess association visually)
summary(sadistricts)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 16.45189 32.94498
## y -34.83417 -22.12503
## Is projected: FALSE
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Data attributes:
## CATEGORY DC_MDB_C ID_3 DC_MN_C
## Length:52 Length:52 Min. : 1.00 Min. :101.0
## Class :character Class :character 1st Qu.:13.75 1st Qu.:289.2
## Mode :character Mode :character Median :26.50 Median :522.5
## Mean :26.50 Mean :498.7
## 3rd Qu.:39.25 3rd Qu.:665.5
## Max. :52.00 Max. :947.0
## DC_MN_C_st DC_NAME DC_NAME_C MAP_TITLE
## Length:52 Length:52 Length:52 Length:52
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## PR_MDB_C PR_CODE PR_CODE_st PR_NAME
## Length:52 Min. :1.000 Length:52 Length:52
## Class :character 1st Qu.:2.000 Class :character Class :character
## Mode :character Median :5.000 Mode :character Mode :character
## Mean :4.615
## 3rd Qu.:6.250
## Max. :9.000
## ALBERS_ARE SHAPE_Leng SHAPE_Area FID_1
## Min. : 1645 Min. : 2.806 Min. : 0.1485 Min. : 1.00
## 1st Qu.: 7888 1st Qu.: 6.261 1st Qu.: 0.7269 1st Qu.:13.75
## Median : 15779 Median : 9.512 Median : 1.4313 Median :26.50
## Mean : 23477 Mean :10.011 Mean : 2.1766 Mean :26.50
## 3rd Qu.: 28934 3rd Qu.:12.311 3rd Qu.: 2.6284 3rd Qu.:39.25
## Max. :126836 Max. :28.727 Max. :11.9244 Max. :52.00
## district covidcases deaths expected
## Length:52 Min. : 345 Min. : 3.00 Min. : 6.90
## Class :character 1st Qu.: 4702 1st Qu.: 40.75 1st Qu.: 94.05
## Mode :character Median : 8906 Median : 82.00 Median : 178.12
## Mean : 19209 Mean : 248.00 Mean : 384.19
## 3rd Qu.: 18274 3rd Qu.: 259.25 3rd Qu.: 365.47
## Max. :138219 Max. :2764.00 Max. :2764.38
## hypertensi
## Min. :19.50
## 1st Qu.:24.38
## Median :30.05
## Mean :31.11
## 3rd Qu.:37.58
## Max. :56.00
<- st_as_sf(sadistricts)
sadistricts1 tm_shape(sadistricts1) +
tm_polygons(col = "hypertensi", title = "Hypertension") +
tm_bubbles(size = "deaths", col = "deaths", palette = "Blues",
style = "quantile", legend.size.show = FALSE, title.col = "Corona virus deaths",
border.col = "black", border.lwd = 0.1, border.alpha = 0.1) +
tm_layout(legend.text.size = 0.8, legend.title.size = 1.1,
frame = FALSE)
Pull the data points (information) from the map object for use in simple models
Simple poisson model withn intercept and random error. The exponentiated intercept is the average rate (relative risk rate) fo deaths across the districts.
<-inla(deaths ~ 1 + f(FID_1, model = "iid", param = c(0.5, 5e-04)), data=sadata, family = "poisson", E=expected)
deaths.fit1
summary(deaths.fit1)
## Time used:
## Pre = 0.407, Running = 0.236, Post = 0.0469, Total = 0.69
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.683 0.081 -0.844 -0.683 -0.525 -0.683 0
##
## Random effects:
## Name Model
## FID_1 IID model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for FID_1 3.27 0.71 2.05 3.21 4.82 3.10
##
## Marginal log-Likelihood: -288.30
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
exp(deaths.fit1$summary.fixed[1,])
mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld | |
---|---|---|---|---|---|---|---|
(Intercept) | 0.5050132 | 1.08442 | 0.4299473 | 0.5052501 | 0.5916091 | 0.505248 | 1 |
Fitting non-spatial lognormal model with covariate
<-inla(deaths ~ 1 + hypertensi + f(FID_1, model = "iid", param=c(1, 0.014)), data=sadata, family = "poisson", E = expected, control.compute =list(dic = TRUE))
deaths.fit2
summary(deaths.fit2)
## Time used:
## Pre = 0.281, Running = 0.198, Post = 0.0397, Total = 0.518
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -1.273 0.306 -1.874 -1.274 -0.667 -1.274 0
## hypertensi 0.019 0.010 0.000 0.019 0.038 0.019 0
##
## Random effects:
## Name Model
## FID_1 IID model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for FID_1 3.64 0.808 2.26 3.57 5.42 3.43
##
## Deviance Information Criterion (DIC) ...............: 435.78
## Deviance Information Criterion (DIC, saturated) ....: 104.08
## Effective number of parameters .....................: 48.79
##
## Marginal log-Likelihood: -293.76
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
- Examine the posterior summaries for covariate, on the original (log scale- log RR) or exponential (RR)
$summary.fixed deaths.fit2
mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld | |
---|---|---|---|---|---|---|---|
(Intercept) | -1.2729664 | 0.3063442 | -1.8738130 | -1.2738241 | -0.6672445 | -1.2738137 | 0 |
hypertensi | 0.0192011 | 0.0095942 | 0.0001816 | 0.0192459 | 0.0379660 | 0.0192455 | 0 |
$summary.fixed[2,] deaths.fit2
mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld | |
---|---|---|---|---|---|---|---|
hypertensi | 0.0192011 | 0.0095942 | 0.0001816 | 0.0192459 | 0.037966 | 0.0192455 | 0 |
exp(deaths.fit2$summary.fixed[2,])
mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld | |
---|---|---|---|---|---|---|---|
hypertensi | 1.019387 | 1.00964 | 1.000182 | 1.019432 | 1.038696 | 1.019432 | 1 |
Lognormal spatial model with covariates
#install.packages("SpatialEpi")
library(SpatialEpi)
<-sadistricts@data
sadatadata(sadata)
<- sadistricts@data$deaths
Y<- sadistricts@data$hypertensi
X<- sadistricts@data$expected
E<-Y/E
smr$smr<-smr
sadata
#plot the smr by district
par(mar = c(1, 1, 1, 1))
#mapvariable(sadata$smr, sadistrict.map)
#display.brewer.all()
@data$smr<-smr
sadistricts
<- st_as_sf(sadistricts)
sadistricts1 tm_shape(sadistricts1) + tm_fill("smr", style = "quantile", n=5, palette = "Blues", title = "Covid smr") +tm_borders(alpha = .4)+ tm_compass() + tm_layout(title = "South AFRICA, COVID2020", legend.text.size = 0.5, legend.title.size =0.5 ,legend.position = c("right", "bottom"), frame = FALSE)
#plot(sadistricts)
<- spdep::poly2nb(sadistricts)
sadist_nb nb2INLA("ZAF_adm/sadist_inla.graph", sadist_nb)
<-paste(getwd(), "/ZAF_adm/sadist_inla.graph", sep = "")
sadist_adj
#Take a look at the data. Uncomment the following command!
#dplyr::glimpse(sadata)
#add variables region 1 and region 2 into the dataset for
@data$region1<-sadata$FID_1
sadistricts@data$region2<-sadata$FID_1
sadistricts#The dataset is as below. Uncomment to see...
#sadistricts@data
Plotting the expected deaths
summary(sadistricts@data)
## CATEGORY DC_MDB_C ID_3 DC_MN_C
## Length:52 Length:52 Min. : 1.00 Min. :101.0
## Class :character Class :character 1st Qu.:13.75 1st Qu.:289.2
## Mode :character Mode :character Median :26.50 Median :522.5
## Mean :26.50 Mean :498.7
## 3rd Qu.:39.25 3rd Qu.:665.5
## Max. :52.00 Max. :947.0
## DC_MN_C_st DC_NAME DC_NAME_C MAP_TITLE
## Length:52 Length:52 Length:52 Length:52
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## PR_MDB_C PR_CODE PR_CODE_st PR_NAME
## Length:52 Min. :1.000 Length:52 Length:52
## Class :character 1st Qu.:2.000 Class :character Class :character
## Mode :character Median :5.000 Mode :character Mode :character
## Mean :4.615
## 3rd Qu.:6.250
## Max. :9.000
## ALBERS_ARE SHAPE_Leng SHAPE_Area FID_1
## Min. : 1645 Min. : 2.806 Min. : 0.1485 Min. : 1.00
## 1st Qu.: 7888 1st Qu.: 6.261 1st Qu.: 0.7269 1st Qu.:13.75
## Median : 15779 Median : 9.512 Median : 1.4313 Median :26.50
## Mean : 23477 Mean :10.011 Mean : 2.1766 Mean :26.50
## 3rd Qu.: 28934 3rd Qu.:12.311 3rd Qu.: 2.6284 3rd Qu.:39.25
## Max. :126836 Max. :28.727 Max. :11.9244 Max. :52.00
## district covidcases deaths expected
## Length:52 Min. : 345 Min. : 3.00 Min. : 6.90
## Class :character 1st Qu.: 4702 1st Qu.: 40.75 1st Qu.: 94.05
## Mode :character Median : 8906 Median : 82.00 Median : 178.12
## Mean : 19209 Mean : 248.00 Mean : 384.19
## 3rd Qu.: 18274 3rd Qu.: 259.25 3rd Qu.: 365.47
## Max. :138219 Max. :2764.00 Max. :2764.38
## hypertensi smr region1 region2
## Min. :19.50 Min. :0.1106 Min. : 1.00 Min. : 1.00
## 1st Qu.:24.38 1st Qu.:0.3683 1st Qu.:13.75 1st Qu.:13.75
## Median :30.05 Median :0.5642 Median :26.50 Median :26.50
## Mean :31.11 Mean :0.5755 Mean :26.50 Mean :26.50
## 3rd Qu.:37.58 3rd Qu.:0.8409 3rd Qu.:39.25 3rd Qu.:39.25
## Max. :56.00 Max. :1.0331 Max. :52.00 Max. :52.00
mapvariable(sadistricts@data$expected, sadistricts)
Plotting the standardized death rates
mapvariable(sadistricts@data$smr, sadistricts)
#check class of adjacency object
class(sadist_adj)
## [1] "character"
## [1] "character"
<- deaths ~ 1 + hypertensi + f(region1, model = "iid", param = c(1, 0.014)) +
formulaf(region2, model = "besag", graph = sadist_adj, param = c(1, 0.68))
<-inla(formula, data = sadistricts@data, family = "poisson", E = expected, control.predictor = list(compute = TRUE))
deaths.fit3
summary(sadistricts@data)
## CATEGORY DC_MDB_C ID_3 DC_MN_C
## Length:52 Length:52 Min. : 1.00 Min. :101.0
## Class :character Class :character 1st Qu.:13.75 1st Qu.:289.2
## Mode :character Mode :character Median :26.50 Median :522.5
## Mean :26.50 Mean :498.7
## 3rd Qu.:39.25 3rd Qu.:665.5
## Max. :52.00 Max. :947.0
##
## DC_MN_C_st DC_NAME DC_NAME_C MAP_TITLE
## Length:52 Length:52 Length:52 Length:52
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## PR_MDB_C PR_CODE PR_CODE_st PR_NAME
## Length:52 Min. :1.000 Length:52 Length:52
## Class :character 1st Qu.:2.000 Class :character Class :character
## Mode :character Median :5.000 Mode :character Mode :character
## Mean :4.615
## 3rd Qu.:6.250
## Max. :9.000
##
## ALBERS_ARE SHAPE_Leng SHAPE_Area FID_1
## Min. : 1645 Min. : 2.806 Min. : 0.1485 Min. : 1.00
## 1st Qu.: 7888 1st Qu.: 6.261 1st Qu.: 0.7269 1st Qu.:13.75
## Median : 15779 Median : 9.512 Median : 1.4313 Median :26.50
## Mean : 23477 Mean :10.011 Mean : 2.1766 Mean :26.50
## 3rd Qu.: 28934 3rd Qu.:12.311 3rd Qu.: 2.6284 3rd Qu.:39.25
## Max. :126836 Max. :28.727 Max. :11.9244 Max. :52.00
##
## district covidcases deaths expected
## Length:52 Min. : 345 Min. : 3.00 Min. : 6.90
## Class :character 1st Qu.: 4702 1st Qu.: 40.75 1st Qu.: 94.05
## Mode :character Median : 8906 Median : 82.00 Median : 178.12
## Mean : 19209 Mean : 248.00 Mean : 384.19
## 3rd Qu.: 18274 3rd Qu.: 259.25 3rd Qu.: 365.47
## Max. :138219 Max. :2764.00 Max. :2764.38
##
## hypertensi smr region1 region2 re_u
## Min. :19.50 Min. :0.1106 Min. : 1.00 Min. : 1.00 1 : 1
## 1st Qu.:24.38 1st Qu.:0.3683 1st Qu.:13.75 1st Qu.:13.75 2 : 1
## Median :30.05 Median :0.5642 Median :26.50 Median :26.50 3 : 1
## Mean :31.11 Mean :0.5755 Mean :26.50 Mean :26.50 4 : 1
## 3rd Qu.:37.58 3rd Qu.:0.8409 3rd Qu.:39.25 3rd Qu.:39.25 5 : 1
## Max. :56.00 Max. :1.0331 Max. :52.00 Max. :52.00 6 : 1
## (Other):46
## re_v
## 1 : 1
## 2 : 1
## 3 : 1
## 4 : 1
## 5 : 1
## 6 : 1
## (Other):46
summary(deaths.fit3)
## Time used:
## Pre = 0.565, Running = 0.398, Post = 0.0777, Total = 1.04
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.435 0.346 -1.114 -0.435 0.247 -0.435 0
## hypertensi -0.008 0.011 -0.030 -0.008 0.014 -0.008 0
##
## Random effects:
## Name Model
## region1 IID model
## region2 Besags ICAR model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for region1 85.32 98.079 6.038 54.90 344.34 16.52
## Precision for region2 1.46 0.389 0.844 1.41 2.37 1.32
##
## Marginal log-Likelihood: -318.40
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
Plotting the maps
#scan the output data structure- uncomment to see
#deaths.fit3$summary.random$region1
#pull the 5th column and exponentiate
<- exp(deaths.fit3$summary.random$region1[5])
Residual_nonspatial <- exp(deaths.fit3$summary.random$region2[5])
Residual_spatial par(mfrow = c(1, 1), mar = c(0.1, 0.1, 0.1, 0.1))
mapvariable(Residual_nonspatial[, 1], sadistricts)
Spatial effects
par(mfrow = c(1, 1), mar = c(0.1, 0.1, 0.1, 0.1))
mapvariable(Residual_spatial[, 1], sadistricts)
About 90% of COVID deaths in the Western Cape is spatially related and this can be due to socio-economic dynamics in that region and more specifically those districts.