knitr::opts_chunk$set(echo = TRUE)
The CAGE is a four-item questionnaire for screening for alcohol use disorder. A patient can answer “yes” to zero, one, two, three, or four questions. Mayfield conducted a CAGE study in 1974 with 366 patients, recorded the number of CAGE questions that were answered “Yes” (range was 0 to 4). A psychiatric social worker independently conducted in-depth interviews and each patient was classified by whether he or she had alcohol use disorder or not (AUD: “Y”, “N”). The AUD status was considered the gold standard. The study results are available here: https://raw.githubusercontent.com/taragonmd/data/master/cage.csv
Read in the data set and display a two-way continency table using the xtabs
function. The column headings should be the AUD
status.
data<- read.table("https://raw.githubusercontent.com/taragonmd/data/master/cage.csv", header = TRUE, sep = ",")
table<-xtabs(~num_yes+AUD, data=data); table
## AUD
## num_yes N Y
## 0 177 14
## 1 22 12
## 2 20 20
## 3 5 43
## 4 0 53
Recode the num_yes
variable into a new factor variable called CAGE
:
Factor | Level | num_yes values |
---|---|---|
CAGE |
pos |
>= 3 |
neg |
<= 2 |
Use the xtabs
function to display 2 by 2 table of CAGE by AUD.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- data %>%
mutate(CAGE = case_when(num_yes>=3 ~ "pos", num_yes<=2~"neg"))
table2<-xtabs(~ CAGE+AUD, data=data); table2
## AUD
## CAGE N Y
## neg 219 46
## pos 5 96
The sensitivity is P(CAGE = pos | AUD = Y), and the specificity is P(CAGE = neg | AUD = N).
table3<-addmargins(table2); table3
## AUD
## CAGE N Y Sum
## neg 219 46 265
## pos 5 96 101
## Sum 224 142 366
sens<-table3[5]/table3[6]
spec<-table3[1]/table3[3]
sens; spec
## [1] 0.6760563
## [1] 0.9776786
You have learned that the posterior (post-test) probability is a function of the prior (pre-test) probability, and sensitivity and specificity of the diagnostic test (in this case, the CAGE questionnaire). For alcohol use disorder (AUD), the positive predictive value (PPV) is the probability that a patient has AUD given that they answered “yes” to 3 or 4 items on the CAGE questionnaire.
Using the sensivity and specificity you calculated above, plot a graph displaying on how PPV changes as a function of the prior probability. Some of the R code is provided, including Bayes’ theorem for calculating PPV.
## plot of PPV vs prior probability (prior)
## use `sens` and `spec` from previous results
prior <- seq(from= 0, to = 1, by=0.05)
ppv <- (sens * prior) / (sens * prior + (1 - spec) *(1 - prior)); ppv
plot(ppv, prior)
Install the bnlearn
package as described here: http://www.bnlearn.com/. The causal model is AUD -> CAGE. Create a Bayesian network (BN) model by fitting the causal model to data (I have done this for you).
Assuming the prior probability of AUD is 5%, use the cpquery
function from bnlearn
package to calculate the PPV; that is P(AUD = “Y” | CAGE = “pos”).
library(bnlearn)
##
## Attaching package: 'bnlearn'
## The following object is masked from 'package:stats':
##
## sigma
library(Rgraphviz)
## Loading required package: graph
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:bnlearn':
##
## path, score
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter,
## Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
## setdiff, sort, table, tapply, union, unique, unsplit, which,
## which.max, which.min
##
## Attaching package: 'graph'
## The following objects are masked from 'package:bnlearn':
##
## degree, nodes, nodes<-
## Loading required package: grid
curl <- 'https://raw.githubusercontent.com/taragonmd/data/master/cage2.csv'
aud <- read.csv(curl, header = TRUE)
summary(aud)
## AUD CAGE
## N:224 neg:265
## Y:142 pos:101
dag <- empty.graph(nodes=c("AUD", "CAGE")) # Build the Bayesian network
dag <- set.arc(dag, from = "AUD", to = "CAGE")
graphviz.plot(dag, layout = "circo") # Plot the BN
bn <- bn.fit(dag, data = aud) # Fit the BN to the data
#Monte Carlo simulation
(ppv1 <- cpquery(bn, event = (AUD == "Y"), evidence = (CAGE == "pos"),
n = 10^6))
## [1] 0.9499401
#New prior
bn$AUD$prob
## N Y
## 0.6120219 0.3879781
bn_newprior <- bn
bn_newprior$AUD <- array(c(.95,.05), dim = dim(bn$AUD$prob),
dimnames = dimnames(bn$AUD$prob)) # assign new probs
bn_newprior$AUD$prob
## N Y
## 0.95 0.05
(ppv2 <- cpquery(bn_newprior, event = (AUD == "Y"), evidence = (CAGE == "pos"), n = 10^6))
## [1] 0.612345