1. Reading in CSV file

The CAGE is a four-item questionnaire for screening for alcohol use disorder. A patient can answer “yes” to zero, one, two, three, or four questions. Mayfield conducted a CAGE study in 1974 with 366 patients, recorded the number of CAGE questions that were answered “Yes” (range was 0 to 4). A psychiatric social worker independently conducted in-depth interviews and each patient was classified by whether he or she had alcohol use disorder or not (AUD: “Y”, “N”). The AUD status was considered the gold standard. The study results are available here: https://raw.githubusercontent.com/taragonmd/data/master/cage.csv Read in the data set and display a two-way continency table using the xtabs function. The column headings should be the AUD status.

### Reading the data set
Read.cage<- read.csv ("https://raw.githubusercontent.com/taragonmd/data/master/cage.csv", header = TRUE, sep = ',')

### Displaying a two-way continency table using the xtabs function
cage2by2<- xtabs(~ num_yes + AUD, data = Read.cage)
cage2by2
##        AUD
## num_yes   N   Y
##       0 177  14
##       1  22  12
##       2  20  20
##       3   5  43
##       4   0  53

2. Recode integer variable into a factor with two levels

Recode the num_yes variable into a new factor variable called CAGE:

Read.cage$CAGE[Read.cage$num_yes >= 3]<- 'Pos'
Read.cage$CAGE[Read.cage$num_yes <= 2] <- 'Neg'
Recode.cage2by2<- xtabs(~ CAGE + AUD, data = Read.cage)
Recode.cage2by2
##      AUD
## CAGE    N   Y
##   Neg 219  46
##   Pos   5  96

3. Calculate the sensitivity and specificity of the CAGE questionnaire

The sensitivity is P(CAGE = pos | AUD = Y), and the specificity is P(CAGE = neg | AUD = N)

Sensitivity

### Sensitivity
Sensitivity <- 96/142
Sensitivity
## [1] 0.6760563

Specificity

Specificity <- 219/224
Specificity
## [1] 0.9776786

4. Posterior probabilities as a function of prior probabilities

You have learned that the posterior (post-test) probability is a function of the prior (pre-test) probability, and sensitivity and specificity of the diagnostic test (in this case, the CAGE questionnaire). For alcohol use disorder (AUD), the positive predictive value (PPV) is the probability that a patient has AUD given that they answered “yes” to 3 or 4 items on the CAGE questionnaire. Using the sensivity and specificity you calculated above, plot a graph displaying on how PPV changes as a function of the prior probability. Some of the R code is provided, including Bayes’ theorem for calculating PPV.

## plot of PPV vs prior probability (prior)
## use Sensitivity and Specificity from previous results
prior <- seq (from = 0, to = 1, by= 0.05)
ppv <- (Sensitivity * prior) / (Sensitivity * prior + (1 - Specificity) *(1 - prior))
plot(x=prior, y=ppv)
lines(x=prior,y=ppv)

5. Bayesian network for probabilistic reasoning

Install the bnlearn package as described here: http://www.bnlearn.com/. The causal model is AUD -> CAGE. Create a Bayesian network (BN) model by fitting the causal model to data.

library(bnlearn)
## 
## Attaching package: 'bnlearn'
## The following object is masked from 'package:stats':
## 
##     sigma
curl <- 'https://raw.githubusercontent.com/taragonmd/data/master/cage2.csv'
aud <- read.csv(curl, header = TRUE)
summary(aud)
##  AUD      CAGE    
##  N:224   neg:265  
##  Y:142   pos:101
dag <- empty.graph(nodes=c("AUD", "CAGE")) # Build the Bayesian network
dag <- set.arc(dag, from = "AUD", to = "CAGE")
graphviz.plot(dag, layout = "circo") # Plot the BN
## Loading required namespace: Rgraphviz

bn <- bn.fit(dag, data = aud) # Fit the BN to the data
# cpquery

Assuming the prior probability of AUD is 5%, use the cpquery function from bnlearn package to calculate the PPV; that is P(AUD = “Y” | CAGE = “pos”).

bn_newprior <- bn
bn_newprior$AUD <- array(c(0.95,0.05), dim = dim(bn$AUD$prob),
dimnames = dimnames(bn$AUD$prob)) 
bn_newprior$AUD$prob 
##    N    Y 
## 0.95 0.05
PPV<-cpquery(bn_newprior, event = (AUD == "Y"), evidence = (CAGE == "pos"), n= 366)
PPV
## [1] 0.8461538

6. Deconfounding: IPW with stratified contingency table

In 1986, Charig published a retrospective observational study comparing open surgery (Treatment A) to percutaneous nephrolithotomy (Treatment B) for the treatment of kidney stones. Treatment B is noninvasive. Success was defined as no stones at three months.

Urologists select open surgery (Treatment A) for more severe cases, and larger stones are considered more severe. In other words, severity causes treatment selection, and severity lowers success. Stone size (severity) is a common cause or fork that may be confounding the causal relationship between treatment and success (outcome). Review Section 4.3 from https://escholarship.org/uc/item/8000r5m5. The study results are available here: https://github.com/taragonmd/data/blob/master/kidney.csv Read in the data set. Using the inverse probability weighting (IPW) method (see Section 4.3.1.1) and report on the average causal effect (ACE) in terms of the following:

(a) causal risk difference

(b) causal risk ratio

(c) causal odds ratio

## Read in the data set
Stones.data <- read.csv ("https://raw.githubusercontent.com/taragonmd/data/master/kidney.csv", header = TRUE, sep = ',')

## Calculating (a) causal risk difference (b) causal risk ratio (c) causal odds ratio
tab_sutrst <- xtabs(~ success+ treatment+ stone_size, data = Stones.data)
jprob_sutrst <- prop.table(tab_sutrst) # P(su, tr, st)
tab_trst <- xtabs(~ treatment + stone_size, data = Stones.data)
cprob_trst <- prop.table(tab_trst, 2) # P(tr | st)
jprob_sutrst_ipw <- sweep(jprob_sutrst, c(2,3), cprob_trst, '/') # IPW array
cprob_sutr_ipw <- apply(jprob_sutrst_ipw, c(1,2), sum) # sum over G
R0 <- cprob_sutr_ipw['Y','A'] 
R1 <- cprob_sutr_ipw['Y','B'] 
#### Average causal effects (ACEs)
c(causal_RD=R1-R0, causal_RR=R1/R0, causal_OR=(R1/(1-R1))/(R0/(1-R0)))
##   causal_RD   causal_RR   causal_OR 
## -0.05367122  0.93553365  0.70846195

7. Deconfounding: IPW with marginal structural model (MSM)

Repeat the analysis above using IPW/marginal structural model (MSM) that uses the propensity scores and the generalized linear model (glm function in R). Edit the R code from Section 4.3.1.2 and report on the average causal effect (ACE) in terms of the following:

(a) causal risk difference

(b) causal risk ratio

(c) causal odds ratio

You will need to install the tableone, survey, and sandwich package

## Installing the tableone, survey, and sandwich package

library(tableone)
library(sandwich)
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
Stones.data2 <- Stones.data # copy original data set with factors
Stones.data2$success <- as.numeric(Stones.data2$success=='Y')
Stones.data2$treatment <- as.numeric(Stones.data2$treatment=='B')
Stones.data2$stone_size <- as.numeric(Stones.data2$stone_size=='large')

#### look at a table 1
table1 <- CreateTableOne(vars = "stone_size", strata = "treatment",
data = Stones.data2, test = FALSE)
#### include standardized mean difference (SMD)
print(table1,smd=TRUE)
##                         Stratified by treatment
##                          0           1           SMD   
##   n                       350         350              
##   stone_size (mean (SD)) 0.75 (0.43) 0.23 (0.42)  1.225
#### propensity score model
psmodel <- glm(treatment ~ stone_size, family = binomial(link ="logit"), data = Stones.data2)
#### value of propensity score for each subject
## ps <- psmodel$fitted.values # alternative
ps <- predict(psmodel, type = "response")
#### create IP weights
Stones.data2$wt <- ifelse(Stones.data2$treatment==1, 1/(ps), 1/(1-ps))
#### apply weights to data
weighteddata <- svydesign(ids = ~ 1, data =Stones.data2, weights = ~ wt)
#### weighted table 1
weightedtable <- svyCreateTableOne(vars = "stone_size", strata = "treatment",
data = weighteddata, test = FALSE)
#### Show table with SMD
print(weightedtable, smd = TRUE)
##                         Stratified by treatment
##                          0             1             SMD   
##   n                      700.00        700.00              
##   stone_size (mean (SD))   0.49 (0.50)   0.49 (0.50) <0.001
#### Causal Risk Difference with weigthed GLM
glm_obj <- glm(success ~ treatment, weights = wt, family =
quasibinomial(link="identity"), data = Stones.data2)
beta_ipw <- unname(coef(glm_obj))
SE <- unname(sqrt(diag(vcovHC(glm_obj, type="HC0"))))
#### get point estimate and CI for risk difference
cRD <- beta_ipw[2]
conf_level = 0.95
Z <- qnorm((1 + conf_level)/2)
cint <- beta_ipw[2] + c(-1,1) * Z * SE[2]
list(causal_RD = cRD, conf_int = cint, conf_level = conf_level )
## $causal_RD
## [1] -0.05367122
## 
## $conf_int
## [1] -0.12154033  0.01419789
## 
## $conf_level
## [1] 0.95
#### Causal Risk Ratio with weighted GLM
glm_obj <- glm(success ~ treatment, weights = wt, family =
quasibinomial(link = log), data = Stones.data2)
beta_ipw <- unname(coef(glm_obj))
#### To account for weighting, use asymptotic (sandwich) variance
SE <- unname(sqrt(diag(vcovHC(glm_obj, type="HC0"))))
cRR <- exp(beta_ipw[2])
conf_level = 0.95
Z <- qnorm((1 + conf_level)/2)
cint <- exp(beta_ipw[2] + c(-1,1) * Z * SE[2])
list(causal_RR = cRR, conf_int = cint, conf_level = conf_level)
## $causal_RR
## [1] 0.9355336
## 
## $conf_int
## [1] 0.8590792 1.0187923
## 
## $conf_level
## [1] 0.95
#### Causal Odds Ratio with weighted GLM
glm_obj <- glm(success ~ treatment, weights = wt, family =
quasibinomial(link = logit), data = Stones.data2)
beta_ipw <- unname(coef(glm_obj))
#### To account for weighting, use asymptotic (sandwich) variance
SE <- unname(sqrt(diag(vcovHC(glm_obj, type="HC0"))))
#### get point estimate and CI for odds ratio
cOR <- exp(beta_ipw[2])
conf_level = 0.95
Z <- qnorm((1 + conf_level)/2)
cint <- exp(beta_ipw[2] + c(-1,1) * Z * SE[2])
list(causal_OR = cOR, conf_int = cint, conf_level = conf_level)
## $causal_OR
## [1] 0.7084619
## 
## $conf_int
## [1] 0.4617441 1.0870053
## 
## $conf_level
## [1] 0.95