Opioid Data

Here we utilize the opioid data on the simplified subtree. This is an extension of previous work done for STAT520A. We compare the root estimates generated by the updated Bayesian model which corrects the binomial distribution to include the relationships between the nodes, to those generated using the WMM on logged data. The data for this subtree is contained in leaves \(B,C\), making this a variation of the subtree in the intuition documents. The previous model used seperate binomial distributions for each of these nodes, with parameters \(nodeA\) and \(q\) or \(q-1\) (for nodes \(B\) and \(C\), respectively). The updated model keeps this distribution for node \(B\), but sets node \(C\) to be \(Binomial(nodeA-nodeB, 0.9999)\), simply because I have not yet determined how to effectively placing all mass on \(nodeA-nodeB\) and setting node \(C\) deterministically, conditional on the values of nodes \(A,B\) (the upper bound of the discrete uniform in blang is not inclusive, so I tried this, but there was some recurring error with the annealing parameters in this set up). In each case, we first choose priors on \(N, p,q\) analogous to that used in the previous analysis: \(N \sim NB(30000,0.5)\), \(p \sim Beta(3,9)\), and \(q \sim Beta(27,4)\). These will be changed later to incorporate best guesses.

Updated WMM results - log-data

treedata <- read.table("MM_simtree.txt", sep = ",", header = TRUE)
treedata %>%
  knitr::kable() %>%
  kable_styling()

From	To	Est4	Total4	TrueEst	Pop4	Desc	Prob	ProbLabel
N	A	2	10	0	FALSE	A	0.10	p
N	D	8	10	0	FALSE	D	0.90	1-p
A	B	26	29	2279	TRUE	B	0.95	q
A	C	3	29	153	TRUE	C	0.05	1-q

# change data to match opioid data
opioiddata <- treedata
opioiddata$Estimate <- opioiddata$Est4
opioiddata$Total <- opioiddata$Total4
opioiddata$Population <- opioiddata$Pop4

# create tree with uniform data
tree <- FromDataFrameNetwork(opioiddata)

# calculate estimates from A,B using 5000 simulated p,q values
weightedPopTree(tree,sample_length=5000,method='weightedEstimate')

## [1] "using weighted mean estimates"

tree$Do(function(node) {
  if(node$isRoot){
    node$uncertainty <-pop.conf.int(tree)
    }
  else{
    node$uncertainty <- prettyConfidenceIntervals(node$overdose_samples)
    }}
  )

# create dataframe of N estimates from each leaf
x <- tree$Get('overdose_samples', filterFun = isLeaf, traversal = 'post-order')
# Ignore NAs and then also choose leaves with population=TRUE values only (these are the data)
if(is.list(x)){    
  mat.x <- NULL
  for (i in 1:length(x)) {
    if (length(x[[i]] > 0)) {
      mat.x <- cbind(mat.x, x[[i]])
      colnames(mat.x)[dim(mat.x)[2]] <- rownames(x)[[i]]
    }
    }
  x <- mat.x
  }
x <- as.data.frame(x)
  
# rename column to be of the correct leaves (with population=TRUE values)
g <- which(tree$Get('Population', filterFun = isLeaf, traversal = 'post-order'))
pop.x <- x %>% 
  select(g)

pop.x <- log(pop.x)     #take log of esimates to use in weighted mean
opioid.pop.x <- exp(pop.x)

# generate values \hat{N} to plot
weights <- as.matrix(pop.ko.weights(tree)) #weights are internally calculated with log(data)
opioidweights <- weights
opioidNs <- as.data.frame(as.matrix(pop.x) %*% t(weights))
opioidNs <- round(exp(opioidNs)) # transform to population values using exp
opioidNs.restrict <- opioidNs[opioidNs$V1>=0 & opioidNs$V1<=60000, , drop = FALSE]
p <- ggplot(opioidNs.restrict, aes(x = V1)) + geom_density(alpha=.2, fill="#FF6666") + 
   ggtitle("log WMM estimates for opioid data", subtitle = "One MM pass and set of weights, 5000 p,q estimates")
p

summary(opioidNs)

##        V1        
##  Min.   :  3219  
##  1st Qu.:  7966  
##  Median : 11326  
##  Mean   : 14561  
##  3rd Qu.: 16630  
##  Max.   :319616

We can see from the mean and median of this distribution from the above summary. This method assumes no prior distribution on \(N\), and estimates far lower values for the population at this node than the Bayesian model.

Updated Bayes Model - corrected node relationships

The posterior distribution of \(N\) generated using the updated Bayes model with the opioid dataset produces the following posterior distribution of \(N\): Negative binomial prior on N, p,q The median and mean values were 29991 and 29994, respectively, with a standard deviation of 244 and high-density interval \((29603, 30395)\).

Compared with prior \(N \sim DU(20000,50000)\), the posterior of \(N\) is: Discrete Uniform prior on N, p,q Clearly, the prior distribution is having a large effect on the posterior of \(N\). Because of this, we move to trying the model incorporating the prior knowledge from PHAC. So far, only the WMM is producing an estimate of \(N\) which is in line with expectations (see below).

Incorporating PHAC estimates

We repeat the experiment with prior knowledge from PHAC. PHAC estimates that up to 40% of overdoses may be unattended by healthcare Since there are 34113 healthcare (HC) attended overdoses, this imposes an upper bound of 22742 HC unattended overdoses in the same period of time. Furthermore, they estimate that at least 46107 overdoses have occurred in the time period, suggesting at least 11994 overdoses were not attended by HC. We use these bounds with a discrete uniform prior on \(N\), so that \(N \sim DU(11994, 22742)\).

PHAC has also provided an estimate of the proportion of deaths in HC unattended overdoses, \(p\), to be between 5-10%. For now, we use \(p \sim Beta(3,30)\) to incorporate this (there is likely a much better way, but this puts nearly all the mass below 20%). For the estimate of \(q\), we are restricted to estimates derived from the opioid data itself. Further guidance from PHAC is needed here, but we use a \(q \sim Beta(50,1)\) distribution to reflect the fact that nearly all opioid overdoses were assumed to have been included in BC Coroners data before the discrepancy was highlighted by the cohort data.

WMM estimate

We update the WMM estimate using the above distribution of \(p,q\):

# change data to match opioid data
opioiddata <- treedata
opioiddata$Est4 <- c(2, 29, 49, 0)
opioiddata$Total4 <- c(31, 31, 49, 49)
opioiddata$Estimate <- opioiddata$Est4
opioiddata$Total <- opioiddata$Total4
opioiddata$Population <- opioiddata$Pop4

# create tree with uniform data
tree <- FromDataFrameNetwork(opioiddata)

# calculate estimates from A,B using 5000 simulated p,q values
weightedPopTree(tree,sample_length=5000,method='weightedEstimate')

## [1] "using weighted mean estimates"

tree$Do(function(node) {
  if(node$isRoot){
    node$uncertainty <-pop.conf.int(tree)
    }
  else{
    node$uncertainty <- prettyConfidenceIntervals(node$overdose_samples)
    }}
  )

# create dataframe of N estimates from each leaf
x <- tree$Get('overdose_samples', filterFun = isLeaf, traversal = 'post-order')
# Ignore NAs and then also choose leaves with population=TRUE values only (these are the data)
if(is.list(x)){    
  mat.x <- NULL
  for (i in 1:length(x)) {
    if (length(x[[i]] > 0)) {
      mat.x <- cbind(mat.x, x[[i]])
      colnames(mat.x)[dim(mat.x)[2]] <- rownames(x)[[i]]
    }
    }
  x <- mat.x
  }
x <- as.data.frame(x)
  
# rename column to be of the correct leaves (with population=TRUE values)
g <- which(tree$Get('Population', filterFun = isLeaf, traversal = 'post-order'))
pop.x <- x %>% 
  select(g)

pop.x <- log(pop.x)     #take log of esimates to use in weighted mean
opioid.pop.x <- exp(pop.x)

# generate values \hat{N} to plot
weights <- as.matrix(pop.ko.weights(tree)) #weights are internally calculated with log(data)
opioidweights <- weights
opioidNs <- as.data.frame(as.matrix(pop.x) %*% t(weights))
opioidNs <- round(exp(opioidNs)) # transform to population values using exp
opioidNs.restrict <- opioidNs[opioidNs$V1>=0 & opioidNs$V1<=60000, , drop = FALSE]
p <- ggplot(opioidNs.restrict, aes(x = V1)) + geom_density(alpha=.2, fill="#FF6666") + 
   ggtitle("log WMM estimates for opioid data", subtitle = "One MM pass and set of weights, 5000 p,q estimates")
p

summary(opioidNs)

##        V1        
##  Min.   :  7413  
##  1st Qu.: 19671  
##  Median : 28232  
##  Mean   : 37452  
##  3rd Qu.: 43271  
##  Max.   :504018

In this case, we see a higher mode near \(N=20000\), with a mean of 28478 and a median of 37098.

Bayes model

We update the Bayesian model posterior using the above priors: Opioid data, PHAC estimates There is not a clearly defined mode in this posterior, though the weight is more heavily skewed towards the upper range. The high-density interval was \((14081, 22741)\), the median 18686, and the mean 18350, with a standard deviation of 2873.