See CATIE TRS: Missing Data Analysis - v3.0 for loss to follow-up analyses, that finishes by storing survival_tabulated_data_v3.RData for use in this notebook.
testTable : exposing differences in baseline data for missing-to-follow-up and the ancillary prop.race and prop.marital for the multi-level baseline variables race and marital.allIDs is to verify the integrity or row alignment of missingTab with tab.Survtab.Surv is the table of data used for survival analyses / descriptivesmissingTab is tab.Surv formatted for missing data analysesThis notebook details how to analyse tab.Surv for survival curves, incidence rate estimation and accounting for missing data at baseline.
rm( list = ls() )
require(dplyr)
require(knitr)
require(kableExtra)
options(knitr.kable.NA = '')
require(reshape2)
load("survival_tabulated_data_v3.RData")tab.SurvThe data.frame tab.Surv data dictionary is a little complex, so there details are:
Identifiers:
ID - Participant ID numberTiming:
time.inTrial - Total time in the trial – this will be the contributed person years for this participanttime.inTrialYrs = time.inTrial / 365 (in years)
Time of “onset” of individual criteria including if a participant ever meets individual criteria
NA the participant never met Sx, SOF or Rx criteriaOtherwise, time they met criteria in days
onset.Sx : the day absolute symptoms triggered above threshold (this does not mean the change in symptoms meets TRS criteria < 20%)onset.SOF : the day SOF triggered above thresholdonset.Rx : the day adequate trials triggered threshold of \(\geq 2\)
NA the participant never met the criteria, otherwise, times are in daystime.onset.TRS : the time at which Sx and SOF triggered above threshold, denoting the baseline for which PANSS is assessedtime.onset.TRSYrs is time.onset.TRS in yearstime.TRS : the time confirmed TRS (if at all) = some day after time.onset.TRS and adequate treatment condition met.onset.Rx (i.e. when adequate treatments \(\geq 2\)), we then check the corresponding Sx and SOF for response, and if none (by the criteria in TRRIP) we label this as the time TRS event happenstime.TRSYrs = time.TRS in yearsSummary of treatments:
numAdeq : the number of adequate trials (by TRRIP criteria on dose, duration, concordance)
durAdeq : the total duration of adequate trialstotalRx : total number of drug treatments, whether adequate or notNA then this data is missing / could not be established from public CATIE datasetsSummary of “Caseness”: (independent of time)
status.rx : the participant did at some point meet TRS criteria for adequate treatmentsstatus.sx : the participant did at some point meet TRS criteria for symptom persistancestatus.sof : the participant did at some point meet TRS criteria for SOF0/1, with 1 meaning positive, 0 negativestatus.TRS : overall, flag for TRS caseness where 1 = censored (did not convert) 2 = converted to TRSTRS domains: (Only for those participants reaching full TRS criteria with status.TRS = 2)
TRS.pos : treatment resistance in the positive domainTRS.neg : treatment resistance in the negative domainMissing Data Flags: (missing enough data that follow-up cannot be performed)
1, else 0missing.Sx : less than 1 record for symptom trajectorymissing.Rx : less than 1 record for treatments trajectorymissing.SOF : less than 1 record for SOF trajectorytab.Surv retrieved from CATIE data = 1444From missingDataAnalysis_v3_0.Rmd – we know that the following variables differed between those missing data for TRS assessment and those with data present:
kable( testTable$Var[ which( testTable$P.Value < 0.05 ) ] )| x |
|---|
| exac3mo |
| yrsEduc |
| olzB0 |
| otherB0 |
| race |
| Lipid |
| STI |
| alcDep_5yrs |
| alcAbuse_5yrs |
| drugDep_5yrs |
| time.inTrialYrs |
exac3months (exacerbation in 3 months prior to start of the trial) would reasonably predict missingnessyrsEduc (years of education) similarly, would be reasonbly expected to predict missingnessolzB0 (on olanzapine at baseline) and otherB0 (taking an antipsychotic medication not used in CATIE) suggests that baseline medication might be relevantrace would be a standard demographic to include, and was different between groupsLipid (having high lipids) is likely because there were more patients on olanzapine in those not missing data than those missing, so raised lipids (also high in those not missing data) makes senseSTI (having a sexually transmitted infection) is less interesting and not clearly explicable for being more prevalent in those missing dataalcDep_5yrs, alcAbuse_5yrs and drugDep_5yrs all point toward substance use, which would be expected to influence missingnessThe original CATIE paper
Lieberman, J. A., Stroup, T. S., McEvoy, J. P., Swartz, M. S., Rosenheck, R. A., Perkins, D. O., … Hsiao, J. K. (2005). Effectiveness of Antipsychotic Drugs in Patients with Chronic Schizophrenia. New England Journal of Medicine, 353(12), 1209–1223.
identifies baseline/demographic for the population. We use these as the basis for building a model of missingness.
NEJM.pred <- c("age", "sex", "race", "yrsEduc", "marital", "employFT", "exac3mo", "P", "N", "G",
"CGIsev", "yrsFirstTx", "yrsFrstAntiPsyRx", "Depression", "alcDep_5yrs", "alcAbuse_5yrs",
"drugDep_5yrs", "drugAbuse_5yrs", "OCD_5yrs",
"olzB0", "quetB0", "rispB0", "zipB0", "halB0", "decaB0", "perB0", "otherB0",
"DM","Lipid","HTN", "missingFU.any")Resulting in 98 participants for which one of these variables is not available. We’ll refer back to the multiple univariate analysis to see if these differed between missing / non-missing groups to justify dropping them for further missing-ness analyses:
NEJM.missing <- missingTab[ , NEJM.pred ]
find.missing <- apply( NEJM.missing, 2, is.na )
missing.labels <- apply( find.missing, 2, which )
# number missing each variable
kable( t( lapply( missing.labels, length ) ) )| age | sex | race | yrsEduc | marital | employFT | exac3mo | P | N | G | CGIsev | yrsFirstTx | yrsFrstAntiPsyRx | Depression | alcDep_5yrs | alcAbuse_5yrs | drugDep_5yrs | drugAbuse_5yrs | OCD_5yrs | olzB0 | quetB0 | rispB0 | zipB0 | halB0 | decaB0 | perB0 | otherB0 | DM | Lipid | HTN | missingFU.any |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 8 | 0 | 10 | 0 | 1 | 2 | 1 | 5 | 38 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Of those demographic variables which have participants missing data on these variables (i.e. non-zero in the above table):
yrsEduc did differ between groups (different dispersions, same average) so we’ll tolerate the loss of 8 participants in our overall sampleemployFT did not differ in our earlier analysis of differences between misisng groups, so we will omit this to preserve overall numbersP,N and G did not either, so we will omit this alsoCGIsev did not differ, so we’ll omityrsFirstTx of yrsFrstAntiPsyRx (years first treated for any disorder and years since 1st antipsychotic medication respectively) were different in the univariable analysis, so we’ll omitConstruct a new table for use in modelling missingness:
rm( NEJM.pred )
rm( NEJM.missing )
NEJM.missing <- missingTab
NEJM.missing$ID <- allIDs$x
# variables to use:
NEJM.pred <- c("ID", "age", "sex", "race", "yrsEduc", "marital",
"exac3mo", "Depression", "alcDep_5yrs", "alcAbuse_5yrs",
"drugDep_5yrs", "drugAbuse_5yrs", "OCD_5yrs",
"olzB0", "quetB0", "rispB0", "zipB0", "halB0", "decaB0", "perB0", "otherB0",
"DM","Lipid","HTN", "missingFU.any")
NEJM.missing <- NEJM.missing[ , NEJM.pred ]
find.missing <- apply( NEJM.missing, 2, is.na )
missing.labels <- apply( find.missing, 2, which )Now, re-construct tab.Surv and missingTab to exclude participants with missing yrsEduc
# IDs of participants missing demographics yrsEduc
remove.IDs <- NEJM.missing$ID[ which( complete.cases( NEJM.missing ) == FALSE ) ]
# locate IDs in missingTab and tab.Surv
tab.Surv <- tab.Surv[ -which( tab.Surv$ID %in% remove.IDs ), ]
# NB: we overwrite missingTab with the new set of variables to model for IPW later on
missingTab <- NEJM.missing[ -which( NEJM.missing$ID %in% remove.IDs ), ]We will use the variables now stored in missingTab of a total 1436 to inverse probability weight those participants missing TRS data.
In the missing-data analysis, we found differences between participants who were missing data at follow-up (preventing analysis for TRS status) and those that had complete data (for which we can easily establish TRS status at follow-up).
To counter bias introduced by missing data, we plan to use stabilised inverse probability (IP) weighting for cases that are missing to follow-up (MTFU). For discussion of stabilised IP weights see:
Hernán MA, Robins JM (2018). Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming, Chapter 12 (available : https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2018/02/hernanrobins_v2.17.18.pdf)
For this problem, we have:
Then, the stabilised IP weight for participant \(i\), given their baseline predictor (demographic) variables is:
\[
w_i = \frac{M_i \Pr(M_i = 1) + (1-M_i) (1-\Pr(M_i = 1)) }
{M_i \Pr( M_i = 1 | X_i ) + (1-M_i)(1-\Pr(M_i = 1 | X_i)) }
\] A model for the conditional probabilities \(\Pr(M_i = 1|X_i)\) is estimated using logistic regression glm():
# demographic variables at baseline
missingX <- names( missingTab )
# remove *outcomes* and TRS-related variables from missingX, given these are derived by our algorithm to
# implement TRRIP, we are only really interested in determining follow up missingness from CATIE demographic data
missingX <- missingX[ -which( missingX %in%
c("ID", "missingFU.rx","missingFU.sx","missingFU.sof","missingFU.any") ) ]
# store IDs
allIDs <- missingTab$ID
# build a formula
# In effect we want : missingFU.any ~ missingX
formStr <- as.formula( paste("missingFU.any", paste(missingX, collapse=" + "), sep=" ~ ") )
# fit the model
missM <- glm( formStr , family = binomial, data = missingTab )
# build a table of results
cfs <- data.frame( coef(summary(missM)) )
names( cfs ) <- c("Beta","StdError","Z","P")
# p-values
p.vals <- cfs$P
cfs$SigP <- ifelse( cfs$P < 0.05, "*", "")
cfs$OR <- exp( cfs$Beta )missM is the fitted logistic regression for baseline/demographics and the probability of having data MTFU.
Each participant’s probability \(\Pr(M_i = 1 | X_i)\) is obtained using predict() method for the GLM asking for the actual probability (type = "response")
# indicator - missingFU.any
M <- missingTab$missingFU.any
# probability of being missing at follow given demographics, X
p.M_X <- predict( missM, type = "response" )
# probability that missingFU.any == 1
p.M <- length( which( missingTab$missingFU.any == 1 ) ) / nrow( missingTab )
# compute each participant's stabilised inverse probability weight
weight.numer <- M * p.M + (1-M) * (1 - p.M)
weight.denom <- (M * p.M_X) + ( (1 - M) * ( 1 - p.M_X ) )
# build a table if IDs and corresponding IPWs
ID.IPW <- data.frame( ID = allIDs, IPW = weight.numer / weight.denom )IPWs estimated on missingTab and the corresponding IDs are allIDs – to guarantee integrity, we’ll formally join ID.IPW on ID with tab.Surv
tab.Surv <- left_join( tab.Surv, ID.IPW, by = "ID")The IPW data becomes useful later on, but we’ll park it for now.
require(survival)
require(survminer)
require(gridExtra)# extract the variables to use
curveData <- tab.Surv[ , c("time.TRSYrs", "time.inTrialYrs", "time.onset.TRSYrs", "status.TRS", "IPW")]
# -- for people *never* triggering / being at risk for TRS (by SOF or Sx)
# -- set time.TRSYrs = Inf
curveData$time.TRSYrs[ is.na( curveData$time.TRSYrs ) ] <- Inf
# -- set time.onset.TRSYrs = Inf
curveData$time.onset.TRSYrs[ is.na( curveData$time.onset.TRSYrs ) ] <- InfallPop <- curveData
# subtract off the "at risk" time from the time.inTrialYrs for the TRS cases
allPop$time.interval <- allPop$time.inTrialYrs - allPop$time.onset.TRSYrs
# whereas for the NON TRS cases, the time interval is just the time.inTrialYrs
allPop$time.interval[ allPop$time.interval < 0 ] <- allPop$time.inTrialYrs[ allPop$time.interval < 0 ]
allPop.fit <- survfit(Surv(time = time.interval,
status.TRS == 2, type = "right") ~ 1, data = allPop)
# graph this
ggsurvplot(allPop.fit, fun = "event", main = "Cumulative proportion", palette = "blue",
legend = "none",
title = paste("Whole Population =", allPop.fit$n)
)$plotWe define a “high risk” sub-population of people who (during their participation in CATIE) trigger TRS threshold for absolute symptoms severity and moderate SOF impairment. We then examine this group to see how those that progress to TRS (i.e. symptoms continue at above threshold, and who do not obtain 20% improvement despite two adequate trials).
# -- for cases NEVER reaching high risk for TRS
TRS.risk.cases <- allPop[ allPop$time.onset.TRSYrs < Inf, ]
# -- for the sub-population of those participants who at some point triggered on Sx and SOF, look at survival
TRS.risk.cases$time.TRSYrs[ which( is.na( TRS.risk.cases$time.TRSYrs ) ) ] <- Inf
# -- time.interval is the time of onset of TRS Sx and SOF to participant leaving the trial
TRS.risk.cases$time.interval <- TRS.risk.cases$time.inTrialYrs - TRS.risk.cases$time.onset.TRSYrs
TRS.KM.fit <- survfit(Surv(time = time.interval,
status.TRS == 2, type = "right") ~ 1, data = TRS.risk.cases)
ggsurvplot(TRS.KM.fit, fun = "event", main = "Cumulative proportion", palette = "red",
legend = "none",
title = paste("Sub-Population At Risk =", TRS.KM.fit$n)
)$plotAn attempt at using IPWs with the survival / event curves: the curve is not different in any interesting way. Included here for completeness and demonstration.
require(survey)
desObj.TRS <- svydesign( ids = ~ 0, weights = ~IPW, data = TRS.risk.cases )
s1<-svykm(Surv(time = time.interval, status.TRS == 2, type = "right")~1, design=desObj.TRS)
desObj.allPop <- svydesign( ids = ~ 0, weights = ~IPW, data = allPop )
s2<-svykm(Surv(time = time.interval, status.TRS == 2, type = "right")~1, design=desObj.allPop)Non-parametric estimate of the crude incidence rate for TRS by bootstrapping
require( boot )
yrsFactor <- 100
bootIR <- function(data, idx, mult) {
d <- data[idx,]
person.years <- sum( d$perYrs )
events <- sum( d$eventFlag, na.rm = TRUE)
incid.rate <- mult * ( events / person.years )
return( incid.rate )
}
# dataframe of TRS events and person years
bs.allPop <- data.frame( perYrs = allPop$time.interval,
eventFlag = ifelse( allPop$status.TRS == 2, 1, 0 ) )
bs.subPop <- data.frame( perYrs = TRS.risk.cases$time.interval,
eventFlag = ifelse( TRS.risk.cases$status.TRS == 2, 1, 0 ) )
# bootstrap incidence rate for whole population
results.allPop <- boot( data=bs.allPop, statistic=bootIR,
R=5000, mult = yrsFactor, parallel = "multicore", ncpus=4 )
# boostrap incidence rate for the sub-population of "at risk" cases
results.subPop <- boot( data=bs.subPop, statistic=bootIR,
R=5000, mult = yrsFactor, parallel = "multicore", ncpus=4 )
# get CIs
results.allPop.CI <- boot.ci(results.allPop, type = "all", parallel = "multicore", ncpus = 4)
results.subPop.CI <- boot.ci(results.subPop, type = "all", parallel = "multicore", ncpus = 4)For the whole population of 1436, we obtain a bootstrapped incidence rate of 5.43 per 100 person-years, with a 95% confidence interval of [4.29, 6.75]
For the sub-population of 740 patients who met symptom and SOF criteria, and then either responded (non-treatment resistance) or went on to develop treatment resistance after 2 adequate trials, we obtain a bootstrapped incidence rate of 9.61 per 100 person-years, with a 95% confidence interval of [7.66, 11.92]
We now explore differences in the TRS cases, compared to the population as a whole, using multiple univariate analyses to begin with:
# we'll use the NEJM paper variables (but take off the ID variable first)
predToUse <- NEJM.pred[2:length(NEJM.pred)]
tab.Surv2 <- tab.Surv[ , which( names( tab.Surv ) %in% predToUse ) ]
trs.Flag <- tab.Surv$status.TRS
# create vector of data types (numeric, ordinal/factor)
dataType <- as.numeric( unlist( lapply(tab.Surv2, function(x) if (!is.numeric(x)) NA else max(x, na.rm = TRUE))) )
# parse dataType and decide on a relevant univariate test - e.g. Chi square for categorical proportions and t and KS for continuous
testFlag <- rep(NA, length( dataType ) )
for( i in 1:length( dataType ) ) {
if( dataType[i] == 1 | is.na( dataType[i] ) ) {
# categorical (factor) or binary : proportion test - Chi-square : code = 1
testFlag[i] <- 1
} else {
if( dataType[i] > 1 ) {
# continuous, therefore KS and t-test
testFlag[i] <- 2
}
}
}
demoTable <- data.frame( Var = rep("", length( dataType ) ),
M.TRS = rep(NA, length( dataType ) ),
Spread.TRS = rep(NA, length( dataType ) ),
M.NTR = rep(NA, length( dataType ) ),
Spread.NTR = rep(NA, length( dataType ) ),
Perc.TRS = rep(NA, length( dataType ) ),
Perc.NTR = rep(NA, length( dataType ) ),
P.Value = rep(NA, length( dataType ) ),
Test.Used = rep("", length( dataType ) ),
stringsAsFactors = FALSE
)
# for all variables in testFlag ...
for( i in 1:length( testFlag ) ) {
thisVar <- names( tab.Surv2 )[i]
demoTable$Var[i] <- thisVar
# select the TRS positive cases
grpTRS <- tab.Surv2[ which( trs.Flag == 2 ), i ]
# and the non TRS (NTR) cases
grpNTR <- tab.Surv2[ which( trs.Flag != 2 ), i ]
# if this variable's testFlag == 1, it's ordinal / categorical data
if( testFlag[i] == 1 ) {
test.tab <- rbind( table( grpTRS ), table( grpNTR ) )
rownames(test.tab) <- c("TRS","NTR")
suppressWarnings(
xs <- chisq.test( test.tab )
)
demoTable$Perc.TRS[i] <- round( length( which( grpTRS == 1 ) ) / length( grpTRS ) * 100, 2 ) # proportion of 1's to 0s in subjects with present data
demoTable$Perc.NTR[i] <- round( length( which( grpNTR == 1 ) ) / length( grpNTR ) * 100, 2 )
## proportion of 1's to 0s in subjects with Missing data
demoTable$P.Value[i] <- round( xs$p.value, 3 )
demoTable$Test.Used[i] <- "ChiSq"
} else {
# This variable is continuous
# We need either KS and t-test
# test normality with Shapiro-Wilks
this.SW.test.TRS <- shapiro.test( grpTRS )
this.SW.test.NTR <- shapiro.test( grpNTR )
if( this.SW.test.TRS$p.value < 0.05 | this.SW.test.NTR$p.value < 0.05 ) {
# if p.value of either SW test < 0.05, then use KS (i.e. at least one distribution is non-normal)
suppressWarnings(
this.test <- ks.test( grpTRS, grpNTR, alternative = "two.sided" )
)
} else {
# both missing and non-missing data are normal enough
suppressWarnings(
this.test <- t.test( grpTRS, grpNTR, alternative = "two.sided" )
)
}
if( this.test$method == "Two-sample Kolmogorov-Smirnov test" ) {
# record median and IQR
demoTable$M.TRS[i] <- median( grpTRS, na.rm = TRUE )
demoTable$Spread.TRS[i] <- IQR( grpTRS, na.rm = TRUE )
demoTable$M.NTR[i] <- median( grpNTR, na.rm = TRUE )
demoTable$Spread.NTR[i] <- IQR( grpNTR, na.rm = TRUE )
demoTable$P.Value[i] <- round( this.test$p.value, 3 )
demoTable$Test.Used[i] <- "KS"
}
if( this.test$method == "Welch Two Sample t-test" ) {
# record median and IQR
demoTable$M.TRS[i] <- mean( grpTRS, na.rm = TRUE )
demoTable$Spread.TRS[i] <- sd( grpTRS, na.rm = TRUE )
demoTable$M.NTR[i] <- mean( grpNTR, na.rm = TRUE )
demoTable$Spread.NTR[i] <- sd( grpNTR, na.rm = TRUE )
demoTable$P.Value[i] <- round( this.test$p.value, 3 )
demoTable$Test.Used[i] <- "T"
}
}
}
# add a significance flag (purely cosmetic)
demoTable$signif <- ifelse( demoTable$P.Value < 0.05, "*", " " )
kable(demoTable)| Var | M.TRS | Spread.TRS | M.NTR | Spread.NTR | Perc.TRS | Perc.NTR | P.Value | Test.Used | signif |
|---|---|---|---|---|---|---|---|---|---|
| age | 42 | 15.00 | 42 | 17.0 | 0.457 | KS | |||
| sex | 76.06 | 73.56 | 0.745 | ChiSq | |||||
| exac3mo | 23.94 | 26.75 | 0.702 | ChiSq | |||||
| yrsEduc | 12 | 1.75 | 12 | 1.5 | 1.000 | KS | |||
| olzB0 | 28.17 | 26.99 | 0.935 | ChiSq | |||||
| quetB0 | 9.86 | 9.83 | 1.000 | ChiSq | |||||
| rispB0 | 23.94 | 22.50 | 0.891 | ChiSq | |||||
| zipB0 | 5.63 | 5.04 | 1.000 | ChiSq | |||||
| halB0 | 8.45 | 5.51 | 0.435 | ChiSq | |||||
| decaB0 | 0.00 | 1.65 | 0.000 | ChiSq | * | ||||
| perB0 | 1.41 | 2.20 | 0.977 | ChiSq | |||||
| otherB0 | 9.86 | 8.81 | 0.930 | ChiSq | |||||
| race | 0.00 | 0.00 | 0.325 | ChiSq | |||||
| marital | 0.00 | 0.00 | 0.908 | ChiSq | |||||
| DM | 11.27 | 10.78 | 1.000 | ChiSq | |||||
| Lipid | 14.08 | 14.08 | 1.000 | ChiSq | |||||
| HTN | 18.31 | 19.83 | 0.873 | ChiSq | |||||
| Depression | 21.13 | 28.09 | 0.255 | ChiSq | |||||
| alcDep_5yrs | 14.08 | 13.69 | 1.000 | ChiSq | |||||
| alcAbuse_5yrs | 12.68 | 18.02 | 0.324 | ChiSq | |||||
| drugDep_5yrs | 14.08 | 16.29 | 0.745 | ChiSq | |||||
| drugAbuse_5yrs | 14.08 | 20.46 | 0.250 | ChiSq | |||||
| OCD_5yrs | 4.23 | 5.19 | 0.934 | ChiSq |
Total Participants retrieved from CATIE dataset = 1436
Sub-populations:
domainTotals <- table( tab.Surv[ which( tab.Surv$status.TRS == 2 ), which( names( tab.Surv ) %in% c("TRS.pos","TRS.neg") ) ] )For participants meeting TRS criteria, we find that :
We now model the associations of demographics and baseline clinical state with the TRS outcome. More broadly, we are interested if we can predict TRS propsectively using baseline information, and we start with a classical inferential analysis
We start from the expectation this is unlikely to provide useful information – if a participant never “triggers” on TRS Sx and SOF criteria, then it’s unlikely that a collection of baseline, demographic and clinical state information is going allow reliable inferences on future TRS likelihood.
We will use the original collated data (tab.Surv) rather than that cleaned for survival analyses.
assocX.allPop <- tab.Surv
excludeCols <- c("ID", "time.inTrial", "onset.Sx", "onset.SOF",
"onset.Rx", "time.onset.TRS", "time.TRS", "numAdeq",
"durAdeq", "totalRx", "status.rx", "status.sx", "status.sof",
"TRS.pos", "TRS.neg", "missing.Sx", "missing.Rx",
"missing.SOF", "time.inTrialYrs", "time.onset.TRSYrs",
"time.TRSYrs", "status.TRS", "IPW")
indepVars <- names( assocX.allPop )[ -which( names( assocX.allPop ) %in% excludeCols ) ]
# recode status.TRS to be 0 = non-TRS and 1 = TRS
assocX.allPop$status.TRS <- ifelse( assocX.allPop$status.TRS == 2, 1, 0 )
formStr <- as.formula( paste("status.TRS", paste(indepVars, collapse=" + "), sep=" ~ ") )
# model
assoc.fit <- glm( formStr, family = binomial, data = assocX.allPop )## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
We see that some variables have very large standard errors, particularly Osteopor, STI and decaB0 (being on decanoate depot at baseline) – all of these variables have very few cases (10, 11 and 22 respectively).
Retry with these removed:
indepVars <- indepVars[ -which( indepVars %in% c("Osteopor","STI","decaB0") ) ]
formStr <- as.formula( paste("status.TRS", paste(indepVars, collapse=" + "), sep=" ~ ") )
assoc.fit.allPop <- glm( formStr, family = binomial, data = assocX.allPop )
CIs.allPop <- confint( assoc.fit.allPop )
cfs.allPop <- data.frame( coef(summary(assoc.fit.allPop)) )
names( cfs.allPop ) <- c("Beta","StdError","Z","P")
# p-values
p.vals <- round( cfs.allPop$P, 3 )
cfs.allPop$SigP <- ifelse( cfs.allPop$P < 0.05, "*", "")
cfs.allPop$OR <- round( exp( cfs.allPop$Beta ), 2 )
cfs.allPop$CI_Lower <- round( exp( CIs.allPop[,1] ), 2 )
cfs.allPop$CI_Upper <- round( exp( CIs.allPop[,2] ), 2 )
kable( cfs.allPop )| Beta | StdError | Z | P | SigP | OR | CI_Lower | CI_Upper | |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | -3.8560527 | 1.1622044 | -3.3178782 | 0.0009070 | * | 0.02 | 0.00 | 0.20 |
| P | 0.0325661 | 0.0311378 | 1.0458692 | 0.2956214 | 1.03 | 0.97 | 1.10 | |
| N | 0.0584261 | 0.0245950 | 2.3755252 | 0.0175240 | * | 1.06 | 1.01 | 1.11 |
| G | -0.0293332 | 0.0206176 | -1.4227270 | 0.1548153 | 0.97 | 0.93 | 1.01 | |
| age | -0.0374996 | 0.0186209 | -2.0138469 | 0.0440256 | * | 0.96 | 0.93 | 1.00 |
| sex | 0.1226414 | 0.3361149 | 0.3648794 | 0.7152014 | 1.13 | 0.60 | 2.25 | |
| exac3mo | -0.4274849 | 0.3203560 | -1.3344057 | 0.1820709 | 0.65 | 0.34 | 1.19 | |
| yrsEduc | 0.0279642 | 0.0391961 | 0.7134437 | 0.4755712 | 1.03 | 0.96 | 1.12 | |
| CGIsev | 0.2608167 | 0.1730918 | 1.5068110 | 0.1318591 | 1.30 | 0.92 | 1.82 | |
| employFT | -0.8836733 | 0.7504795 | -1.1774783 | 0.2390047 | 0.41 | 0.07 | 1.44 | |
| yrsFirstTx | 0.0330714 | 0.0220781 | 1.4979272 | 0.1341522 | 1.03 | 0.99 | 1.08 | |
| yrsFrstAntiPsyRx | -0.0262341 | 0.0231123 | -1.1350691 | 0.2563464 | 0.97 | 0.93 | 1.02 | |
| olzB0 | 0.1051939 | 0.3249431 | 0.3237302 | 0.7461423 | 1.11 | 0.58 | 2.09 | |
| quetB0 | -0.3498919 | 0.5032585 | -0.6952529 | 0.4868968 | 0.70 | 0.23 | 1.74 | |
| rispB0 | 0.0895717 | 0.3367585 | 0.2659821 | 0.7902530 | 1.09 | 0.56 | 2.09 | |
| zipB0 | 0.1117390 | 0.5778191 | 0.1933807 | 0.8466609 | 1.12 | 0.31 | 3.16 | |
| halB0 | 0.0309962 | 0.5623595 | 0.0551182 | 0.9560443 | 1.03 | 0.29 | 2.80 | |
| perB0 | -0.4211818 | 1.0651089 | -0.3954355 | 0.6925215 | 0.66 | 0.04 | 3.52 | |
| otherB0 | 0.1355289 | 0.4486646 | 0.3020717 | 0.7625974 | 1.15 | 0.44 | 2.61 | |
| raceHispanic | 0.6264134 | 0.4275941 | 1.4649719 | 0.1429286 | 1.87 | 0.79 | 4.29 | |
| raceOther | -0.9650276 | 1.0560850 | -0.9137784 | 0.3608333 | 0.38 | 0.02 | 2.01 | |
| raceWhite | 0.4904827 | 0.3322473 | 1.4762580 | 0.1398747 | 1.63 | 0.86 | 3.20 | |
| maritalNeverMarried | -0.1771684 | 0.4522228 | -0.3917723 | 0.6952264 | 0.84 | 0.36 | 2.19 | |
| maritalPrevMarried | 0.0112907 | 0.4763539 | 0.0237024 | 0.9810900 | 1.01 | 0.41 | 2.74 | |
| COPD | -0.1976583 | 1.0588191 | -0.1866781 | 0.8519130 | 0.82 | 0.04 | 4.36 | |
| DM | 0.3424323 | 0.4532572 | 0.7554923 | 0.4499536 | 1.41 | 0.55 | 3.29 | |
| HepABC | -0.0100721 | 0.6371844 | -0.0158072 | 0.9873882 | 0.99 | 0.23 | 3.00 | |
| Lipid | 0.1699361 | 0.3902825 | 0.4354182 | 0.6632589 | 1.19 | 0.52 | 2.46 | |
| HTN | 0.0689262 | 0.3880988 | 0.1775996 | 0.8590374 | 1.07 | 0.48 | 2.24 | |
| IHD | 1.1168385 | 1.1305364 | 0.9878837 | 0.3232096 | 3.06 | 0.15 | 19.78 | |
| OsteoArth | -0.4175183 | 0.7720972 | -0.5407587 | 0.5886739 | 0.66 | 0.10 | 2.42 | |
| Depression | -0.4324858 | 0.3345832 | -1.2926108 | 0.1961457 | 0.65 | 0.33 | 1.22 | |
| alcDep_5yrs | 0.4178658 | 0.4390143 | 0.9518273 | 0.3411846 | 1.52 | 0.62 | 3.50 | |
| alcAbuse_5yrs | -0.3254903 | 0.4396680 | -0.7403093 | 0.4591124 | 0.72 | 0.29 | 1.64 | |
| drugDep_5yrs | 0.0631063 | 0.4173850 | 0.1511943 | 0.8798224 | 1.07 | 0.45 | 2.33 | |
| drugAbuse_5yrs | -0.3920259 | 0.4063443 | -0.9647630 | 0.3346635 | 0.68 | 0.29 | 1.44 | |
| OCD_5yrs | -0.1262051 | 0.6525960 | -0.1933893 | 0.8466541 | 0.88 | 0.20 | 2.76 | |
| anxDis_5yrs | 0.0501536 | 0.4146300 | 0.1209598 | 0.9037229 | 1.05 | 0.44 | 2.27 |
We repeat the analysis in the sub-population who triggered on symptoms and SOF impairment.
# subset assocX.TRSpop for only sub-population at risk
assocX.TRSpop <- tab.Surv[ which( !is.na( tab.Surv$time.onset.TRSYrs ) ), ]
# recode status.TRS to be 0 = non-TRS and 1 = TRS
assocX.TRSpop$status.TRS <- ifelse( assocX.TRSpop$status.TRS == 2, 1, 0 )
formStr <- as.formula( paste("status.TRS", paste(indepVars, collapse=" + "), sep=" ~ ") )
# model
assoc.fit.TRSpop <- glm( formStr, family = binomial, data = assocX.TRSpop )
CIs.TRSpop <- confint( assoc.fit.TRSpop )## Waiting for profiling to be done...
cfs.TRSpop <- data.frame( coef(summary(assoc.fit.TRSpop)) )
names( cfs.TRSpop ) <- c("Beta","StdError","Z","P")
# p-values
p.vals <- round( cfs.TRSpop$P, 3 )
cfs.TRSpop$SigP <- ifelse( cfs.TRSpop$P < 0.05, "*", "")
cfs.TRSpop$OR <- round( exp( cfs.TRSpop$Beta ), 2 )
cfs.TRSpop$CI_Lower <- round( exp( CIs.TRSpop[,1] ), 2 )
cfs.TRSpop$CI_Upper <- round( exp( CIs.TRSpop[,2] ), 2 )
kable( cfs.TRSpop )| Beta | StdError | Z | P | SigP | OR | CI_Lower | CI_Upper | |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | -1.7359420 | 1.2609199 | -1.3767266 | 0.1685968 | 0.18 | 0.01 | 2.04 | |
| P | 0.0248019 | 0.0317902 | 0.7801755 | 0.4352876 | 1.03 | 0.96 | 1.09 | |
| N | 0.0432919 | 0.0259977 | 1.6652188 | 0.0958691 | 1.04 | 0.99 | 1.10 | |
| G | -0.0322634 | 0.0211672 | -1.5242119 | 0.1274558 | 0.97 | 0.93 | 1.01 | |
| age | -0.0491597 | 0.0196928 | -2.4963238 | 0.0125488 | * | 0.95 | 0.91 | 0.99 |
| sex | -0.0849137 | 0.3521116 | -0.2411555 | 0.8094346 | 0.92 | 0.47 | 1.89 | |
| exac3mo | -0.3369556 | 0.3288105 | -1.0247714 | 0.3054710 | 0.71 | 0.36 | 1.33 | |
| yrsEduc | 0.0389660 | 0.0405255 | 0.9615180 | 0.3362918 | 1.04 | 0.96 | 1.13 | |
| CGIsev | 0.1794081 | 0.1792693 | 1.0007743 | 0.3169359 | 1.20 | 0.84 | 1.71 | |
| employFT | -0.6275612 | 0.7706953 | -0.8142792 | 0.4154850 | 0.53 | 0.08 | 1.96 | |
| yrsFirstTx | 0.0337664 | 0.0232139 | 1.4545743 | 0.1457872 | 1.03 | 0.99 | 1.08 | |
| yrsFrstAntiPsyRx | -0.0271606 | 0.0240550 | -1.1291028 | 0.2588545 | 0.97 | 0.93 | 1.02 | |
| olzB0 | 0.0114301 | 0.3398664 | 0.0336311 | 0.9731713 | 1.01 | 0.51 | 1.96 | |
| quetB0 | -0.3651252 | 0.5126715 | -0.7122010 | 0.4763403 | 0.69 | 0.23 | 1.75 | |
| rispB0 | 0.0992386 | 0.3488764 | 0.2844519 | 0.7760641 | 1.10 | 0.55 | 2.17 | |
| zipB0 | -0.1085051 | 0.6024496 | -0.1801065 | 0.8570689 | 0.90 | 0.24 | 2.68 | |
| halB0 | -0.0090881 | 0.5725875 | -0.0158720 | 0.9873365 | 0.99 | 0.28 | 2.76 | |
| perB0 | -0.4801902 | 1.0949996 | -0.4385300 | 0.6610021 | 0.62 | 0.03 | 3.62 | |
| otherB0 | -0.0983109 | 0.4555132 | -0.2158244 | 0.8291246 | 0.91 | 0.34 | 2.10 | |
| raceHispanic | 0.7226005 | 0.4457252 | 1.6211794 | 0.1049792 | 2.06 | 0.84 | 4.90 | |
| raceOther | -1.0809395 | 1.0706292 | -1.0096301 | 0.3126725 | 0.34 | 0.02 | 1.87 | |
| raceWhite | 0.4726302 | 0.3449156 | 1.3702779 | 0.1706002 | 1.60 | 0.83 | 3.22 | |
| maritalNeverMarried | -0.2101868 | 0.4697986 | -0.4473976 | 0.6545880 | 0.81 | 0.34 | 2.18 | |
| maritalPrevMarried | -0.0167125 | 0.4991615 | -0.0334812 | 0.9732908 | 0.98 | 0.38 | 2.77 | |
| COPD | -0.0074840 | 1.0981369 | -0.0068152 | 0.9945623 | 0.99 | 0.05 | 5.93 | |
| DM | 0.1928834 | 0.4597671 | 0.4195241 | 0.6748331 | 1.21 | 0.47 | 2.88 | |
| HepABC | -0.2061348 | 0.6445889 | -0.3197927 | 0.7491255 | 0.81 | 0.18 | 2.51 | |
| Lipid | 0.0425896 | 0.3968661 | 0.1073149 | 0.9145391 | 1.04 | 0.46 | 2.20 | |
| HTN | 0.1919175 | 0.4025883 | 0.4767092 | 0.6335692 | 1.21 | 0.53 | 2.62 | |
| IHD | 1.1037565 | 1.2020118 | 0.9182577 | 0.3584840 | 3.02 | 0.14 | 24.30 | |
| OsteoArth | -0.6030687 | 0.7910060 | -0.7624073 | 0.4458169 | 0.55 | 0.08 | 2.10 | |
| Depression | -0.4481845 | 0.3467715 | -1.2924489 | 0.1962017 | 0.64 | 0.31 | 1.23 | |
| alcDep_5yrs | 0.5231161 | 0.4628422 | 1.1302255 | 0.2583812 | 1.69 | 0.66 | 4.10 | |
| alcAbuse_5yrs | -0.2885550 | 0.4400729 | -0.6556983 | 0.5120183 | 0.75 | 0.30 | 1.71 | |
| drugDep_5yrs | 0.0321603 | 0.4366374 | 0.0736545 | 0.9412853 | 1.03 | 0.42 | 2.36 | |
| drugAbuse_5yrs | -0.2941685 | 0.4146488 | -0.7094402 | 0.4780513 | 0.75 | 0.32 | 1.62 | |
| OCD_5yrs | -0.3350043 | 0.6721006 | -0.4984436 | 0.6181714 | 0.72 | 0.16 | 2.34 | |
| anxDis_5yrs | -0.0696678 | 0.4256441 | -0.1636762 | 0.8699861 | 0.93 | 0.38 | 2.06 |
kable( cfs.allPop[ which( cfs.allPop$P < 0.05 ), ] )| Beta | StdError | Z | P | SigP | OR | CI_Lower | CI_Upper | |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | -3.8560527 | 1.1622044 | -3.317878 | 0.0009070 | * | 0.02 | 0.00 | 0.20 |
| N | 0.0584261 | 0.0245950 | 2.375525 | 0.0175240 | * | 1.06 | 1.01 | 1.11 |
| age | -0.0374996 | 0.0186209 | -2.013847 | 0.0440256 | * | 0.96 | 0.93 | 1.00 |
kable( cfs.TRSpop[ which( cfs.TRSpop$P < 0.05 ), ] ) | Beta | StdError | Z | P | SigP | OR | CI_Lower | CI_Upper | |
|---|---|---|---|---|---|---|---|---|
| age | -0.0491597 | 0.0196928 | -2.496324 | 0.0125488 | * | 0.95 | 0.91 | 0.99 |
We computed IPWs for missingness, so we use the survey package to investigate if this makes any difference to findings:
# for entire population
desObj.allPop <- svydesign( ids = ~ 0, weights = ~IPW, data = assocX.allPop )
IPW.fit.allPop <- svyglm( formStr, design=desObj.allPop, family = quasibinomial(link = "logit"))
CIs.IPW.allPop <- confint( IPW.fit.allPop )
cfs.IPW.allPop <- data.frame( coef(summary(IPW.fit.allPop)) )
names( cfs.IPW.allPop ) <- c("Beta","StdError","Z","P")
# p-values
p.vals <- round( cfs.IPW.allPop$P, 3 )
cfs.IPW.allPop$SigP <- ifelse( cfs.IPW.allPop$P < 0.05, "*", "")
cfs.IPW.allPop$OR <- round( exp( cfs.IPW.allPop$Beta ), 2 )
cfs.IPW.allPop$CI_Lower <- round( exp( CIs.IPW.allPop[,1] ), 2 )
cfs.IPW.allPop$CI_Upper <- round( exp( CIs.IPW.allPop[,2] ), 2 )
# for subpopulation at risk for TRS (by SOF and Sx criteria)
desObj.TRS <- svydesign( ids = ~ 0, weights = ~IPW, data = assocX.TRSpop )
IPW.fit.TRS <- svyglm( formStr, design=desObj.TRS, family = quasibinomial(link = "logit"))
CIs.IPW.TRS <- confint( IPW.fit.TRS )
cfs.IPW.TRSpop <- data.frame( coef(summary(IPW.fit.TRS)) )
names( cfs.IPW.TRSpop ) <- c("Beta","StdError","Z","P")
# p-values
p.vals <- round( cfs.IPW.TRSpop$P, 3 )
cfs.IPW.TRSpop$SigP <- ifelse( cfs.IPW.TRSpop$P < 0.05, "*", "")
cfs.IPW.TRSpop$OR <- round( exp( cfs.IPW.TRSpop$Beta ), 2 )
cfs.IPW.TRSpop$CI_Lower <- round( exp( CIs.IPW.TRS[,1] ), 2 )
cfs.IPW.TRSpop$CI_Upper <- round( exp( CIs.IPW.TRS[,2] ), 2 )And a summary of the statistically significant results to compare with regular GLMs:
First, for the whole population:
kable( cfs.IPW.allPop[ which( cfs.IPW.allPop$P < 0.05 ), ] )| Beta | StdError | Z | P | SigP | OR | CI_Lower | CI_Upper | |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | -3.9667841 | 1.055150 | -3.759450 | 0.0001782 | * | 0.02 | 0.00 | 0.15 |
| N | 0.0551971 | 0.023671 | 2.331846 | 0.0198675 | * | 1.06 | 1.01 | 1.11 |
And the at risk of TRS sub-population:
kable( cfs.IPW.TRSpop[ which( cfs.IPW.TRSpop$P < 0.05 ), ] )| Beta | StdError | Z | P | SigP | OR | CI_Lower | CI_Upper | |
|---|---|---|---|---|---|---|---|---|
| age | -0.0455579 | 0.0210663 | -2.1626 | 0.0309256 | * | 0.96 | 0.92 | 1 |
Multiple univariate analyses for differences in baseline and demographic data for the TRS versus treatment responsive cases showed that the only difference was in the proportion of participants on depot medication at baseline.
There were 22 participants on decanoate at baseline or which 0 went on to be in the TRS group.
With IPW weighting, multiple-variable GLMs for associations of baseline and demographic data with TRS cases in the whole population showed only very small effects which are unlikely to be clinically meaningful:
In the whole population analyses, higher negative symptoms were weakly associated 95% CI = [1.01 , 1.11] with the probability of becoming TRS
In the at-risk population, younger age was associated with a small reduction in the probability of transitioning to TRS with a 95% CI = [0.92 , 1.09]