1 Setup

See CATIE TRS: Missing Data Analysis - v3.0 for loss to follow-up analyses, that finishes by storing survival_tabulated_data_v3.RData for use in this notebook.

testTable : exposing differences in baseline data for missing-to-follow-up and the ancillary prop.race and prop.marital for the multi-level baseline variables race and marital.
allIDs is to verify the integrity or row alignment of missingTab with tab.Surv
tab.Surv is the table of data used for survival analyses / descriptives
missingTab is tab.Surv formatted for missing data analyses

This notebook details how to analyse tab.Surv for survival curves, incidence rate estimation and accounting for missing data at baseline.

rm( list = ls() )
require(dplyr)
require(knitr)
require(kableExtra)
options(knitr.kable.NA = '')
require(reshape2)
load("survival_tabulated_data_v3.RData")

2 Data Dictionary for `tab.Surv`

The data.frame tab.Surv data dictionary is a little complex, so there details are:

Identifiers:

ID - Participant ID number

Timing:

time.inTrial - Total time in the trial – this will be the contributed person years for this participant
time.inTrialYrs = time.inTrial / 365 (in years)
Time of “onset” of individual criteria including if a participant ever meets individual criteria
- If NA the participant never met Sx, SOF or Rx criteria
- Otherwise, time they met criteria in days
- onset.Sx : the day absolute symptoms triggered above threshold (this does not mean the change in symptoms meets TRS criteria < 20%)
- onset.SOF : the day SOF triggered above threshold
- onset.Rx : the day adequate trials triggered threshold of \(\geq 2\)
Timing of “onsets” for all three TRS criteria
- If NA the participant never met the criteria, otherwise, times are in days
- time.onset.TRS : the time at which Sx and SOF triggered above threshold, denoting the baseline for which PANSS is assessed
- time.onset.TRSYrs is time.onset.TRS in years
- time.TRS : the time confirmed TRS (if at all) = some day after time.onset.TRS and adequate treatment condition met.
- Note, at the time onset.Rx (i.e. when adequate treatments \(\geq 2\)), we then check the corresponding Sx and SOF for response, and if none (by the criteria in TRRIP) we label this as the time TRS event happens
- time.TRSYrs = time.TRS in years

Summary of treatments:

numAdeq : the number of adequate trials (by TRRIP criteria on dose, duration, concordance)
- durAdeq : the total duration of adequate trials
- totalRx : total number of drug treatments, whether adequate or not
Note, if any are NA then this data is missing / could not be established from public CATIE datasets

Summary of “Caseness”: (independent of time)

status.rx : the participant did at some point meet TRS criteria for adequate treatments
status.sx : the participant did at some point meet TRS criteria for symptom persistance
status.sof : the participant did at some point meet TRS criteria for SOF
All above are 0/1, with 1 meaning positive, 0 negative
status.TRS : overall, flag for TRS caseness where 1 = censored (did not convert) 2 = converted to TRS

TRS domains: (Only for those participants reaching full TRS criteria with status.TRS = 2)

TRS.pos : treatment resistance in the positive domain
TRS.neg : treatment resistance in the negative domain

Missing Data Flags: (missing enough data that follow-up cannot be performed)

If participants have less than 1 record or data point for symptom, SOF, or treatment record, then these = 1, else 0
missing.Sx : less than 1 record for symptom trajectory
missing.Rx : less than 1 record for treatments trajectory
missing.SOF : less than 1 record for SOF trajectory

3 Modelling Missing Baseline Data

Total participants in tab.Surv retrieved from CATIE data = 1444

From missingDataAnalysis_v3_0.Rmd – we know that the following variables differed between those missing data for TRS assessment and those with data present:

kable( testTable$Var[ which( testTable$P.Value < 0.05 ) ] )

x
exac3mo
yrsEduc
olzB0
otherB0
race
Lipid
STI
alcDep_5yrs
alcAbuse_5yrs
drugDep_5yrs
time.inTrialYrs

exac3months (exacerbation in 3 months prior to start of the trial) would reasonably predict missingness
yrsEduc (years of education) similarly, would be reasonbly expected to predict missingness
olzB0 (on olanzapine at baseline) and otherB0 (taking an antipsychotic medication not used in CATIE) suggests that baseline medication might be relevant
race would be a standard demographic to include, and was different between groups
Lipid (having high lipids) is likely because there were more patients on olanzapine in those not missing data than those missing, so raised lipids (also high in those not missing data) makes sense
STI (having a sexually transmitted infection) is less interesting and not clearly explicable for being more prevalent in those missing data
alcDep_5yrs, alcAbuse_5yrs and drugDep_5yrs all point toward substance use, which would be expected to influence missingness

The original CATIE paper

Lieberman, J. A., Stroup, T. S., McEvoy, J. P., Swartz, M. S., Rosenheck, R. A., Perkins, D. O., … Hsiao, J. K. (2005). Effectiveness of Antipsychotic Drugs in Patients with Chronic Schizophrenia. New England Journal of Medicine, 353(12), 1209–1223.

identifies baseline/demographic for the population. We use these as the basis for building a model of missingness.

NEJM.pred <- c("age", "sex", "race", "yrsEduc", "marital", "employFT", "exac3mo", "P", "N", "G",
               "CGIsev", "yrsFirstTx", "yrsFrstAntiPsyRx", "Depression", "alcDep_5yrs", "alcAbuse_5yrs",
               "drugDep_5yrs", "drugAbuse_5yrs", "OCD_5yrs", 
               "olzB0", "quetB0", "rispB0", "zipB0", "halB0", "decaB0", "perB0", "otherB0",
               "DM","Lipid","HTN", "missingFU.any")

Resulting in 98 participants for which one of these variables is not available. We’ll refer back to the multiple univariate analysis to see if these differed between missing / non-missing groups to justify dropping them for further missing-ness analyses:

NEJM.missing <- missingTab[ , NEJM.pred ]

find.missing <- apply( NEJM.missing, 2, is.na )
missing.labels <- apply( find.missing, 2, which )

# number missing each variable
kable( t( lapply( missing.labels, length ) ) )

age	sex	race	yrsEduc	marital	employFT	exac3mo	P	N	G	CGIsev	yrsFirstTx	yrsFrstAntiPsyRx	Depression	alcDep_5yrs	alcAbuse_5yrs	drugDep_5yrs	drugAbuse_5yrs	OCD_5yrs	olzB0	quetB0	rispB0	zipB0	halB0	decaB0	perB0	otherB0	DM	Lipid	HTN	missingFU.any
0	0	0	8	0	10	0	1	2	1	5	38	33	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Of those demographic variables which have participants missing data on these variables (i.e. non-zero in the above table):

yrsEduc did differ between groups (different dispersions, same average) so we’ll tolerate the loss of 8 participants in our overall sample
employFT did not differ in our earlier analysis of differences between misisng groups, so we will omit this to preserve overall numbers
symptoms P,N and G did not either, so we will omit this also
CGIsev did not differ, so we’ll omit
neither yrsFirstTx of yrsFrstAntiPsyRx (years first treated for any disorder and years since 1st antipsychotic medication respectively) were different in the univariable analysis, so we’ll omit

Construct a new table for use in modelling missingness:

rm( NEJM.pred )
rm( NEJM.missing )

NEJM.missing <- missingTab
NEJM.missing$ID <- allIDs$x

# variables to use:
NEJM.pred <- c("ID", "age", "sex", "race", "yrsEduc", "marital", 
               "exac3mo", "Depression", "alcDep_5yrs", "alcAbuse_5yrs",
               "drugDep_5yrs", "drugAbuse_5yrs", "OCD_5yrs", 
               "olzB0", "quetB0", "rispB0", "zipB0", "halB0", "decaB0", "perB0", "otherB0",
               "DM","Lipid","HTN", "missingFU.any")


NEJM.missing <- NEJM.missing[ , NEJM.pred ]
find.missing <- apply( NEJM.missing, 2, is.na )
missing.labels <- apply( find.missing, 2, which )

Now, re-construct tab.Surv and missingTab to exclude participants with missing yrsEduc

# IDs of participants missing demographics yrsEduc
remove.IDs <- NEJM.missing$ID[ which( complete.cases( NEJM.missing ) == FALSE ) ]

# locate IDs in missingTab and tab.Surv
tab.Surv   <- tab.Surv[ -which( tab.Surv$ID  %in% remove.IDs  ),  ]
# NB: we overwrite missingTab with the new set of variables to model for IPW later on 
missingTab <- NEJM.missing[ -which( NEJM.missing$ID %in% remove.IDs ), ]

We will use the variables now stored in missingTab of a total 1436 to inverse probability weight those participants missing TRS data.

4 Estimating Inverse Probability Weights for Missing-To-Follow-Up Cases

In the missing-data analysis, we found differences between participants who were missing data at follow-up (preventing analysis for TRS status) and those that had complete data (for which we can easily establish TRS status at follow-up).

To counter bias introduced by missing data, we plan to use stabilised inverse probability (IP) weighting for cases that are missing to follow-up (MTFU). For discussion of stabilised IP weights see:

Hernán MA, Robins JM (2018). Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming, Chapter 12 (available : https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2018/02/hernanrobins_v2.17.18.pdf)

For this problem, we have:

Let \(M_i = 1\) if participant \(i\) is missing data for any of the TRRIP symptoms, adequate treatment or social/occupational functioning (SOF) criteria. Conversely, \(M_i = 0\) if the data is complete at follow-up which means we can assess the participant for treatment resistance status (TRS)
If \(\Pr( M_i = 1 )\) be the probability that a participant is missing follow-up data then \(\Pr( M_i = 0 ) = 1 - \Pr( M_i = 1 )\) is the probability of not missing data.
The probability of participant \(i\) being missing given the baseline demographic variables \(X_i\) is \(\Pr(M_i = 1 | X_i )\) and conversely, the probability of not being missing given baseline demographics is \(1-\Pr(M_i = 1 | X_i )\)

Then, the stabilised IP weight for participant \(i\), given their baseline predictor (demographic) variables is:

\[ w_i = \frac{M_i \Pr(M_i = 1) + (1-M_i) (1-\Pr(M_i = 1)) } {M_i \Pr( M_i = 1 | X_i ) + (1-M_i)(1-\Pr(M_i = 1 | X_i)) } \] A model for the conditional probabilities \(\Pr(M_i = 1|X_i)\) is estimated using logistic regression glm():

# demographic variables at baseline
missingX <- names( missingTab )

# remove *outcomes* and TRS-related variables from missingX, given these are derived by our algorithm to
# implement TRRIP, we are only really interested in determining follow up missingness from CATIE demographic data
missingX <- missingX[ -which( missingX %in% 
                                 c("ID", "missingFU.rx","missingFU.sx","missingFU.sof","missingFU.any") ) ]

# store IDs
allIDs <- missingTab$ID

# build a formula 
# In effect we want : missingFU.any ~ missingX
formStr <- as.formula( paste("missingFU.any", paste(missingX, collapse=" + "), sep=" ~ ") )

# fit the model
missM <- glm( formStr , family = binomial, data = missingTab )

# build a table of results
  cfs    <- data.frame( coef(summary(missM)) )
  names( cfs ) <- c("Beta","StdError","Z","P")
  # p-values
  p.vals      <- cfs$P
  cfs$SigP    <- ifelse( cfs$P < 0.05, "*", "")
  cfs$OR      <- exp( cfs$Beta )

missM is the fitted logistic regression for baseline/demographics and the probability of having data MTFU.

Each participant’s probability \(\Pr(M_i = 1 | X_i)\) is obtained using predict() method for the GLM asking for the actual probability (type = "response")

# indicator - missingFU.any
M   <- missingTab$missingFU.any
# probability of being missing at follow given demographics, X
p.M_X <- predict( missM, type = "response" )
# probability that missingFU.any == 1
p.M   <- length( which( missingTab$missingFU.any == 1 ) ) / nrow( missingTab )
# compute each participant's stabilised inverse probability weight
weight.numer <- M * p.M + (1-M) * (1 - p.M)
weight.denom <- (M * p.M_X) + ( (1 - M) * ( 1 - p.M_X ) )

# build a table if IDs and corresponding IPWs
ID.IPW <-  data.frame( ID = allIDs, IPW = weight.numer / weight.denom )

IPWs estimated on missingTab and the corresponding IDs are allIDs – to guarantee integrity, we’ll formally join ID.IPW on ID with tab.Surv

tab.Surv <- left_join( tab.Surv, ID.IPW, by = "ID")

The IPW data becomes useful later on, but we’ll park it for now.

5 Survival Estimates

require(survival)
require(survminer)
require(gridExtra)

5.1 Cumulative Event Curves

# extract the variables to use
curveData <- tab.Surv[ , c("time.TRSYrs", "time.inTrialYrs", "time.onset.TRSYrs", "status.TRS", "IPW")]

# -- for people *never* triggering / being at risk for TRS (by SOF or Sx) 
  # -- set time.TRSYrs = Inf
  curveData$time.TRSYrs[ is.na( curveData$time.TRSYrs ) ] <- Inf
  # -- set time.onset.TRSYrs = Inf
  curveData$time.onset.TRSYrs[ is.na( curveData$time.onset.TRSYrs ) ] <- Inf

5.1.1 For Whole Population

allPop <- curveData

# subtract off the "at risk" time from the time.inTrialYrs for the TRS cases
allPop$time.interval <- allPop$time.inTrialYrs - allPop$time.onset.TRSYrs

# whereas for the NON TRS cases, the time interval is just the time.inTrialYrs
allPop$time.interval[ allPop$time.interval < 0 ] <- allPop$time.inTrialYrs[ allPop$time.interval < 0 ]

allPop.fit  <- survfit(Surv(time = time.interval,
                             status.TRS == 2, type = "right") ~ 1, data = allPop)

# graph this
ggsurvplot(allPop.fit, fun = "event", main = "Cumulative proportion", palette = "blue",
           legend = "none",
           title = paste("Whole Population =", allPop.fit$n)
           )$plot

5.1.2 At-Risk for TRS Cases

We define a “high risk” sub-population of people who (during their participation in CATIE) trigger TRS threshold for absolute symptoms severity and moderate SOF impairment. We then examine this group to see how those that progress to TRS (i.e. symptoms continue at above threshold, and who do not obtain 20% improvement despite two adequate trials).

# -- for cases NEVER reaching high risk for TRS
TRS.risk.cases <- allPop[ allPop$time.onset.TRSYrs < Inf, ]

# -- for the sub-population of those participants who at some point triggered on Sx and SOF, look at survival
TRS.risk.cases$time.TRSYrs[ which( is.na( TRS.risk.cases$time.TRSYrs ) ) ] <- Inf

# -- time.interval is the time of onset of TRS Sx and SOF to participant leaving the trial
TRS.risk.cases$time.interval <- TRS.risk.cases$time.inTrialYrs - TRS.risk.cases$time.onset.TRSYrs

TRS.KM.fit  <- survfit(Surv(time = time.interval,
                            status.TRS == 2, type = "right") ~ 1, data = TRS.risk.cases)

ggsurvplot(TRS.KM.fit, fun = "event", main = "Cumulative proportion", palette = "red",
           legend = "none",
           title = paste("Sub-Population At Risk =", TRS.KM.fit$n)
           )$plot

5.2 Addendum

An attempt at using IPWs with the survival / event curves: the curve is not different in any interesting way. Included here for completeness and demonstration.

require(survey)
desObj.TRS <- svydesign( ids = ~ 0, weights = ~IPW, data = TRS.risk.cases )
s1<-svykm(Surv(time = time.interval, status.TRS == 2, type = "right")~1, design=desObj.TRS)

desObj.allPop <- svydesign( ids = ~ 0, weights = ~IPW, data = allPop )
s2<-svykm(Surv(time = time.interval, status.TRS == 2, type = "right")~1, design=desObj.allPop)

5.3 Crude Incidence Rates

Non-parametric estimate of the crude incidence rate for TRS by bootstrapping

require( boot )
yrsFactor <- 100

bootIR <- function(data, idx, mult) {
  d            <- data[idx,] 
  person.years <- sum( d$perYrs )
  events       <- sum( d$eventFlag, na.rm = TRUE)
  incid.rate   <- mult * ( events / person.years )
  return( incid.rate )
} 

# dataframe of TRS events and person years
bs.allPop <- data.frame(  perYrs = allPop$time.interval,
                          eventFlag = ifelse( allPop$status.TRS == 2, 1, 0 ) )

bs.subPop <- data.frame(  perYrs = TRS.risk.cases$time.interval,
                          eventFlag = ifelse( TRS.risk.cases$status.TRS == 2, 1, 0 ) )

# bootstrap incidence rate for whole population
results.allPop <- boot( data=bs.allPop, statistic=bootIR,
                        R=5000, mult = yrsFactor, parallel = "multicore", ncpus=4 )

# boostrap incidence rate for the sub-population of "at risk" cases
results.subPop <- boot( data=bs.subPop, statistic=bootIR,
                        R=5000, mult = yrsFactor, parallel = "multicore", ncpus=4 )


# get CIs 
results.allPop.CI <- boot.ci(results.allPop, type = "all", parallel = "multicore", ncpus = 4)
results.subPop.CI <- boot.ci(results.subPop, type = "all", parallel = "multicore", ncpus = 4)

For the whole population of 1436, we obtain a bootstrapped incidence rate of 5.43 per 100 person-years, with a 95% confidence interval of [4.29, 6.75]
For the sub-population of 740 patients who met symptom and SOF criteria, and then either responded (non-treatment resistance) or went on to develop treatment resistance after 2 adequate trials, we obtain a bootstrapped incidence rate of 9.61 per 100 person-years, with a 95% confidence interval of [7.66, 11.92]

6 Differences in TRS-Cases Compared to Population

We now explore differences in the TRS cases, compared to the population as a whole, using multiple univariate analyses to begin with:

# we'll use the NEJM paper variables (but take off the ID variable first)
predToUse <- NEJM.pred[2:length(NEJM.pred)]

tab.Surv2 <- tab.Surv[ , which( names( tab.Surv ) %in% predToUse ) ]
trs.Flag  <- tab.Surv$status.TRS

# create vector of data types (numeric, ordinal/factor)
dataType <- as.numeric( unlist( lapply(tab.Surv2, function(x) if (!is.numeric(x)) NA else max(x, na.rm = TRUE))) )

# parse dataType and decide on a relevant univariate test - e.g. Chi square for categorical proportions and t and KS for continuous
testFlag <- rep(NA, length( dataType ) )

for( i in 1:length( dataType ) ) {
  if( dataType[i] == 1 | is.na( dataType[i] ) ) {
    # categorical (factor) or binary : proportion test - Chi-square : code = 1
    testFlag[i] <- 1
  } else {
    if( dataType[i] > 1 ) {
      # continuous, therefore KS and t-test
      testFlag[i] <- 2
    }
  }
}

demoTable <- data.frame( Var = rep("", length( dataType ) ),
                         M.TRS          = rep(NA, length( dataType ) ),
                         Spread.TRS     = rep(NA, length( dataType ) ),
                         M.NTR          = rep(NA, length( dataType ) ),
                         Spread.NTR     = rep(NA, length( dataType ) ),
                         Perc.TRS       = rep(NA, length( dataType ) ),
                         Perc.NTR       = rep(NA, length( dataType ) ),
                         P.Value            = rep(NA, length( dataType ) ),
                         Test.Used          = rep("", length( dataType ) ),
                         stringsAsFactors = FALSE
)

# for all variables in testFlag ...
for( i in 1:length( testFlag ) ) {
  thisVar <- names( tab.Surv2 )[i]
  demoTable$Var[i] <- thisVar
  
  # select the TRS positive cases
  grpTRS    <- tab.Surv2[ which( trs.Flag == 2 ), i ]
  # and the non TRS (NTR) cases
  grpNTR    <- tab.Surv2[ which( trs.Flag != 2 ), i ]
  
  # if this variable's testFlag == 1, it's ordinal / categorical data
  if( testFlag[i] == 1 ) {
    test.tab <- rbind( table( grpTRS ), table( grpNTR ) )
    rownames(test.tab) <- c("TRS","NTR")
    
    suppressWarnings(
      xs <- chisq.test( test.tab )
    )
    demoTable$Perc.TRS[i]  <- round( length( which( grpTRS == 1 ) ) / length( grpTRS ) * 100, 2 )     # proportion of 1's to 0s in subjects with present data 
    demoTable$Perc.NTR[i]  <- round( length( which( grpNTR == 1 ) ) / length( grpNTR ) * 100, 2 )   
    ## proportion of 1's to 0s in subjects with Missing data 
    demoTable$P.Value[i]       <- round( xs$p.value, 3 )
    demoTable$Test.Used[i]     <- "ChiSq"
    
  } else {
    # This variable is continuous
    # We need either KS and t-test 
    
    # test normality with Shapiro-Wilks
    this.SW.test.TRS    <- shapiro.test( grpTRS )
    this.SW.test.NTR    <- shapiro.test( grpNTR )
    
    if( this.SW.test.TRS$p.value < 0.05 | this.SW.test.NTR$p.value < 0.05 ) {
      # if p.value of either SW test < 0.05, then use KS (i.e. at least one distribution is non-normal)        
      suppressWarnings(
        this.test <- ks.test( grpTRS, grpNTR, alternative = "two.sided" )      
      )
    } else {
      # both missing and non-missing data are normal enough
      suppressWarnings(
        this.test  <- t.test( grpTRS, grpNTR, alternative = "two.sided" )
      )
    }
    
    if( this.test$method == "Two-sample Kolmogorov-Smirnov test" ) {
      # record median and IQR
      demoTable$M.TRS[i]      <- median( grpTRS, na.rm = TRUE )
      demoTable$Spread.TRS[i] <- IQR( grpTRS, na.rm = TRUE )
      demoTable$M.NTR[i]      <- median( grpNTR, na.rm = TRUE )
      demoTable$Spread.NTR[i] <- IQR( grpNTR, na.rm = TRUE )
      demoTable$P.Value[i]    <- round( this.test$p.value, 3 )
      demoTable$Test.Used[i]  <- "KS"
    }
    
    if( this.test$method == "Welch Two Sample t-test" ) {
      # record median and IQR
      demoTable$M.TRS[i]      <- mean( grpTRS, na.rm = TRUE )
      demoTable$Spread.TRS[i] <- sd( grpTRS, na.rm = TRUE )
      demoTable$M.NTR[i]      <- mean( grpNTR, na.rm = TRUE )
      demoTable$Spread.NTR[i] <- sd( grpNTR, na.rm = TRUE )
      demoTable$P.Value[i]        <- round( this.test$p.value, 3 )
      demoTable$Test.Used[i]      <- "T"
      
    }
    
  }
}

# add a significance flag (purely cosmetic)
demoTable$signif <- ifelse( demoTable$P.Value < 0.05, "*", " " )

kable(demoTable)

Var	M.TRS	Spread.TRS	M.NTR	Spread.NTR	Perc.TRS	Perc.NTR	P.Value	Test.Used	signif
age	42	15.00	42	17.0			0.457	KS
sex					76.06	73.56	0.745	ChiSq
exac3mo					23.94	26.75	0.702	ChiSq
yrsEduc	12	1.75	12	1.5			1.000	KS
olzB0					28.17	26.99	0.935	ChiSq
quetB0					9.86	9.83	1.000	ChiSq
rispB0					23.94	22.50	0.891	ChiSq
zipB0					5.63	5.04	1.000	ChiSq
halB0					8.45	5.51	0.435	ChiSq
decaB0					0.00	1.65	0.000	ChiSq	*
perB0					1.41	2.20	0.977	ChiSq
otherB0					9.86	8.81	0.930	ChiSq
race					0.00	0.00	0.325	ChiSq
marital					0.00	0.00	0.908	ChiSq
DM					11.27	10.78	1.000	ChiSq
Lipid					14.08	14.08	1.000	ChiSq
HTN					18.31	19.83	0.873	ChiSq
Depression					21.13	28.09	0.255	ChiSq
alcDep_5yrs					14.08	13.69	1.000	ChiSq
alcAbuse_5yrs					12.68	18.02	0.324	ChiSq
drugDep_5yrs					14.08	16.29	0.745	ChiSq
drugAbuse_5yrs					14.08	20.46	0.250	ChiSq
OCD_5yrs					4.23	5.19	0.934	ChiSq

7 Final Participant “Flow”

Total Participants retrieved from CATIE dataset = 1436

Sub-populations:

Trigger Sx and SOF above absolute moderate severity threshold = 740
- Missing 0 cases
- Treatment Resistance (TRS) Cases = 71
- Responsive (NTR) Cases = 669
- Subset of NTR cases who (i) have a TRS onset, but only 1 adequate trial – so those that might be TRS but we can’t be sure due to right-censoring – results in 197
Do not trigger Sx and SOF above absolute moderate severity threshold = 602
- Missing 94 cases

8 TRS Domains

domainTotals <- table( tab.Surv[ which( tab.Surv$status.TRS == 2 ), which( names( tab.Surv ) %in% c("TRS.pos","TRS.neg") ) ] )

For participants meeting TRS criteria, we find that :

TRS in positive domain only = 17
TRS in negative domain only = 8
TRS in both positive and negative domain = 46

9 Associations of TRS With Demographics and Baseline Clinical State

We now model the associations of demographics and baseline clinical state with the TRS outcome. More broadly, we are interested if we can predict TRS propsectively using baseline information, and we start with a classical inferential analysis

9.1 In Whole Population

We start from the expectation this is unlikely to provide useful information – if a participant never “triggers” on TRS Sx and SOF criteria, then it’s unlikely that a collection of baseline, demographic and clinical state information is going allow reliable inferences on future TRS likelihood.

We will use the original collated data (tab.Surv) rather than that cleaned for survival analyses.

assocX.allPop <- tab.Surv

excludeCols <- c("ID", "time.inTrial", "onset.Sx", "onset.SOF",
  "onset.Rx", "time.onset.TRS", "time.TRS", "numAdeq",
  "durAdeq", "totalRx", "status.rx", "status.sx", "status.sof",
  "TRS.pos", "TRS.neg", "missing.Sx", "missing.Rx",
  "missing.SOF", "time.inTrialYrs", "time.onset.TRSYrs",
  "time.TRSYrs", "status.TRS", "IPW")

indepVars <- names( assocX.allPop )[ -which( names( assocX.allPop ) %in% excludeCols ) ]

# recode status.TRS to be 0 = non-TRS and 1 = TRS
assocX.allPop$status.TRS <- ifelse( assocX.allPop$status.TRS == 2, 1, 0 )

formStr <- as.formula( paste("status.TRS", paste(indepVars, collapse=" + "), sep=" ~ ") )

# model
assoc.fit <- glm( formStr, family = binomial, data = assocX.allPop )

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

We see that some variables have very large standard errors, particularly Osteopor, STI and decaB0 (being on decanoate depot at baseline) – all of these variables have very few cases (10, 11 and 22 respectively).

Retry with these removed:

indepVars <- indepVars[ -which( indepVars %in% c("Osteopor","STI","decaB0") ) ]
formStr <- as.formula( paste("status.TRS", paste(indepVars, collapse=" + "), sep=" ~ ") )
assoc.fit.allPop <- glm( formStr, family = binomial, data = assocX.allPop )

CIs.allPop <- confint( assoc.fit.allPop )

cfs.allPop    <- data.frame( coef(summary(assoc.fit.allPop)) )
  names( cfs.allPop ) <- c("Beta","StdError","Z","P")
  # p-values
  p.vals              <- round( cfs.allPop$P, 3 )
  cfs.allPop$SigP     <- ifelse( cfs.allPop$P < 0.05, "*", "")
  cfs.allPop$OR       <- round( exp( cfs.allPop$Beta ), 2 )
  cfs.allPop$CI_Lower <- round( exp( CIs.allPop[,1] ), 2 )
  cfs.allPop$CI_Upper <- round( exp( CIs.allPop[,2] ), 2 )


kable( cfs.allPop )

	Beta	StdError	Z	P	SigP	OR	CI_Lower	CI_Upper
(Intercept)	-3.8560527	1.1622044	-3.3178782	0.0009070	*	0.02	0.00	0.20
P	0.0325661	0.0311378	1.0458692	0.2956214		1.03	0.97	1.10
N	0.0584261	0.0245950	2.3755252	0.0175240	*	1.06	1.01	1.11
G	-0.0293332	0.0206176	-1.4227270	0.1548153		0.97	0.93	1.01
age	-0.0374996	0.0186209	-2.0138469	0.0440256	*	0.96	0.93	1.00
sex	0.1226414	0.3361149	0.3648794	0.7152014		1.13	0.60	2.25
exac3mo	-0.4274849	0.3203560	-1.3344057	0.1820709		0.65	0.34	1.19
yrsEduc	0.0279642	0.0391961	0.7134437	0.4755712		1.03	0.96	1.12
CGIsev	0.2608167	0.1730918	1.5068110	0.1318591		1.30	0.92	1.82
employFT	-0.8836733	0.7504795	-1.1774783	0.2390047		0.41	0.07	1.44
yrsFirstTx	0.0330714	0.0220781	1.4979272	0.1341522		1.03	0.99	1.08
yrsFrstAntiPsyRx	-0.0262341	0.0231123	-1.1350691	0.2563464		0.97	0.93	1.02
olzB0	0.1051939	0.3249431	0.3237302	0.7461423		1.11	0.58	2.09
quetB0	-0.3498919	0.5032585	-0.6952529	0.4868968		0.70	0.23	1.74
rispB0	0.0895717	0.3367585	0.2659821	0.7902530		1.09	0.56	2.09
zipB0	0.1117390	0.5778191	0.1933807	0.8466609		1.12	0.31	3.16
halB0	0.0309962	0.5623595	0.0551182	0.9560443		1.03	0.29	2.80
perB0	-0.4211818	1.0651089	-0.3954355	0.6925215		0.66	0.04	3.52
otherB0	0.1355289	0.4486646	0.3020717	0.7625974		1.15	0.44	2.61
raceHispanic	0.6264134	0.4275941	1.4649719	0.1429286		1.87	0.79	4.29
raceOther	-0.9650276	1.0560850	-0.9137784	0.3608333		0.38	0.02	2.01
raceWhite	0.4904827	0.3322473	1.4762580	0.1398747		1.63	0.86	3.20
maritalNeverMarried	-0.1771684	0.4522228	-0.3917723	0.6952264		0.84	0.36	2.19
maritalPrevMarried	0.0112907	0.4763539	0.0237024	0.9810900		1.01	0.41	2.74
COPD	-0.1976583	1.0588191	-0.1866781	0.8519130		0.82	0.04	4.36
DM	0.3424323	0.4532572	0.7554923	0.4499536		1.41	0.55	3.29
HepABC	-0.0100721	0.6371844	-0.0158072	0.9873882		0.99	0.23	3.00
Lipid	0.1699361	0.3902825	0.4354182	0.6632589		1.19	0.52	2.46
HTN	0.0689262	0.3880988	0.1775996	0.8590374		1.07	0.48	2.24
IHD	1.1168385	1.1305364	0.9878837	0.3232096		3.06	0.15	19.78
OsteoArth	-0.4175183	0.7720972	-0.5407587	0.5886739		0.66	0.10	2.42
Depression	-0.4324858	0.3345832	-1.2926108	0.1961457		0.65	0.33	1.22
alcDep_5yrs	0.4178658	0.4390143	0.9518273	0.3411846		1.52	0.62	3.50
alcAbuse_5yrs	-0.3254903	0.4396680	-0.7403093	0.4591124		0.72	0.29	1.64
drugDep_5yrs	0.0631063	0.4173850	0.1511943	0.8798224		1.07	0.45	2.33
drugAbuse_5yrs	-0.3920259	0.4063443	-0.9647630	0.3346635		0.68	0.29	1.44
OCD_5yrs	-0.1262051	0.6525960	-0.1933893	0.8466541		0.88	0.20	2.76
anxDis_5yrs	0.0501536	0.4146300	0.1209598	0.9037229		1.05	0.44	2.27

9.2 At Risk Sub-Population

We repeat the analysis in the sub-population who triggered on symptoms and SOF impairment.

# subset assocX.TRSpop for only sub-population at risk
assocX.TRSpop <- tab.Surv[ which( !is.na( tab.Surv$time.onset.TRSYrs ) ), ]

# recode status.TRS to be 0 = non-TRS and 1 = TRS
assocX.TRSpop$status.TRS <- ifelse( assocX.TRSpop$status.TRS == 2, 1, 0 )

formStr <- as.formula( paste("status.TRS", paste(indepVars, collapse=" + "), sep=" ~ ") )

# model
assoc.fit.TRSpop <- glm( formStr, family = binomial, data = assocX.TRSpop )

CIs.TRSpop <- confint( assoc.fit.TRSpop )

## Waiting for profiling to be done...

cfs.TRSpop    <- data.frame( coef(summary(assoc.fit.TRSpop)) )
  names( cfs.TRSpop ) <- c("Beta","StdError","Z","P")
  # p-values
  p.vals              <- round( cfs.TRSpop$P, 3 )
  cfs.TRSpop$SigP     <- ifelse( cfs.TRSpop$P < 0.05, "*", "")
  cfs.TRSpop$OR       <- round( exp( cfs.TRSpop$Beta ), 2 )
  cfs.TRSpop$CI_Lower <- round( exp( CIs.TRSpop[,1] ), 2 )
  cfs.TRSpop$CI_Upper <- round( exp( CIs.TRSpop[,2] ), 2 )


kable( cfs.TRSpop )

	Beta	StdError	Z	P	SigP	OR	CI_Lower	CI_Upper
(Intercept)	-1.7359420	1.2609199	-1.3767266	0.1685968		0.18	0.01	2.04
P	0.0248019	0.0317902	0.7801755	0.4352876		1.03	0.96	1.09
N	0.0432919	0.0259977	1.6652188	0.0958691		1.04	0.99	1.10
G	-0.0322634	0.0211672	-1.5242119	0.1274558		0.97	0.93	1.01
age	-0.0491597	0.0196928	-2.4963238	0.0125488	*	0.95	0.91	0.99
sex	-0.0849137	0.3521116	-0.2411555	0.8094346		0.92	0.47	1.89
exac3mo	-0.3369556	0.3288105	-1.0247714	0.3054710		0.71	0.36	1.33
yrsEduc	0.0389660	0.0405255	0.9615180	0.3362918		1.04	0.96	1.13
CGIsev	0.1794081	0.1792693	1.0007743	0.3169359		1.20	0.84	1.71
employFT	-0.6275612	0.7706953	-0.8142792	0.4154850		0.53	0.08	1.96
yrsFirstTx	0.0337664	0.0232139	1.4545743	0.1457872		1.03	0.99	1.08
yrsFrstAntiPsyRx	-0.0271606	0.0240550	-1.1291028	0.2588545		0.97	0.93	1.02
olzB0	0.0114301	0.3398664	0.0336311	0.9731713		1.01	0.51	1.96
quetB0	-0.3651252	0.5126715	-0.7122010	0.4763403		0.69	0.23	1.75
rispB0	0.0992386	0.3488764	0.2844519	0.7760641		1.10	0.55	2.17
zipB0	-0.1085051	0.6024496	-0.1801065	0.8570689		0.90	0.24	2.68
halB0	-0.0090881	0.5725875	-0.0158720	0.9873365		0.99	0.28	2.76
perB0	-0.4801902	1.0949996	-0.4385300	0.6610021		0.62	0.03	3.62
otherB0	-0.0983109	0.4555132	-0.2158244	0.8291246		0.91	0.34	2.10
raceHispanic	0.7226005	0.4457252	1.6211794	0.1049792		2.06	0.84	4.90
raceOther	-1.0809395	1.0706292	-1.0096301	0.3126725		0.34	0.02	1.87
raceWhite	0.4726302	0.3449156	1.3702779	0.1706002		1.60	0.83	3.22
maritalNeverMarried	-0.2101868	0.4697986	-0.4473976	0.6545880		0.81	0.34	2.18
maritalPrevMarried	-0.0167125	0.4991615	-0.0334812	0.9732908		0.98	0.38	2.77
COPD	-0.0074840	1.0981369	-0.0068152	0.9945623		0.99	0.05	5.93
DM	0.1928834	0.4597671	0.4195241	0.6748331		1.21	0.47	2.88
HepABC	-0.2061348	0.6445889	-0.3197927	0.7491255		0.81	0.18	2.51
Lipid	0.0425896	0.3968661	0.1073149	0.9145391		1.04	0.46	2.20
HTN	0.1919175	0.4025883	0.4767092	0.6335692		1.21	0.53	2.62
IHD	1.1037565	1.2020118	0.9182577	0.3584840		3.02	0.14	24.30
OsteoArth	-0.6030687	0.7910060	-0.7624073	0.4458169		0.55	0.08	2.10
Depression	-0.4481845	0.3467715	-1.2924489	0.1962017		0.64	0.31	1.23
alcDep_5yrs	0.5231161	0.4628422	1.1302255	0.2583812		1.69	0.66	4.10
alcAbuse_5yrs	-0.2885550	0.4400729	-0.6556983	0.5120183		0.75	0.30	1.71
drugDep_5yrs	0.0321603	0.4366374	0.0736545	0.9412853		1.03	0.42	2.36
drugAbuse_5yrs	-0.2941685	0.4146488	-0.7094402	0.4780513		0.75	0.32	1.62
OCD_5yrs	-0.3350043	0.6721006	-0.4984436	0.6181714		0.72	0.16	2.34
anxDis_5yrs	-0.0696678	0.4256441	-0.1636762	0.8699861		0.93	0.38	2.06

9.3 Summary of GLMs Results

9.3.1 Entire Population

kable( cfs.allPop[ which( cfs.allPop$P < 0.05 ), ] )

	Beta	StdError	Z	P	SigP	OR	CI_Lower	CI_Upper
(Intercept)	-3.8560527	1.1622044	-3.317878	0.0009070	*	0.02	0.00	0.20
N	0.0584261	0.0245950	2.375525	0.0175240	*	1.06	1.01	1.11
age	-0.0374996	0.0186209	-2.013847	0.0440256	*	0.96	0.93	1.00

9.3.2 At-Risk Sub-population

kable( cfs.TRSpop[ which( cfs.TRSpop$P < 0.05 ), ] )

	Beta	StdError	Z	P	SigP	OR	CI_Lower	CI_Upper
age	-0.0491597	0.0196928	-2.496324	0.0125488	*	0.95	0.91	0.99

9.4 GLM Summaries with Stabilised Inverse Probability Weights for Missing Cases

We computed IPWs for missingness, so we use the survey package to investigate if this makes any difference to findings:

# for entire population
desObj.allPop  <- svydesign( ids = ~ 0, weights = ~IPW, data = assocX.allPop )
IPW.fit.allPop <- svyglm( formStr, design=desObj.allPop, family = quasibinomial(link = "logit"))
CIs.IPW.allPop <- confint( IPW.fit.allPop )

cfs.IPW.allPop <- data.frame( coef(summary(IPW.fit.allPop)) )
  names( cfs.IPW.allPop ) <- c("Beta","StdError","Z","P")
  # p-values
  p.vals              <- round( cfs.IPW.allPop$P, 3 )
  cfs.IPW.allPop$SigP     <- ifelse( cfs.IPW.allPop$P < 0.05, "*", "")
  cfs.IPW.allPop$OR       <- round( exp( cfs.IPW.allPop$Beta ), 2 )
  cfs.IPW.allPop$CI_Lower <- round( exp( CIs.IPW.allPop[,1] ), 2 )
  cfs.IPW.allPop$CI_Upper <- round( exp( CIs.IPW.allPop[,2] ), 2 )



# for subpopulation at risk for TRS (by SOF and Sx criteria)
desObj.TRS     <- svydesign( ids = ~ 0, weights = ~IPW, data = assocX.TRSpop )
IPW.fit.TRS    <- svyglm( formStr, design=desObj.TRS, family = quasibinomial(link = "logit"))
CIs.IPW.TRS    <- confint( IPW.fit.TRS )

cfs.IPW.TRSpop <- data.frame( coef(summary(IPW.fit.TRS)) )
  names( cfs.IPW.TRSpop ) <- c("Beta","StdError","Z","P")
  # p-values
  p.vals                  <- round( cfs.IPW.TRSpop$P, 3 )
  cfs.IPW.TRSpop$SigP     <- ifelse( cfs.IPW.TRSpop$P < 0.05, "*", "")
  cfs.IPW.TRSpop$OR       <- round( exp( cfs.IPW.TRSpop$Beta ), 2 )
  cfs.IPW.TRSpop$CI_Lower <- round( exp( CIs.IPW.TRS[,1] ), 2 )
  cfs.IPW.TRSpop$CI_Upper <- round( exp( CIs.IPW.TRS[,2] ), 2 )

And a summary of the statistically significant results to compare with regular GLMs:

First, for the whole population:

kable( cfs.IPW.allPop[ which( cfs.IPW.allPop$P < 0.05 ), ] )

	Beta	StdError	Z	P	SigP	OR	CI_Lower	CI_Upper
(Intercept)	-3.9667841	1.055150	-3.759450	0.0001782	*	0.02	0.00	0.15
N	0.0551971	0.023671	2.331846	0.0198675	*	1.06	1.01	1.11

And the at risk of TRS sub-population:

kable( cfs.IPW.TRSpop[ which( cfs.IPW.TRSpop$P < 0.05 ), ] )

	Beta	StdError	Z	P	SigP	OR	CI_Lower	CI_Upper
age	-0.0455579	0.0210663	-2.1626	0.0309256	*	0.96	0.92	1

10 Discussion for Inferential Analyses

Multiple univariate analyses for differences in baseline and demographic data for the TRS versus treatment responsive cases showed that the only difference was in the proportion of participants on depot medication at baseline.

There were 22 participants on decanoate at baseline or which 0 went on to be in the TRS group.

With IPW weighting, multiple-variable GLMs for associations of baseline and demographic data with TRS cases in the whole population showed only very small effects which are unlikely to be clinically meaningful:

In the whole population analyses, higher negative symptoms were weakly associated 95% CI = [1.01 , 1.11] with the probability of becoming TRS
In the at-risk population, younger age was associated with a small reduction in the probability of transitioning to TRS with a 95% CI = [0.92 , 1.09]

CATIE TRS : Survival Analysis - v3.0

Dan W Joyce

23rd March 2018

1 Setup

2 Data Dictionary for `tab.Surv`

3 Modelling Missing Baseline Data

4 Estimating Inverse Probability Weights for Missing-To-Follow-Up Cases

5 Survival Estimates

5.1 Cumulative Event Curves

5.1.1 For Whole Population

5.1.2 At-Risk for TRS Cases

5.2 Addendum

5.3 Crude Incidence Rates

6 Differences in TRS-Cases Compared to Population

7 Final Participant “Flow”

8 TRS Domains

9 Associations of TRS With Demographics and Baseline Clinical State

9.1 In Whole Population

9.2 At Risk Sub-Population

9.3 Summary of GLMs Results

9.3.1 Entire Population

9.3.2 At-Risk Sub-population

9.4 GLM Summaries with Stabilised Inverse Probability Weights for Missing Cases

10 Discussion for Inferential Analyses

CATIE TRS : Survival Analysis - v3.0

Dan W Joyce

23rd March 2018

1 Setup

2 Data Dictionary for tab.Surv

3 Modelling Missing Baseline Data

4 Estimating Inverse Probability Weights for Missing-To-Follow-Up Cases

5 Survival Estimates

5.1 Cumulative Event Curves

5.1.1 For Whole Population

5.1.2 At-Risk for TRS Cases

5.2 Addendum

5.3 Crude Incidence Rates

6 Differences in TRS-Cases Compared to Population

7 Final Participant “Flow”

8 TRS Domains

9 Associations of TRS With Demographics and Baseline Clinical State

9.1 In Whole Population

9.2 At Risk Sub-Population

9.3 Summary of GLMs Results

9.3.1 Entire Population

9.3.2 At-Risk Sub-population

9.4 GLM Summaries with Stabilised Inverse Probability Weights for Missing Cases

10 Discussion for Inferential Analyses

2 Data Dictionary for `tab.Surv`