Cox analysis for Palu dataset

Olivier Bouaziz

8 December 2015

The dataset

require(survival)
## Loading required package: survival
load("beninClean.RData")
benin[benin$id=="A003",c("id","start","end","event","palu_tot","ap","Pred","numbercl")]
##      id start   end event palu_tot ap      Pred numbercl
## 45 A003     0  28.0 FALSE        0  0 2.9095545        0
## 46 A003    28  61.0 FALSE        0  0 2.9095545        0
## 47 A003    61  92.0 FALSE        0  0 2.3186846        0
## 65 A003    92  92.5 FALSE        0  0        NA        0
## 48 A003    92 120.0 FALSE        0  0 0.7075125        0
## 49 A003   120 152.0 FALSE        0  0 0.6446808        0
## 64 A003   152 180.0 FALSE        0  0        NA        0
## 50 A003   180 181.0 FALSE        0  0 0.7595721        0
## 51 A003   181 212.0 FALSE        0  0 0.4391123        0
## 66 A003   212 229.0 FALSE        0  0        NA        0
## 52 A003   229 243.0 FALSE        0  0 0.1471946        0
## 53 A003   243 272.0 FALSE        0  0 0.1471946        0
## 54 A003   272 302.0 FALSE        0  0 0.1521330        0
## 71 A003   302 318.0 FALSE        0  0        NA        0
## 55 A003   318 335.0 FALSE        0  0 0.4005166        0
## 56 A003   335 365.0 FALSE        0  0 0.5304657        0
## 57 A003   365 392.0 FALSE        0  0 2.9095545        0
## 58 A003   392 427.0 FALSE        0  0 2.9095545        0
## 72 A003   427 453.0  TRUE        2  0        NA        0
## 59 A003   453 456.0 FALSE        0  0 2.4694612        1
## 60 A003   456 490.0 FALSE        0  0 0.6059244        1
## 61 A003   490 518.0 FALSE        0  0 0.7535198        1
## 62 A003   518 547.0 FALSE        0  0 0.7075125        1
## 68 A003   547 590.0  TRUE        2  0        NA        1

numbercl represents the number of recurrent events previously encountered by a patient and palu_tot=2 represents observed palu infections.

Cox model

Our model is:

fit<-coxph(Surv(start,end,event)~ap+Pred+cluster(id),data=benin)
summary(fit)
## Call:
## coxph(formula = Surv(start, end, event) ~ ap + Pred + cluster(id), 
##     data = benin)
## 
##   n= 5344, number of events= 33 
##    (3899 observations deleted due to missingness)
## 
##         coef exp(coef) se(coef) robust se     z Pr(>|z|)    
## ap   0.88895   2.43256  0.42892   0.40300 2.206   0.0274 *  
## Pred 0.09301   1.09747  0.02803   0.02310 4.026 5.67e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      exp(coef) exp(-coef) lower .95 upper .95
## ap       2.433     0.4111     1.104     5.359
## Pred     1.097     0.9112     1.049     1.148
## 
## Concordance= 0.719  (se = 0.052 )
## Rsquare= 0.002   (max possible= 0.066 )
## Likelihood ratio test= 11.83  on 2 df,   p=0.002702
## Wald test            = 21.96  on 2 df,   p=1.705e-05
## Score (logrank) test = 15.41  on 2 df,   p=0.0004501,   Robust = 8.56  p=0.01382
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

with the cluster option to treat the recurrente events as clustered by individuals.

Effect of ap on the first recurrent event

If we want to check the effect of ap on the occurence of the first recurrent event (and only the first) we create the Cox model:

benin$PredC<-benin$Pred-mean(benin$Pred,na.rm=TRUE)#center Pred variable
#To take individuals that have not yet experienced a recurrent event.
fitOnly0<-with(benin,coxph(Surv(end,palu_tot==2)~ap+PredC,subset=(benin$numbercl=="0")))
summary(fitOnly0)
## Call:
## coxph(formula = Surv(end, palu_tot == 2) ~ ap + PredC, subset = (benin$numbercl == 
##     "0"))
## 
##   n= 4046, number of events= 18 
##    (2623 observations deleted due to missingness)
## 
##          coef exp(coef) se(coef)     z Pr(>|z|)    
## ap    1.84809   6.34770  0.50462 3.662  0.00025 ***
## PredC 0.06584   1.06805  0.04428 1.487  0.13701    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## ap        6.348     0.1575    2.3609    17.067
## PredC     1.068     0.9363    0.9793     1.165
## 
## Concordance= 0.691  (se = 0.079 )
## Rsquare= 0.003   (max possible= 0.061 )
## Likelihood ratio test= 11.92  on 2 df,   p=0.002582
## Wald test            = 15.1  on 2 df,   p=0.0005267
## Score (logrank) test = 19.39  on 2 df,   p=6.147e-05

ap is highly significant!! Very strong effect equal to \(6.34\)!

Effect of ap on the first recurrent event (survival curves)

survfit of a Cox model computes the estimated survival curve from a Cox model, that is:

\[\exp\left(\widehat{\int_0^t h_0(s)ds} \,e^{\hat\theta_1 ap +\hat\theta_2 PredC}\right)\]

plot(survfit(fitOnly0,newdata = data.frame(ap=0,PredC=0)),mark.time = FALSE)
lines(survfit(fitOnly0,newdata = data.frame(ap=1,PredC=0)),col="red",mark.time = FALSE)

Strange result…! Je ne sais pas pourquoi les intervalles de confiance se coupent… Normalement, on peut utiliser la delta-method pour construire un intervalle de confiance de la fonction de survie dans le modèle de Cox… il me semblait que c’est ce que R fait mais c’est étrange avec un effet pour ap si significatif que ça ne le soit pas pour les courbes !!

I can use a stratified log-rank test, which works just fine here:

#First create at least two levels for Pred
benin$PredCl<-cut(benin$Pred,c(0,median(benin$Pred,na.rm=TRUE),max(benin$Pred,na.rm=TRUE)+1))
survdiff(Surv(end,palu_tot==2)~ap+strata(PredCl),data=benin[benin$numbercl=="0",])
## Call:
## survdiff(formula = Surv(end, palu_tot == 2) ~ ap + strata(PredCl), 
##     data = benin[benin$numbercl == "0", ])
## 
## n=4046, 2623 observations deleted due to missingness.
## 
##         N Observed Expected (O-E)^2/E (O-E)^2/V
## ap=0 3674       12    16.57      1.26      15.9
## ap=1  372        6     1.43     14.58      15.9
## 
##  Chisq= 15.9  on 1 degrees of freedom, p= 6.56e-05

ap is (again) highly significant!

Effect of ap on further recurrent events

If we want to check the effect of ap on the occurence of recurrence events after the first one we create the Cox model:

fitNo0<-with(benin,coxph(Surv(start,end,palu_tot==2)~ap+PredC+cluster(id),subset=(benin$numbercl!="0")))
summary(fitNo0)
## Call:
## coxph(formula = Surv(start, end, palu_tot == 2) ~ ap + PredC + 
##     cluster(id), subset = (benin$numbercl != "0"))
## 
##   n= 1298, number of events= 15 
##    (1276 observations deleted due to missingness)
## 
##           coef exp(coef) se(coef) robust se      z Pr(>|z|)    
## ap    -0.87988   0.41483  1.04393   1.05662 -0.833 0.404996    
## PredC  0.11476   1.12161  0.04014   0.03107  3.693 0.000221 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##       exp(coef) exp(-coef) lower .95 upper .95
## ap       0.4148     2.4106    0.0523     3.291
## PredC    1.1216     0.8916    1.0553     1.192
## 
## Concordance= 0.733  (se = 0.079 )
## Rsquare= 0.006   (max possible= 0.091 )
## Likelihood ratio test= 7.37  on 2 df,   p=0.02512
## Wald test            = 14.11  on 2 df,   p=0.0008612
## Score (logrank) test = 9.99  on 2 df,   p=0.006768,   Robust = 4.16  p=0.1251
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

ap is no longer significant!!

Effect of ap on further recurrent events (survival curves)

plot(survfit(fitNo0,newdata = data.frame(ap=0,PredC=0)),mark.time = FALSE)
lines(survfit(fitNo0,newdata = data.frame(ap=1,PredC=0)),col="red",mark.time = FALSE)