Olivier Bouaziz
8 December 2015
require(survival)
## Loading required package: survival
load("beninClean.RData")
benin[benin$id=="A003",c("id","start","end","event","palu_tot","ap","Pred","numbercl")]
## id start end event palu_tot ap Pred numbercl
## 45 A003 0 28.0 FALSE 0 0 2.9095545 0
## 46 A003 28 61.0 FALSE 0 0 2.9095545 0
## 47 A003 61 92.0 FALSE 0 0 2.3186846 0
## 65 A003 92 92.5 FALSE 0 0 NA 0
## 48 A003 92 120.0 FALSE 0 0 0.7075125 0
## 49 A003 120 152.0 FALSE 0 0 0.6446808 0
## 64 A003 152 180.0 FALSE 0 0 NA 0
## 50 A003 180 181.0 FALSE 0 0 0.7595721 0
## 51 A003 181 212.0 FALSE 0 0 0.4391123 0
## 66 A003 212 229.0 FALSE 0 0 NA 0
## 52 A003 229 243.0 FALSE 0 0 0.1471946 0
## 53 A003 243 272.0 FALSE 0 0 0.1471946 0
## 54 A003 272 302.0 FALSE 0 0 0.1521330 0
## 71 A003 302 318.0 FALSE 0 0 NA 0
## 55 A003 318 335.0 FALSE 0 0 0.4005166 0
## 56 A003 335 365.0 FALSE 0 0 0.5304657 0
## 57 A003 365 392.0 FALSE 0 0 2.9095545 0
## 58 A003 392 427.0 FALSE 0 0 2.9095545 0
## 72 A003 427 453.0 TRUE 2 0 NA 0
## 59 A003 453 456.0 FALSE 0 0 2.4694612 1
## 60 A003 456 490.0 FALSE 0 0 0.6059244 1
## 61 A003 490 518.0 FALSE 0 0 0.7535198 1
## 62 A003 518 547.0 FALSE 0 0 0.7075125 1
## 68 A003 547 590.0 TRUE 2 0 NA 1
numbercl represents the number of recurrent events previously encountered by a patient and palu_tot=2 represents observed palu infections.
Our model is:
fit<-coxph(Surv(start,end,event)~ap+Pred+cluster(id),data=benin)
summary(fit)
## Call:
## coxph(formula = Surv(start, end, event) ~ ap + Pred + cluster(id),
## data = benin)
##
## n= 5344, number of events= 33
## (3899 observations deleted due to missingness)
##
## coef exp(coef) se(coef) robust se z Pr(>|z|)
## ap 0.88895 2.43256 0.42892 0.40300 2.206 0.0274 *
## Pred 0.09301 1.09747 0.02803 0.02310 4.026 5.67e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## ap 2.433 0.4111 1.104 5.359
## Pred 1.097 0.9112 1.049 1.148
##
## Concordance= 0.719 (se = 0.052 )
## Rsquare= 0.002 (max possible= 0.066 )
## Likelihood ratio test= 11.83 on 2 df, p=0.002702
## Wald test = 21.96 on 2 df, p=1.705e-05
## Score (logrank) test = 15.41 on 2 df, p=0.0004501, Robust = 8.56 p=0.01382
##
## (Note: the likelihood ratio and score tests assume independence of
## observations within a cluster, the Wald and robust score tests do not).
with the cluster option to treat the recurrente events as clustered by individuals.
If we want to check the effect of ap on the occurence of the first recurrent event (and only the first) we create the Cox model:
benin$PredC<-benin$Pred-mean(benin$Pred,na.rm=TRUE)#center Pred variable
#To take individuals that have not yet experienced a recurrent event.
fitOnly0<-with(benin,coxph(Surv(end,palu_tot==2)~ap+PredC,subset=(benin$numbercl=="0")))
summary(fitOnly0)
## Call:
## coxph(formula = Surv(end, palu_tot == 2) ~ ap + PredC, subset = (benin$numbercl ==
## "0"))
##
## n= 4046, number of events= 18
## (2623 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## ap 1.84809 6.34770 0.50462 3.662 0.00025 ***
## PredC 0.06584 1.06805 0.04428 1.487 0.13701
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## ap 6.348 0.1575 2.3609 17.067
## PredC 1.068 0.9363 0.9793 1.165
##
## Concordance= 0.691 (se = 0.079 )
## Rsquare= 0.003 (max possible= 0.061 )
## Likelihood ratio test= 11.92 on 2 df, p=0.002582
## Wald test = 15.1 on 2 df, p=0.0005267
## Score (logrank) test = 19.39 on 2 df, p=6.147e-05
ap is highly significant!! Very strong effect equal to \(6.34\)!
survfit of a Cox model computes the estimated survival curve from a Cox model, that is:
\[\exp\left(\widehat{\int_0^t h_0(s)ds} \,e^{\hat\theta_1 ap +\hat\theta_2 PredC}\right)\]
plot(survfit(fitOnly0,newdata = data.frame(ap=0,PredC=0)),mark.time = FALSE)
lines(survfit(fitOnly0,newdata = data.frame(ap=1,PredC=0)),col="red",mark.time = FALSE)
Strange result…! Je ne sais pas pourquoi les intervalles de confiance se coupent… Normalement, on peut utiliser la delta-method pour construire un intervalle de confiance de la fonction de survie dans le modèle de Cox… il me semblait que c’est ce que R fait mais c’est étrange avec un effet pour ap si significatif que ça ne le soit pas pour les courbes !!
I can use a stratified log-rank test, which works just fine here:
#First create at least two levels for Pred
benin$PredCl<-cut(benin$Pred,c(0,median(benin$Pred,na.rm=TRUE),max(benin$Pred,na.rm=TRUE)+1))
survdiff(Surv(end,palu_tot==2)~ap+strata(PredCl),data=benin[benin$numbercl=="0",])
## Call:
## survdiff(formula = Surv(end, palu_tot == 2) ~ ap + strata(PredCl),
## data = benin[benin$numbercl == "0", ])
##
## n=4046, 2623 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## ap=0 3674 12 16.57 1.26 15.9
## ap=1 372 6 1.43 14.58 15.9
##
## Chisq= 15.9 on 1 degrees of freedom, p= 6.56e-05
ap is (again) highly significant!
If we want to check the effect of ap on the occurence of recurrence events after the first one we create the Cox model:
fitNo0<-with(benin,coxph(Surv(start,end,palu_tot==2)~ap+PredC+cluster(id),subset=(benin$numbercl!="0")))
summary(fitNo0)
## Call:
## coxph(formula = Surv(start, end, palu_tot == 2) ~ ap + PredC +
## cluster(id), subset = (benin$numbercl != "0"))
##
## n= 1298, number of events= 15
## (1276 observations deleted due to missingness)
##
## coef exp(coef) se(coef) robust se z Pr(>|z|)
## ap -0.87988 0.41483 1.04393 1.05662 -0.833 0.404996
## PredC 0.11476 1.12161 0.04014 0.03107 3.693 0.000221 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## ap 0.4148 2.4106 0.0523 3.291
## PredC 1.1216 0.8916 1.0553 1.192
##
## Concordance= 0.733 (se = 0.079 )
## Rsquare= 0.006 (max possible= 0.091 )
## Likelihood ratio test= 7.37 on 2 df, p=0.02512
## Wald test = 14.11 on 2 df, p=0.0008612
## Score (logrank) test = 9.99 on 2 df, p=0.006768, Robust = 4.16 p=0.1251
##
## (Note: the likelihood ratio and score tests assume independence of
## observations within a cluster, the Wald and robust score tests do not).
ap is no longer significant!!
plot(survfit(fitNo0,newdata = data.frame(ap=0,PredC=0)),mark.time = FALSE)
lines(survfit(fitNo0,newdata = data.frame(ap=1,PredC=0)),col="red",mark.time = FALSE)