For purpose of simplicity, we are going to use only some part …
accept.dt: acceptance into program
tx.date: transplant date
fu.date: end of followup
fustat: dead or alive
transplant: transplant indicator
surgery: prior bypass surgery
library(survival)
dat <- jasa[,c("accept.dt","tx.date","fu.date","fustat","transplant","surgery")]
dat$id <- 1:nrow(dat)
dat[1:10,]
## accept.dt tx.date fu.date fustat transplant surgery id
## 1 1967-11-15 <NA> 1968-01-03 1 0 0 1
## 2 1968-01-02 <NA> 1968-01-07 1 0 0 2
## 3 1968-01-06 1968-01-06 1968-01-21 1 1 0 3
## 4 1968-03-28 1968-05-02 1968-05-05 1 1 0 4
## 5 1968-05-10 <NA> 1968-05-27 1 0 0 5
## 6 1968-06-13 <NA> 1968-06-15 1 0 0 6
## 7 1968-07-12 1968-08-31 1970-05-17 1 1 0 7
## 8 1968-08-01 <NA> 1968-09-09 1 0 0 8
## 9 1968-08-09 <NA> 1968-11-01 1 0 0 9
## 10 1968-08-11 1968-08-22 1968-10-07 1 1 0 10
These rows refer to individual subject. We need an identifier (id) for each of them so i added the id column to the data (dat).
Let’s try some analyze based on this raw data: Survival between transplanted group and non-transplanted group. Along with coxph with transplant + surgery as time-fixed covariate
## Call:
## coxph(formula = Surv ~ transplant + surgery, data = temp)
##
## coef exp(coef) se(coef) z p
## transplant -1.241 0.289 0.250 -4.96 6.9e-07
## surgery -0.437 0.646 0.370 -1.18 0.24
##
## Likelihood ratio test=27.5 on 2 df, p=1.08e-06
## n= 103, number of events= 75
In order to classify covariate “transplant” by time, we actually want some thing like this:
jasa1[1:10,1:5]
## id start stop event transplant
## 1 1 0 49 1 0
## 2 2 0 5 1 0
## 102 3 0 15 1 1
## 3 4 0 35 0 0
## 103 4 35 38 1 1
## 4 5 0 17 1 0
## 5 6 0 2 1 0
## 6 7 0 50 0 0
## 104 7 50 674 1 1
## 7 8 0 39 1 0
ID1&2 never got transplanted, and died on stop time
ID3 got transplanted on the first day, and died thereafter
ID4 got transplanted on day 35, and died on day 38
That is why ID4 got 2 separates row, the first row data will be treated as non-transplanted, and the second as transplanted. This is a simple example of time-dependent covariate analysis.
Now, Let’s see the code we need to transform jasa–dat into jasa1
sadly, some subject dies within the same day with cohort entry, create a start-stop time of (0,0) which is invalid; we will have to adapt the data making the dead occurs at 0.5 day. Also some subjects died the same day with transplantation, this should account as death in transplant group, so we will have to step back transplantation date for 0.5 day
dat[1:10,]
## accept.dt tx.date fu.date fustat transplant surgery id
## 1 1967-11-15 <NA> 1968-01-03 1 0 0 1
## 2 1968-01-02 <NA> 1968-01-07 1 0 0 2
## 3 1968-01-06 1968-01-06 1968-01-21 1 1 0 3
## 4 1968-03-28 1968-05-02 1968-05-05 1 1 0 4
## 5 1968-05-10 <NA> 1968-05-27 1 0 0 5
## 6 1968-06-13 <NA> 1968-06-15 1 0 0 6
## 7 1968-07-12 1968-08-31 1970-05-17 1 1 0 7
## 8 1968-08-01 <NA> 1968-09-09 1 0 0 8
## 9 1968-08-09 <NA> 1968-11-01 1 0 0 9
## 10 1968-08-11 1968-08-22 1968-10-07 1 1 0 10
dat$futime <- pmax(0.5,dat$fu.date-dat$accept.dt)
dat$txtime <- ifelse(dat$tx.date==dat$fu.date,
dat$futime-0.5,dat$tx.date-dat$accept.dt)
Then we use tmerge()
newdat <- tmerge(dat,dat,id=id,event=event(futime,fustat))
newdat <- tmerge(newdat,dat,id=id,tx=tdc(txtime))
newdat[1:10,7:13]
## id futime txtime tstart tstop event tx
## 1 1 49 NA 0 49 1 0
## 2 2 5 NA 0 5 1 0
## 3 3 15 0 0 15 1 1
## 4 4 38 35 0 35 0 0
## 5 4 38 35 35 38 1 1
## 6 5 17 NA 0 17 1 0
## 7 6 2 NA 0 2 1 0
## 8 7 674 50 0 50 0 0
## 9 7 674 50 50 674 1 1
## 10 8 39 NA 0 39 1 0
Compare with jasa1 again
## id start stop event transplant
## 1 1 0 49 1 0
## 2 2 0 5 1 0
## 102 3 0 15 1 1
## 3 4 0 35 0 0
## 103 4 35 38 1 1
## 4 5 0 17 1 0
## 5 6 0 2 1 0
## 6 7 0 50 0 0
## 104 7 50 674 1 1
## 7 8 0 39 1 0
Hey! looking good! Now, let’s see what the plot will look like
newdat$Surv <- Surv(newdat$tstart,newdat$tstop,newdat$event)
plot(survfit(Surv~tx,data=newdat),col=c("red","blue"))
If you wanna use Cox;
coxph(Surv~tx+surgery,data=newdat)
## Call:
## coxph(formula = Surv ~ tx + surgery, data = newdat)
##
## coef exp(coef) se(coef) z p
## tx 0.156 1.169 0.297 0.53 0.598
## surgery -0.749 0.473 0.360 -2.08 0.037
##
## Likelihood ratio test=5.36 on 2 df, p=0.0687
## n= 170, number of events= 75
This is the real thing, and we should be able to design a proper CRF with this example; a data of PBC patients.We have 2 dataset; pbc & pbcseq
pbc contains subject baseline information, each per row. We are going to use only part of it (again, for simplicity)
id: case number
time: number of days between registration and the earlier of death, transplantion, or study analysis in July, 1986
status: status at endpoint, 0/1/2 for censored, transplant, dead
ascites: presence of ascites
protime: standardised blood clotting time
bili: serum bilirunbin (mg/dl)
## id time status ascites protime bili
## 1 1 400 2 1 12.2 14.5
## 2 2 4500 0 0 10.6 1.1
## 3 3 1012 2 0 12.0 1.4
## 4 4 1925 2 0 10.3 1.8
## 5 5 1504 1 0 10.9 3.4
## 6 6 2503 2 0 11.0 0.8
## 7 7 1832 0 0 9.7 1.0
## 8 8 2466 2 0 11.0 0.3
## 9 9 2400 2 0 11.0 3.2
## 10 10 51 2 1 11.5 12.6
pbcseq contains the lab and examination data of each subject on separate day of follow up
>day: number of days between enrollment and this visit date all measurements below refer to this date
## id day ascites protime bili
## 1 1 0 1 12.2 14.5
## 2 1 192 1 11.2 21.3
## 3 2 0 0 10.6 1.1
## 4 2 182 0 11.0 0.8
## 5 2 365 0 11.6 1.0
## 6 2 768 0 10.6 1.9
## 7 2 1790 1 11.3 2.6
## 8 2 2151 1 11.5 3.6
## 9 2 2515 1 11.5 4.2
## 10 2 2882 1 11.5 3.6
Let’s tmerge!
dat2 <- tmerge(dat1,dat1,id=id,death=event(time,status))
dat2 <- tmerge(dat2, pbcseq, id=id,
ascites=tdc(day,ascites),
protime=tdc(day,protime),
bili=tdc(day,bili))
dat2[1:10,]
## id time status ascites protime bili tstart tstop death
## 1 1 400 2 1 12.2 14.5 0 192 0
## 2 1 400 2 1 11.2 21.3 192 400 2
## 3 2 4500 0 0 10.6 1.1 0 182 0
## 4 2 4500 0 0 11.0 0.8 182 365 0
## 5 2 4500 0 0 11.6 1.0 365 768 0
## 6 2 4500 0 0 10.6 1.9 768 1790 0
## 7 2 4500 0 1 11.3 2.6 1790 2151 0
## 8 2 4500 0 1 11.5 3.6 2151 2515 0
## 9 2 4500 0 1 11.5 4.2 2515 2882 0
## 10 2 4500 0 1 11.5 3.6 2882 3226 0
Cox with time-dependent covariate
coxph(Surv(tstart,tstop,status!=1)~ascites+protime+bili,data=dat2)
## Call:
## coxph(formula = Surv(tstart, tstop, status != 1) ~ ascites +
## protime + bili, data = dat2)
##
## coef exp(coef) se(coef) z p
## ascites 0.06930 1.07176 0.09096 0.76 0.4461
## protime 0.08015 1.08345 0.01523 5.26 1.4e-07
## bili 0.01460 1.01471 0.00536 2.72 0.0065
##
## Likelihood ratio test=41.7 on 3 df, p=4.73e-09
## n= 1807, number of events= 1701