A form of this analysis has been published in the European Journal of Surgical Oncology (https://doi.org/10.1016/j.ejso.2020.06.038; 22/06/2020) The following libraries are required for this analysis
library(naniar) #####missing data analysis
library(mice) #####imputation of missing data
library(VGAM) ####Logistic regression model for PS generation
library(plyr) ####data cleaning
library(dplyr) ####data cleaning
library(tableone) ####comparison tables
library(survey) ####weighting of tables
library(jskm) ####weighted Kaplan-Meier charts
library(gbm) ####arrangement of charts
library(plotly) ####advanced plotting
kableone <- function(x, ...) {
capture.output(x <- print(x))
knitr::kable(x, ...)
} ####makes nice tables with tableone
Having loaded in a dataframe, as we know this data contains missing values, we first visualise these. There are different aspects that need to be considered here. First, the absolute/proportion of missing data in each variable, which is shown in the first plot. Any variable that contains more than about 30% of missing data should probably be discarded as imputation becomes unreliable.
gg_miss_var(data,show_pct = TRUE)
Figure 1: Missing Data plot, percentage of data points missing for each variable.
Here we can see an overall very small amount of missing data, with only Overall and Disease Free survival >1%. The second aspect for consideration is the pattern of missingness in the data. In chapter 2 there is a discussion about types of missing data. If data is missing completely at random (MCAR), then it is reasonable to simply exclude the cases with missing data (although this will reduce the power of the study). In reality it is rare for this to be the case with clinical datasets, as can be seen below. Formal quanitificaiton of this could be performed with Little’s MCAR test (a chi-square variant). Unsurprisingly, patients who have OS missing, also have DFS missing, so this data is obviously not MCAR. The data is therefore either missing at random (MAR) or missing not at random (MNAR). We have no reason to think that missing data points are related to the outcomes in question, so it is reasonable to treat this data as MAR and proceed with imputation. Imputation is performed by the MICE function, here with 5 imputation sets and 10 iterations. For this study we then extract the individual datasets using the complete function .
mice<-mice(data,m=5,maxit=10)
complete1<-mice::complete(mice,1)
complete2<-mice::complete(mice,2)
complete3<-mice::complete(mice,3)
complete4<-mice::complete(mice,4)
complete5<-mice::complete(mice,5)
covariates<-colnames(data)[c(2,3,4,5,6,8,9)]####Choose which covariates to include in the PS score
The propensity score is the probability of being assigned to a treatment dependent on a set of specified covariates. It is therefore equal to the probability output of a logistic regression model with the treatment choice (in this case preoperative treatment) as the dependent variable and a list of covariates as the independent variable. We first specify a list of variables to match on pragmatically. We then derive the logistic regression model and add the propensity score to the existing data frames. The below code was modified from that provided by Yoshida et al.(doi: 10.1097/EDE.0000000000000627), and was chosen as it can be easily extended to three or more groups if necessary. This is repeated for each of the 5 imputations sets.
###Pre.Operative.Treatment should be changed to the response variable required. Covariates applies the selected variables to generate the propenstiy score.
AddGPS <- function(data,
formula = as.formula(paste('Pre.Operative.Treatment','~',paste(covariates,collapse='+'))),
psPrefix = "PS_",
family = multinomial(parallel = FALSE)) {
## Fit multinomial logistic regression
resVglm <- vglm(formula = formula,
data = data,
family = family)
## Calculate PS
psData <- as.data.frame(predict(resVglm, type = "response"))
names(psData) <- paste0(psPrefix, names(psData))
## Add to data
cbind(data, psData)
}
complete1a<-AddGPS(complete1)
complete2a<-AddGPS(complete2)
complete3a<-AddGPS(complete3)
complete4a<-AddGPS(complete4)
complete5a<-AddGPS(complete5)
Having added the propensity score, we can then calculate the inverse probability of treatment weight (IPTW). This was again modified from Yoshida et al.
###Add response variable levels in place of ('Chemo','CRT')
AddIPTW <- function(data, txVar = "Tr", tx = c('Chemo','CRT'), psPrefix = "PS_") {
## Treatment indicator data frame (any number of groups allowed)
dfAssign <- as.data.frame(lapply(tx, function(tx_k) {
as.numeric(data[txVar] == tx_k)
}))
colnames(dfAssign) <- paste0(txVar, tx)
psVars <- paste0(psPrefix,tx)
data$PSassign <- rowSums(data[psVars] * dfAssign)
data$iptw <- exp(- log(data$PSassign))
data
}
complete1b<-AddIPTW(data=complete1a,txVar='Pre.Operative.Treatment')
complete2b<-AddIPTW(data=complete2a,txVar='Pre.Operative.Treatment')
complete3b<-AddIPTW(data=complete3a,txVar='Pre.Operative.Treatment')
complete4b<-AddIPTW(data=complete4a,txVar='Pre.Operative.Treatment')
complete5b<-AddIPTW(data=complete5a,txVar='Pre.Operative.Treatment')
These values can then be combined by taking the arithmetic mean (this is the across method of analysis of multiply imputed propensity scores). The stabilised IPTW is calculated by normalising the IPTW to the number of cases of each treatment.
averagePS<-as.data.frame(complete1b$iptw)
names(averagePS)<-'iptw'
for (i in 1:nrow(averagePS)){
averagePS$iptw[i]<-mean(complete1b$iptw[i],complete2b$iptw[i],complete3b$iptw[i],complete4b$iptw[i],complete5b$iptw[i])
}
dataPS<-cbind(data,averagePS)
stabilisedIPTW<-ifelse(dataPS$Pre.Operative.Treatment=='Chemo',
dataPS$iptw*(plyr::count(dataPS$Pre.Operative.Treatment)[1,2]/
nrow(dataPS)),
ifelse(dataPS$Pre.Operative.Treatment=='CRT',
dataPS$iptw*(plyr::count(dataPS$Pre.Operative.Treatment)[2,2]/
nrow(dataPS)),NA))
dataPS$sw<-stabilisedIPTW
The range of weights generated is next assessed. As can be seen the naive IPTW has wide range of weights, with a large mean weight which if applied to a dataset will result in incorrect estimates of effect and biased confidence intervals. The stabilised IPTW eliminates this bias.
summary(dataPS$iptw)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.306 1.411 2.015 2.150 15.863
summary(dataPS$sw)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4343 0.7050 0.7835 1.0114 1.2217 9.4398
Covariate balance is then assessed using the standardised mean difference (SMD). This is most easily seen with a plot of all variables, but also as a global mean. A SMD of <0.1 is said to indicate a lack of imbalance between variables. Code for this plot was also modified from Yoshida et al. Comparison using standard methods (with or without weighting) is also possible, but is less favoured.
Unadjusted <- CreateTableOne( data = data,vars = covariates, strata = 'Pre.Operative.Treatment')
iptwsvy <- svydesign(ids = ~ 1, data = dataPS, weights = ~ sw) #####weighted analysis
iptw <- svyCreateTableOne(vars = covariates, strata = 'Pre.Operative.Treatment', data = iptwsvy)
ExtractSmd(Unadjusted)
## 1 vs 2
## Gender 0.25002428
## Age 0.07325106
## ASA 0.22811825
## cT.Stage 0.11475789
## cN.Stage 0.03656357
## Tumour.Histology 0.58403008
## Tumour.Location 0.50430369
ExtractSmd(iptw)
## 1 vs 2
## Gender 0.09264395
## Age 0.06033189
## ASA 0.01885207
## cT.Stage 0.08398933
## cN.Stage 0.07071395
## Tumour.Histology 0.06585594
## Tumour.Location 0.04787361
{
dataPlot <- data.frame(variable = rownames(ExtractSmd(Unadjusted)),
Unadjusted = ExtractSmd(Unadjusted),
Weighted = ExtractSmd(iptw))
names(dataPlot)<-c("variable",'Unadjusted','Weighted')
library(reshape2)
dataPlotMelt <- melt(data = dataPlot,
id.vars = "variable",
variable.name = "method",
value.name = "SMD")
varsOrderedBySmd <- rownames(dataPlot)[order(dataPlot[,"Unadjusted"])]
dataPlotMelt$variable <- factor(dataPlotMelt$variable,
levels = varsOrderedBySmd)
dataPlotMelt$method <- factor(dataPlotMelt$method,
levels = c("Weighted","Unadjusted"))
dataPlotMelt$method
library(ggplot2)
ggplotly(ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD, group = method, linetype = method)) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0, size = 0.3) +
geom_hline(yintercept = 0.1, size = 0.1) +
coord_flip() +
theme_bw() + theme(legend.key = element_blank())+
ggtitle(''))
}
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Figure 2: Standardised Mean Difference of Confounding variables, before and after Inverse Probability of Treatment Weighting. SMD<0.1 indicates good balance.
We can see here that there is an improvement of balance with this approach. Before balancing, 6/7 variables have a SMD of >0.1, and after balancing none do. In comparison below, there are no significant differences between variables after balancing.
kableone(CreateTableOne( data = data,vars = covariates, strata = 'Pre.Operative.Treatment'))
| Chemo | CRT | p | test | |
|---|---|---|---|---|
| n | 169 | 115 | ||
| Gender = M (%) | 142 (84.0) | 85 (73.9) | 0.053 | |
| Age (mean (SD)) | 63.79 (8.85) | 64.44 (8.92) | 0.545 | |
| ASA (%) | 0.179 | |||
| 1 | 11 ( 6.5) | 9 ( 7.8) | ||
| 2 | 106 (62.7) | 82 (71.3) | ||
| 3 | 52 (30.8) | 24 (20.9) | ||
| cT.Stage (%) | 0.857 | |||
| 0 | 1 ( 0.6) | 0 ( 0.0) | ||
| 2 | 30 (17.8) | 22 (19.1) | ||
| 3 | 121 (71.6) | 82 (71.3) | ||
| 4 | 17 (10.1) | 11 ( 9.6) | ||
| cN.Stage (%) | 0.956 | |||
| 0 | 35 (20.7) | 22 (19.3) | ||
| 1 | 117 (69.2) | 80 (70.2) | ||
| 2 | 17 (10.1) | 12 (10.5) | ||
| Tumour.Histology = SCC (%) | 14 ( 8.3) | 35 (30.4) | <0.001 | |
| Tumour.Location = GOJ (%) | 136 (80.5) | 66 (57.9) | <0.001 |
kableone(svyCreateTableOne(vars = covariates, strata = 'Pre.Operative.Treatment', data = iptwsvy))
| Chemo | CRT | p | test | |
|---|---|---|---|---|
| n | 173.7 | 113.5 | ||
| Gender = M (%) | 134.8 (77.6) | 92.3 (81.3) | 0.554 | |
| Age (mean (SD)) | 64.80 (8.95) | 64.26 (8.86) | 0.686 | |
| ASA (%) | 0.987 | |||
| 1 | 10.0 ( 5.8) | 6.3 ( 5.5) | ||
| 2 | 118.7 (68.3) | 77.0 (67.8) | ||
| 3 | 45.0 (25.9) | 30.3 (26.7) | ||
| cT.Stage (%) | 0.927 | |||
| 0 | 0.6 ( 0.3) | 0.0 ( 0.0) | ||
| 2 | 28.5 (16.4) | 19.2 (16.9) | ||
| 3 | 129.8 (74.7) | 84.6 (74.5) | ||
| 4 | 14.9 ( 8.6) | 9.7 ( 8.5) | ||
| cN.Stage (%) | 0.879 | |||
| 0 | 33.5 (19.3) | 22.4 (19.8) | ||
| 1 | 121.8 (70.1) | 76.2 (67.5) | ||
| 2 | 18.4 (10.6) | 14.3 (12.7) | ||
| Tumour.Histology = SCC (%) | 34.4 (19.8) | 19.6 (17.2) | 0.681 | |
| Tumour.Location = GOJ (%) | 119.9 (69.0) | 80.3 (71.2) | 0.744 |
Next, the desired outcomes can be compared. This can be conducted in one of two ways. Firstly a straightforward comparison by statistical hypothesis testing (with/without weighting). This has the benefit of being easy to understand and familiar to clinicians. We first define a list of outcomes for comparison.
complicationvariables<-colnames(dataPS)[c(11:19)]
pathologicalvariables<-colnames(dataPS)[c(24:31)]
unadjustedcomplications <- CreateTableOne(vars = complicationvariables, strata = 'Pre.Operative.Treatment', data = data)
iptwcomplications <- svyCreateTableOne(vars = complicationvariables, strata = 'Pre.Operative.Treatment', data = iptwsvy)
kableone(unadjustedcomplications <- CreateTableOne(vars = complicationvariables, strata = 'Pre.Operative.Treatment', data = data))
| Chemo | CRT | p | test | |
|---|---|---|---|---|
| n | 169 | 115 | ||
| Complications = YES (%) | 108 (63.9) | 73 (63.5) | 1.000 | |
| Anastomotic.Leak = Y (%) | 10 ( 6.0) | 8 ( 7.0) | 0.927 | |
| Anypulmonary = YES (%) | 66 (39.3) | 36 (31.3) | 0.212 | |
| Chyle.Leak = Y (%) | 8 ( 4.7) | 12 (10.4) | 0.108 | |
| Length.Of.Stay (mean (SD)) | 14.67 (12.94) | 13.67 (13.34) | 0.530 | |
| returntotheatre = 2 (%) | 16 ( 9.5) | 9 ( 7.8) | 0.779 | |
| Change.in.Level.of.care = 1 (%) | 22 (13.1) | 17 (14.8) | 0.819 | |
| majcomp = 1 (%) | 41 (24.3) | 28 (24.3) | 1.000 | |
| IHM = 1 (%) | 1 ( 0.6) | 2 ( 1.7) | 0.736 |
kableone(iptwcomplications <- svyCreateTableOne(vars = complicationvariables, strata = 'Pre.Operative.Treatment', data = iptwsvy))
| Chemo | CRT | p | test | |
|---|---|---|---|---|
| n | 173.7 | 113.5 | ||
| Complications = YES (%) | 106.5 (61.3) | 73.1 (64.4) | 0.670 | |
| Anastomotic.Leak = Y (%) | 12.6 ( 7.3) | 7.3 ( 6.4) | 0.800 | |
| Anypulmonary = YES (%) | 65.9 (38.3) | 39.2 (34.5) | 0.589 | |
| Chyle.Leak = Y (%) | 7.0 ( 4.0) | 13.1 (11.5) | 0.026 | |
| Length.Of.Stay (mean (SD)) | 16.19 (15.52) | 13.70 (13.32) | 0.244 | |
| returntotheatre = 2 (%) | 16.3 ( 9.4) | 9.8 ( 8.7) | 0.850 | |
| Change.in.Level.of.care = 1 (%) | 23.7 (13.8) | 16.7 (14.7) | 0.855 | |
| majcomp = 1 (%) | 41.7 (24.0) | 30.2 (26.6) | 0.674 | |
| IHM = 1 (%) | 2.2 ( 1.3) | 1.2 ( 1.1) | 0.898 |
Note the number of patients in the adjusted sample is not quite the same (or whole numbers). In this sample, balancing made no real difference in complications between the two groups. We then repeat the process for pathological variables.
unadjustedpathology <- CreateTableOne(vars = pathologicalvariables, strata = 'Pre.Operative.Treatment', data = data)
iptwpathology <- svyCreateTableOne(vars = pathologicalvariables, strata = 'Pre.Operative.Treatment', data = iptwsvy)
kableone(unadjustedpathology <- CreateTableOne(vars = pathologicalvariables, strata = 'Pre.Operative.Treatment', data = data))
| Chemo | CRT | p | test | |
|---|---|---|---|---|
| n | 169 | 115 | ||
| Recurrence.Pattern (%) | 0.057 | |||
| Both | 13 ( 7.7) | 6 ( 5.2) | ||
| Local | 13 ( 7.7) | 4 ( 3.5) | ||
| None | 92 (54.8) | 81 (70.4) | ||
| Systemic | 50 (29.8) | 24 (20.9) | ||
| pT.Stage (%) | <0.001 | |||
| 0 | 14 ( 8.3) | 35 (30.4) | ||
| 1 | 40 (23.8) | 19 (16.5) | ||
| 2 | 24 (14.3) | 17 (14.8) | ||
| 3 | 88 (52.4) | 42 (36.5) | ||
| 4 | 2 ( 1.2) | 2 ( 1.7) | ||
| pN.Stage (%) | 0.007 | |||
| 0 | 86 (51.2) | 69 (60.0) | ||
| 1 | 32 (19.0) | 29 (25.2) | ||
| 2 | 30 (17.9) | 15 (13.0) | ||
| 3 | 20 (11.9) | 2 ( 1.7) | ||
| pM.Stage = 1 (%) | 5 ( 3.0) | 6 ( 5.2) | 0.519 | |
| Response.to.chemo (%) | <0.001 | |||
| 1 | 18 (10.8) | 39 (33.9) | ||
| 2 | 27 (16.3) | 31 (27.0) | ||
| 3 | 20 (12.0) | 25 (21.7) | ||
| 4 | 48 (28.9) | 17 (14.8) | ||
| 5 | 49 (29.5) | 3 ( 2.6) | ||
| N/A | 4 ( 2.4) | 0 ( 0.0) | ||
| Vascular.Invasion = 1 (%) | 51 (30.4) | 20 (17.4) | 0.020 | |
| Resection.Margin (%) | 0.005 | |||
| 0 | 132 (79.0) | 107 (93.0) | ||
| 1 | 34 (20.4) | 8 ( 7.0) | ||
| 2 | 1 ( 0.6) | 0 ( 0.0) | ||
| pCRM = 1 (%) | 13 ( 7.7) | 27 (23.5) | <0.001 |
kableone(iptwpathology <- svyCreateTableOne(vars = pathologicalvariables, strata = 'Pre.Operative.Treatment', data = iptwsvy))
| Chemo | CRT | p | test | |
|---|---|---|---|---|
| n | 173.7 | 113.5 | ||
| Recurrence.Pattern (%) | 0.131 | |||
| Both | 12.3 ( 7.2) | 6.2 ( 5.5) | ||
| Local | 24.2 (14.0) | 5.7 ( 5.1) | ||
| None | 87.5 (50.8) | 75.6 (66.6) | ||
| Systemic | 48.2 (28.0) | 25.9 (22.8) | ||
| pT.Stage (%) | <0.001 | |||
| 0 | 12.1 ( 7.0) | 31.5 (27.7) | ||
| 1 | 38.2 (22.2) | 18.1 (16.0) | ||
| 2 | 22.7 (13.2) | 20.2 (17.8) | ||
| 3 | 97.6 (56.7) | 42.6 (37.5) | ||
| 4 | 1.5 ( 0.9) | 1.1 ( 1.0) | ||
| pN.Stage (%) | 0.055 | |||
| 0 | 83.6 (48.6) | 65.9 (58.0) | ||
| 1 | 35.4 (20.5) | 27.7 (24.4) | ||
| 2 | 35.4 (20.6) | 18.5 (16.3) | ||
| 3 | 17.8 (10.4) | 1.5 ( 1.3) | ||
| pM.Stage = 1 (%) | 5.3 ( 3.1) | 6.5 ( 5.7) | 0.326 | |
| Response.to.chemo (%) | <0.001 | |||
| 1 | 15.6 ( 9.3) | 33.2 (29.2) | ||
| 2 | 23.9 (14.2) | 33.9 (29.9) | ||
| 3 | 19.5 (11.6) | 24.7 (21.7) | ||
| 4 | 53.8 (32.1) | 18.6 (16.4) | ||
| 5 | 51.1 (30.4) | 3.2 ( 2.8) | ||
| N/A | 4.0 ( 2.4) | 0.0 ( 0.0) | ||
| Vascular.Invasion = 1 (%) | 56.2 (32.7) | 21.7 (19.1) | 0.047 | |
| Resection.Margin (%) | 0.010 | |||
| 0 | 131.0 (76.3) | 104.9 (92.4) | ||
| 1 | 39.8 (23.2) | 8.7 ( 7.6) | ||
| 2 | 0.8 ( 0.5) | 0.0 ( 0.0) | ||
| pCRM = 1 (%) | 11.3 ( 6.6) | 23.8 (21.0) | 0.001 |
Again, similar values are seen, although there is no longer a difference in recurrence pattern or pN stage after adjustment for confounding. Next we can assess overall survival and disease free survival.
dataOS<-subset(dataPS,dataPS$OS!='NA')
unadjustedOS<-survfit(Surv(dataOS$OS, as.numeric(dataOS$Death)) ~ dataOS$Pre.Operative.Treatment, data=dataOS)
iptwsvyOS <- svydesign(ids = ~ 1, data = dataOS, weights = ~ sw)
iptwOS<-svykm(Surv(OS,Death)~Pre.Operative.Treatment,design=iptwsvyOS,data=dataOS)
a<-jskm(unadjustedOS, timeby = 12, ystratalabs=c('NACT',"NACRT"),
ystrataname = "Neoadjuvant Treatment", table = TRUE, ci=FALSE, pval=FALSE,
xlabs="Months after Surgery", main ="",
dashed=FALSE,marks=FALSE,xlims=c(0,60),legendposition=c(0.85,1))
## Warning: Removed 18 row(s) containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_text).
Figure 3: Unweighted Overall Survival
b<-svyjskm(iptwOS, timeby = 12, ystratalabs=c('NACT',"NACRT"),
ystrataname = "Neoadjuvant Treatment", table = TRUE, ci=FALSE, pval=FALSE,
xlabs="Months after Surgery", main ="",
dashed=FALSE,marks=TRUE,xlims=c(0,60),legendposition=c(0.85,1))
## Warning: Removed 18 row(s) containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_text).
Figure 4: Weighted Overall Survival
dataDFS<-subset(dataPS,dataPS$DFS!='NA')
unadjustedDFS<-survfit(Surv(dataDFS$DFS, as.numeric(dataDFS$Recurrence)) ~ dataOS$Pre.Operative.Treatment, data=dataDFS)
iptwsvyDFS <- svydesign(ids = ~ 1, data = dataDFS, weights = ~ sw)
iptwDFS<-svykm(Surv(DFS,Recurrence)~Pre.Operative.Treatment,design=iptwsvyDFS,data=dataDFS)
c<-jskm(unadjustedDFS, timeby = 12, ystratalabs=c('NACT',"NACRT"),
ystrataname = "Neoadjuvant Treatment", table = TRUE, ci=FALSE, pval=FALSE,
xlabs="Months after Surgery", main ="",
dashed=FALSE,marks=FALSE,xlims=c(0,60),legendposition=c(0.85,1))
## Warning: Removed 19 row(s) containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_text).
Figure 5: Unweighted Disease Free Survival
d<-svyjskm(iptwDFS, timeby = 12, ystratalabs=c('NACT',"NACRT"),
ystrataname = "Neoadjuvant Treatment", table = TRUE, ci=FALSE, pval=FALSE,
xlabs="Months after Surgery", main ="",
dashed=FALSE,marks=TRUE,xlims=c(0,60),legendposition=c(0.85,1))
## Warning: Removed 19 row(s) containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_text).
Figure 6: Weighted Disease Free Survival
grid.arrange(a,b,c,d,ncol=2)
Figure 7: Combined Plot
Importantly, a statistically significant increase in overall survival in the unadjusted cohort is eliminated by adjusting for confounders. In this cohort, the increase in survival with NACRT is seen exclusively with patients who have squamous cell carcinoma (SCC), in line with the published literature. This highlights the difficulty in assessing mixed groups of oesophageal cancer patients, but also that the propensity score balancing can address this problem.
Comparison between groups can also be made by deriving weighted logistic regression models (or cox proportional hazard models for survival outcomes) with the outcome (e.g. anastomotic leak) as the dependent variable and the group on which the propensity score balances (e.g. preoperative treatment) as the only independent/predictor variable. As balancing has eliminated differences in other variables already, there is no need to include them in the model formula. As an example of this below, it is calculated for Anastomotic Leak and for Overall Survival
dataleak<-subset(dataPS,dataPS$Chyle.Leak!='NA')
Svyiptw <- svydesign(ids = ~ 1, data = dataleak, weights = ~ sw)
modelLeak <- svyglm(formula = as.formula(Chyle.Leak ~ Pre.Operative.Treatment), design = Svyiptw,family=quasibinomial())
summary(modelLeak)
##
## Call:
## svyglm(formula = as.formula(Chyle.Leak ~ Pre.Operative.Treatment),
## design = Svyiptw, family = quasibinomial())
##
## Survey design:
## svydesign(ids = ~1, data = dataleak, weights = ~sw)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.1657 0.3812 -8.305 4.22e-15 ***
## Pre.Operative.TreatmentCRT 1.1291 0.5253 2.149 0.0325 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 1.003534)
##
## Number of Fisher Scoring iterations: 6
OR<-round(exp(modelLeak$coefficients)[2],2)
LCI<-round(exp((modelLeak$coefficients)[2]-(1.96*summary(modelLeak)$coefficients[2,2])),2)
UCI<-round(exp((modelLeak$coefficients)[2]+(1.96*summary(modelLeak)$coefficients[2,2])),2)
odds<-paste(OR,LCI,UCI)
odds
## [1] "3.09 1.1 8.66"
We can see here an increased odds ratio of chyle leak with NACRT (whether this is spurious or not) at 3.1 (1.1-8.7).
dataOS<-subset(dataPS,dataPS$OS!='NA')
SvyiptwOS <- svydesign(ids = ~ 1, data = dataOS, weights = ~ sw)
svykmiptw<-svykm(Surv(OS,Death)~Pre.Operative.Treatment,design=SvyiptwOS,data=dataOS)
svykmiptw
## Weighted survival curves:
## svykm(formula = Surv(OS, Death) ~ Pre.Operative.Treatment, design = SvyiptwOS,
## data = dataOS)
## Chemo : Q1 = 18 median = 40 Q3 = Inf
## CRT : Q1 = 20 median = Inf Q3 = Inf
modelcox1 <- svycoxph(formula = as.formula(Surv(OS,Death)~Pre.Operative.Treatment), design = SvyiptwOS)
summary(modelcox1)
## Independent Sampling design (with replacement)
## svydesign(ids = ~1, data = dataOS, weights = ~sw)
## Call:
## svycoxph(formula = as.formula(Surv(OS, Death) ~ Pre.Operative.Treatment),
## design = SvyiptwOS)
##
## n= 277, number of events= 105
##
## coef exp(coef) se(coef) robust se z Pr(>|z|)
## Pre.Operative.TreatmentCRT -0.3055 0.7367 0.2167 0.2537 -1.205 0.228
##
## exp(coef) exp(-coef) lower .95 upper .95
## Pre.Operative.TreatmentCRT 0.7367 1.357 0.4481 1.211
##
## Concordance= 0.525 (se = 0.03 )
## Likelihood ratio test= NA on 1 df, p=NA
## Wald test = 1.45 on 1 df, p=0.2
## Score (logrank) test = NA on 1 df, p=NA
##
## (Note: the likelihood ratio and score tests assume independence of
## observations within a cluster, the Wald and robust score tests do not).
In comparison there is no overall survival advantage to NACRT, with a hazard ratio of survival of 0.74 (0.45-1.21). This method will tend to give similar results to direct comparison, but has the advantage of quantifying the differences between groups. It is probably more appropriate to use in larger cohorts.