Making the Match

Creating the SQL Query

Study question

Do composite morbidity and mortality outcomes significantly differ between open distal pancreatectomy and minimally invasive distal pancreatectomy?

Hypotheses

\(H_0\): There is no significant difference in composite morbidity, mortality, and post-operative fistula rates.

\(H_a\): The rates of composite morbidity, mortality and fistula rates are significantly different between the two study groups.

Desired study population

The population to which our results are ideally generalizable:

Inpatients undergoing elective, non-emergent open or minimally-invasive distal pancreatectomy without reconstruction and irrespective of indication.

Required CPT codes

Distal pancreatectomy

  • 48140 - Pancreatectomy, distal subtotal, with or without splenectomy; without pancreaticojejunostomy

Splenectomy

  • 38128 - Unlisted laparoscopy procedure, spleen
  • 38100 - Splenectomy; total (separate procedure)
  • 38101 - Splenectomy; partial (separate procedure)
  • 38102 - Splenectomy; total, en bloc for extensive disease, in conjunction with other procedure (List in addition to code for primary procedure)

Inclusion criteria (and relevant NSQIP variables)

Possible explanatory variables (and relevant NSQIP variables)

Our possible explanatory variables are as follows:

Primary outcomes (and relevant NSQIP variables)

Dindo-Clavien classification

Dindo-Clavien classifications provide a convenient way to create composite outcomes. These are created from the NSQIP variables via the following categorizations:

  • 1: Superficial SSI (supinfec), deep SSI (wndinfd), wound dehiscence (dehis), renal insufficiency (renainsf)
  • 2: Organ-space SSI (orgspcssi), post-operative pneumonia (oupneumo), urine infection (urninfec), pulmonary embolism (pulembol), post-operative bleeding (othbleed), deep venous thrombosis (othdvt), systemic sepsis (othsysep), C. difficile infection (othcdiff)
  • 3: Return to the OR (returnor), any reoperation within 30 days (reoperation)
  • 4: Reintubation (reintub), failure to wean off the ventilator for 48 hours post-operatively (failwean), renal failure requiring dialysis (oprenafl), CNS CVA (cnscva), cardiac arrest (cdarrest), myocardial infarction (cdmi), septic shock (othseshock), coma (cnscoma), new neurological deficit (neurodef)
  • 5: Death (dopertod)

In addition, I will use targeted data regarding post-operative fistulas to help assign to a Dindo-Clavien classification:

  • 2: If fistula intervention (pan_fistula_intervention) is listed as NPO-TPN, Spontaneous wound drainage, or Drain continued >7 days.
  • 3: If fistula intervention (pan_fistula_intervention) is listed as Percutaneous drainage or Reoperation.

The patient is categorized into the highest category for which they qualify.

The Dindo-Clavien classifications can be further consolidated:

  • None - grade 0
  • Minor - grade 1 or 2
  • Major - grade 3 or 4
  • Death - grade 5

We will create a composite outcome called morbidity that is a Boolean variable:

\[f(x) = \begin{cases} 1, & \text{if major outcome}.\\ 0, & \text{otherwise}. \end{cases} \]

mortality will be an independently assessed Boolean variable:

\[g(x) = \begin{cases} 1, & \text{if death}.\\ 0, & \text{otherwise}. \end{cases} \]

Post-operative fistula

Post-operative fistula (pan_fistula) will be defined as any patient in the NSQIP database with one of the following:

  • pan_fistula_type listed as Grade B POPF or Grade C POPF.
  • pan_fistula_intervention listed as Percutaneous drainage, NPO-TPN, Reoperation, or Spontaneous wound drainage.
  • ddrainremoval (days until drain removal) recorded as greater than 21 days.

Secondary outcomes (and relevant NSQIP variables)

Capturing relevant data from database

An SQL query to retrieve all of the above information from the NSQIP database can be structured as follows:

This query results in a total of 6239 patients.

Cleaning the data

Removing patients with inadequate data

There are some variables for which we will not accept a missing value. These are age, height, weight, and pan_approach.

Removing these patients results in a total of 6217 patients remaining.

Applying inclusion criteria

We can now apply the inclusion criteria above. First, I will create a helper function called or_na. There are a few variables for which we make the assumption that missing data is equivalent to whichever condition meets the inclusion criteria. This is an assumption made to keep up our sample size but is a potential point of criticism, so be aware of which variables are subject to this assumption. These variables are:

I can show this works:

After applying these inclusion criteria, we are left with 4414 patients.

Removing records with discrepancies in the data

Some records have discordant data. These should be removed.

Patients with a benign histology and a T or N stage

After removing these discrepancies from the data, we are left with 3964 patients.

Re-formatting explanatory variables

Some of the explanatory variables will need to be either created or reformatted. Again, for many of these variables, I am assuming that missing data is equivalent to “False”. This is an assumption that is potentially open to criticism but is made to keep the sample size up. These variables are:

Creating primary outcome variables

Fistula

First I will build a function to check if a patient meets criteria for a “positive” fistula outcome:

We can see this appropriately assigns the correct outcome:

unmatch_df %>% 
  filter((ddrainremoval > 21 & ddrainremoval < 25) | (ddrainremoval < 21 & ddrainremoval > 17)) %>%
  mutate(pan_fistula = check_fistula(pan_fistula_type, pan_fistula_intervention, ddrainremoval)) %>%
  select(pan_fistula_type, pan_fistula_intervention, ddrainremoval, pan_fistula) %>%
  distinct() %>% 
  arrange(pan_fistula, ddrainremoval) %>%
  print(n = Inf)
#> # A tibble: 37 x 4
#>    pan_fistula_type      pan_fistula_intervention   ddrainremoval pan_fistula
#>    <chr>                 <chr>                              <int> <lgl>      
#>  1 Persistent drainage   Drain continued >7 days               18 FALSE      
#>  2 <NA>                  <NA>                                  18 FALSE      
#>  3 Clinical diagnosis    Drain continued >7 days               18 FALSE      
#>  4 Persistent drainage   Drain continued >7 days               19 FALSE      
#>  5 <NA>                  <NA>                                  19 FALSE      
#>  6 Clinical diagnosis    Drain continued >7 days               19 FALSE      
#>  7 Biochemical leak only <NA>                                  19 FALSE      
#>  8 <NA>                  <NA>                                  20 FALSE      
#>  9 Persistent drainage   Drain continued >7 days               20 FALSE      
#> 10 Biochemical leak only <NA>                                  20 FALSE      
#> 11 Clinical diagnosis    Drain continued >7 days               20 FALSE      
#> 12 Clinical diagnosis    Percutaneous drainage                 18 TRUE       
#> 13 Clinical diagnosis    Spontaneous wound drainage            18 TRUE       
#> 14 Persistent drainage   Percutaneous drainage                 18 TRUE       
#> 15 Clinical diagnosis    Percutaneous drainage                 19 TRUE       
#> 16 Persistent drainage   Percutaneous drainage                 19 TRUE       
#> 17 Grade B POPF          <NA>                                  19 TRUE       
#> 18 Clinical diagnosis    Percutaneous drainage                 20 TRUE       
#> 19 Grade B POPF          <NA>                                  20 TRUE       
#> 20 Persistent drainage   Drain continued >7 days               22 TRUE       
#> 21 <NA>                  <NA>                                  22 TRUE       
#> 22 Grade B POPF          <NA>                                  22 TRUE       
#> 23 Clinical diagnosis    Drain continued >7 days               22 TRUE       
#> 24 Clinical diagnosis    Percutaneous drainage                 22 TRUE       
#> 25 Biochemical leak only <NA>                                  22 TRUE       
#> 26 Persistent drainage   Percutaneous drainage                 22 TRUE       
#> 27 Clinical diagnosis    Drain continued >7 days               23 TRUE       
#> 28 Persistent drainage   Drain continued >7 days               23 TRUE       
#> 29 <NA>                  <NA>                                  23 TRUE       
#> 30 Grade C POPF          <NA>                                  23 TRUE       
#> 31 Clinical diagnosis    Spontaneous wound drainage            23 TRUE       
#> 32 <NA>                  <NA>                                  24 TRUE       
#> 33 Grade B POPF          <NA>                                  24 TRUE       
#> 34 Persistent drainage   Drain continued >7 days               24 TRUE       
#> 35 Clinical diagnosis    Drain continued >7 days               24 TRUE       
#> 36 Clinical diagnosis    Reoperation                           24 TRUE       
#> 37 Clinical diagnosis    Percutaneous drainage                 24 TRUE

Now we will create the variable:

Dindo-Clavien outcomes

In order to assign the Dindo-Clavien outcomes, I have to work in fistula outcomes into my pre-existing function. I will use the following criteria to assign fistula outcomes a Dindo-Clavien classification:

  • 2: If fistula intervention (pan_fistula_intervention) is listed as NPO-TPN, Spontaneous wound drainage, or Drain continued >7 days. Alternatively, if pan_fistula is TRUE because ddrainremoval > 21 and no intervention is listed.
  • 3: If fistula intervention (pan_fistula_intervention) is listed as Percutaneous drainage or Reoperation.

We can check this assigns the correct classification:

We can now create the variable. In addition, we will account the remainder of the composite outcomes and select the highest category Dindo-Clavien outcome. Be aware, for all of the composite outcomes included in the Dindo-Clavien classification, the function below assumes a missing value is equivalent to FALSE.

Finally, we can collapse the Dindo-Clavien complications into three levels - “None”, “Major” and “Death”:

Creating secondary outcome variables

Only minor adjustments need to be made for secondary outcomes. Again, the primary assumption is that a missing value is equivalent to FALSE.

Selecting for only complete cases

A match requires that all records have no missing fields. First, we must select for only those variables we care about in the match or the post-match analysis and then select for only those complete cases:

After selecting for complete cases, we are left with 3940 patients.

A summary of our pre-match data looks like:

#>      caseid          group            sex               age       
#>  Min.   :3113861   Mode :logical   Mode :logical   Min.   :18.00  
#>  1st Qu.:4062299   FALSE:1962      FALSE:2217      1st Qu.:53.00  
#>  Median :6436328   TRUE :1978      TRUE :1723      Median :64.00  
#>  Mean   :6119729                                   Mean   :61.57  
#>  3rd Qu.:7791588                                   3rd Qu.:72.00  
#>  Max.   :9284181                                   Max.   :90.00  
#>       bmi         diabetes         smoke               fnstatus2   
#>  Min.   :13.62   Mode :logical   Mode :logical   Independent:3919  
#>  1st Qu.:24.37   FALSE:2947      FALSE:3263      Dependent  :  21  
#>  Median :28.03   TRUE :993       TRUE :677                         
#>  Mean   :28.84                                                     
#>  3rd Qu.:32.28                                                     
#>  Max.   :66.75                                                     
#>    hxcopd         ascites          hxchf          hypermed      
#>  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
#>  FALSE:3787      FALSE:3936      FALSE:3926      FALSE:1893     
#>  TRUE :153       TRUE :4         TRUE :14        TRUE :2047     
#>                                                                 
#>                                                                 
#>                                                                 
#>   dialysis        steroid          wtloss         bleeddis       asaclas 
#>  Mode :logical   Mode :logical   Mode :logical   Mode :logical   1:  45  
#>  FALSE:3919      FALSE:3807      FALSE:3685      FALSE:3833      2:1230  
#>  TRUE :21        TRUE :133       TRUE :255       TRUE :107       3:2505  
#>                                                                  4: 160  
#>                                                                          
#>                                                                          
#>  pan_chemo       pan_radio       pan_drains     pan_benign_histologic
#>  Mode :logical   Mode :logical   Mode:logical   Mode :logical        
#>  FALSE:3485      FALSE:3708      TRUE:3940      FALSE:2279           
#>  TRUE :455       TRUE :232                      TRUE :1661           
#>                                                                      
#>                                                                      
#>                                                                      
#>  pan_tstage      pan_nstage       supinfec        wndinfd       
#>  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
#>  FALSE:1793      FALSE:3183      FALSE:3838      FALSE:3914     
#>  TRUE :2147      TRUE :757       TRUE :102       TRUE :26       
#>                                                                 
#>                                                                 
#>                                                                 
#>  orgspcssi         othdvt         pulembol        reintub       
#>  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
#>  FALSE:3513      FALSE:3845      FALSE:3878      FALSE:3876     
#>  TRUE :427       TRUE :95        TRUE :62        TRUE :64       
#>                                                                 
#>                                                                 
#>                                                                 
#>   failwean        renainsf        oprenafl       pan_fistula    
#>  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
#>  FALSE:3883      FALSE:3931      FALSE:3925      FALSE:3468     
#>  TRUE :57        TRUE :9         TRUE :15        TRUE :472      
#>                                                                 
#>                                                                 
#>                                                                 
#>      dindo        dindo_class  dindo_major     dindo_death        log_amy      
#>  Min.   :0.0000   None :3553   Mode :logical   Mode :logical   Min.   : 0.000  
#>  1st Qu.:0.0000   Major: 363   FALSE:3577      FALSE:3916      1st Qu.: 4.443  
#>  Median :0.0000   Death:  24   TRUE :363       TRUE :24        Median : 6.377  
#>  Mean   :0.8056                                                Mean   : 6.347  
#>  3rd Qu.:2.0000                                                3rd Qu.: 8.154  
#>  Max.   :5.0000                                                Max.   :11.513  
#>       pod        reoperation     readmission     pan_delgastric 
#>  Min.   : 0.00   Mode :logical   Mode :logical   Mode :logical  
#>  1st Qu.: 1.00   FALSE:3825      FALSE:3236      FALSE:3788     
#>  Median : 3.00   TRUE :115       TRUE :704       TRUE :152      
#>  Mean   : 3.92                                                  
#>  3rd Qu.: 4.00                                                  
#>  Max.   :30.00                                                  
#>     tothlos       wndclas  pan_spleen     
#>  Min.   : 0.000   1: 455   Mode :logical  
#>  1st Qu.: 4.000   2:3208   FALSE:3742     
#>  Median : 5.000   3: 277   TRUE :198      
#>  Mean   : 6.457                           
#>  3rd Qu.: 7.000                           
#>  Max.   :81.000

Making the Match

Pre-weighting balance assessment

Covariate balance assessment can be assessed using the standardized mean difference (SMD). An SMD greater than 0.1 is usually regarded as substantial imbalance.

Matching

A genetic match utilizing Mahalanobis distance is performed. I will match on all the covariates that are greater than 0.1 SMD. A genetic match uses a genetic search algorithm to find a set of weights for each covariate such that the a version of optimal balance is achieved after matching. It uses matching with replacement (groups will likely not be identical in their sample size, but their weighted means should be closely matched).

Post-weighting balance assessment

We can visualize the improvement in SMD with the following graph:

Modeling

Unadjusted models (var ~ group)

outcome OR p
pan_fistula 1.39 [1.06, 1.83] 0.018
dindo_major 0.88 [0.65, 1.17] 0.371
dindo_death 1.40 [0.52, 3.76] 0.501
reoperation 0.82 [0.49, 1.40] 0.471
readmission 1.03 [0.82, 1.29] 0.781
pan_delgastric 0.72 [0.47, 1.11] 0.136
outcome coefficient p
log_amy 0.60 [0.40, 0.81] <0.001
pod -0.74 [-1.15, -0.33] <0.001
tothlos -1.40 [-1.71, -1.10] <0.001

Adjusted models (var ~ group + ...)

outcome OR p
pan_fistula 1.40 [1.06, 1.85] 0.018
dindo_major 0.88 [0.66, 1.18] 0.386
dindo_death 1.58 [0.50, 4.99] 0.432
reoperation 0.84 [0.49, 1.42] 0.512
readmission 1.04 [0.82, 1.30] 0.768
pan_delgastric 0.72 [0.47, 1.11] 0.134
outcome coefficient p
log_amy 0.59 [0.40, 0.79] <0.001
pod -0.73 [-1.14, -0.33] <0.001
tothlos -1.37 [-1.68, -1.07] <0.001