Tyler M. Muffly, Nicki Nguyen, Meredith Alston, Jill Liss, Michael Courtois, Georg Kropat, Janet Corral, Christine Raffaelli, J. Eric Jelovsek

Authorship order will be determined by the amount of work done.

- name: Tyler M. Muffly, MD
  affiliation: Denver Health
- name: Nicki Nguyen, MD
  affiliation: Denver Health
- name: Meredith Alston, MD
  affiliation: Denver Health
- name: Jill Liss, MD
  affiliation: University of Colorado
- name: Michael Courtois, MS
  affiliation: Denver Health
- name: Georg Kropat, PhD
  affiliation: R & D Services
- name: Janet Corral, PhD
  affiliation: University of Colorado
- name: Christine Raffaelli
  affiliation: University of Colorado
- name: J. Eric Jelovsek
  affiliation: Duke University

Introduction

Obstetrics and gynecology clerkship directors and medical students alike desire reliable prognostic matching information tailored to the individual. One predictive tool is the nomogram, which creates a simple graphical representation of a statistical predictive model that generates a numerical probability of an event.

Objective: We sought to construct and validate a model that predicts a medical student’s chances of matching into an obstetrics and gynecology residency.

Materials and Methods

In total, 3904 eligible medical students applied to the CU OBGYN residency during the study period from 2017 to 2019 and were included. Of these applicants, 65.9% matched into a US OBGYN residency (95% confidence interval [CI], 64.3%-67.3%).
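As a rough check, the match rate and an interval around it can be recomputed from the counts reported in Table 1 (2571 matched out of 3904). The sketch below uses base R only and shows both a normal-approximation (Wald) interval and the exact interval from `binom.test`. Which method produced the interval quoted above is not stated, so small differences in the last digit are to be expected.

```r
# Counts taken from Table 1: 2571 matched out of 3904 applicants.
matched <- 2571
total <- 3904

p_hat <- matched / total

# Wald (normal-approximation) 95% interval.
se <- sqrt(p_hat * (1 - p_hat) / total)
wald_ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

# Exact (Clopper-Pearson) 95% interval from base R.
exact_ci <- as.numeric(binom.test(matched, total)$conf.int)

round(100 * p_hat, 1)    # match rate in percent: 65.9
round(100 * wald_ci, 1)  # Wald interval in percent
round(100 * exact_ci, 1) # exact interval in percent
```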

Applicants were eligible if they applied to the University of Colorado OBGYN residency program in 2017, 2018, or 2019. Students without complete ERAS application information were excluded. These criteria were decided a priori. A logistic regression model was created and internally validated with bootstrapping. Calibration was assessed with calibration plots, and discrimination was assessed with the concordance (C) index.

Results

The typical applicant to the University of Colorado Obstetrics and Gynecology residency program was a 28-year-old (IQR 27-30) White female U.S. senior training at a U.S. public medical school. Eighty-nine percent were trained in an allopathic medical school. Interestingly, fewer students applied to CU OBGYN from an osteopathic school than from an international medical school. This might be a holdover from a period when the OBGYN match was less competitive and IMGs were more likely to match into OBGYN. The majority of applicants did not need visa sponsorship. About 15% of applicants had interrupted their medical school training to earn another degree, to take time for an illness, or for another reason. Most applicants were not members of AOA or of the Gold Humanism Honor Society.

Unsurprisingly, the most common research output was a poster. Applicants had presented a median of one poster (IQR 0-3) and a median of zero oral podium talks (IQR 0-1), zero published peer-reviewed articles (IQR 0-1), and zero unpublished articles (IQR 0-0).

Table 1

all_data_ordered <- all_data %>%
  exploratory::reorder_cols(Age, Self_Identify, Gender, US_or_Canadian_Applicant, Type_of_medical_school, Medical_Degree, Military_Service_Obligation, Visa_Sponsorship_Needed, Medical_Education_or_Training_Interrupted, Alpha_Omega_Alpha, Gold_Humanism_Honor_Society, Couples_Match, Count_of_Oral_Presentation, Count_of_Peer_Reviewed_Journal_Articles_Abstracts, Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published, Count_of_Poster_Presentation,   Count_of_Peer_Reviewed_Book_Chapter, Misdemeanor_Conviction)

Table_1 <- arsenal::tableby(
  formula = Match_Status ~ .,
  data = all_data_ordered,
  control = arsenal::tableby.control(
    test = TRUE, total = TRUE, digits = 1L, digits.count = 0L,
    cat.simplify = FALSE, numeric.simplify = TRUE,
    numeric.stats = c("median", "q1q3"), cat.stats = c("Nmiss", "countpct"),
    stats.labels = list(
      Nmiss = "N Missing", Nmiss2 = "N Missing", meansd = "Mean (SD)",
      medianrange = "Median (Range)", median = "Median",
      medianq1q3 = "Median (Q1, Q3)", q1q3 = "Q1, Q3", iqr = "IQR",
      range = "Range", countpct = "Count (Pct)", Nevents = "Events",
      medSurv = "Median Survival", medTime = "Median Follow-Up")))

summary(Table_1,
        text = TRUE,
        title = 'Table: Applicant Descriptive Variables by Matched or Did Not Match from 2017 to 2019',
        # labelTranslations = mylabels, # Seen in additional functions file
        pfootnote = TRUE)

Table: Applicant Descriptive Variables by Matched or Did Not Match from 2017 to 2019
	Not_Matched (N=1333)	Matched (N=2571)	Total (N=3904)	p value
Age				< 0.001¹
- Median	29.0	28.0	28.0
- Q1, Q3	27.0, 32.0	27.0, 29.0	27.0, 30.0
Self_Identify				< 0.001²
- White	626 (47.0%)	1639 (63.7%)	2265 (58.0%)
- Asian	275 (20.6%)	435 (16.9%)	710 (18.2%)
- Black	158 (11.9%)	104 (4.0%)	262 (6.7%)
- Hispanic	116 (8.7%)	227 (8.8%)	343 (8.8%)
- Other	158 (11.9%)	166 (6.5%)	324 (8.3%)
Gender				< 0.001²
- Female	1000 (75.0%)	2182 (84.9%)	3182 (81.5%)
- Male	333 (25.0%)	389 (15.1%)	722 (18.5%)
US_or_Canadian_Applicant				< 0.001²
- Yes	824 (61.8%)	2357 (91.7%)	3181 (81.5%)
- No	509 (38.2%)	214 (8.3%)	723 (18.5%)
Type_of_medical_school				< 0.001²
- IMG	506 (38.0%)	214 (8.3%)	720 (18.4%)
- Osteopathic	239 (17.9%)	194 (7.5%)	433 (11.1%)
- US Private	202 (15.2%)	760 (29.6%)	962 (24.6%)
- US Public	386 (29.0%)	1403 (54.6%)	1789 (45.8%)
Medical_Degree				< 0.001²
- MD	1095 (82.1%)	2377 (92.5%)	3472 (88.9%)
- DO	238 (17.9%)	194 (7.5%)	432 (11.1%)
Military_Service_Obligation				0.915²
- No	1320 (99.0%)	2545 (99.0%)	3865 (99.0%)
- Yes	13 (1.0%)	26 (1.0%)	39 (1.0%)
Visa_Sponsorship_Needed				< 0.001²
- No	1158 (86.9%)	2507 (97.5%)	3665 (93.9%)
- Yes	175 (13.1%)	64 (2.5%)	239 (6.1%)
Medical_Education_or_Training_Interrupted				< 0.001²
- No	1063 (79.7%)	2272 (88.4%)	3335 (85.4%)
- Yes	270 (20.3%)	299 (11.6%)	569 (14.6%)
Alpha_Omega_Alpha				< 0.001²
- No	1307 (98.0%)	2109 (82.0%)	3416 (87.5%)
- Yes	26 (2.0%)	462 (18.0%)	488 (12.5%)
Gold_Humanism_Honor_Society				< 0.001²
- No	1220 (91.5%)	2079 (80.9%)	3299 (84.5%)
- Yes	113 (8.5%)	492 (19.1%)	605 (15.5%)
Couples_Match				< 0.001²
- No	1288 (96.6%)	2261 (87.9%)	3549 (90.9%)
- Yes	45 (3.4%)	310 (12.1%)	355 (9.1%)
Count_of_Oral_Presentation				0.389¹
- Median	0.0	0.0	0.0
- Q1, Q3	0.0, 1.0	0.0, 1.0	0.0, 1.0
Count_of_Peer_Reviewed_Journal_Articles_Abstracts				0.561¹
- Median	0.0	0.0	0.0
- Q1, Q3	0.0, 1.0	0.0, 2.0	0.0, 1.0
Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published				< 0.001¹
- Median	0.0	0.0	0.0
- Q1, Q3	0.0, 0.0	0.0, 1.0	0.0, 0.0
Count_of_Poster_Presentation				< 0.001¹
- Median	1.0	2.0	1.0
- Q1, Q3	0.0, 2.0	0.0, 3.0	0.0, 3.0
Count_of_Peer_Reviewed_Book_Chapter				0.001¹
- Median	0.0	0.0	0.0
- Q1, Q3	0.0, 0.0	0.0, 0.0	0.0, 0.0
Misdemeanor_Conviction				0.888²
- No	1312 (98.4%)	2532 (98.5%)	3844 (98.5%)
- Yes	21 (1.6%)	39 (1.5%)	60 (1.5%)

¹ Linear Model ANOVA
² Pearson’s Chi-squared test

table1 <- Table_1 %>% as.data.frame() 

#tm_write2word(tm_arsenal_table_output, "tm_arsenal_table_output1")
#tm_write2pdf(tm_arsenal_table_output, "tm_arsenal_table_output1")

Compared with applicants who did not match, applicants who matched into an OBGYN residency were significantly younger by one year (median 28 vs. 29 years, p<0.001), more likely to be White (63.7% vs. 47.0%; p<0.001 for race/ethnicity overall), more likely to be female (84.9% vs. 75.0%, p<0.001), more likely to be U.S. or Canadian applicants rather than international medical graduates (91.7% vs. 61.8%, p<0.001), more likely to be allopathic trained (92.5% vs. 82.1%, p<0.001), less likely to have interrupted their medical education (11.6% vs. 20.3%, p<0.001), more likely to be members of Alpha Omega Alpha (18% vs. 2%, p<0.001), and more likely to be participating in a couples match (12% vs. 3%, p<0.001).

It is likely that the model would not predict well if a different population of students applied. For example, the model would not predict well for IMGs or for older medical/fifth pathway/non-traditional students who had careers prior to medical school.

All candidate predictors were used because they were all available for all applicants. A total of 18 features were included in the data.

`all.features` model

We removed three variables (US_or_Canadian_Applicant, Age, and Medical_Degree) because they were collinear with other predictors; each had a variance inflation factor greater than 10. Military service obligation and misdemeanor conviction were removed because their expected event counts were less than 5; these were zero-variance or near-zero-variance variables. Following this process there were 13 predictors for use in the model.
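The variance inflation factor screen can be illustrated with a minimal base-R sketch. The data here are simulated and the predictor names (`x1`, `x2`, `x3`) are hypothetical; they are not the study variables. For categorical predictors such as Type_of_medical_school, a generalized VIF from a dedicated package would likely be needed instead, so this only shows the idea for numeric predictors.

```r
set.seed(1)
n <- 500

# Hypothetical predictors: x2 is nearly a copy of x1 (collinear), x3 is independent.
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all of the other predictors.
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

vif
names(vif)[vif > 10] # candidates for removal under a VIF > 10 rule
```

In this toy example the two nearly identical predictors are flagged and the independent one is not. Only one of a collinear pair usually needs to be dropped before the VIFs are recomputed.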

There are 12 variables that are statistically significant within the model. Here we plot the Wald z-statistic from above and the associated p-values. The Wald ANOVA indicates especially strong effects for Type_of_medical_school.

The overall performance of these models was evaluated using the Brier score, rescaled to range from 0 to 1 (with higher values indicating better performance) as suggested by Steyerberg et al (2010). Brier score for all.features was 0.17. The AUC was 0.79 for the all.features model.
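For readers unfamiliar with the Brier score, here is a minimal sketch of the raw score and of the rescaled version, assuming the rescaling is 1 - Brier / Brier_max, where Brier_max is the score of a non-informative model that predicts the overall event rate for everyone. The toy vectors `p` and `y` are hypothetical. For the raw score lower is better, and for the rescaled score higher is better; the text above does not say which of the two the reported 0.17 refers to.

```r
# Raw Brier score: mean squared difference between predicted
# probability and observed outcome (0 = perfect).
brier_score <- function(p, y) mean((p - y)^2)

# Scaled Brier score: 1 - Brier / Brier_max, where Brier_max is the score
# obtained by predicting the overall event rate for every observation.
scaled_brier <- function(p, y) {
  brier_max <- mean(y) * (1 - mean(y))
  1 - brier_score(p, y) / brier_max
}

# Hypothetical toy data: observed outcomes and predicted probabilities.
y <- c(1, 1, 1, 0, 1, 0, 0, 1)
p <- c(0.9, 0.8, 0.7, 0.4, 0.6, 0.2, 0.3, 0.5)

brier_score(p, y)  # 0.105
scaled_brier(p, y) # 0.552
```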

Table 2: Interpreting Model Coefficients

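library(dplyr)      # provides the %>% pipe used below
library(finalfit)
library(kableExtra) # provides kable_styling()
# https://finalfit.org/
# https://www.datasurg.net/2018/05/16/elegant-regression-results-tables-and-plots-the-finalfit-package/

# all_data is the applicant data frame already in the session; data() only loads packaged datasets
# data(all_data)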

explanatory = c("Gender", "Self_Identify", "Type_of_medical_school", "Visa_Sponsorship_Needed", "Medical_Education_or_Training_Interrupted", 
    "Alpha_Omega_Alpha", "Gold_Humanism_Honor_Society", "Couples_Match", "Count_of_Oral_Presentation", 
    "Count_of_Peer_Reviewed_Journal_Articles_Abstracts", "Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published", 
    "Count_of_Poster_Presentation", "Count_of_Peer_Reviewed_Book_Chapter")
dependent = "Match_Status"

t2 <- all_data %>% finalfit(dependent, explanatory, metrics = TRUE)

knitr::kable(t2[[1]], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r")) %>% kable_styling()

Dependent: Match_Status		Not_Matched	Matched	OR (univariable)	OR (multivariable)
Gender	Female	1000 (31.4)	2182 (68.6)
	Male	333 (46.1)	389 (53.9)	0.54 (0.45-0.63, p<0.001)	0.70 (0.58-0.84, p<0.001)
Race/Ethnicity	White	626 (27.6)	1639 (72.4)
	Asian	275 (38.7)	435 (61.3)	0.60 (0.51-0.72, p<0.001)	0.76 (0.62-0.93, p=0.008)
	Black	158 (60.3)	104 (39.7)	0.25 (0.19-0.33, p<0.001)	0.26 (0.20-0.35, p<0.001)
	Hispanic	116 (33.8)	227 (66.2)	0.75 (0.59-0.95, p=0.018)	0.89 (0.68-1.18, p=0.417)
	Other	158 (48.8)	166 (51.2)	0.40 (0.32-0.51, p<0.001)	0.68 (0.52-0.89, p=0.006)
Type of Medical School	IMG	506 (70.3)	214 (29.7)
	Osteopathic	239 (55.2)	194 (44.8)	1.92 (1.50-2.46, p<0.001)	1.38 (1.05-1.83, p=0.021)
	US Private	202 (21.0)	760 (79.0)	8.90 (7.13-11.14, p<0.001)	6.03 (4.68-7.80, p<0.001)
	US Public	386 (21.6)	1403 (78.4)	8.59 (7.08-10.47, p<0.001)	5.23 (4.16-6.58, p<0.001)
Visa Sponsorship Needed	No	1158 (31.6)	2507 (68.4)
	Yes	175 (73.2)	64 (26.8)	0.17 (0.12-0.23, p<0.001)	0.67 (0.47-0.95, p=0.026)
Medical Education Interrupted	No	1063 (31.9)	2272 (68.1)
	Yes	270 (47.5)	299 (52.5)	0.52 (0.43-0.62, p<0.001)	0.58 (0.47-0.72, p<0.001)
Alpha Omega Alpha	No	1307 (38.3)	2109 (61.7)
	Yes	26 (5.3)	462 (94.7)	11.01 (7.53-16.85, p<0.001)	4.69 (3.16-7.27, p<0.001)
Gold Humanism Society	No	1220 (37.0)	2079 (63.0)
	Yes	113 (18.7)	492 (81.3)	2.56 (2.07-3.19, p<0.001)	1.49 (1.18-1.91, p=0.001)
Couples Match	No	1288 (36.3)	2261 (63.7)
	Yes	45 (12.7)	310 (87.3)	3.92 (2.88-5.47, p<0.001)	2.39 (1.71-3.41, p<0.001)
Oral Presentations	Mean (SD)	1.0 (2.1)	1.0 (1.6)	0.98 (0.95-1.02, p=0.389)	0.98 (0.93-1.02, p=0.303)
Peer-Reviewed Articles	Mean (SD)	1.4 (4.1)	1.3 (3.2)	0.99 (0.98-1.01, p=0.562)	0.99 (0.97-1.02, p=0.710)
Unpublished Articles	Mean (SD)	0.4 (1.1)	0.5 (1.2)	1.14 (1.07-1.23, p<0.001)	1.10 (1.02-1.19, p=0.011)
Poster Presentations	Mean (SD)	1.6 (2.5)	2.3 (3.0)	1.13 (1.09-1.16, p<0.001)	1.03 (1.00-1.07, p=0.082)
Book Chapters	Mean (SD)	0.1 (0.4)	0.0 (0.2)	0.71 (0.57-0.88, p=0.002)	0.74 (0.56-0.96, p=0.025)

knitr::kable(t2[[2]], row.names = FALSE, col.names = "") %>% kable_styling()


Number in dataframe = 3904, Number in model = 3904, Missing = 0, AIC = 4032.5, C-statistic = 0.792, H&L = Chi-sq(8) 12.27 (p=0.140)
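The univariable odds ratios in Table 2 can be reproduced by hand from the counts in the same table. The sketch below does this for Alpha Omega Alpha membership using base R. The Wald interval it computes is an assumption about method: the interval printed in Table 2 may have been computed differently, so it will be close but need not agree exactly.

```r
# Counts copied from the Alpha Omega Alpha rows of Table 2.
aoa_yes <- c(matched = 462, not_matched = 26)
aoa_no <- c(matched = 2109, not_matched = 1307)

# Odds of matching within each group, then their ratio.
odds_yes <- aoa_yes[["matched"]] / aoa_yes[["not_matched"]]
odds_no <- aoa_no[["matched"]] / aoa_no[["not_matched"]]
or <- odds_yes / odds_no

# Wald 95% interval computed on the log-odds scale.
se_log_or <- sqrt(sum(1 / c(aoa_yes, aoa_no)))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(or, 2) # 11.01, the univariable OR reported in Table 2
round(ci, 2)
```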

# OR plot
explanatory = c("Gender", "Self_Identify", "Type_of_medical_school", "Visa_Sponsorship_Needed", "Medical_Education_or_Training_Interrupted", 
    "Alpha_Omega_Alpha", "Gold_Humanism_Honor_Society", "Couples_Match", "Count_of_Oral_Presentation", 
    "Count_of_Peer_Reviewed_Journal_Articles_Abstracts", "Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published", 
    "Count_of_Poster_Presentation", "Count_of_Peer_Reviewed_Book_Chapter")

dependent = "Match_Status"

all_data$Match_Status <- relevel(all_data$Match_Status, ref = "Not_Matched")

all_data %>%
  finalfit::or_plot(dependent, explanatory, table_text_size = 3, title_text_size = 10,
    column_space = c(-0.5, 0, 0.5), remove_ref = TRUE,
    plot_opts = list(xlab("Odds Ratio, 95% CI"), theme(axis.title = element_text(size = 9))))

It is readily seen from this plot that students who are White, female, and graduates of a US medical school are most likely to match. In addition, those with a history of better academic achievement (AOA, Gold Humanism) are more likely to match. Very few features are modifiable by the time a student decides to apply to OBGYN residency.

In this case the c-index is the probability that, of two randomly chosen medical students, one who matched and one who did not, the student who matched has the higher predicted probability of matching. The larger the C-index, the better the model discriminates between students who match and students who do not. It is generally believed that model discrimination is adequate when the c-index exceeds 0.7 and strong when the c-index exceeds 0.8. With a C-statistic of 0.79, this model sits at the upper end of the adequate range.
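The pairwise definition of the c-index can be written directly in base R. This is a minimal sketch on hypothetical toy data, not the study data: it compares every matched/unmatched pair, counts the pair as concordant when the matched student has the higher predicted probability, and counts ties as one half. Somers' Dxy is included because it is a simple rescaling of the same quantity, Dxy = 2(C - 0.5).

```r
# C-index: proportion of event/non-event pairs in which the event
# has the higher predicted probability (ties count as one half).
c_index <- function(p, y) {
  diffs <- outer(p[y == 1], p[y == 0], "-")
  (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)
}

# Somers' Dxy is a rescaling of the C-index to the range -1 to 1.
somers_dxy <- function(p, y) 2 * (c_index(p, y) - 0.5)

# Hypothetical toy data: 1 = matched, 0 = did not match.
y <- c(1, 1, 1, 0, 1, 0, 0, 1)
p <- c(0.9, 0.8, 0.3, 0.4, 0.6, 0.2, 0.5, 0.5)

c_index(p, y)    # 12.5 concordant out of 15 pairs
somers_dxy(p, y)
```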

Figure 1: Nomogram: An analogue tool to deliver digital knowledge

We generated the nomogram to provide a pre-Match, personalized estimate of the chance of matching into OBGYN residency, whereby points in the nomogram were assigned in proportion to the effect sizes in the multivariable logistic regression model. The nomogram was based on pre-Match variables including educational preparation, research accomplishments, and applicant demographics.

Points were allocated for each variable, summed, and then used to calculate a medical student-specific, pre-application chance of matching. The nomogram illustrates the strength of association of the predictors with the outcome as well as the nonlinear associations between age and count of poster presentations and matching.
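To make "points in proportion to effect size" concrete, here is a simplified sketch using three binary predictors and their multivariable odds ratios copied from Table 2. It assumes points are proportional to the log odds ratio and that the largest of these three effects is anchored at 100 points. In the actual nomogram the 100-point anchor is set by whichever predictor has the largest effect across its full range, and nonlinear terms are handled differently, so the point values on Figure 1 will not equal these numbers.

```r
# Multivariable odds ratios copied from Table 2 (three binary predictors only).
odds_ratio <- c(AOA_yes = 4.69, Couples_match_yes = 2.39, Gold_Humanism_yes = 1.49)

# Effect size on the log-odds scale, which is the scale the model is linear on.
beta <- log(odds_ratio)

# Illustrative points: proportional to effect size, largest effect = 100.
points <- 100 * beta / max(abs(beta))

round(points)
```

A student's total is the sum of the points for the categories that apply. The nomogram then maps that total back to a probability through the inverse logit of the underlying linear predictor.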

THE GOLD HUMANISM YES/NO IS BACKWARDS AND I AM NOT SURE HOW TO FIX IT.

par(mar = c(2, 2, 2, 2)) # margins of 2 lines of text on every side (mai would set inches)
# https://www.statmethods.net/advgraphs/parameters.html

plot(nomo.nomo, xfrac = 0.5, # distance of variable names to the bars
     cex.axis = 0.4, # size of the axis labels (e.g. "Male")
     cex.var = 0.8, # size of the variable names (e.g. "Age, years")
     total.points.label = "Sum of all points",
     force.label = TRUE,
     lmgp = 0.25,
     # tcl = 0.8,
     label.every = 2)

We can see that several of the variables are non-linear.

Figure 2: Bootstrap validation of the `all.features` model

In bootstrap validation, random samples of the same size as the original cohort are drawn with replacement from the original data set. A bootstrap sample for this nomogram would include 3904 medical students, but in this new sample, student A could appear three times, whereas student B could appear zero times.
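The resampling idea can be sketched with an optimism bootstrap in base R. Everything here is an illustration on simulated data: the variables `x1` and `x2`, the cohort size, and the number of resamples `B` are hypothetical, and a plain `glm` stands in for the `all.features` model. In each resample the model is refit, its C-index is measured on the resample and on the original data, and the average difference (the optimism) is subtracted from the apparent C-index.

```r
set.seed(42)

# Hypothetical simulated cohort (not the study data).
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.3))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 + 0.8 * dat$x2))

# Pairwise C-index; ties count as one half.
c_index <- function(p, y) {
  d <- outer(p[y == 1], p[y == 0], "-")
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}

fit_model <- function(d) glm(y ~ x1 + x2, family = binomial, data = d)

# Apparent performance: model fit and evaluated on the same data.
apparent <- c_index(predict(fit_model(dat), type = "response"), dat$y)

B <- 100
optimism <- replicate(B, {
  idx <- sample(n, n, replace = TRUE) # same size as the original cohort
  boot <- dat[idx, ]
  fit_b <- fit_model(boot)
  c_boot <- c_index(predict(fit_b, type = "response"), boot$y)
  c_orig <- c_index(predict(fit_b, newdata = dat, type = "response"), dat$y)
  c_boot - c_orig
})

corrected <- apparent - mean(optimism)
c(apparent = apparent, optimism = mean(optimism), corrected = corrected)
```

Because rows are drawn with replacement, `table(idx)` for any single resample shows some students appearing several times and others not at all.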

all.features_predict <- stats::predict(all.features, x = TRUE, y = TRUE)
# all.features_predict <- rms::Predict(all.features, fun = plogis) all.features_predict <-
# update(all.features_predict, x=TRUE, y=TRUE)

We estimated an optimism-corrected c-statistic (0.79) and shrinkage factor (0.98) using bootstrapping, as described in Harrell, Lee and Mark (1996).

Predicted values

Calibration of the `all.features` model

We want to estimate the relationship between predicted matching probabilities and observed outcomes, i.e., to derive a calibration curve. The bootstrap is used to de-bias the estimates to correct for overfitting, allowing estimation of the likely future calibration performance of the fitted model.

The calibration plot demonstrated that the predicted chance of matching approximated the observed chance. On internal validation, the all.features model had good discrimination, as demonstrated by a bias-corrected (generated by bootstrap validation) C-index of 0.79 (95% CI x-y) and a Brier score of 0.17. We did not perform temporal validation, though that would be possible if desired.

The validation output indicates minor overfitting. Overfitting would have been worse had the risk factors not been so strong.

# par(mar=c(8,5,3,2),cex = 1.0)
graphics::plot(cal_all.features, main = "all.features Bootstrap Overfitting-Corrected Calibration Curve", 
    xlim = c(0, 1), ylim = c(0, 1), xlab = "Nomogram-Predicted Probability of Matching", ylab = "Actual Matching (proportion)", 
    lwd = 2, lty = 1, errbar.col = c(rgb(0, 118, 192, maxColorValue = 255)), legend = FALSE, subtitles = FALSE, 
    cex.lab = 1.2, cex.axis = 1, cex.main = 1.2, cex.sub = 0.6, scat1d.opts = list(nhistSpike = 240, 
        side = 3, frac = 0.08, lwd = 1, nint = 50))


n=3904   Mean absolute error=0.011   Mean squared error=0.00021
0.9 Quantile of absolute error=0.023

lines(cal_all.features, lwd = 2, lty = 3, col = c(rgb(255, 0, 0, maxColorValue = 255)))
abline(0, 1, lty = 5, lwd = 2, col = c(rgb(0, 0, 255, maxColorValue = 255)))
legend("bottomright", cex = 0.8, legend = c("Apparent", "Bias-corrected", "Ideal"), col = c("red", "black", 
    "blue"), lwd = c(1, 1, 1), lty = c(1, 1, 2))

Model all.features appears to underpredict slightly at mid-range predicted values. At lower predicted values, all.features tends to overpredict matching success. The closeness of the calibration curve to the 45-degree line demonstrates excellent calibration on an absolute probability scale. There were no missing data, so missingness does not cast doubt on the validity of the model.

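Note that the cross-validated Somers’ Dxy statistic is 0.59. Dxy and C are on different scales and are related by Dxy = 2(C - 0.5), so a Dxy of 0.59 corresponds to C = 0.5 + 0.59/2, or about 0.80. This is essentially the same as the nominal C = 0.79 shown in the summary, so any overfitting appears to be slight.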

Discussion

The prevalence of not matching into OBGYN residency found in this study (34.1%) reinforces the finding from previous research that the match is exceptionally competitive for obstetrics and gynecology and that not matching is a common problem.

We have successfully built and internally validated a model that accurately predicts a medical student’s chance of matching into an Obstetrics and Gynecology residency. The initial test of the model’s generalizability is promising, as the model was developed from a large cohort. It will be further generalized by testing in a subsequent cohort from multiple different institutions across the country.

Limitations

This study has several limitations. Because it was a retrospective study, the details of some predictors could not be discerned. Although the nomogram was developed in a bootstrap-validated cohort, its educational use must be externally validated and evaluated. All variables used were easily exportable from the Program Director’s Workstation, but incorporation of additional variables would be ideal. Some potentially important variables, such as letters of recommendation or USMLE clinical skills scores, were not available in the data set and thus could not be included in the nomogram. We purposely did not include USMLE Step 1 score as a variable given the change to a pass or fail test. Although the C index for this model suggests good performance, the inclusion of additional applicant-level variables could result in more precise risk-prediction models.

Conclusion

This study found that a nomogram developed with pre-Match data to generate personalized estimates of matching into OBGYN residency may improve pre-Match counseling and interventions for medical students at high risk of not matching.

Comparison of nomogram with current predictive systems: Clerkship Directors

We will compare the ROC curve of the model with the ROC curve of clerkship directors. The comparison will be done with the C-statistic.

A Model to Predict Chances of Matching into Obstetrics and Gynecology Residency

Tyler M. Muffly, MD

Department of Obstetrics and Gynecology, Denver Health, Denver, CO
