## Data Reduction

### Descriptive Analysis

#> 
#> --------Summary descriptives table by 'Match_Status'---------
#> 
#> _______________________________________________________________________________________________________________________ 
#>                                                                            Matched        Not_Matched    p.overall  N   
#>                                                                             N=3919           N=3307                     
#> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
#> Gender:                                                                                                   <0.001   7226 
#>     Female                                                               3327 (84.9%)     2585 (78.2%)                  
#>     Male                                                                 592 (15.1%)      722 (21.8%)                   
#> Age                                                                    27.0 [26.0;29.0] 28.0 [27.0;31.0]  <0.001   7226 
#> Medical_Degree:                                                                                           <0.001   7198 
#>     DO                                                                   521 (13.3%)      704 (21.4%)                   
#>     MD                                                                   3392 (86.7%)     2581 (78.6%)                  
#> Military_Service_Obligation:                                                                               0.726   7226 
#>     No                                                                   3875 (98.9%)     3266 (98.8%)                  
#>     Yes                                                                   44 (1.12%)       41 (1.24%)                   
#> Visa_Sponsorship_Needed:                                                                                  <0.001   7226 
#>     No                                                                   3831 (97.8%)     2942 (89.0%)                  
#>     Yes                                                                   88 (2.25%)      365 (11.0%)                   
#> Medical_Education_or_Training_Interrupted:                                                                <0.001   7226 
#>     No                                                                   3521 (89.8%)     2706 (81.8%)                  
#>     Yes                                                                  398 (10.2%)      601 (18.2%)                   
#> Misdemeanor_Conviction:                                                                                    0.233   7226 
#>     No                                                                   3865 (98.6%)     3249 (98.2%)                  
#>     Yes                                                                   54 (1.38%)       58 (1.75%)                   
#> Alpha_Omega_Alpha:                                                                                        <0.001   7226 
#>     No                                                                   3398 (86.7%)     3120 (94.3%)                  
#>     Yes                                                                  521 (13.3%)      187 (5.65%)                   
#> Gold_Humanism_Honor_Society:                                                                              <0.001   7226 
#>     No                                                                   3215 (82.0%)     2972 (89.9%)                  
#>     Yes                                                                  704 (18.0%)      335 (10.1%)                   
#> Couples_Match:                                                                                            <0.001   7226 
#>     No                                                                   3513 (89.6%)     3102 (93.8%)                  
#>     Yes                                                                  406 (10.4%)      205 (6.20%)                   
#> Count_of_Oral_Presentation                                             0.00 [0.00;1.00] 0.00 [0.00;1.00]  <0.001   7226 
#> Count_of_Peer_Reviewed_Book_Chapter                                    0.00 [0.00;0.00] 0.00 [0.00;0.00]   0.028   7226 
#> Count_of_Peer_Reviewed_Journal_Articles_Abstracts                      0.00 [0.00;1.00] 0.00 [0.00;1.00]  <0.001   7226 
#> Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published 0.00 [0.00;1.00] 0.00 [0.00;0.00]  <0.001   7226 
#> Count_of_Poster_Presentation                                           1.00 [0.00;3.00] 1.00 [0.00;2.00]  <0.001   7226 
#> Year:                                                                                                     <0.001   7226 
#>     2016                                                                 294 (7.50%)      148 (4.48%)                   
#>     2017                                                                 1060 (27.0%)     777 (23.5%)                   
#>     2018                                                                 711 (18.1%)      573 (17.3%)                   
#>     2019                                                                 1292 (33.0%)     1076 (32.5%)                  
#>     2020                                                                 562 (14.3%)      733 (22.2%)                   
#> USMLE_Pass_Fail_replaced:                                                                                 <0.001   6673 
#>     Failed attempt                                                        9 (0.24%)        88 (3.01%)                   
#>     Passed                                                               3741 (99.8%)     2835 (97.0%)                  
#> Location:                                                                                                 <0.001   7226 
#>     BSW                                                                  129 (3.29%)      174 (5.26%)                   
#>     CCAG                                                                 270 (6.89%)      333 (10.1%)                   
#>     CU                                                                   1449 (37.0%)     1431 (43.3%)                  
#>     Duke                                                                 802 (20.5%)      409 (12.4%)                   
#>     Truman                                                                84 (2.14%)      170 (5.14%)                   
#>     U_Washington                                                         180 (4.59%)      126 (3.81%)                   
#>     UAB                                                                  110 (2.81%)      130 (3.93%)                   
#>     Utah                                                                 895 (22.8%)      534 (16.1%)                   
#> Meeting_Name_Presented:                                                                                   <0.001   7226 
#>     Did not present at a meeting                                         2657 (67.8%)     2534 (76.6%)                  
#>     Presented at a meeting                                               1262 (32.2%)     773 (23.4%)                   
#> TopNIHfunded:                                                                                             <0.001   7226 
#>     Did not attend NIH top-funded medical school                         2247 (57.3%)     2568 (77.7%)                  
#>     Attended a NIH top-funded Medical School                             1672 (42.7%)     739 (22.3%)                   
#> Higher_Education_Institution:                                                                             <0.001   7226 
#>     No Ivy League Education                                              3698 (94.4%)     3228 (97.6%)                  
#>     Ivy League                                                           221 (5.64%)       79 (2.39%)                   
#> Higher_Education_Degree:                                                                                  <0.001   7226 
#>     Not a B.S. degree                                                    2136 (54.5%)     2132 (64.5%)                  
#>     B.S.                                                                 1783 (45.5%)     1175 (35.5%)                  
#> Interest_Group:                                                                                            1.000   7226 
#>     No Interest Group                                                    3914 (99.9%)     3303 (99.9%)                  
#>     Mentions Interest Group                                               5 (0.13%)        4 (0.12%)                    
#> Language_Fluency:                                                                                          0.054   5434 
#>     Speaks English and Another Language                                  2256 (73.2%)     1778 (75.6%)                  
#>     Speaks English                                                       825 (26.8%)      575 (24.4%)                   
#> ACLS:                                                                                                     <0.001   5434 
#>     No                                                                   1819 (59.0%)     1228 (52.2%)                  
#>     Yes                                                                  1262 (41.0%)     1125 (47.8%)                  
#> Other_Service_Obligation:                                                                                  0.238   5434 
#>     No                                                                   3045 (98.8%)     2334 (99.2%)                  
#>     Yes                                                                   36 (1.17%)       19 (0.81%)                   
#> Photo_Received:                                                                                           <0.001   5434 
#>     No                                                                    13 (0.42%)       48 (2.04%)                   
#>     Yes                                                                  3068 (99.6%)     2305 (98.0%)                  
#> Tracks_Applied_by_Applicant_1:                                                                            <0.001   7226 
#>     Applying for a Preliminary Position                                  2035 (51.9%)     2203 (66.6%)                  
#>     Categorical Applicant                                                1884 (48.1%)     1104 (33.4%)                  
#> AMA:                                                                                                      <0.001   7226 
#>     No AMA Membership                                                    2302 (58.7%)     2386 (72.1%)                  
#>     American Medical Association Member                                  1617 (41.3%)     921 (27.9%)                   
#> ACOG:                                                                                                     <0.001   7226 
#>     No ACOG Membership                                                   1886 (48.1%)     2291 (69.3%)                  
#>     ACOG Member                                                          2033 (51.9%)     1016 (30.7%)                  
#> Latin_Honors:                                                                                              0.001   7226 
#>     Latin_honors                                                          12 (0.31%)       31 (0.94%)                   
#>     No_cum_laude                                                         3907 (99.7%)     3276 (99.1%)                  
#> Scholarship:                                                                                              <0.001   7226 
#>     No_scholarship                                                       3033 (77.4%)     2769 (83.7%)                  
#>     Scholarship                                                          886 (22.6%)      538 (16.3%)                   
#> Grant:                                                                                                    <0.001   7226 
#>     Grant_funding                                                        231 (5.89%)      123 (3.72%)                   
#>     No_Grant_funding                                                     3688 (94.1%)     3184 (96.3%)                  
#> Phi_beta_kappa:                                                                                           <0.001   7226 
#>     No_Phi_Beta_Kappa                                                    3835 (97.9%)     3275 (99.0%)                  
#>     Phi_Beta_Kappa                                                        84 (2.14%)       32 (0.97%)                   
#> NCAA:                                                                                                      0.008   7226 
#>     NCAA_athlente                                                         47 (1.20%)       19 (0.57%)                   
#>     Not_a_NCAA_athlete                                                   3872 (98.8%)     3288 (99.4%)                  
#> Boy_Scouts:                                                                                                0.269   7226 
#>     Boy/Girl_Scouts                                                       18 (0.46%)       9 (0.27%)                    
#>     Not_a_Boy/Girl_Scout                                                 3901 (99.5%)     3298 (99.7%)                  
#> Valedictorian:                                                                                             0.044   7226 
#>     Not_a_Valedictorian                                                  3877 (98.9%)     3287 (99.4%)                  
#>     Valedictorian                                                         42 (1.07%)       20 (0.60%)                   
#> NIH:                                                                                                       0.656   7226 
#>     NIH_present                                                           24 (0.61%)       24 (0.73%)                   
#>     No_NIH_involvement                                                   3895 (99.4%)     3283 (99.3%)                  
#> NCI:                                                                                                       0.317   7226 
#>     NCI_present                                                          112 (2.86%)       81 (2.45%)                   
#>     No_NCI_involvement                                                   3807 (97.1%)     3226 (97.6%)                  
#> total_OBGYN_letter_writers                                             2.00 [2.00;3.00] 2.00 [1.00;2.00]  <0.001   5348 
#> number_of_applicant_first_author_publications                          1.00 [0.00;3.00] 0.00 [0.00;1.00]  <0.001   7226 
#> Advance_Degree:                                                                                            0.015   7226 
#>     M.B.A.                                                                14 (0.36%)       20 (0.60%)                   
#>     No Advanced Degree                                                   3651 (93.2%)     3027 (91.5%)                  
#>     Ph.D.                                                                 10 (0.26%)       19 (0.57%)                   
#>     Other                                                                244 (6.23%)      241 (7.29%)                   
#> Type_of_medical_school:                                                                                      .     7226 
#>     U.S. Public School                                                   1937 (49.4%)     1024 (31.0%)                  
#>     International School                                                 260 (6.63%)      1075 (32.5%)                  
#>     Osteopathic School                                                   523 (13.3%)      704 (21.3%)                   
#>     Osteopathic School,International School                               0 (0.00%)        1 (0.03%)                    
#>     U.S. Private School                                                  1199 (30.6%)     503 (15.2%)                   
#> work_exp_count                                                         2.00 [0.00;4.00] 2.00 [0.00;4.00]   0.052   7226 
#> Volunteer_exp_count                                                    6.00 [2.00;9.00] 4.00 [0.00;8.00]  <0.001   7226 
#> Research_exp_count                                                     2.00 [0.00;3.00] 1.00 [0.00;2.00]  <0.001   7226 
#> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
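
As a quick sanity check on one row of the stratified table, the Gender counts can be re-entered by hand and re-tested in base R. This is only a sketch: the object names (`gender`, `gender_pct`, `gender_test`) are illustrative, and the choice of a Pearson chi-squared test is an assumption, since the output does not state which test produced the `p.overall` column.

```r
# Gender counts copied by hand from the stratified table above.
gender <- matrix(
  c(3327, 592,   # Matched:     Female, Male
    2585, 722),  # Not_Matched: Female, Male
  nrow = 2,
  dimnames = list(
    Gender       = c("Female", "Male"),
    Match_Status = c("Matched", "Not_Matched")
  )
)

# Column percentages: share of each gender within each match group.
gender_pct <- round(100 * prop.table(gender, margin = 2), 1)
print(gender_pct)

# Assumption: a Pearson chi-squared test of independence.
# The test actually used for the table may differ.
gender_test <- chisq.test(gender)
print(gender_test$p.value < 0.001)
```

The column percentages should agree with the Female and Male rows of the table, and the p-value falls below the 0.001 threshold reported for Gender.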

#> 
#> --------Summary descriptives table ---------
#> 
#> ____________________________________________________________________________________________ 
#>                                                                             [ALL]        N   
#>                                                                             N=7226           
#> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
#> Match_Status:                                                                           7226 
#>     Matched                                                              3919 (54.2%)        
#>     Not_Matched                                                          3307 (45.8%)        
#> Gender:                                                                                 7226 
#>     Female                                                               5912 (81.8%)        
#>     Male                                                                 1314 (18.2%)        
#> Age                                                                    28.0 [26.0;30.0] 7226 
#> Medical_Degree:                                                                         7198 
#>     DO                                                                   1225 (17.0%)        
#>     MD                                                                   5973 (83.0%)        
#> Military_Service_Obligation:                                                            7226 
#>     No                                                                   7141 (98.8%)        
#>     Yes                                                                   85 (1.18%)         
#> Visa_Sponsorship_Needed:                                                                7226 
#>     No                                                                   6773 (93.7%)        
#>     Yes                                                                  453 (6.27%)         
#> Medical_Education_or_Training_Interrupted:                                              7226 
#>     No                                                                   6227 (86.2%)        
#>     Yes                                                                  999 (13.8%)         
#> Misdemeanor_Conviction:                                                                 7226 
#>     No                                                                   7114 (98.5%)        
#>     Yes                                                                  112 (1.55%)         
#> Alpha_Omega_Alpha:                                                                      7226 
#>     No                                                                   6518 (90.2%)        
#>     Yes                                                                  708 (9.80%)         
#> Gold_Humanism_Honor_Society:                                                            7226 
#>     No                                                                   6187 (85.6%)        
#>     Yes                                                                  1039 (14.4%)        
#> Couples_Match:                                                                          7226 
#>     No                                                                   6615 (91.5%)        
#>     Yes                                                                  611 (8.46%)         
#> Count_of_Oral_Presentation                                             0.00 [0.00;1.00] 7226 
#> Count_of_Peer_Reviewed_Book_Chapter                                    0.00 [0.00;0.00] 7226 
#> Count_of_Peer_Reviewed_Journal_Articles_Abstracts                      0.00 [0.00;1.00] 7226 
#> Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published 0.00 [0.00;0.00] 7226 
#> Count_of_Poster_Presentation                                           1.00 [0.00;3.00] 7226 
#> Year:                                                                                   7226 
#>     2016                                                                 442 (6.12%)         
#>     2017                                                                 1837 (25.4%)        
#>     2018                                                                 1284 (17.8%)        
#>     2019                                                                 2368 (32.8%)        
#>     2020                                                                 1295 (17.9%)        
#> USMLE_Pass_Fail_replaced:                                                               6673 
#>     Failed attempt                                                        97 (1.45%)         
#>     Passed                                                               6576 (98.5%)        
#> Location:                                                                               7226 
#>     BSW                                                                  303 (4.19%)         
#>     CCAG                                                                 603 (8.34%)         
#>     CU                                                                   2880 (39.9%)        
#>     Duke                                                                 1211 (16.8%)        
#>     Truman                                                               254 (3.52%)         
#>     U_Washington                                                         306 (4.23%)         
#>     UAB                                                                  240 (3.32%)         
#>     Utah                                                                 1429 (19.8%)        
#> Meeting_Name_Presented:                                                                 7226 
#>     Did not present at a meeting                                         5191 (71.8%)        
#>     Presented at a meeting                                               2035 (28.2%)        
#> TopNIHfunded:                                                                           7226 
#>     Did not attend NIH top-funded medical school                         4815 (66.6%)        
#>     Attended a NIH top-funded Medical School                             2411 (33.4%)        
#> Higher_Education_Institution:                                                           7226 
#>     No Ivy League Education                                              6926 (95.8%)        
#>     Ivy League                                                           300 (4.15%)         
#> Higher_Education_Degree:                                                                7226 
#>     Not a B.S. degree                                                    4268 (59.1%)        
#>     B.S.                                                                 2958 (40.9%)        
#> Interest_Group:                                                                         7226 
#>     No Interest Group                                                    7217 (99.9%)        
#>     Mentions Interest Group                                               9 (0.12%)          
#> Language_Fluency:                                                                       5434 
#>     Speaks English and Another Language                                  4034 (74.2%)        
#>     Speaks English                                                       1400 (25.8%)        
#> ACLS:                                                                                   5434 
#>     No                                                                   3047 (56.1%)        
#>     Yes                                                                  2387 (43.9%)        
#> Other_Service_Obligation:                                                               5434 
#>     No                                                                   5379 (99.0%)        
#>     Yes                                                                   55 (1.01%)         
#> Photo_Received:                                                                         5434 
#>     No                                                                    61 (1.12%)         
#>     Yes                                                                  5373 (98.9%)        
#> Tracks_Applied_by_Applicant_1:                                                          7226 
#>     Applying for a Preliminary Position                                  4238 (58.6%)        
#>     Categorical Applicant                                                2988 (41.4%)        
#> AMA:                                                                                    7226 
#>     No AMA Membership                                                    4688 (64.9%)        
#>     American Medical Association Member                                  2538 (35.1%)        
#> ACOG:                                                                                   7226 
#>     No ACOG Membership                                                   4177 (57.8%)        
#>     ACOG Member                                                          3049 (42.2%)        
#> Latin_Honors:                                                                           7226 
#>     Latin_honors                                                          43 (0.60%)         
#>     No_cum_laude                                                         7183 (99.4%)        
#> Scholarship:                                                                            7226 
#>     No_scholarship                                                       5802 (80.3%)        
#>     Scholarship                                                          1424 (19.7%)        
#> Grant:                                                                                  7226 
#>     Grant_funding                                                        354 (4.90%)         
#>     No_Grant_funding                                                     6872 (95.1%)        
#> Phi_beta_kappa:                                                                         7226 
#>     No_Phi_Beta_Kappa                                                    7110 (98.4%)        
#>     Phi_Beta_Kappa                                                       116 (1.61%)         
#> NCAA:                                                                                   7226 
#>     NCAA_athlente                                                         66 (0.91%)         
#>     Not_a_NCAA_athlete                                                   7160 (99.1%)        
#> Boy_Scouts:                                                                             7226 
#>     Boy/Girl_Scouts                                                       27 (0.37%)         
#>     Not_a_Boy/Girl_Scout                                                 7199 (99.6%)        
#> Valedictorian:                                                                          7226 
#>     Not_a_Valedictorian                                                  7164 (99.1%)        
#>     Valedictorian                                                         62 (0.86%)         
#> NIH:                                                                                    7226 
#>     NIH_present                                                           48 (0.66%)         
#>     No_NIH_involvement                                                   7178 (99.3%)        
#> NCI:                                                                                    7226 
#>     NCI_present                                                          193 (2.67%)         
#>     No_NCI_involvement                                                   7033 (97.3%)        
#> total_OBGYN_letter_writers                                             2.00 [2.00;3.00] 5348 
#> number_of_applicant_first_author_publications                          0.00 [0.00;2.00] 7226 
#> Advance_Degree:                                                                         7226 
#>     M.B.A.                                                                34 (0.47%)         
#>     No Advanced Degree                                                   6678 (92.4%)        
#>     Ph.D.                                                                 29 (0.40%)         
#>     Other                                                                485 (6.71%)         
#> Type_of_medical_school:                                                                 7226 
#>     U.S. Public School                                                   2961 (41.0%)        
#>     International School                                                 1335 (18.5%)        
#>     Osteopathic School                                                   1227 (17.0%)        
#>     Osteopathic School,International School                               1 (0.01%)          
#>     U.S. Private School                                                  1702 (23.6%)        
#> work_exp_count                                                         2.00 [0.00;4.00] 7226 
#> Volunteer_exp_count                                                    5.00 [0.00;9.00] 7226 
#> Research_exp_count                                                     1.00 [0.00;3.00] 7226 
#> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

#>  [1] "Match_Status"                                                          
#>  [2] "Gender"                                                                
#>  [3] "Age"                                                                   
#>  [4] "Medical_Degree"                                                        
#>  [5] "Military_Service_Obligation"                                           
#>  [6] "Visa_Sponsorship_Needed"                                               
#>  [7] "Medical_Education_or_Training_Interrupted"                             
#>  [8] "Misdemeanor_Conviction"                                                
#>  [9] "Alpha_Omega_Alpha"                                                     
#> [10] "Gold_Humanism_Honor_Society"                                           
#> [11] "Couples_Match"                                                         
#> [12] "Count_of_Oral_Presentation"                                            
#> [13] "Count_of_Peer_Reviewed_Book_Chapter"                                   
#> [14] "Count_of_Peer_Reviewed_Journal_Articles_Abstracts"                     
#> [15] "Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published"
#> [16] "Count_of_Poster_Presentation"                                          
#> [17] "Year"                                                                  
#> [18] "USMLE_Pass_Fail_replaced"                                              
#> [19] "Location"                                                              
#> [20] "Meeting_Name_Presented"                                                
#> [21] "TopNIHfunded"                                                          
#> [22] "Higher_Education_Institution"                                          
#> [23] "Higher_Education_Degree"                                               
#> [24] "Interest_Group"                                                        
#> [25] "Language_Fluency"                                                      
#> [26] "ACLS"                                                                  
#> [27] "Other_Service_Obligation"                                              
#> [28] "Photo_Received"                                                        
#> [29] "Tracks_Applied_by_Applicant_1"                                         
#> [30] "AMA"                                                                   
#> [31] "ACOG"                                                                  
#> [32] "Latin_Honors"                                                          
#> [33] "Scholarship"                                                           
#> [34] "Grant"                                                                 
#> [35] "Phi_beta_kappa"                                                        
#> [36] "NCAA"                                                                  
#> [37] "Boy_Scouts"                                                            
#> [38] "Valedictorian"                                                         
#> [39] "NIH"                                                                   
#> [40] "NCI"                                                                   
#> [41] "total_OBGYN_letter_writers"                                            
#> [42] "number_of_applicant_first_author_publications"                         
#> [43] "Advance_Degree"                                                        
#> [44] "Type_of_medical_school"                                                
#> [45] "work_exp_count"                                                        
#> [46] "Volunteer_exp_count"                                                   
#> [47] "Research_exp_count"

Lasso Regression with caret

1- Split Data

```{r}
# # Disabled chunk: stratified 70/30 split (kept for reference)
# set.seed(123)
# d_part <- createDataPartition(y = reduced_Data$Match_Status, p = 0.70, list = FALSE)
# tstSamples <- reduced_Data[-d_part, ]
# trgSamples <- reduced_Data[d_part, ]
```

2- Train the model

```{r}
# # Set lambda sequence
# lambda <- 10^seq(-3, 3, length = 100)
# # Train model
# lasso <- train(Match_Status ~ ., data = trgSamples, method = "glmnet", family = "binomial",
#                trControl = trainControl("cv", number = 10),
#                tuneGrid = expand.grid(alpha = 1, lambda = lambda))
```

```{r}
# # Model coefficients
# coefs <- coef(lasso$finalModel, lasso$bestTune$lambda)
# ix <- which(abs(coefs[, 1]) > 0)
# length(ix)
# coefs[ix, 1, drop = FALSE]
```

3- Prediction and accuracy

```{r}
# # Make predictions
# predictions <- lasso %>% predict(tstSamples)
# cm <- confusionMatrix(predictions, tstSamples$Match_Status, positive = "Matched")
# Accuracy <- cm$overall[1]
# Sensitivity <- cm$byClass[1]
# Specificity <- cm$byClass[2]
# Precision <- cm$byClass[5]
# F1 <- cm$byClass[7]
# modelPerformance <- data.frame(Accuracy = cm$overall[1], Sensitivity = cm$byClass[1],
#                                Specificity = cm$byClass[2], Precision = cm$byClass[5],
#                                F1 = cm$byClass[7])
# modelPerformance
```

Group Lasso

1- Create dummy variables from categorical data

2-Drop reference dummies

3-Combine dummies with numeric variables then convert them to a matrix X

4- Outcome Variable

5- Create group vector that distinguish groups

#> [1] 45

#>                                                            (Intercept) 
#>                                                           2.7028166060 
#>                                                                    Age 
#>                                                          -0.0644008290 
#>                                             Count_of_Oral_Presentation 
#>                                                           0.0068303907 
#>                                    Count_of_Peer_Reviewed_Book_Chapter 
#>                                                          -0.0804134625 
#>                      Count_of_Peer_Reviewed_Journal_Articles_Abstracts 
#>                                                           0.0214819797 
#> Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published 
#>                                                           0.0683113347 
#>                                             total_OBGYN_letter_writers 
#>                                                           0.2972333604 
#>                          number_of_applicant_first_author_publications 
#>                                                           0.0099665971 
#>                                                         work_exp_count 
#>                                                          -0.0150879310 
#>                                                    Volunteer_exp_count 
#>                                                           0.0007676075 
#>                                                     Research_exp_count 
#>                                                           0.0644441525 
#>                                        Military_Service_Obligation_Yes 
#>                                                          -0.0318256987 
#>                                            Visa_Sponsorship_Needed_Yes 
#>                                                          -0.5105278656 
#>                          Medical_Education_or_Training_Interrupted_Yes 
#>                                                          -0.4695121962 
#>                                             Misdemeanor_Conviction_Yes 
#>                                                          -0.0438318999 
#>                                        Gold_Humanism_Honor_Society_Yes 
#>                                                           0.2048801515 
#>                                                              Year_2017 
#>                                                          -0.4448468466 
#>                                                              Year_2018 
#>                                                          -0.7920745253 
#>                                                              Year_2019 
#>                                                          -1.0012131967 
#>                                USMLE_Pass_Fail_replaced_Failed attempt 
#>                                                          -1.6106680799 
#>                                                           Location_BSW 
#>                                                          -0.0423034916 
#>                                                          Location_CCAG 
#>                                                           0.3172336056 
#>                                                            Location_CU 
#>                                                           0.2536836760 
#>                                                          Location_Duke 
#>                                                           0.4242711983 
#>                                                        Location_Truman 
#>                                                          -0.0905061764 
#>                                                  Location_U_Washington 
#>                                                           0.3895741126 
#>                                                           Location_UAB 
#>                                                          -0.0098410791 
#>                    Meeting_Name_Presented_Did not present at a meeting 
#>                                                          -0.0681419911 
#>                  TopNIHfunded_Attended a NIH top-funded Medical School 
#>                                                           0.2068550243 
#>                                Higher_Education_Institution_Ivy League 
#>                                                           0.2631380197 
#>                                       Interest_Group_No Interest Group 
#>                                                          -0.0431535955 
#>                                            Other_Service_Obligation_No 
#>                                                          -0.0769676431 
#>                                           Other_Service_Obligation_Yes 
#>                                                           0.0769676431 
#>                                                      Photo_Received_No 
#>                                                          -0.2148202210 
#>      Tracks_Applied_by_Applicant_1_Applying for a Preliminary Position 
#>                                                          -0.5586389384 
#>                                                  AMA_No AMA Membership 
#>                                                           0.0743980900 
#>                                                ACOG_No ACOG Membership 
#>                                                          -0.4428659113 
#>                                                     NCAA_NCAA_athlente 
#>                                                           0.1496755231 
#>                                                        NIH_NIH_present 
#>                                                          -0.2271965431 
#>                                                  Advance_Degree_M.B.A. 
#>                                                          -0.0151805770 
#>                                      Advance_Degree_No Advanced Degree 
#>                                                           0.1737718800 
#>                                                   Advance_Degree_Ph.D. 
#>                                                           0.1239380303 
#>                              Type_of_medical_school_U.S. Public School 
#>                                                          -0.1896174159 
#>                            Type_of_medical_school_International School 
#>                                                          -1.2966275262 
#>                              Type_of_medical_school_Osteopathic School 
#>                                                          -0.4307976573
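The five preparation steps listed earlier in this section (dummy coding, dropping reference levels, building the matrix X, defining the outcome, and building the group vector) can be sketched in base R as follows. This is a minimal illustration on a hypothetical toy data frame, not the code that produced the coefficients above; the object names `toy`, `design`, `X`, `y`, and `group` are assumptions.

```r
# Hypothetical toy data: names and values are illustrative only.
toy <- data.frame(
  Age = c(26, 28, 31, 27, 29, 30),
  Gender = factor(c("Female", "Male", "Female", "Female", "Male", "Female")),
  Location = factor(c("CU", "Duke", "UAB", "CU", "UAB", "Duke")),
  Match_Status = factor(c("Matched", "Not_Matched", "Matched",
                          "Matched", "Not_Matched", "Matched"))
)

# Steps 1-3: dummy-code the predictors. model.matrix() drops one reference
# level per factor; removing the first column drops the intercept.
design <- model.matrix(Match_Status ~ ., data = toy)
X <- design[, -1, drop = FALSE]

# Step 4: binary outcome (1 = Matched).
y <- as.integer(toy$Match_Status == "Matched")

# Step 5: the "assign" attribute maps each column to its source variable,
# so all dummies created from one factor share a group index.
group <- attr(design, "assign")[-1]
```

A matrix, outcome, and group vector of this shape are what group-lasso packages such as grpreg or gglasso expect; which package the analysis used is not stated here.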

Temporal validation

Models Training

1-prepare training theme

Add brier score to matrix

2- train models on train data

##Model ensemble

Because the caret ensemble tooling does not support custom models, the CatBoost model is added manually.

#>              logit     lasso       xgb      nnet        rf     rpart  CATboost
#> logit    1.0000000 0.9999351 0.8880047 0.9968440 0.9971100 0.8427780 0.9814025
#> lasso    0.9999351 1.0000000 0.8930157 0.9968824 0.9976580 0.8462075 0.9835208
#> xgb      0.8880047 0.8930157 1.0000000 0.8800297 0.9156156 0.9360580 0.9550694
#> nnet     0.9968440 0.9968824 0.8800297 1.0000000 0.9902172 0.8079485 0.9811130
#> rf       0.9971100 0.9976580 0.9156156 0.9902172 1.0000000 0.8805577 0.9881312
#> rpart    0.8427780 0.8462075 0.9360580 0.8079485 0.8805577 1.0000000 0.8796145
#> CATboost 0.9814025 0.9835208 0.9550694 0.9811130 0.9881312 0.8796145 1.0000000

All models are highly correlated (every pairwise correlation is above 0.80), so they are not good candidates for an ensemble.

Find Confidence interval

3- Model Comparison

-Summary Table

#> 
#> Call:
#> summary.resamples(object = results)
#> 
#> Models: Logit, Lasso, RF, XGB, CAT, Nnet 
#> Number of resamples: 4 
#> 
#> Acc 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.6074271 0.6845423 0.7167154 0.7034744 0.7356475 0.7730399    0
#> Lasso 0.5941645 0.6822781 0.7127151 0.6978148 0.7282518 0.7716644    0
#> RF    0.6127321 0.6805682 0.7062614 0.6964788 0.7221720 0.7606602    0
#> XGB   0.6312997 0.6699357 0.6894637 0.6861881 0.7057162 0.7345254    0
#> CAT   0.5915119 0.6755596 0.7228125 0.7015126 0.7487654 0.7689133    0
#> Nnet  0.6233422 0.6891400 0.7141936 0.7055045 0.7305581 0.7702889    0
#> 
#> AUCPR 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.6963751 0.7007829 0.7321083 0.7420537 0.7733791 0.8076228    0
#> Lasso 0.6974151 0.7018986 0.7325478 0.7432135 0.7738627 0.8103436    0
#> RF    0.6774010 0.6791621 0.7061786 0.7208920 0.7479084 0.7938099    0
#> XGB   0.6682657 0.6841082 0.7099361 0.7231657 0.7489936 0.8045248    0
#> CAT   0.6773208 0.6948163 0.7222133 0.7395354 0.7669325 0.8363942    0
#> Nnet  0.6948113 0.7007222 0.7292614 0.7397723 0.7683115 0.8057551    0
#> 
#> AUCROC 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.6816236 0.7447619 0.7705780 0.7522735 0.7780897 0.7863145    0
#> Lasso 0.6822151 0.7446997 0.7707367 0.7529009 0.7789378 0.7879152    0
#> RF    0.6735818 0.7181008 0.7451371 0.7349916 0.7620280 0.7761104    0
#> XGB   0.6749761 0.7188217 0.7396482 0.7323707 0.7531972 0.7752101    0
#> CAT   0.6636387 0.7321357 0.7577719 0.7468202 0.7724563 0.8080982    0
#> Nnet  0.6769196 0.7442541 0.7675981 0.7490198 0.7723638 0.7839636    0
#> 
#> Brier 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.1722116 0.1844084 0.1896108 0.1950277 0.2002300 0.2286775    0
#> Lasso 0.1707416 0.1833249 0.1880585 0.1947664 0.1995000 0.2322068    0
#> RF    0.1799489 0.1905923 0.1949141 0.2004738 0.2047956 0.2321181    0
#> XGB   0.1828559 0.1941797 0.2032792 0.2078545 0.2169541 0.2420038    0
#> CAT   0.1835637 0.1873832 0.1928299 0.1993416 0.2047883 0.2281426    0
#> Nnet  0.1659708 0.1873544 0.1946628 0.1957851 0.2030935 0.2278441    0
#> 
#> F 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.6062053 0.6348846 0.6460317 0.6474995 0.6586466 0.6917293    0
#> Lasso 0.6084906 0.6259813 0.6367488 0.6463212 0.6570888 0.7032967    0
#> RF    0.5837321 0.6188069 0.6327493 0.6345299 0.6484722 0.6888889    0
#> XGB   0.5563218 0.5931694 0.6258812 0.6217972 0.6545089 0.6791045    0
#> CAT   0.5820896 0.6007976 0.6110168 0.6351463 0.6453655 0.7364621    0
#> Nnet  0.5546667 0.5790736 0.6033996 0.6097213 0.6340473 0.6774194    0
#> 
#> Kappa 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.2207045 0.3732169 0.4250306 0.3808895 0.4327032 0.4527925    0
#> Lasso 0.1952424 0.3508282 0.4163395 0.3700956 0.4356068 0.4524609    0
#> RF    0.2288660 0.3559897 0.4038631 0.3647268 0.4126002 0.4223150    0
#> XGB   0.2650454 0.3256951 0.3584094 0.3443734 0.3770877 0.3956295    0
#> CAT   0.1866121 0.3332217 0.4072066 0.3716872 0.4456721 0.4857235    0
#> Nnet  0.2434640 0.3596681 0.4083276 0.3752422 0.4239016 0.4408496    0
#> 
#> Precision 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.5738397 0.6877456 0.7417770 0.7076257 0.7616571 0.7731092    0
#> Lasso 0.5614754 0.6778689 0.7268900 0.6942901 0.7433112 0.7619048    0
#> RF    0.5852535 0.6721754 0.7162210 0.6934482 0.7374938 0.7560976    0
#> XGB   0.6047619 0.6263214 0.6581779 0.6682959 0.7001525 0.7520661    0
#> CAT   0.5668203 0.6970848 0.7433735 0.7095456 0.7558343 0.7846154    0
#> Nnet  0.6273292 0.7224515 0.7740260 0.7517661 0.8033406 0.8316832    0
#> 
#> Recall 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.5204918 0.5505482 0.5932087 0.6125196 0.6551801 0.7431694    0
#> Lasso 0.5286885 0.5467984 0.6029481 0.6208047 0.6769544 0.7486339    0
#> RF    0.5000000 0.5405928 0.5933884 0.5951915 0.6479871 0.6939891    0
#> XGB   0.4959016 0.5318362 0.5814310 0.5881882 0.6377830 0.6939891    0
#> CAT   0.4795082 0.5035755 0.5918645 0.5892787 0.6775677 0.6938776    0
#> Nnet  0.4262295 0.5008873 0.5388429 0.5188360 0.5567916 0.5714286    0

-Plots

4- Variable importance

Plot Variable importance

-Statistical Significance

There is no significant difference between models.

Model Calibration and lift Curves

-Prediction

1- Lift plot

2-Calibration plot

3- Logistic Calibration plot

Plot Calibration

##Table and Plot of CI

Concordance Index

##Model Interaction

##Decision Curve

##Log Loss

2- train models on train data

-Summary Table

#> 
#> Call:
#> summary.resamples(object = results3)
#> 
#> Models: Logit, Lasso, RF, XGB, CAT, Nnet 
#> Number of resamples: 4 
#> 
#> logLoss 
#>            Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> Logit 0.5181535 0.5482928 0.5597231 0.5722876 0.5837178 0.6515505    0
#> Lasso 0.5196284 0.5465371 0.5572951 0.5714494 0.5822074 0.6515790    0
#> RF    0.5401670 0.5630581 0.5736689 0.5864182 0.5970290 0.6581681    0
#> XGB   0.5516856 0.5747121 0.6008363 0.6113223 0.6374465 0.6919308    0
#> CAT   0.5569512 0.5686929 0.5739522 0.5888041 0.5940634 0.6503607    0
#> Nnet  0.5082789 0.5558810 0.5719629 0.5740597 0.5901416 0.6440341    0

-Plots

Find Confidence interval

-Statistical Significance

#> 
#> Call:
#> summary.diff.resamples(object = diffs)
#> 
#> p-value adjustment: bonferroni 
#> Upper diagonal: estimates of the difference
#> Lower diagonal: p-value for H0: difference = 0
#> 
#> logLoss 
#>       Logit  Lasso      RF         XGB        CAT        Nnet      
#> Logit         0.0008382 -0.0141307 -0.0390347 -0.0165165 -0.0017721
#> Lasso 1.0000            -0.0149688 -0.0398729 -0.0173547 -0.0026103
#> RF    0.3264 0.2323                -0.0249041 -0.0023859  0.0123585
#> XGB   0.1851 0.1824     0.7644                 0.0225182  0.0372626
#> CAT   1.0000 1.0000     1.0000     1.0000                 0.0147444
#> Nnet  1.0000 1.0000     1.0000     0.3922     1.0000
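Many adjusted p-values in the table above are exactly 1 because the Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below illustrates this with base R's `p.adjust()`. The raw p-values are hypothetical, and the assumption that all 15 model pairs enter one adjustment is ours.

```r
# Hypothetical unadjusted p-values for three pairwise model comparisons.
raw_p <- c(Logit_vs_RF = 0.02, Logit_vs_XGB = 0.012, RF_vs_XGB = 0.40)

# Six models give choose(6, 2) = 15 pairwise comparisons.
n_comparisons <- choose(6, 2)

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1.
adj_p <- p.adjust(raw_p, method = "bonferroni", n = n_comparisons)
adj_p
```

With only four resamples per model, raw p-values are rarely small enough to survive this correction, which is consistent with no pairwise difference being significant.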

Logistic Model Formula

#> [1] "y = -3.08 - 0.38*Age - 0.92*Visa_Sponsorship_NeededYes - 0.59*Medical_Education_or_Training_InterruptedYes + 0.09*Misdemeanor_ConvictionYes + 0.32*Gold_Humanism_Honor_SocietyYes + 0.05*Count_of_Oral_Presentation - 0.03*Count_of_Peer_Reviewed_Book_Chapter + 0.11*Count_of_Peer_Reviewed_Journal_Articles_Abstracts + 0.08*Count_of_Peer_Reviewed_Journal_Articles_Abstracts_Other_than_Published + 1.4*USMLE_Pass_Fail_replacedPassed + 0.11*LocationCCAG - 0.24*LocationCU + 0.17*LocationDuke - 0.47*LocationTruman - 0.35*LocationU_Washington - 0.58*LocationUAB - 0.3*LocationUtah + 0.07*Meeting_Name_PresentedPresented at a meeting - 0.24*TopNIHfundedDid not attend NIH top-funded medical school + 0.03*Higher_Education_InstitutionNo Ivy League Education + 0.06*Interest_GroupNo Interest Group + 0.16*Other_Service_ObligationYes + 0.77*Photo_ReceivedYes + 0.33*Tracks_Applied_by_Applicant_1Categorical Applicant + 0.06*AMANo AMA Membership - 0.38*ACOGNo ACOG Membership + 0.33*total_OBGYN_letter_writers + 0.07*number_of_applicant_first_author_publications + 0.38*Advance_DegreeNo Advanced Degree - 0.04*Advance_DegreeOther + 0.66*Advance_DegreePh.D. + 0.43*Type_of_medical_schoolOsteopathic School + 1.3*Type_of_medical_schoolU.S. Private School + 1.03*Type_of_medical_schoolU.S. Public School - 0.04*work_exp_count + 0.02*Volunteer_exp_count + 0.08*Research_exp_count + 0.39*Military_Service_ObligationYes - 0.28*NCAANot_a_NCAA_athlete + 0.67*NIHNo_NIH_involvement"
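The formula above returns a linear predictor `y` on the log-odds scale. Assuming `y` is the log-odds of matching (the positive coefficient for passing the USMLE suggests this, but it is not stated), it can be converted to a predicted probability with the logistic function. The value of `y` below is hypothetical.

```r
# Convert a linear predictor (log-odds) into a predicted probability.
# The value of y is hypothetical and not taken from any applicant.
y <- 0.85
p_matched <- plogis(y)   # identical to 1 / (1 + exp(-y))
p_matched
```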

###Odds Ratio table

Method

Design:

This cross-sectional study examines the association between applicants’ characteristics and the rate of acceptance into US obstetrics and gynecology residency programs from 2017 to 2020. A total of 7226 applicants participated in the survey. (More information to be added, e.g., was the survey online? How were the groups selected?)

Survey Questionnaire:

The survey consists of 52 items that describe 7 main categories. The first category is demographics, which includes gender, age, nationality (US/Canadian), military service, and language fluency. The second category is educational excellence, such as higher education institution, degree, and whether the applicant has received a scholarship, grants, Phi Beta Kappa membership, or an advanced degree. The third category is personal excellence and honors, such as misdemeanor conviction, Alpha Omega Alpha, Boy Scouts, meeting presentation, oral presentation counts, interest group, and NCAA. The fourth category is medical school and training, such as medical degree, medical school type, whether the medical school is NIH funded, previous medical training, interrupted medical training or education, Gold Humanism Honor Society, AMA and ACOG membership, ACLS, and PALS. The fifth category is USMLE/match status, such as passing the USMLE, visa sponsorship needed, couples match, tracks applied to by the applicant, and the count of letters of recommendation (LORs). The sixth category is work experience, which includes counts of work, volunteer, and research experience. The seventh category is academic research experience, which includes counts of peer-reviewed book chapters, peer-reviewed journal articles or abstracts, poster presentations, peer-reviewed journal articles or abstracts other than published, the number of applicant first-author publications, and records count. The outcome variable is match status.

Data Preparation.

Data Reduction:

Data reduction was conducted in two phases. First, highly correlated variables were removed using a correlation threshold of 0.7; five variables were removed from the data. Second, group LASSO (Least Absolute Shrinkage and Selection Operator) was used for automatic variable selection. The LASSO penalizes the absolute size of the regression coefficients, driving the coefficients of irrelevant variables to zero [tibshirani1996regression]. The final reduced data include 29 variables.
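The first phase can be sketched as a simple greedy filter in base R. This is an illustration under stated assumptions, not the procedure actually used: the helper `drop_correlated()` and the toy variables `x1`, `x2`, `x3` are hypothetical, and the rule for choosing which member of a correlated pair to drop (largest mean absolute correlation) is our assumption.

```r
# Hypothetical numeric predictors; x2 is built to correlate strongly with x1.
set.seed(1)
x1 <- rnorm(200)
num <- data.frame(x1 = x1, x2 = x1 + rnorm(200, sd = 0.1), x3 = rnorm(200))

drop_correlated <- function(df, cutoff = 0.7) {
  cm <- abs(cor(df))
  diag(cm) <- 0
  dropped <- character(0)
  # While any pair exceeds the cutoff, drop the member of the most
  # correlated pair that has the larger mean absolute correlation.
  while (ncol(cm) > 1 && max(cm) > cutoff) {
    pair <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    means <- colMeans(cm[, pair, drop = FALSE])
    worst <- colnames(cm)[pair[which.max(means)]]
    dropped <- c(dropped, worst)
    keep <- colnames(cm) != worst
    cm <- cm[keep, keep, drop = FALSE]
  }
  dropped
}

removed <- drop_correlated(num, cutoff = 0.7)
reduced <- num[, setdiff(names(num), removed), drop = FALSE]
```

The caret package provides `findCorrelation()` for the same purpose; whether it was used for this analysis is not stated.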

Data imputation:

Because the data were missing at random, the observed values can provide a valid reference for the missing ones. The missing values were imputed using predictive mean matching, which replaces each missing value with an observed value from a case with a similar predicted mean.
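The idea behind predictive mean matching can be shown with a deliberately simplified, single-donor version in base R. The helper `pmm_impute()` and the toy columns `x` and `age` are hypothetical. Full implementations (for example the `pmm` method in the mice package) draw at random from several close donors and account for parameter uncertainty, which this sketch does not.

```r
# Simplified single-donor predictive mean matching for one numeric variable.
# Toy data and column names are hypothetical.
set.seed(42)
toy <- data.frame(x = rnorm(50))
toy$age <- 27 + 2 * toy$x + rnorm(50)
toy$age[c(3, 17, 40)] <- NA

pmm_impute <- function(data, target, predictor) {
  obs <- !is.na(data[[target]])
  # Fit the prediction model on the observed cases only.
  fit <- lm(reformulate(predictor, response = target), data = data[obs, ])
  pred <- predict(fit, newdata = data)
  for (i in which(!obs)) {
    # Donor = observed case whose predicted mean is closest to case i's.
    donor <- which(obs)[which.min(abs(pred[obs] - pred[i]))]
    data[[target]][i] <- data[[target]][donor]
  }
  data
}

imputed <- pmm_impute(toy, target = "age", predictor = "x")
```

Because every imputed value is copied from a real observation, the imputed data stay within the observed range of the variable.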

Data Partition:

The data were partitioned into training and test sets based on the ‘Year’ variable for temporal validation. Data from 2017 and 2018 were used as training data, while data from 2019 and 2020 were used as test data.
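A temporal split of this kind can be written in a few lines of base R. The data frame `apps` below is a hypothetical stand-in for the applicant data; only the `Year` cut points come from the text.

```r
# Hypothetical applicant table with a Year column.
apps <- data.frame(
  Year = c(2017, 2018, 2019, 2020, 2017, 2019),
  Match_Status = c("Matched", "Not_Matched", "Matched",
                   "Matched", "Not_Matched", "Not_Matched")
)

# Earlier years train the models; later years are held out for testing.
train_data <- apps[apps$Year %in% c(2017, 2018), ]
test_data  <- apps[apps$Year %in% c(2019, 2020), ]
```

Splitting on time rather than at random means every test observation is later than every training observation, which mimics predicting for a future application cycle.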

Statistical Analysis:

Descriptive Analysis

Descriptive analysis of the variables by match status (the outcome variable) was performed using frequency tables for categorical variables and median [IQR] for continuous variables. The association between variables was assessed with the chi-square test of independence for categorical variables and the t test for continuous variables. A P value < 0.05 was deemed significant.

###Classification Modeling

Multiple classification models were fitted to find the best classifier for the data. Six models were used: logistic regression, LASSO logistic regression, random forest, XGBoost (extreme gradient boosting), CatBoost, and a neural network. To obtain out-of-sample performance estimates, 10-fold cross-validation with 3 repeats was performed using the caret package in R. Location groups were preserved in the cross-validation sampling. The models are explained below:

Logistic classifier: Logistic regression is a simple classifier that is used as the base model because it is easy to interpret.

LASSO logistic classifier: It penalizes the absolute size of the regression coefficients according to the value of a tuning parameter λ. A sequence of λ values from 10^-3 to 10^3 was used to tune λ.

Random forest: Random forest is a tree-based ensemble technique. It was selected because it often provides better accuracy on categorical data and handles categorical predictors more naturally than logistic regression. Most of the variables in our data are categorical.

XGBoost: XGBoost is an ensemble method that works by boosting trees using a gradient descent algorithm. It is often faster and more accurate than random forest. The learning rate was set to 0.1, the maximum depth of each tree was tuned over 10, 15, 20, and 25, and the fraction of observations used for each individual tree was set to 0.8.
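The XGBoost settings described above can be written as a tuning grid. In the sketch below only `eta`, `max_depth`, and `subsample` come from the text; the remaining values are placeholder assumptions, included because caret's `xgbTree` method expects all seven columns.

```r
# Tuning grid for caret's "xgbTree" method.
xgb_grid <- expand.grid(
  nrounds = 100,                  # assumption: not stated in the text
  max_depth = c(10, 15, 20, 25),  # tuned values from the text
  eta = 0.1,                      # learning rate from the text
  gamma = 0,                      # assumption
  colsample_bytree = 1,           # assumption
  min_child_weight = 1,           # assumption
  subsample = 0.8                 # fraction of observations per tree
)
nrow(xgb_grid)
```

A grid like this could then be passed to `caret::train(..., method = "xgbTree", tuneGrid = xgb_grid)`, so that only the tree depth varies across the four candidate models.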

CatBoost: CatBoost is a gradient boosting method designed to handle categorical variables in decision trees, where it often outperforms other gradient boosting methods. The learning rate was set to 0.1, and the number of splits for numerical features was set to 30.

Neural network: A neural network can outperform decision-tree methods when the training sample is sufficiently large.

Performance Metrics

The Brier score was used along with other metrics to evaluate the performance of the models and the importance of covariates. The Brier score measures the accuracy of predicted probabilities. It ranges from 0 to 1, with 0 indicating perfect accuracy. In addition, calibration, lift, and decision curves were used to compare models.
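For reference, the Brier score and the log loss reported in the resampling tables can be computed as in the following sketch. The helper names `brier_score()` and `log_loss()` and the toy vectors `p` and `y` are hypothetical; the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```r
# p: predicted probability of the positive class; y: observed outcome (0/1).
brier_score <- function(p, y) mean((p - y)^2)

log_loss <- function(p, y, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)   # keep probabilities away from 0 and 1
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

p <- c(0.9, 0.2, 0.6, 0.4)
y <- c(1, 0, 1, 1)
brier_score(p, y)   # 0.1425
log_loss(p, y)
```

As a benchmark, a model that always predicts 0.5 has a Brier score of 0.25, so scores below that indicate some useful probabilistic information.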

Lift curve: describes how well a model ranks samples for one class.

Calibration curve: characterizes how consistent the predicted class probabilities are with the observed event rates.

Decision curve: quantifies the net benefit of using each model.

Concordance index: a measure of goodness of fit for binary outcomes in a logistic regression model. It ranges from 0 to 1, with 1 indicating the strongest model.
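For a binary outcome, the concordance index is the proportion of (positive, negative) pairs in which the positive case receives the higher predicted probability, with ties counted as one half. A minimal base R sketch follows; the helper `c_index()` and the toy data are hypothetical.

```r
# Concordance index for a binary outcome (y coded 0/1).
c_index <- function(p, y) {
  pos <- p[y == 1]
  neg <- p[y == 0]
  diffs <- outer(pos, neg, "-")   # every positive-negative pair
  (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)
}

p <- c(0.9, 0.2, 0.6, 0.4)
y <- c(1, 0, 0, 1)
c_index(p, y)   # 0.75: three of the four pairs are ordered correctly
```

A value of 0.5 corresponds to ranking no better than chance, so in practice the useful range is 0.5 to 1.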

All statistical tests were performed using R version 4.0.3.

Results:

The characteristics of the applicants:

The count (frequency) of successfully matched applicants is

while the count (frequency) of unmatched applicants is

applicants are females and

are males. The median [IQR] of applicants’ age is

applicants are MD and

applicants do not need visa.

applicants have passed USMLE exams. The median [IQR] of work experience is

years, the median [IQR] of volunteer experience is

years, and the median [IQR] of research experience is

years.

The association between match status and the characteristics of the applicants:

The count (proportion) of matched females, 3327 (84.9%), is significantly higher than that of matched males, 592 (15.1%), with P value < 0.001. The median [IQR] age of matched applicants, 27.0 [26.0;29.0], is significantly lower than the median [IQR] age of unmatched applicants, 28.0 [27.0;31.0], with P value < 0.001. The count (proportion) of matched applicants who passed USMLE exams

is significantly higher than matched applicants who failed it

with P value

. The count (proportions) of matched applicants graduated from Public and private US schools

and

are significantly higher than matched applicants from International and Osteopathic medical schools

and

with P value

. The median [IQR] of volunteer and research experience are significantly higher in matched applicants

and

than unmatched applicants

and

with P values

and

Modeling Results:

Cross Validation Performance

The cross-validation on the training data shows that logistic regression is the model with the best (lowest) Brier score

[0.1432915-0.2476121], followed by the LASSO model with Brier score

[0.1413843-0.2508084], while the neural network model showed the worst (highest) Brier score

[0.1426993-0.2913919]. Moreover, it shows that logistic regression is the model with the lowest log loss

[0.463163-0.6859097], followed by the LASSO model with log loss

[0.4579501-0.6927982], while the neural network model showed the highest log loss

[0.4543579-0.8038737].

The results show the five most important variables for each model.

Logistic regression: USMLE_Pass_Fail_replacedPassed, Visa_Sponsorship_NeededYes, Type_of_medical_schoolU.S. Private School, Type_of_medical_schoolU.S. Public School, and Photo_ReceivedYes.

LASSO regression: Type_of_medical_schoolU.S. Private School, USMLE_Pass_Fail_replacedPassed, Visa_Sponsorship_NeededYes, Type_of_medical_schoolU.S. Public School, and Medical_Education_or_Training_InterruptedYes.

Random forest: Age, Volunteer_exp_count, work_exp_count, total_OBGYN_letter_writers, and Research_exp_count.

XGBoost: Volunteer_exp_count, Age, number_of_applicant_first_author_publications, work_exp_count, and Research_exp_count.

CatBoost: Type_of_medical_school, Age, total_OBGYN_letter_writers, ACOG, and Medical_Education_or_Training_Interrupted.

Neural network: USMLE_Pass_Fail_replacedPassed, Type_of_medical_schoolU.S. Private School, Visa_Sponsorship_NeededYes, Type_of_medical_schoolU.S. Public School, and Photo_ReceivedYes.

Model Prediction and Diagnostics.

The lift curves show that all models perform very similarly: around 50.6% of the samples can be sampled (when ordered by the predicted probabilities).

The calibration plots show that the CatBoost and neural network models fail to predict the matched class at probabilities higher than 0.75. XGBoost is overconfident: its predicted probabilities for the matched class are too low below 0.5 and too high above 0.5, in contrast to the CatBoost model. The LASSO model appears to be the best calibrated, followed by the logistic model.

Decision curves show that the random forest and XGBoost models have the lowest net benefit. The other models are close to each other.
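The net benefit plotted in a decision curve is commonly defined, at a threshold probability pt, as TP/n - (FP/n) * pt / (1 - pt), where TP and FP are the true and false positives when everyone with a predicted probability of at least pt is classified as positive. The sketch below assumes this standard definition applies here; the helper `net_benefit()` and the toy data are hypothetical.

```r
# Net benefit of a model at a given threshold probability.
net_benefit <- function(p, y, threshold) {
  flagged <- p >= threshold          # cases classified as positive
  n <- length(y)
  tp <- sum(flagged & y == 1)
  fp <- sum(flagged & y == 0)
  tp / n - fp / n * threshold / (1 - threshold)
}

p <- c(0.9, 0.2, 0.6, 0.4)
y <- c(1, 0, 0, 1)
net_benefit(p, y, threshold = 0.5)   # 0: one true and one false positive
net_benefit(p, y, threshold = 0.3)
```

Evaluating this function over a range of thresholds for each model gives the curves that are compared in a decision curve plot.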

The concordance index (CI) results show that random forest model is the strongest model with CI =

followed by XGBoost with CI =

Odds Ratio.

The odds ratios were calculated from the logistic regression model coefficients. They show that matching is significantly associated with younger age (OR = 0.68, CI = [0.60-0.77], P = 1.43 × 10^-9). In addition, matching is less likely for applicants who need visa sponsorship (OR = 0.40, CI = [0.27-0.60], P = 1.28 × 10^-5), had interrupted medical training or education (OR = 0.55, CI = [0.43-0.71], P = 2.17 × 10^-6), did not attend a top NIH-funded medical school (OR = 0.78, CI = [0.64-0.96], P = 0.022), or do not have ACOG membership (OR = 0.68, CI = [0.57-0.82], P = 2.94 × 10^-5). In contrast, matching is more likely for applicants with Gold Humanism Honor Society membership (OR = 1.37, CI = [1.05-1.79], P = 0.019), who passed the USMLE exam (OR = 4.04, CI = [1.65-9.89], P = 0.0022), applied to categorical tracks (OR = 1.39, CI = [1.14-1.70], P = 0.0013), have a higher number of OBGYN letter writers (OR = 1.39, CI = [1.27-1.52], P = 1.40 × 10^-12), or attended osteopathic schools (OR = 1.53, CI = [1.11-2.11], P = 0.0095), US private schools (OR = 3.67, CI = [2.63-5.13], P = 2.60 × 10^-14), or US public schools (OR = 2.81, CI = [2.05-3.84], P = 1.10 × 10^-10). The count of peer-reviewed book chapters was not significantly associated with matching (OR = 0.97, CI = [0.88-1.06], P = 0.48). The variable labels follow the coefficients in the logistic model formula above, where exp(-0.92) ≈ 0.40 for visa sponsorship, exp(-0.59) ≈ 0.55 for interrupted training, and exp(-0.03) ≈ 0.97 for book chapters.
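An odds ratio is the exponential of a logistic regression coefficient, and a Wald-type confidence interval is obtained by exponentiating the interval on the coefficient scale. The sketch below uses the Age coefficient from the logistic model formula (-0.38); the standard error of 0.063 is a hypothetical value chosen for illustration, and the 95% level is an assumption because the text does not state the confidence level.

```r
# Odds ratio and Wald confidence interval from a coefficient and its SE.
or_from_coef <- function(beta, se, z = qnorm(0.975)) {
  c(OR = exp(beta), lower = exp(beta - z * se), upper = exp(beta + z * se))
}

# beta is the Age coefficient from the formula; se is hypothetical.
res <- or_from_coef(beta = -0.38, se = 0.063)
round(res, 2)
```

Because exp(-0.38) is about 0.68, each one-unit increase in the Age variable, as scaled in the model, multiplies the odds of matching by roughly 0.68.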

A Model to Predict Chances of Matching into Obstetrics and Gynecology Residency at All Participating Sites

Tyler M. Muffly, MD

Department of Obstetrics and Gynecology, Denver Health, Denver, CO
