Tyler M. Muffly, MD (Denver Health) and Rich Amini, MD (University of Arizona)
Objective: We sought to construct and validate a model that predicts a medical student’s chances of matching into an emergency medicine residency.
The data were cleaned in a separate R script with the help of exploratory.io. The project was created in R version 4.0.1 and run inside RStudio 1.2.5019. Session info is at the bottom of the script. Package installation, the data download from Dropbox.com, and the functions written for this project are all handled in a separate “Additional_functions_nomogram.R” file.
all_data is a data frame of the independent and dependent variables for review. Each variable is contained in a column, and each row represents a single unique medical student. If a student applied in more than one year, the most recent data were used.
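The rule of keeping only the most recent application per student can be sketched in base R as follows. This is a minimal illustration, not the authors' cleaning script: the `student_id` column and the toy values are hypothetical, and it assumes a stable identifier links a student's applications across years.

```r
# Toy data: student_id is a hypothetical identifier shared across years.
applications <- data.frame(
  student_id   = c("A", "A", "B", "C", "C"),
  Survey_Year  = c(2018, 2020, 2019, 2021, 2017),
  Step_1_Score = c(222, 232, 247, 237, 227)
)

# Sort so that each student's latest year comes first ...
ordered_apps <- applications[order(applications$student_id,
                                   -applications$Survey_Year), ]

# ... then keep the first row seen for each student.
latest <- ordered_apps[!duplicated(ordered_apps$student_id), ]
latest
```

Sorting before calling `duplicated()` matters: `duplicated()` keeps the first occurrence it meets, so the sort order decides which application survives.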
ggplot(all_data, aes(x = Match_Status)) +
geom_bar() +
labs(y = "Number of Applicants")
There are more matched applicants than unmatched applicants.
Find the near-zero-variance variables and remove them from the original data frame:
## Candidate Near Zero-Variance Variables
zero_variance_vars <- names(all_data)[caret::nearZeroVar(all_data)]
zero_variance_vars
## [1] "Degree" "Research_Year" "Absence_Year"
## [4] "Required_to_Remediate" "Pass_Attempt_Step_1"
## Remove Zero-Variance Features
#https://tidyselect.r-lib.org/reference/all_of.html, use any_of when removing variables not all_of
all_data <- all_data %>% select(!any_of(zero_variance_vars))
#Creates all_data1 with no PII present
all_data1 <- all_data
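To make the near-zero-variance idea concrete, here is a simplified base R sketch of the two quantities that caret::nearZeroVar relies on: the ratio of the most common value to the second most common, and the percentage of distinct values. It is not caret's implementation. The function name `near_zero_var_sketch` and the toy data are made up for illustration, and the cutoffs are assumed to mirror caret's documented defaults (freqCut = 95/5, uniqueCut = 10).

```r
# Simplified near-zero-variance check (a sketch, not caret's code).
near_zero_var_sketch <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x, useNA = "no"), decreasing = TRUE)
  if (length(counts) <= 1) return(TRUE)          # constant column
  freq_ratio     <- counts[[1]] / counts[[2]]    # most common vs. runner-up
  percent_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && percent_unique <= unique_cut
}

# Toy data: column names echo the report, values are invented.
toy <- data.frame(
  Degree       = factor(c(rep("MD", 99), "DO")),  # 99:1 split
  Step_1_Score = seq(200, 249.5, by = 0.5)        # 100 distinct values
)

flags <- vapply(toy, near_zero_var_sketch, logical(1))
names(flags)[flags]
```

A column is flagged only when it is both dominated by one value and has few distinct values, which is why a heavily skewed factor is dropped while a spread-out numeric score is kept.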
Check each column for NA or infinite values. FALSE means that the column contains no NA or infinite values.
## Check for bad values
apply(all_data1, 2, function(x) any(is.na(x) | is.infinite(x)))
## STAR_ID Survey_Year
## FALSE FALSE
## Interview_Offer_Total Match_Status
## FALSE FALSE
## Home_State Step_1_Score
## FALSE FALSE
## Cumulative_Quartile Quartile_Rank
## FALSE FALSE
## number_Honored_Clerkships Honors_A_This_Specialty
## FALSE FALSE
## AOA_Sigma GHHS
## FALSE FALSE
## Couples_Match Other_Degrees
## FALSE FALSE
## number_Research_Experiences number_Abstracts_Pres_Posters
## FALSE FALSE
## number_Peer_Rev_Publications number_Volunteer_Experiences
## FALSE FALSE
## number_Leadership_Positions number_Programs_Applied
## FALSE FALSE
## number_Interviews_Attended
## FALSE
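One caveat with apply() on a data frame that mixes factors and numbers: apply() first converts the data frame to a character matrix, and is.infinite() returns FALSE for every character value, so infinite values can pass unnoticed. A column-wise check avoids the coercion. The sketch below uses invented toy data, and `has_bad_value` is a hypothetical helper, not a function from the project.

```r
# Toy data: one infinite score and one missing state (values invented).
toy <- data.frame(
  score = c(230, Inf, 245),
  state = factor(c("TX", "NY", NA))
)

# apply() coerces to a character matrix, so Inf becomes the string "Inf".
apply_check <- apply(toy, 2, function(x) any(is.na(x) | is.infinite(x)))

# Column-wise check that keeps each column's original type.
has_bad_value <- function(x) {
  bad <- is.na(x)
  if (is.numeric(x)) bad <- bad | is.infinite(x)
  any(bad)
}
column_check <- vapply(toy, has_bad_value, logical(1))

apply_check
column_check
```

On this toy data the apply() version reports no problem in `score`, while the column-wise version flags it.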
The data are given below:
DT::datatable(all_data1, options = list(pageLength = 10))
# original response distribution
tmp <- table(all_data1$Match_Status)
match_rate <- (tmp[[2]]/(tmp[[2]] + tmp[[1]]))*100
match_rate
## [1] 92.89362
rm(tmp)
We can see that these data have 2350 observations of 21 features, and that about 92.9 percent of medical students applying to EM residency matched. Let’s create a few plots to get a sense of the data. Remember, the goal here will be to predict whether a given medical student will match into EM residency, based on the variables listed in the codebook.
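The match rate can also be read directly from the class proportions. The short sketch below uses the counts reported in this document (167 unmatched, 2183 matched); the variable names are illustrative only.

```r
# Counts taken from the Match_Status summary in this report.
match_counts <- c(N = 167, Y = 2183)

# Percentage of applicants in each class.
match_pct <- 100 * prop.table(match_counts)
round(match_pct, 2)

# Accuracy of a trivial rule that predicts "matched" for everyone.
baseline_accuracy <- max(match_pct)
```

Because the classes are this imbalanced, a rule that always predicts a match is already right about 92.9 percent of the time, so accuracy alone is a weak yardstick for any model fit to these data.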
## Check data types
sapply(all_data1, class)
## STAR_ID Survey_Year
## "numeric" "factor"
## Interview_Offer_Total Match_Status
## "numeric" "factor"
## Home_State Step_1_Score
## "factor" "numeric"
## Cumulative_Quartile Quartile_Rank
## "factor" "numeric"
## number_Honored_Clerkships Honors_A_This_Specialty
## "numeric" "factor"
## AOA_Sigma GHHS
## "factor" "factor"
## Couples_Match Other_Degrees
## "factor" "factor"
## number_Research_Experiences number_Abstracts_Pres_Posters
## "numeric" "numeric"
## number_Peer_Rev_Publications number_Volunteer_Experiences
## "numeric" "numeric"
## number_Leadership_Positions number_Programs_Applied
## "numeric" "numeric"
## number_Interviews_Attended
## "numeric"
Hmisc::describe(all_data1)
## all_data1
##
## 21 Variables 2350 Observations
## --------------------------------------------------------------------------------
## STAR_ID
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 2350 1 2.02e+09 1300887 2.018e+09 2.018e+09
## .25 .50 .75 .90 .95
## 2.019e+09 2.020e+09 2.021e+09 2.021e+09 2.021e+09
##
## lowest : 2017040001 2017040002 2017040003 2017040004 2017040005
## highest: 2021040718 2021040719 2021040720 2021040721 2021040722
##
## Value 2017050000 2018050000 2019050000 2020050000 2021050000
## Frequency 70 437 504 652 687
## Proportion 0.030 0.186 0.214 0.277 0.292
##
## For the frequency table, variable is rounded to the nearest 50000
## --------------------------------------------------------------------------------
## Survey_Year
## n missing distinct
## 2350 0 5
##
## lowest : 2017 2018 2019 2020 2021, highest: 2017 2018 2019 2020 2021
##
## Value 2017 2018 2019 2020 2021
## Frequency 70 437 504 652 687
## Proportion 0.030 0.186 0.214 0.277 0.292
## --------------------------------------------------------------------------------
## Interview_Offer_Total
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 100 0.999 24.5 17.95 4 7
## .25 .50 .75 .90 .95
## 13 21 32 46 57
##
## lowest : 0 1 2 3 4, highest: 133 134 180 192 249
## --------------------------------------------------------------------------------
## Match_Status
## n missing distinct
## 2350 0 2
##
## Value N Y
## Frequency 167 2183
## Proportion 0.071 0.929
## --------------------------------------------------------------------------------
## Home_State
## n missing distinct
## 2350 0 37
##
## lowest : AL AR AZ CA CT, highest: TX VA WA WI WV
## --------------------------------------------------------------------------------
## Step_1_Score
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 17 0.991 234.1 17.26 207 212
## .25 .50 .75 .90 .95
## 222 237 247 252 257
##
## lowest : 192 197 202 207 212, highest: 252 257 262 267 272
##
## Value 192 197 202 207 212 217 222 227 232 237 242
## Frequency 7 17 54 74 108 153 208 262 274 321 251
## Proportion 0.003 0.007 0.023 0.031 0.046 0.065 0.089 0.111 0.117 0.137 0.107
##
## Value 247 252 257 262 267 272
## Frequency 236 180 114 59 26 6
## Proportion 0.100 0.077 0.049 0.025 0.011 0.003
## --------------------------------------------------------------------------------
## Cumulative_Quartile
## n missing distinct
## 2350 0 5
##
## lowest : 1st 2nd 3rd 4th Unknown
## highest: 1st 2nd 3rd 4th Unknown
##
## Value 1st 2nd 3rd 4th Unknown
## Frequency 539 484 362 201 764
## Proportion 0.229 0.206 0.154 0.086 0.325
## --------------------------------------------------------------------------------
## Quartile_Rank
## n missing distinct Info Mean Gmd
## 2350 0 5 0.941 1.929 1.769
##
## lowest : 0 1 2 3 4, highest: 0 1 2 3 4
##
## Value 0 1 2 3 4
## Frequency 764 201 362 484 539
## Proportion 0.325 0.086 0.154 0.206 0.229
## --------------------------------------------------------------------------------
## number_Honored_Clerkships
## n missing distinct Info Mean Gmd
## 2350 0 9 0.983 3.155 2.686
##
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8
##
## Value 0 1 2 3 4 5 6 7 8
## Frequency 390 308 347 315 303 230 206 138 113
## Proportion 0.166 0.131 0.148 0.134 0.129 0.098 0.088 0.059 0.048
## --------------------------------------------------------------------------------
## Honors_A_This_Specialty
## n missing distinct
## 2350 0 2
##
## Value No Yes
## Frequency 1071 1279
## Proportion 0.456 0.544
## --------------------------------------------------------------------------------
## AOA_Sigma
## n missing distinct
## 2350 0 3
##
## Value No No School Chapter Yes
## Frequency 1840 135 375
## Proportion 0.783 0.057 0.160
## --------------------------------------------------------------------------------
## GHHS
## n missing distinct
## 2350 0 3
##
## Value No No School Chapter Yes
## Frequency 1854 104 392
## Proportion 0.789 0.044 0.167
## --------------------------------------------------------------------------------
## Couples_Match
## n missing distinct
## 2350 0 2
##
## Value No Yes
## Frequency 2200 150
## Proportion 0.936 0.064
## --------------------------------------------------------------------------------
## Other_Degrees
## n missing distinct
## 2350 0 8
##
## lowest : MBA MDiv MEd MPH MSc
## highest: MPH MSc No additional degree Other PhD
##
## MBA (35, 0.015), MDiv (4, 0.002), MEd (14, 0.006), MPH (102, 0.043), MSc (153,
## 0.065), No additional degree (1887, 0.803), Other (133, 0.057), PhD (22, 0.009)
## --------------------------------------------------------------------------------
## number_Research_Experiences
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 12 0.971 2.919 2.217 0 1
## .25 .50 .75 .90 .95
## 2 3 4 5 7
##
## lowest : 0 1 2 3 4, highest: 7 8 9 10 11
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 215 368 542 477 322 200 101 49 29 3 4
## Proportion 0.091 0.157 0.231 0.203 0.137 0.085 0.043 0.021 0.012 0.001 0.002
##
## Value 11
## Frequency 40
## Proportion 0.017
## --------------------------------------------------------------------------------
## number_Abstracts_Pres_Posters
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 12 0.979 3.114 3.156 0 0
## .25 .50 .75 .90 .95
## 1 2 4 8 11
##
## lowest : 0 1 2 3 4, highest: 7 8 9 10 11
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 452 360 451 303 223 145 96 74 59 33 26
## Proportion 0.192 0.153 0.192 0.129 0.095 0.062 0.041 0.031 0.025 0.014 0.011
##
## Value 11
## Frequency 128
## Proportion 0.054
## --------------------------------------------------------------------------------
## number_Peer_Rev_Publications
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 12 0.901 1.383 1.813 0 0
## .25 .50 .75 .90 .95
## 0 1 2 4 5
##
## lowest : 0 1 2 3 4, highest: 7 8 9 10 11
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 1009 585 359 160 88 43 23 25 13 3 7
## Proportion 0.429 0.249 0.153 0.068 0.037 0.018 0.010 0.011 0.006 0.001 0.003
##
## Value 11
## Frequency 35
## Proportion 0.015
## --------------------------------------------------------------------------------
## number_Volunteer_Experiences
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 12 0.982 6.869 3.452 2 3
## .25 .50 .75 .90 .95
## 4 7 10 11 11
##
## lowest : 0 1 2 3 4, highest: 7 8 9 10 11
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 12 37 108 198 249 291 246 219 240 107 121
## Proportion 0.005 0.016 0.046 0.084 0.106 0.124 0.105 0.093 0.102 0.046 0.051
##
## Value 11
## Frequency 522
## Proportion 0.222
## --------------------------------------------------------------------------------
## number_Leadership_Positions
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 12 0.984 4.041 2.956 0 1
## .25 .50 .75 .90 .95
## 2 4 5 8 10
##
## lowest : 0 1 2 3 4, highest: 7 8 9 10 11
##
## Value 0 1 2 3 4 5 6 7 8 9 10
## Frequency 155 215 357 434 319 302 153 171 81 23 31
## Proportion 0.066 0.091 0.152 0.185 0.136 0.129 0.065 0.073 0.034 0.010 0.013
##
## Value 11
## Frequency 109
## Proportion 0.046
## --------------------------------------------------------------------------------
## number_Programs_Applied
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 138 0.999 49.22 23.92 21.45 28.00
## .25 .50 .75 .90 .95
## 35.00 45.00 60.00 77.00 92.00
##
## lowest : 1 2 3 4 5, highest: 160 169 176 191 192
## --------------------------------------------------------------------------------
## number_Interviews_Attended
## n missing distinct Info Mean Gmd .05 .10
## 2350 0 35 0.994 12.98 5.182 4 7
## .25 .50 .75 .90 .95
## 11 13 16 18 20
##
## lowest : 0 1 2 3 4, highest: 30 31 32 33 36
## --------------------------------------------------------------------------------
A compact data summary is available from the skimr package; the table below was generated with summarytools.
#skimr::skim(all_data1)
## Warning in breaks[-1L] + breaks[-nB]: NAs produced by integer overflow
Variable | Stats / Values | Freqs (% of Valid) | Missing
---|---|---|---
STAR_ID [numeric] | Mean (sd) : 2019656896 (1171521); min < med < max: 2017040001 < 2020040172 < 2021040722; IQR (CV) : 2000020 (0) | 2350 distinct values | 0 (0.0%)
Survey_Year [factor] | 1. 2017 2. 2018 3. 2019 4. 2020 5. 2021 | | 0 (0.0%)
Interview_Offer_Total [numeric] | Mean (sd) : 24.5 (18.07); min < med < max: 0 < 21 < 249; IQR (CV) : 19 (0.74) | 100 distinct values | 0 (0.0%)
Match_Status [factor] | 1. N 2. Y | | 0 (0.0%)
Home_State [factor] | 1. AL 2. AR 3. AZ 4. CA 5. CT 6. DC 7. FL 8. GA 9. IA 10. ID [ 27 others ] | | 0 (0.0%)
Step_1_Score [numeric] | Mean (sd) : 234.06 (15.24); min < med < max: 192 < 237 < 272; IQR (CV) : 25 (0.07) | 17 distinct values | 0 (0.0%)
Cumulative_Quartile [factor] | 1. 1st 2. 2nd 3. 3rd 4. 4th 5. Unknown | | 0 (0.0%)
Quartile_Rank [numeric] | Mean (sd) : 1.93 (1.58); min < med < max: 0 < 2 < 4; IQR (CV) : 3 (0.82) | | 0 (0.0%)
number_Honored_Clerkships [numeric] | Mean (sd) : 3.16 (2.37); min < med < max: 0 < 3 < 8; IQR (CV) : 4 (0.75) | | 0 (0.0%)
Honors_A_This_Specialty [factor] | 1. No 2. Yes | | 0 (0.0%)
AOA_Sigma [factor] | 1. No 2. No School Chapter 3. Yes | | 0 (0.0%)
GHHS [factor] | 1. No 2. No School Chapter 3. Yes | | 0 (0.0%)
Couples_Match [factor] | 1. No 2. Yes | | 0 (0.0%)
Other_Degrees [factor] | 1. MBA 2. MDiv 3. MEd 4. MPH 5. MSc 6. No additional degree 7. Other 8. PhD | | 0 (0.0%)
number_Research_Experiences [numeric] | Mean (sd) : 2.92 (2.1); min < med < max: 0 < 3 < 11; IQR (CV) : 2 (0.72) | 12 distinct values | 0 (0.0%)
number_Abstracts_Pres_Posters [numeric] | Mean (sd) : 3.11 (2.98); min < med < max: 0 < 2 < 11; IQR (CV) : 3 (0.96) | 12 distinct values | 0 (0.0%)
number_Peer_Rev_Publications [numeric] | Mean (sd) : 1.38 (2.01); min < med < max: 0 < 1 < 11; IQR (CV) : 2 (1.46) | 12 distinct values | 0 (0.0%)
number_Volunteer_Experiences [numeric] | Mean (sd) : 6.87 (3.03); min < med < max: 0 < 7 < 11; IQR (CV) : 6 (0.44) | 12 distinct values | 0 (0.0%)
number_Leadership_Positions [numeric] | Mean (sd) : 4.04 (2.69); min < med < max: 0 < 4 < 11; IQR (CV) : 3 (0.67) | 12 distinct values | 0 (0.0%)
number_Programs_Applied [numeric] | Mean (sd) : 49.22 (23.05); min < med < max: 1 < 45 < 192; IQR (CV) : 25 (0.47) | 138 distinct values | 0 (0.0%)
number_Interviews_Attended [numeric] | Mean (sd) : 12.98 (4.79); min < med < max: 0 < 13 < 36; IQR (CV) : 5 (0.37) | 35 distinct values | 0 (0.0%)
Generated by summarytools 0.9.9 (R version 4.1.0)
2021-07-13
The new data set, all_data1, includes 2350 rows and 21 columns. None of the 2350 applicant records has a missing value.
#plot_str(all_data) #COOL BUT USELESS HERE
DataExplorer::plot_missing(all_data1,
ggtheme = theme_gray(),
theme_config = list(),
title = "DataExplorer NA Plot")
The hidden code contains four additional ways to check for missingness in the data.
After the data check was completed, an exploratory data analysis (EDA) was conducted to look for interesting relationships among the variables. Histograms were used to visualize the distributions of the predictors. Because predicting match status is a classification problem, relationships between each predictor and the dichotomous outcome were also examined.
Categorical and numerical variable plots:
#General Data Description, nice start for overview
inspect_cat_plot <- inspectdf::inspect_cat(all_data1) %>% inspectdf::show_plot()
inspect_cat_plot
tm_ggsave(object = inspect_cat_plot, filename = "inspect_cat_plot.tiff")
## [1] "Function Sanity Check: Saving a ggplot image as a TIFF"
funModeling::freq(data=all_data1, plot = TRUE, na.rm = FALSE)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Survey_Year frequency percentage cumulative_perc
## 1 2021 687 29.23 29.23
## 2 2020 652 27.74 56.97
## 3 2019 504 21.45 78.42
## 4 2018 437 18.60 97.02
## 5 2017 70 2.98 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Match_Status frequency percentage cumulative_perc
## 1 Y 2183 92.89 92.89
## 2 N 167 7.11 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Home_State frequency percentage cumulative_perc
## 1 TX 356 15.15 15.15
## 2 NY 229 9.74 24.89
## 3 PA 175 7.45 32.34
## 4 FL 142 6.04 38.38
## 5 IL 115 4.89 43.27
## 6 OH 112 4.77 48.04
## 7 MI 100 4.26 52.30
## 8 NC 85 3.62 55.92
## 9 VA 85 3.62 59.54
## 10 SC 82 3.49 63.03
## 11 LA 75 3.19 66.22
## 12 MA 69 2.94 69.16
## 13 CA 65 2.77 71.93
## 14 DC 65 2.77 74.70
## 15 KY 59 2.51 77.21
## 16 WA 56 2.38 79.59
## 17 WI 55 2.34 81.93
## 18 MN 53 2.26 84.19
## 19 TN 52 2.21 86.40
## 20 AZ 45 1.91 88.31
## 21 NJ 42 1.79 90.10
## 22 CT 37 1.57 91.67
## 23 ID 30 1.28 92.95
## 24 GA 27 1.15 94.10
## 25 OK 21 0.89 94.99
## 26 IA 20 0.85 95.84
## 27 NE 19 0.81 96.65
## 28 MO 18 0.77 97.42
## 29 MS 15 0.64 98.06
## 30 NV 10 0.43 98.49
## 31 WV 10 0.43 98.92
## 32 AL 9 0.38 99.30
## 33 NM 6 0.26 99.56
## 34 SD 4 0.17 99.73
## 35 AR 3 0.13 99.86
## 36 MD 2 0.09 99.95
## 37 ND 2 0.09 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Cumulative_Quartile frequency percentage cumulative_perc
## 1 Unknown 764 32.51 32.51
## 2 1st 539 22.94 55.45
## 3 2nd 484 20.60 76.05
## 4 3rd 362 15.40 91.45
## 5 4th 201 8.55 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Honors_A_This_Specialty frequency percentage cumulative_perc
## 1 Yes 1279 54.43 54.43
## 2 No 1071 45.57 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## AOA_Sigma frequency percentage cumulative_perc
## 1 No 1840 78.30 78.30
## 2 Yes 375 15.96 94.26
## 3 No School Chapter 135 5.74 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## GHHS frequency percentage cumulative_perc
## 1 No 1854 78.89 78.89
## 2 Yes 392 16.68 95.57
## 3 No School Chapter 104 4.43 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Couples_Match frequency percentage cumulative_perc
## 1 No 2200 93.62 93.62
## 2 Yes 150 6.38 100.00
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## Other_Degrees frequency percentage cumulative_perc
## 1 No additional degree 1887 80.30 80.30
## 2 MSc 153 6.51 86.81
## 3 Other 133 5.66 92.47
## 4 MPH 102 4.34 96.81
## 5 MBA 35 1.49 98.30
## 6 PhD 22 0.94 99.24
## 7 MEd 14 0.60 99.84
## 8 MDiv 4 0.17 100.00
## [1] "Variables processed: Survey_Year, Match_Status, Home_State, Cumulative_Quartile, Honors_A_This_Specialty, AOA_Sigma, GHHS, Couples_Match, Other_Degrees"
$page_1
[1] “Function Sanity Check: Saving a ggplot image as a TIFF”
create_profiling_num_output <- create_profiling_num(all_data1)
## [1] "Function Sanity Check: Plot Numeric Features"
all_data1 %>% mosaic::inspect() #another good option
##
## categorical variables:
## name class levels n missing
## 1 Survey_Year factor 5 2350 0
## 2 Match_Status factor 2 2350 0
## 3 Home_State factor 37 2350 0
## 4 Cumulative_Quartile factor 5 2350 0
## 5 Honors_A_This_Specialty factor 2 2350 0
## 6 AOA_Sigma factor 3 2350 0
## 7 GHHS factor 3 2350 0
## 8 Couples_Match factor 2 2350 0
## 9 Other_Degrees factor 8 2350 0
## distribution
## 1 2021 (29.2%), 2020 (27.7%) ...
## 2 Y (92.9%), N (7.1%)
## 3 TX (15.1%), NY (9.7%), PA (7.4%) ...
## 4 Unknown (32.5%), 1st (22.9%) ...
## 5 Yes (54.4%), No (45.6%)
## 6 No (78.3%), Yes (16%) ...
## 7 No (78.9%), Yes (16.7%) ...
## 8 No (93.6%), Yes (6.4%)
## 9 No additional degree (80.3%) ...
##
## quantitative variables:
## name class min Q1 median
## ...1 STAR_ID numeric 2017040001 2019040083 2020040172
## ...2 Interview_Offer_Total numeric 0 13 21
## ...3 Step_1_Score numeric 192 222 237
## ...4 Quartile_Rank numeric 0 0 2
## ...5 number_Honored_Clerkships numeric 0 1 3
## ...6 number_Research_Experiences numeric 0 2 3
## ...7 number_Abstracts_Pres_Posters numeric 0 1 2
## ...8 number_Peer_Rev_Publications numeric 0 0 1
## ...9 number_Volunteer_Experiences numeric 0 4 7
## ...10 number_Leadership_Positions numeric 0 2 4
## ...11 number_Programs_Applied numeric 1 35 45
## ...12 number_Interviews_Attended numeric 0 11 13
## Q3 max mean sd n missing
## ...1 2021040104 2021040722 2.019657e+09 1.171521e+06 2350 0
## ...2 32 249 2.449872e+01 1.806869e+01 2350 0
## ...3 247 272 2.340574e+02 1.524495e+01 2350 0
## ...4 3 4 1.928936e+00 1.582839e+00 2350 0
## ...5 5 8 3.155319e+00 2.365344e+00 2350 0
## ...6 4 11 2.918723e+00 2.097808e+00 2350 0
## ...7 4 11 3.114468e+00 2.983437e+00 2350 0
## ...8 2 11 1.382553e+00 2.012634e+00 2350 0
## ...9 10 11 6.868936e+00 3.025902e+00 2350 0
## ...10 5 11 4.040851e+00 2.694465e+00 2350 0
## ...11 60 192 4.921617e+01 2.305222e+01 2350 0
## ...12 16 36 1.298043e+01 4.786684e+00 2350 0
readr::write_csv(create_profiling_num_output, (here::here("results", "create_profiling_num_output.csv")))
DataExplorer::plot_boxplot(all_data1, by = "Match_Status",
ggtheme = theme_gray(),
theme_config = list(),
nrow = 10L,
ncol = 2L,
title = "DataExplorer of Variables")
Table: Applicant Descriptive Variables by Matched or Did Not Match from 2017 to 2021
# Draws a "Table 1"-style descriptive summary
tm_arsenal_table <- function(df, by) {
  print("Function Sanity Check: Create Arsenal Table using arsenal package")
  table_variable_within_function <- arsenal::tableby(
    formula = by ~ .,
    data = df,
    control = arsenal::tableby.control(
      test = TRUE,
      total = FALSE,
      digits = 1L,
      digits.p = 2L,
      digits.count = 0L,
      numeric.simplify = FALSE,
      numeric.stats = c("median", "q1q3"),
      cat.stats = c("Nmiss", "countpct"),
      stats.labels = list(
        Nmiss = "N Missing",
        Nmiss2 = "N Missing",
        meansd = "Mean (SD)",
        medianrange = "Median (Range)",
        median = "Median",
        medianq1q3 = "Median (Q1, Q3)",
        q1q3 = "Q1, Q3",
        iqr = "IQR",
        range = "Range",
        countpct = "Count (Pct)",
        Nevents = "Events",
        medSurv = "Median Survival",
        medTime = "Median Follow-Up"
      )
    )
  )
  final <- summary(
    table_variable_within_function,
    text = TRUE,
    title = 'Table: Applicant Descriptive Variables by Matched or Did Not Match from 2017 to 2021',
    #labelTranslations = mylabels, #Seen in additional functions file
    pfootnote = TRUE
  )
  return(final)
}
tm_arsenal_table_output <- tm_arsenal_table(df = all_data1 %>% select(-STAR_ID), by = all_data1$Match_Status)
## [1] "Function Sanity Check: Create Arsenal Table using arsenal package"
tm_arsenal_table_output
 | N (N=167) | Y (N=2183) | p value
---|---|---|---
Survey_Year | | | < 0.01 (1)
- 2017 | 5 (3.0%) | 65 (3.0%) |
- 2018 | 55 (32.9%) | 382 (17.5%) |
- 2019 | 41 (24.6%) | 463 (21.2%) |
- 2020 | 53 (31.7%) | 599 (27.4%) |
- 2021 | 13 (7.8%) | 674 (30.9%) |
Interview_Offer_Total | | | < 0.01 (2)
- Median | 7.0 | 22.0 |
- Q1, Q3 | 3.5, 17.5 | 13.0, 32.5 |
Match_Status | | | < 0.01 (1)
- N | 167 (100.0%) | 0 (0.0%) |
- Y | 0 (0.0%) | 2183 (100.0%) |
Home_State | | | < 0.01 (1)
- AL | 0 (0.0%) | 9 (0.4%) |
- AR | 0 (0.0%) | 3 (0.1%) |
- AZ | 1 (0.6%) | 44 (2.0%) |
- CA | 3 (1.8%) | 62 (2.8%) |
- CT | 4 (2.4%) | 33 (1.5%) |
- DC | 8 (4.8%) | 57 (2.6%) |
- FL | 15 (9.0%) | 127 (5.8%) |
- GA | 1 (0.6%) | 26 (1.2%) |
- IA | 0 (0.0%) | 20 (0.9%) |
- ID | 0 (0.0%) | 30 (1.4%) |
- IL | 20 (12.0%) | 95 (4.4%) |
- KY | 4 (2.4%) | 55 (2.5%) |
- LA | 11 (6.6%) | 64 (2.9%) |
- MA | 2 (1.2%) | 67 (3.1%) |
- MD | 0 (0.0%) | 2 (0.1%) |
- MI | 7 (4.2%) | 93 (4.3%) |
- MN | 7 (4.2%) | 46 (2.1%) |
- MO | 1 (0.6%) | 17 (0.8%) |
- MS | 0 (0.0%) | 15 (0.7%) |
- NC | 4 (2.4%) | 81 (3.7%) |
- ND | 1 (0.6%) | 1 (0.0%) |
- NE | 1 (0.6%) | 18 (0.8%) |
- NJ | 1 (0.6%) | 41 (1.9%) |
- NM | 0 (0.0%) | 6 (0.3%) |
- NV | 1 (0.6%) | 9 (0.4%) |
- NY | 12 (7.2%) | 217 (9.9%) |
- OH | 6 (3.6%) | 106 (4.9%) |
- OK | 1 (0.6%) | 20 (0.9%) |
- PA | 8 (4.8%) | 167 (7.7%) |
- SC | 6 (3.6%) | 76 (3.5%) |
- SD | 0 (0.0%) | 4 (0.2%) |
- TN | 4 (2.4%) | 48 (2.2%) |
- TX | 29 (17.4%) | 327 (15.0%) |
- VA | 3 (1.8%) | 82 (3.8%) |
- WA | 3 (1.8%) | 53 (2.4%) |
- WI | 2 (1.2%) | 53 (2.4%) |
- WV | 1 (0.6%) | 9 (0.4%) |
Step_1_Score | | | < 0.01 (2)
- Median | 227.0 | 237.0 |
- Q1, Q3 | 217.0, 242.0 | 222.0, 247.0 |
Cumulative_Quartile | | | < 0.01 (1)
- 1st | 28 (16.8%) | 511 (23.4%) |
- 2nd | 37 (22.2%) | 447 (20.5%) |
- 3rd | 29 (17.4%) | 333 (15.3%) |
- 4th | 27 (16.2%) | 174 (8.0%) |
- Unknown | 46 (27.5%) | 718 (32.9%) |
Quartile_Rank | | | 0.47 (2)
- Median | 2.0 | 2.0 |
- Q1, Q3 | 0.0, 3.0 | 0.0, 3.0 |
number_Honored_Clerkships | | | < 0.01 (2)
- Median | 2.0 | 3.0 |
- Q1, Q3 | 0.0, 4.0 | 1.0, 5.0 |
Honors_A_This_Specialty | | | < 0.01 (1)
- No | 103 (61.7%) | 968 (44.3%) |
- Yes | 64 (38.3%) | 1215 (55.7%) |
AOA_Sigma | | | 0.06 (1)
- No | 143 (85.6%) | 1697 (77.7%) |
- No School Chapter | 7 (4.2%) | 128 (5.9%) |
- Yes | 17 (10.2%) | 358 (16.4%) |
GHHS | | | 0.26 (1)
- No | 140 (83.8%) | 1714 (78.5%) |
- No School Chapter | 5 (3.0%) | 99 (4.5%) |
- Yes | 22 (13.2%) | 370 (16.9%) |
Couples_Match | | | 0.06 (1)
- No | 162 (97.0%) | 2038 (93.4%) |
- Yes | 5 (3.0%) | 145 (6.6%) |
Other_Degrees | | | 0.22 (1)
- MBA | 4 (2.4%) | 31 (1.4%) |
- MDiv | 0 (0.0%) | 4 (0.2%) |
- MEd | 1 (0.6%) | 13 (0.6%) |
- MPH | 5 (3.0%) | 97 (4.4%) |
- MSc | 15 (9.0%) | 138 (6.3%) |
- No additional degree | 125 (74.9%) | 1762 (80.7%) |
- Other | 16 (9.6%) | 117 (5.4%) |
- PhD | 1 (0.6%) | 21 (1.0%) |
number_Research_Experiences | | | 0.20 (2)
- Median | 3.0 | 3.0 |
- Q1, Q3 | 1.0, 4.0 | 2.0, 4.0 |
number_Abstracts_Pres_Posters | | | 0.09 (2)
- Median | 2.0 | 2.0 |
- Q1, Q3 | 1.0, 4.0 | 1.0, 4.0 |
number_Peer_Rev_Publications | | | 0.64 (2)
- Median | 0.0 | 1.0 |
- Q1, Q3 | 0.0, 2.0 | 0.0, 2.0 |
number_Volunteer_Experiences | | | < 0.01 (2)
- Median | 5.0 | 7.0 |
- Q1, Q3 | 4.0, 9.0 | 4.0, 10.0 |
number_Leadership_Positions | | | 0.44 (2)
- Median | 3.0 | 4.0 |
- Q1, Q3 | 2.0, 5.5 | 2.0, 5.0 |
number_Programs_Applied | | | < 0.01 (2)
- Median | 50.0 | 45.0 |
- Q1, Q3 | 34.0, 70.0 | 35.0, 60.0 |
number_Interviews_Attended | | | < 0.01 (2)
- Median | 10.0 | 13.0 |
- Q1, Q3 | 5.0, 14.0 | 11.0, 16.0 |
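As a cross-check on the categorical p values in the table, the cell counts can be fed to a Pearson chi-squared test in base R. This sketch assumes the table's categorical test is a chi-squared test without continuity correction, which is consistent with the 0.06 reported for Couples_Match. The counts come from the table; the object names are illustrative.

```r
# 2 x 2 counts copied from the table (rows: level, columns: Match_Status).
honors <- matrix(
  c(103, 64, 968, 1215), nrow = 2,
  dimnames = list(Honors_A_This_Specialty = c("No", "Yes"),
                  Match_Status = c("N", "Y"))
)
couples <- matrix(
  c(162, 5, 2038, 145), nrow = 2,
  dimnames = list(Couples_Match = c("No", "Yes"),
                  Match_Status = c("N", "Y"))
)

# Pearson chi-squared tests without Yates' continuity correction.
p_honors  <- chisq.test(honors,  correct = FALSE)$p.value
p_couples <- chisq.test(couples, correct = FALSE)$p.value

round(c(Honors_A_This_Specialty = p_honors, Couples_Match = p_couples), 3)
```

With the continuity correction switched on, the Couples_Match p value would be noticeably larger, so the choice of correction matters for borderline results like this one.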
#tm_write2word(tm_arsenal_table_output, "tm_arsenal_table_output1")
#tm_write2pdf(tm_arsenal_table_output, "tm_arsenal_table_output1")
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] exploratory_6.6.3.1 anonymizer_0.2.2
## [3] rgl_0.106.8 MLmetrics_1.1.1
## [5] plotROC_2.2.1 R.methodsS3_1.8.1
## [7] ranger_0.12.1 psych_2.1.6
## [9] correlationfunnel_0.2.0 fs_1.5.0
## [11] remotes_2.4.0 summarytools_0.9.9
## [13] doMC_1.3.7 discrim_0.1.2
## [15] RSQLite_2.2.7 odbc_1.3.2
## [17] doParallel_1.0.16 iterators_1.0.13
## [19] factoextra_1.0.7 corrgram_1.14
## [21] ezknitr_0.6 vip_0.3.2
## [23] BH_1.75.0-0 plotly_4.9.4.1
## [25] shinyWidgets_0.6.0 shinyjs_2.0.0
## [27] flexdashboard_0.5.2 lime_0.5.2
## [29] cowplot_1.1.1 tidyquant_1.0.3
## [31] quantmod_0.4.18 TTR_0.24.2
## [33] PerformanceAnalytics_2.0.4 xts_0.12.1
## [35] yardstick_0.0.8 workflowsets_0.0.2
## [37] workflows_0.2.2 tune_0.1.5
## [39] rsample_0.1.0 recipes_0.1.16
## [41] parsnip_0.1.6 modeldata_0.1.0
## [43] infer_0.5.4 dials_0.0.9
## [45] tidymodels_0.1.3 mltools_0.3.5
## [47] broom_0.7.6 mosaic_1.8.3
## [49] ggridges_0.5.3 mosaicData_0.20.2
## [51] ggformula_0.10.1 ggstance_0.3.5
## [53] english_1.2-5 naniar_0.6.1
## [55] skimr_2.1.3 data.table_1.14.0
## [57] naivebayes_0.9.7 kernlab_0.9-29
## [59] kableExtra_1.3.4 RColorBrewer_1.1-2
## [61] rpart.plot_3.0.9 visNetwork_2.0.9
## [63] drake_7.13.2 knitr_1.33
## [65] tableone_0.12.0 ggpubr_0.4.0
## [67] inspectdf_0.0.11 rsconnect_0.8.18
## [69] DataExplorer_0.8.2 labeling_0.4.2
## [71] highr_0.9 vctrs_0.3.8
## [73] progress_1.2.2 corrplot_0.89
## [75] Rmisc_1.5 plyr_1.8.6
## [77] caretEnsemble_2.0.1 tinytex_0.32
## [79] funModeling_1.9.4 rmda_1.6
## [81] rattle_5.4.0 bitops_1.0-7
## [83] rmarkdown_2.9 ResourceSelection_0.3-5
## [85] lmtest_0.9-38 zoo_1.8-9
## [87] utf8_1.2.1 fansi_0.5.0
## [89] beepr_1.3 rpart_4.1-15
## [91] mice_3.13.0 car_3.0-10
## [93] carData_3.0-4 MatchIt_4.2.0
## [95] leaps_3.1 moments_0.14
## [97] pander_0.6.4 arsenal_3.6.3
## [99] gbm_2.1.8 DescTools_0.99.41
## [101] scoring_0.6 pscl_1.5.5
## [103] InformationValue_1.2.3 tidylog_1.0.2
## [105] ggforce_0.3.3 glmnet_4.1-2
## [107] Matrix_1.3-4 Boruta_7.0.0
## [109] fastAdaboost_1.0.0 earth_5.3.0
## [111] plotmo_3.6.0 TeachingDemos_2.12
## [113] plotrix_3.8-1 shiny_1.6.0
## [115] AppliedPredictiveModeling_1.1-7 RANN_2.6.1
## [117] Metrics_0.1.4 xgboost_1.4.1.1
## [119] ipred_0.9-11 randomForest_4.6-14
## [121] mlbench_2.1-3 caTools_1.18.2
## [123] DynNom_5.0.1 magrittr_2.0.1
## [125] packrat_0.6.0 nnet_7.3-16
## [127] ROCR_1.0-11 pROC_1.17.0.1
## [129] rms_6.2-0 SparseM_1.81
## [131] Hmisc_4.5-0 Formula_1.2-4
## [133] survival_3.2-11 PASWR_1.1
## [135] MASS_7.3-54 e1071_1.7-7
## [137] foreach_1.5.1 tidyverse_1.3.1
## [139] rgdal_1.5-23 sp_1.4-5
## [141] scales_1.1.1 munsell_0.5.0
## [143] bit64_4.0.5 bit_4.0.4
## [145] tibble_3.1.2 RcppRoll_0.3.0
## [147] forcats_0.5.1 openxlsx_4.2.4
## [149] stringr_1.4.0 tidyr_1.1.3
## [151] hms_1.1.0 lubridate_1.7.10
## [153] janitor_2.1.0 magick_2.7.2
## [155] dplyr_1.0.7 readr_1.4.0.1
## [157] purrr_0.3.4 devtools_2.4.2
## [159] usethis_2.0.1 reshape2_1.4.4
## [161] XML_3.99-0.6 readxl_1.3.1
## [163] caret_6.0-88 ggplot2_3.3.5
## [165] lattice_0.20-44 here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] storr_1.2.5 clisymbols_1.2.0 mitools_2.4
## [4] pbapply_1.4-3 haven_2.4.1 tcltk_4.1.0
## [7] expm_0.999-6 blob_1.2.1 prodlim_2019.11.13
## [10] later_1.2.0 DBI_1.1.1 jpeg_0.1-8.1
## [13] MatrixModels_0.5-0 htmlwidgets_1.5.3 mvtnorm_1.1-1
## [16] future_1.21.0 Rcpp_1.0.7 DT_0.18
## [19] promises_1.2.0.1 pkgload_1.2.1 leaflet_2.0.4.1
## [22] textshaping_0.3.5 mnormt_2.0.2 digest_0.6.27
## [25] png_0.1-7 polspline_1.1.19 pkgconfig_2.0.3
## [28] gower_0.2.2 GPfit_1.0-8 xfun_0.24
## [31] bslib_0.2.5.1 tidyselect_1.1.1 labelled_2.8.0
## [34] viridisLite_0.4.0 pkgbuild_1.2.0 rlang_0.4.11
## [37] manipulateWidget_0.11.0 jquerylib_0.1.4 glue_1.4.2
## [40] pryr_0.1.4 lhs_1.1.1 modelr_0.1.8
## [43] matrixStats_0.58.0 lava_1.6.9 ggsignif_0.6.2
## [46] httpuv_1.6.1 class_7.3-19 Rttf2pt1_1.3.8
## [49] TH.data_1.0-10 CORElearn_1.56.0 webshot_0.5.2
## [52] jsonlite_1.7.2 tmvnsim_1.0-2 mime_0.10
## [55] systemfonts_1.0.2 gridExtra_2.3 Exact_2.1
## [58] stringi_1.6.2 processx_3.5.2 survey_4.0
## [61] quadprog_1.5-8 cli_3.0.0 rstudioapi_0.13
## [64] nlme_3.1-152 listenv_0.8.0 miniUI_0.1.1.1
## [67] dbplyr_2.1.1 entropy_1.3.0 sessioninfo_1.1.1
## [70] lifecycle_1.0.0 networkD3_0.4 mosaicCore_0.9.0
## [73] timeDate_3043.102 Quandl_2.10.0 ggfittext_0.9.1
## [76] cellranger_1.1.0 codetools_0.2-18 triebeard_0.3.0
## [79] htmlTable_2.2.1 xtable_1.8-4 abind_1.4-5
## [82] farver_2.1.0 parallelly_1.25.0 rapportools_1.0
## [85] BBmisc_1.11 visdat_0.5.3 compare_0.2-6
## [88] base64url_1.4 ggdendro_0.1.22 cluster_2.1.2
## [91] extrafontdb_1.0 ellipsis_0.3.2 prettyunits_1.1.1
## [94] reprex_2.0.0 igraph_1.2.6 testthat_3.0.2
## [97] htmltools_0.5.1.1 yaml_2.2.1 pkgdown_1.6.1
## [100] ModelMetrics_1.2.2.2 foreign_0.8-81 withr_2.4.2
## [103] rootSolve_1.8.2.1 multcomp_1.4-17 ragg_1.1.3
## [106] prediction_0.3.14 memoise_2.0.0 evaluate_0.14
## [109] rio_0.5.26 extrafont_0.17 callr_3.7.0
## [112] lmom_2.8 ps_1.6.0 curl_4.3.1
## [115] urltools_1.7.3 furrr_0.2.3 conquer_1.0.2
## [118] checkmate_2.0.0 cachem_1.0.5 desc_1.3.0
## [121] ellipse_0.4.2 rstatix_0.7.0 stargazer_5.2.2
## [124] ggrepel_0.9.1 dtw_1.22-3 rprojroot_2.0.2
## [127] tools_4.1.0 sass_0.4.0 sandwich_3.0-1
## [130] proxy_0.4-26 xml2_1.3.2 httr_1.4.2
## [133] assertthat_0.2.1 boot_1.3-28 globals_0.14.0
## [136] R6_2.5.0 shape_1.4.6 repr_1.1.3
## [139] splines_4.1.0 snakecase_0.11.0 colorspace_2.0-2
## [142] generics_0.1.0 stats4_4.1.0 base64enc_0.1-3
## [145] pillar_1.6.1 txtq_0.2.4 tweenr_1.0.2
## [148] audio_0.1-7 gtable_0.3.0 rvest_1.0.0
## [151] zip_2.1.1 latticeExtra_0.6-29 fastmap_1.1.0
## [154] crosstalk_1.1.1 quantreg_5.86 filelock_1.0.2
## [157] backports_1.2.1 gld_2.6.2 polyclip_1.10-0
## [160] grid_4.1.0 DiceDesign_1.9 lazyeval_0.2.2
## [163] crayon_1.4.1 reshape_0.8.8 svglite_2.0.0
## [166] compiler_4.1.0
# Names of the currently attached non-base packages (no version numbers)
pacman::p_loaded()
## [1] "exploratory" "anonymizer"
## [3] "rgl" "MLmetrics"
## [5] "plotROC" "R.methodsS3"
## [7] "ranger" "psych"
## [9] "correlationfunnel" "fs"
## [11] "remotes" "summarytools"
## [13] "doMC" "discrim"
## [15] "RSQLite" "odbc"
## [17] "doParallel" "iterators"
## [19] "factoextra" "corrgram"
## [21] "ezknitr" "vip"
## [23] "BH" "plotly"
## [25] "shinyWidgets" "shinyjs"
## [27] "flexdashboard" "lime"
## [29] "cowplot" "tidyquant"
## [31] "quantmod" "TTR"
## [33] "PerformanceAnalytics" "xts"
## [35] "yardstick" "workflowsets"
## [37] "workflows" "tune"
## [39] "rsample" "recipes"
## [41] "parsnip" "modeldata"
## [43] "infer" "dials"
## [45] "tidymodels" "mltools"
## [47] "broom" "mosaic"
## [49] "ggridges" "mosaicData"
## [51] "ggformula" "ggstance"
## [53] "english" "naniar"
## [55] "skimr" "data.table"
## [57] "naivebayes" "kernlab"
## [59] "kableExtra" "RColorBrewer"
## [61] "rpart.plot" "visNetwork"
## [63] "drake" "knitr"
## [65] "tableone" "ggpubr"
## [67] "inspectdf" "rsconnect"
## [69] "DataExplorer" "labeling"
## [71] "highr" "vctrs"
## [73] "progress" "corrplot"
## [75] "Rmisc" "plyr"
## [77] "caretEnsemble" "tinytex"
## [79] "funModeling" "rmda"
## [81] "rattle" "bitops"
## [83] "rmarkdown" "ResourceSelection"
## [85] "lmtest" "zoo"
## [87] "utf8" "fansi"
## [89] "beepr" "rpart"
## [91] "mice" "car"
## [93] "carData" "MatchIt"
## [95] "leaps" "moments"
## [97] "pander" "arsenal"
## [99] "gbm" "DescTools"
## [101] "scoring" "pscl"
## [103] "InformationValue" "tidylog"
## [105] "ggforce" "glmnet"
## [107] "Matrix" "Boruta"
## [109] "fastAdaboost" "earth"
## [111] "plotmo" "TeachingDemos"
## [113] "plotrix" "shiny"
## [115] "AppliedPredictiveModeling" "RANN"
## [117] "Metrics" "xgboost"
## [119] "ipred" "randomForest"
## [121] "mlbench" "caTools"
## [123] "DynNom" "magrittr"
## [125] "packrat" "nnet"
## [127] "ROCR" "pROC"
## [129] "rms" "SparseM"
## [131] "Hmisc" "Formula"
## [133] "survival" "PASWR"
## [135] "MASS" "e1071"
## [137] "foreach" "tidyverse"
## [139] "rgdal" "sp"
## [141] "scales" "munsell"
## [143] "bit64" "bit"
## [145] "tibble" "RcppRoll"
## [147] "forcats" "openxlsx"
## [149] "stringr" "tidyr"
## [151] "hms" "lubridate"
## [153] "janitor" "magick"
## [155] "dplyr" "readr"
## [157] "purrr" "devtools"
## [159] "usethis" "reshape2"
## [161] "XML" "readxl"
## [163] "caret" "ggplot2"
## [165] "lattice" "here"