To summarize what we discussed: we want to know whether the answers to questions 12 ("fellowship_yn"), 15 ("enfolded_postgrad_yn"), 16 ("private_academic"), 17 ("fellowship_years"), and 18 ("fellowship_field") differ by year of training ("current_year"), debt ("debt"), gender ("gender"), race ("race"), age ("age"), marital status ("marital_status"), and family planning ("children").

The dataset went from 265 to 257 rows: 8 cases from the original csv were removed because 50% or fewer of their questions had been answered.
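
The exclusion rule above can be sketched in base R. This is a minimal illustration on a made-up data frame, not the script that produced the 257-row file; `raw`, `answered`, `kept`, and the toy columns are hypothetical names.

```r
# Toy survey: 4 respondents x 4 questions (hypothetical data)
raw <- data.frame(
  q1 = c(1, NA, 2, NA),
  q2 = c(3, NA, 1, 2),
  q3 = c(2, NA, NA, 4),
  q4 = c(1, 5, 2, 1)
)

# Share of questions each respondent answered
answered <- rowMeans(!is.na(raw))

# Keep only respondents who answered more than 50% of the questions
kept <- raw[answered > 0.5, ]

nrow(raw) - nrow(kept)  # number of respondents dropped
```

Using a strict `>` matches the "50% or fewer" wording: a respondent who answered exactly half of the questions is removed.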

read in and format data

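# Packages used in this report (MASS first so dplyr::select() is not masked)
library(MASS); library(dplyr); library(gtsummary); library(nnet); library(VGAM)

df <- read.csv("maggie2.csv")
df <- df[!is.na(df$fellowship_yn), ]  # drop rows with no fellowship_yn answer

# Keep the analysis columns, selected by position
df2 <- df[c(2, 7, 20, 22:23, 31:33, 35, 37:39)]

# Training year: 1-2 = junior, 6-7 = senior, any other year = midlevel
df2$pgy_cat <- ifelse(df2$current_year == 1 | df2$current_year == 2, "junior", ifelse(df2$current_year == 7 | df2$current_year == 6, "senior", "midlevel"))

# Fellowship intent: 1-2 = yes, 4-5 = no, any other answer = undecided
df2$fellow_cat <- ifelse(df2$fellowship_yn == 1 | df2$fellowship_yn == 2, "Probably or Definitely Yes", ifelse(df2$fellowship_yn == 4 | df2$fellowship_yn == 5, "Probably or Definitely No", "Undecided"))

# Age group: 2 = 22-25, 3 = 26-30, 4 = 31-35, any other code = 36-45
df2$age_cat <- ifelse(df2$age == 2, "22-25 y/o", ifelse(df2$age == 3, "26-30 y/o", ifelse(df2$age == 4, "31-35 y/o", "36-45 y/o")))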
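
The nested ifelse() calls above send every code that is not explicitly listed into the last category. A lookup vector is one alternative worth considering; the sketch below uses toy input, and it assumes that age codes 5 and 6 are both meant to land in "36-45 y/o" (which is what the age table later in this report suggests). `age_labels` is a hypothetical name.

```r
# Assumed mapping of age codes to labels (codes 5 and 6 share a label)
age_labels <- c("2" = "22-25 y/o", "3" = "26-30 y/o", "4" = "31-35 y/o",
                "5" = "36-45 y/o", "6" = "36-45 y/o")

age <- c(2, 3, 4, 5, 6, NA)  # toy input: one value per code plus a missing

# Look each code up by name; missing or unlisted codes become NA
age_cat <- unname(age_labels[as.character(age)])

# Cross-tabulate codes against labels to confirm the mapping
table(age, age_cat, useNA = "ifany")
```

Unlike the nested ifelse(), an unexpected code here becomes NA instead of silently falling into the oldest age group.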

pgy x responses

df2 %>%
  select(c(2,3,4,13,14))   %>%
  tbl_summary(
    by = pgy_cat,
    missing_text = "(Missing)")%>%
  bold_labels() %>%
  add_p()
Characteristic                  junior, N = 57     midlevel, N = 137  senior, N = 48     p-value
enfolded_postgrad_yn                                                                     0.013
    1                           8 (14%)            32 (23%)           11 (23%)
    2                           29 (51%)           74 (54%)           17 (35%)
    3                           19 (33%)           21 (15%)           17 (35%)
    4                           1 (1.8%)           10 (7.3%)          3 (6.2%)
private_academic                                                                         0.8
    1                           24 (42%)           55 (41%)           18 (38%)
    2                           14 (25%)           43 (32%)           11 (23%)
    3                           12 (21%)           22 (16%)           12 (25%)
    4                           7 (12%)            14 (10%)           7 (15%)
    5                           0 (0%)             1 (0.7%)           0 (0%)
    (Missing)                   0                  2                  0
fellowship_years
    1                           19 (33%)           38 (28%)           26 (54%)
    2                           21 (37%)           79 (58%)           13 (27%)
    3                           16 (28%)           10 (7.3%)          6 (12%)
    4                           1 (1.8%)           10 (7.3%)          3 (6.2%)
fellow_cat                                                                               0.4
    Probably or Definitely No   1 (1.8%)           6 (4.4%)           3 (6.2%)
    Probably or Definitely Yes  52 (91%)           110 (80%)          39 (81%)
    Undecided                   4 (7.0%)           21 (15%)           6 (12%)
Cell values are n (%); p-values are from Fisher's exact test.

debt x responses

df2 %>%
  filter(debt!='6') %>%
  select(c(2,3,4,7,14))   %>%
  tbl_summary(
    by = debt,
    missing_text = "(Missing)")%>%
  bold_labels() %>%
  add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 4L, 3L, 3L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=6028169.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 2L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=18961715.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 4L, 1L, 3L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=4838333.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellow_cat', p-value omitted:
Error in stats::fisher.test(c("Probably or Definitely No", "Probably or Definitely Yes", : FEXACT error 7(location). LDSTP=18630 is too small for this problem,
  (pastp=63.7491, ipn_0:=ipoin[itp=305]=3209, stp[ipn_0]=64.5464).
Increase workspace or consider using 'simulate.p.value=TRUE'
Characteristic                  1, N = 55   2, N = 75   3, N = 43   4, N = 26   5, N = 51   p-value
enfolded_postgrad_yn
    1                           14 (25%)    11 (15%)    12 (28%)    6 (23%)     14 (27%)
    2                           15 (27%)    56 (75%)    24 (56%)    4 (15%)     23 (45%)
    3                           18 (33%)    8 (11%)     7 (16%)     14 (54%)    9 (18%)
    4                           8 (15%)     0 (0%)      0 (0%)      2 (7.7%)    5 (9.8%)
private_academic
    1                           19 (35%)    47 (64%)    12 (28%)    4 (15%)     18 (35%)
    2                           4 (7.3%)    18 (25%)    21 (49%)    16 (62%)    9 (18%)
    3                           17 (31%)    4 (5.5%)    8 (19%)     3 (12%)     14 (27%)
    4                           15 (27%)    4 (5.5%)    2 (4.7%)    3 (12%)     9 (18%)
    5                           0 (0%)      0 (0%)      0 (0%)      0 (0%)      1 (2.0%)
    (Missing)                   0           2           0           0           0
fellowship_years
    1                           33 (60%)    7 (9.3%)    8 (19%)     9 (35%)     25 (49%)
    2                           9 (16%)     59 (79%)    30 (70%)    8 (31%)     13 (25%)
    3                           5 (9.1%)    9 (12%)     5 (12%)     7 (27%)     9 (18%)
    4                           8 (15%)     0 (0%)      0 (0%)      2 (7.7%)    4 (7.8%)
fellow_cat
    Probably or Definitely No   4 (7.3%)    0 (0%)      0 (0%)      2 (7.7%)    4 (7.8%)
    Probably or Definitely Yes  43 (78%)    73 (97%)    33 (77%)    16 (62%)    43 (84%)
    Undecided                   8 (15%)     2 (2.7%)    10 (23%)    8 (31%)     4 (7.8%)
Cell values are n (%).
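
The FEXACT errors above indicate that the exact algorithm ran out of workspace on these five-column tables, which is why the p-value column is empty. One workaround, the same one used in the mosaic-plot sections later in this report, is a Monte Carlo p-value. The sketch below re-enters the fellow_cat counts from the debt table by hand; `debt_fellow`, the seed, and the choice of `B` are illustrative assumptions.

```r
# fellow_cat by debt category, counts copied from the table above
debt_fellow <- matrix(
  c( 4,  0,  0,  2,  4,
    43, 73, 33, 16, 43,
     8,  2, 10,  8,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    fellow_cat = c("Probably or Definitely No",
                   "Probably or Definitely Yes",
                   "Undecided"),
    debt = as.character(1:5)
  )
)

set.seed(1)  # simulated p-values change from run to run without a seed
res <- fisher.test(debt_fellow, simulate.p.value = TRUE, B = 1e5)
res$p.value
```

gtsummary's add_p() has a `test.args` argument intended for passing options like these through to the underlying test; the exact form depends on the package version, so check its documentation before relying on it.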

gender x responses

df2 %>%
  filter(gender!=6) %>%
  filter(gender!=7) %>%
  select(c(2,3,4,8,14))   %>%
  tbl_summary(
    by = gender,
    missing_text = "(Missing)")%>%
  bold_labels() %>%
  add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, : FEXACT error 6.  LDKEY=621 is too small for this problem,
  (ii := key2[itp=696] = 4204864, ldstp=18630)
Try increasing the size of the workspace and possibly 'mult'
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 4L, 1L, 2L, 3L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=14526359.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 1L, 3L, 1L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=3641234.
Try increasing the size of the workspace.
Characteristic                  1, N = 59   2, N = 151  3, N = 8    4, N = 15   5, N = 1    p-value
enfolded_postgrad_yn
    1                           11 (19%)    35 (23%)    3 (38%)     1 (6.7%)    1 (100%)
    2                           36 (61%)    69 (46%)    1 (12%)     11 (73%)    0 (0%)
    3                           8 (14%)     38 (25%)    4 (50%)     3 (20%)     0 (0%)
    4                           4 (6.8%)    9 (6.0%)    0 (0%)      0 (0%)      0 (0%)
private_academic
    1                           33 (56%)    64 (43%)    2 (25%)     0 (0%)      0 (0%)
    2                           7 (12%)     42 (28%)    5 (62%)     7 (47%)     1 (100%)
    3                           10 (17%)    25 (17%)    1 (12%)     8 (53%)     0 (0%)
    4                           8 (14%)     18 (12%)    0 (0%)      0 (0%)      0 (0%)
    5                           1 (1.7%)    0 (0%)      0 (0%)      0 (0%)      0 (0%)
    (Missing)                   0           2           0           0           0
fellowship_years
    1                           23 (39%)    52 (34%)    0 (0%)      0 (0%)      1 (100%)
    2                           26 (44%)    76 (50%)    3 (38%)     7 (47%)     0 (0%)
    3                           7 (12%)     14 (9.3%)   5 (62%)     8 (53%)     0 (0%)
    4                           3 (5.1%)    9 (6.0%)    0 (0%)      0 (0%)      0 (0%)
fellow_cat                                                                                  <0.001
    Probably or Definitely No   4 (6.8%)    5 (3.3%)    0 (0%)      0 (0%)      0 (0%)
    Probably or Definitely Yes  51 (86%)    133 (88%)   5 (62%)     6 (40%)     1 (100%)
    Undecided                   4 (6.8%)    13 (8.6%)   3 (38%)     9 (60%)     0 (0%)
Cell values are n (%); p-values are from Fisher's exact test.

race x responses

df2 %>%
  filter(race!=6) %>%
  filter(race!=7) %>%
  select(c(2,3,4,9,14))   %>%
  tbl_summary(
    by = race,
    missing_text = "(Missing)")%>%
  bold_labels() %>%
  add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 2L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=3215986.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 1L, 2L, 2L, 4L, 4L, 1L, 2L, 3L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=12617687.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 2L, 2L, 4L, 1L, 3L, 1L, 1L, 3L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=3220250.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellow_cat', p-value omitted:
Error in stats::fisher.test(c("Probably or Definitely No", "Probably or Definitely Yes", : FEXACT error 7(location). LDSTP=18630 is too small for this problem,
  (pastp=19.1276, ipn_0:=ipoin[itp=423]=279, stp[ipn_0]=16.7221).
Increase workspace or consider using 'simulate.p.value=TRUE'
Characteristic                  1, N = 4    2, N = 32   3, N = 22   4, N = 12   5, N = 154  p-value
enfolded_postgrad_yn
    1                           1 (25%)     4 (12%)     5 (23%)     5 (42%)     33 (21%)
    2                           0 (0%)      15 (47%)    11 (50%)    0 (0%)      89 (58%)
    3                           0 (0%)      11 (34%)    5 (23%)     7 (58%)     26 (17%)
    4                           3 (75%)     2 (6.2%)    1 (4.5%)    0 (0%)      6 (3.9%)
private_academic
    1                           1 (25%)     5 (16%)     6 (27%)     2 (17%)     77 (51%)
    2                           0 (0%)      11 (34%)    12 (55%)    8 (67%)     35 (23%)
    3                           3 (75%)     8 (25%)     3 (14%)     2 (17%)     24 (16%)
    4                           0 (0%)      8 (25%)     1 (4.5%)    0 (0%)      15 (9.9%)
    5                           0 (0%)      0 (0%)      0 (0%)      0 (0%)      1 (0.7%)
    (Missing)                   0           0           0           0           2
fellowship_years
    1                           0 (0%)      11 (34%)    4 (18%)     0 (0%)      56 (36%)
    2                           0 (0%)      16 (50%)    10 (45%)    5 (42%)     80 (52%)
    3                           1 (25%)     4 (12%)     7 (32%)     7 (58%)     11 (7.1%)
    4                           3 (75%)     1 (3.1%)    1 (4.5%)    0 (0%)      7 (4.5%)
fellow_cat
    Probably or Definitely No   0 (0%)      1 (3.1%)    0 (0%)      0 (0%)      8 (5.2%)
    Probably or Definitely Yes  1 (25%)     17 (53%)    19 (86%)    7 (58%)     141 (92%)
    Undecided                   3 (75%)     14 (44%)    3 (14%)     5 (42%)     5 (3.2%)
Cell values are n (%).

age x responses

table(df2$age)

  2   3   4   5   6 
 28 105  90  21   5 
table(df2$age_cat)

22-25 y/o 26-30 y/o 31-35 y/o 36-45 y/o 
       28       105        90        26 
df2 %>%
  select(c(2,3,4,14,15))   %>%
  tbl_summary(
    by = age_cat,
    missing_text = "(Missing)")%>%
  bold_labels() %>%
  add_p()
8 observations missing `age_cat` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `age_cat` column before passing to `tbl_summary()`.
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 4L, 3L, 3L, 1L, : FEXACT error 6.  LDKEY=621 is too small for this problem,
  (ii := key2[itp=404] = 6293512, ldstp=18630)
Try increasing the size of the workspace and possibly 'mult'
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 2L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=6540363.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 4L, 1L, 3L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=4708255.
Try increasing the size of the workspace.
Characteristic                  22-25 y/o, N = 28   26-30 y/o, N = 105  31-35 y/o, N = 90   36-45 y/o, N = 26   p-value
enfolded_postgrad_yn
    1                           6 (21%)             25 (24%)            21 (23%)            5 (19%)
    2                           15 (54%)            55 (52%)            41 (46%)            9 (35%)
    3                           4 (14%)             22 (21%)            25 (28%)            6 (23%)
    4                           3 (11%)             3 (2.9%)            3 (3.3%)            6 (23%)
private_academic
    1                           15 (54%)            42 (40%)            37 (42%)            7 (27%)
    2                           8 (29%)             31 (30%)            24 (27%)            5 (19%)
    3                           5 (18%)             12 (11%)            16 (18%)            11 (42%)
    4                           0 (0%)              20 (19%)            10 (11%)            3 (12%)
    5                           0 (0%)              0 (0%)              1 (1.1%)            0 (0%)
    (Missing)                   0                   0                   2                   0
fellowship_years
    1                           6 (21%)             30 (29%)            40 (44%)            7 (27%)
    2                           19 (68%)            58 (55%)            33 (37%)            9 (35%)
    3                           2 (7.1%)            11 (10%)            15 (17%)            5 (19%)
    4                           1 (3.6%)            6 (5.7%)            2 (2.2%)            5 (19%)
fellow_cat                                                                                                      0.007
    Probably or Definitely No   0 (0%)              5 (4.8%)            3 (3.3%)            2 (7.7%)
    Probably or Definitely Yes  21 (75%)            93 (89%)            77 (86%)            16 (62%)
    Undecided                   7 (25%)             7 (6.7%)            10 (11%)            8 (31%)
Cell values are n (%); p-values are from Fisher's exact test.

kids x responses

table(df2$children)

  1   2   3   4   5 
 53 122  29  35   6 
df2 %>%
  filter(children!=5) %>%
  select(c(2,3,4,12,14))   %>%
  tbl_summary(
    by = children,
    missing_text = "(Missing)")%>%
  bold_labels() %>%
  add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 4L, 3L, 3L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=4746204.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 2L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=6102566.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 4L, 1L, 3L, 1L, : FEXACT error 6 (f5xact).  LDKEY=621 is too small for this problem: kval=4023816.
Try increasing the size of the workspace.
Characteristic                  1, N = 53   2, N = 122  3, N = 29   4, N = 35   p-value
enfolded_postgrad_yn
    1                           11 (21%)    20 (16%)    10 (34%)    10 (29%)
    2                           27 (51%)    72 (59%)    11 (38%)    7 (20%)
    3                           11 (21%)    22 (18%)    7 (24%)     17 (49%)
    4                           4 (7.5%)    8 (6.6%)    1 (3.4%)    1 (2.9%)
private_academic
    1                           32 (60%)    48 (39%)    4 (14%)     12 (36%)
    2                           7 (13%)     36 (30%)    16 (55%)    9 (27%)
    3                           10 (19%)    21 (17%)    8 (28%)     5 (15%)
    4                           4 (7.5%)    17 (14%)    0 (0%)      7 (21%)
    5                           0 (0%)      0 (0%)      1 (3.4%)    0 (0%)
    (Missing)                   0           0           0           2
fellowship_years
    1                           20 (38%)    44 (36%)    3 (10%)     15 (43%)
    2                           26 (49%)    60 (49%)    17 (59%)    8 (23%)
    3                           3 (5.7%)    10 (8.2%)   8 (28%)     11 (31%)
    4                           4 (7.5%)    8 (6.6%)    1 (3.4%)    1 (2.9%)
fellow_cat                                                                      <0.001
    Probably or Definitely No   4 (7.5%)    6 (4.9%)    0 (0%)      0 (0%)
    Probably or Definitely Yes  46 (87%)    103 (84%)   16 (55%)    33 (94%)
    Undecided                   3 (5.7%)    13 (11%)    13 (45%)    2 (5.7%)
Cell values are n (%); p-values are from Fisher's exact test.

PGY X Effects

Debt x PGY ***

t1 <- table(df2[c(7,13)])

mosaicplot(t1,
  main = "Mosaic plot",
  color = TRUE)


fisher.test(t1,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t1
p-value = 4e-06
alternative hypothesis: two.sided

visuals

#debt_labs <- c("$<100k","100-150k","150-200k","200-250k","250k+","NR")

#ggplot(as.data.frame(t1)) + geom_bar(color="black",aes(debt, Freq, fill = as.factor(pgy_cat)), position = "dodge", stat = "summary", fun = "mean")+gghisto+ggtitle("Debt amount by PGY")+ scale_fill_manual(name = "PGY",values=c("indianred3", "royalblue1","pink"))+theme(axis.text.x = element_text(face="bold", color="royalblue4", size=10))+ scale_x_discrete(labels= debt_labs)+ylab("Frequency")+xlab("Amount of Debt")

#ggplot(as.data.frame(t1)) + geom_bar(color="black",aes(pgy_cat, Freq, fill = as.factor(debt)), position = "dodge", stat = "summary", fun = "mean")+gghisto+ggtitle("Debt amount by PGY")+theme(axis.text.x = element_text(face="bold", color="royalblue4", size=10))+ylab("Frequency")+xlab("PGY")

MOSAIC PLOTS & FISHERS EXACT

PGY x Fellowship y/n NS

t2 <- table(df2[13:14])

mosaicplot(t2,
  main = "Mosaic plot",
  color = TRUE)


fisher.test(t2)

    Fisher's Exact Test for Count Data

data:  t2
p-value = 0.4002
alternative hypothesis: two.sided

PGY x Enfolded Fellowship y/n ***

t3 <- table(df2[c(2,13)])

mosaicplot(t3,
  main = "Mosaic plot",
  color = TRUE
)


fisher.test(t3)

    Fisher's Exact Test for Count Data

data:  t3
p-value = 0.01268
alternative hypothesis: two.sided
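
To see where the PGY difference in enfolded_postgrad_yn comes from, the counts from the "pgy x responses" table can be re-entered and converted to within-group proportions. This is a sketch only; `pgy_enfolded` is an illustrative name and the counts are copied by hand from that table.

```r
# enfolded_postgrad_yn by training level, counts copied from the summary table
pgy_enfolded <- matrix(
  c( 8, 32, 11,
    29, 74, 17,
    19, 21, 17,
     1, 10,  3),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    enfolded_postgrad_yn = as.character(1:4),
    pgy_cat = c("junior", "midlevel", "senior")
  )
)

# Share of each response within a training level (each column sums to 1)
round(prop.table(pgy_enfolded, margin = 2), 2)

# The same exact test as above, run on the hand-entered counts
fisher.test(pgy_enfolded)$p.value
```

Reading down the columns shows which responses shift between training levels, something the single p-value does not convey.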

PGY x private academic NS

t4 <- table(df2[c(3,13)])

mosaicplot(t4,
  main = "Mosaic plot",
  color = TRUE)


fisher.test(t4,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t4
p-value = 0.8098
alternative hypothesis: two.sided

PGY x fellowship years ***

t5 <- table(df2[c(4,13)])

mosaicplot(t5,
  main = "Mosaic plot",
  color = TRUE)

fisher.test(t5,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t5
p-value = 1.7e-05
alternative hypothesis: two.sided

PGY x fellowship field … not sure

multinomial logistic regression

fellowship category x predictive factors

df2$fellow_cat <- relevel(factor(df2$fellow_cat), ref = "Undecided")
test <- multinom(fellow_cat ~ pgy_cat + children + race + age + marital_status + gender, data = df2)
# weights:  27 (16 variable)
initial  value 259.272500 
iter  10 value 99.598694
iter  20 value 97.415597
final  value 97.398446 
converged
summary(test)
Call:
multinom(formula = fellow_cat ~ pgy_cat + children + race + age + 
    marital_status + gender, data = df2)

Coefficients:
                           (Intercept) pgy_catmidlevel pgy_catsenior   children
Probably or Definitely No    -3.211769      -0.0167406     0.4833854 -0.9744640
Probably or Definitely Yes    2.241397      -1.2453126    -0.7024426 -0.2232432
                                race        age marital_status     gender
Probably or Definitely No  1.0130676 0.51664386     -0.2418292 -0.5433382
Probably or Definitely Yes 0.8579868 0.08093294     -0.3847674 -0.6313251

Std. Errors:
                           (Intercept) pgy_catmidlevel pgy_catsenior  children
Probably or Definitely No     3.614377        1.309333     1.4408315 0.5645340
Probably or Definitely Yes    1.983032        0.707117     0.8236907 0.3085286
                                race       age marital_status    gender
Probably or Definitely No  0.3705511 0.4956061      0.4419739 0.3375919
Probably or Definitely Yes 0.1766069 0.3000523      0.2001907 0.2036109

Residual Deviance: 194.7969 
AIC: 226.7969 
z <- summary(test)$coefficients/summary(test)$standard.errors
z
                           (Intercept) pgy_catmidlevel pgy_catsenior  children
Probably or Definitely No   -0.8886093     -0.01278559     0.3354906 -1.726139
Probably or Definitely Yes   1.1302882     -1.76111249    -0.8527989 -0.723574
                               race       age marital_status    gender
Probably or Definitely No  2.733949 1.0424487     -0.5471572 -1.609453
Probably or Definitely Yes 4.858172 0.2697294     -1.9220041 -3.100645
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
                           (Intercept) pgy_catmidlevel pgy_catsenior   children
Probably or Definitely No    0.3742131      0.98979885     0.7372550 0.08432247
Probably or Definitely Yes   0.2583548      0.07821937     0.3937708 0.46932734
                                   race       age marital_status      gender
Probably or Definitely No  6.257980e-03 0.2972037     0.58427073 0.107517382
Probably or Definitely Yes 1.184746e-06 0.7873684     0.05460524 0.001930997
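
The z and p matrices above are Wald tests: each coefficient divided by its standard error, compared against a standard normal distribution. A minimal sketch for three of the "Probably or Definitely Yes" vs. "Undecided" terms, with the estimates and standard errors copied by hand from summary(test); `est` and `se` are illustrative names.

```r
# Coefficients and standard errors copied from summary(test) above
est <- c(race = 0.8579868, gender = -0.6313251, marital_status = -0.3847674)
se  <- c(race = 0.1766069, gender = 0.2036109, marital_status = 0.2001907)

z <- est / se            # Wald statistic
p <- 2 * pnorm(-abs(z))  # two-sided p-value under a standard normal

round(cbind(estimate = est, se = se, z = z, p = p), 4)
```

Writing the p-value as `2 * pnorm(-abs(z))` is algebraically the same as `(1 - pnorm(abs(z))) * 2` but loses less precision when |z| is large.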
coefs <- exp(coef(test))

The relative risk ratio for a one-unit increase in the variable age is 1.676 (exp(0.5166)) for those who probably or definitely will not do a fellowship vs. those who are undecided.

The relative risk ratio switching from pgy_cat = junior to senior is 1.6215547 for those who probably or definitely will not do a fellowship vs. those who are undecided.
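
Relative risk ratios are the exponentiated multinomial coefficients, so they can be checked directly against the summary(test) output. The sketch below does this for the two "Probably or Definitely No" vs. "Undecided" terms discussed here and adds approximate 95% Wald intervals; the interval is an addition for illustration, not part of the original analysis, and `est`, `se`, `rrr`, `lower`, and `upper` are hypothetical names.

```r
# Log-scale coefficients and standard errors copied from summary(test) above
est <- c(age = 0.51664386, pgy_catsenior = 0.4833854)
se  <- c(age = 0.4956061,  pgy_catsenior = 1.4408315)

rrr <- exp(est)  # relative risk ratio

# Approximate 95% Wald interval: built on the log scale, then exponentiated
lower <- exp(est - qnorm(0.975) * se)
upper <- exp(est + qnorm(0.975) * se)

round(cbind(rrr, lower, upper), 3)
```

Both intervals span 1, which agrees with the non-significant p-values for these two terms in the table above.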

multinomial logistic regression

model <- vglm(fellow_cat ~ pgy_cat + children + race + age + marital_status + gender, multinomial(refLevel = "Undecided"), data = df2)
summary(model)

Call:
vglm(formula = fellow_cat ~ pgy_cat + children + race + age + 
    marital_status + gender, family = multinomial(refLevel = "Undecided"), 
    data = df2)

Coefficients: 
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept):1     -3.21177    3.61438      NA       NA    
(Intercept):2      2.24181    1.98302   1.131  0.25826    
pgy_catmidlevel:1 -0.01689    1.30932  -0.013  0.98971    
pgy_catmidlevel:2 -1.24539    0.70712  -1.761  0.07820 .  
pgy_catsenior:1    0.48320    1.44082   0.335  0.73735    
pgy_catsenior:2   -0.70254    0.82369  -0.853  0.39371    
children:1        -0.97443    0.56452  -1.726  0.08432 .  
children:2        -0.22328    0.30852  -0.724  0.46924    
race:1             1.01309    0.37056   2.734  0.00626 ** 
race:2             0.85795    0.17660   4.858 1.19e-06 ***
age:1              0.51661    0.49560   1.042  0.29724    
age:2              0.08089    0.30005   0.270  0.78749    
marital_status:1  -0.24179    0.44197  -0.547  0.58433    
marital_status:2  -0.38476    0.20019  -1.922  0.05461 .  
gender:1          -0.54335    0.33759  -1.609  0.10751    
gender:2          -0.63132    0.20361  -3.101  0.00193 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])

Residual deviance: 194.7969 on 456 degrees of freedom

Log-likelihood: -97.3984 on 456 degrees of freedom

Number of Fisher scoring iterations: 7 

Warning: Hauck-Donner effect detected in the following estimate(s):
'(Intercept):1', 'race:2'


Reference group is level  1  of the response

proportional odds logistic regression

fellowship category x predictive factors

model <- polr(fellow_cat ~ pgy_cat + children + race + age + marital_status + gender, data = df2)

summary(model)

Re-fitting to get Hessian
Call:
polr(formula = fellow_cat ~ pgy_cat + children + race + age + 
    marital_status + gender, data = df2)

Coefficients:
                  Value Std. Error t value
pgy_catmidlevel -1.1359     0.5928 -1.9162
pgy_catsenior   -0.8621     0.6761 -1.2751
children         0.0333     0.2469  0.1349
race             0.6657     0.1390  4.7884
age             -0.1024     0.2427 -0.4221
marital_status  -0.3264     0.1746 -1.8696
gender          -0.5463     0.1575 -3.4695

Intercepts:
                                                     Value   Std. Error t value
Undecided|Probably or Definitely No                  -2.5466  1.6153    -1.5766
Probably or Definitely No|Probably or Definitely Yes -2.1049  1.6099    -1.3074

Residual Deviance: 210.5097 
AIC: 228.5097 
(21 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel   pgy_catsenior        children            race             age 
    -1.13590828     -0.86207081      0.03330185      0.66571740     -0.10243469 
 marital_status          gender 
    -0.32638039     -0.54627050 
# fellowship category x predictive factors
# Find the p-value for a t-value of
pt(1.9162, 257-6, lower.tail=FALSE)*2 #MIDLEVEL NS
[1] 0.05647545
pt(3.4695, 257-6, lower.tail=FALSE)*2 #GENDER p=0.0006135578
[1] 0.0006135578
pt(4.7884, 257-6, lower.tail=FALSE)*2 #RACE p=2.875535e-06
[1] 2.875535e-06
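
summary() for a polr fit prints t values but no p-values, which is why they are computed by hand above. The calls above use a t distribution with 257-6 degrees of freedom; since 21 observations were dropped for missingness, that choice is approximate. A common alternative is a standard normal reference, sketched here for all seven predictors at once with the t values copied from the output above (`t_val` and `p_val` are illustrative names).

```r
# t values copied from summary(model) for the fellow_cat model above
t_val <- c(pgy_catmidlevel = -1.9162, pgy_catsenior = -1.2751,
           children = 0.1349, race = 4.7884, age = -0.4221,
           marital_status = -1.8696, gender = -3.4695)

# Two-sided p-values under a standard normal reference distribution
p_val <- 2 * pnorm(abs(t_val), lower.tail = FALSE)

round(p_val, 4)
names(p_val)[p_val < 0.05]  # predictors below the 0.05 threshold
```

With this many observations the normal and t references give nearly the same answer, so the conclusion does not depend on which one is used.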

What this means: Gender and race predict fellowship category
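
One caveat worth checking: polr() treats the response as ordered by its factor levels. After the earlier relevel(), the levels of fellow_cat run Undecided, Probably or Definitely No, Probably or Definitely Yes (see the intercept labels above), which may not be the ordering intended. The sketch below sets an explicit order on toy values, assuming No < Undecided < Yes is what is wanted; `fellow_ord` is a hypothetical name.

```r
# Toy responses (hypothetical values, not the survey data)
fellow_cat <- c("Undecided", "Probably or Definitely Yes",
                "Probably or Definitely No", "Probably or Definitely Yes")

# Assumed ordering: No < Undecided < Yes
fellow_ord <- factor(
  fellow_cat,
  levels = c("Probably or Definitely No", "Undecided",
             "Probably or Definitely Yes"),
  ordered = TRUE
)

levels(fellow_ord)
fellow_ord[1] < fellow_ord[2]  # is Undecided ranked below Probably or Definitely Yes?
```

If a different ordering is the right one for this survey, only the `levels` argument needs to change.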

enfolded or postgrad x predictive factors

model <- polr(as.factor(enfolded_postgrad_yn) ~ pgy_cat + children + race + age + marital_status + gender, data = df2)

summary(model)

Re-fitting to get Hessian
Call:
polr(formula = as.factor(enfolded_postgrad_yn) ~ pgy_cat + children + 
    race + age + marital_status + gender, data = df2)

Coefficients:
                    Value Std. Error  t value
pgy_catmidlevel -0.484347     0.2954 -1.63985
pgy_catsenior   -0.065131     0.3940 -0.16532
children         0.017191     0.1338  0.12847
race            -0.185734     0.1054 -1.76260
age              0.235912     0.1599  1.47509
marital_status   0.013933     0.1188  0.11730
gender          -0.003618     0.1207 -0.02997

Intercepts:
    Value   Std. Error t value
1|2 -1.6049  0.9866    -1.6266
2|3  0.6577  0.9826     0.6694
3|4  2.6136  1.0123     2.5819

Residual Deviance: 549.051 
AIC: 569.051 
(21 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel   pgy_catsenior        children            race             age 
   -0.484346536    -0.065131234     0.017190830    -0.185734316     0.235911807 
 marital_status          gender 
    0.013933108    -0.003617707 
# enfolded or postgrad x predictive factors
# no factors have a p-value less than 0.05

What this means: No predictive factors for enfolded or postgrad

private or academic x predictive factors

model <- polr(as.factor(private_academic) ~ pgy_cat + children + race + age + marital_status + gender, data = df2)

summary(model)

Re-fitting to get Hessian
Call:
polr(formula = as.factor(private_academic) ~ pgy_cat + children + 
    race + age + marital_status + gender, data = df2)

Coefficients:
                    Value Std. Error  t value
pgy_catmidlevel -0.057058    0.29736 -0.19188
pgy_catsenior   -0.123041    0.39127 -0.31447
children         0.228087    0.13665  1.66911
race            -0.184891    0.09628 -1.92032
age              0.239449    0.14672  1.63202
marital_status  -0.008478    0.11490 -0.07378
gender           0.230479    0.11062  2.08349

Intercepts:
    Value   Std. Error t value
1|2  0.5794  0.9648     0.6005
2|3  1.8490  0.9740     1.8983
3|4  2.9927  0.9814     3.0493
4|5  6.4829  1.3852     4.6801

Residual Deviance: 592.7608 
AIC: 614.7608 
(23 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel   pgy_catsenior        children            race             age 
   -0.057058258    -0.123041128     0.228087164    -0.184891266     0.239449274 
 marital_status          gender 
   -0.008477634     0.230478619 
# private or academic x predictive factors
# Find the p-value for a t-value of
pt(2.08349, 257-6, lower.tail=FALSE)*2 #GENDER p=0.03821857
[1] 0.03821857

What this means: Gender predicts private or academic institution

fellowship years x predictive factors

model <- polr(as.factor(fellowship_years) ~ pgy_cat + children + race + age + marital_status + gender, data = df2)

summary(model)

Re-fitting to get Hessian
Call:
polr(formula = as.factor(fellowship_years) ~ pgy_cat + children + 
    race + age + marital_status + gender, data = df2)

Coefficients:
                   Value Std. Error t value
pgy_catmidlevel -0.21862     0.2983 -0.7329
pgy_catsenior   -1.25900     0.4169 -3.0196
children         0.19577     0.1388  1.4101
race            -0.28850     0.1073 -2.6889
age             -0.03658     0.1617 -0.2262
marital_status   0.23509     0.1185  1.9844
gender           0.14019     0.1411  0.9934

Intercepts:
    Value   Std. Error t value
1|2 -1.1393  0.9906    -1.1502
2|3  1.2123  0.9920     1.2221
3|4  2.5758  1.0214     2.5218

Residual Deviance: 514.4281 
AIC: 534.4281 
(21 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel   pgy_catsenior        children            race             age 
    -0.21862326     -1.25900180      0.19577164     -0.28849950     -0.03658487 
 marital_status          gender 
     0.23508709      0.14019127 
# fellowship years x predictive factors
# Find the p-value for a t-value of
pt(3.0196, 257-6, lower.tail=FALSE)*2 #pgy_catsenior p=0.002792211
[1] 0.002792211
pt(2.6889, 257-6, lower.tail=FALSE)*2 #RACE 0.007649045
[1] 0.007649045
pt(1.9844, 257-6, lower.tail=FALSE)*2 #MARITAL STATUS p=0.04830027
[1] 0.04830027

What this means: Being a senior resident (6th or 7th year), race, and marital status predict number of fellowship years.

Below: examining marital status x fellowship years (4, 11)

t <- table(df2[c(4,11)]);t
                marital_status
fellowship_years  1  2  3  4  6  7
               1 25  9 46  0  2  0
               2 14 25 61 11  2  1
               3  9  5  7  5  4  5
               4  3  0 10  1  0  0
mosaicplot(t,color = TRUE)

fisher.test(t,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t
p-value = 1e-06
alternative hypothesis: two.sided
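
A note on reading simulated p-values such as the one above: with simulate.p.value = TRUE, fisher.test() reports (1 + the number of simulated tables at least as extreme as the observed one) / (B + 1), so the result can never fall below 1 / (B + 1). The sketch below works through that arithmetic; `p_floor` and `implied_extreme` are hypothetical helper names.

```r
B <- 1e6  # number of simulated tables, as in the calls in this report

# Smallest p-value the simulation can return (no simulated table as extreme)
p_floor <- 1 / (B + 1)
p_floor

# Approximate number of "at least as extreme" simulated tables behind a reported p
implied_extreme <- function(p, B) round(p * (B + 1) - 1)

implied_extreme(1e-06, B)  # a reported p-value of 1e-06
implied_extreme(4e-06, B)  # a reported p-value of 4e-06
```

A reported p-value = 1e-06 with B = 1e6 therefore sits at the floor of the simulation, and it may be safer to write it as an upper bound (for example p < 0.0001) than as an exact value.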

MOSAIC PLOTS

Debt x Enfolded Fellowship y/n ***

t <- table(df2[c(7,2)])
mosaicplot(t,color = TRUE)

fisher.test(t,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t
p-value = 1e-06
alternative hypothesis: two.sided

Debt x private academic ***

t <- table(df2[c(7,3)])
mosaicplot(t,color = TRUE)

fisher.test(t,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t
p-value = 1e-06
alternative hypothesis: two.sided

Debt x fellowship years ***

t <- table(df2[c(7,4)])
mosaicplot(t,color = TRUE)

fisher.test(t,simulate.p.value=TRUE,B=1e6)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    1e+06 replicates)

data:  t
p-value = 1e-06
alternative hypothesis: two.sided
