To summarize what we discussed, we want to know whether the answers to questions 12 ("fellowship_yn"), 15 ("enfolded_postgrad_yn"), 16 ("private_academic"), 17 ("fellowship_years"), and 18 ("fellowship_field") differ by year of training ("current_year"), debt ("debt"), gender ("gender"), race ("race"), age ("age"), marital status ("marital_status"), and family planning ("children").
The data set went from 265 to 257 rows because 8 cases from the original CSV were removed for having 50% or fewer of the questions answered.
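The completeness rule described above (dropping respondents who answered 50% or fewer of the questions) can be sketched as follows. This is a minimal illustration on a made-up data frame, not the code that produced the 257-row file: the name `toy` and its columns are hypothetical, and it assumes unanswered questions are stored as `NA`.

```r
# Hypothetical survey extract: 4 respondents, 4 questions, NA = unanswered
toy <- data.frame(
  q1 = c(1, NA, 2, NA),
  q2 = c(3, NA, 1, 2),
  q3 = c(2, NA, NA, NA),
  q4 = c(1, 4, 2, NA)
)

# Proportion of questions each respondent answered
prop_answered <- rowMeans(!is.na(toy))

# Keep respondents who answered more than half of the questions
toy_complete <- toy[prop_answered > 0.5, ]
nrow(toy_complete)
```

Computing the proportion per row keeps the threshold explicit, so it can be reported alongside the number of rows removed.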
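library(dplyr)      # %>%, select(), filter()
library(gtsummary)  # tbl_summary(), add_p(), bold_labels()
df <- read.csv("maggie2.csv")
# Keep only respondents who answered the fellowship question
df <- df[!is.na(df$fellowship_yn),]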
df2 <- df[c(2,7,20,22:23,31:33,35,37:39)]
df2$pgy_cat <- ifelse(df2$current_year==1|df2$current_year==2, 'junior', ifelse(df2$current_year==7|df2$current_year==6, "senior", "midlevel"))
df2$fellow_cat <- ifelse(df2$fellowship_yn==1|df2$fellowship_yn==2, 'Probably or Definitely Yes', ifelse(df2$fellowship_yn==4|df2$fellowship_yn==5, "Probably or Definitely No", "Undecided"))
df2$age_cat <- ifelse(df2$age==2, '22-25 y/o', ifelse(df2$age==3, "26-30 y/o", ifelse(df2$age==4, "31-35 y/o", "36-45 y/o")))
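The nested `ifelse()` recodes above could also be written with `cut()`, which makes the category boundaries visible in one place. The sketch below uses a made-up vector of training years (the values are hypothetical) and assumes the same grouping as `pgy_cat`: years 1 to 2 junior, 3 to 5 midlevel, 6 to 7 senior. One difference to be aware of: `cut()` returns `NA` for values outside the breaks, whereas the `ifelse()` version labels any other non-missing value "midlevel".

```r
# Hypothetical PGY values, including a missing response
current_year <- c(1, 2, 3, 5, 6, 7, NA)

# Intervals (0,2], (2,5], (5,7] map to junior, midlevel, senior
pgy_cat <- cut(current_year,
               breaks = c(0, 2, 5, 7),
               labels = c("junior", "midlevel", "senior"))

# Show NA as its own category so missing values are not silently lost
table(pgy_cat, useNA = "ifany")
```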
df2 %>%
select(c(2,3,4,13,14)) %>%
tbl_summary(
by = pgy_cat,
missing_text = "(Missing)")%>%
bold_labels() %>%
add_p()
| Characteristic | junior, N = 57¹ | midlevel, N = 137¹ | senior, N = 48¹ | p-value² |
|---|---|---|---|---|
| **enfolded_postgrad_yn** | | | | 0.013 |
| 1 | 8 (14%) | 32 (23%) | 11 (23%) | |
| 2 | 29 (51%) | 74 (54%) | 17 (35%) | |
| 3 | 19 (33%) | 21 (15%) | 17 (35%) | |
| 4 | 1 (1.8%) | 10 (7.3%) | 3 (6.2%) | |
| **private_academic** | | | | 0.8 |
| 1 | 24 (42%) | 55 (41%) | 18 (38%) | |
| 2 | 14 (25%) | 43 (32%) | 11 (23%) | |
| 3 | 12 (21%) | 22 (16%) | 12 (25%) | |
| 4 | 7 (12%) | 14 (10%) | 7 (15%) | |
| 5 | 0 (0%) | 1 (0.7%) | 0 (0%) | |
| (Missing) | 0 | 2 | 0 | |
| **fellowship_years** | | | | |
| 1 | 19 (33%) | 38 (28%) | 26 (54%) | |
| 2 | 21 (37%) | 79 (58%) | 13 (27%) | |
| 3 | 16 (28%) | 10 (7.3%) | 6 (12%) | |
| 4 | 1 (1.8%) | 10 (7.3%) | 3 (6.2%) | |
| **fellow_cat** | | | | 0.4 |
| Probably or Definitely No | 1 (1.8%) | 6 (4.4%) | 3 (6.2%) | |
| Probably or Definitely Yes | 52 (91%) | 110 (80%) | 39 (81%) | |
| Undecided | 4 (7.0%) | 21 (15%) | 6 (12%) | |

¹ n (%); ² Fisher's exact test
df2 %>%
filter(debt!='6') %>%
select(c(2,3,4,7,14)) %>%
tbl_summary(
by = debt,
missing_text = "(Missing)")%>%
bold_labels() %>%
add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 4L, 3L, 3L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=6028169.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 2L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=18961715.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 4L, 1L, 3L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=4838333.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellow_cat', p-value omitted:
Error in stats::fisher.test(c("Probably or Definitely No", "Probably or Definitely Yes", : FEXACT error 7(location). LDSTP=18630 is too small for this problem,
(pastp=63.7491, ipn_0:=ipoin[itp=305]=3209, stp[ipn_0]=64.5464).
Increase workspace or consider using 'simulate.p.value=TRUE'
| Characteristic | 1, N = 55¹ | 2, N = 75¹ | 3, N = 43¹ | 4, N = 26¹ | 5, N = 51¹ | p-value |
|---|---|---|---|---|---|---|
| **enfolded_postgrad_yn** | | | | | | |
| 1 | 14 (25%) | 11 (15%) | 12 (28%) | 6 (23%) | 14 (27%) | |
| 2 | 15 (27%) | 56 (75%) | 24 (56%) | 4 (15%) | 23 (45%) | |
| 3 | 18 (33%) | 8 (11%) | 7 (16%) | 14 (54%) | 9 (18%) | |
| 4 | 8 (15%) | 0 (0%) | 0 (0%) | 2 (7.7%) | 5 (9.8%) | |
| **private_academic** | | | | | | |
| 1 | 19 (35%) | 47 (64%) | 12 (28%) | 4 (15%) | 18 (35%) | |
| 2 | 4 (7.3%) | 18 (25%) | 21 (49%) | 16 (62%) | 9 (18%) | |
| 3 | 17 (31%) | 4 (5.5%) | 8 (19%) | 3 (12%) | 14 (27%) | |
| 4 | 15 (27%) | 4 (5.5%) | 2 (4.7%) | 3 (12%) | 9 (18%) | |
| 5 | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2.0%) | |
| (Missing) | 0 | 2 | 0 | 0 | 0 | |
| **fellowship_years** | | | | | | |
| 1 | 33 (60%) | 7 (9.3%) | 8 (19%) | 9 (35%) | 25 (49%) | |
| 2 | 9 (16%) | 59 (79%) | 30 (70%) | 8 (31%) | 13 (25%) | |
| 3 | 5 (9.1%) | 9 (12%) | 5 (12%) | 7 (27%) | 9 (18%) | |
| 4 | 8 (15%) | 0 (0%) | 0 (0%) | 2 (7.7%) | 4 (7.8%) | |
| **fellow_cat** | | | | | | |
| Probably or Definitely No | 4 (7.3%) | 0 (0%) | 0 (0%) | 2 (7.7%) | 4 (7.8%) | |
| Probably or Definitely Yes | 43 (78%) | 73 (97%) | 33 (77%) | 16 (62%) | 43 (84%) | |
| Undecided | 8 (15%) | 2 (2.7%) | 10 (23%) | 8 (31%) | 4 (7.8%) | |

¹ n (%)
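The FEXACT workspace errors above leave the p-value column empty. One possible way to fill it is to have `add_p()` pass `simulate.p.value = TRUE` through to `stats::fisher.test()`, as the error message itself suggests. The sketch below is an assumption-laden illustration rather than the analysis code: it uses the `trial` example data that ships with gtsummary instead of `df2`, and the `test.args` / `all_tests()` syntax may differ between gtsummary versions.

```r
library(dplyr)
library(gtsummary)

set.seed(1)  # simulated p-values change from run to run without a seed

# `trial` is an example data set bundled with gtsummary (not the survey data)
tbl <- trial %>%
  select(trt, grade, stage) %>%
  tbl_summary(by = trt, missing_text = "(Missing)") %>%
  add_p(
    test = all_categorical() ~ "fisher.test",
    # Forward simulate.p.value and B to stats::fisher.test()
    test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 1e4)
  ) %>%
  bold_labels()
tbl
```

Simulation avoids the workspace limit altogether; raising `workspace` through the same `test.args` list is the alternative when an exact p-value is required.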
df2 %>%
filter(gender!=6) %>%
filter(gender!=7) %>%
select(c(2,3,4,8,14)) %>%
tbl_summary(
by = gender,
missing_text = "(Missing)")%>%
bold_labels() %>%
add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, : FEXACT error 6. LDKEY=621 is too small for this problem,
(ii := key2[itp=696] = 4204864, ldstp=18630)
Try increasing the size of the workspace and possibly 'mult'
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 4L, 1L, 2L, 3L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=14526359.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 1L, 3L, 1L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=3641234.
Try increasing the size of the workspace.
| Characteristic | 1, N = 59¹ | 2, N = 151¹ | 3, N = 8¹ | 4, N = 15¹ | 5, N = 1¹ | p-value² |
|---|---|---|---|---|---|---|
| **enfolded_postgrad_yn** | | | | | | |
| 1 | 11 (19%) | 35 (23%) | 3 (38%) | 1 (6.7%) | 1 (100%) | |
| 2 | 36 (61%) | 69 (46%) | 1 (12%) | 11 (73%) | 0 (0%) | |
| 3 | 8 (14%) | 38 (25%) | 4 (50%) | 3 (20%) | 0 (0%) | |
| 4 | 4 (6.8%) | 9 (6.0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| **private_academic** | | | | | | |
| 1 | 33 (56%) | 64 (43%) | 2 (25%) | 0 (0%) | 0 (0%) | |
| 2 | 7 (12%) | 42 (28%) | 5 (62%) | 7 (47%) | 1 (100%) | |
| 3 | 10 (17%) | 25 (17%) | 1 (12%) | 8 (53%) | 0 (0%) | |
| 4 | 8 (14%) | 18 (12%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| 5 | 1 (1.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| (Missing) | 0 | 2 | 0 | 0 | 0 | |
| **fellowship_years** | | | | | | |
| 1 | 23 (39%) | 52 (34%) | 0 (0%) | 0 (0%) | 1 (100%) | |
| 2 | 26 (44%) | 76 (50%) | 3 (38%) | 7 (47%) | 0 (0%) | |
| 3 | 7 (12%) | 14 (9.3%) | 5 (62%) | 8 (53%) | 0 (0%) | |
| 4 | 3 (5.1%) | 9 (6.0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| **fellow_cat** | | | | | | <0.001 |
| Probably or Definitely No | 4 (6.8%) | 5 (3.3%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| Probably or Definitely Yes | 51 (86%) | 133 (88%) | 5 (62%) | 6 (40%) | 1 (100%) | |
| Undecided | 4 (6.8%) | 13 (8.6%) | 3 (38%) | 9 (60%) | 0 (0%) | |

¹ n (%); ² Fisher's exact test
df2 %>%
filter(race!=6) %>%
filter(race!=7) %>%
select(c(2,3,4,9,14)) %>%
tbl_summary(
by = race,
missing_text = "(Missing)")%>%
bold_labels() %>%
add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 2L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=3215986.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 1L, 2L, 2L, 4L, 4L, 1L, 2L, 3L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=12617687.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 2L, 2L, 4L, 1L, 3L, 1L, 1L, 3L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=3220250.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellow_cat', p-value omitted:
Error in stats::fisher.test(c("Probably or Definitely No", "Probably or Definitely Yes", : FEXACT error 7(location). LDSTP=18630 is too small for this problem,
(pastp=19.1276, ipn_0:=ipoin[itp=423]=279, stp[ipn_0]=16.7221).
Increase workspace or consider using 'simulate.p.value=TRUE'
| Characteristic | 1, N = 4¹ | 2, N = 32¹ | 3, N = 22¹ | 4, N = 12¹ | 5, N = 154¹ | p-value |
|---|---|---|---|---|---|---|
| **enfolded_postgrad_yn** | | | | | | |
| 1 | 1 (25%) | 4 (12%) | 5 (23%) | 5 (42%) | 33 (21%) | |
| 2 | 0 (0%) | 15 (47%) | 11 (50%) | 0 (0%) | 89 (58%) | |
| 3 | 0 (0%) | 11 (34%) | 5 (23%) | 7 (58%) | 26 (17%) | |
| 4 | 3 (75%) | 2 (6.2%) | 1 (4.5%) | 0 (0%) | 6 (3.9%) | |
| **private_academic** | | | | | | |
| 1 | 1 (25%) | 5 (16%) | 6 (27%) | 2 (17%) | 77 (51%) | |
| 2 | 0 (0%) | 11 (34%) | 12 (55%) | 8 (67%) | 35 (23%) | |
| 3 | 3 (75%) | 8 (25%) | 3 (14%) | 2 (17%) | 24 (16%) | |
| 4 | 0 (0%) | 8 (25%) | 1 (4.5%) | 0 (0%) | 15 (9.9%) | |
| 5 | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (0.7%) | |
| (Missing) | 0 | 0 | 0 | 0 | 2 | |
| **fellowship_years** | | | | | | |
| 1 | 0 (0%) | 11 (34%) | 4 (18%) | 0 (0%) | 56 (36%) | |
| 2 | 0 (0%) | 16 (50%) | 10 (45%) | 5 (42%) | 80 (52%) | |
| 3 | 1 (25%) | 4 (12%) | 7 (32%) | 7 (58%) | 11 (7.1%) | |
| 4 | 3 (75%) | 1 (3.1%) | 1 (4.5%) | 0 (0%) | 7 (4.5%) | |
| **fellow_cat** | | | | | | |
| Probably or Definitely No | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 8 (5.2%) | |
| Probably or Definitely Yes | 1 (25%) | 17 (53%) | 19 (86%) | 7 (58%) | 141 (92%) | |
| Undecided | 3 (75%) | 14 (44%) | 3 (14%) | 5 (42%) | 5 (3.2%) | |

¹ n (%)
table(df2$age)
2 3 4 5 6
28 105 90 21 5
table(df2$age_cat)
22-25 y/o 26-30 y/o 31-35 y/o 36-45 y/o
28 105 90 26
df2 %>%
select(c(2,3,4,14,15)) %>%
tbl_summary(
by = age_cat,
missing_text = "(Missing)")%>%
bold_labels() %>%
add_p()
8 observations missing `age_cat` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `age_cat` column before passing to `tbl_summary()`.
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 4L, 3L, 3L, 1L, : FEXACT error 6. LDKEY=621 is too small for this problem,
(ii := key2[itp=404] = 6293512, ldstp=18630)
Try increasing the size of the workspace and possibly 'mult'
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 2L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=6540363.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 4L, 1L, 3L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=4708255.
Try increasing the size of the workspace.
| Characteristic | 22-25 y/o, N = 28¹ | 26-30 y/o, N = 105¹ | 31-35 y/o, N = 90¹ | 36-45 y/o, N = 26¹ | p-value² |
|---|---|---|---|---|---|
| **enfolded_postgrad_yn** | | | | | |
| 1 | 6 (21%) | 25 (24%) | 21 (23%) | 5 (19%) | |
| 2 | 15 (54%) | 55 (52%) | 41 (46%) | 9 (35%) | |
| 3 | 4 (14%) | 22 (21%) | 25 (28%) | 6 (23%) | |
| 4 | 3 (11%) | 3 (2.9%) | 3 (3.3%) | 6 (23%) | |
| **private_academic** | | | | | |
| 1 | 15 (54%) | 42 (40%) | 37 (42%) | 7 (27%) | |
| 2 | 8 (29%) | 31 (30%) | 24 (27%) | 5 (19%) | |
| 3 | 5 (18%) | 12 (11%) | 16 (18%) | 11 (42%) | |
| 4 | 0 (0%) | 20 (19%) | 10 (11%) | 3 (12%) | |
| 5 | 0 (0%) | 0 (0%) | 1 (1.1%) | 0 (0%) | |
| (Missing) | 0 | 0 | 2 | 0 | |
| **fellowship_years** | | | | | |
| 1 | 6 (21%) | 30 (29%) | 40 (44%) | 7 (27%) | |
| 2 | 19 (68%) | 58 (55%) | 33 (37%) | 9 (35%) | |
| 3 | 2 (7.1%) | 11 (10%) | 15 (17%) | 5 (19%) | |
| 4 | 1 (3.6%) | 6 (5.7%) | 2 (2.2%) | 5 (19%) | |
| **fellow_cat** | | | | | 0.007 |
| Probably or Definitely No | 0 (0%) | 5 (4.8%) | 3 (3.3%) | 2 (7.7%) | |
| Probably or Definitely Yes | 21 (75%) | 93 (89%) | 77 (86%) | 16 (62%) | |
| Undecided | 7 (25%) | 7 (6.7%) | 10 (11%) | 8 (31%) | |

¹ n (%); ² Fisher's exact test
table(df2$children)
1 2 3 4 5
53 122 29 35 6
df2 %>%
filter(children!=5) %>%
select(c(2,3,4,12,14)) %>%
tbl_summary(
by = children,
missing_text = "(Missing)")%>%
bold_labels() %>%
add_p()
There was an error in 'add_p()/add_difference()' for variable 'enfolded_postgrad_yn', p-value omitted:
Error in stats::fisher.test(c(4L, 3L, 2L, 3L, 3L, 1L, 4L, 3L, 3L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=4746204.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'private_academic', p-value omitted:
Error in stats::fisher.test(c(2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 1L, 2L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=6102566.
Try increasing the size of the workspace.
There was an error in 'add_p()/add_difference()' for variable 'fellowship_years', p-value omitted:
Error in stats::fisher.test(c(4L, 1L, 1L, 2L, 2L, 4L, 4L, 1L, 3L, 1L, : FEXACT error 6 (f5xact). LDKEY=621 is too small for this problem: kval=4023816.
Try increasing the size of the workspace.
| Characteristic | 1, N = 53¹ | 2, N = 122¹ | 3, N = 29¹ | 4, N = 35¹ | p-value² |
|---|---|---|---|---|---|
| **enfolded_postgrad_yn** | | | | | |
| 1 | 11 (21%) | 20 (16%) | 10 (34%) | 10 (29%) | |
| 2 | 27 (51%) | 72 (59%) | 11 (38%) | 7 (20%) | |
| 3 | 11 (21%) | 22 (18%) | 7 (24%) | 17 (49%) | |
| 4 | 4 (7.5%) | 8 (6.6%) | 1 (3.4%) | 1 (2.9%) | |
| **private_academic** | | | | | |
| 1 | 32 (60%) | 48 (39%) | 4 (14%) | 12 (36%) | |
| 2 | 7 (13%) | 36 (30%) | 16 (55%) | 9 (27%) | |
| 3 | 10 (19%) | 21 (17%) | 8 (28%) | 5 (15%) | |
| 4 | 4 (7.5%) | 17 (14%) | 0 (0%) | 7 (21%) | |
| 5 | 0 (0%) | 0 (0%) | 1 (3.4%) | 0 (0%) | |
| (Missing) | 0 | 0 | 0 | 2 | |
| **fellowship_years** | | | | | |
| 1 | 20 (38%) | 44 (36%) | 3 (10%) | 15 (43%) | |
| 2 | 26 (49%) | 60 (49%) | 17 (59%) | 8 (23%) | |
| 3 | 3 (5.7%) | 10 (8.2%) | 8 (28%) | 11 (31%) | |
| 4 | 4 (7.5%) | 8 (6.6%) | 1 (3.4%) | 1 (2.9%) | |
| **fellow_cat** | | | | | <0.001 |
| Probably or Definitely No | 4 (7.5%) | 6 (4.9%) | 0 (0%) | 0 (0%) | |
| Probably or Definitely Yes | 46 (87%) | 103 (84%) | 16 (55%) | 33 (94%) | |
| Undecided | 3 (5.7%) | 13 (11%) | 13 (45%) | 2 (5.7%) | |

¹ n (%); ² Fisher's exact test
t1 <- table(df2[c(7,13)])
mosaicplot(t1,
main = "Mosaic plot",
color = TRUE)
fisher.test(t1,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t1
p-value = 4e-06
alternative hypothesis: two.sided
#debt_labs <- c("$<100k","100-150k","150-200k","200-250k","250k+","NR")
#ggplot(as.data.frame(t1)) + geom_bar(color="black",aes(debt, Freq, fill = as.factor(pgy_cat)), position = "dodge", stat = "summary", fun = "mean")+gghisto+ggtitle("Debt amount by PGY")+ scale_fill_manual(name = "PGY",values=c("indianred3", "royalblue1","pink"))+theme(axis.text.x = element_text(face="bold", color="royalblue4", size=10))+ scale_x_discrete(labels= debt_labs)+ylab("Frequency")+xlab("Amount of Debt")
#ggplot(as.data.frame(t1)) + geom_bar(color="black",aes(pgy_cat, Freq, fill = as.factor(debt)), position = "dodge", stat = "summary", fun = "mean")+gghisto+ggtitle("Debt amount by PGY")+theme(axis.text.x = element_text(face="bold", color="royalblue4", size=10))+ylab("Frequency")+xlab("PGY")
t2 <- table(df2[13:14])
mosaicplot(t2,
main = "Mosaic plot",
color = TRUE)
fisher.test(t2)
Fisher's Exact Test for Count Data
data: t2
p-value = 0.4002
alternative hypothesis: two.sided
t3 <- table(df2[c(2,13)])
mosaicplot(t3,
main = "Mosaic plot",
color = TRUE
)
fisher.test(t3)
Fisher's Exact Test for Count Data
data: t3
p-value = 0.01268
alternative hypothesis: two.sided
t4 <- table(df2[c(3,13)])
mosaicplot(t4,
main = "Mosaic plot",
color = TRUE)
fisher.test(t4,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t4
p-value = 0.8098
alternative hypothesis: two.sided
t5 <- table(df2[c(4,13)])
mosaicplot(t5,
main = "Mosaic plot",
color = TRUE)
fisher.test(t5,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t5
p-value = 1.7e-05
alternative hypothesis: two.sided
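Because several of the tests above use `simulate.p.value=TRUE`, their p-values are Monte Carlo estimates and will vary slightly between runs unless the random seed is fixed. The sketch below illustrates this with the `enfolded_postgrad_yn` by `pgy_cat` counts copied from the first summary table. The helper `run_sim()` and the seed value are hypothetical choices for illustration. With enough replicates the simulated value should land close to the exact p-value reported for `t3`, although that is an expectation rather than a guarantee.

```r
# Counts copied from the enfolded_postgrad_yn by pgy_cat summary table
tab <- matrix(c( 8, 29, 19,  1,
                32, 74, 21, 10,
                11, 17, 17,  3),
              nrow = 3, byrow = TRUE,
              dimnames = list(pgy_cat = c("junior", "midlevel", "senior"),
                              enfolded_postgrad_yn = c("1", "2", "3", "4")))

# Hypothetical helper: fix the seed, then run the simulated Fisher test
run_sim <- function(tab, seed, B = 1e5) {
  set.seed(seed)
  fisher.test(tab, simulate.p.value = TRUE, B = B)$p.value
}

p1 <- run_sim(tab, seed = 2024)
p2 <- run_sim(tab, seed = 2024)
identical(p1, p2)  # same seed, same simulated p-value
```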
PGY x fellowship field … not sure
library(nnet)  # multinom()
df2$fellow_cat <- relevel(factor(df2$fellow_cat), ref = "Undecided")
test <- multinom(fellow_cat ~ pgy_cat + children + race + age + marital_status + gender, data = df2)
# weights: 27 (16 variable)
initial value 259.272500
iter 10 value 99.598694
iter 20 value 97.415597
final value 97.398446
converged
summary(test)
Call:
multinom(formula = fellow_cat ~ pgy_cat + children + race + age +
marital_status + gender, data = df2)
Coefficients:
(Intercept) pgy_catmidlevel pgy_catsenior children
Probably or Definitely No -3.211769 -0.0167406 0.4833854 -0.9744640
Probably or Definitely Yes 2.241397 -1.2453126 -0.7024426 -0.2232432
race age marital_status gender
Probably or Definitely No 1.0130676 0.51664386 -0.2418292 -0.5433382
Probably or Definitely Yes 0.8579868 0.08093294 -0.3847674 -0.6313251
Std. Errors:
(Intercept) pgy_catmidlevel pgy_catsenior children
Probably or Definitely No 3.614377 1.309333 1.4408315 0.5645340
Probably or Definitely Yes 1.983032 0.707117 0.8236907 0.3085286
race age marital_status gender
Probably or Definitely No 0.3705511 0.4956061 0.4419739 0.3375919
Probably or Definitely Yes 0.1766069 0.3000523 0.2001907 0.2036109
Residual Deviance: 194.7969
AIC: 226.7969
z <- summary(test)$coefficients/summary(test)$standard.errors
z
(Intercept) pgy_catmidlevel pgy_catsenior children
Probably or Definitely No -0.8886093 -0.01278559 0.3354906 -1.726139
Probably or Definitely Yes 1.1302882 -1.76111249 -0.8527989 -0.723574
race age marital_status gender
Probably or Definitely No 2.733949 1.0424487 -0.5471572 -1.609453
Probably or Definitely Yes 4.858172 0.2697294 -1.9220041 -3.100645
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) pgy_catmidlevel pgy_catsenior children
Probably or Definitely No 0.3742131 0.98979885 0.7372550 0.08432247
Probably or Definitely Yes 0.2583548 0.07821937 0.3937708 0.46932734
race age marital_status gender
Probably or Definitely No 6.257980e-03 0.2972037 0.58427073 0.107517382
Probably or Definitely Yes 1.184746e-06 0.7873684 0.05460524 0.001930997
coefs <- exp(coef(test))
The relative risk ratio for a one-unit increase in the variable age is about 1.68 (exp(0.5166)) for those who probably or definitely will not do a fellowship vs. those who are undecided.
The relative risk ratio for switching from pgy_cat = junior to senior is 1.6215547 (exp(0.4834)) for those who probably or definitely will not do a fellowship vs. those who are undecided.
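The relative risk ratios are the exponentiated `multinom()` coefficients, so they can be checked directly against the printed summary. A minimal worked check, using the two "Probably or Definitely No" coefficients copied from the output above (the vector name `b_no` is only for illustration):

```r
# Coefficients copied from the "Probably or Definitely No" row of summary(test)
b_no <- c(age = 0.51664386, pgy_catsenior = 0.4833854)

# Relative risk ratios versus the "Undecided" reference level
rrr_no <- exp(b_no)
round(rrr_no, 3)
```

Both values are above 1 because both coefficients are positive. Neither coefficient was statistically significant in the p-value matrix above.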
multinomial logistic regression
library(VGAM)  # vglm(), multinomial()
model <- vglm(fellow_cat ~ pgy_cat + children + race + age + marital_status + gender, multinomial(refLevel = "Undecided"), data = df2)
summary(model)
Call:
vglm(formula = fellow_cat ~ pgy_cat + children + race + age +
marital_status + gender, family = multinomial(refLevel = "Undecided"),
data = df2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -3.21177 3.61438 NA NA
(Intercept):2 2.24181 1.98302 1.131 0.25826
pgy_catmidlevel:1 -0.01689 1.30932 -0.013 0.98971
pgy_catmidlevel:2 -1.24539 0.70712 -1.761 0.07820 .
pgy_catsenior:1 0.48320 1.44082 0.335 0.73735
pgy_catsenior:2 -0.70254 0.82369 -0.853 0.39371
children:1 -0.97443 0.56452 -1.726 0.08432 .
children:2 -0.22328 0.30852 -0.724 0.46924
race:1 1.01309 0.37056 2.734 0.00626 **
race:2 0.85795 0.17660 4.858 1.19e-06 ***
age:1 0.51661 0.49560 1.042 0.29724
age:2 0.08089 0.30005 0.270 0.78749
marital_status:1 -0.24179 0.44197 -0.547 0.58433
marital_status:2 -0.38476 0.20019 -1.922 0.05461 .
gender:1 -0.54335 0.33759 -1.609 0.10751
gender:2 -0.63132 0.20361 -3.101 0.00193 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])
Residual deviance: 194.7969 on 456 degrees of freedom
Log-likelihood: -97.3984 on 456 degrees of freedom
Number of Fisher scoring iterations: 7
Warning: Hauck-Donner effect detected in the following estimate(s):
'(Intercept):1', 'race:2'
Reference group is level 1 of the response
library(MASS)  # polr(); once attached, MASS::select() masks dplyr::select()
model <- polr(fellow_cat ~ pgy_cat + children + race + age + marital_status + gender, data = df2)
summary(model)
Re-fitting to get Hessian
Call:
polr(formula = fellow_cat ~ pgy_cat + children + race + age +
marital_status + gender, data = df2)
Coefficients:
Value Std. Error t value
pgy_catmidlevel -1.1359 0.5928 -1.9162
pgy_catsenior -0.8621 0.6761 -1.2751
children 0.0333 0.2469 0.1349
race 0.6657 0.1390 4.7884
age -0.1024 0.2427 -0.4221
marital_status -0.3264 0.1746 -1.8696
gender -0.5463 0.1575 -3.4695
Intercepts:
Value Std. Error t value
Undecided|Probably or Definitely No -2.5466 1.6153 -1.5766
Probably or Definitely No|Probably or Definitely Yes -2.1049 1.6099 -1.3074
Residual Deviance: 210.5097
AIC: 228.5097
(21 observations deleted due to missingness)
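`polr()` treats the order of the outcome's factor levels as the ordinal ranking. The intercept labels above ("Undecided|Probably or Definitely No", then "Probably or Definitely No|Probably or Definitely Yes") indicate that, after the earlier `relevel()`, the order in use is Undecided < No < Yes. If the intended ranking is No < Undecided < Yes (an assumption, not something the analysis states), the levels can be set explicitly before fitting. A sketch on a made-up response vector:

```r
# Hypothetical responses, using the same three labels as fellow_cat
resp <- c("Undecided", "Probably or Definitely Yes",
          "Probably or Definitely No", "Probably or Definitely Yes")

# Spell out the ranking instead of relying on alphabetical or releveled order
fellow_ord <- factor(resp,
                     levels = c("Probably or Definitely No",
                                "Undecided",
                                "Probably or Definitely Yes"),
                     ordered = TRUE)
levels(fellow_ord)
```

Refitting with an outcome ordered this way would change the intercepts and possibly the coefficients, so the results above should not be read as if they came from that ordering.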
coefs <- coef(model);coefs
pgy_catmidlevel pgy_catsenior children race age
-1.13590828 -0.86207081 0.03330185 0.66571740 -0.10243469
marital_status gender
-0.32638039 -0.54627050
# fellowship category x predictive factors
# Find the p-value for a t-value of
pt(1.9162, 257-6, lower.tail=FALSE)*2 #MIDLEVEL NS
[1] 0.05647545
pt(3.4695, 257-6, lower.tail=FALSE)*2 #GENDER p=0.0006135578
[1] 0.0006135578
pt(4.7884, 257-6, lower.tail=FALSE)*2 #RACE p=2.875535e-06
[1] 2.875535e-06
What this means: Gender and race predict fellowship category
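The p-values above are computed one at a time with `pt()` and `257-6` degrees of freedom. Since the model dropped 21 observations for missingness, a common alternative is to use the normal approximation for every coefficient at once, which avoids choosing a degrees-of-freedom value. The sketch below shows that pattern on the `housing` example from the MASS documentation, not on the survey model, and the object names are illustrative.

```r
library(MASS)  # polr() and the housing example data

# Example fit from the MASS documentation (not the survey model)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

# Coefficient table with columns: Value, Std. Error, t value
ctable <- coef(summary(fit))

# Two-sided p-values from the standard normal distribution
p_norm <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
round(cbind(ctable, "p value" = p_norm), 4)
```

Applied to the survey models, the same three lines would give p-values for all predictors and intercepts in one table, so borderline terms are not overlooked.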
model <- polr(as.factor(enfolded_postgrad_yn) ~ pgy_cat + children + race + age + marital_status + gender, data = df2)
summary(model)
Re-fitting to get Hessian
Call:
polr(formula = as.factor(enfolded_postgrad_yn) ~ pgy_cat + children +
race + age + marital_status + gender, data = df2)
Coefficients:
Value Std. Error t value
pgy_catmidlevel -0.484347 0.2954 -1.63985
pgy_catsenior -0.065131 0.3940 -0.16532
children 0.017191 0.1338 0.12847
race -0.185734 0.1054 -1.76260
age 0.235912 0.1599 1.47509
marital_status 0.013933 0.1188 0.11730
gender -0.003618 0.1207 -0.02997
Intercepts:
Value Std. Error t value
1|2 -1.6049 0.9866 -1.6266
2|3 0.6577 0.9826 0.6694
3|4 2.6136 1.0123 2.5819
Residual Deviance: 549.051
AIC: 569.051
(21 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel pgy_catsenior children race age
-0.484346536 -0.065131234 0.017190830 -0.185734316 0.235911807
marital_status gender
0.013933108 -0.003617707
# enfolded or postgrad x predictive factors
# no factors have a p-value less than 0.05
What this means: No predictive factors for enfolded or postgrad
model <- polr(as.factor(private_academic) ~ pgy_cat + children + race + age + marital_status + gender, data = df2)
summary(model)
Re-fitting to get Hessian
Call:
polr(formula = as.factor(private_academic) ~ pgy_cat + children +
race + age + marital_status + gender, data = df2)
Coefficients:
Value Std. Error t value
pgy_catmidlevel -0.057058 0.29736 -0.19188
pgy_catsenior -0.123041 0.39127 -0.31447
children 0.228087 0.13665 1.66911
race -0.184891 0.09628 -1.92032
age 0.239449 0.14672 1.63202
marital_status -0.008478 0.11490 -0.07378
gender 0.230479 0.11062 2.08349
Intercepts:
Value Std. Error t value
1|2 0.5794 0.9648 0.6005
2|3 1.8490 0.9740 1.8983
3|4 2.9927 0.9814 3.0493
4|5 6.4829 1.3852 4.6801
Residual Deviance: 592.7608
AIC: 614.7608
(23 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel pgy_catsenior children race age
-0.057058258 -0.123041128 0.228087164 -0.184891266 0.239449274
marital_status gender
-0.008477634 0.230478619
# private or academic x predictive factors
# Find the p-value for a t-value of
pt(2.08349, 257-6, lower.tail=FALSE)*2 #GENDER p=0.03821857
[1] 0.03821857
What this means: Gender predicts private or academic institution
model <- polr(as.factor(fellowship_years) ~ pgy_cat + children + race + age + marital_status + gender, data = df2)
summary(model)
Re-fitting to get Hessian
Call:
polr(formula = as.factor(fellowship_years) ~ pgy_cat + children +
race + age + marital_status + gender, data = df2)
Coefficients:
Value Std. Error t value
pgy_catmidlevel -0.21862 0.2983 -0.7329
pgy_catsenior -1.25900 0.4169 -3.0196
children 0.19577 0.1388 1.4101
race -0.28850 0.1073 -2.6889
age -0.03658 0.1617 -0.2262
marital_status 0.23509 0.1185 1.9844
gender 0.14019 0.1411 0.9934
Intercepts:
Value Std. Error t value
1|2 -1.1393 0.9906 -1.1502
2|3 1.2123 0.9920 1.2221
3|4 2.5758 1.0214 2.5218
Residual Deviance: 514.4281
AIC: 534.4281
(21 observations deleted due to missingness)
coefs <- coef(model);coefs
pgy_catmidlevel pgy_catsenior children race age
-0.21862326 -1.25900180 0.19577164 -0.28849950 -0.03658487
marital_status gender
0.23508709 0.14019127
# fellowship years x predictive factors
# Find the p-value for a t-value of
pt(3.0196, 257-6, lower.tail=FALSE)*2 #pgy_catsenior p=0.002792211
[1] 0.002792211
pt(2.6889, 257-6, lower.tail=FALSE)*2 #RACE 0.007649045
[1] 0.007649045
pt(1.9844, 257-6, lower.tail=FALSE)*2 #MARITAL STATUS p=0.04830027
[1] 0.04830027
What this means: Being a senior resident (6th or 7th year), race, and marital status predict number of fellowship years.
Below: examining marital status x fellowship years (4, 11)
t <- table(df2[c(4,11)]);t
marital_status
fellowship_years 1 2 3 4 6 7
1 25 9 46 0 2 0
2 14 25 61 11 2 1
3 9 5 7 5 4 5
4 3 0 10 1 0 0
mosaicplot(t,color = TRUE)
fisher.test(t,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t
p-value = 1e-06
alternative hypothesis: two.sided
t <- table(df2[c(7,2)])
mosaicplot(t,color = TRUE)
fisher.test(t,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t
p-value = 1e-06
alternative hypothesis: two.sided
t <- table(df2[c(7,3)])
mosaicplot(t,color = TRUE)
fisher.test(t,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t
p-value = 1e-06
alternative hypothesis: two.sided
t <- table(df2[c(7,4)])
mosaicplot(t,color = TRUE)
fisher.test(t,simulate.p.value=TRUE,B=1e6)
Fisher's Exact Test for Count Data with simulated p-value (based on
1e+06 replicates)
data: t
p-value = 1e-06
alternative hypothesis: two.sided