── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.1 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: colorspace
Loading required package: grid
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
(status 2 uses the sf package in place of rgdal)
VIM is ready to use.
Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
Attaching package: 'VIM'
The following object is masked from 'package:datasets':
sleep
DSU EDA Analysis
[1] "C:/GitLab Repository/inquisitiveimputers/R code"
# A tibble: 5,000 × 121
patientuid gender race hispanic dob outcome tract county_fips zipcode
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
2 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
3 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
4 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
5 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
6 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
7 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
8 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
9 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
10 3a336c26-5e1c-… M white unknown 2020… 1 3110… 31109 68521
# ℹ 4,990 more rows
# ℹ 112 more variables: USPS_ZIP_PREF_STATE <chr>, weight_t <dbl>,
# weight_c <dbl>, missing_geography <chr>, stcnty_c <chr>,
# rpl_theme1_c <dbl>, rpl_theme2_c <dbl>, rpl_theme3_c <dbl>,
# rpl_theme4_c <dbl>, rpl_themes_c <dbl>, area_sqmi_c <dbl>,
# e_totpop_c <dbl>, d_pop_c <dbl>, st_abbr_t <chr>, rpl_theme1_t <dbl>,
# rpl_theme2_t <dbl>, rpl_theme3_t <dbl>, rpl_theme4_t <dbl>, …
# A tibble: 6 × 121
patientuid gender race hispanic dob outcome tract county_fips zipcode
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 e97d1934-0fb4-4… M white unknown 2018… 1 2803… 28033 38632
2 E0051A0F-CE1D-4… M unkn… unknown 2017… 1 3401… 34013 07003
3 FB2CFA12-730B-4… M unkn… unknown 2018… 1 0801… 08013 80504
4 71de8492-8fb7-4… M unkn… not his… 2021… 0 1800… 18003 46845
5 58cdf738-8122-4… F unkn… unknown 2018… 0 3004… 30047 59864
6 0196aadf-b7d6-4… M white unknown 2019… 1 1207… 12073 32317
# ℹ 112 more variables: USPS_ZIP_PREF_STATE <chr>, weight_t <dbl>,
# weight_c <dbl>, missing_geography <chr>, stcnty_c <chr>,
# rpl_theme1_c <dbl>, rpl_theme2_c <dbl>, rpl_theme3_c <dbl>,
# rpl_theme4_c <dbl>, rpl_themes_c <dbl>, area_sqmi_c <dbl>,
# e_totpop_c <dbl>, d_pop_c <dbl>, st_abbr_t <chr>, rpl_theme1_t <dbl>,
# rpl_theme2_t <dbl>, rpl_theme3_t <dbl>, rpl_theme4_t <dbl>,
# rpl_themes_t <dbl>, area_sqmi_t <dbl>, e_totpop_t <dbl>, d_pop_t <dbl>, …
Explore Missing Data
[1] 566106
[1] 210888
Missing Data Table by Screening
# Earlier attempt at value labels, kept for reference:
#df_all <- df_all %>%
#  mutate_at(vars(gender),
#            ~labelled(., labels = c(Male = "M", Female = "F", `Other/Unknown` = "Missing")))
#print(val_labels(df_all$gender))

# Build the analysis frame from df_ABFM in one pass. The original chunk
# reassigned df_all and Screen several times; only the final values are kept.
df_all <- df_ABFM %>%
  mutate(
    # Treat the literal string "Missing" as NA, then flag missing race (1 = missing)
    race = if_else(race == "Missing", NA_character_, race),
    racenew = if_else(is.na(race), 1, 0),
    # Screen: outcome as a factor with levels "0" and "1" (relabelled later)
    Screen = ifelse(as.character(outcome) == "1", 1, 0) %>% as.factor(),
    # tract_na: 1 when the census tract is missing, 0 otherwise
    tract_na = ifelse(is.na(tract), 1, 0) %>% as.factor()
  )
head(df_all)
# A tibble: 6 × 124
patientuid gender race hispanic dob outcome tract county_fips zipcode
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3a336c26-5e1c-4… M white unknown 2020… 1 3110… 31109 68521
2 3a336c26-5e1c-4… M white unknown 2020… 1 3110… 31109 68521
3 3a336c26-5e1c-4… M white unknown 2020… 1 3110… 31109 68521
4 3a336c26-5e1c-4… M white unknown 2020… 1 3110… 31109 68521
5 3a336c26-5e1c-4… M white unknown 2020… 1 3110… 31109 68521
6 3a336c26-5e1c-4… M white unknown 2020… 1 3110… 31109 68521
# ℹ 115 more variables: USPS_ZIP_PREF_STATE <chr>, weight_t <dbl>,
# weight_c <dbl>, missing_geography <chr>, stcnty_c <chr>,
# rpl_theme1_c <dbl>, rpl_theme2_c <dbl>, rpl_theme3_c <dbl>,
# rpl_theme4_c <dbl>, rpl_themes_c <dbl>, area_sqmi_c <dbl>,
# e_totpop_c <dbl>, d_pop_c <dbl>, st_abbr_t <chr>, rpl_theme1_t <dbl>,
# rpl_theme2_t <dbl>, rpl_theme3_t <dbl>, rpl_theme4_t <dbl>,
# rpl_themes_t <dbl>, area_sqmi_t <dbl>, e_totpop_t <dbl>, d_pop_t <dbl>, …
df_all <- df_all %>%
  mutate(gender = ifelse(gender == "Other/Unknown", NA, gender)) %>%
  mutate(gender = recode(gender, "M" = "Male", "F" = "Female")) %>%
  mutate(race = recode(race, "american indian or alaska native" = "AIAN", "asian" = "Asian",
                       "black or african american" = "Black", "multiple races" = "Multiple",
                       "native hawaiian or other pacific islander" = "NHOPI", "unknown" = "Missing",
                       "white" = "White")) %>%
  mutate(hispanic = recode(hispanic, "hispanic or latino" = "Yes", "not hispanic or latino" = "No",
                           "unknown" = "Missing")) %>%
  mutate(tract_na = recode(tract_na, "0" = "Yes", "1" = "Missing")) %>%
  mutate(Screen = recode(Screen, "0" = "No", "1" = "Yes")) %>%
  mutate(gender = fct_relevel(gender, "Female", "Male")) %>%
  mutate(race = fct_relevel(race, "AIAN", "Asian", "Black", "NHOPI", "White", "Multiple", "Missing")) %>%
  mutate(hispanic = fct_relevel(hispanic, "No", "Yes", "Missing")) %>%
  mutate(tract_na = fct_relevel(tract_na, "Yes", "Missing")) %>%
  mutate(Screen = fct_relevel(Screen, "No", "Yes"))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `tract_na = fct_relevel(tract_na, "Yes", "Missing")`.
Caused by warning:
! 1 unknown level in `f`: Missing
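The warning above appears to arise because `tract_na` has no "Missing" level at this point: only "Yes" exists after recoding, so `fct_relevel()` is asked to place a level it cannot find. A minimal sketch of one way to avoid it, assuming it is acceptable to carry an empty "Missing" level; the `tract_flag` vector is a toy stand-in, not the real column:

```r
library(forcats)

# Toy factor standing in for tract_na when no tract is missing:
# only the "Yes" level exists.
tract_flag <- factor(c("Yes", "Yes", "Yes"))

# Add the absent level first, then fix the order. fct_expand() leaves
# the existing values untouched and appends the new level.
tract_flag <- fct_expand(tract_flag, "Missing")
tract_flag <- fct_relevel(tract_flag, "Yes", "Missing")

levels(tract_flag)
```

Whether an empty level should be shown in a summary table is a separate choice; dropping the `fct_relevel()` call for this variable would also silence the warning.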
df_all$scrn <- df_all$outcome
var_label(df_all) <- list(
  gender = "Gender",
  race = "Race",
  hispanic = "Hispanic",
  tract_na = "Census Tract",
  Screen = "Screen Test"
)
# NOTE: the n values in modify_header() are hard-coded and do not match the
# column totals in the rendered table (533,657 and 237,518).
table1shell <- df_all %>%
  select(gender, race, hispanic, tract_na, rpl_themes_t, z_SE_nat_t, scrn) %>%
  tbl_summary(by = scrn) %>%
  add_overall() %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Developmental Screening**") %>%
  modify_header(stat_1 = "**No**, n = 153,373", stat_2 = "**Yes**, n = 57,573") %>%
  bold_labels() %>%
  add_p()
table1shell <- modify_caption(table1shell, caption = "**Example of Table 1 for Descriptive Statistics**")
table1shell

| Characteristic | Overall, N = 771,175¹ | Developmental Screening: No, n = 153,373¹ | Developmental Screening: Yes, n = 57,573¹ | p-value² |
|---|---|---|---|---|
| Gender | | | | <0.001 |
| Female | 372,079 (48%) | 255,971 (48%) | 116,108 (49%) | |
| Male | 398,424 (52%) | 277,157 (52%) | 121,267 (51%) | |
| Unknown | 672 | 529 | 143 | |
| Race | | | | <0.001 |
| AIAN | 18,488 (2.4%) | 17,501 (3.3%) | 987 (0.4%) | |
| Asian | 11,829 (1.5%) | 8,412 (1.6%) | 3,417 (1.4%) | |
| Black | 55,166 (7.2%) | 37,943 (7.1%) | 17,223 (7.3%) | |
| NHOPI | 1,468 (0.2%) | 1,110 (0.2%) | 358 (0.2%) | |
| White | 365,921 (47%) | 249,513 (47%) | 116,408 (49%) | |
| Multiple | 3,534 (0.5%) | 1,625 (0.3%) | 1,909 (0.8%) | |
| Missing | 314,769 (41%) | 217,553 (41%) | 97,216 (41%) | |
| Hispanic | | | | <0.001 |
| No | 332,933 (43%) | 240,293 (45%) | 92,640 (39%) | |
| Yes | 144,082 (19%) | 89,307 (17%) | 54,775 (23%) | |
| Missing | 294,160 (38%) | 204,057 (38%) | 90,103 (38%) | |
| Census Tract | | | | |
| Yes | 771,175 (100%) | 533,657 (100%) | 237,518 (100%) | |
| rpl_themes_t | 0.52 (0.27, 0.76) | 0.53 (0.28, 0.76) | 0.49 (0.25, 0.76) | <0.001 |
| Unknown | 4,295 | 2,950 | 1,345 | |
| z_SE_nat_t | 0.03 (-0.09, 0.16) | 0.02 (-0.10, 0.15) | 0.04 (-0.08, 0.18) | <0.001 |
| Unknown | 205,069 | 137,123 | 67,946 | |

¹ n (%); Median (IQR)
² Pearson’s Chi-squared test; Wilcoxon rank sum test
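The `modify_header()` call for this table types the group sizes in by hand, and they differ from the column totals in the rendered table (533,657 and 237,518 in the Census Tract row). A minimal sketch of one alternative, assuming gtsummary's `{level}` and `{n}` header placeholders, and using the package's built-in `trial` data rather than `df_all`:

```r
library(gtsummary)
library(dplyr)

# `trial` ships with gtsummary; it stands in for df_all here.
tbl <- trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt) %>%
  add_overall() %>%
  # Fill each by-group header from the data instead of typing counts.
  # stat_0 = FALSE leaves the Overall column header as add_overall() set it.
  modify_header(all_stat_cols(stat_0 = FALSE) ~ "**{level}**, n = {n}") %>%
  bold_labels()

tbl
```

With headers computed this way, the counts follow whatever data frame is passed in, so they cannot drift from the table body when the input changes.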
#table1shell <- modify_caption(table1shell, "<div style='text-align: left; font-weight: bold; color: grey'> Table 1. Patient Characteristics</div>")
#table1shell
#save.image(file='myEnvironment.RData')
save(table1shell, file = "C:\\GitLab Repository\\inquisitiveimputers\\Documents\\Results\\EDA\\Table1Desc.Rdata")
#names(df_all)
head(df_all, 500)
# A tibble: 500 × 125
patientuid gender race hispanic dob outcome tract county_fips zipcode
<chr> <fct> <fct> <fct> <chr> <chr> <chr> <chr> <chr>
1 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
2 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
3 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
4 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
5 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
6 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
7 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
8 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
9 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
10 3a336c26-5e1c-… Male White Missing 2020… 1 3110… 31109 68521
# ℹ 490 more rows
# ℹ 116 more variables: USPS_ZIP_PREF_STATE <chr>, weight_t <dbl>,
# weight_c <dbl>, missing_geography <chr>, stcnty_c <chr>,
# rpl_theme1_c <dbl>, rpl_theme2_c <dbl>, rpl_theme3_c <dbl>,
# rpl_theme4_c <dbl>, rpl_themes_c <dbl>, area_sqmi_c <dbl>,
# e_totpop_c <dbl>, d_pop_c <dbl>, st_abbr_t <chr>, rpl_theme1_t <dbl>,
# rpl_theme2_t <dbl>, rpl_theme3_t <dbl>, rpl_theme4_t <dbl>, …
# Summary table with chi-square test
df_all2 <- df_all %>%
  mutate(race = if_else(race == "Missing", NA_character_, race)) %>%
  mutate(racenew = if_else(is.na(race), 1, 0)) %>%
  distinct(patientuid, .keep_all = TRUE)
missingtableshell <- df_all2 %>%
  select(Screen, gender, racenew, hispanic, rpl_themes_t,
         acs_avg_hh_size_c, acs_pct_foreign_born_t) %>%
  tbl_summary(
    by = racenew,
    type = list(Screen ~ "categorical"),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 2
  ) %>%
  add_overall() %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Missing Race**") %>%
  modify_header(stat_1 = "**No**, n = 140,909", stat_2 = "**Yes**, n = 70,037") %>%
  bold_labels() %>%
  add_p(test = list(
    all_categorical() ~ "chisq.test",
    all_continuous() ~ "t.test"
  ))
missingtableshell <- modify_caption(missingtableshell, caption = "**Example of Table 2 for Missing Race Descriptive Statistics**")
missingtableshell

| Characteristic | Overall, N = 210,888¹ | Missing Race: No, n = 140,909¹ | Missing Race: Yes, n = 70,037¹ | p-value² |
|---|---|---|---|---|
| Screen Test | | | | <0.001 |
| No | 153,319 (73%) | 99,681 (71%) | 53,638 (77%) | |
| Yes | 57,569 (27%) | 41,187 (29%) | 16,382 (23%) | |
| Gender | | | | 0.003 |
| Female | 101,830 (48%) | 68,378 (49%) | 33,452 (48%) | |
| Male | 108,873 (52%) | 72,438 (51%) | 36,435 (52%) | |
| Unknown | 185 | 52 | 133 | |
| Hispanic | | | | <0.001 |
| No | 104,275 (49%) | 96,314 (68%) | 7,961 (11%) | |
| Yes | 33,568 (16%) | 18,251 (13%) | 15,317 (22%) | |
| Missing | 73,045 (35%) | 26,303 (19%) | 46,742 (67%) | |
| rpl_themes_t | 0.54 (0.27) | 0.54 (0.26) | 0.55 (0.28) | <0.001 |
| Unknown | 22 | 8 | 14 | |
| acs_avg_hh_size_c | 2.58 (0.27) | 2.56 (0.25) | 2.62 (0.30) | <0.001 |
| acs_pct_foreign_born_t | 9.59 (11.94) | 8.63 (11.19) | 11.53 (13.12) | <0.001 |
| Unknown | 13 | 2 | 11 | |

¹ n (%); Mean (SD)
² Pearson’s Chi-squared test; Welch Two Sample t-test
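The Missing Race table is built on one row per patient, with `racenew` flagging patients whose race is missing. A small sketch of that de-duplication and flagging step on toy data; the `visits` tibble and its values are invented for illustration and are not drawn from `df_all`:

```r
library(dplyr)

# Hypothetical visit-level data: patients "a" and "c" appear twice.
visits <- tibble(
  patientuid = c("a", "a", "b", "c", "c"),
  race       = c("White", "White", "Missing", "Asian", "Asian")
)

patients <- visits %>%
  # Turn the "Missing" label into a real NA
  mutate(race = if_else(race == "Missing", NA_character_, race)) %>%
  # 1 = race missing, 0 = race recorded
  mutate(racenew = if_else(is.na(race), 1, 0)) %>%
  # Keep the first row seen for each patient
  distinct(patientuid, .keep_all = TRUE)

patients
```

Note that `distinct()` keeps the first row per `patientuid`, so if a patient's race differed across rows, the result would depend on row order.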
table2_m_r_acs <- df_all2 %>%
  select(racenew, acs_avg_hh_size_t,
         acs_pct_child_disab_t,
         acs_pct_ctz_naturalized_t, acs_pct_ctz_nonus_born_t,
         acs_pct_ctz_us_born_t, acs_pct_foreign_born_t,
         acs_pct_non_citizen_t, acs_pct_api_lang_t, acs_pct_english_t,
         acs_pct_spanish_t, acs_pct_hh_no_internet_t,
         acs_pct_child_1fam_t, acs_pct_children_grandparent_t,
         acs_pct_hh_kid_1prnt_t, acs_pct_not_labor_t,
         acs_pct_unemploy_t, acs_gini_index_t, acs_median_hh_inc_t,
         acs_pct_health_inc_below137_t, acs_pct_inc50_t,
         acs_pct_hh_food_stmp_t, acs_pct_bachelor_dgr_t,
         acs_pct_owner_hu_t, acs_pct_vacant_hu_t, acs_pct_hu_no_veh_t,
         acs_pct_medicaid_any_below64_t, acs_pct_uninsured_below64_t) %>%
  tbl_summary(
    by = racenew,
    #type = list(Screen ~ "categorical"),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 2
  ) %>%
  add_overall() %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Missing Race**") %>%
  modify_header(stat_1 = "**No**, n = 140,909", stat_2 = "**Yes**, n = 70,037") %>%
  bold_labels() %>%
  add_p(test = list(
    all_categorical() ~ "chisq.test",
    all_continuous() ~ "t.test"
  ))
table2_m_r_acs <- modify_caption(table2_m_r_acs, caption = "**Example of Table 2 for Missing Race Descriptive Statistics**")
table2_m_r_acs

| Characteristic | Overall, N = 210,888¹ | Missing Race: No, n = 140,909¹ | Missing Race: Yes, n = 70,037¹ | p-value² |
|---|---|---|---|---|
| acs_avg_hh_size_t | 2.66 (0.50) | 2.62 (0.46) | 2.72 (0.58) | <0.001 |
| Unknown | 23 | 9 | 14 | |
| acs_pct_child_disab_t | 4.83 (4.49) | 4.97 (4.60) | 4.55 (4.23) | <0.001 |
| Unknown | 72 | 44 | 28 | |
| acs_pct_ctz_naturalized_t | 3.83 (5.50) | 3.54 (5.38) | 4.41 (5.69) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_ctz_nonus_born_t | 4.59 (5.86) | 4.30 (5.75) | 5.16 (6.02) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_ctz_us_born_t | 90.41 (11.94) | 91.37 (11.19) | 88.47 (13.12) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_foreign_born_t | 9.59 (11.94) | 8.63 (11.19) | 11.53 (13.12) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_non_citizen_t | 5.01 (7.50) | 4.33 (6.62) | 6.37 (8.88) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_api_lang_t | 1.56 (3.64) | 1.45 (3.59) | 1.79 (3.73) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_english_t | 82.63 (22.34) | 84.48 (20.85) | 78.91 (24.67) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_spanish_t | 13.20 (20.88) | 11.58 (19.21) | 16.45 (23.55) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_hh_no_internet_t | 15.88 (10.10) | 16.39 (10.22) | 14.85 (9.78) | <0.001 |
| Unknown | 21 | 8 | 13 | |
| acs_pct_child_1fam_t | 30.63 (18.41) | 30.64 (18.86) | 30.61 (17.47) | 0.7 |
| Unknown | 154 | 94 | 60 | |
| acs_pct_children_grandparent_t | 8.76 (8.06) | 8.90 (8.16) | 8.50 (7.85) | <0.001 |
| Unknown | 71 | 43 | 28 | |
| acs_pct_hh_kid_1prnt_t | 17.11 (8.87) | 16.98 (8.87) | 17.39 (8.87) | <0.001 |
| Unknown | 21 | 8 | 13 | |
| acs_pct_not_labor_t | 38.13 (9.84) | 38.79 (9.91) | 36.82 (9.55) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_unemploy_t | 5.15 (3.94) | 5.21 (4.04) | 5.02 (3.73) | <0.001 |
| Unknown | 20 | 6 | 14 | |
| acs_gini_index_t | 0.42 (0.06) | 0.42 (0.06) | 0.42 (0.06) | <0.001 |
| Unknown | 32 | 15 | 17 | |
| acs_median_hh_inc_t | 59,785.26 (23,425.52) | 59,061.72 (23,206.82) | 61,239.93 (23,792.82) | <0.001 |
| Unknown | 354 | 268 | 86 | |
| acs_pct_health_inc_below137_t | 21.99 (12.57) | 21.95 (12.31) | 22.07 (13.07) | 0.054 |
| Unknown | 19 | 7 | 12 | |
| acs_pct_inc50_t | 6.00 (5.18) | 5.98 (5.07) | 6.03 (5.40) | 0.029 |
| Unknown | 19 | 7 | 12 | |
| acs_pct_hh_food_stmp_t | 12.32 (10.10) | 12.43 (10.20) | 12.11 (9.89) | <0.001 |
| Unknown | 21 | 8 | 13 | |
| acs_pct_bachelor_dgr_t | 16.11 (8.70) | 15.97 (8.57) | 16.39 (8.97) | <0.001 |
| Unknown | 13 | 2 | 11 | |
| acs_pct_owner_hu_t | 67.40 (19.75) | 68.13 (19.38) | 65.91 (20.39) | <0.001 |
| Unknown | 21 | 8 | 13 | |
| acs_pct_vacant_hu_t | 12.59 (9.34) | 13.14 (9.51) | 11.50 (8.90) | <0.001 |
| Unknown | 21 | 8 | 13 | |
| acs_pct_hu_no_veh_t | 6.07 (7.04) | 6.21 (7.22) | 5.79 (6.64) | <0.001 |
| Unknown | 21 | 8 | 13 | |
| acs_pct_medicaid_any_below64_t | 20.30 (13.15) | 20.40 (13.00) | 20.10 (13.45) | <0.001 |
| Unknown | 17 | 5 | 12 | |
| acs_pct_uninsured_below64_t | 12.00 (8.38) | 11.92 (8.39) | 12.18 (8.37) | <0.001 |
| Unknown | 17 | 5 | 12 | |

¹ Mean (SD)
² Welch Two Sample t-test
#table1shell <- modify_caption(table1shell, "<div style='text-align: left; font-weight: bold; color: grey'> Table 1. Patient Characteristics</div>")
#table1shell
#save.image(file='myEnvironment.RData')
save(table2_m_r_acs, file = "C:\\GitLab Repository\\inquisitiveimputers\\Documents\\Analysis Plan\\Table2MissingRaceACST.Rdata")
#names(df_all)