NSCH_2022 <- read.csv("data/NSCH_2022e_Topical_SPSS_CAHMI_DRCv3.csv")
n=54,103 with 923 variables (raw 2022 NSCH data)
geo <- read.csv("data/NSCH_2022_Special_Geographies.csv")
n=54,103 with 5 variables
NSCH_2022_geo <- left_join(NSCH_2022, geo, by ="HHID")
n=54,103 with 927 variables (left join with geographic data)
NSCH_2022_subset <- NSCH_2022_geo |> select(HHID, FIPSST, NY_SGREGION, STRATUM, FWC, CSHCN_22,
npm11MHnonCSHCN_22, npm11MHCSHCN_22, MedHome_22,
instype_22, povlev4_22, AdultEduc_22, raceASIA_22, nom19ChHSt_22, sex_22, age3_22)
n=54,103 with 16 variables (restrict to variables of interest)
#Limit dataset to only NYS (FIPSST = 36)
NSCH_2022_subset_NY <- subset(NSCH_2022_subset, FIPSST == 36)
n=4,576 with 16 variables (subset to NYS)
NSCH_2022_subset_NY_clean <- NSCH_2022_subset_NY |>
mutate(CSHCN_22 = ifelse(CSHCN_22 == 1, 1, 0)) |>
mutate(MedHome_22 = na_if(MedHome_22,99)) |>
mutate(MedHome_22 = ifelse(MedHome_22 == 1, 1, 0)) |>
mutate(NYC_Metro = ifelse(NY_SGREGION == 8, 0, 1))
table(NSCH_2022_subset_NY_clean$MedHome_22, useNA = "ifany")
##
## 0 1 <NA>
## 2368 2197 11
table(NSCH_2022_subset_NY_clean$CSHCN_22, useNA = "ifany")
##
## 0 1
## 3537 1039
table(NSCH_2022_subset_NY_clean$NYC_Metro, useNA = "ifany")
##
## 0 1
## 2404 2172
#Convert to factor variables and create NYC_Metro binary variable
NSCH_2022_subset_NY_clean <- NSCH_2022_subset_NY_clean |>
mutate(
CSHCN_22 = factor(ifelse(CSHCN_22 == 1, "CSHCN", "Non-CSHCN")),
MedHome_22 = factor(ifelse(MedHome_22 == 1, "Has a medical home", "Does not have a medical home")),
NYC_Metro = factor(ifelse(NYC_Metro == 1, "NYC metro", "Rest of state")),
raceASIA_22 = factor(raceASIA_22, levels=c(1,2,3,4,5), labels=c("Hispanic", "White, non-Hispanic", "Black, non-Hispanic", "Asian, non-Hispanic", "Multi-racial, non-Hispanic")),
AdultEduc_22 = factor(AdultEduc_22, levels=c(1,2,3,4), labels=c("Less than high school", "High school or GED", "Some college or technical school", "College degree or higher")),
povlev4_22 = factor(povlev4_22, levels=c(1,2,3,4), labels=c("0-99% FPL", "100-199% FPL", "200-399% FPL", "400% FPL or greater")),
instype_22 = factor(instype_22, levels=c(1,2,3,4), labels=c("Public health insurance only", "Private health insurance only", "Public and private insurance", "Uninsured")),
nom19ChHSt_22 = factor(nom19ChHSt_22, levels=c(1,2,3), labels=c("Excellent or very good", "Good", "Fair or poor")),
sex_22 = factor(sex_22, levels=c(1,2), labels=c("Male", "Female")),
age3_22 = factor(age3_22, levels=c(1,2,3), labels=c("0-5 years old", "6-11 years old", "12-17 years old"))
)
n=4,576 with 17 variables (addition of NYC_Metro variable)
mean(is.na(NSCH_2022_subset_NY_clean$MedHome_22)) * 100 # 0.24%
## [1] 0.2403846
mean(is.na(NSCH_2022_subset_NY_clean$NYC_Metro)) * 100 # 0.00%
## [1] 0
mean(is.na(NSCH_2022_subset_NY_clean$instype_22)) * 100 # 2.93%
## [1] 2.928322
mean(is.na(NSCH_2022_subset_NY_clean$raceASIA_22)) * 100 # 0.00%
## [1] 0
NSCH_2022_subset_NY_clean_drop <- NSCH_2022_subset_NY_clean |> drop_na()
n=4,430 with 17 variables after listwise deletion
Some variables (medical home, insurance type, and overall health status) contained missing data. Prior to listwise deletion, the dataset contained 4,576 observations. I identified that 146 (3.2%) of these observations included missing data to at least one key variable. I performed a listwise deletion to remove all 146 rows with missing data.
The analytical sample flow is as follows: 1) n=54,103 with 923 variables (raw 2022 NSCH data) 2) n=54,103 with 927 variables (left join with geographic data) 3) n=54,103 with 16 variables (restrict to variables of interest) 4) n=4,576 with 16 variables (subset to NYS) 5) n=4,576 with 17 variables (addition of NYC_Metro variable) 6) n=4,430 with 17 variables (listwise deletion)
The complete 2022 NSCH dataset consists of 54,103 observations. This dataset is then merged with the special geographies dataset by participant identification number (HHID). The dataset is then restricted to only include variables of interest and data for New York State (NYS), reducing the data to 4,576 observations. Exactly 146 observations included at least one missing answer to the restricted variable list. Since the rate of missing data points is small, I performed a listwise deletion to remove all rows with missing data. The NSCH released a report describing the negligible impact of missing data on their survey results. Finally, a “NYC_Metro” variable was created to compare rest of state (ROS) with the New York City metropolitan area (NYC). The final analytical sample size is 4,430 with 17 variables. This analytical sample has changed from my original proposal. This is due to the additions of age and sex variables, as well as the removal of unnecessary medical home component variables not used in the analysis. In addition, language and nativity were removed as this was not seen in similar studies and created worse fitting models.
NSCH_2022_subset_NY_clean_drop$instype_22 <- relevel(NSCH_2022_subset_NY_clean_drop$instype_22, ref = "Private health insurance only")
NSCH_2022_subset_NY_clean_drop$nom19ChHSt_22 <- relevel(NSCH_2022_subset_NY_clean_drop$nom19ChHSt_22, ref = "Fair or poor")
NSCH_2022_subset_NY_clean_drop$raceASIA_22 <- relevel(NSCH_2022_subset_NY_clean_drop$raceASIA_22, ref = "White, non-Hispanic")
NSCH_2022_subset_NY_clean_drop$sex_22 <- relevel(NSCH_2022_subset_NY_clean_drop$sex_22, ref = "Female")
NSCH_2022_subset_NY_clean_drop$CSHCN_22 <- relevel(NSCH_2022_subset_NY_clean_drop$CSHCN_22, ref = "Non-CSHCN")
NSCH_2022_subset_NY_clean_drop$NYC_Metro <- relevel(NSCH_2022_subset_NY_clean_drop$NYC_Metro, ref = "Rest of state")
Relevel all categorical predictors with more than two predictors. The general criteria was to use the most common option for nominal variables and the lowest level for ordinal variables.
survey <- svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC, data = NSCH_2022_subset_NY_clean_drop)
#Determine dummy coding and sample size breakdown for outcome and exposure
summary(NSCH_2022_subset_NY_clean_drop$MedHome_22)
## Does not have a medical home Has a medical home
## 2283 2147
summary(NSCH_2022_subset_NY_clean_drop$NYC_Metro)
## Rest of state NYC metro
## 2342 2088
Create a survey weighting strategy based on the NSCH documentation. The NSCH aims to be representative of the entire population of NYS children. In order to accomplish this a special weighting must be applied to the dataset. The weighting relies on the “fwc” variable, which defines the weighting for each state.
svymean(~MedHome_22, by = ~NYC_Metro, survey, na.rm = TRUE) * 100
## mean SE
## MedHome_22Does not have a medical home 54.852 0.0102
## MedHome_22Has a medical home 45.148 0.0102
Validate code by comparing with publicly available statistics on dataset year
survey |>
tbl_svysummary(
by=NYC_Metro,
include=c(MedHome_22, NYC_Metro, age3_22, sex_22, raceASIA_22,
AdultEduc_22, povlev4_22, instype_22, CSHCN_22, nom19ChHSt_22),
percent = "row",
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n_unweighted} ({p}%)"),
digits = list(all_continuous() ~ c(1,1),
all_categorical() ~ 0),
label = list(
MedHome_22 ~ "Medical home status",
age3_22 ~ "Age category",
sex_22 ~ "Sex",
raceASIA_22 ~ "Race/Ethnicity of Child",
AdultEduc_22 ~ "Highest education of adults in household",
povlev4_22 ~ "Income level of child’s household ",
instype_22 ~ "Insurance type",
CSHCN_22 ~ "Special health care needs status"
)
) |>
modify_header(update = list(label ~ "",
stat_2 ~ "**New York City Metropolitan**,
n = 2,088",
stat_1 ~ "**Rest of State**,
n = 2,342"))|>
italicize_levels() |>
bold_labels() |>
add_n(statistic = "{N_obs_unweighted}") |>
modify_caption("Table 1. Descriptive Statistics — NSCH 2022 Analytic Sample (n = 4,430)") |>
as_flex_table()
## Warning: The `update` argument of `modify_header()` is deprecated as of gtsummary 2.0.0.
## ℹ Use `modify_header(...)` input instead. Dynamic dots allow for syntax like
## `modify_header(!!!list(...))`.
## ℹ The deprecated feature was likely used in the gtsummary package.
## Please report the issue at <https://github.com/ddsjoberg/gtsummary/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
N | Rest of State, | New York City Metropolitan, | |
|---|---|---|---|
Medical home status | 4,430 | ||
Does not have a medical home | 1,127 (45%) | 1,156 (55%) | |
Has a medical home | 1,215 (53%) | 932 (47%) | |
Age category | 4,430 | ||
0-5 years old | 716 (49%) | 666 (51%) | |
6-11 years old | 650 (47%) | 619 (53%) | |
12-17 years old | 976 (51%) | 803 (49%) | |
Sex | 4,430 | ||
Female | 1,145 (48%) | 1,000 (52%) | |
Male | 1,197 (50%) | 1,088 (50%) | |
Race/Ethnicity of Child | 4,430 | ||
White, non-Hispanic | 1,662 (67%) | 774 (33%) | |
Hispanic | 291 (32%) | 581 (68%) | |
Black, non-Hispanic | 117 (29%) | 260 (71%) | |
Asian, non-Hispanic | 104 (22%) | 342 (78%) | |
Multi-racial, non-Hispanic | 168 (59%) | 131 (41%) | |
Highest education of adults in household | 4,430 | ||
Less than high school | 59 (36%) | 110 (64%) | |
High school or GED | 289 (51%) | 229 (49%) | |
Some college or technical school | 488 (54%) | 369 (46%) | |
College degree or higher | 1,506 (49%) | 1,380 (51%) | |
Income level of child’s household | 4,430 | ||
0-99% FPL | 273 (42%) | 361 (58%) | |
100-199% FPL | 392 (45%) | 378 (55%) | |
200-399% FPL | 665 (58%) | 437 (42%) | |
400% FPL or greater | 1,012 (48%) | 912 (52%) | |
Insurance type | 4,430 | ||
Private health insurance only | 1,604 (52%) | 1,324 (48%) | |
Public health insurance only | 592 (46%) | 581 (54%) | |
Public and private insurance | 76 (34%) | 105 (66%) | |
Uninsured | 70 (51%) | 78 (49%) | |
Special health care needs status | 4,430 | ||
Non-CSHCN | 1,734 (47%) | 1,676 (53%) | |
CSHCN | 608 (56%) | 412 (44%) | |
nom19ChHSt_22 | 4,430 | ||
Fair or poor | 29 (39%) | 32 (61%) | |
Excellent or very good | 2,127 (50%) | 1,880 (50%) | |
Good | 186 (42%) | 176 (58%) | |
1n (unweighted) (%) | |||
Create a table to show the breakdown of included variables by their medical home percentage, as stratified by New York City metropolitan or rest of state.
ggplot(survey, aes(x=MedHome_22)) +
geom_bar(fill = "steelblue", alpha=0.7) +
labs(x = "Medical Home Status", y = "Count", title = "Medical Home Status in NYS Children")
medhome_region <- svyby(formula = ~MedHome_22, by = ~NYC_Metro, design = survey, FUN = svymean, na.rm=TRUE)
medhome_region_rename <- medhome_region |>
rename(
MHNo = `MedHome_22Does not have a medical home`,
se.MHNo = `se.MedHome_22Does not have a medical home`,
MHYes = `MedHome_22Has a medical home`,
se.MHYes = `se.MedHome_22Has a medical home`)
ggplot(medhome_region_rename) +
geom_bar(aes(x=NYC_Metro, y=MHYes), stat="identity", fill = "steelblue", alpha=0.7) +
geom_errorbar( aes(x=NYC_Metro, ymin=MHYes - se.MHYes, ymax=MHYes + se.MHYes), width=0.4, alpha=0.9) +
theme_minimal() +
scale_y_continuous(labels=scales::percent) +
labs(x = "New York State Region", y = "Percent with Medical Home", title = "Medical Home Percentage by Region")
medhome_income<-svyby(formula = ~MedHome_22, by = ~povlev4_22, design = survey, FUN = svymean, na.rm=T)
ggplot(medhome_income) +
geom_bar(aes(x=povlev4_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
theme_minimal() +
scale_y_continuous(labels=scales::percent) +
labs(x = "Income", y = "Percent with Medical Home", title = "Medical Home Percentage by Income")
medhome_race<-svyby(formula = ~MedHome_22, by = ~raceASIA_22, design = survey, FUN = svymean, na.rm=T)
ggplot(medhome_race) +
geom_bar(aes(x=raceASIA_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
theme_minimal() +
scale_y_continuous(labels=scales::percent) +
labs(x = "Race", y = "Percent with Medical Home", title = "Medical Home Percentage by Race")
medhome_CSHCN<-svyby(formula = ~MedHome_22, by = ~CSHCN_22, design = survey, FUN = svymean, na.rm=T)
ggplot(medhome_CSHCN) +
geom_bar(aes(x=CSHCN_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
theme_minimal() +
scale_y_continuous(labels=scales::percent) +
labs(x = "CSHCN Status", y = "Percent with Medical Home", title = "Medical Home Percentage by CSHCN Status")
medhome_overallhealth<-svyby(formula = ~MedHome_22, by = ~nom19ChHSt_22, design = survey, FUN = svymean, na.rm=T)
ggplot(medhome_overallhealth) +
geom_bar(aes(x=nom19ChHSt_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
theme_minimal() +
scale_y_continuous(labels=scales::percent) +
labs(x = "Overall Health Status", y = "Percent with Medical Home", title = "Medical Home Percentage by Overall Health Status")
fit1 <- svyglm(MedHome_22 ~ NYC_Metro,
design=survey, family=quasibinomial())
tbl_regression(fit1, exp = TRUE)
| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| NYC_Metro | |||
| Rest of state | — | — | |
| NYC metro | 0.73 | 0.62, 0.86 | <0.001 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
summary(fit1)
##
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro, design = survey, family = quasibinomial())
##
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC,
## data = NSCH_2022_subset_NY_clean_drop)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03782 0.05620 -0.673 0.501032
## NYC_MetroNYC metro -0.30938 0.08233 -3.758 0.000174 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 1.000226)
##
## Number of Fisher Scoring iterations: 3
Unadjusted logistic regression model between medical home and new york state region.
fit2 <- svyglm(MedHome_22 ~ NYC_Metro+instype_22+povlev4_22
+AdultEduc_22+raceASIA_22+CSHCN_22+nom19ChHSt_22+age3_22+sex_22,
design=survey, family=quasibinomial())
tbl_regression(fit2, exp = TRUE)
| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| NYC_Metro | |||
| Rest of state | — | — | |
| NYC metro | 0.86 | 0.72, 1.03 | 0.11 |
| instype_22 | |||
| Private health insurance only | — | — | |
| Public health insurance only | 0.76 | 0.60, 0.96 | 0.024 |
| Public and private insurance | 0.68 | 0.42, 1.11 | 0.12 |
| Uninsured | 0.29 | 0.17, 0.50 | <0.001 |
| povlev4_22 | |||
| 0-99% FPL | — | — | |
| 100-199% FPL | 1.09 | 0.80, 1.50 | 0.6 |
| 200-399% FPL | 1.38 | 1.02, 1.88 | 0.038 |
| 400% FPL or greater | 1.79 | 1.30, 2.45 | <0.001 |
| AdultEduc_22 | |||
| Less than high school | — | — | |
| High school or GED | 0.94 | 0.58, 1.53 | 0.8 |
| Some college or technical school | 1.09 | 0.68, 1.76 | 0.7 |
| College degree or higher | 1.30 | 0.81, 2.08 | 0.3 |
| raceASIA_22 | |||
| White, non-Hispanic | — | — | |
| Hispanic | 0.74 | 0.58, 0.93 | 0.011 |
| Black, non-Hispanic | 0.61 | 0.45, 0.84 | 0.002 |
| Asian, non-Hispanic | 0.51 | 0.37, 0.69 | <0.001 |
| Multi-racial, non-Hispanic | 0.95 | 0.70, 1.29 | 0.7 |
| CSHCN_22 | |||
| Non-CSHCN | — | — | |
| CSHCN | 0.76 | 0.61, 0.93 | 0.009 |
| nom19ChHSt_22 | |||
| Fair or poor | — | — | |
| Excellent or very good | 5.27 | 2.35, 11.8 | <0.001 |
| Good | 3.95 | 1.67, 9.35 | 0.002 |
| age3_22 | |||
| 0-5 years old | — | — | |
| 6-11 years old | 1.06 | 0.85, 1.32 | 0.6 |
| 12-17 years old | 0.92 | 0.75, 1.12 | 0.4 |
| sex_22 | |||
| Female | — | — | |
| Male | 0.97 | 0.82, 1.15 | 0.7 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
summary(fit2)
##
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro + instype_22 + povlev4_22 +
## AdultEduc_22 + raceASIA_22 + CSHCN_22 + nom19ChHSt_22 + age3_22 +
## sex_22, design = survey, family = quasibinomial())
##
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC,
## data = NSCH_2022_subset_NY_clean_drop)
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -1.77766 0.49918 -3.561
## NYC_MetroNYC metro -0.14660 0.09144 -1.603
## instype_22Public health insurance only -0.27177 0.12044 -2.256
## instype_22Public and private insurance -0.38189 0.24855 -1.536
## instype_22Uninsured -1.24285 0.27733 -4.481
## povlev4_22100-199% FPL 0.08968 0.16163 0.555
## povlev4_22200-399% FPL 0.32563 0.15663 2.079
## povlev4_22400% FPL or greater 0.58017 0.16201 3.581
## AdultEduc_22High school or GED -0.06172 0.24903 -0.248
## AdultEduc_22Some college or technical school 0.08731 0.24412 0.358
## AdultEduc_22College degree or higher 0.26108 0.23993 1.088
## raceASIA_22Hispanic -0.30752 0.12094 -2.543
## raceASIA_22Black, non-Hispanic -0.48882 0.15752 -3.103
## raceASIA_22Asian, non-Hispanic -0.68123 0.15670 -4.347
## raceASIA_22Multi-racial, non-Hispanic -0.05465 0.15651 -0.349
## CSHCN_22CSHCN -0.27980 0.10632 -2.632
## nom19ChHSt_22Excellent or very good 1.66213 0.41293 4.025
## nom19ChHSt_22Good 1.37255 0.43992 3.120
## age3_226-11 years old 0.05796 0.11013 0.526
## age3_2212-17 years old -0.08783 0.10485 -0.838
## sex_22Male -0.02742 0.08562 -0.320
## Pr(>|t|)
## (Intercept) 0.000373 ***
## NYC_MetroNYC metro 0.108959
## instype_22Public health insurance only 0.024090 *
## instype_22Public and private insurance 0.124493
## instype_22Uninsured 7.60e-06 ***
## povlev4_22100-199% FPL 0.579021
## povlev4_22200-399% FPL 0.037675 *
## povlev4_22400% FPL or greater 0.000346 ***
## AdultEduc_22High school or GED 0.804258
## AdultEduc_22Some college or technical school 0.720614
## AdultEduc_22College degree or higher 0.276593
## raceASIA_22Hispanic 0.011032 *
## raceASIA_22Black, non-Hispanic 0.001927 **
## raceASIA_22Asian, non-Hispanic 1.41e-05 ***
## raceASIA_22Multi-racial, non-Hispanic 0.726985
## CSHCN_22CSHCN 0.008523 **
## nom19ChHSt_22Excellent or very good 5.79e-05 ***
## nom19ChHSt_22Good 0.001820 **
## age3_226-11 years old 0.598728
## age3_2212-17 years old 0.402250
## sex_22Male 0.748806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 1.001092)
##
## Number of Fisher Scoring iterations: 4
Full adjusted multiple logistic regression model with all covariates.
b_exposure_max <- coef(fit2)["NYC_MetroNYC metro"]
interval_low <- b_exposure_max - 0.10 * abs(b_exposure_max)
interval_high <- b_exposure_max + 0.10 * abs(b_exposure_max)
cat("Exposure coefficient in maximum model:", round(b_exposure_max, 4), "\n")
## Exposure coefficient in maximum model: -0.1466
## Exposure coefficient in maximum model: -3.0688
cat("10% interval: (", round(interval_low, 4), ",", round(interval_high, 4), ")\n\n")
## 10% interval: ( -0.1613 , -0.1319 )
covariates_to_test <- c("instype_22","povlev4_22"
,"AdultEduc_22","raceASIA_22","CSHCN_22","nom19ChHSt_22", "sex_22", "age3_22")
assoc_table <- map_dfr(covariates_to_test, \(cov) {
# Build formula without this covariate
remaining <- setdiff(covariates_to_test, cov)
form <- as.formula(paste("MedHome_22 ~ NYC_Metro +", paste(remaining, collapse = " + ")))
mod_reduced <- svyglm(form, design=survey, family=quasibinomial())
b_reduced <- coef(mod_reduced)["NYC_MetroNYC metro"]
pct_change <- (b_reduced - b_exposure_max) / abs(b_exposure_max) * 100
tibble(
`Removed covariate` = cov,
`Medical Home β (max)` = round(b_exposure_max, 4),
`Medical Home β (without)` = round(b_reduced, 4),
`% Change` = round(pct_change, 1),
`Within 10%?` = ifelse(abs(pct_change) <= 10, "Yes (drop)", "No (keep)"),
Confounder = ifelse(abs(pct_change) > 10, "Yes", "No")
)
})
assoc_table |>
kable(caption = "Associative Model: Systematic Confounder Assessment for Medical Home") |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
column_spec(6, bold = TRUE)
| Removed covariate | Medical Home β (max) | Medical Home β (without) | % Change | Within 10%? | Confounder |
|---|---|---|---|---|---|
| instype_22 | -0.1466 | -0.1417 | 3.4 | Yes (drop) | No |
| povlev4_22 | -0.1466 | -0.1222 | 16.7 | No (keep) | Yes |
| AdultEduc_22 | -0.1466 | -0.1319 | 10.0 | No (keep) | Yes |
| raceASIA_22 | -0.1466 | -0.2929 | -99.8 | No (keep) | Yes |
| CSHCN_22 | -0.1466 | -0.1341 | 8.5 | Yes (drop) | No |
| nom19ChHSt_22 | -0.1466 | -0.1513 | -3.2 | Yes (drop) | No |
| sex_22 | -0.1466 | -0.1461 | 0.4 | Yes (drop) | No |
| age3_22 | -0.1466 | -0.1425 | 2.8 | Yes (drop) | No |
10% criteria for confounders on the full model to determine which covariates to keep in the model.
fit2.5 <- svyglm(MedHome_22 ~ NYC_Metro+instype_22+povlev4_22
+AdultEduc_22+raceASIA_22+CSHCN_22+age3_22+sex_22,
design=survey, family=quasibinomial())
tbl_regression(fit2.5, exp = TRUE)
| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| NYC_Metro | |||
| Rest of state | — | — | |
| NYC metro | 0.86 | 0.72, 1.03 | 0.10 |
| instype_22 | |||
| Private health insurance only | — | — | |
| Public health insurance only | 0.74 | 0.59, 0.94 | 0.014 |
| Public and private insurance | 0.65 | 0.40, 1.06 | 0.084 |
| Uninsured | 0.28 | 0.17, 0.49 | <0.001 |
| povlev4_22 | |||
| 0-99% FPL | — | — | |
| 100-199% FPL | 1.13 | 0.83, 1.55 | 0.4 |
| 200-399% FPL | 1.43 | 1.06, 1.94 | 0.020 |
| 400% FPL or greater | 1.86 | 1.36, 2.55 | <0.001 |
| AdultEduc_22 | |||
| Less than high school | — | — | |
| High school or GED | 0.95 | 0.59, 1.54 | 0.8 |
| Some college or technical school | 1.09 | 0.68, 1.76 | 0.7 |
| College degree or higher | 1.30 | 0.81, 2.07 | 0.3 |
| raceASIA_22 | |||
| White, non-Hispanic | — | — | |
| Hispanic | 0.73 | 0.57, 0.92 | 0.008 |
| Black, non-Hispanic | 0.60 | 0.44, 0.82 | 0.001 |
| Asian, non-Hispanic | 0.49 | 0.36, 0.66 | <0.001 |
| Multi-racial, non-Hispanic | 0.94 | 0.69, 1.28 | 0.7 |
| CSHCN_22 | |||
| Non-CSHCN | — | — | |
| CSHCN | 0.69 | 0.56, 0.84 | <0.001 |
| age3_22 | |||
| 0-5 years old | — | — | |
| 6-11 years old | 1.05 | 0.85, 1.31 | 0.6 |
| 12-17 years old | 0.91 | 0.74, 1.11 | 0.4 |
| sex_22 | |||
| Female | — | — | |
| Male | 0.99 | 0.83, 1.17 | 0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
summary(fit2.5)
##
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro + instype_22 + povlev4_22 +
## AdultEduc_22 + raceASIA_22 + CSHCN_22 + age3_22 + sex_22,
## design = survey, family = quasibinomial())
##
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC,
## data = NSCH_2022_subset_NY_clean_drop)
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -0.15372 0.28791 -0.534
## NYC_MetroNYC metro -0.15126 0.09111 -1.660
## instype_22Public health insurance only -0.29446 0.11950 -2.464
## instype_22Public and private insurance -0.42516 0.24568 -1.731
## instype_22Uninsured -1.25816 0.27667 -4.547
## povlev4_22100-199% FPL 0.12438 0.15978 0.778
## povlev4_22200-399% FPL 0.35907 0.15439 2.326
## povlev4_22400% FPL or greater 0.62291 0.15979 3.898
## AdultEduc_22High school or GED -0.05116 0.24684 -0.207
## AdultEduc_22Some college or technical school 0.09026 0.24223 0.373
## AdultEduc_22College degree or higher 0.26173 0.23812 1.099
## raceASIA_22Hispanic -0.31901 0.12051 -2.647
## raceASIA_22Black, non-Hispanic -0.50297 0.15706 -3.202
## raceASIA_22Asian, non-Hispanic -0.71810 0.15579 -4.609
## raceASIA_22Multi-racial, non-Hispanic -0.06324 0.15624 -0.405
## CSHCN_22CSHCN -0.37493 0.10462 -3.584
## age3_226-11 years old 0.05222 0.11006 0.475
## age3_2212-17 years old -0.09709 0.10452 -0.929
## sex_22Male -0.01367 0.08519 -0.160
## Pr(>|t|)
## (Intercept) 0.593431
## NYC_MetroNYC metro 0.096927 .
## instype_22Public health insurance only 0.013770 *
## instype_22Public and private insurance 0.083601 .
## instype_22Uninsured 5.57e-06 ***
## povlev4_22100-199% FPL 0.436336
## povlev4_22200-399% FPL 0.020076 *
## povlev4_22400% FPL or greater 9.83e-05 ***
## AdultEduc_22High school or GED 0.835820
## AdultEduc_22Some college or technical school 0.709460
## AdultEduc_22College degree or higher 0.271778
## raceASIA_22Hispanic 0.008146 **
## raceASIA_22Black, non-Hispanic 0.001373 **
## raceASIA_22Asian, non-Hispanic 4.15e-06 ***
## raceASIA_22Multi-racial, non-Hispanic 0.685682
## CSHCN_22CSHCN 0.000342 ***
## age3_226-11 years old 0.635157
## age3_2212-17 years old 0.352980
## sex_22Male 0.872538
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.9984244)
##
## Number of Fisher Scoring iterations: 4
Reduced multiple regression model removing some of the covariates from the full model that were found to not meet the 10% confounder criteria.
fit3 <- svyglm(MedHome_22 ~ NYC_Metro*CSHCN_22+instype_22+povlev4_22
+AdultEduc_22+raceASIA_22+CSHCN_22+age3_22+sex_22,
design=survey, family=quasibinomial())
tbl_regression(fit3, exp = TRUE)
| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| NYC_Metro | |||
| Rest of state | — | — | |
| NYC metro | 0.81 | 0.67, 0.99 | 0.041 |
| CSHCN_22 | |||
| Non-CSHCN | — | — | |
| CSHCN | 0.60 | 0.46, 0.78 | <0.001 |
| instype_22 | |||
| Private health insurance only | — | — | |
| Public health insurance only | 0.75 | 0.59, 0.94 | 0.014 |
| Public and private insurance | 0.65 | 0.40, 1.05 | 0.081 |
| Uninsured | 0.28 | 0.16, 0.49 | <0.001 |
| povlev4_22 | |||
| 0-99% FPL | — | — | |
| 100-199% FPL | 1.13 | 0.83, 1.55 | 0.4 |
| 200-399% FPL | 1.43 | 1.05, 1.93 | 0.022 |
| 400% FPL or greater | 1.85 | 1.35, 2.53 | <0.001 |
| AdultEduc_22 | |||
| Less than high school | — | — | |
| High school or GED | 0.96 | 0.59, 1.55 | 0.9 |
| Some college or technical school | 1.10 | 0.69, 1.77 | 0.7 |
| College degree or higher | 1.31 | 0.82, 2.08 | 0.3 |
| raceASIA_22 | |||
| White, non-Hispanic | — | — | |
| Hispanic | 0.72 | 0.57, 0.92 | 0.007 |
| Black, non-Hispanic | 0.60 | 0.44, 0.82 | 0.001 |
| Asian, non-Hispanic | 0.49 | 0.36, 0.67 | <0.001 |
| Multi-racial, non-Hispanic | 0.94 | 0.69, 1.27 | 0.7 |
| age3_22 | |||
| 0-5 years old | — | — | |
| 6-11 years old | 1.05 | 0.85, 1.31 | 0.6 |
| 12-17 years old | 0.91 | 0.74, 1.12 | 0.4 |
| sex_22 | |||
| Female | — | — | |
| Male | 0.98 | 0.83, 1.16 | 0.9 |
| NYC_Metro * CSHCN_22 | |||
| NYC metro * CSHCN | 1.37 | 0.91, 2.05 | 0.13 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
summary(fit3)
##
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro * CSHCN_22 + instype_22 +
## povlev4_22 + AdultEduc_22 + raceASIA_22 + CSHCN_22 + age3_22 +
## sex_22, design = survey, family = quasibinomial())
##
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC,
## data = NSCH_2022_subset_NY_clean_drop)
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -0.12605 0.28957 -0.435
## NYC_MetroNYC metro -0.20797 0.10172 -2.045
## CSHCN_22CSHCN -0.51289 0.13181 -3.891
## instype_22Public health insurance only -0.29413 0.11964 -2.458
## instype_22Public and private insurance -0.42734 0.24460 -1.747
## instype_22Uninsured -1.26142 0.27696 -4.555
## povlev4_22100-199% FPL 0.12487 0.15931 0.784
## povlev4_22200-399% FPL 0.35508 0.15450 2.298
## povlev4_22400% FPL or greater 0.61469 0.15994 3.843
## AdultEduc_22High school or GED -0.04109 0.24541 -0.167
## AdultEduc_22Some college or technical school 0.09631 0.24094 0.400
## AdultEduc_22College degree or higher 0.26978 0.23687 1.139
## raceASIA_22Hispanic -0.32315 0.12043 -2.683
## raceASIA_22Black, non-Hispanic -0.50854 0.15671 -3.245
## raceASIA_22Asian, non-Hispanic -0.70954 0.15611 -4.545
## raceASIA_22Multi-racial, non-Hispanic -0.06498 0.15620 -0.416
## age3_226-11 years old 0.05177 0.11004 0.470
## age3_2212-17 years old -0.09445 0.10444 -0.904
## sex_22Male -0.01545 0.08518 -0.181
## NYC_MetroNYC metro:CSHCN_22CSHCN 0.31159 0.20690 1.506
## Pr(>|t|)
## (Intercept) 0.663374
## NYC_MetroNYC metro 0.040962 *
## CSHCN_22CSHCN 0.000101 ***
## instype_22Public health insurance only 0.013994 *
## instype_22Public and private insurance 0.080688 .
## instype_22Uninsured 5.39e-06 ***
## povlev4_22100-199% FPL 0.433214
## povlev4_22200-399% FPL 0.021593 *
## povlev4_22400% FPL or greater 0.000123 ***
## AdultEduc_22High school or GED 0.867044
## AdultEduc_22Some college or technical school 0.689388
## AdultEduc_22College degree or higher 0.254783
## raceASIA_22Hispanic 0.007319 **
## raceASIA_22Black, non-Hispanic 0.001183 **
## raceASIA_22Asian, non-Hispanic 5.64e-06 ***
## raceASIA_22Multi-racial, non-Hispanic 0.677406
## age3_226-11 years old 0.638039
## age3_2212-17 years old 0.365871
## sex_22Male 0.856080
## NYC_MetroNYC metro:CSHCN_22CSHCN 0.132149
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.9973033)
##
## Number of Fisher Scoring iterations: 4
Reduced multiple regression model with the inclusion of a racial effect modifier.
hoslem.test(fit3$y, fitted(fit3), g = 10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: fit3$y, fitted(fit3)
## X-squared = 10.044, df = 8, p-value = 0.2619
NSCH_2022_subset_NY_clean_drop |>
mutate(pred_prob = fitted(fit3),
obs_outcome = as.numeric(MedHome_22) - 1,
decile = ntile(pred_prob, 10)) |>
group_by(decile) |>
summarise(
mean_pred = mean(pred_prob),
mean_obs = mean(obs_outcome),
.groups = "drop"
) |>
ggplot(aes(x = mean_pred, y = mean_obs)) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
geom_point(size = 3, color = "steelblue") +
geom_line(color = "steelblue") +
labs(title = "Calibration Plot: Observed vs. Predicted Probability of Medical Home",
subtitle = "Points should fall on the dashed line for perfect calibration",
x = "Mean Predicted Probability (within decile)",
y = "Observed Proportion (within decile)") +
theme_minimal()
This plot shows the goodness of fit and calibration of the model.
plot(residuals(fit3, type="deviance"), xlab = "Index", ylab = "Deviance Residuals")
This plot shows the deviance of residuals. There appears to be an even
spread of residuals around zero, which shows normality among residuals.
A clear pattern could represent non-normal residuals. A “fan shape”
(spread increasing with fitted values) indicates heteroscedasticity.
n <- length(hatvalues(fit3))
p <- length(coef(fit3)) - 1
h_bar <- (p + 1) / n
diag_df <- tibble::tibble(
leverage = hatvalues(fit3),
std_resid = rstandard(fit3, type = "deviance")
)
influential_points <- diag_df |>
filter((leverage > h_bar) & (std_resid < -2 | std_resid > 2))
cat("Number of influential points:", nrow(influential_points), "\n")
## Number of influential points: 197
## Number of influential points: 197
ggplot(diag_df, aes(x = std_resid, y = leverage)) +
geom_point(alpha = 0.3, color = "steelblue") +
geom_hline(yintercept = 2 * h_bar, color = "red",
linetype = "dashed", linewidth = 0.8) +
geom_vline(xintercept = c(-2, 2), color = "gray40",
linetype = "dotted") +
annotate("text", x = max(diag_df$std_resid, na.rm = TRUE) * 0.9,
y = 2 * h_bar + 0.0005, label = "2p/n leverage cutoff",
color = "red", hjust = 1) +
labs(
title = "Leverage vs. Standardized Deviance Residuals",
x = "Standardized Deviance Residual",
y = "Leverage (h_i)"
) +
theme_minimal()
The plot shows that some individual observations (n=197) potentially
overly influence the model. An observation reaching the red line and
more than two standard deviations away would be categorized as an
influential observation.
diag_df <- diag_df |>
mutate(cooks_d = cooks.distance(fit3))
# Threshold
cooks_threshold <- 4 / n
cat("Cook's distance threshold (4/n):", round(cooks_threshold, 5), "\n")
## Cook's distance threshold (4/n): 9e-04
## Cook's distance threshold (4/n): 9e-04
influential_obs <- diag_df |>
filter(cooks_d > cooks_threshold)
cat("Number of influential points (D > 4/n):", nrow(influential_obs), "\n")
## Number of influential points (D > 4/n): 153
## Number of influential points (D > 4/n): 153
ggplot(diag_df, aes(x = seq_len(n), y = cooks_d)) +
geom_point(alpha = 0.3, color = "steelblue") +
geom_hline(yintercept = cooks_threshold, color = "red",
linetype = "dashed", linewidth = 0.8) +
labs(title = "Cook's Distance by Observation Index",
subtitle = paste0("Red line: 4/n = ", round(cooks_threshold, 5)),
x = "Observation Index", y = "Cook's Distance") +
theme_minimal()
### Alternative Approach ###
cooks_d <- cooks.distance(fit3)
influence_df <- data.frame(
observation = 1:length(cooks_d),
cooks_d = cooks_d
) %>%
mutate(influential = ifelse(cooks_d > 1, "Yes", "No"))
p5 <- ggplot(influence_df, aes(x = observation, y = cooks_d, color = influential)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 4/n, linetype = "dashed", color = "red") +
labs(
title = "Cook's Distance: Identifying Influential Observations",
subtitle = "Values > 1 indicate potentially influential observations",
x = "Observation Number",
y = "Cook's Distance",
color = "Influential?"
) +
scale_color_manual(values = c("No" = "steelblue", "Yes" = "red")) +
theme_minimal(base_size = 12)
ggplotly(p5)
This Cook’s distance plot looks for influential observations at a 4/n threshold. The plot shows that some individual observations (n=153) potentially overly influence the model. An observation reaching the red line would be categorized as an influential observation.
pp <- predict_response(fit3, terms = "NYC_Metro")
plot(pp)
This plot shows the predicted probabilities of medical home percentage
based on region. The point represents the predicted odds ratio and line
signifies the confidence interval.
tidy(fit3, conf.int = TRUE) %>%
filter(term != "(Intercept)") %>%
mutate(
term = fct_reorder(term, estimate),
sig = ifelse(p.value < 0.05, "Significant (p < 0.05)", "Non-significant")
) %>%
ggplot(aes(x = estimate, y = term, color = sig)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray60") +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.25, linewidth = 0.9) +
geom_point(size = 3.5) +
scale_color_manual(values = c("Significant (p < 0.05)" = "steelblue",
"Non-significant" = "tomato")) +
labs(
title = "Partial Regression Coefficients with 95% Confidence Intervals",
subtitle = "Outcome: Medical Home (NSCH 2022, n = 4,430)",
x = "Estimated Change in Medical Home Percentage (β̂)",
y = NULL,
color = NULL
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")
## Warning: `geom_errorbarh()` was deprecated in ggplot2 4.0.0.
## ℹ Please use the `orientation` argument of `geom_errorbar()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `height` was translated to `width`.
#test this
plot_model(fit3)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the sjPlot package.
## Please report the issue at <https://github.com/strengejacke/sjPlot/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This forest plot shows which covariates are significant in the adjusted
and effect modified model. As determined from the model, our primary
exposure, New York State region, is significantly associated with
medical home in this model. The results also show that those in the
highest income bracket and second highest (>400% FPL and 200-399%,
respectively) are significantly associated with higher odds of having a
medical home. In addition, Asian, non-Hispanic Black, and non-Hispanic
children all had significantly reduced odds of having a medical home
compared to White children. CSHCN were also shown to have significantly
lower odds of having a medical home, as compared with children without
special health care needs; whereas, CSHCN from the NYC region had much
higher odds of having a medical home, although this was not a
significant association. This suggests that special health care needs
does not impact medical home rates equally throughout different regions
of New York State. Finally, insurance is significantly associated with
medical home status. Children with no insurance were shown to have the
lowest odds of having a medical home out of any other covariate, when
compared to children with private insurance.