Call libraries

Read in NSCH 2022 data

NSCH_2022 <- read.csv("data/NSCH_2022e_Topical_SPSS_CAHMI_DRCv3.csv")

n=54,103 with 923 variables (raw 2022 NSCH data)

Read in NSCH 2022 special geographies data

geo <- read.csv("data/NSCH_2022_Special_Geographies.csv")

n=54,103 with 5 variables

Left join NSCH data with geographies

NSCH_2022_geo <- left_join(NSCH_2022, geo, by ="HHID")

n=54,103 with 927 variables (left join with geographic data)

Limit variables

NSCH_2022_subset <- NSCH_2022_geo |> select(HHID, FIPSST, NY_SGREGION, STRATUM, FWC, CSHCN_22, 
                                             npm11MHnonCSHCN_22, npm11MHCSHCN_22, MedHome_22,
                                             instype_22, povlev4_22, AdultEduc_22, raceASIA_22, nom19ChHSt_22, sex_22, age3_22)

n=54,103 with 16 variables (restrict to variables of interest)

#Limit dataset to only NYS (FIPSST = 36)

NSCH_2022_subset_NY <- subset(NSCH_2022_subset, FIPSST == 36)

n=4,576 with 16 variables (subset to NYS)

Data preparation - cleaning NA and recoding variables

NSCH_2022_subset_NY_clean <- NSCH_2022_subset_NY |>
  mutate(CSHCN_22 = ifelse(CSHCN_22 == 1, 1, 0)) |>
  mutate(MedHome_22 = na_if(MedHome_22,99)) |>
  mutate(MedHome_22 = ifelse(MedHome_22 == 1, 1, 0)) |>
  mutate(NYC_Metro = ifelse(NY_SGREGION == 8, 0, 1))
  
table(NSCH_2022_subset_NY_clean$MedHome_22, useNA = "ifany")

## 
##    0    1 <NA> 
## 2368 2197   11

table(NSCH_2022_subset_NY_clean$CSHCN_22, useNA = "ifany")

## 
##    0    1 
## 3537 1039

table(NSCH_2022_subset_NY_clean$NYC_Metro, useNA = "ifany")

## 
##    0    1 
## 2404 2172

#Convert to factor variables and create NYC_Metro binary variable
NSCH_2022_subset_NY_clean <- NSCH_2022_subset_NY_clean |>
  mutate(
    CSHCN_22 = factor(ifelse(CSHCN_22 == 1, "CSHCN", "Non-CSHCN")),
    MedHome_22 = factor(ifelse(MedHome_22 == 1, "Has a medical home", "Does not have a medical home")),
    NYC_Metro = factor(ifelse(NYC_Metro == 1, "NYC metro", "Rest of state")),
    raceASIA_22 = factor(raceASIA_22, levels=c(1,2,3,4,5), labels=c("Hispanic", "White, non-Hispanic", "Black, non-Hispanic", "Asian, non-Hispanic", "Multi-racial, non-Hispanic")),
    AdultEduc_22 = factor(AdultEduc_22, levels=c(1,2,3,4), labels=c("Less than high school", "High school or GED", "Some college or technical school", "College degree or higher")),
    povlev4_22 = factor(povlev4_22, levels=c(1,2,3,4), labels=c("0-99% FPL", "100-199% FPL", "200-399% FPL", "400% FPL or greater")),
    instype_22 = factor(instype_22, levels=c(1,2,3,4), labels=c("Public health insurance only", "Private health insurance only", "Public and private insurance", "Uninsured")),
    nom19ChHSt_22 = factor(nom19ChHSt_22, levels=c(1,2,3), labels=c("Excellent or very good", "Good", "Fair or poor")),
    sex_22 = factor(sex_22, levels=c(1,2), labels=c("Male", "Female")),
    age3_22 = factor(age3_22, levels=c(1,2,3), labels=c("0-5 years old", "6-11 years old", "12-17 years old"))
    )

n=4,576 with 17 variables (addition of NYC_Metro variable)

Drop all rows with any NA or missing (or 99) values

NSCH_2022_subset_NY_clean_drop <- NSCH_2022_subset_NY_clean |> drop_na()

n=4,430 with 17 variables after listwise deletion

Some variables (medical home, insurance type, and overall health status) contained missing data. Prior to listwise deletion, the dataset contained 4,576 observations. I identified that 146 (3.2%) of these observations included missing data to at least one key variable. I performed a listwise deletion to remove all 146 rows with missing data.

The analytical sample flow is as follows: 1) n=54,103 with 923 variables (raw 2022 NSCH data) 2) n=54,103 with 927 variables (left join with geographic data) 3) n=54,103 with 16 variables (restrict to variables of interest) 4) n=4,576 with 16 variables (subset to NYS) 5) n=4,576 with 17 variables (addition of NYC_Metro variable) 6) n=4,430 with 17 variables (listwise deletion)

Analytical Dataset Conclusion:

The complete 2022 NSCH dataset consists of 54,103 observations. This dataset is then merged with the special geographies dataset by participant identification number (HHID). The dataset is then restricted to only include variables of interest and data for New York State (NYS), reducing the data to 4,576 observations. Exactly 146 observations included at least one missing answer to the restricted variable list. Since the rate of missing data points is small, I performed a listwise deletion to remove all rows with missing data. The NSCH released a report describing the negligible impact of missing data on their survey results. Finally, a “NYC_Metro” variable was created to compare rest of state (ROS) with the New York City metropolitan area (NYC). The final analytical sample size is 4,430 with 17 variables. This analytical sample has changed from my original proposal. This is due to the additions of age and sex variables, as well as the removal of unnecessary medical home component variables not used in the analysis. In addition, language and nativity were removed as this was not seen in similar studies and created worse fitting models.

Load survey weight

survey <- svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC, data = NSCH_2022_subset_NY_clean_drop) 

#Determine dummy coding and sample size breakdown for outcome and exposure
summary(NSCH_2022_subset_NY_clean_drop$MedHome_22)

## Does not have a medical home           Has a medical home 
##                         2283                         2147

summary(NSCH_2022_subset_NY_clean_drop$NYC_Metro)

##     NYC metro Rest of state 
##          2088          2342

Create a survey weighting strategy based on the NSCH documentation. The NSCH aims to be representative of the entire population of NYS children. In order to accomplish this a special weighting must be applied to the dataset. The weighting relies on the “fwc” variable, which defines the weighting for each state.

Relevel categorical predicators

NSCH_2022_subset_NY_clean_drop$instype_22    <- relevel(NSCH_2022_subset_NY_clean_drop$instype_22, ref = "Private health insurance only")
NSCH_2022_subset_NY_clean_drop$nom19ChHSt_22 <- relevel(NSCH_2022_subset_NY_clean_drop$nom19ChHSt_22, ref = "Fair or poor")
NSCH_2022_subset_NY_clean_drop$raceASIA_22 <- relevel(NSCH_2022_subset_NY_clean_drop$raceASIA_22, ref = "White, non-Hispanic")

Relevel all categorical predictors with more than two predictors. The general criteria was to use the most common option for nominal variables and the lowest level for ordinal variables.

Determine mean for medical home, comparing with publicly available data (45.4%)

svymean(~MedHome_22, survey, na.rm = TRUE) * 100

##                                          mean     SE
## MedHome_22Does not have a medical home 54.852 0.0102
## MedHome_22Has a medical home           45.148 0.0102

Validate code by comparing with publicly available statistics on dataset year

Create descriptive table of medhome and all covariates

survey |>
  tbl_svysummary(
    by=NYC_Metro,
    include=c(MedHome_22, NYC_Metro, age3_22, sex_22, raceASIA_22, 
              AdultEduc_22, povlev4_22, instype_22, CSHCN_22, nom19ChHSt_22),
    percent = "row", 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~  "{n_unweighted} ({p}%)"),
    digits = list(all_continuous() ~ c(1,1),
    all_categorical() ~ 0), 
    label = list(
      MedHome_22 ~ "Medical home status",
      age3_22 ~ "Age category",
      sex_22      ~ "Sex",
      raceASIA_22     ~ "Race/Ethnicity of Child",
      AdultEduc_22           ~ "Highest education of adults in household",
      povlev4_22      ~ "Income level of child’s household ",
      instype_22      ~ "Insurance type",
      CSHCN_22      ~ "Special health care needs status"
      )
  ) |>
  modify_header(update = list(label ~ "",
                              stat_1 ~ "**New York City Metropolitan**, 
                                            n = 2,088",
                              stat_2 ~ "**Rest of State**, 
                                            n = 2,342"))|>
  italicize_levels() |> 
  bold_labels() |>
  add_n(statistic = "{N_obs_unweighted}") |>
  modify_caption("Table 1. Descriptive Statistics — NSCH 2022 Analytic Sample (n = 4,430)") |>
  as_flex_table()

## Warning: The `update` argument of `modify_header()` is deprecated as of gtsummary 2.0.0.
## ℹ Use `modify_header(...)` input instead. Dynamic dots allow for syntax like
##   `modify_header(!!!list(...))`.
## ℹ The deprecated feature was likely used in the gtsummary package.
##   Please report the issue at <https://github.com/ddsjoberg/gtsummary/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Table 1. Descriptive Statistics — NSCH 2022 Analytic Sample (n = 4,430)
	N	New York City Metropolitan, n = 2,0881	Rest of State, n = 2,3421
Medical home status	4,430
Does not have a medical home		1,156 (55%)	1,127 (45%)
Has a medical home		932 (47%)	1,215 (53%)
Age category	4,430
0-5 years old		666 (51%)	716 (49%)
6-11 years old		619 (53%)	650 (47%)
12-17 years old		803 (49%)	976 (51%)
Sex	4,430
Male		1,088 (50%)	1,197 (50%)
Female		1,000 (52%)	1,145 (48%)
Race/Ethnicity of Child	4,430
Hispanic		581 (68%)	291 (32%)
White, non-Hispanic		774 (33%)	1,662 (67%)
Black, non-Hispanic		260 (71%)	117 (29%)
Asian, non-Hispanic		342 (78%)	104 (22%)
Multi-racial, non-Hispanic		131 (41%)	168 (59%)
Highest education of adults in household	4,430
Less than high school		110 (64%)	59 (36%)
High school or GED		229 (49%)	289 (51%)
Some college or technical school		369 (46%)	488 (54%)
College degree or higher		1,380 (51%)	1,506 (49%)
Income level of child’s household	4,430
0-99% FPL		361 (58%)	273 (42%)
100-199% FPL		378 (55%)	392 (45%)
200-399% FPL		437 (42%)	665 (58%)
400% FPL or greater		912 (52%)	1,012 (48%)
Insurance type	4,430
Public health insurance only		581 (54%)	592 (46%)
Private health insurance only		1,324 (48%)	1,604 (52%)
Public and private insurance		105 (66%)	76 (34%)
Uninsured		78 (49%)	70 (51%)
Special health care needs status	4,430
CSHCN		412 (44%)	608 (56%)
Non-CSHCN		1,676 (53%)	1,734 (47%)
nom19ChHSt_22	4,430
Excellent or very good		1,880 (50%)	2,127 (50%)
Good		176 (58%)	186 (42%)
Fair or poor		32 (61%)	29 (39%)
1n (unweighted) (%)

Create a table to show the breakdown of included variables by their medical home percentage, as stratified by New York City metropolitan or rest of state.

Create basic bar graph of medical home raw data

ggplot(survey, aes(x=MedHome_22)) +
  geom_bar(fill = "steelblue", alpha=0.7) +
  labs(x = "Medical Home Status", y = "Count", title = "Medical Home Status in NYS Children")

Create descriptive statistics on survey data and plot it on a bar graph

medhome_region<-svyby(formula = ~MedHome_22, by = ~NYC_Metro, design = survey, FUN = svymean, na.rm=T)
medhome_region_rename <- medhome_region |>
    rename(
      MHNo = `MedHome_22Does not have a medical home`, 
      se.MHNo = `se.MedHome_22Does not have a medical home`,
      MHYes = `MedHome_22Has a medical home`,
      se.MHYes = `se.MedHome_22Has a medical home`)

ggplot(medhome_region_rename) +
  geom_bar(aes(x=NYC_Metro, y=MHYes), stat="identity", fill = "steelblue", alpha=0.7) +
  geom_errorbar( aes(x=NYC_Metro, ymin=MHYes - se.MHYes, ymax=MHYes + se.MHYes), width=0.4, alpha=0.9) +
  theme_minimal() +
  scale_y_continuous(labels=scales::percent) +
  labs(x = "New York State Region", y = "Percent with Medical Home", title = "Medical Home Percentage by Region")

Exploratory visualization

medhome_income<-svyby(formula = ~MedHome_22, by = ~povlev4_22, design = survey, FUN = svymean, na.rm=T)

ggplot(medhome_income) +
  geom_bar(aes(x=povlev4_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
  theme_minimal() +
  scale_y_continuous(labels=scales::percent) +
  labs(x = "Income", y = "Percent with Medical Home", title = "Medical Home Percentage by Income")

medhome_race<-svyby(formula = ~MedHome_22, by = ~raceASIA_22, design = survey, FUN = svymean, na.rm=T)

ggplot(medhome_race) +
  geom_bar(aes(x=raceASIA_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
  theme_minimal() +
  scale_y_continuous(labels=scales::percent) +
  labs(x = "Race", y = "Percent with Medical Home", title = "Medical Home Percentage by Race")

medhome_CSHCN<-svyby(formula = ~MedHome_22, by = ~CSHCN_22, design = survey, FUN = svymean, na.rm=T)

ggplot(medhome_CSHCN) +
  geom_bar(aes(x=CSHCN_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
  theme_minimal() +
  scale_y_continuous(labels=scales::percent) +
  labs(x = "CSHCN Status", y = "Percent with Medical Home", title = "Medical Home Percentage by CSHCN Status")

medhome_overallhealth<-svyby(formula = ~MedHome_22, by = ~nom19ChHSt_22, design = survey, FUN = svymean, na.rm=T)

ggplot(medhome_overallhealth) +
  geom_bar(aes(x=nom19ChHSt_22, y=`MedHome_22Has a medical home`), stat="identity", fill = "steelblue", alpha=0.7) +
  theme_minimal() +
  scale_y_continuous(labels=scales::percent) +
  labs(x = "Overall Health Status", y = "Percent with Medical Home", title = "Medical Home Percentage by Overall Health Status")

Unadjusted simple regression model

fit1 <- svyglm(MedHome_22 ~ NYC_Metro, 
               design=survey, family=quasibinomial())
tbl_regression(fit1, exp = TRUE)

Characteristic	OR	95% CI	p-value
NYC_Metro
NYC metro	—	—
Rest of state	1.36	1.16, 1.60	<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

summary(fit1)

## 
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro, design = survey, family = quasibinomial())
## 
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC, 
##     data = NSCH_2022_subset_NY_clean_drop)
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -0.34721    0.06017  -5.771 8.44e-09 ***
## NYC_MetroRest of state  0.30938    0.08233   3.758 0.000174 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 1.000226)
## 
## Number of Fisher Scoring iterations: 3

Unadjusted logistic regression model between medical home and new york state region.

Mulitple regression model

fit2 <- svyglm(MedHome_22 ~ NYC_Metro+instype_22+povlev4_22
               +AdultEduc_22+raceASIA_22+CSHCN_22+nom19ChHSt_22+age3_22+sex_22, 
               design=survey, family=quasibinomial())
tbl_regression(fit2, exp = TRUE)

Characteristic	OR	95% CI	p-value
NYC_Metro
NYC metro	—	—
Rest of state	1.16	0.97, 1.39	0.11
instype_22
Public health insurance only	—	—
Private health insurance only	1.31	1.04, 1.66	0.024
Public and private insurance	0.90	0.55, 1.47	0.7
Uninsured	0.38	0.22, 0.66	<0.001
povlev4_22
0-99% FPL	—	—
100-199% FPL	1.09	0.80, 1.50	0.6
200-399% FPL	1.38	1.02, 1.88	0.038
400% FPL or greater	1.79	1.30, 2.45	<0.001
AdultEduc_22
Less than high school	—	—
High school or GED	0.94	0.58, 1.53	0.8
Some college or technical school	1.09	0.68, 1.76	0.7
College degree or higher	1.30	0.81, 2.08	0.3
raceASIA_22
Hispanic	—	—
White, non-Hispanic	1.36	1.07, 1.72	0.011
Black, non-Hispanic	0.83	0.60, 1.16	0.3
Asian, non-Hispanic	0.69	0.50, 0.95	0.025
Multi-racial, non-Hispanic	1.29	0.91, 1.82	0.2
CSHCN_22
CSHCN	—	—
Non-CSHCN	1.32	1.07, 1.63	0.009
nom19ChHSt_22
Excellent or very good	—	—
Good	0.75	0.53, 1.05	0.094
Fair or poor	0.19	0.08, 0.43	<0.001
age3_22
0-5 years old	—	—
6-11 years old	1.06	0.85, 1.32	0.6
12-17 years old	0.92	0.75, 1.12	0.4
sex_22
Male	—	—
Female	1.03	0.87, 1.22	0.7
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

summary(fit2)

## 
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro + instype_22 + povlev4_22 + 
##     AdultEduc_22 + raceASIA_22 + CSHCN_22 + nom19ChHSt_22 + age3_22 + 
##     sex_22, design = survey, family = quasibinomial())
## 
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC, 
##     data = NSCH_2022_subset_NY_clean_drop)
## 
## Coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                  -1.14864    0.27296  -4.208
## NYC_MetroRest of state                        0.14660    0.09144   1.603
## instype_22Private health insurance only       0.27177    0.12044   2.256
## instype_22Public and private insurance       -0.11012    0.25323  -0.435
## instype_22Uninsured                          -0.97107    0.28006  -3.467
## povlev4_22100-199% FPL                        0.08968    0.16163   0.555
## povlev4_22200-399% FPL                        0.32563    0.15663   2.079
## povlev4_22400% FPL or greater                 0.58017    0.16201   3.581
## AdultEduc_22High school or GED               -0.06172    0.24903  -0.248
## AdultEduc_22Some college or technical school  0.08731    0.24412   0.358
## AdultEduc_22College degree or higher          0.26108    0.23993   1.088
## raceASIA_22White, non-Hispanic                0.30752    0.12094   2.543
## raceASIA_22Black, non-Hispanic               -0.18130    0.16980  -1.068
## raceASIA_22Asian, non-Hispanic               -0.37371    0.16683  -2.240
## raceASIA_22Multi-racial, non-Hispanic         0.25287    0.17568   1.439
## CSHCN_22Non-CSHCN                             0.27980    0.10632   2.632
## nom19ChHSt_22Good                            -0.28958    0.17266  -1.677
## nom19ChHSt_22Fair or poor                    -1.66213    0.41293  -4.025
## age3_226-11 years old                         0.05796    0.11013   0.526
## age3_2212-17 years old                       -0.08783    0.10485  -0.838
## sex_22Female                                  0.02742    0.08562   0.320
##                                              Pr(>|t|)    
## (Intercept)                                  2.63e-05 ***
## NYC_MetroRest of state                       0.108959    
## instype_22Private health insurance only      0.024090 *  
## instype_22Public and private insurance       0.663700    
## instype_22Uninsured                          0.000531 ***
## povlev4_22100-199% FPL                       0.579021    
## povlev4_22200-399% FPL                       0.037675 *  
## povlev4_22400% FPL or greater                0.000346 ***
## AdultEduc_22High school or GED               0.804258    
## AdultEduc_22Some college or technical school 0.720614    
## AdultEduc_22College degree or higher         0.276593    
## raceASIA_22White, non-Hispanic               0.011032 *  
## raceASIA_22Black, non-Hispanic               0.285695    
## raceASIA_22Asian, non-Hispanic               0.025131 *  
## raceASIA_22Multi-racial, non-Hispanic        0.150110    
## CSHCN_22Non-CSHCN                            0.008523 ** 
## nom19ChHSt_22Good                            0.093587 .  
## nom19ChHSt_22Fair or poor                    5.79e-05 ***
## age3_226-11 years old                        0.598728    
## age3_2212-17 years old                       0.402250    
## sex_22Female                                 0.748806    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 1.001092)
## 
## Number of Fisher Scoring iterations: 4

Full adjusted multiple logistic regression model with all covariates.

10% test for full model

b_exposure_max <- coef(fit2)["NYC_MetroRest of state"]
interval_low <- b_exposure_max - 0.10 * abs(b_exposure_max)
interval_high <- b_exposure_max + 0.10 * abs(b_exposure_max)

cat("Exposure coefficient in maximum model:", round(b_exposure_max, 4), "\n")

## Exposure coefficient in maximum model: 0.1466

## Exposure coefficient in maximum model: -3.0688
cat("10% interval: (", round(interval_low, 4), ",", round(interval_high, 4), ")\n\n")

## 10% interval: ( 0.1319 , 0.1613 )

covariates_to_test <- c("instype_22","povlev4_22"
               ,"AdultEduc_22","raceASIA_22","CSHCN_22","nom19ChHSt_22", "sex_22", "age3_22")

assoc_table <- map_dfr(covariates_to_test, \(cov) {
  # Build formula without this covariate
  remaining <- setdiff(covariates_to_test, cov)
  form <- as.formula(paste("MedHome_22 ~ NYC_Metro +", paste(remaining, collapse = " + ")))
mod_reduced <- svyglm(form, design=survey, family=quasibinomial())
  b_reduced <- coef(mod_reduced)["NYC_MetroRest of state"]
  pct_change <- (b_reduced - b_exposure_max) / abs(b_exposure_max) * 100

  tibble(
    `Removed covariate` = cov,
    `Medical Home β (max)` = round(b_exposure_max, 4),
    `Medical Home β (without)` = round(b_reduced, 4),
    `% Change` = round(pct_change, 1),
    `Within 10%?` = ifelse(abs(pct_change) <= 10, "Yes (drop)", "No (keep)"),
    Confounder = ifelse(abs(pct_change) > 10, "Yes", "No")
  )
})

assoc_table |>
  kable(caption = "Associative Model: Systematic Confounder Assessment for Medical Home") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  column_spec(6, bold = TRUE)

Associative Model: Systematic Confounder Assessment for Medical Home
Removed covariate	Medical Home β (max)	Medical Home β (without)	% Change	Within 10%?	Confounder
instype_22	0.1466	0.1417	-3.4	Yes (drop)	No
povlev4_22	0.1466	0.1222	-16.7	No (keep)	Yes
AdultEduc_22	0.1466	0.1319	-10.0	No (keep)	Yes
raceASIA_22	0.1466	0.2929	99.8	No (keep)	Yes
CSHCN_22	0.1466	0.1341	-8.5	Yes (drop)	No
nom19ChHSt_22	0.1466	0.1513	3.2	Yes (drop)	No
sex_22	0.1466	0.1461	-0.4	Yes (drop)	No
age3_22	0.1466	0.1425	-2.8	Yes (drop)	No

10% criteria for confounders on the full model to determine which covariates to keep in the model.

Reduced mulitple regression model

fit2.5 <- svyglm(MedHome_22 ~ NYC_Metro+instype_22+povlev4_22
               +AdultEduc_22+raceASIA_22+CSHCN_22+age3_22+sex_22, 
               design=survey, family=quasibinomial())
tbl_regression(fit2.5, exp = TRUE)

Characteristic	OR	95% CI	p-value
NYC_Metro
NYC metro	—	—
Rest of state	1.16	0.97, 1.39	0.10
instype_22
Public health insurance only	—	—
Private health insurance only	1.34	1.06, 1.70	0.014
Public and private insurance	0.88	0.54, 1.43	0.6
Uninsured	0.38	0.22, 0.66	<0.001
povlev4_22
0-99% FPL	—	—
100-199% FPL	1.13	0.83, 1.55	0.4
200-399% FPL	1.43	1.06, 1.94	0.020
400% FPL or greater	1.86	1.36, 2.55	<0.001
AdultEduc_22
Less than high school	—	—
High school or GED	0.95	0.59, 1.54	0.8
Some college or technical school	1.09	0.68, 1.76	0.7
College degree or higher	1.30	0.81, 2.07	0.3
raceASIA_22
Hispanic	—	—
White, non-Hispanic	1.38	1.09, 1.74	0.008
Black, non-Hispanic	0.83	0.60, 1.16	0.3
Asian, non-Hispanic	0.67	0.49, 0.93	0.016
Multi-racial, non-Hispanic	1.29	0.92, 1.82	0.14
CSHCN_22
CSHCN	—	—
Non-CSHCN	1.45	1.19, 1.79	<0.001
age3_22
0-5 years old	—	—
6-11 years old	1.05	0.85, 1.31	0.6
12-17 years old	0.91	0.74, 1.11	0.4
sex_22
Male	—	—
Female	1.01	0.86, 1.20	0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

summary(fit2.5)

## 
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro + instype_22 + povlev4_22 + 
##     AdultEduc_22 + raceASIA_22 + CSHCN_22 + age3_22 + sex_22, 
##     design = survey, family = quasibinomial())
## 
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC, 
##     data = NSCH_2022_subset_NY_clean_drop)
## 
## Coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                  -1.30705    0.26680  -4.899
## NYC_MetroRest of state                        0.15126    0.09111   1.660
## instype_22Private health insurance only       0.29446    0.11950   2.464
## instype_22Public and private insurance       -0.13070    0.25010  -0.523
## instype_22Uninsured                          -0.96370    0.27968  -3.446
## povlev4_22100-199% FPL                        0.12438    0.15978   0.778
## povlev4_22200-399% FPL                        0.35907    0.15439   2.326
## povlev4_22400% FPL or greater                 0.62291    0.15979   3.898
## AdultEduc_22High school or GED               -0.05116    0.24684  -0.207
## AdultEduc_22Some college or technical school  0.09026    0.24223   0.373
## AdultEduc_22College degree or higher          0.26173    0.23812   1.099
## raceASIA_22White, non-Hispanic                0.31901    0.12051   2.647
## raceASIA_22Black, non-Hispanic               -0.18396    0.16876  -1.090
## raceASIA_22Asian, non-Hispanic               -0.39909    0.16524  -2.415
## raceASIA_22Multi-racial, non-Hispanic         0.25578    0.17479   1.463
## CSHCN_22Non-CSHCN                             0.37493    0.10462   3.584
## age3_226-11 years old                         0.05222    0.11006   0.475
## age3_2212-17 years old                       -0.09709    0.10452  -0.929
## sex_22Female                                  0.01367    0.08519   0.160
##                                              Pr(>|t|)    
## (Intercept)                                  9.98e-07 ***
## NYC_MetroRest of state                       0.096927 .  
## instype_22Private health insurance only      0.013770 *  
## instype_22Public and private insurance       0.601286    
## instype_22Uninsured                          0.000575 ***
## povlev4_22100-199% FPL                       0.436336    
## povlev4_22200-399% FPL                       0.020076 *  
## povlev4_22400% FPL or greater                9.83e-05 ***
## AdultEduc_22High school or GED               0.835820    
## AdultEduc_22Some college or technical school 0.709460    
## AdultEduc_22College degree or higher         0.271778    
## raceASIA_22White, non-Hispanic               0.008146 ** 
## raceASIA_22Black, non-Hispanic               0.275735    
## raceASIA_22Asian, non-Hispanic               0.015767 *  
## raceASIA_22Multi-racial, non-Hispanic        0.143459    
## CSHCN_22Non-CSHCN                            0.000342 ***
## age3_226-11 years old                        0.635157    
## age3_2212-17 years old                       0.352980    
## sex_22Female                                 0.872538    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.9984244)
## 
## Number of Fisher Scoring iterations: 4

Reduced multiple regression model removing some of the covariates from the full model that were found to not meet the 10% confounder criteria.

Interaction reduced model

fit3 <- svyglm(MedHome_22 ~ NYC_Metro*raceASIA_22+instype_22+povlev4_22
               +AdultEduc_22+raceASIA_22+CSHCN_22+age3_22+sex_22, 
               design=survey, family=quasibinomial())
tbl_regression(fit3, exp = TRUE)

Characteristic	OR	95% CI	p-value
NYC_Metro
NYC metro	—	—
Rest of state	1.49	1.02, 2.18	0.041
raceASIA_22
Hispanic	—	—
White, non-Hispanic	1.48	1.08, 2.02	0.014
Black, non-Hispanic	1.08	0.73, 1.62	0.7
Asian, non-Hispanic	0.76	0.52, 1.10	0.14
Multi-racial, non-Hispanic	1.24	0.77, 2.01	0.4
instype_22
Public health insurance only	—	—
Private health insurance only	1.34	1.06, 1.70	0.013
Public and private insurance	0.89	0.55, 1.46	0.6
Uninsured	0.39	0.23, 0.68	<0.001
povlev4_22
0-99% FPL	—	—
100-199% FPL	1.12	0.82, 1.54	0.5
200-399% FPL	1.42	1.05, 1.93	0.021
400% FPL or greater	1.85	1.36, 2.53	<0.001
AdultEduc_22
Less than high school	—	—
High school or GED	0.97	0.60, 1.57	0.9
Some college or technical school	1.11	0.69, 1.79	0.7
College degree or higher	1.33	0.84, 2.12	0.2
CSHCN_22
CSHCN	—	—
Non-CSHCN	1.46	1.19, 1.79	<0.001
age3_22
0-5 years old	—	—
6-11 years old	1.05	0.84, 1.30	0.7
12-17 years old	0.91	0.74, 1.11	0.4
sex_22
Male	—	—
Female	1.02	0.87, 1.21	0.8
NYC_Metro * raceASIA_22
Rest of state * White, non-Hispanic	0.79	0.50, 1.24	0.3
Rest of state * Black, non-Hispanic	0.39	0.20, 0.76	0.006
Rest of state * Asian, non-Hispanic	0.66	0.32, 1.38	0.3
Rest of state * Multi-racial, non-Hispanic	0.96	0.48, 1.90	0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

summary(fit3)

## 
## Call:
## svyglm(formula = MedHome_22 ~ NYC_Metro * raceASIA_22 + instype_22 + 
##     povlev4_22 + AdultEduc_22 + raceASIA_22 + CSHCN_22 + age3_22 + 
##     sex_22, design = survey, family = quasibinomial())
## 
## Survey design:
## svydesign(id = ~HHID, strata = ~FIPSST + STRATUM, weights = ~FWC, 
##     data = NSCH_2022_subset_NY_clean_drop)
## 
## Coefficients:
##                                                              Estimate
## (Intercept)                                                  -1.40748
## NYC_MetroRest of state                                        0.39726
## raceASIA_22White, non-Hispanic                                0.39281
## raceASIA_22Black, non-Hispanic                                0.08152
## raceASIA_22Asian, non-Hispanic                               -0.27997
## raceASIA_22Multi-racial, non-Hispanic                         0.21770
## instype_22Private health insurance only                       0.29447
## instype_22Public and private insurance                       -0.11389
## instype_22Uninsured                                          -0.93474
## povlev4_22100-199% FPL                                        0.11644
## povlev4_22200-399% FPL                                        0.35401
## povlev4_22400% FPL or greater                                 0.61712
## AdultEduc_22High school or GED                               -0.03459
## AdultEduc_22Some college or technical school                  0.10872
## AdultEduc_22College degree or higher                          0.28699
## CSHCN_22Non-CSHCN                                             0.37586
## age3_226-11 years old                                         0.04559
## age3_2212-17 years old                                       -0.09728
## sex_22Female                                                  0.02214
## NYC_MetroRest of state:raceASIA_22White, non-Hispanic        -0.23755
## NYC_MetroRest of state:raceASIA_22Black, non-Hispanic        -0.94466
## NYC_MetroRest of state:raceASIA_22Asian, non-Hispanic        -0.41063
## NYC_MetroRest of state:raceASIA_22Multi-racial, non-Hispanic -0.04570
##                                                              Std. Error t value
## (Intercept)                                                     0.27406  -5.136
## NYC_MetroRest of state                                          0.19412   2.046
## raceASIA_22White, non-Hispanic                                  0.15919   2.468
## raceASIA_22Black, non-Hispanic                                  0.20300   0.402
## raceASIA_22Asian, non-Hispanic                                  0.19119  -1.464
## raceASIA_22Multi-racial, non-Hispanic                           0.24457   0.890
## instype_22Private health insurance only                         0.11904   2.474
## instype_22Public and private insurance                          0.24962  -0.456
## instype_22Uninsured                                             0.27897  -3.351
## povlev4_22100-199% FPL                                          0.15935   0.731
## povlev4_22200-399% FPL                                          0.15390   2.300
## povlev4_22400% FPL or greater                                   0.15959   3.867
## AdultEduc_22High school or GED                                  0.24638  -0.140
## AdultEduc_22Some college or technical school                    0.24194   0.449
## AdultEduc_22College degree or higher                            0.23765   1.208
## CSHCN_22Non-CSHCN                                               0.10473   3.589
## age3_226-11 years old                                           0.11017   0.414
## age3_2212-17 years old                                          0.10419  -0.934
## sex_22Female                                                    0.08499   0.261
## NYC_MetroRest of state:raceASIA_22White, non-Hispanic           0.23164  -1.026
## NYC_MetroRest of state:raceASIA_22Black, non-Hispanic           0.34449  -2.742
## NYC_MetroRest of state:raceASIA_22Asian, non-Hispanic           0.37471  -1.096
## NYC_MetroRest of state:raceASIA_22Multi-racial, non-Hispanic    0.34967  -0.131
##                                                              Pr(>|t|)    
## (Intercept)                                                  2.93e-07 ***
## NYC_MetroRest of state                                       0.040769 *  
## raceASIA_22White, non-Hispanic                               0.013641 *  
## raceASIA_22Black, non-Hispanic                               0.688025    
## raceASIA_22Asian, non-Hispanic                               0.143158    
## raceASIA_22Multi-racial, non-Hispanic                        0.373434    
## instype_22Private health insurance only                      0.013405 *  
## instype_22Public and private insurance                       0.648241    
## instype_22Uninsured                                          0.000813 ***
## povlev4_22100-199% FPL                                       0.465000    
## povlev4_22200-399% FPL                                       0.021481 *  
## povlev4_22400% FPL or greater                                0.000112 ***
## AdultEduc_22High school or GED                               0.888366    
## AdultEduc_22Some college or technical school                 0.653176    
## AdultEduc_22College degree or higher                         0.227255    
## CSHCN_22Non-CSHCN                                            0.000336 ***
## age3_226-11 years old                                        0.679026    
## age3_2212-17 years old                                       0.350507    
## sex_22Female                                                 0.794490    
## NYC_MetroRest of state:raceASIA_22White, non-Hispanic        0.305176    
## NYC_MetroRest of state:raceASIA_22Black, non-Hispanic        0.006128 ** 
## NYC_MetroRest of state:raceASIA_22Asian, non-Hispanic        0.273204    
## NYC_MetroRest of state:raceASIA_22Multi-racial, non-Hispanic 0.896026    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.9959456)
## 
## Number of Fisher Scoring iterations: 4

Reduced multiple regression model with the inclusion of a racial effect modifier.

Model diagnostics: Deviance Residuals

plot(residuals(fit3, type="deviance"))

This plot shows the deviance of residuals. There appears to be an even spread of residuals around zero, which shows normality among residuals. A clear pattern could represent non-normal residuals. A “fan shape” (spread increasing with fitted values) indicates heteroscedasticity.

Model diagnostics: Predicted Probabilities

pp <- predict_response(fit3, terms = "NYC_Metro")
plot(pp)

This plot shows the predicted probabilities of medical home percentage based on region. The point represents the predicted odds ratio and line signifies the confidence interval.

Model diagnostics: Influential Observations/ Residuals vs. Leverage

cooks_d <- cooks.distance(fit3)

influence_df <- data.frame(
  observation = 1:length(cooks_d),
  cooks_d = cooks_d
) %>%
  mutate(influential = ifelse(cooks_d > 1, "Yes", "No"))

p5 <- ggplot(influence_df, aes(x = observation, y = cooks_d, color = influential)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
  labs(
    title = "Cook's Distance: Identifying Influential Observations",
    subtitle = "Values > 1 indicate potentially influential observations",
    x = "Observation Number",
    y = "Cook's Distance",
    color = "Influential?"
  ) +
  scale_color_manual(values = c("No" = "steelblue", "Yes" = "red")) +
  theme_minimal(base_size = 12)

ggplotly(p5)

This Cook’s distance plot looks for influential observations at a 1.0 threshold. The plot shows that individual observations don’t overly influence the model. An observation reaching the red line would be categorized as an influential observation.

Forest plot

tidy(fit3, conf.int = TRUE) %>%
  filter(term != "(Intercept)") %>%
  mutate(
    term = fct_reorder(term, estimate),
    sig  = ifelse(p.value < 0.05, "Significant (p < 0.05)", "Non-significant")
  ) %>%
  ggplot(aes(x = estimate, y = term, color = sig)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray60") +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.25, linewidth = 0.9) +
  geom_point(size = 3.5) +
  scale_color_manual(values = c("Significant (p < 0.05)" = "steelblue",
                                "Non-significant" = "tomato")) +
  labs(
    title    = "Partial Regression Coefficients with 95% Confidence Intervals",
    subtitle = "Outcome: Medical Home (NSCH 2022, n = 4,430)",
    x        = "Estimated Change in Medical Home Percentage (β̂)",
    y        = NULL,
    color    = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top")

## Warning: `geom_errorbarh()` was deprecated in ggplot2 4.0.0.
## ℹ Please use the `orientation` argument of `geom_errorbar()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `height` was translated to `width`.

This forest plot shows which covariates are significant in the adjusted and effect modified model. As determined from the model, our primary exposure, New York State region, is significantly associated with medical home in this model. The results also show that those in the highest income bracket and second highest (>400% FPL and 200-399%, respectively) are significantly associated with higher odds of having a medical home. In addition, both white, non-Hispanic and black, non-Hispanic groups had significant results, although white children were found to have improved medical home associations and black children were found to have an effect modification with region. Those without special health care needs were also shown to have significantly greater odds of having a medical home. Finally, insurance is significantly associated with medical home with private insurance having greater odds of medical home possession and uninsured having worse odds.

EPI553_FinalProject_CheckIn2_GoldmanMatthew

Matthew Goldman

2026-04-07