Project Overview

Motivation

COVID-19 exacerbated health inequality, and public health issues have received more attention since.
According to the Community Service Society, more than 1 million New Yorkers remained uninsured in 2019, and New York ranked seventh among states on coverage. Among the uninsured:
- 34.5% are eligible for public insurance programs
- 42.1% have access through employer-sponsored or self-purchased plans but are not enrolled because of “cost, low perceived value, or other reasons”
- 24.5% do not have access to public health insurance programs because of their immigration status (NYC Care offers health coverage options for this group; those who are not eligible for any coverage option may be eligible to receive services under the NYC Health + Hospitals Options program)
Access to health insurance is crucial for a person’s physical well-being and longevity. A healthy person can be more productive in the economy, provide financial and emotional support to their family, and contribute to their community.
Health insurance in the US, however, is not universal and can be very expensive, especially for people who do not have a job or whose job does not offer health care benefits. Unexpected medical bills are extremely burdensome for uninsured and financially disadvantaged individuals and families.
Children, who usually depend on their parents for access to health insurance, may be more vulnerable to health issues if their parents lack coverage.

Data

American Community Survey (ACS) 2015-2019 5-year estimates

Topic

Which independent variables correlate with people’s decision to enroll in health insurance? Independent variables: race, median household income, employment sector, educational attainment, place of origin/citizenship, and poverty.
How can free health insurance reach the target population, such as people who cannot afford coverage or are ineligible for any public program? How can linear regression analysis help identify the neighborhoods most in need?

Methodology

In the first part, the study examines correlations between access to health coverage and socioeconomic factors in the Bronx, Queens, Kings (Brooklyn), and New York (Manhattan) counties of NYC at the census tract level. Lack of health insurance may be positively associated with lower household income/poverty, minority race, employment sector, lower educational attainment, and, for the foreign-born population, citizenship status. The study also examines these factors’ correlations with public insurance.
In the second part, the study investigates correlations between children’s (under 18) access to health coverage and socioeconomic factors in the same four counties at the census tract level. Children’s lack of health insurance may be positively associated with lower household income/poverty, minority race, school enrollment, employment status (whether parents are employed), and, for the foreign-born population, citizenship status. The study also examines these factors’ correlations with public insurance.
I converted all the data except median household income into percentages of the population; occupation and employment shares use the civilian labor force aged 16 and over as the denominator.

Limitations:

There are different forms of health insurance. This study does not examine which type people use or how much they contribute toward it.
One may gain access to insurance through a family member. This study does not measure who provides insurance plans for dependents.
This study uses data from before the COVID-19 pandemic and therefore does not address the many economic and policy-related changes and challenges that arose after 2020.

Part I: Health Insurance Access

Setting up working environment

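library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(broom)      # tidy() for per-county model tables
library(scales)     # percent()
library(ggpubr)     # stat_regline_equation()

health_ins_dat <- read_csv("Final_project_data_edited_2.csv")
to_add_dat <- read_csv("data_country_origin.csv")
to_add_dat_2 <- read_csv("data_poverty_short.csv")

# Attach country-of-origin and poverty columns by census tract ID
health_ins_dat <- health_ins_dat |>
  left_join(to_add_dat, by = "Geo_GEOID") |>
  left_join(to_add_dat_2, by = "Geo_GEOID")

#glimpse(health_ins_dat)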

Set up data as percentages of population & analysis environment.
Here I convert all the study variables into percentages of the population. Census tracts with no population, no reported median household income, or no county are removed.
For the race data, I grouped Total_Pop_Some_Other_Race_Alone, Total_Pop_Native_Hawaiian_and_Other_Pacific_Islander_Alone, and Total_Pop_American_Indian_and_Alaska_Native_Alone into one category (other races).
For the education data, I grouped bachelor’s, master’s, professional, and doctorate degrees into one category because they all represent higher education. A high school diploma or less usually poses greater barriers to work opportunities and income.
For the occupation data, I grouped construction, extraction, and maintenance occupations with production, transportation, and material moving occupations into one category (Farm & Industrial).
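Dropping zero-population tracts matters because R’s `/` returns `NaN` for 0/0 and `Inf` for a positive count divided by zero, and those values would propagate into the means and regressions below. A minimal sketch of guarded division, using a hypothetical `share()` helper that is not part of the original analysis:

```r
# Hypothetical helper (not part of the original analysis): convert counts into
# shares of a total, returning NA wherever the total is zero or missing.
share <- function(count, total) {
  out <- rep(NA_real_, length(total))
  ok <- !is.na(total) & total > 0
  out[ok] <- count[ok] / total[ok]
  out
}

# Plain division gives NaN for 0/0 and Inf for a positive count over zero
print(c(0 / 0, 5 / 0))

# The guarded version marks those tracts as NA instead
print(share(c(5, 0, 3), c(10, 0, 0)))
```

Filtering out zero-population tracts up front, as done here, protects every share that uses `Total_Pop` as its denominator; a guard like this would still be needed for other denominators such as `Pop_Under_18`.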

# Keep tracts with a nonzero population, a nonzero median household income, and a known county
health_ins_dat <- health_ins_dat|>filter(Total_Pop != 0,
                                         Median_Household_Income_2019 !=0,
                                         !is.na(Geo_County))

# % with health insurance in total population; % with public health insurance; % with private health insurance
health_ins_dat<- health_ins_dat|>
  mutate(perc_with_ins = Total_with_Health_Insurance_Coverage/Total_Pop,
         perc_with_ins_public = Total_with_Health_Insurance_Coverage_Public_Health_Coverage/Total_Pop,
         perc_with_ins_private = Total_with_Health_Insurance_Coverage_Private_Health_Insurance/Total_Pop)

#Part I
#1. Race % establish percentage of population for each race 
health_ins_dat<- health_ins_dat|>
  mutate(
    #Black
    perc_Pop_Black=Total_Pop_Black_or_African_American_Alone/Total_Pop,
    #Asian
    perc_Pop_Asian=Total_Pop_Asian_Alone/Total_Pop,
    #Hispanic
    perc_Pop_Hispanic_Latino=Total_Pop_Hispanic_Latino/Total_Pop,
    #other races
    perc_Pop_Other_Races=(Total_Pop_Some_Other_Race_Alone+
                            Total_Pop_Native_Hawaiian_and_Other_Pacific_Islander_Alone+
                            Total_Pop_American_Indian_and_Alaska_Native_Alone)/Total_Pop,
    #two or more races 
    perc_Pop_More_Races=Total_Pop_Two_or_More_Races/Total_Pop,
    #white
    perc_White=Total_Pop_White_Alone/Total_Pop,
    perc_NonWhite=1-perc_White)

#2. Educational attainment % in each census tract (counts cover ages 25+, but the denominator is Total_Pop)
health_ins_dat<- health_ins_dat|>
  mutate(perc_less_HS = as.numeric(Pop_25yrs_Less_than_High_School)/Total_Pop,
         perc_HS = as.numeric(Pop_25yrs_High_School_Graduate_Includes_Equivalency)/Total_Pop,
         perc_HS_or_Less = perc_less_HS+perc_HS,
         perc_College= as.numeric(Pop_25yrs_Some_College)/Total_Pop,
         perc_Bachelor_More = (as.numeric(Pop_25yrs_Bachelor_Degree)+as.numeric(Pop_25yrs_Master_Degree)+
                                 as.numeric(Pop_25yrs_Professional_School_Degree)+
                                 as.numeric(Pop_25yrs_Doctorate_Degree))/Total_Pop)

#3. Occupation %: 1. Management & Professional 2. Services 3. Sales & Office 4. Farm & Industrial
# equation: %of labor force in sector over 16yo: sector_employment/Civilian_Pop_in_Labor_Force_16_and_Over
health_ins_dat<-health_ins_dat|>
  mutate(perc_Management_Professional =
           Employed_Civilian_Pop_16_and_Over_Management_Professional_and_Related_Occupations/Civilian_Pop_in_Labor_Force_16_and_Over,
         perc_Service = 
           Employed_Civilian_Pop_16_and_Over_Service_Occupations/Civilian_Pop_in_Labor_Force_16_and_Over,
         perc_Sales = Employed_Civilian_Pop_16_and_Over_Sales_and_Office_Occupations/Civilian_Pop_in_Labor_Force_16_and_Over,
         # Construction/extraction/maintenance plus production/transportation/material moving.
         # Judging by their names, the _Production_Occupations and
         # _Transportation_and_Material_Moving_Occupations columns are subcategories of
         # the latter, so adding them as well would count those workers twice.
         perc_Farm_Industrial = (Employed_Civilian_Pop_16_and_Over_Construction_Extraction_and_Maintenance_Occupations+
                                 Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations)/
                                Civilian_Pop_in_Labor_Force_16_and_Over)
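Adding a parent occupation column to its own subcategory columns counts the same workers twice. A toy illustration with made-up counts for one tract, assuming (as the column names suggest) that the production and transportation/material moving subcategories sum to the Production_Transportation_and_Material_Moving_Occupations total:

```r
# Made-up counts for one tract, for illustration only
construction      <- 120
prod_transp_total <- 300  # parent: production, transportation, material moving
production        <- 180  # subcategory of the parent
transp_moving     <- 120  # subcategory of the parent
labor_force       <- 1000

# Summing the parent together with its subcategories counts 300 workers twice
double_counted <- (construction + prod_transp_total + production + transp_moving) / labor_force

# Using only the top-level categories counts each worker once
correct <- (construction + prod_transp_total) / labor_force

print(c(double_counted = double_counted, correct = correct))
```

Here the double-counted share (0.72) overstates the top-level share (0.42) by exactly the parent total divided by the labor force.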

#4. Foreign-born population, citizenship, and poverty %
health_ins_dat <-health_ins_dat|>
  mutate(
    perc_Origin_Native_Born = Origin_Native_Born/Total_Pop,
    perc_Origin_FO=(Origin_Foreign_Born)/Total_Pop,
    perc_Origin_FO_Citizen = (Origin_Foreign_Born-Origin_Foreign_Origin_Not_a_Citizen)/Total_Pop,
    perc_Origin_FO_NCitizen = Origin_Foreign_Origin_Not_a_Citizen/Total_Pop,
    perc_Poverty = Pop_Poverty/Total_Pop,
    perc_Non_Poverty = 1-perc_Poverty)

#Part II
#Children's insurance access % 
health_ins_dat <- health_ins_dat |>
  mutate(perc_children_in_school = (Enrolled_In_School_Enrolled_In_Nursery_School_Preschool +
                                    Enrolled_In_School_Enrolled_In_Kindergarten +
                                    Enrolled_In_School_Enrolled_In_Grade_1_To_Grade_4 +
                                    Enrolled_In_School_Enrolled_In_Grade_5_To_Grade_8 +
                                    Enrolled_In_School_Enrolled_In_Grade_9_To_Grade_12)/Pop_Under_18,
         # adults = total population minus children under 18
         perc_adult_with_ins = (Total_with_Health_Insurance_Coverage - Pop_Under_18_with_Health_Insurance_Coverage)/
           (Total_Pop - Pop_Under_18),
         perc_adult_without_ins = 1 - perc_adult_with_ins,
         perc_children_with_ins = Pop_Under_18_with_Health_Insurance_Coverage/Pop_Under_18,
         perc_children_without_ins = 1 - perc_children_with_ins,
         perc_children_with_ins_public = Pop_Under_18_with_Health_Insurance_Coverage_Public_Health_Coverage/
           Pop_Under_18,
         perc_children_with_single_parents = Children_Living_with_Single_Parents/Pop_Under_18,
         perc_employed_16_over = Civilian_Pop_in_Labor_Force_16_and_Over_Employed/
           Civilian_Pop_in_Labor_Force_16_and_Over,
         perc_unemployed_16_over = 1 - perc_employed_16_over,
         perc_male_employed_16_over = Civilian_Male_in_Labor_Force_16_and_Over_Employed/
           Civilian_Male_in_Labor_Force_16_and_Over,
         perc_female_employed_16_over = Civilian_Female_in_Labor_Force_16_and_Over_Employed/
           Civilian_Female_in_Labor_Force_16_and_Over)
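The adult coverage share is not read directly from the data; it is derived by subtracting the under-18 counts from the totals. A small worked example with made-up tract totals shows the arithmetic:

```r
# Made-up tract totals, for illustration only
total_pop       <- 4000
total_insured   <- 3700
under18_pop     <- 1000
under18_insured <- 970

# Adults are everyone not under 18, so subtract the child counts first
adult_insured_share <- (total_insured - under18_insured) / (total_pop - under18_pop)
child_insured_share <- under18_insured / under18_pop

print(round(c(adults = adult_insured_share, children = child_insured_share), 4))
```

In this example 2,730 of 3,000 adults (91%) and 970 of 1,000 children (97%) are insured, even though the combined share is 92.5%.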

#Bronx, Queens, Kings, and New York
Bronx <- health_ins_dat|>
  filter(Geo_County=="Bronx")
Queens <- health_ins_dat|>
  filter(Geo_County=="Queens")
Kings <- health_ins_dat|>
  filter(Geo_County=="Kings")
NewYork <- health_ins_dat|>
  filter(Geo_County=="New_York")
#table(health_ins_dat$Geo_County)

#write.csv(health_ins_dat, "Export.csv", row.names=TRUE)

General Summary

First, I create a table summarizing average statistics for the Bronx, Queens, Kings, and New York (Manhattan) counties: % of population with health insurance, % non-white population, % of population with less than a bachelor’s degree, median household income, % of the labor force in occupations other than management and professional (the highest-paying occupation category), and % of the population that is foreign-born and not a citizen. Each value is an unweighted mean across the county’s census tracts, so MedianIncome is the mean of tract median incomes rather than a county median. Because the education shares use total population as the denominator, LessThanBachelorDegree also counts residents under 25.

#Find the summarized data Race, Education Attainment, Median Household Income, and Occupation about each county 
health_ins_dat|>
  group_by(Geo_County)|>
  summarise(HaveHealthInsurance= percent(round(mean(perc_with_ins),3)),
            NonWhitePopulation = percent(round(1-mean(perc_White),3)),
            LessThanBachelorDegree = percent(round(1-mean(perc_Bachelor_More),3)),
            MedianIncome = round(mean(Median_Household_Income_2019),3),
            NonManagementProfessional = percent(round(1-mean(perc_Management_Professional),3)),
            ForeignOriginNotCitizens = percent(round(mean(perc_Origin_FO_NCitizen),3))
            )

## # A tibble: 4 × 7
##   Geo_County HaveHealthInsurance NonWhitePopul…¹ LessT…² Media…³ NonMa…⁴ Forei…⁵
##   <chr>      <chr>               <chr>           <chr>     <dbl> <chr>   <chr>  
## 1 Bronx      91%                 89%             86%      54089. 76%     17%    
## 2 Kings      92%                 63%             74%      80246. 61%     14%    
## 3 New_York   94%                 52%             52%     142021. 43%     14%    
## 4 Queens     90%                 74%             76%      82765. 67%     19%    
## # … with abbreviated variable names ¹NonWhitePopulation,
## #   ²LessThanBachelorDegree, ³MedianIncome, ⁴NonManagementProfessional,
## #   ⁵ForeignOriginNotCitizens
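The table averages tract-level shares without weighting by population, so a small tract counts as much as a large one. A minimal sketch with two made-up tracts shows how far the unweighted mean can drift from the population-weighted share:

```r
# Two made-up tracts: a large one and a small one
tracts <- data.frame(
  Total_Pop = c(1000, 100),
  insured   = c(900, 50)
)
tracts$perc_with_ins <- tracts$insured / tracts$Total_Pop  # 0.90 and 0.50

# Unweighted mean of tract shares, as in the summary table
unweighted <- mean(tracts$perc_with_ins)

# Population-weighted share, equivalent to total insured over total population
weighted <- weighted.mean(tracts$perc_with_ins, w = tracts$Total_Pop)

print(round(c(unweighted = unweighted, weighted = weighted), 4))
```

With the real data, a county-wide share could be computed inside `summarise()` as `sum(Total_with_Health_Insurance_Coverage) / sum(Total_Pop)`; which version is appropriate depends on whether the question is about typical tracts or about residents.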

#Plot health insurance access for each county
health_ins_dat%>%
  ggplot(aes(x = Geo_County, y = perc_with_ins, fill = Geo_County)) +
  geom_boxplot(outlier.size = 0.8, outlier.fill = "grey", outlier.alpha = .2) +
  scale_fill_brewer(palette = "RdBu") +
  scale_y_continuous(labels = percent) +
  theme_minimal() +
  labs(title = "Health Insurance Access for Each County",
       subtitle = "% of population with health insurance",
       caption = "Data source: ACS 2015-2019",
       y = "% with health insurance", x = "County",
       tag = "Summary",
       fill = "County")

#Plot health insurance access for each county - public insurance
health_ins_dat%>%
  ggplot(aes(x = Geo_County, y = perc_with_ins_public, fill = Geo_County)) +
  geom_boxplot(outlier.size = 0.8, outlier.fill = "grey", outlier.alpha = .2) +
  scale_fill_brewer(palette = "RdBu") +
  scale_y_continuous(labels = percent) +
  theme_minimal() +
  labs(title = "Health Insurance Access for Each County - Public Insurance",
       subtitle = "% of population with public health insurance",
       caption = "Data source: ACS 2015-2019",
       y = "% with public health insurance", x = "County",
       tag = "Summary",
       fill = "County")

The first plot above shows that New York County (Manhattan) has the highest median share of population with health insurance at the census tract level, followed by Kings, Queens, and the Bronx.
Queens and Kings appear to have more outlier tracts, i.e., tracts lying more than 1.5 × IQR below the first quartile or above the third quartile.
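ggplot2’s `geom_boxplot()` draws a point as an outlier when it lies more than 1.5 × IQR beyond the box (its default `coef = 1.5`, with quartiles from `quantile()`). A small sketch of that rule, using a hypothetical `flag_outliers()` helper and toy values:

```r
# Hypothetical helper: flag values beyond coef * IQR from the quartiles,
# mirroring ggplot2's default boxplot whisker rule
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

x <- c(1:9, 50)                  # toy tract values
print(which(flag_outliers(x)))   # position of the outlying value
```

Here the quartiles are 3.25 and 7.75, so anything above 14.5 is flagged; only the value 50 qualifies.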

1: Race and Health Insurance

Data Exploration

Before exploring the correlation between race and health insurance, I look at the racial composition of each county by plotting the distribution of the non-white population share across census tracts.

#Plot Non-white population distribution for each county
health_ins_dat%>%
  ggplot(aes(x = Geo_County, y = perc_NonWhite, fill = Geo_County)) +
  geom_boxplot(outlier.size = 0.8, outlier.fill = "grey", outlier.alpha = .2) +
  scale_fill_brewer(palette = "RdBu") +
  scale_y_continuous(labels = percent) +
  theme_minimal() +
  labs(title = "Non-White Population %",
       subtitle = "at the census tract level",
       caption = "Data source: ACS 2015-2019",
       y = "% non-white population", x = "County",
       tag = "Summary",
       fill = "County")

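* Comparing this plot with the health insurance plot, the county ordering is reversed: the Bronx has the highest median non-white share, followed by Queens, Kings, and New York, while the median insured share is highest in New York and lowest in the Bronx. This county-level pattern suggests an inverse relationship between % non-white population and % of population with health insurance, which the regressions below examine at the tract level.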
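Comparing four county medians is only suggestive; a tract-level check would compute the correlation within each county. A minimal sketch on made-up toy data standing in for `health_ins_dat` (same column names, invented values):

```r
# Made-up toy data standing in for health_ins_dat
toy <- data.frame(
  Geo_County    = rep(c("Bronx", "Queens"), each = 4),
  perc_NonWhite = c(0.95, 0.90, 0.80, 0.70, 0.85, 0.75, 0.60, 0.50),
  perc_with_ins = c(0.88, 0.89, 0.91, 0.93, 0.87, 0.89, 0.91, 0.94)
)

# Pearson correlation between the two shares, computed within each county
by_county <- sapply(split(toy, toy$Geo_County),
                    function(d) cor(d$perc_NonWhite, d$perc_with_ins))
print(round(by_county, 3))
```

On the real data the same `split()`/`sapply()` pattern applies directly to `health_ins_dat`; a negative value within each county would support the inverse relationship at the tract level rather than only across counties.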

Linear Regression - All Counties

Next, I probe the relationship between race and health insurance access through linear regression. perc_White is left out of the model as the reference category, so each race coefficient is interpreted relative to the white population share.

#Linear regression: Race and Health Insurance Access
model_race <-lm(perc_with_ins ~ perc_Pop_Black +
                                perc_Pop_Asian+
                                perc_Pop_Hispanic_Latino+
                                perc_Pop_More_Races+
                                perc_Pop_Other_Races,
                                #perc_White
                                data = health_ins_dat)
summary(model_race)

## 
## Call:
## lm(formula = perc_with_ins ~ perc_Pop_Black + perc_Pop_Asian + 
##     perc_Pop_Hispanic_Latino + perc_Pop_More_Races + perc_Pop_Other_Races, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44761 -0.02060  0.00804  0.02998  0.12641 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.978158   0.003337 293.104   <2e-16 ***
## perc_Pop_Black           -0.041895   0.004537  -9.235   <2e-16 ***
## perc_Pop_Asian           -0.127995   0.007468 -17.139   <2e-16 ***
## perc_Pop_Hispanic_Latino -0.120429   0.005298 -22.731   <2e-16 ***
## perc_Pop_More_Races       0.069670   0.056065   1.243    0.214    
## perc_Pop_Other_Races     -0.023254   0.037849  -0.614    0.539    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04827 on 1970 degrees of freedom
## Multiple R-squared:  0.258,  Adjusted R-squared:  0.2561 
## F-statistic:   137 on 5 and 1970 DF,  p-value: < 2.2e-16

#Linear regression: Race and Public Health Insurance Access
model_race_public <- lm(perc_with_ins_public~
        perc_Pop_Black +
        perc_Pop_Asian+
        perc_Pop_Hispanic_Latino+
        perc_Pop_More_Races+
        perc_Pop_Other_Races,
        data = health_ins_dat)

summary(model_race_public)

## 
## Call:
## lm(formula = perc_with_ins_public ~ perc_Pop_Black + perc_Pop_Asian + 
##     perc_Pop_Hispanic_Latino + perc_Pop_More_Races + perc_Pop_Other_Races, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34283 -0.08728 -0.01590  0.06859  0.62325 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.279272   0.008919  31.311  < 2e-16 ***
## perc_Pop_Black            0.156312   0.012125  12.892  < 2e-16 ***
## perc_Pop_Asian            0.166915   0.019960   8.362  < 2e-16 ***
## perc_Pop_Hispanic_Latino  0.363896   0.014160  25.700  < 2e-16 ***
## perc_Pop_More_Races      -0.821163   0.149842  -5.480 4.79e-08 ***
## perc_Pop_Other_Races      0.127229   0.101157   1.258    0.209    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.129 on 1970 degrees of freedom
## Multiple R-squared:  0.3044, Adjusted R-squared:  0.3026 
## F-statistic: 172.4 on 5 and 1970 DF,  p-value: < 2.2e-16

The first regression above (model_race) relates the share of each minority group to the share of the population with health insurance, with the white share as the reference, across census tracts in all four counties. The Black, Asian, and Hispanic/Latino coefficients are significant at the 0.1% level (p < 2e-16); the two-or-more-races and other-races coefficients are not significant. The multiple R-squared is 0.258, so the model explains about 25.8% of the variance in tract-level coverage (for the public insurance model, R-squared is 0.3044).
Writing $Y$ for the % of population with health insurance and $W, B, A, H, M, O$ for the white, Black, Asian, Hispanic/Latino, two-or-more-races, and other-races shares, and assuming these shares are mutually exclusive and sum to one:
$$Y = \beta_W W + \beta_B B + \beta_A A + \beta_H H + \beta_M M + \beta_O O$$
$$W = 1 - (B + A + H + M + O)$$
$$Y = \beta_W + (\beta_B - \beta_W) B + (\beta_A - \beta_W) A + (\beta_H - \beta_W) H + (\beta_M - \beta_W) M + (\beta_O - \beta_W) O$$
So the intercept estimates $\beta_W$ (0.978), and each slope $\alpha_k = \beta_k - \beta_W$ compares a group with the white population:
- $\alpha_B = \beta_B - \beta_W = -0.041895$: a 1-percentage-point shift from the white share to the Black share is associated with a 0.042-percentage-point decrease in the % of population with health insurance
- $\alpha_A = \beta_A - \beta_W = -0.127995$: the same shift to the Asian share is associated with a 0.128-percentage-point decrease
- $\alpha_H = \beta_H - \beta_W = -0.120429$: the same shift to the Hispanic or Latino share is associated with a 0.120-percentage-point decrease
- $\alpha_M = \beta_M - \beta_W = 0.069670$ (not significant)
- $\alpha_O = \beta_O - \beta_W = -0.023254$ (not significant)
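The reference-category reading of the coefficients rests on the identity that the omitted white share equals one minus the other shares. A simulation sketch with three made-up groups and invented coverage rates shows that, when that identity holds, `lm()` with the reference group omitted returns the reference rate as the intercept and the differences from it as slopes. Whether the identity holds exactly in this dataset depends on whether the race columns exclude Hispanic or Latino residents, which the data description does not say.

```r
set.seed(1)
n <- 200

# Simulated shares for three mutually exclusive groups that sum to one
raw <- matrix(rexp(3 * n), ncol = 3)
shares <- raw / rowSums(raw)
W <- shares[, 1]; B <- shares[, 2]; A <- shares[, 3]

# Invented group-level coverage rates; the true model has no intercept
beta_W <- 0.98; beta_B <- 0.94; beta_A <- 0.85
Y <- beta_W * W + beta_B * B + beta_A * A

# Omit W (the reference group) and fit with an intercept
fit <- lm(Y ~ B + A)
print(round(coef(fit), 4))
# Intercept recovers beta_W; slopes recover beta_B - beta_W and beta_A - beta_W
```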

Linear Regression - Each County Separately

#linear model for each county - insurance
model_ins_race_by_County <- health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins ~  perc_Pop_Black +
                              perc_Pop_Asian+
                              perc_Pop_Hispanic_Latino+
                              perc_Pop_More_Races+
                              perc_Pop_Other_Races,
                              #perc_White
                              data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"))|>
  rename('race'='term')

model_ins_race_by_County

## # A tibble: 24 × 6
## # Groups:   Geo_County [4]
##    Geo_County race                     estimate std.error   p.value significan…¹
##    <chr>      <chr>                       <dbl>     <dbl>     <dbl> <chr>       
##  1 Bronx      (Intercept)               0.935     0.0134  9.98e-196 yes         
##  2 Bronx      perc_Pop_Black           -0.00492   0.0157  7.54e-  1 no          
##  3 Bronx      perc_Pop_Asian            0.0271    0.0522  6.03e-  1 no          
##  4 Bronx      perc_Pop_Hispanic_Latino -0.0454    0.0154  3.51e-  3 yes         
##  5 Bronx      perc_Pop_More_Races      -0.175     0.215   4.15e-  1 no          
##  6 Bronx      perc_Pop_Other_Races      0.00719   0.114   9.50e-  1 no          
##  7 Kings      (Intercept)               0.967     0.00473 0         yes         
##  8 Kings      perc_Pop_Black           -0.0332    0.00591 2.78e-  8 yes         
##  9 Kings      perc_Pop_Asian           -0.0877    0.0124  3.01e- 12 yes         
## 10 Kings      perc_Pop_Hispanic_Latino -0.120     0.0102  1.67e- 29 yes         
## # … with 14 more rows, and abbreviated variable name ¹`significant?`

Notable significant results:
- In Kings and New York, a 1-percentage-point shift from the white share to the Hispanic and Latino share is associated with decreases of about 0.12 and 0.10 percentage points, respectively, in the % of population with health insurance
- In Queens, the same shift to the Hispanic and Latino share is associated with a decrease of about 0.22 percentage points, and a shift to the Asian share with a decrease of about 0.16 percentage points
Thoughts: Kings, New York, and Queens counties may want to focus on improving insurance access for the Hispanic or Latino population, and Queens should pay attention to the Asian population as well.
At the county level, health insurance access is NOT significantly associated (p ≥ 0.05) with:
- any race category except “Hispanic or Latino” in the Bronx
- “two or more races” or “other races” in Kings and New York
- “other races” in Queens
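The per-county tables in this section are built with dplyr’s `do()`, which dplyr has marked as superseded. A base-R sketch of the same idea (one model per county, keeping the slope, its p-value, and a 0.05 significance flag) on simulated toy data with invented values:

```r
# Simulated toy data standing in for health_ins_dat (values invented)
set.seed(42)
toy <- data.frame(
  Geo_County = rep(c("Bronx", "Kings", "New_York", "Queens"), each = 50),
  perc_Pop_Hispanic_Latino = runif(200)
)
toy$perc_with_ins <- 0.95 - 0.1 * toy$perc_Pop_Hispanic_Latino + rnorm(200, sd = 0.02)

# Fit one model per county and keep the slope row of the coefficient table
per_county <- do.call(rbind, lapply(split(toy, toy$Geo_County), function(d) {
  s <- summary(lm(perc_with_ins ~ perc_Pop_Hispanic_Latino, data = d))$coefficients
  p <- s["perc_Pop_Hispanic_Latino", "Pr(>|t|)"]
  data.frame(Geo_County = d$Geo_County[1],
             estimate = s["perc_Pop_Hispanic_Latino", "Estimate"],
             p.value = p,
             significant = p < 0.05)
}))
print(per_county)
```

Separate per-county fits answer "is there a slope in this county?"; they do not test whether the slopes differ between counties.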

Plotting the % Black, Asian, Hispanic/Latino, and white population against the % of population with health insurance, with a fitted linear trend for each county

#####Black Population#####

# % Black population and health insurance access
health_ins_dat %>%
  ggplot(aes(y = perc_with_ins, x = perc_Pop_Black)) +
  geom_point(col = "blue", size = .8, alpha = .2, na.rm = TRUE) +
  facet_wrap(~Geo_County) +
  labs(y = "% of population with health insurance",
       x = "% of population: Black alone",
       title = "% Black population and health insurance access") +
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=.5, label.y=.1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# % Black population and public health insurance access
health_ins_dat %>%
  ggplot(aes(y = perc_with_ins_public, x = perc_Pop_Black)) +
  geom_point(col = "blue", size = .8, alpha = .2, na.rm = TRUE) +
  facet_wrap(~Geo_County) +
  labs(y = "% of population with public health insurance",
       x = "% of population: Black alone",
       title = "% Black population and public health insurance access") +
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=.5, label.y=.1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#regression - insured
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins ~ perc_Pop_Black, data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('race'='term')

## # A tibble: 8 × 6
## # Groups:   Geo_County [4]
##   Geo_County race           estimate std.error      p.value `significant?`
##   <chr>      <chr>             <dbl>     <dbl>        <dbl> <chr>         
## 1 Bronx      (Intercept)     0.904     0.00424 0            yes           
## 2 Bronx      perc_Pop_Black  0.0160    0.0124  0.196        no            
## 3 Kings      (Intercept)     0.925     0.00244 0            yes           
## 4 Kings      perc_Pop_Black -0.00388   0.00549 0.479        no            
## 5 New_York   (Intercept)     0.956     0.00314 0            yes           
## 6 New_York   perc_Pop_Black -0.0830    0.0145  0.0000000261 yes           
## 7 Queens     (Intercept)     0.894     0.00310 0            yes           
## 8 Queens     perc_Pop_Black  0.0429    0.00888 0.00000169   yes

#regression - public insurance
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins_public ~ perc_Pop_Black, data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('race'='term')

## # A tibble: 8 × 6
## # Groups:   Geo_County [4]
##   Geo_County race           estimate std.error   p.value `significant?`
##   <chr>      <chr>             <dbl>     <dbl>     <dbl> <chr>         
## 1 Bronx      (Intercept)     0.535     0.0138  4.86e-124 yes           
## 2 Bronx      perc_Pop_Black  0.0282    0.0402  4.84e-  1 no            
## 3 Kings      (Intercept)     0.433     0.00786 1.83e-264 yes           
## 4 Kings      perc_Pop_Black  0.0143    0.0177  4.21e-  1 no            
## 5 New_York   (Intercept)     0.270     0.0112  1.43e- 68 yes           
## 6 New_York   perc_Pop_Black  0.446     0.0517  6.45e- 16 yes           
## 7 Queens     (Intercept)     0.393     0.00478 0         yes           
## 8 Queens     perc_Pop_Black  0.00303   0.0137  8.25e-  1 no

For overall insurance: the Black population share is negatively associated with the % of population with insurance in New York and positively associated in Queens, at the 5% significance level; the Bronx and Kings slopes are not significant.
For public insurance: the Black population share is positively associated with the % of population with public insurance in New York ONLY, at the 5% significance level.

#####Asian Population#####


# % Asian population and health insurance access
health_ins_dat %>%
  ggplot(aes(y=perc_with_ins, x=perc_Pop_Asian)) +
  geom_point(col = "red", size = 0.8, alpha = 1/5)+
  facet_wrap(~Geo_County)+
  ylab("% of population with health insurance")+
  xlab("% of population: Asian") +
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=.5, label.y=.1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# % Asian population and public health insurance access
health_ins_dat %>%
  ggplot(aes(y=perc_with_ins_public, x=perc_Pop_Asian)) +
  geom_point(col = "red", size = 0.8, alpha = 1/5)+
  facet_wrap(~Geo_County)+
  ylab("% of population with public health insurance")+
  xlab("% of population: Asian") +
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=.5, label.y=.1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#regression - public insurance
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins_public ~ perc_Pop_Asian, data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('race'='term')

## # A tibble: 8 × 6
## # Groups:   Geo_County [4]
##   Geo_County race           estimate std.error   p.value `significant?`
##   <chr>      <chr>             <dbl>     <dbl>     <dbl> <chr>         
## 1 Bronx      (Intercept)      0.578    0.00913 8.63e-185 yes           
## 2 Bronx      perc_Pop_Asian  -0.899    0.135   1.22e- 10 yes           
## 3 Kings      (Intercept)      0.419    0.00728 8.30e-276 yes           
## 4 Kings      perc_Pop_Asian   0.157    0.0378  3.61e-  5 yes           
## 5 New_York   (Intercept)      0.350    0.0147  7.71e- 68 yes           
## 6 New_York   perc_Pop_Asian  -0.177    0.0833  3.42e-  2 yes           
## 7 Queens     (Intercept)      0.379    0.00618 8.09e-269 yes           
## 8 Queens     perc_Pop_Asian   0.0617   0.0194  1.56e-  3 yes

The Asian population share is positively associated with the % of population with public insurance in Kings and Queens, but negatively associated in the Bronx and New York; all four slopes are significant at the 5% level.

#####Hispanic Latino Population#####


# % Hispanic Latino population and health insurance access
health_ins_dat %>%
  ggplot(aes(y=perc_with_ins, x=perc_Pop_Hispanic_Latino)) +
  geom_point(col = "purple", size = 0.8, alpha = 1/5)+
  facet_wrap(~Geo_County)+
  ylab("% of population with health insurance")+
  xlab("% of population: Hispanic or Latino") +
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=.5, label.y=.1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# % Hispanic Latino population and public health insurance access
health_ins_dat %>%
  ggplot(aes(y=perc_with_ins_public, x=perc_Pop_Hispanic_Latino)) +
  geom_point(col = "purple", size = 0.8, alpha = 1/5)+
  facet_wrap(~Geo_County)+
  ylab("% of population with public health insurance")+
  xlab("% of population: Hispanic or Latino") +
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=.5, label.y=.1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#regression - insured 
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins ~ perc_Pop_Hispanic_Latino, data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('race'='term')

## # A tibble: 8 × 6
## # Groups:   Geo_County [4]
##   Geo_County race                     estimate std.error   p.value significant…¹
##   <chr>      <chr>                       <dbl>     <dbl>     <dbl> <chr>        
## 1 Bronx      (Intercept)                0.931    0.00688 6.13e-288 yes          
## 2 Bronx      perc_Pop_Hispanic_Latino  -0.0420   0.0118  4.16e-  4 yes          
## 3 Kings      (Intercept)                0.944    0.00250 0         yes          
## 4 Kings      perc_Pop_Hispanic_Latino  -0.109    0.0102  9.54e- 25 yes          
## 5 New_York   (Intercept)                0.970    0.00342 0         yes          
## 6 New_York   perc_Pop_Hispanic_Latino  -0.104    0.0108  5.99e- 19 yes          
## 7 Queens     (Intercept)                0.945    0.00389 0         yes          
## 8 Queens     perc_Pop_Hispanic_Latino  -0.163    0.0119  1.79e- 37 yes          
## # … with abbreviated variable name ¹`significant?`

#regression - insured with public insurance
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins_public ~ perc_Pop_Hispanic_Latino, data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('race'='term')

## # A tibble: 8 × 6
## # Groups:   Geo_County [4]
##   Geo_County race                     estimate std.error   p.value significant…¹
##   <chr>      <chr>                       <dbl>     <dbl>     <dbl> <chr>        
## 1 Bronx      (Intercept)                 0.287   0.0169  1.83e- 46 yes          
## 2 Bronx      perc_Pop_Hispanic_Latino    0.469   0.0290  1.78e- 43 yes          
## 3 Kings      (Intercept)                 0.401   0.00845 4.80e-227 yes          
## 4 Kings      perc_Pop_Hispanic_Latino    0.201   0.0345  9.53e-  9 yes          
## 5 New_York   (Intercept)                 0.199   0.0107  8.54e- 50 yes          
## 6 New_York   perc_Pop_Hispanic_Latino    0.550   0.0339  1.28e- 41 yes          
## 7 Queens     (Intercept)                 0.346   0.00627 2.48e-244 yes          
## 8 Queens     perc_Pop_Hispanic_Latino    0.184   0.0192  1.43e- 20 yes          
## # … with abbreviated variable name ¹`significant?`

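The share of Hispanic and Latino residents is positively correlated with the % of population with public insurance in all four counties. The slope is steepest in New York (0.550) and the Bronx (0.469) and much flatter in Kings (0.201) and Queens (0.184); ranked by p-value, the relationship is strongest in the Bronx, then New York, Queens, and Kings.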
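Slopes, p-values and correlation coefficients can rank the counties differently, so it helps to compute the ranking explicitly. The base-R sketch below does this on simulated data. The county names mirror the table, but the `sim` data frame, its columns `x` and `y`, and the true slopes are hypothetical stand-ins chosen to resemble the estimates above, not ACS values.

```r
# Hypothetical stand-in for health_ins_dat: 200 simulated tracts per county
set.seed(1)
true_slope <- c(Bronx = 0.47, Kings = 0.20, New_York = 0.55, Queens = 0.18)
sim <- do.call(rbind, lapply(names(true_slope), function(cty) {
  x <- runif(200)                                            # share Hispanic/Latino
  y <- 0.3 + true_slope[[cty]] * x + rnorm(200, sd = 0.005)  # share publicly insured
  data.frame(Geo_County = cty, x = x, y = y)
}))

# One simple regression per county: keep the slope and the Pearson correlation
by_county <- do.call(rbind, lapply(split(sim, sim$Geo_County), function(d) {
  data.frame(Geo_County = d$Geo_County[1],
             slope = coef(lm(y ~ x, data = d))[["x"]],
             r = cor(d$x, d$y))
}))

# Rank counties from steepest to flattest slope
by_county[order(by_county$slope, decreasing = TRUE), ]
```

On the real data, the same `split()`/`lm()` pattern would use `perc_Pop_Hispanic_Latino` and `perc_with_ins_public` in place of `x` and `y`.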

#####White Population#####

# % White population and health insurance access
health_ins_dat %>%
  ggplot(aes(x=perc_White, y=perc_with_ins)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  ylab("Population with Health Insurance")+
  xlab("% of population: white alone") +
  geom_smooth(method = lm)+
  theme_minimal()+
  stat_regline_equation(label.x=.1, label.y=.1)

## `geom_smooth()` using formula = 'y ~ x'

# % White population and public health insurance access
health_ins_dat %>%
  ggplot(aes(x=perc_White, y=perc_with_ins_public, color=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  ylab("Population with Public Health Insurance")+
  xlab("% of population: white alone") +
  geom_smooth(method = lm)+
  theme_minimal()+
  stat_regline_equation(label.x=.1, label.y=.1)+
  scale_color_brewer(palette = "RdBu")

## `geom_smooth()` using formula = 'y ~ x'

#regression
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins_public ~ perc_White, data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('race'='term')

## # A tibble: 8 × 6
## # Groups:   Geo_County [4]
##   Geo_County race        estimate std.error   p.value `significant?`
##   <chr>      <chr>          <dbl>     <dbl>     <dbl> <chr>         
## 1 Bronx      (Intercept)    0.602   0.00703 5.46e-225 yes           
## 2 Bronx      perc_White    -0.518   0.0327  2.44e- 42 yes           
## 3 Kings      (Intercept)    0.477   0.00909 6.46e-252 yes           
## 4 Kings      perc_White    -0.106   0.0190  3.61e-  8 yes           
## 5 New_York   (Intercept)    0.545   0.0132  5.62e-118 yes           
## 6 New_York   perc_White    -0.453   0.0237  6.22e- 52 yes           
## 7 Queens     (Intercept)    0.440   0.00526 0         yes           
## 8 Queens     perc_White    -0.180   0.0149  2.62e- 30 yes

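The share of White residents is negatively correlated with the % of population with public insurance in all four counties; the slope is steepest in the Bronx (-0.518) and New York (-0.453) and flatter in Queens (-0.180) and Kings (-0.106).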
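Separate per-county regressions show the slopes but do not test whether they actually differ. One common alternative, sketched here on simulated data (the `sim` data frame and its true slopes are hypothetical, only the column names mirror the document), is a single model with a county interaction, compared by an F-test against a model in which counties share one slope.

```r
# Hypothetical simulated tracts: four counties with different true slopes
set.seed(2)
true_slope <- c(Bronx = -0.52, Kings = -0.11, New_York = -0.45, Queens = -0.18)
sim <- do.call(rbind, lapply(names(true_slope), function(cty) {
  x <- runif(250)                                     # share White alone
  y <- 0.5 + true_slope[[cty]] * x + rnorm(250, sd = 0.05)
  data.frame(Geo_County = cty, perc_White = x, perc_with_ins_public = y)
}))

# Common slope (counties shift only the intercept) vs county-specific slopes
m_common <- lm(perc_with_ins_public ~ perc_White + Geo_County, data = sim)
m_inter  <- lm(perc_with_ins_public ~ perc_White * Geo_County, data = sim)

# F-test: do the slopes differ across counties?
slope_test <- anova(m_common, m_inter)
slope_test

# Each interaction coefficient is that county's slope minus the Bronx (reference) slope
coef(m_inter)[grep("^perc_White:", names(coef(m_inter)))]
```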

2: Median Household Income and Health Insurance Access

Data Exploration

#median household income distribution per county (one observation per census tract)
health_ins_dat_MedianHouseholdIncome <- health_ins_dat%>%
  filter(Households>0)

health_ins_dat_MedianHouseholdIncome%>%
  ggplot(aes(x=Median_Household_Income_2019, fill=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_histogram(bins=60)+
  scale_fill_brewer(palette = "RdBu")+
  theme_minimal()

#identify the outliers / super rich census tracts
health_ins_dat_MedianHouseholdIncome|>
  filter(Median_Household_Income_2019>250000)|>
  select(Geo_County,Geo_QName,perc_White, Per_Capita_Income)

## # A tibble: 57 × 4
##    Geo_County Geo_QName       perc_White Per_Capita_Income
##    <chr>      <chr>                <dbl>             <dbl>
##  1 Kings      Census_Tract_21      0.741            121521
##  2 Kings      Census_Tract_41      0.689             94831
##  3 New_York   Census_Tract_21      0.721            161757
##  4 New_York   Census_Tract_33      0.765            191549
##  5 New_York   Census_Tract_37      0.823            162230
##  6 New_York   Census_Tract_39      0.776            158186
##  7 New_York   Census_Tract_42      0.595             73526
##  8 New_York   Census_Tract_52      0.719            116547
##  9 New_York   Census_Tract_54      0.760            150168
## 10 New_York   Census_Tract_56      0.744            139701
## # … with 47 more rows

Linear Regression

#Linear regression: Median Household Income and Health Insurance Access

model_householdIncome <-
  lm(perc_with_ins~log(Median_Household_Income_2019),data = health_ins_dat_MedianHouseholdIncome)

summary(model_householdIncome)

## 
## Call:
## lm(formula = perc_with_ins ~ log(Median_Household_Income_2019), 
##     data = health_ins_dat_MedianHouseholdIncome)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41699 -0.01973  0.00795  0.03195  0.12215 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       0.490469   0.024514   20.01   <2e-16 ***
## log(Median_Household_Income_2019) 0.038104   0.002186   17.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05211 on 1974 degrees of freedom
## Multiple R-squared:  0.1334, Adjusted R-squared:  0.133 
## F-statistic:   304 on 1 and 1974 DF,  p-value: < 2.2e-16
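Because the predictor is log-transformed, the coefficient 0.038104 describes the change in the share insured for a proportional change in income rather than a dollar change. A short worked calculation, assuming the linear-in-log model above holds:

```r
b_log_income <- 0.038104   # coefficient on log(Median_Household_Income_2019) in model_householdIncome

# With y = a + b*log(income), multiplying income by `ratio` changes y by b*log(ratio)
effect_of_ratio <- function(ratio) b_log_income * log(ratio)

effect_of_ratio(1.10)   # 10% higher median household income: about 0.0036
effect_of_ratio(2)      # doubled median household income: about 0.0264
```

In words: a tract with 10% higher median household income is associated with roughly 0.36 percentage points higher coverage, and doubling income with roughly 2.6 percentage points, as an association rather than a causal effect.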

#function to explore each county (uses raw, untransformed income: the slope is per $1 of median household income)
lm_income <-function(data){
  fit <- lm(perc_with_ins~Median_Household_Income_2019, data=data)
  sum.fit <- summary(fit)
  
  data.frame(slope=sum.fit$coefficients[2,"Estimate"],
             se = sum.fit$coefficients[2, "Std. Error"],
             p_value = sum.fit$coefficients[2, "Pr(>|t|)"])
}

health_ins_dat_MedianHouseholdIncome|>
  group_by(Geo_County)|>
  do(lm_income(.))

## # A tibble: 4 × 4
## # Groups:   Geo_County [4]
##   Geo_County       slope           se  p_value
##   <chr>            <dbl>        <dbl>    <dbl>
## 1 Bronx      0.000000177 0.0000000863 4.16e- 2
## 2 Kings      0.000000353 0.0000000418 1.55e-16
## 3 New_York   0.000000293 0.0000000262 4.42e-24
## 4 Queens     0.00000139  0.0000000847 7.42e-51
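The per-county slopes above are per single dollar, because `lm_income()` uses untransformed `Median_Household_Income_2019`, which makes them hard to read. Rescaling to $10,000 is a simple multiplication; the values below are copied from the rounded table, so the results are approximate.

```r
# Per-dollar slopes as printed in the table above (rounded to 3 significant digits)
per_dollar <- c(Bronx = 0.000000177, Kings = 0.000000353,
                New_York = 0.000000293, Queens = 0.00000139)

# Slope per $10,000 of median household income (change in share insured)
per_10k <- per_dollar * 10000
round(per_10k, 5)
```

Read this way, each additional $10,000 of tract median income is associated with about 1.4 percentage points higher coverage in Queens but under 0.2 percentage points in the Bronx. Equivalently, the predictor could be rescaled inside the formula, e.g. `lm(perc_with_ins ~ I(Median_Household_Income_2019 / 10000), data = data)`, which scales the slope and its standard error by the same factor and leaves the p-value unchanged.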

Plot

#Scatterplot to demonstrate how median income correlates with health insurance access in each county
health_ins_dat_MedianHouseholdIncome%>%
  ggplot(aes(x=log(Median_Household_Income_2019), y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation( label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#Scatterplot to demonstrate how median income correlates with public health insurance access in each county
health_ins_dat_MedianHouseholdIncome%>%
  ggplot(aes(x=log(Median_Household_Income_2019), y=perc_with_ins_public, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation( label.y=.05)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

3: Educational Attainment of the Population 25 Years and Over

Data Exploration

health_ins_dat_Education<-health_ins_dat%>%
  pivot_longer(cols = perc_less_HS:perc_Bachelor_More & !perc_HS_or_Less, 
               #grep("Pop_25yrs_", names(health_ins_dat), value=TRUE)
               names_to = "Education_Attainment",
               values_to = "Percent_Pop")%>%
  select(c("Geo_County","Education_Attainment", "Percent_Pop", "Total_Pop"))

health_ins_dat_Education|>
  group_by(Geo_County)|>
  ggplot(aes(x=Education_Attainment,y=Percent_Pop, fill=Education_Attainment)) +
  facet_wrap(~Geo_County,nrow=4)+
  scale_fill_brewer(palette = "Set3")+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  theme_minimal()+
  coord_flip()

  #scale_x_discrete(guide = guide_axis(angle = 90))
  #theme(axis.text.x=element_text(angle=90, hjust=1))

Linear Regression

#edu and health insurance
model_edu_attain <- lm(perc_with_ins~
                       perc_less_HS+
                       perc_HS+
                       perc_College,
                       #perc_Bachelor_More,
                       data = health_ins_dat)
summary(model_edu_attain)

## 
## Call:
## lm(formula = perc_with_ins ~ perc_less_HS + perc_HS + perc_College, 
##     data = health_ins_dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4073 -0.0200  0.0061  0.0273  0.1357 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.981758   0.003892 252.266  < 2e-16 ***
## perc_less_HS -0.350121   0.015562 -22.498  < 2e-16 ***
## perc_HS      -0.137578   0.018479  -7.445 1.44e-13 ***
## perc_College  0.022096   0.023462   0.942    0.346    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04733 on 1972 degrees of freedom
## Multiple R-squared:  0.2859, Adjusted R-squared:  0.2848 
## F-statistic: 263.2 on 3 and 1972 DF,  p-value: < 2.2e-16
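`perc_Bachelor_More` is commented out of the model. Assuming the four education shares partition the population aged 25 and over (so they sum to 1 in each tract), that omission is necessary: the fourth share is an exact linear combination of the intercept and the other three. A sketch with simulated shares (the shortened column names are hypothetical stand-ins) shows what `lm()` does in that case and how to read the remaining coefficients.

```r
# Simulated tracts whose four education shares sum to 1
set.seed(42)
n <- 300
raw <- matrix(rexp(n * 4), ncol = 4)
shares <- raw / rowSums(raw)        # divide each row by its total
colnames(shares) <- c("less_HS", "HS", "College", "Bachelor_More")
d <- as.data.frame(shares)
d$insured <- 0.98 - 0.35 * d$less_HS - 0.14 * d$HS + rnorm(n, sd = 0.02)

full <- lm(insured ~ less_HS + HS + College + Bachelor_More, data = d)
coef(full)      # Bachelor_More comes back NA: it is aliased with the intercept

reduced <- lm(insured ~ less_HS + HS + College, data = d)
coef(reduced)   # each slope is relative to the omitted Bachelor_More share
```

Under that assumption, the -0.350 on `perc_less_HS` above reads as the association of shifting 1 percentage point of adults from "bachelor or more" to "less than high school".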

#linear model for each county
health_ins_dat|>
  group_by(Geo_County)|>
  do(tidy(lm(perc_with_ins ~
                       perc_less_HS+
                       perc_HS+
                       perc_College,
                       #perc_Bachelor_More,
                      data = .)))|>
  select(-statistic)|>
  mutate(`significant?` = case_when(
    `p.value` < 0.05 ~ "yes",
    TRUE ~ "no"
  ))|>
  rename('education attainment'='term')

## # A tibble: 16 × 6
## # Groups:   Geo_County [4]
##    Geo_County `education attainment` estimate std.error   p.value `significant?`
##    <chr>      <chr>                     <dbl>     <dbl>     <dbl> <chr>         
##  1 Bronx      (Intercept)              0.953    0.0193  9.38e-153 yes           
##  2 Bronx      perc_less_HS            -0.207    0.0454  7.43e-  6 yes           
##  3 Bronx      perc_HS                 -0.0763   0.0615  2.16e-  1 no            
##  4 Bronx      perc_College             0.0299   0.0653  6.47e-  1 no            
##  5 Kings      (Intercept)              0.980    0.00647 0         yes           
##  6 Kings      perc_less_HS            -0.308    0.0245  4.93e- 33 yes           
##  7 Kings      perc_HS                 -0.0421   0.0268  1.17e-  1 no            
##  8 Kings      perc_College            -0.0897   0.0385  1.99e-  2 yes           
##  9 New_York   (Intercept)              0.981    0.00562 8.30e-275 yes           
## 10 New_York   perc_less_HS            -0.185    0.0353  3.27e-  7 yes           
## 11 New_York   perc_HS                 -0.135    0.0642  3.65e-  2 yes           
## 12 New_York   perc_College            -0.0652   0.0722  3.67e-  1 no            
## 13 Queens     (Intercept)              0.987    0.0109  0         yes           
## 14 Queens     perc_less_HS            -0.668    0.0350  1.56e- 64 yes           
## 15 Queens     perc_HS                 -0.0981   0.0374  8.97e-  3 yes           
## 16 Queens     perc_College             0.115    0.0437  8.82e-  3 yes

#edu and household income
model_edu_and_householdIncome <- lm(log(Median_Household_Income_2019)~
                                      perc_less_HS+
                                      perc_HS+
                                      perc_College,
                                      #perc_Bachelor_More,
                                      data = health_ins_dat)
summary(model_edu_and_householdIncome)

## 
## Call:
## lm(formula = log(Median_Household_Income_2019) ~ perc_less_HS + 
##     perc_HS + perc_College, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20851 -0.20553  0.03738  0.24130  1.17460 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.26327    0.02797 438.453  < 2e-16 ***
## perc_less_HS -4.67991    0.11184 -41.843  < 2e-16 ***
## perc_HS      -2.02176    0.13280 -15.224  < 2e-16 ***
## perc_College -0.86633    0.16862  -5.138 3.06e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3402 on 1972 degrees of freedom
## Multiple R-squared:  0.5987, Adjusted R-squared:  0.5981 
## F-statistic: 980.7 on 3 and 1972 DF,  p-value: < 2.2e-16

Plot

#scatterplot to demonstrate how education attainment of high school diploma or less (people over 25 years old) correlates with health insurance access in each county
health_ins_dat%>%
  ggplot(aes(x=perc_HS_or_Less, y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#scatterplot to demonstrate how education attainment of bachelor degree or more (people over 25 years old) correlates with health insurance access in each county
health_ins_dat%>%
  ggplot(aes(x=perc_Bachelor_More, y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#scatterplot to demonstrate how education attainment of bachelor degree or more (people over 25 years old) correlates with public health insurance access in each county
health_ins_dat%>%
  ggplot(aes(x=perc_Bachelor_More, y=perc_with_ins_public, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.05)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

4: Employment Sectors

$perc\_with\_ins = \beta_0 + \beta_1 \, perc\_Farm\_Industrial + \beta_2 \, perc\_Service + \beta_3 \, perc\_Sales + \varepsilon$

Data Exploration

health_ins_dat_Employment<-health_ins_dat%>%
  pivot_longer(cols = perc_Management_Professional:perc_Farm_Industrial, 
               names_to = "Employment_Sector",
               values_to = "Employment_Pop")%>%
  select(c("Geo_County","Employment_Sector", "Employment_Pop", "Total_Pop"))

health_ins_dat_Employment|>
  group_by(Geo_County)|>
  ggplot(aes(x=Employment_Sector,y=Employment_Pop, fill=Employment_Sector)) +
  facet_wrap(~Geo_County,nrow=4)+
  scale_fill_brewer(palette = "Set2")+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  theme_minimal()+
  coord_flip()

  #scale_x_discrete(guide = guide_axis(angle = 90))
  #theme(axis.text.x=element_text(angle=90, hjust=1))

Linear Regression

model_employment <- lm(perc_with_ins~perc_Farm_Industrial+
                                   perc_Service+
                                   perc_Sales,
                                   data = health_ins_dat)
summary(model_employment)

## 
## Call:
## lm(formula = perc_with_ins ~ perc_Farm_Industrial + perc_Service + 
##     perc_Sales, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40968 -0.01968  0.00692  0.02913  0.12848 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.967450   0.004997 193.612  < 2e-16 ***
## perc_Farm_Industrial -0.128333   0.010423 -12.313  < 2e-16 ***
## perc_Service         -0.149520   0.012912 -11.580  < 2e-16 ***
## perc_Sales            0.072044   0.020771   3.468 0.000535 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04868 on 1972 degrees of freedom
## Multiple R-squared:  0.2445, Adjusted R-squared:  0.2433 
## F-statistic: 212.7 on 3 and 1972 DF,  p-value: < 2.2e-16

#Association between employment rate (population 16 and over) and health insurance coverage
lm_employment <-function(data){
  fit <- lm(perc_with_ins~perc_employed_16_over, data=data)
  sum.fit <- summary(fit)
  
  data.frame(slope=sum.fit$coefficients[2,"Estimate"],
             se = sum.fit$coefficients[2, "Std. Error"],
             p_value = sum.fit$coefficients[2, "Pr(>|t|)"])
}

health_ins_dat|>
  group_by(Geo_County)|>
  do(lm_employment(.))

## # A tibble: 4 × 4
## # Groups:   Geo_County [4]
##   Geo_County   slope     se       p_value
##   <chr>        <dbl>  <dbl>         <dbl>
## 1 Bronx       0.108  0.0509 0.0340       
## 2 Kings       0.0851 0.0423 0.0446       
## 3 New_York    0.420  0.0674 0.00000000183
## 4 Queens     -0.0368 0.0790 0.641

Plot

#scatterplot to demonstrate how employment sector (people over 16 years old) correlates with health insurance access in each county

###Farm & Industrial###
health_ins_dat%>%
  ggplot(aes(x=perc_Farm_Industrial, y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

###Service###
health_ins_dat%>%
  ggplot(aes(x=perc_Service, y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

###Sales###
health_ins_dat%>%
  ggplot(aes(x=perc_Sales, y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

###Management_Professional###
health_ins_dat%>%
  ggplot(aes(x=perc_Management_Professional, y=perc_with_ins, col=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

5: Foreign Origin Population

Data Exploration

#distribution of Foreign Origin Population with Citizenship
health_ins_dat|>
  ggplot(aes(x = Geo_County, y = perc_Origin_FO_Citizen, fill = Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "RdBu")+
  theme_minimal()

#distribution of Foreign Origin Population with NO Citizenship
health_ins_dat|>
  ggplot(aes(x = Geo_County, y = perc_Origin_FO_NCitizen, fill = Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "RdBu")+
  theme_minimal()

Linear Regression

Foreign Origin Population with Citizenship

# Foreign Origin - US Citizen
model_foreign_citizen <- lm(perc_with_ins~perc_Origin_FO_Citizen, data=health_ins_dat)
summary(model_foreign_citizen)

## 
## Call:
## lm(formula = perc_with_ins ~ perc_Origin_FO_Citizen, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.47038 -0.02149  0.01150  0.03751  0.09212 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.933020   0.003019 309.035  < 2e-16 ***
## perc_Origin_FO_Citizen -0.070222   0.012330  -5.695 1.42e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05553 on 1974 degrees of freedom
## Multiple R-squared:  0.01617,    Adjusted R-squared:  0.01567 
## F-statistic: 32.44 on 1 and 1974 DF,  p-value: 1.417e-08

# with citizenship
health_ins_dat%>%
  ggplot(aes(x=perc_Origin_FO_Citizen, y=perc_with_ins, color=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

##publicly insured?
summary(lm(perc_with_ins_public~perc_Origin_FO_Citizen, data=health_ins_dat))

## 
## Call:
## lm(formula = perc_with_ins_public ~ perc_Origin_FO_Citizen, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37849 -0.10785 -0.01704  0.09902  0.49644 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.405032   0.008386  48.299  < 2e-16 ***
## perc_Origin_FO_Citizen 0.094407   0.034248   2.757  0.00589 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1542 on 1974 degrees of freedom
## Multiple R-squared:  0.003835,   Adjusted R-squared:  0.00333 
## F-statistic: 7.599 on 1 and 1974 DF,  p-value: 0.005894

# with citizenship - public insurance
health_ins_dat%>%
  ggplot(aes(x=perc_Origin_FO_Citizen, y=perc_with_ins_public, color=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.9)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

Foreign Origin - NOT US Citizen

# Foreign Origin - NOT US Citizen
summary(lm(perc_with_ins~perc_Origin_FO_NCitizen, data=health_ins_dat))

## 
## Call:
## lm(formula = perc_with_ins ~ perc_Origin_FO_NCitizen, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48674 -0.02022  0.00590  0.02761  0.11241 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.975957   0.002059   474.0   <2e-16 ***
## perc_Origin_FO_NCitizen -0.367429   0.011238   -32.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04509 on 1974 degrees of freedom
## Multiple R-squared:  0.3513, Adjusted R-squared:  0.351 
## F-statistic:  1069 on 1 and 1974 DF,  p-value: < 2.2e-16

# without citizenship
health_ins_dat%>%
  ggplot(aes(x=perc_Origin_FO_NCitizen, y=perc_with_ins, color=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

##publicly insured?
summary(lm(perc_with_ins_public~perc_Origin_FO_NCitizen, data=health_ins_dat))

## 
## Call:
## lm(formula = perc_with_ins_public ~ perc_Origin_FO_NCitizen, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39618 -0.09686 -0.01675  0.08509  0.54753 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.364351   0.006874   53.00   <2e-16 ***
## perc_Origin_FO_NCitizen 0.387096   0.037515   10.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1505 on 1974 degrees of freedom
## Multiple R-squared:  0.05118,    Adjusted R-squared:  0.05069 
## F-statistic: 106.5 on 1 and 1974 DF,  p-value: < 2.2e-16

# without citizenship - public insurance
health_ins_dat%>%
  ggplot(aes(x=perc_Origin_FO_NCitizen, y=perc_with_ins_public, color=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.99)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

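* Compared with foreign-born citizens, a larger foreign-born non-citizen share is associated with a much steeper decrease in the % of population with health insurance coverage and a much steeper increase in the % of population with public health insurance coverage.
    + A 1 percentage point increase in the foreign-born non-citizen share is associated with a 0.367 percentage point decrease in the % of population with health insurance coverage (R-squared: 0.3513) and a 0.387 percentage point increase in the % of population publicly insured (R-squared: 0.0512).
    + For foreign-born citizens, the corresponding slopes are -0.070 (R-squared: 0.0162) and 0.094 (R-squared: 0.0038).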
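To make the non-citizen slopes concrete, the fitted lines can be evaluated at two tract profiles. The 5% and 25% non-citizen shares are hypothetical values chosen for illustration; the coefficients are copied from the two `perc_Origin_FO_NCitizen` summaries above.

```r
# Coefficients from the two regressions on perc_Origin_FO_NCitizen shown above
b_any    <- c(intercept = 0.975957, slope = -0.367429)  # response: perc_with_ins
b_public <- c(intercept = 0.364351, slope =  0.387096)  # response: perc_with_ins_public

predict_share <- function(b, x) b[["intercept"]] + b[["slope"]] * x

x <- c(0.05, 0.25)   # hypothetical tracts: 5% vs 25% non-citizen residents
diff(predict_share(b_any, x))     # change in share with any insurance: about -0.073
diff(predict_share(b_public, x))  # change in share publicly insured: about +0.077
```

So, under these simple models, going from a 5% to a 25% non-citizen tract is associated with roughly 7.3 percentage points lower overall coverage and 7.7 percentage points higher public coverage.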

6: Poverty

Ho1: % population in poverty is not correlated with % of children covered by health insurance
Ha1: % population in poverty is correlated with % of children covered by health insurance
Ho2: % population in poverty is not correlated with % of children covered by public health insurance
Ha2: % population in poverty is correlated with % of children covered by public health insurance

Data Exploration

#distribution of % population in poverty
health_ins_dat|>
  ggplot(aes(x = Geo_County, y = perc_Poverty, fill = Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "RdBu")+
  theme_minimal()

## Warning: Removed 115 rows containing non-finite values (`stat_boxplot()`).

Linear Regression

# health insurance enrollment rate children under 18 vs poverty 
model_ins_poverty <- lm(perc_with_ins~
        perc_Poverty,
        data = health_ins_dat)

summary(model_ins_poverty)

## 
## Call:
## lm(formula = perc_with_ins ~ perc_Poverty, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34102 -0.02287  0.00918  0.03594  0.09952 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.931238   0.002097 444.086  < 2e-16 ***
## perc_Poverty -0.077137   0.010521  -7.332 3.37e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0527 on 1859 degrees of freedom
##   (115 observations deleted due to missingness)
## Multiple R-squared:  0.0281, Adjusted R-squared:  0.02758 
## F-statistic: 53.76 on 1 and 1859 DF,  p-value: 3.37e-13

# plot
health_ins_dat%>%
  ggplot(aes(x=perc_Poverty, y=perc_with_ins, color=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.5)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#public insurance
model_ins_poverty_public <- lm(perc_with_ins_public~
        perc_Poverty,
        data = health_ins_dat)

summary(model_ins_poverty_public)

## 
## Call:
## lm(formula = perc_with_ins_public ~ perc_Poverty, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37811 -0.06572  0.00469  0.06866  0.53162 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.267372   0.004079   65.54   <2e-16 ***
## perc_Poverty 0.983018   0.020467   48.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1025 on 1859 degrees of freedom
##   (115 observations deleted due to missingness)
## Multiple R-squared:  0.5538, Adjusted R-squared:  0.5535 
## F-statistic:  2307 on 1 and 1859 DF,  p-value: < 2.2e-16

# plot
health_ins_dat%>%
  ggplot(aes(x=perc_Poverty, y=perc_with_ins_public, color=Geo_County))+
  facet_wrap(~Geo_County)+
  geom_point(size = 0.8, alpha=.2)+
  scale_color_brewer(palette = "RdBu")+
  geom_smooth(method = lm)+
  stat_regline_equation(label.x=0, label.y=.0)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

* We reject Ho1 at the 5% significance level (p = 3.37e-13): % population in poverty is correlated with % of population covered by health insurance.
    + A 1 percentage point increase in % population in poverty is associated with a 0.077 percentage point decrease in % of population covered by health insurance. R-squared: 0.0281, so poverty alone explains less than 3% of the variation in coverage.
* We reject Ho2 at the 5% significance level (p < 2.2e-16): % population in poverty is correlated with % of population covered by public health insurance.
    + A 1 percentage point increase in % population in poverty is associated with a 0.983 percentage point increase in % of population covered by public health insurance. R-squared: 0.5538.
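The decision to reject Ho1 can be reproduced from the estimate, standard error and residual degrees of freedom printed in `summary(model_ins_poverty)`: the t statistic is the estimate divided by its standard error, and the two-sided p-value comes from the t distribution.

```r
# Estimate, standard error and residual df for perc_Poverty, copied from summary(model_ins_poverty)
est <- -0.077137
se  <-  0.010521
df  <-  1859

t_stat <- est / se                        # about -7.33, matching the printed t value
p_val  <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value, about 3.4e-13
alpha  <- 0.05
reject_H0 <- p_val < alpha
c(t = t_stat, p = p_val, reject = reject_H0)
```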

Multivariate Linear Regression

Goal: In this part of the study, I build models to test whether, once race is controlled for, the factors that were positively correlated with health insurance access in the single-factor analyses (educational attainment, employment sector, median household income, and native-born citizenship) still make a population more likely to obtain health insurance.

Linear Model for Each Factor

###regression model for each category
#first variable: race

tidy(summary(model_race))

## # A tibble: 6 × 5
##   term                     estimate std.error statistic   p.value
##   <chr>                       <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)                0.978    0.00334   293.    0        
## 2 perc_Pop_Black            -0.0419   0.00454    -9.23  6.49e- 20
## 3 perc_Pop_Asian            -0.128    0.00747   -17.1   1.75e- 61
## 4 perc_Pop_Hispanic_Latino  -0.120    0.00530   -22.7   9.03e-102
## 5 perc_Pop_More_Races        0.0697   0.0561      1.24  2.14e-  1
## 6 perc_Pop_Other_Races      -0.0233   0.0378     -0.614 5.39e-  1

#second variable: education (bachelor or higher degree)
model_minority_edu <-lm(perc_with_ins ~
                        perc_Bachelor_More,
                        data = health_ins_dat)
tidy(summary(model_minority_edu))

## # A tibble: 2 × 5
##   term               estimate std.error statistic  p.value
##   <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)           0.886   0.00212     417.  0       
## 2 perc_Bachelor_More    0.119   0.00674      17.7 4.57e-65

#third variable: employment sector
model_minority_employment <-lm(perc_with_ins ~
                            perc_Management_Professional+
                            perc_Sales,
                        data = health_ins_dat)
tidy(summary(model_minority_employment))

## # A tibble: 3 × 5
##   term                         estimate std.error statistic   p.value
##   <chr>                           <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)                     0.819   0.00535    153.   0        
## 2 perc_Management_Professional    0.153   0.00639     23.9  3.32e-111
## 3 perc_Sales                      0.214   0.0214       9.99 5.68e- 23

#fourth variable: median household income
tidy(summary(model_householdIncome))

## # A tibble: 2 × 5
##   term                              estimate std.error statistic  p.value
##   <chr>                                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)                         0.490    0.0245       20.0 3.09e-81
## 2 log(Median_Household_Income_2019)   0.0381   0.00219      17.4 1.99e-63

#fifth variable: native-born population
model_minority_citizen <-lm(perc_with_ins ~
                        perc_Origin_Native_Born,
                        data = health_ins_dat)
tidy(summary(model_minority_citizen))

## # A tibble: 2 × 5
##   term                    estimate std.error statistic  p.value
##   <chr>                      <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)                0.814   0.00481     169.  0       
## 2 perc_Origin_Native_Born    0.167   0.00757      22.1 6.17e-97

All variables combined

# race 

model_m_p <-lm(perc_with_ins_public ~
                        perc_Pop_Black +
                        perc_Pop_Asian+
                        perc_Pop_Hispanic_Latino+
                        perc_Pop_More_Races+
                        perc_Pop_Other_Races,
                        data = health_ins_dat)

# race and median household income 
model_m2 <-lm(perc_with_ins ~
                        perc_Pop_Black +
                        perc_Pop_Asian+
                        perc_Pop_Hispanic_Latino+
                        perc_Pop_More_Races+
                        perc_Pop_Other_Races+
                        log(Median_Household_Income_2019),
                        data = health_ins_dat)
tidy(summary(model_m2))

## # A tibble: 7 × 5
##   term                              estimate std.error statistic   p.value
##   <chr>                                <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)                         0.827    0.0329     25.2   2.51e-121
## 2 perc_Pop_Black                     -0.0318   0.00501    -6.35  2.64e- 10
## 3 perc_Pop_Asian                     -0.117    0.00781   -15.0   3.84e- 48
## 4 perc_Pop_Hispanic_Latino           -0.100    0.00687   -14.6   9.03e- 46
## 5 perc_Pop_More_Races                 0.0506   0.0559      0.905 3.66e-  1
## 6 perc_Pop_Other_Races               -0.0296   0.0377     -0.787 4.32e-  1
## 7 log(Median_Household_Income_2019)   0.0126   0.00274     4.61  4.35e-  6
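The race-only and race-plus-income models are nested, so an F-test can check whether adding log income improves the fit beyond race. The sketch below uses simulated data with hypothetical column names (`race_share`, `log_income`, `insured`). One caveat for the real data: `anova()` requires both models to be fitted to the same rows, so if some tracts lack an income estimate they would need to be dropped from both fits first, done here with `complete.cases()`.

```r
# Hypothetical simulated tracts, with a few missing income estimates
set.seed(7)
n <- 500
d <- data.frame(race_share = runif(n), log_income = rnorm(n, mean = 11, sd = 0.5))
d$insured <- 0.95 - 0.10 * d$race_share + 0.012 * d$log_income + rnorm(n, sd = 0.02)
d$log_income[sample(n, 20)] <- NA

# Fit both models on the same complete-case rows
cc <- d[complete.cases(d), ]
m_small <- lm(insured ~ race_share, data = cc)
m_big   <- lm(insured ~ race_share + log_income, data = cc)

# Nested-model F-test: does log income add explanatory power beyond race?
cmp <- anova(m_small, m_big)
cmp
p_added <- cmp[["Pr(>F)"]][2]
```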

model_m2_p <-lm(perc_with_ins_public ~
                        perc_Pop_Black +
                        perc_Pop_Asian+
                        perc_Pop_Hispanic_Latino+
                        perc_Pop_More_Races+
                        perc_Pop_Other_Races+
                        log(Median_Household_Income_2019),
                        data = health_ins_dat)

# race and median household income and education attainment(bachelor degree or more) 
model_m3 <-lm(perc_with_ins ~
                      perc_Pop_Black +
                      perc_Pop_Asian+
                      perc_Pop_Hispanic_Latino+
                      perc_Pop_More_Races+
                      perc_Pop_Other_Races+
                      log(Median_Household_Income_2019)+
                      perc_Bachelor_More,
                      data = health_ins_dat)
tidy(summary(model_m3))

## # A tibble: 8 × 5
##   term                              estimate std.error statistic   p.value
##   <chr>                                <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)                        0.903     0.0396     22.8   3.61e-102
## 2 perc_Pop_Black                    -0.0246    0.00544    -4.53  6.35e-  6
## 3 perc_Pop_Asian                    -0.111     0.00800   -13.9   1.00e- 41
## 4 perc_Pop_Hispanic_Latino          -0.0939    0.00709   -13.2   2.00e- 38
## 5 perc_Pop_More_Races                0.0190    0.0566      0.336 7.37e-  1
## 6 perc_Pop_Other_Races               0.00443   0.0389      0.114 9.09e-  1
## 7 log(Median_Household_Income_2019)  0.00460   0.00362     1.27  2.04e-  1
## 8 perc_Bachelor_More                 0.0402    0.0119      3.39  7.11e-  4

model_m3_p <-lm(perc_with_ins_public ~
                      perc_Pop_Black +
                      perc_Pop_Asian+
                      perc_Pop_Hispanic_Latino+
                      perc_Pop_More_Races+
                      perc_Pop_Other_Races+
                      log(Median_Household_Income_2019)+
                      perc_Bachelor_More,
                      data = health_ins_dat)

# race, median household income, educational attainment (bachelor's degree or more), and employment sectors
model_m4 <-lm(perc_with_ins ~
                     perc_Pop_Black +
                     perc_Pop_Asian+
                     perc_Pop_Hispanic_Latino+
                     perc_Pop_More_Races+
                     perc_Pop_Other_Races+
                     log(Median_Household_Income_2019)+
                     perc_Bachelor_More+
                     perc_Management_Professional+
                     perc_Sales,
                     data = health_ins_dat)
tidy(summary(model_m4))

## # A tibble: 10 × 5
##    term                               estimate std.error statistic  p.value
##    <chr>                                 <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)                        0.868      0.0389    22.3    2.40e-98
##  2 perc_Pop_Black                    -0.00166    0.00578   -0.287  7.74e- 1
##  3 perc_Pop_Asian                    -0.0828     0.00840   -9.85   2.15e-22
##  4 perc_Pop_Hispanic_Latino          -0.0536     0.00803   -6.68   3.08e-11
##  5 perc_Pop_More_Races                0.0273     0.0552     0.495  6.21e- 1
##  6 perc_Pop_Other_Races               0.0437     0.0383     1.14   2.55e- 1
##  7 log(Median_Household_Income_2019) -0.000155   0.00357   -0.0435 9.65e- 1
##  8 perc_Bachelor_More                -0.0197     0.0170    -1.16   2.47e- 1
##  9 perc_Management_Professional       0.130      0.0176     7.39   2.08e-13
## 10 perc_Sales                         0.181      0.0221     8.16   5.71e-16

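Observation: in model_m4, after the employment-sector variables are added, % Black population and % bachelor's degree or more lose significance, and the bachelor's coefficient changes sign. This points to multicollinearity between % management/professional and education level, and between % management/professional and race.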

#% employment sector

#linear regression between bachelor degree or more with management profession and sales 
tidy(summary(lm(perc_Bachelor_More~perc_Management_Professional+perc_Sales, data=health_ins_dat)))

## # A tibble: 3 × 5
##   term                         estimate std.error statistic  p.value
##   <chr>                           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)                   -0.0485   0.00738     -6.57 6.39e-11
## 2 perc_Management_Professional   0.891    0.00882    101.   0       
## 3 perc_Sales                    -0.0969   0.0296      -3.28 1.07e- 3

#linear regression between median household income (log transformed) with management profession and sales
tidy(summary(lm(log(Median_Household_Income_2019)~perc_Management_Professional+perc_Sales, data=health_ins_dat)))

## # A tibble: 3 × 5
##   term                         estimate std.error statistic    p.value
##   <chr>                           <dbl>     <dbl>     <dbl>      <dbl>
## 1 (Intercept)                    10.2      0.0362    281.   0         
## 2 perc_Management_Professional    2.41     0.0432     55.7  0         
## 3 perc_Sales                      0.690    0.145       4.75 0.00000214
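The auxiliary regressions above are the building block of the variance inflation factor (VIF): VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. The sketch below computes VIFs this way. It runs on simulated, hypothetical data (not ACS values) in which a management share and a bachelor's share are built to be strongly collinear. With the real data, `car::vif(model_m4)` from the car package should give the same numbers for a model with only numeric terms. A common rule of thumb flags VIF values above 5 or 10.

```r
# VIF_j = 1 / (1 - R^2_j), where R^2_j is from regressing predictor j
# on the remaining predictors (an auxiliary regression)
vif_manual <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(j) {
    aux <- lm(X[, j] ~ X[, colnames(X) != j, drop = FALSE])
    1 / (1 - summary(aux)$r.squared)
  })
}

# Simulated, hypothetical tract data: mgmt and bachelor are strongly
# collinear by construction; sales is independent of both
set.seed(1)
n <- 1000
mgmt     <- runif(n, 0.1, 0.7)
bachelor <- 0.9 * mgmt + rnorm(n, sd = 0.03)
sales    <- runif(n, 0.05, 0.3)
coverage <- 0.9 + 0.05 * bachelor + 0.1 * sales + rnorm(n, sd = 0.02)

fit  <- lm(coverage ~ bachelor + mgmt + sales)
vifs <- vif_manual(fit)
round(vifs, 1)
```

The two collinear predictors get VIFs far above 10, and the independent one stays near 1.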

# race, median household income, educational attainment (bachelor's degree or more), and % population in poverty
model_m5 <-lm(perc_with_ins ~
                     perc_Pop_Black +
                     perc_Pop_Asian+
                     perc_Pop_Hispanic_Latino+
                     perc_Pop_More_Races+
                     perc_Pop_Other_Races+
                     log(Median_Household_Income_2019)+
                     perc_Bachelor_More+
                     perc_Poverty,
                     #perc_Management_Professional+
                     #perc_Sales,
                     data = health_ins_dat)
tidy(summary(model_m5))

## # A tibble: 9 × 5
##   term                              estimate std.error statistic  p.value
##   <chr>                                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)                        0.762     0.0530     14.4   1.77e-44
## 2 perc_Pop_Black                    -0.0221    0.00536    -4.13  3.82e- 5
## 3 perc_Pop_Asian                    -0.0984    0.00818   -12.0   3.88e-32
## 4 perc_Pop_Hispanic_Latino          -0.0876    0.00700   -12.5   1.56e-34
## 5 perc_Pop_More_Races                0.0133    0.0552      0.242 8.09e- 1
## 6 perc_Pop_Other_Races              -0.00804   0.0386     -0.208 8.35e- 1
## 7 log(Median_Household_Income_2019)  0.0163    0.00466     3.49  4.89e- 4
## 8 perc_Bachelor_More                 0.0311    0.0119      2.62  8.91e- 3
## 9 perc_Poverty                       0.0626    0.0149      4.21  2.69e- 5

model_m5_p <-lm(perc_with_ins_public ~
                     perc_Pop_Black +
                     perc_Pop_Asian+
                     perc_Pop_Hispanic_Latino+
                     perc_Pop_More_Races+
                     perc_Pop_Other_Races+
                     log(Median_Household_Income_2019)+
                     perc_Bachelor_More+
                     perc_Poverty,
                     #perc_Management_Professional+
                     #perc_Sales,
                     data = health_ins_dat)

# race, median household income, educational attainment (bachelor's degree or more), and % native-born population
model_m6 <-lm(perc_with_ins ~
                     perc_Pop_Black +
                     perc_Pop_Asian+
                     perc_Pop_Hispanic_Latino+
                     perc_Pop_More_Races+
                     perc_Pop_Other_Races+
                     log(Median_Household_Income_2019)+
                     perc_Bachelor_More+
                     #perc_Poverty+
                     #perc_Management_Professional+
                     #perc_Sales,
                     perc_Origin_Native_Born,
                     #perc_Origin_FO_Citizen,
                     data = health_ins_dat)
tidy(summary(model_m6))

## # A tibble: 9 × 5
##   term                              estimate std.error statistic  p.value
##   <chr>                                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)                        0.774     0.0395     19.6   3.35e-78
## 2 perc_Pop_Black                    -0.00832   0.00539    -1.54  1.23e- 1
## 3 perc_Pop_Asian                    -0.0352    0.00977    -3.60  3.28e- 4
## 4 perc_Pop_Hispanic_Latino          -0.0730    0.00702   -10.4   1.08e-24
## 5 perc_Pop_More_Races                0.0520    0.0545      0.955 3.40e- 1
## 6 perc_Pop_Other_Races               0.0551    0.0377      1.46  1.43e- 1
## 7 log(Median_Household_Income_2019)  0.00799   0.00349     2.29  2.24e- 2
## 8 perc_Bachelor_More                 0.0163    0.0116      1.41  1.58e- 1
## 9 perc_Origin_Native_Born            0.122     0.00972    12.6   6.33e-35

model_m6_p <-lm(perc_with_ins_public ~
                     perc_Pop_Black +
                     perc_Pop_Asian+
                     perc_Pop_Hispanic_Latino+
                     perc_Pop_More_Races+
                     perc_Pop_Other_Races+
                     log(Median_Household_Income_2019)+
                     perc_Bachelor_More+
                     #perc_Poverty+
                     #perc_Management_Professional+
                     #perc_Sales,
                     perc_Origin_Native_Born,
                     #perc_Origin_FO_NCitizen,
                     data = health_ins_dat)

# combined regression table: % population with insurance
stargazer(model_race, model_m2, model_m3, model_m6,
          single.row=TRUE,type="text")

## 
## =========================================================================================================================================
##                                                                             Dependent variable:                                          
##                                   -------------------------------------------------------------------------------------------------------
##                                                                                perc_with_ins                                             
##                                              (1)                       (2)                       (3)                       (4)           
## -----------------------------------------------------------------------------------------------------------------------------------------
## perc_Pop_Black                        -0.042*** (0.005)         -0.032*** (0.005)         -0.025*** (0.005)          -0.008 (0.005)      
## perc_Pop_Asian                        -0.128*** (0.007)         -0.117*** (0.008)         -0.111*** (0.008)         -0.035*** (0.010)    
## perc_Pop_Hispanic_Latino              -0.120*** (0.005)         -0.100*** (0.007)         -0.094*** (0.007)         -0.073*** (0.007)    
## perc_Pop_More_Races                     0.070 (0.056)             0.051 (0.056)             0.019 (0.057)             0.052 (0.054)      
## perc_Pop_Other_Races                   -0.023 (0.038)            -0.030 (0.038)             0.004 (0.039)             0.055 (0.038)      
## log(Median_Household_Income_2019)                               0.013*** (0.003)            0.005 (0.004)            0.008** (0.003)     
## perc_Bachelor_More                                                                        0.040*** (0.012)            0.016 (0.012)      
## perc_Origin_Native_Born                                                                                             0.122*** (0.010)     
## Constant                              0.978*** (0.003)          0.827*** (0.033)          0.903*** (0.040)          0.774*** (0.039)     
## -----------------------------------------------------------------------------------------------------------------------------------------
## Observations                                1,976                     1,976                     1,976                     1,976          
## R2                                          0.258                     0.266                     0.270                     0.324          
## Adjusted R2                                 0.256                     0.264                     0.268                     0.322          
## Residual Std. Error                   0.048 (df = 1970)         0.048 (df = 1969)         0.048 (df = 1968)         0.046 (df = 1967)    
## F Statistic                       137.008*** (df = 5; 1970) 118.882*** (df = 6; 1969) 104.085*** (df = 7; 1968) 118.102*** (df = 8; 1967)
## =========================================================================================================================================
## Note:                                                                                                         *p<0.1; **p<0.05; ***p<0.01
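Columns (3) and (4) of the table above are nested models: column (4) (model_m6) adds only perc_Origin_Native_Born to column (3) (model_m3). The sketch below shows how much that one term improves the fit using a partial F-test computed from the printed R² values, F = ((R²_full - R²_reduced) / q) / ((1 - R²_full) / df_resid). With the data loaded, `anova(model_m3, model_m6)` would give the exact statistic. For a single added term, F equals the square of that term's t statistic (12.6 in the tidy output).

```r
# R-squared and residual df copied from columns (3) and (4) of the table above
r2_reduced <- 0.270   # model_m3
r2_full    <- 0.324   # model_m6 (adds perc_Origin_Native_Born)
q          <- 1       # number of added predictors
df_resid   <- 1967    # residual df of the larger model

partial_f <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)
p_value   <- pf(partial_f, q, df_resid, lower.tail = FALSE)

# With one added term, F should be close to t^2 (t = 12.6); the small gap
# comes from the R-squared values being rounded to three decimals
c(partial_f = round(partial_f, 1), t_squared = 12.6^2)
```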

# combined regression table: % population with public insurance
stargazer(model_m_p, model_m2_p, model_m3_p,model_m6_p,
          single.row=TRUE,type="text")

## 
## =========================================================================================================================================
##                                                                             Dependent variable:                                          
##                                   -------------------------------------------------------------------------------------------------------
##                                                                            perc_with_ins_public                                          
##                                              (1)                       (2)                       (3)                       (4)           
## -----------------------------------------------------------------------------------------------------------------------------------------
## perc_Pop_Black                        0.156*** (0.012)          -0.039*** (0.009)         -0.075*** (0.009)         -0.072*** (0.010)    
## perc_Pop_Asian                        0.167*** (0.020)          -0.048*** (0.014)         -0.078*** (0.014)         -0.068*** (0.018)    
## perc_Pop_Hispanic_Latino              0.364*** (0.014)          -0.031** (0.012)          -0.061*** (0.012)         -0.059*** (0.013)    
## perc_Pop_More_Races                   -0.821*** (0.150)         -0.450*** (0.099)         -0.296*** (0.098)         -0.292*** (0.098)    
## perc_Pop_Other_Races                    0.127 (0.101)           0.252*** (0.067)            0.086 (0.068)             0.092 (0.068)      
## log(Median_Household_Income_2019)                               -0.246*** (0.005)         -0.207*** (0.006)         -0.206*** (0.006)    
## perc_Bachelor_More                                                                        -0.196*** (0.021)         -0.199*** (0.021)    
## perc_Origin_Native_Born                                                                                               0.015 (0.018)      
## Constant                              0.279*** (0.009)          3.213*** (0.058)          2.846*** (0.069)          2.829*** (0.071)     
## -----------------------------------------------------------------------------------------------------------------------------------------
## Observations                                1,976                     1,976                     1,976                     1,976          
## R2                                          0.304                     0.698                     0.711                     0.711          
## Adjusted R2                                 0.303                     0.697                     0.710                     0.710          
## Residual Std. Error                   0.129 (df = 1970)         0.085 (df = 1969)         0.083 (df = 1968)         0.083 (df = 1967)    
## F Statistic                       172.424*** (df = 5; 1970) 757.295*** (df = 6; 1969) 691.492*** (df = 7; 1968) 605.083*** (df = 8; 1967)
## =========================================================================================================================================
## Note:                                                                                                         *p<0.1; **p<0.05; ***p<0.01

PART I CONCLUSION

As the table above for % population with insurance shows, race is the base set of independent variables. As more explanatory variables are added, the association between a tract's racial composition and its insurance coverage rate weakens. For example, the coefficient on % Hispanic or Latino moves from -0.120 to -0.073.
Employment sectors and median household income are highly correlated (about 0.8), so I excluded the employment-sector variables from the linear model.
Findings in the final multivariate regression (column 4):
- the coefficient on % Asian population (-0.035) is about 4 times the size of the coefficient on % Black population (-0.008) in their association with % population with health insurance, although the % Black coefficient is not significant at the 5% level (p = 0.123);
- the coefficient on % Hispanic or Latino (-0.073) is about twice the coefficient on % Asian population (-0.035);
- tracts with a larger native-born share have a higher share of people with health insurance (coefficient 0.122, p < 0.01);
- % with a bachelor's degree or higher is not significant at the 5% level (p = 0.158).
Citizenship status is strongly associated with enrollment in a health plan:
- Foreign-origin populations are less likely to have health coverage, and foreign-origin non-citizens are much less likely to have it.
- A larger share of foreign-origin non-citizens is associated with a larger decrease in % population with health insurance than a larger share of foreign-origin citizens. It is also associated with a much larger increase in % population with public health insurance.
- % population in poverty is positively correlated with % population with health insurance once race, income, and education are controlled for (model_m5: coefficient 0.063, p < 0.001).
Although the four counties can be analyzed together, the variables play quite different roles when each county is analyzed separately.

Part II: Children’s health insurance access

This part of the study investigates the correlations between children's (under 18) access to health coverage and socioeconomic factors in four NYC counties (Bronx, Queens, Kings, and New York/Manhattan) at the census tract level.

*Lack of access to health insurance may be positively related to lower household income/poverty, minority races, school enrollment, employment status (whether parents are employed), and citizenship status among the foreign-origin population. The study also examines these factors' correlations with public insurance.

*Model of children's health coverage in relation to school enrollment and household characteristics:

y = β0 + β1 × (% school enrollment) + β2 × (% single parents) + β3 × (% employed) + β4 × (% foreign-origin non-citizens) + ε, where y is the % of children under 18 with health coverage
Ho: children's health coverage is not related to school enrollment / poverty status / single parenting / employment status / foreign-origin non-citizens
Ha: children's health coverage is related to school enrollment / poverty status / single parenting / employment status / foreign-origin non-citizens
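For the multivariate form of this model, Ho can be written as β1 = β2 = β3 = β4 = 0. The overall F-test tests that null, and `anova()` runs it by comparing the full model with an intercept-only model. The sketch below runs that test on simulated, hypothetical data. The variable names are borrowed from this document, and perc_Origin_FO_NCitizen is assumed from an earlier commented-out term. The values and the "true" coefficients are invented for illustration. The sections below fit each predictor separately instead.

```r
# Simulated, hypothetical tract-level data (NOT ACS values)
set.seed(2019)
n <- 500
sim <- data.frame(
  perc_children_in_school           = runif(n, 0.6, 1.0),
  perc_children_with_single_parents = runif(n, 0.0, 0.6),
  perc_employed_16_over             = runif(n, 0.4, 0.8),
  perc_Origin_FO_NCitizen           = runif(n, 0.0, 0.4)
)
# Invented relationship plus noise, for illustration only
sim$perc_children_with_ins <- with(sim,
  0.92 + 0.02 * perc_children_in_school -
    0.03 * perc_children_with_single_parents +
    0.03 * perc_employed_16_over - 0.10 * perc_Origin_FO_NCitizen +
    rnorm(n, sd = 0.01))

full <- lm(perc_children_with_ins ~ perc_children_in_school +
             perc_children_with_single_parents + perc_employed_16_over +
             perc_Origin_FO_NCitizen, data = sim)
null <- lm(perc_children_with_ins ~ 1, data = sim)

# Ho: all four slopes are zero; anova() compares the nested models
joint <- anova(null, full)
joint
```

The F statistic from this comparison is the same one `summary(full)` reports on its last line.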

General Summary

# children with health insurance in each county
health_ins_dat|>
  ggplot(aes(x = Geo_County, y = perc_children_with_ins, fill = Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()+
  labs(title = "Children's Health Insurance Access for Each County",
      subtitle = "% of children under 18 with health insurance",
      caption = "Data source: ACS 2015-2019",
      y = "Percentage% children with health insurance", x = "County",
      tag = "Summary",
      fill ='County')

Compared with the total population, children have a much higher share covered by health insurance. New York County still has the highest coverage of the four counties.

#public insurance 
health_ins_dat|>
  ggplot(aes(x = Geo_County, y = perc_children_with_ins_public, fill = Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()+
  labs(title = "Children's Health Insurance Access for Each County - Public Insurance",
    subtitle = "% of children under 18 with public health insurance",
    caption = "Data source: ACS 2015-2019",
    y = "Percentage% children with public health insurance", x = "County",
    tag = "Summary",
    fill ='County')

Bronx census tracts have, on average, the highest % of children covered by public insurance. New York County census tracts have the widest range and the lowest median % of children covered by public insurance.

# Is children's insurance coverage correlated with adults' insurance coverage?
#children with insurance vs adult population with insurance 
summary(lm(perc_children_with_ins~perc_adult_with_ins, data=health_ins_dat))

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_adult_with_ins, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31433 -0.01096  0.01364  0.02112  0.08230 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.87123    0.01058  82.324   <2e-16 ***
## perc_adult_with_ins  0.11562    0.01172   9.863   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03595 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.04698,    Adjusted R-squared:  0.0465 
## F-statistic: 97.27 on 1 and 1973 DF,  p-value: < 2.2e-16

#public insurance - adult without insurance
summary(lm(perc_children_with_ins_public~perc_adult_without_ins, data=health_ins_dat))

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_adult_without_ins, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.05933 -0.17987 -0.00451  0.17527  0.61918 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.314300   0.009175   34.25   <2e-16 ***
## perc_adult_without_ins 1.607871   0.075560   21.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2317 on 1972 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1867, Adjusted R-squared:  0.1863 
## F-statistic: 452.8 on 1 and 1972 DF,  p-value: < 2.2e-16

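% adults without insurance has a moderate positive correlation with % of children covered by public insurance. % adults with insurance has a weaker positive correlation with % of children with any health insurance:
- A 1-percentage-point increase in % adults with insurance is associated with a 0.12-percentage-point increase in % of children with health insurance. R-squared: 0.04698
- A 1-percentage-point increase in % adults without insurance is associated with a 1.6-percentage-point increase in % of children covered by public insurance. R-squared: 0.1867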
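Each of these two regressions has a single predictor, so its R-squared can be turned back into a Pearson correlation: r = sign(slope) × sqrt(R²). The sketch below applies that identity to the printed values. It shows the sizes that the words "moderate" and "weaker" above refer to.

```r
# R-squared values and slopes copied from the two summaries above
r2_all    <- 0.04698   # children with ins ~ adults with ins
r2_public <- 0.1867    # children public ins ~ adults without ins
slope_all    <- 0.11562
slope_public <- 1.607871

# One-predictor regression with intercept: r = sign(slope) * sqrt(R^2)
r_all    <- sign(slope_all)    * sqrt(r2_all)
r_public <- sign(slope_public) * sqrt(r2_public)
round(c(r_all = r_all, r_public = r_public), 3)
```

That gives r of about 0.22 for any coverage and about 0.43 for public coverage.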

1. School Enrollment

Ho1: % children enrolled in school is not correlated with % of children covered by health insurance
Ha1: % children enrolled in school is correlated with % of children covered by health insurance
Ho2: % children enrolled in school is not correlated with % of children covered by public health insurance
Ha2: % children enrolled in school is correlated with % of children covered by public health insurance

Data Exploration

health_ins_dat|>
  ggplot(aes(x=Geo_County, y=perc_children_in_school, fill=Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()

## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).

Linear Regression

# health insurance enrollment rate children under 18 vs school enrollment
model_child_with_ins_school_enrollment <- lm(perc_children_with_ins ~
        perc_children_in_school,
        data = health_ins_dat)
summary(model_child_with_ins_school_enrollment)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_children_in_school, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35254 -0.01214  0.01407  0.02464  0.02660 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.977747   0.005533 176.705   <2e-16 ***
## perc_children_in_school -0.003281   0.007337  -0.447    0.655    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03682 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0001013,  Adjusted R-squared:  -0.0004055 
## F-statistic: 0.1999 on 1 and 1973 DF,  p-value: 0.6548

#public insurance insurance 
model_child_with_ins_school_enrollment_public <- lm(perc_children_with_ins_public ~
        perc_children_in_school,
        data = health_ins_dat)
summary(model_child_with_ins_school_enrollment_public)

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_children_in_school, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57550 -0.20285  0.01096  0.20856  0.55584 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.18399    0.03803   4.837 1.42e-06 ***
## perc_children_in_school  0.39025    0.05044   7.737 1.61e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2531 on 1972 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.02946,    Adjusted R-squared:  0.02897 
## F-statistic: 59.87 on 1 and 1972 DF,  p-value: 1.608e-14

We fail to reject Ho1 at the 5% significance level (p = 0.655): there is no evidence that % children enrolled in school is correlated with % of children covered by health insurance.
We reject Ho2 at the 5% significance level (p < 0.001): % children enrolled in school is correlated with % of children covered by public health insurance.
- A 1-percentage-point increase in % children enrolled in school is associated with a 0.39-percentage-point increase in % of children covered by public insurance. R-squared: 0.02946
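A point estimate of 0.39 hides how precisely the slope is estimated. The sketch below builds the 95% confidence interval from the estimate, standard error, and residual degrees of freedom printed above. With the model in memory, `confint(model_child_with_ins_school_enrollment_public)` should return the same interval directly. The manual version just makes the arithmetic visible.

```r
# Estimate, standard error, and residual df copied from the summary above
est      <- 0.39025
se       <- 0.05044
df_resid <- 1972

# 95% confidence interval: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
ci     <- est + c(-1, 1) * t_crit * se
round(ci, 3)
```

The interval runs from roughly 0.29 to 0.49 percentage points per 1-percentage-point change in school enrollment.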

Plot

#insurance
health_ins_dat|>
  ggplot(aes(y=perc_children_with_ins,x=perc_children_in_school, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

#public insurance
health_ins_dat|>
  ggplot(aes(y=perc_children_with_ins_public,x=perc_children_in_school, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=-.4)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

2. Employment / Unemployment (age 16 and over)

Ho1: % adult employed is not correlated with % of children covered by health insurance
Ha1: % adult employed is correlated with % of children covered by health insurance
Ho2: % adult unemployed is not correlated with % of children covered by public health insurance
Ha2: % adult unemployed is correlated with % of children covered by public health insurance

Data Exploration

health_ins_dat|>
  ggplot(aes(x=Geo_County, y=perc_unemployed_16_over, fill=Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()

Linear Regression

# health insurance enrollment rate children under 18 vs % of population over 16 employed
model_child_with_ins_employ <- lm(perc_children_with_ins~
        perc_employed_16_over,
        data = health_ins_dat)

summary(model_child_with_ins_employ)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_employed_16_over, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35164 -0.01224  0.01373  0.02478  0.02555 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.98746    0.01809  54.579   <2e-16 ***
## perc_employed_16_over -0.01301    0.01934  -0.673    0.501    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03682 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0002295,  Adjusted R-squared:  -0.0002772 
## F-statistic: 0.453 on 1 and 1973 DF,  p-value: 0.501

#public insurance
model_child_with_ins_employ_public <- lm(perc_children_with_ins_public~
        perc_unemployed_16_over,
        data = health_ins_dat)

summary(model_child_with_ins_employ_public)

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_unemployed_16_over, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.62923 -0.18439  0.00284  0.18037  0.59965 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.309568   0.009548   32.42   <2e-16 ***
## perc_unemployed_16_over 2.533655   0.122278   20.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2328 on 1972 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1788, Adjusted R-squared:  0.1784 
## F-statistic: 429.3 on 1 and 1972 DF,  p-value: < 2.2e-16

We fail to reject Ho1 at the 5% significance level (p = 0.501): there is no evidence that % adult employed is correlated with % of children covered by health insurance.
We reject Ho2 at the 5% significance level (p < 0.001): % adult unemployed is correlated with % of children covered by public health insurance.
- A 1-percentage-point increase in % unemployed (age 16 and over) is associated with a 2.5-percentage-point increase in % of children covered by public insurance. R-squared: 0.1788
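A slope of 2.5 on a 0-1 share is steep, so it helps to look at fitted values. The sketch below uses the intercept and slope of model_child_with_ins_employ_public to compute fitted public-coverage shares at two unemployment levels. It also computes the unemployment share at which the straight-line fit passes 100%. A linear model on a proportion is not bounded to [0, 1]. If any tracts have unemployment near that level, their fitted values should be read with caution.

```r
# Intercept and slope copied from model_child_with_ins_employ_public above
b0 <- 0.309568
b1 <- 2.533655

# Fitted share of children with public insurance at a given unemployment share
fitted_public <- function(unemployed) b0 + b1 * unemployed

preds <- fitted_public(c(0.05, 0.10))   # 5% and 10% unemployment
preds

# Unemployment share above which the straight-line fit exceeds 1 (100%)
cutoff <- (1 - b0) / b1
cutoff
```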

Plot

# plot
health_ins_dat|>
  ggplot(aes(y=perc_children_with_ins,x=perc_unemployed_16_over, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation()+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# plot
health_ins_dat|>
  ggplot(aes(y=perc_children_with_ins_public,x=perc_unemployed_16_over, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=-.4)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

3. Single Parents

Ho1: % single parents is not correlated with % of children covered by health insurance
Ha1: % single parents is correlated with % of children covered by health insurance
Ho2: % single parents is not correlated with % of children covered by public health insurance
Ha2: % single parents is correlated with % of children covered by public health insurance

Linear Regression

# health insurance enrollment rate children under 18 vs percentage of children with single parents 
model_child_with_ins_single_parent <- lm(perc_children_with_ins~
        perc_children_with_single_parents,
        data = health_ins_dat)

summary(model_child_with_ins_single_parent)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_children_with_single_parents, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35229 -0.01202  0.01396  0.02463  0.02527 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.9755028  0.0014886 655.306   <2e-16 ***
## perc_children_with_single_parents -0.0007703  0.0047165  -0.163     0.87    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03682 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  1.352e-05,  Adjusted R-squared:  -0.0004933 
## F-statistic: 0.02667 on 1 and 1973 DF,  p-value: 0.8703

# public - not significant
model_child_with_ins_single_parent_public <- lm(perc_children_with_ins~
        perc_children_with_single_parents,
        data = health_ins_dat)

summary(model_child_with_ins_single_parent_public)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_children_with_single_parents, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35229 -0.01202  0.01396  0.02463  0.02527 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.9755028  0.0014886 655.306   <2e-16 ***
## perc_children_with_single_parents -0.0007703  0.0047165  -0.163     0.87    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03682 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  1.352e-05,  Adjusted R-squared:  -0.0004933 
## F-statistic: 0.02667 on 1 and 1973 DF,  p-value: 0.8703

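We fail to reject Ho1 at the 5% significance level (p = 0.87): % single parents is not correlated with the % of children covered by health insurance. The public-insurance summary shown above repeats the all-insurance model (both Call lines use perc_children_with_ins as the response), so Ho2 still has to be tested with perc_children_with_ins_public as the response.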

4. Median Household Income

Ho1: Median household income is not correlated with % of children covered by health insurance
Ha1: Median household income is correlated with % of children covered by health insurance
Ho2: Median household income is not correlated with % of children covered by public health insurance
Ha2: Median household income is correlated with % of children covered by public health insurance

Linear Regression

# health insurance enrollment rate children under 18 vs median household income
model_child_with_ins_income <- lm(perc_children_with_ins~
        log(Median_Household_Income_2019),
        data = health_ins_dat)

summary(model_child_with_ins_income)

## 
## Call:
## lm(formula = perc_children_with_ins ~ log(Median_Household_Income_2019), 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34951 -0.01168  0.01429  0.02394  0.03007 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       0.931673   0.017299  53.859   <2e-16 ***
## log(Median_Household_Income_2019) 0.003894   0.001542   2.525   0.0116 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03676 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.003221,   Adjusted R-squared:  0.002716 
## F-statistic: 6.375 on 1 and 1973 DF,  p-value: 0.01165

#public insurance
model_child_with_ins_income_public <- lm(perc_children_with_ins_public~
        log(Median_Household_Income_2019),
        data = health_ins_dat)

summary(model_child_with_ins_income_public)

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ log(Median_Household_Income_2019), 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.74547 -0.09625  0.00917  0.09724  0.64170 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        4.899916   0.068275   71.77   <2e-16 ***
## log(Median_Household_Income_2019) -0.394963   0.006087  -64.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1451 on 1972 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.681,  Adjusted R-squared:  0.6808 
## F-statistic:  4210 on 1 and 1972 DF,  p-value: < 2.2e-16

We reject Ho1 at the 5% significance level (p = 0.012): median household income is positively correlated with the % of children covered by health insurance.
- A 10% increase in median household income is associated with an increase of about 0.04 percentage points (0.003894 × ln 1.1) in the % of children covered by health insurance. R-squared: 0.003221, so income explains almost none of the variation.
We reject Ho2 at the 5% significance level (p < 2e-16): median household income is negatively correlated with the % of children covered by public health insurance.
- A 10% increase in median household income is associated with a decrease of about 3.8 percentage points (0.394963 × ln 1.1) in the % of children covered by public insurance. R-squared: 0.681
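Because the predictor enters as log(Median_Household_Income_2019) and both responses are proportions between 0 and 1, the effect of a 10% income increase is β × ln(1.1). A minimal R sketch of that conversion, using the two estimates printed above (the helper name `effect_of_pct_increase` is hypothetical, not part of the original analysis):

```r
# Slopes on log(Median_Household_Income_2019), copied from the two summaries above
b_any    <-  0.003894   # response: perc_children_with_ins
b_public <- -0.394963   # response: perc_children_with_ins_public

# Hypothetical helper: change in y when x is multiplied by (1 + pct/100)
# in a level-log model y = b0 + b1 * log(x)
effect_of_pct_increase <- function(b1, pct = 10) {
  b1 * log(1 + pct / 100)
}

# The responses are proportions, so multiply by 100 to get percentage points
pp_any    <- 100 * effect_of_pct_increase(b_any)
pp_public <- 100 * effect_of_pct_increase(b_public)
print(round(c(any = pp_any, public = pp_public), 3))
```

For small changes, β × pct/100 is a close approximation of β × ln(1 + pct/100); the exact form matters more as the percentage change grows.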

Plot

# % of children with any health insurance vs log median household income, by county
health_ins_dat|>
  ggplot(aes(y=perc_children_with_ins,x=log(Median_Household_Income_2019), col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# % of children with public health insurance vs log median household income, by county
health_ins_dat|>
  ggplot(aes(y=perc_children_with_ins_public,x=log(Median_Household_Income_2019), col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=-.4)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

5. Race

Ho1: Race is not correlated with % of children covered by health insurance
Ha1: Race is correlated with % of children covered by health insurance
Ho2: Race is not correlated with % of children covered by public health insurance
Ha2: Race is correlated with % of children covered by public health insurance

Linear Regression

# health insurance enrollment rate children under 18 vs percentage population non-white 
model_child_with_ins_race <- lm(perc_children_with_ins~
        perc_Pop_Black +
        perc_Pop_Hispanic_Latino +
        perc_Pop_Asian+
        perc_Pop_More_Races+
        perc_Pop_Other_Races,
        data = health_ins_dat)

summary(model_child_with_ins_race)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino + 
##     perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32015 -0.01270  0.01359  0.02257  0.05356 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.986585   0.002508 393.379  < 2e-16 ***
## perc_Pop_Black           -0.012016   0.003405  -3.529 0.000427 ***
## perc_Pop_Hispanic_Latino -0.008109   0.003977  -2.039 0.041613 *  
## perc_Pop_Asian           -0.046934   0.005605  -8.374  < 2e-16 ***
## perc_Pop_More_Races       0.031586   0.042066   0.751 0.452820    
## perc_Pop_Other_Races      0.005837   0.028383   0.206 0.837075    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0362 on 1969 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.03555,    Adjusted R-squared:  0.0331 
## F-statistic: 14.52 on 5 and 1969 DF,  p-value: 5.421e-14

We reject Ho1 at the 5% significance level: the % Black, Hispanic or Latino, and Asian population are negatively correlated with the % of children covered by health insurance.
- A 1-percentage-point increase in the Black population share, compared with the White population, is associated with a 0.012016-percentage-point decrease in the % of children covered by health insurance.
- A 1-percentage-point increase in the Hispanic or Latino population share, compared with the White population, is associated with a 0.008109-percentage-point decrease in the % of children covered by health insurance.
- A 1-percentage-point increase in the Asian population share, compared with the White population, is associated with a 0.046934-percentage-point decrease in the % of children covered by health insurance.
- The shares of the population with two or more races or other races* are not correlated with the % of children covered by health insurance at the 5% significance level.
- R-squared: 0.03555
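Since the race shares and the response are all proportions, each coefficient is directly the percentage-point change in coverage per 1-percentage-point change in that share. Rejecting a coefficient's null at the 5% level is equivalent to its 95% confidence interval excluding 0. The sketch below rebuilds those intervals from the printed estimates and standard errors; with the fitted object available, `confint(model_child_with_ins_race)` computes the same t-based intervals directly.

```r
# Estimates and standard errors copied from summary(model_child_with_ins_race)
est <- c(Black = -0.012016, Hispanic_Latino = -0.008109, Asian = -0.046934,
         More_Races = 0.031586, Other_Races = 0.005837)
se  <- c(0.003405, 0.003977, 0.005605, 0.042066, 0.028383)

# 95% t-based intervals on the model's 1969 residual degrees of freedom
t_crit <- qt(0.975, df = 1969)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

# Significant at the 5% level exactly when the interval excludes 0
sig_5pct <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(round(ci, 4))
print(sig_5pct)
```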

#public insurance
model_child_with_ins_race_public <- lm(perc_children_with_ins_public~
        perc_Pop_Black +
        perc_Pop_Hispanic_Latino +
        perc_Pop_Asian+
        perc_Pop_More_Races+
        perc_Pop_Other_Races,
        data = health_ins_dat)

summary(model_child_with_ins_race_public)

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black + 
##     perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races + 
##     perc_Pop_Other_Races, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59144 -0.14445 -0.01207  0.11774  0.77964 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.14084    0.01353  10.409   <2e-16 ***
## perc_Pop_Black            0.32929    0.01838  17.916   <2e-16 ***
## perc_Pop_Hispanic_Latino  0.75512    0.02146  35.190   <2e-16 ***
## perc_Pop_Asian            0.36933    0.03024  12.214   <2e-16 ***
## perc_Pop_More_Races      -0.44778    0.22695  -1.973   0.0486 *  
## perc_Pop_Other_Races      0.21951    0.15314   1.433   0.1519    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1953 on 1968 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.4232, Adjusted R-squared:  0.4217 
## F-statistic: 288.8 on 5 and 1968 DF,  p-value: < 2.2e-16

We reject Ho2 at the 5% significance level: the % Black, Hispanic or Latino, and Asian population are positively correlated with the % of children covered by public health insurance, while the population with two or more races is negatively correlated.
- A 1-percentage-point increase in the Black population share, compared with the White population, is associated with a 0.32929-percentage-point increase in the % of children covered by public insurance.
- A 1-percentage-point increase in the Hispanic or Latino population share, compared with the White population, is associated with a 0.75512-percentage-point increase in the % of children covered by public insurance.
- A 1-percentage-point increase in the Asian population share, compared with the White population, is associated with a 0.36933-percentage-point increase in the % of children covered by public insurance.
- A 1-percentage-point increase in the share with two or more races, compared with the White population, is associated with a 0.44778-percentage-point decrease in the % of children covered by public insurance.
- The share with other races* is not correlated with the % of children covered by public health insurance at the 5% significance level.
- R-squared: 0.4232
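Because the White share is left out of the model, the intercept (0.14084) is the fitted public-coverage share for a tract made up entirely of the omitted group, and each slope shifts that value as population moves from the omitted group into the named group. The sketch plugs a purely hypothetical tract composition (illustrative only, not taken from the data) into the printed coefficients:

```r
# Coefficients copied from summary(model_child_with_ins_race_public)
b <- c(intercept = 0.14084, black = 0.32929, hispanic_latino = 0.75512,
       asian = 0.36933, more_races = -0.44778, other_races = 0.21951)

# Hypothetical tract: 30% Black, 40% Hispanic or Latino, 10% Asian,
# 2% two or more races, 1% other races (remaining 17% in the omitted group)
shares <- c(black = 0.30, hispanic_latino = 0.40, asian = 0.10,
            more_races = 0.02, other_races = 0.01)

# Fitted share of children with public insurance: intercept + sum(slope * share)
fitted_public_share <- b[["intercept"]] + sum(b[names(shares)] * shares)
print(round(fitted_public_share, 3))  # 0.572
```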

Plot

##### children with insurance 
# black
health_ins_dat|>
  ggplot(aes(x=perc_Pop_Black,y=perc_children_with_ins, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=0.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# asian
health_ins_dat|>
  ggplot(aes(x=perc_Pop_Asian,y=perc_children_with_ins, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=0.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# Hispanic or Latino
health_ins_dat|>
  ggplot(aes(x=perc_Pop_Hispanic_Latino,y=perc_children_with_ins, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=0.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# white
health_ins_dat|>
  ggplot(aes(x=perc_White,y=perc_children_with_ins, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=0.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

##### children with public insurance
# black
health_ins_dat|>
  ggplot(aes(x=perc_Pop_Black,y=perc_children_with_ins_public, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

# asian
health_ins_dat|>
  ggplot(aes(x=perc_Pop_Asian,y=perc_children_with_ins_public, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

# Hispanic or Latino
health_ins_dat|>
  ggplot(aes(x=perc_Pop_Hispanic_Latino,y=perc_children_with_ins_public, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

# white
health_ins_dat|>
  ggplot(aes(x=perc_White,y=perc_children_with_ins_public, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

6. Poverty

Ho1: % population in poverty is not correlated with % of children covered by health insurance
Ha1: % population in poverty is correlated with % of children covered by health insurance
Ho2: % population in poverty is not correlated with % of children covered by public health insurance
Ha2: % population in poverty is correlated with % of children covered by public health insurance

Data Exploration

health_ins_dat|>
  ggplot(aes(x=Geo_County, y=perc_Poverty, fill=Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()

## Warning: Removed 115 rows containing non-finite values (`stat_boxplot()`).

Linear Regression

# health insurance enrollment rate children under 18 vs poverty 
model_child_with_ins_poverty <- lm(perc_children_with_ins~
        perc_Poverty,
        data = health_ins_dat)

summary(model_child_with_ins_poverty)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Poverty, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35215 -0.01241  0.01395  0.02469  0.02499 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.975009   0.001457 668.995   <2e-16 ***
## perc_Poverty 0.002184   0.007311   0.299    0.765    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03662 on 1858 degrees of freedom
##   (116 observations deleted due to missingness)
## Multiple R-squared:  4.805e-05,  Adjusted R-squared:  -0.0004901 
## F-statistic: 0.08928 on 1 and 1858 DF,  p-value: 0.7651

#public insurance
model_child_with_ins_poverty_public <- lm(perc_children_with_ins_public~
        perc_Poverty,
        data = health_ins_dat)

summary(model_child_with_ins_poverty_public)

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Poverty, data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63382 -0.13657  0.00423  0.12767  0.60505 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.226600   0.007292   31.07   <2e-16 ***
## perc_Poverty 1.522890   0.036570   41.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1831 on 1857 degrees of freedom
##   (117 observations deleted due to missingness)
## Multiple R-squared:  0.4829, Adjusted R-squared:  0.4826 
## F-statistic:  1734 on 1 and 1857 DF,  p-value: < 2.2e-16

We fail to reject Ho1 at the 5% significance level (p = 0.765): poverty is not correlated with the % of children covered by health insurance.
We reject Ho2 at the 5% significance level: poverty is positively correlated with the % of children covered by public health insurance.
- A 1-percentage-point increase in the population in poverty is associated with a 1.52289-percentage-point increase in the % of children covered by public insurance.
- R-squared: 0.4829
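Both poverty and coverage are proportions, so the slope of 1.52289 means each additional percentage point of poverty is associated with about 1.52 more percentage points of children on public insurance. The short sketch below evaluates the fitted line at a few poverty levels (chosen for illustration only) and shows one limitation of fitting a straight line to a share:

```r
# Intercept and slope copied from summary(model_child_with_ins_poverty_public)
b0 <- 0.226600
b1 <- 1.522890

# Fitted share of children with public insurance for a poverty share (0-1)
fitted_public_at <- function(poverty) b0 + b1 * poverty

fitted_public_at(c(0.10, 0.20))        # 0.378889 0.531178
diff(fitted_public_at(c(0.10, 0.20)))  # 0.152289, about 15.2 pp per 10 pp of poverty

# A straight line is not bounded to [0, 1]: at 60% poverty it exceeds 1
fitted_public_at(0.60) > 1             # TRUE
```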

Plot

# % of children with any health insurance vs % population in poverty, by county
health_ins_dat|>
  ggplot(aes(x=perc_Poverty,y=perc_children_with_ins, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# % of children with public health insurance vs % population in poverty, by county
health_ins_dat|>
  ggplot(aes(x=perc_Poverty,y=perc_children_with_ins_public, col=Geo_County)) +
  geom_point(size = 0.8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=-.2)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

7. Foreign Origin Non Citizen

Ho1: % of the population of foreign origin without citizenship is not correlated with % of children covered by health insurance
Ha1: % of the population of foreign origin without citizenship is correlated with % of children covered by health insurance
Ho2: % of the population of foreign origin without citizenship is not correlated with % of children covered by public health insurance
Ha2: % of the population of foreign origin without citizenship is correlated with % of children covered by public health insurance

Data Exploration

health_ins_dat|>
  ggplot(aes(x=Geo_County, y=perc_Origin_FO_NCitizen, fill=Geo_County))+
  geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()

Linear Regression

# health insurance enrollment rate children under 18 vs percentage population foreign origin non citizen
model_child_with_ins_foreign <- lm(perc_children_with_ins~
        perc_Origin_FO_NCitizen,
        data = health_ins_dat)
summary(model_child_with_ins_foreign)

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Origin_FO_NCitizen, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31919 -0.01120  0.01427  0.02217  0.04821 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.985933   0.001661 593.675  < 2e-16 ***
## perc_Origin_FO_NCitizen -0.066641   0.009061  -7.355 2.79e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03633 on 1973 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.02668,    Adjusted R-squared:  0.02619 
## F-statistic: 54.09 on 1 and 1973 DF,  p-value: 2.792e-13

#public insurance
model_child_with_ins_foreign_public <- lm(perc_children_with_ins_public~
        perc_Origin_FO_NCitizen,
        data = health_ins_dat)
summary(model_child_with_ins_foreign_public)

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Origin_FO_NCitizen, 
##     data = health_ins_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68068 -0.17455  0.00012  0.16725  0.64056 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.29330    0.01077   27.24   <2e-16 ***
## perc_Origin_FO_NCitizen  1.13834    0.05874   19.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2354 on 1972 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:   0.16,  Adjusted R-squared:  0.1596 
## F-statistic: 375.6 on 1 and 1972 DF,  p-value: < 2.2e-16

We reject Ho1 at the 5% significance level: the % of the population of foreign origin without citizenship is negatively correlated with the % of children covered by health insurance.
- A 1-percentage-point increase in the foreign-origin non-citizen population share is associated with a 0.066641-percentage-point decrease in the % of children covered by health insurance.
- R-squared: 0.02668
We reject Ho2 at the 5% significance level: the % of the population of foreign origin without citizenship is positively correlated with the % of children covered by public health insurance.
- A 1-percentage-point increase in the foreign-origin non-citizen population share is associated with a 1.13834-percentage-point increase in the % of children covered by public insurance.
- R-squared: 0.16
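For a one-predictor model the overall F-statistic and R-squared carry the same information, R² = F·df1 / (F·df1 + df2), which gives a quick consistency check when transcribing results into the text. A small sketch (the helper name `r2_from_f` is hypothetical) applied to the two foreign-origin models above:

```r
# Hypothetical helper: recover R-squared from an overall F-statistic
# F = (R2 / df1) / ((1 - R2) / df2)  =>  R2 = F * df1 / (F * df1 + df2)
r2_from_f <- function(f, df1, df2) f * df1 / (f * df1 + df2)

r2_from_f(54.09, 1, 1973)  # about 0.0267, matching Multiple R-squared 0.02668
r2_from_f(375.6, 1, 1972)  # about 0.160, matching Multiple R-squared 0.16
```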

Plot

# % of children with any health insurance vs % foreign origin non-citizen, by county
health_ins_dat|>
  ggplot(aes(x=perc_Origin_FO_NCitizen,y=perc_children_with_ins, col=Geo_County)) +
  geom_point(size = .8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=0.7)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# % of children with public health insurance vs % foreign origin non-citizen, by county
health_ins_dat|>
  ggplot(aes(x=perc_Origin_FO_NCitizen,y=perc_children_with_ins_public, col=Geo_County)) +
  geom_point(size = .8, alpha=.2)+
  facet_wrap(~Geo_County)+
  geom_smooth(method = lm)+
  scale_color_brewer(palette = "Dark2")+
  stat_regline_equation(label.y=1)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

Multiple Linear Regression

Children with public insurance

In this part of the study, I fit multiple linear regression models to understand which factors are most strongly associated with the % of children with health insurance, and with public health insurance, when all factors are considered together.
I first look at all four counties together, then analyze each county individually.

# Fit the full model for % of children with any health insurance; returns its summary
func_children_ins <- function(data){
  model_children <- lm(perc_children_with_ins~
        perc_Pop_Black +
        perc_Pop_Hispanic_Latino +
        perc_Pop_Asian+
        perc_Pop_More_Races+
        perc_Pop_Other_Races+
        perc_Poverty+
        perc_unemployed_16_over+
        log(Median_Household_Income_2019)+
        perc_Origin_FO_NCitizen+
        perc_Origin_FO_Citizen+
        perc_children_in_school,
        data = data)
  summary(model_children)
}

# Same predictors, response = % of children with public health insurance; returns its summary
func_children_public_ins <- function(data){
  model_children <- lm(perc_children_with_ins_public~
        perc_Pop_Black +
        perc_Pop_Hispanic_Latino +
        perc_Pop_Asian+
        perc_Pop_More_Races+
        perc_Pop_Other_Races+
        perc_Poverty+
        perc_unemployed_16_over+
        log(Median_Household_Income_2019)+
        perc_Origin_FO_NCitizen+
        perc_Origin_FO_Citizen+
        perc_children_in_school,
        data = data)
  summary(model_children)
}
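The two functions above differ only in the response variable. One way to keep them from drifting apart is a single function that builds the formula with base R's `reformulate()`. This is a hypothetical refactor, not the original code: it returns the fitted `lm` rather than its summary, so `summary()`, `confint()` or `predict()` can be called on the result. The demo uses simulated data with the same column names, so its coefficients mean nothing.

```r
# Hypothetical refactor: one model function shared by both responses
children_predictors <- c(
  "perc_Pop_Black", "perc_Pop_Hispanic_Latino", "perc_Pop_Asian",
  "perc_Pop_More_Races", "perc_Pop_Other_Races", "perc_Poverty",
  "perc_unemployed_16_over", "log(Median_Household_Income_2019)",
  "perc_Origin_FO_NCitizen", "perc_Origin_FO_Citizen",
  "perc_children_in_school"
)

# Build "response ~ predictors" and return the fitted model
fit_children_model <- function(data, response) {
  lm(reformulate(children_predictors, response = response), data = data)
}

# Demo on simulated data with the same column names (values are random)
set.seed(1)
n <- 60
plain_cols <- setdiff(children_predictors, "log(Median_Household_Income_2019)")
sim <- as.data.frame(matrix(runif(n * length(plain_cols)), nrow = n,
                            dimnames = list(NULL, plain_cols)))
sim$Median_Household_Income_2019 <- runif(n, 20000, 200000)
sim$perc_children_with_ins_public <- runif(n)

fit <- fit_children_model(sim, "perc_children_with_ins_public")
summary(fit)
```

With the real data this would be called as `health_ins_dat |> fit_children_model("perc_children_with_ins_public") |> summary()`.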

Four Counties

# Four counties - insurance 
health_ins_dat|>func_children_ins()

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino + 
##     perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races + 
##     perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) + 
##     perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29720 -0.01129  0.01291  0.02184  0.05539 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.9228962  0.0387034  23.845  < 2e-16 ***
## perc_Pop_Black                    -0.0069802  0.0041856  -1.668 0.095555 .  
## perc_Pop_Hispanic_Latino           0.0081523  0.0060823   1.340 0.180304    
## perc_Pop_Asian                    -0.0301485  0.0079114  -3.811 0.000143 ***
## perc_Pop_More_Races                0.0286227  0.0425697   0.672 0.501431    
## perc_Pop_Other_Races               0.0040546  0.0299192   0.136 0.892217    
## perc_Poverty                       0.0189494  0.0123799   1.531 0.126026    
## perc_unemployed_16_over           -0.0003649  0.0241608  -0.015 0.987951    
## log(Median_Household_Income_2019)  0.0057899  0.0030786   1.881 0.060172 .  
## perc_Origin_FO_NCitizen           -0.0546899  0.0133485  -4.097 4.36e-05 ***
## perc_Origin_FO_Citizen             0.0119858  0.0107762   1.112 0.266179    
## perc_children_in_school           -0.0084999  0.0078701  -1.080 0.280274    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03582 on 1848 degrees of freedom
##   (116 observations deleted due to missingness)
## Multiple R-squared:  0.04815,    Adjusted R-squared:  0.04249 
## F-statistic: 8.499 on 11 and 1848 DF,  p-value: 8.778e-15

# Four counties - public insurance 
health_ins_dat|>func_children_public_ins()

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black + 
##     perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races + 
##     perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over + 
##     log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen + 
##     perc_Origin_FO_Citizen + perc_children_in_school, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48120 -0.08679 -0.00095  0.08772  0.45877 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        3.26111    0.14320  22.774  < 2e-16 ***
## perc_Pop_Black                     0.04862    0.01547   3.143   0.0017 ** 
## perc_Pop_Hispanic_Latino           0.09360    0.02246   4.167 3.23e-05 ***
## perc_Pop_Asian                    -0.05991    0.02923  -2.050   0.0405 *  
## perc_Pop_More_Races                0.12196    0.15721   0.776   0.4380    
## perc_Pop_Other_Races               0.47476    0.11049   4.297 1.82e-05 ***
## perc_Poverty                       0.46096    0.04582  10.061  < 2e-16 ***
## perc_unemployed_16_over           -0.07603    0.08930  -0.851   0.3946    
## log(Median_Household_Income_2019) -0.26765    0.01139 -23.499  < 2e-16 ***
## perc_Origin_FO_NCitizen            0.51417    0.04931  10.428  < 2e-16 ***
## perc_Origin_FO_Citizen             0.05078    0.03990   1.273   0.2033    
## perc_children_in_school            0.01613    0.02907   0.555   0.5791    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1323 on 1847 degrees of freedom
##   (117 observations deleted due to missingness)
## Multiple R-squared:  0.7316, Adjusted R-squared:   0.73 
## F-statistic: 457.7 on 11 and 1847 DF,  p-value: < 2.2e-16

Bronx

# Bronx - insurance 
Bronx|>func_children_ins()

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino + 
##     perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races + 
##     perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) + 
##     perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school, 
##     data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.157556 -0.009653  0.008102  0.018660  0.038357 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.9474711  0.1051397   9.012  < 2e-16 ***
## perc_Pop_Black                     0.0095971  0.0154850   0.620  0.53588    
## perc_Pop_Hispanic_Latino           0.0514580  0.0185091   2.780  0.00578 ** 
## perc_Pop_Asian                     0.0157602  0.0388106   0.406  0.68497    
## perc_Pop_More_Races                0.0807487  0.1549610   0.521  0.60269    
## perc_Pop_Other_Races              -0.0296361  0.1216399  -0.244  0.80768    
## perc_Poverty                      -0.0234814  0.0257986  -0.910  0.36346    
## perc_unemployed_16_over           -0.0359757  0.0453086  -0.794  0.42782    
## log(Median_Household_Income_2019) -0.0006071  0.0086905  -0.070  0.94435    
## perc_Origin_FO_NCitizen           -0.0500972  0.0266910  -1.877  0.06150 .  
## perc_Origin_FO_Citizen             0.0011430  0.0350271   0.033  0.97399    
## perc_children_in_school            0.0276166  0.0195829   1.410  0.15951    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03047 on 299 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.05984,    Adjusted R-squared:  0.02525 
## F-statistic:  1.73 on 11 and 299 DF,  p-value: 0.06621

# Bronx - public insurance 
Bronx |>func_children_public_ins()

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black + 
##     perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races + 
##     perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over + 
##     log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen + 
##     perc_Origin_FO_Citizen + perc_children_in_school, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34179 -0.05413  0.00941  0.07094  0.24437 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        1.67042    0.37415   4.465 1.14e-05 ***
## perc_Pop_Black                     0.33387    0.05559   6.006 5.54e-09 ***
## perc_Pop_Hispanic_Latino           0.51538    0.06562   7.854 7.34e-14 ***
## perc_Pop_Asian                     0.26808    0.13756   1.949 0.052250 .  
## perc_Pop_More_Races                1.24195    0.54914   2.262 0.024442 *  
## perc_Pop_Other_Races              -0.41341    0.43106  -0.959 0.338311    
## perc_Poverty                       0.30829    0.09274   3.324 0.000997 ***
## perc_unemployed_16_over            0.34596    0.16184   2.138 0.033358 *  
## log(Median_Household_Income_2019) -0.15136    0.03092  -4.895 1.62e-06 ***
## perc_Origin_FO_NCitizen            0.39145    0.09461   4.138 4.57e-05 ***
## perc_Origin_FO_Citizen            -0.09353    0.12734  -0.734 0.463232    
## perc_children_in_school            0.04455    0.06954   0.641 0.522285    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.108 on 298 degrees of freedom
##   (17 observations deleted due to missingness)
## Multiple R-squared:  0.7975, Adjusted R-squared:   0.79 
## F-statistic: 106.7 on 11 and 298 DF,  p-value: < 2.2e-16

In Bronx County:

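% of children with health insurance is positively correlated with race (Hispanic and Latino, p = 0.006). No other predictor is significant at the 5% level, and the model as a whole is not significant (F-test p = 0.066, adjusted R-squared: 0.025).
% of children with public health insurance is positively correlated with race (Black, Hispanic and Latino, Two or More Races), poverty, unemployment, and the foreign origin with no citizenship population, and negatively correlated with median household income. The Asian share is only marginally significant (p = 0.052), and the foreign origin with citizenship share is not significant (p = 0.46). R-squared: 0.7975.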
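Whether a model is jointly significant is read from its overall F-test, not from individual coefficients. A minimal sketch recomputing the p-value of the Bronx all-insurance model from the F-statistic and degrees of freedom printed in its summary (the F value is rounded in the printout, so the result only approximately matches):

```r
# Overall F-test for the Bronx all-insurance model, values copied from its summary
f_stat <- 1.73
df1 <- 11   # number of slope coefficients
df2 <- 299  # residual degrees of freedom

p_overall <- pf(f_stat, df1, df2, lower.tail = FALSE)
print(round(p_overall, 3))  # close to the reported 0.06621
p_overall < 0.05            # FALSE: jointly not significant at the 5% level
```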

Queens

# Queens - insurance 
Queens|>func_children_ins()

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino + 
##     perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races + 
##     perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) + 
##     perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27238 -0.01417  0.01284  0.02449  0.06428 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.729186   0.109129   6.682 5.55e-11 ***
## perc_Pop_Black                    -0.005516   0.009521  -0.579   0.5626    
## perc_Pop_Hispanic_Latino           0.019566   0.016434   1.191   0.2343    
## perc_Pop_Asian                    -0.037846   0.017994  -2.103   0.0359 *  
## perc_Pop_More_Races                0.095971   0.082162   1.168   0.2433    
## perc_Pop_Other_Races              -0.020209   0.044564  -0.453   0.6504    
## perc_Poverty                       0.016535   0.034523   0.479   0.6322    
## perc_unemployed_16_over            0.090220   0.061701   1.462   0.1442    
## log(Median_Household_Income_2019)  0.021896   0.009067   2.415   0.0161 *  
## perc_Origin_FO_NCitizen           -0.051683   0.030192  -1.712   0.0875 .  
## perc_Origin_FO_Citizen             0.026543   0.030560   0.869   0.3855    
## perc_children_in_school           -0.009745   0.017984  -0.542   0.5881    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04202 on 579 degrees of freedom
##   (46 observations deleted due to missingness)
## Multiple R-squared:  0.09921,    Adjusted R-squared:  0.0821 
## F-statistic: 5.797 on 11 and 579 DF,  p-value: 6.137e-09

# Queens - public insurance 
Queens |>func_children_public_ins()

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black + 
##     perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races + 
##     perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over + 
##     log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen + 
##     perc_Origin_FO_Citizen + perc_children_in_school, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.38269 -0.09251 -0.00449  0.09296  0.42711 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        2.43436    0.35380   6.881 1.55e-11 ***
## perc_Pop_Black                     0.16139    0.03087   5.229 2.39e-07 ***
## perc_Pop_Hispanic_Latino           0.39143    0.05328   7.347 6.94e-13 ***
## perc_Pop_Asian                     0.13088    0.05834   2.244 0.025239 *  
## perc_Pop_More_Races                0.10862    0.26637   0.408 0.683577    
## perc_Pop_Other_Races               0.72201    0.14448   4.997 7.71e-07 ***
## perc_Poverty                       0.25288    0.11193   2.259 0.024235 *  
## perc_unemployed_16_over            0.69938    0.20004   3.496 0.000508 ***
## log(Median_Household_Income_2019) -0.19757    0.02940  -6.721 4.33e-11 ***
## perc_Origin_FO_NCitizen            0.41863    0.09788   4.277 2.22e-05 ***
## perc_Origin_FO_Citizen            -0.21507    0.09908  -2.171 0.030357 *  
## perc_children_in_school           -0.05767    0.05831  -0.989 0.323013    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1362 on 579 degrees of freedom
##   (46 observations deleted due to missingness)
## Multiple R-squared:  0.5743, Adjusted R-squared:  0.5662 
## F-statistic: 71.02 on 11 and 579 DF,  p-value: < 2.2e-16

In Queens County:

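% of children with health insurance is significant at the 5% level for two variables. It is negatively correlated with race (Asian) and positively correlated with median household income. The negative association with the foreign origin population with no citizenship is only marginal (p ≈ 0.09).
% of children with public health insurance is positively correlated with race (Black, Hispanic and Latino, Asian, Other Races), poverty, unemployment, and the foreign origin population with no citizenship. It is negatively correlated with median household income and with the foreign origin population with citizenship.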
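Statements like the ones above can be read directly off the coefficient table. Here is a minimal sketch, assuming a fitted `lm` object.

`significant_terms()` is a hypothetical helper, not part of the report. It keeps the rows of `coef(summary(fit))` whose p-value is below α = 0.05 and labels each with the sign of its estimate. The data is simulated purely to make the example runnable.

```r
# Hypothetical helper: list predictors significant at level alpha, with direction
significant_terms <- function(fit, alpha = 0.05) {
  tab <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
  keep <- tab[, "Pr(>|t|)"] < alpha
  data.frame(
    term      = rownames(tab)[keep],
    estimate  = tab[keep, "Estimate"],
    p_value   = tab[keep, "Pr(>|t|)"],
    direction = ifelse(tab[keep, "Estimate"] > 0, "positive", "negative"),
    row.names = NULL
  )
}

# Simulated example: y falls with x_neg, rises with x_pos, and ignores x_noise
set.seed(1)
n <- 300
sim <- data.frame(x_neg = runif(n), x_pos = runif(n), x_noise = runif(n))
sim$y <- 0.5 - 0.3 * sim$x_neg + 0.2 * sim$x_pos + rnorm(n, sd = 0.05)
fit <- lm(y ~ x_neg + x_pos + x_noise, data = sim)
significant_terms(fit)
```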

Kings

# Kings - insurance 
Kings |> func_children_ins()

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino + 
##     perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races + 
##     perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) + 
##     perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19678 -0.01168  0.01181  0.02183  0.03333 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.961082   0.061604  15.601   <2e-16 ***
## perc_Pop_Black                    -0.006970   0.005415  -1.287    0.198    
## perc_Pop_Hispanic_Latino          -0.006706   0.010107  -0.664    0.507    
## perc_Pop_Asian                    -0.007399   0.012985  -0.570    0.569    
## perc_Pop_More_Races               -0.055430   0.061215  -0.906    0.366    
## perc_Pop_Other_Races               0.010134   0.104762   0.097    0.923    
## perc_Poverty                       0.020634   0.018636   1.107    0.269    
## perc_unemployed_16_over           -0.027731   0.034349  -0.807    0.420    
## log(Median_Household_Income_2019)  0.002604   0.004933   0.528    0.598    
## perc_Origin_FO_NCitizen           -0.041715   0.024183  -1.725    0.085 .  
## perc_Origin_FO_Citizen             0.001635   0.014479   0.113    0.910    
## perc_children_in_school           -0.006152   0.013388  -0.460    0.646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03324 on 694 degrees of freedom
##   (38 observations deleted due to missingness)
## Multiple R-squared:  0.02274,    Adjusted R-squared:  0.007254 
## F-statistic: 1.468 on 11 and 694 DF,  p-value: 0.1386

# Kings - public insurance 
Kings |> func_children_public_ins()

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black + 
##     perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races + 
##     perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over + 
##     log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen + 
##     perc_Origin_FO_Citizen + perc_children_in_school, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44404 -0.07860 -0.00335  0.08334  0.34380 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        3.59747    0.22976  15.658  < 2e-16 ***
## perc_Pop_Black                     0.01501    0.02020   0.743  0.45770    
## perc_Pop_Hispanic_Latino           0.09701    0.03769   2.574  0.01027 *  
## perc_Pop_Asian                    -0.01379    0.04843  -0.285  0.77591    
## perc_Pop_More_Races               -0.30710    0.22831  -1.345  0.17903    
## perc_Pop_Other_Races               0.08702    0.39072   0.223  0.82382    
## perc_Poverty                       0.52209    0.06951   7.512 1.80e-13 ***
## perc_unemployed_16_over           -0.37619    0.12811  -2.937  0.00343 ** 
## log(Median_Household_Income_2019) -0.28863    0.01840 -15.689  < 2e-16 ***
## perc_Origin_FO_NCitizen            0.64350    0.09019   7.135 2.45e-12 ***
## perc_Origin_FO_Citizen            -0.01382    0.05400  -0.256  0.79808    
## perc_children_in_school           -0.05085    0.04993  -1.018  0.30885    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.124 on 694 degrees of freedom
##   (38 observations deleted due to missingness)
## Multiple R-squared:  0.7357, Adjusted R-squared:  0.7315 
## F-statistic: 175.6 on 11 and 694 DF,  p-value: < 2.2e-16

In Kings County:

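% of children with health insurance is not significantly correlated with any independent variable at the 5% level. The model's overall F-test is also not significant (p = 0.1386).
% of children with public health insurance is positively correlated with race (Hispanic and Latino), poverty, and the foreign origin population with no citizenship. It is negatively correlated with unemployment and median household income.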
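Two Kings results can be checked from the statistics printed above:

- the negative unemployment estimate in the public-insurance model (t = -2.937);
- the non-result for the all-insurance model (F = 1.468 on 11 and 694 DF).

Both p-values can be recomputed with base R's `pt()` and `pf()`. The first is a two-sided t-test on 694 residual degrees of freedom. The second is an upper-tail F-test. Because the printed statistics are rounded, the recomputed values agree only to about three decimal places.

```r
# Two-sided p-value for a coefficient's t statistic
t_pvalue <- function(t_stat, df) 2 * pt(-abs(t_stat), df)

# Upper-tail p-value for a regression's overall F statistic
f_pvalue <- function(f_stat, df1, df2) pf(f_stat, df1, df2, lower.tail = FALSE)

# Statistics copied from the Kings output above
p_unemp <- t_pvalue(-2.937, df = 694)            # perc_unemployed_16_over, public insurance
p_model <- f_pvalue(1.468, df1 = 11, df2 = 694)  # all-insurance model, overall F-test
round(c(unemployment = p_unemp, overall_F = p_model), 4)
```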

NewYork

# New York - insurance 
NewYork |> func_children_ins()

## 
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino + 
##     perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races + 
##     perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) + 
##     perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school, 
##     data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.155441 -0.008007  0.012520  0.018116  0.047810 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        1.087364   0.095897  11.339   <2e-16 ***
## perc_Pop_Black                     0.002682   0.018869   0.142   0.8871    
## perc_Pop_Hispanic_Latino          -0.003370   0.026973  -0.125   0.9007    
## perc_Pop_Asian                    -0.038511   0.033624  -1.145   0.2532    
## perc_Pop_More_Races                0.117972   0.131918   0.894   0.3721    
## perc_Pop_Other_Races               0.125349   0.235599   0.532   0.5952    
## perc_Poverty                      -0.028161   0.036967  -0.762   0.4469    
## perc_unemployed_16_over           -0.072222   0.068924  -1.048   0.2958    
## log(Median_Household_Income_2019) -0.006632   0.007466  -0.888   0.3753    
## perc_Origin_FO_NCitizen           -0.006105   0.040653  -0.150   0.8808    
## perc_Origin_FO_Citizen            -0.005474   0.048265  -0.113   0.9098    
## perc_children_in_school           -0.023417   0.013027  -1.798   0.0735 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03138 on 240 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.04463,    Adjusted R-squared:  0.0008428 
## F-statistic: 1.019 on 11 and 240 DF,  p-value: 0.4298

# New York - public insurance 
NewYork |> func_children_public_ins()

## 
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black + 
##     perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races + 
##     perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over + 
##     log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen + 
##     perc_Origin_FO_Citizen + perc_children_in_school, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.38528 -0.06018 -0.00953  0.06292  0.51727 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        2.05137    0.34790   5.896 1.25e-08 ***
## perc_Pop_Black                     0.30594    0.06845   4.469 1.21e-05 ***
## perc_Pop_Hispanic_Latino           0.47821    0.09786   4.887 1.87e-06 ***
## perc_Pop_Asian                     0.25045    0.12198   2.053   0.0411 *  
## perc_Pop_More_Races                0.20020    0.47858   0.418   0.6761    
## perc_Pop_Other_Races              -0.03926    0.85472  -0.046   0.9634    
## perc_Poverty                       0.25936    0.13411   1.934   0.0543 .  
## perc_unemployed_16_over           -0.15790    0.25005  -0.631   0.5283    
## log(Median_Household_Income_2019) -0.17451    0.02709  -6.443 6.35e-10 ***
## perc_Origin_FO_NCitizen           -0.11829    0.14748  -0.802   0.4233    
## perc_Origin_FO_Citizen             0.15602    0.17510   0.891   0.3738    
## perc_children_in_school            0.08653    0.04726   1.831   0.0684 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1138 on 240 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.8595, Adjusted R-squared:  0.853 
## F-statistic: 133.4 on 11 and 240 DF,  p-value: < 2.2e-16

In New York County:

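% of children with health insurance is not significantly correlated with any independent variable at the 5% level (overall F-test p = 0.4298).
% of children with public health insurance is positively correlated with race (Black, Hispanic and Latino, Asian) and negatively correlated with median household income. Poverty is only marginally significant (p = 0.0543).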
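The New York results show why "significant at 5%" should be kept apart from "marginal". Poverty (p = 0.0543) carries R's `.` code, not `*`.

The sketch below applies the cut points from the `Signif. codes` legend using `cut()`, so that p-values can be labelled consistently in the write-up. `signif_code()` is a hypothetical helper. The p-values are copied from the New York public-insurance output.

```r
# Map p-values to the symbols in the "Signif. codes" legend using its cut points
# (0, 0.001, 0.01, 0.05, 0.1, 1). Intervals are right-closed, cut()'s default,
# so a p-value exactly on a cut point gets the stronger symbol.
signif_code <- function(p) {
  as.character(cut(p,
    breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
    labels = c("***", "**", "*", ".", " "),
    include.lowest = TRUE
  ))
}

# p-values from the New York public-insurance model above
ny_p <- c(
  perc_Pop_Black = 1.21e-05, perc_Pop_Asian = 0.0411, perc_Poverty = 0.0543,
  log_income = 6.35e-10, perc_Pop_Other_Races = 0.9634
)
signif_code(ny_p)
```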

Part II Conclusion

Almost all children have health care coverage thanks to two programs:

- Medicaid, which covers about 50% of all births in New York State;
- Child Health Plus (CHIP), which covers about 35% of New York's children under 19 and supports families regardless of income and immigration status.

Coverage for their parents is much less certain.

Interpreting the linear regressions

When analyzing the linear relationship with each independent variable separately:

Children with health insurance coverage, all four counties:
- Only race, median household income, and foreign origin with no citizenship are statistically significant at the 5% significance level.
- % school enrollment for children under 18, % of employed population over 16, % of single parents, and % population in poverty are NOT statistically significant at the 5% significance level.
Children with public health insurance coverage, all four counties:
- % of employed population over 16, % school enrollment for children under 18, % population in poverty, and foreign origin with no citizenship are statistically significant at the 5% significance level.
- % of single parents is NOT statistically significant at the 5% significance level.
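The "separately" analysis above fits one simple regression per independent variable. Here is a minimal sketch of that loop, assuming a data frame with one row per area.

The data is simulated. `x_income` and `x_noise` are placeholder columns, not the report's variables. Each predictor gets its own `lm()`, and the slopes and p-values are collected into one table.

```r
# Fit response ~ x separately for each predictor and collect slope and p-value
simple_regressions <- function(data, response, predictors) {
  rows <- lapply(predictors, function(x) {
    fit <- lm(reformulate(x, response = response), data = data)
    est <- coef(summary(fit))[2, ]  # row 2 is the single slope term
    data.frame(predictor = x,
               slope     = est[["Estimate"]],
               p_value   = est[["Pr(>|t|)"]])
  })
  do.call(rbind, rows)
}

# Simulated data: the response depends on x_income only
set.seed(42)
n <- 200
sim <- data.frame(x_income = rnorm(n), x_noise = rnorm(n))
sim$y <- 0.9 + 0.05 * sim$x_income + rnorm(n, sd = 0.02)
simple_regressions(sim, "y", c("x_income", "x_noise"))
```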

When using multivariate linear regression:

Children with health insurance coverage, all four counties:
- % population Asian and % foreign origin with no citizenship have negative correlations with % children with health care coverage, statistically significant at the 5% significance level.
- % school enrollment for children under 18, % foreign origin citizen, % single parents, % unemployment, % of children in school, % Black, % Hispanic/Latino, % other races, % two or more races, and % in poverty are NOT statistically significant at the 5% significance level.
Children with public health insurance coverage, all four counties:
- Only median household income (negative), % of employed population over 16, % population in poverty, and foreign origin with no citizenship are statistically significant at the 5% significance level.
- Other factors, such as % school enrollment for children under 18, % foreign origin citizen, and % of children in school, are NOT statistically significant at the 5% significance level.
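A variable can be significant on its own but not in the multivariate model. For example, race is significant in the separate regressions, while % Black and % Hispanic/Latino are not significant in the combined one. One common reason is that predictors overlap: once a correlated variable is in the model, the remaining one adds little.

The simulation below illustrates this mechanism with two hypothetical, strongly correlated predictors. It is not a re-analysis of the ACS data.

```r
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # x2 is largely a proxy for x1
y  <- 1 + 1.0 * x1 + rnorm(n)   # only x1 drives y

# On its own, x2 picks up x1's effect and looks strongly significant
simple <- coef(summary(lm(y ~ x2)))["x2", ]

# With x1 in the model, x2's estimate shrinks toward zero and typically loses significance
multiple <- coef(summary(lm(y ~ x1 + x2)))
simple
multiple
```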
When analyzing children with public health insurance coverage for each county individually, the significance and coefficient of each factor change. For example, most non-white racial groups are significant independent variables for New York, Bronx, and Queens. In Kings, only Hispanic or Latino is significant.
- Thus, when suggesting policies or actions to increase the % of children with health care coverage, it is important to analyze at the county or smaller scale, to understand the needs and constraints of the specific population and community.
- For example:
  - Kings is the only county with a significant negative correlation between the unemployment rate and children's enrollment in public health insurance.
  - In Queens County, % of children with health insurance is negatively correlated with race (Asian) and, marginally, with the foreign origin population with no citizenship. The barriers for the Asian population may involve factors not included in this study, such as language and culture. The health department may therefore want to investigate the Asian population and the population without citizenship further.
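Fitting the same model county by county, as recommended above, means splitting the combined data by county and fitting one model per piece. Here is a minimal sketch, assuming a combined data frame with a `county` column.

The data is simulated for two hypothetical counties with opposite unemployment effects. This loosely echoes how the unemployment coefficient is positive in Queens and negative in Kings. The column names are placeholders.

```r
set.seed(3)
make_county <- function(name, slope, n = 150) {
  unemp <- runif(n, 0, 0.2)
  data.frame(county = name,
             perc_unemployed = unemp,
             perc_public = 0.4 + slope * unemp + rnorm(n, sd = 0.02))
}
combined <- rbind(make_county("County A", slope = 0.7),
                  make_county("County B", slope = -0.4))

# One lm() per county; keep the unemployment estimate and its p-value
by_county <- lapply(split(combined, combined$county), function(d) {
  est <- coef(summary(lm(perc_public ~ perc_unemployed, data = d)))["perc_unemployed", ]
  c(estimate = est[["Estimate"]], p_value = est[["Pr(>|t|)"]])
})
do.call(rbind, by_county)
```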

Next steps:

By analyzing how many children are enrolled in public health insurance, we may be able to infer in which regions families or adults need help obtaining affordable, subsidized, or free health care.
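One way to act on this idea is to rank areas within each county by the share of children on public insurance, then flag the top few for outreach.

The sketch below assumes a data frame with hypothetical `area`, `county`, and `perc_children_with_ins_public` columns. The values are made up for illustration.

```r
# Return the top_n areas per county with the highest share of children on public insurance
flag_high_need <- function(df, top_n = 2) {
  pieces <- lapply(split(df, df$county), function(d) {
    d <- d[order(d$perc_children_with_ins_public, decreasing = TRUE), ]
    head(d, top_n)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Made-up example data for two hypothetical counties
areas <- data.frame(
  area   = c("A1", "A2", "A3", "B1", "B2", "B3"),
  county = c("County A", "County A", "County A", "County B", "County B", "County B"),
  perc_children_with_ins_public = c(0.72, 0.35, 0.58, 0.41, 0.66, 0.29)
)
flag_high_need(areas, top_n = 2)
```

Ranking by share alone ignores how many children live in each area. Weighting by child population would be a natural refinement.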

Thoughts on Social Implications or Policy Suggestions

New York State addresses health insurance coverage through broad Medicaid eligibility under the Affordable Care Act (ACA).
Various reasons prevent racial minorities and foreign-born populations from getting health insurance. For example, “legislative barriers, such as the threat of being labeled a public charge, or a fear of unknown and high out-of-pocket costs may prevent many from seeking medical attention at all” (from Health of Asians and Pacific Islanders in New York City, NYC Health).
According to the Community Service Society, there are four main reasons one may decide not to have health insurance: “(1) they are unaware of or do not understand their coverage options and the enrollment processes; (2) they choose not to enroll for political or religious reasons; (3) they have a high risk tolerance and self-perceived good health status; or (4) they consider the coverage available to them to be unaffordable.”
To address the barriers:
1. Health-related social work groups may consider targeting areas with higher concentrations of racial minorities and foreign-born populations.
2. Host information sessions, hand out flyers, or use other forms of communication in the native languages of uninsured immigrant populations, to address their concerns about insurance.
3. Increase enrollment flexibility by adding qualifying time frames or life events outside the open enrollment period, to address coverage gaps and other time-related limitations.
4. Combine enrollment eligibility checks with other services, such as tax filing.
5. Provide more subsidy options for the low-income population.
6. Help more children enroll in free or subsidized health insurance through schools, or use schools as one of the messengers that communicate insurance options to parents.
7. Make actions and evaluations county-specific.
Policies should be county-specific. For example, race is more strongly correlated with health insurance access in New York and Queens than in Bronx.

Sources:
1. https://www1.nyc.gov/assets/doh/downloads/pdf/episrv/asian-pacific-islander-health-2021.pdf
2. https://cbcny.org/research/narrowing-new-yorks-health-insurance-coverage-gap
3. https://www.researchgate.net/publication/324066442_Factors_Associated_with_Health_Insurance_Status_in_an_Asian_American_Population_in_New_York_City_Analysis_of_a_Community-Based_Survey
4. https://www.nyccare.nyc/about/
5. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158120/
6. https://www1.nyc.gov/assets/doh/downloads/pdf/epi/databrief100.pdf

Health Insurance Access Inequality in New York Counties: Bronx, Queens, Kings, New York (Manhattan)

2022-12-06
