Covid-19 exacerbated health inequality and prompted more discussion of public health issues.
According to the Community Service Society, more than 1 million New Yorkers remained uninsured in 2019, and New York ranked seventh among states on coverage.
Access to health insurance is crucial for a person’s physical well-being and longevity. A healthy person can be more productive in the economy, provide financial and emotional support to their family, and contribute to their community.
Health insurance in the US, however, is not universal and can be very expensive, especially for people who do not have a job or whose job does not offer healthcare benefits. Unexpected medical bills are extremely burdensome for uninsured and financially disadvantaged individuals and families.
Children, who usually depend on their parents for health insurance access, may be more vulnerable to health issues if their parents do not have health insurance coverage.
Which independent variables correlate with people’s enrollment in health insurance? Independent variables: race, median household income, employment sector, educational attainment, place of origin/citizenship, and poverty.
How can free health insurance reach the target population, such as those who cannot afford coverage or are not eligible for any public program? How can we find the neighborhoods most in need using linear regression analysis?
In the first part, the study aims to understand correlations between access to health coverage and socioeconomic factors in four NYC counties (Bronx, Queens, Kings, and Manhattan) at the census tract level. Lack of health insurance may be positively related to lower household income/poverty, minority race, employment sector, lower educational attainment, and non-citizen status among the foreign-origin population. The study also looks into those factors’ correlations with public insurance.
In the second part, the study investigates the correlations between children’s (under 18) access to health coverage and socioeconomic factors in the same four counties at the census tract level. Lack of health insurance may be positively related to lower household income/poverty, minority race, school enrollment, employment status (whether parents are employed), and non-citizen status among the foreign-origin population. The study also looks into those factors’ correlations with public insurance.
I converted all the data, except for median household income, into percentage of population.
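# Packages used throughout (assumed; the original setup chunk is not shown)
library(tidyverse)  # read_csv, dplyr verbs, ggplot2, pivot_longer
library(broom)      # tidy()
library(scales)     # percent()
library(ggpubr)     # stat_regline_equation()
health_ins_dat <- read_csv("Final_project_data_edited_2.csv")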
to_add_dat <- read_csv("data_country_origin.csv")
to_add_dat_2 <-read_csv("data_poverty_short.csv")
health_ins_dat <-
left_join(health_ins_dat, to_add_dat, by='Geo_GEOID')%>%left_join(., to_add_dat_2, by='Geo_GEOID')
#glimpse(health_ins_dat)
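A left join on `Geo_GEOID` can silently duplicate rows if a key repeats in the table being joined in, and it leaves `NA` where a tract has no match. The following is a minimal sketch of how that could be checked, written in base R with `merge()` so that it runs without extra packages. The toy tables and their values are made up for illustration; only the `Geo_GEOID` key name comes from the code above.

```r
# Toy stand-ins for two of the input files (values are made up).
main <- data.frame(
  Geo_GEOID = c("36005000100", "36005000200", "36047000100"),
  Total_Pop = c(1200, 0, 3400),
  stringsAsFactors = FALSE
)
origin <- data.frame(
  Geo_GEOID = c("36005000100", "36047000100"),
  Origin_Foreign_Born = c(300, 900),
  stringsAsFactors = FALSE
)

# Keys should be unique in the table being joined in, otherwise rows multiply.
stopifnot(!anyDuplicated(origin$Geo_GEOID))

# all.x = TRUE keeps every row of `main`, like left_join().
joined <- merge(main, origin, by = "Geo_GEOID", all.x = TRUE)

# The row count should be unchanged; unmatched tracts show up as NA.
stopifnot(nrow(joined) == nrow(main))
unmatched <- joined$Geo_GEOID[is.na(joined$Origin_Foreign_Born)]
unmatched
```

Running the same two checks after each join would show whether any census tract lost its origin or poverty data.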
Set up data as percentages of population & analysis environment.
Here I convert all the variables under study to percentages of population. Census tracts with no population or with a median household income of zero are removed.
For the race data, I have grouped Total_Pop_Some_Other_Race_Alone, Total_Pop_Native_Hawaiian_and_Other_Pacific_Islander_Alone, and Total_Pop_American_Indian_and_Alaska_Native_Alone into one category.
For the education data, I grouped bachelor’s, master’s, professional, and doctorate degrees into one category, because they all represent higher education. Having a high school diploma or less usually creates more barriers to work opportunities and income.
For the occupation data, I grouped Employed_Civilian_Pop_16_and_Over_Construction_Extraction_and_Maintenance_Occupations+ Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations+ Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations_Production_Occupations+ Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations_Transportation_and_Material_Moving_Occupations into one category.
#Remove census tracts with no population, zero median household income, or missing county
health_ins_dat <- health_ins_dat|>filter(Total_Pop != 0,
Median_Household_Income_2019 !=0,
!is.na(Geo_County))
# % with health insurance in total population; % with public health insurance; % with private health insurance
health_ins_dat<- health_ins_dat|>
mutate(perc_with_ins = Total_with_Health_Insurance_Coverage/Total_Pop,
perc_with_ins_public = Total_with_Health_Insurance_Coverage_Public_Health_Coverage/Total_Pop,
perc_with_ins_private = Total_with_Health_Insurance_Coverage_Private_Health_Insurance/Total_Pop)
#Part I
#1. Race % establish percentage of population for each race
health_ins_dat<- health_ins_dat|>
mutate(
#Black
perc_Pop_Black=Total_Pop_Black_or_African_American_Alone/Total_Pop,
#Asian
perc_Pop_Asian=Total_Pop_Asian_Alone/Total_Pop,
#Hispanic
perc_Pop_Hispanic_Latino=Total_Pop_Hispanic_Latino/Total_Pop,
#other races
perc_Pop_Other_Races=(Total_Pop_Some_Other_Race_Alone+
Total_Pop_Native_Hawaiian_and_Other_Pacific_Islander_Alone+
Total_Pop_American_Indian_and_Alaska_Native_Alone)/Total_Pop,
#two or more races
perc_Pop_More_Races=Total_Pop_Two_or_More_Races/Total_Pop,
#white
perc_White=Total_Pop_White_Alone/Total_Pop,
perc_NonWhite=1-perc_White)
#2. Education Attainment % in each census tract
health_ins_dat<- health_ins_dat|>
mutate(perc_less_HS = as.numeric(Pop_25yrs_Less_than_High_School)/Total_Pop,
perc_HS = as.numeric(Pop_25yrs_High_School_Graduate_Includes_Equivalency)/Total_Pop,
perc_HS_or_Less = perc_less_HS+perc_HS,
perc_College= as.numeric(Pop_25yrs_Some_College)/Total_Pop,
perc_Bachelor_More = (as.numeric(Pop_25yrs_Bachelor_Degree)+as.numeric(Pop_25yrs_Master_Degree)+
as.numeric(Pop_25yrs_Professional_School_Degree)+
as.numeric(Pop_25yrs_Doctorate_Degree))/Total_Pop)
#3. Occupation %: 1. Management Professional 2. Services 3. Sales 4. Farm & Industrial
# equation: %of labor force in sector over 16yo: sector_employment/Civilian_Pop_in_Labor_Force_16_and_Over
health_ins_dat<-health_ins_dat|>
mutate(perc_Management_Professional =
Employed_Civilian_Pop_16_and_Over_Management_Professional_and_Related_Occupations/Civilian_Pop_in_Labor_Force_16_and_Over,
perc_Service =
Employed_Civilian_Pop_16_and_Over_Service_Occupations/Civilian_Pop_in_Labor_Force_16_and_Over,
perc_Sales = Employed_Civilian_Pop_16_and_Over_Sales_and_Office_Occupations/Civilian_Pop_in_Labor_Force_16_and_Over,
perc_Farm_Industrial = (Employed_Civilian_Pop_16_and_Over_Construction_Extraction_and_Maintenance_Occupations+
Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations+
Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations_Production_Occupations+
Employed_Civilian_Pop_16_and_Over_Production_Transportation_and_Material_Moving_Occupations_Transportation_and_Material_Moving_Occupations)/
Civilian_Pop_in_Labor_Force_16_and_Over)
#4. Foreign-origin population and poverty
health_ins_dat <-health_ins_dat|>
mutate(
perc_Origin_Native_Born = Origin_Native_Born/Total_Pop,
perc_Origin_FO=(Origin_Foreign_Born)/Total_Pop,
perc_Origin_FO_Citizen = (Origin_Foreign_Born-Origin_Foreign_Origin_Not_a_Citizen)/Total_Pop,
perc_Origin_FO_NCitizen = Origin_Foreign_Origin_Not_a_Citizen/Total_Pop,
perc_Poverty = Pop_Poverty/Total_Pop,
perc_Non_Poverty = 1-perc_Poverty)
#Part II
#Children's insurance access %
health_ins_dat <-health_ins_dat|>
mutate(perc_children_in_school = (Enrolled_In_School_Enrolled_In_Nursery_School_Preschool+
Enrolled_In_School_Enrolled_In_Kindergarten+
Enrolled_In_School_Enrolled_In_Grade_1_To_Grade_4+ Enrolled_In_School_Enrolled_In_Grade_5_To_Grade_8+
Enrolled_In_School_Enrolled_In_Grade_9_To_Grade_12)/Pop_Under_18,
#perc_with_ins = Total_with_Health_Insurance_Coverage/Total_Pop
perc_adult_with_ins = (Total_with_Health_Insurance_Coverage-Pop_Under_18_with_Health_Insurance_Coverage)/
(Total_Pop-Pop_Under_18),
perc_adult_without_ins = 1-perc_adult_with_ins,
perc_children_with_ins = Pop_Under_18_with_Health_Insurance_Coverage/Pop_Under_18,
perc_children_without_ins = 1-perc_children_with_ins,
perc_children_with_ins_public = Pop_Under_18_with_Health_Insurance_Coverage_Public_Health_Coverage/
Pop_Under_18,
perc_children_with_single_parents = Children_Living_with_Single_Parents/Pop_Under_18,
perc_employed_16_over = Civilian_Pop_in_Labor_Force_16_and_Over_Employed/
Civilian_Pop_in_Labor_Force_16_and_Over,
perc_unemployed_16_over = 1-perc_employed_16_over,
perc_male_employed_16_over = Civilian_Male_in_Labor_Force_16_and_Over_Employed/
Civilian_Male_in_Labor_Force_16_and_Over,
perc_female_employed_16_over = Civilian_Female_in_Labor_Force_16_and_Over_Employed/Civilian_Female_in_Labor_Force_16_and_Over,
)
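Several of the shares above divide by `Pop_Under_18`, which can be zero in tracts with no children even after empty tracts are filtered out. In R, `0/0` gives `NaN` and a positive count divided by zero gives `Inf`. A minimal sketch of a guard follows; `safe_share()` is a hypothetical helper, not part of the original code, and the counts are made up.

```r
# Hypothetical helper: share of `num` in `den`,
# returning NA when the denominator is zero or missing.
safe_share <- function(num, den) {
  ifelse(is.na(den) | den == 0, NA_real_, num / den)
}

# Toy tract counts (made up): the second tract has no residents under 18.
pop_under_18          <- c(400, 0, 250)
pop_under_18_with_ins <- c(380, 0, 240)

children_share <- safe_share(pop_under_18_with_ins, pop_under_18)
children_share
```

Returning `NA` lets `lm()` drop those tracts through its default missing-value handling instead of failing on infinite values.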
#Bronx, Queens, Kings, and New York
Bronx <- health_ins_dat|>
filter(Geo_County=="Bronx")
Queens <- health_ins_dat|>
filter(Geo_County=="Queens")
Kings <- health_ins_dat|>
filter(Geo_County=="Kings")
NewYork <- health_ins_dat|>
filter(Geo_County=="New_York")
#table(health_ins_dat$Geo_County)
#write.csv(health_ins_dat, "Export.csv", row.names=TRUE)
First, I create a table of county-level averages for the Bronx, Queens, Kings, and Manhattan (New York County). For each county, it shows the average % population with health insurance, % non-white population, % population with less than a bachelor’s degree, median household income, % population in occupations other than management and professional (the highest-paying occupation category), and % foreign-origin population without citizenship.
#Find the summarized data Race, Education Attainment, Median Household Income, and Occupation about each county
health_ins_dat|>
group_by(Geo_County)|>
summarise(HaveHealthInsurance= percent(round(mean(perc_with_ins),3)),
NonWhitePopulation = percent(round(1-mean(perc_White),3)),
LessThanBachelorDegree = percent(round(1-mean(perc_Bachelor_More),3)),
MedianIncome = round(mean(Median_Household_Income_2019),3),
NonManagementProfessional = percent(round(1-mean(perc_Management_Professional),3)),
ForeignOriginNotCitizens = percent(round(mean(perc_Origin_FO_NCitizen),3))
)
## # A tibble: 4 × 7
## Geo_County HaveHealthInsurance NonWhitePopul…¹ LessT…² Media…³ NonMa…⁴ Forei…⁵
## <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Bronx 91% 89% 86% 54089. 76% 17%
## 2 Kings 92% 63% 74% 80246. 61% 14%
## 3 New_York 94% 52% 52% 142021. 43% 14%
## 4 Queens 90% 74% 76% 82765. 67% 19%
## # … with abbreviated variable names ¹NonWhitePopulation,
## # ²LessThanBachelorDegree, ³MedianIncome, ⁴NonManagementProfessional,
## # ⁵ForeignOriginNotCitizens
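The summary above averages tract-level shares, so a tract with 500 residents counts as much as one with 5,000. A population-weighted mean is one alternative. The sketch below uses made-up numbers and base R's `weighted.mean()` to show how the two can differ; it is an illustration, not a recomputation of the table.

```r
# Toy tracts (made-up numbers): a small tract with low coverage
# and a large tract with high coverage.
tracts <- data.frame(
  Total_Pop     = c(500, 4500),
  perc_with_ins = c(0.80, 0.95)
)

# Unweighted mean of tract shares, as in the summary table.
unweighted <- mean(tracts$perc_with_ins)

# Population-weighted mean: equals insured residents / total residents.
weighted <- weighted.mean(tracts$perc_with_ins, w = tracts$Total_Pop)

c(unweighted = unweighted, weighted = weighted)
```

Here the unweighted mean is 0.875 and the weighted mean is 0.935, because the larger tract dominates once population is taken into account.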
#Plot health insurance access for each county
health_ins_dat%>%
ggplot(aes(y=perc_with_ins, fill=Geo_County))+
geom_boxplot(outlier.size=0.8,outlier.fill="grey", outlier.alpha=.2) +
scale_fill_brewer(palette = "RdBu")+
theme_minimal()+
labs(title = "Health Insurance Access for Each County",
subtitle = "% of population with health insurance",
caption = "Data source: ACS 2015-2019",
y = "Percentage% with health insurance", x = "County",
tag = "Summary",
fill ='County')
#Plot health insurance access for each county - public insurance
health_ins_dat%>%
ggplot(aes(y=perc_with_ins_public, fill=Geo_County))+
geom_boxplot(outlier.size=0.8,outlier.fill="grey", outlier.alpha=.2) +
scale_fill_brewer(palette = "RdBu")+
theme_minimal()+
labs(title = "Health Insurance Access for Each County - Public Insurance",
subtitle = "% of population with public health insurance",
caption = "Data source: ACS 2015-2019",
y = "Percentage% with public health insurance", x = "County",
tag = "Summary",
fill ='County')
Before exploring the correlation between race and health insurance, I look at the racial distribution in each county by plotting the non-white population share of its census tracts.
#Plot Non-white population distribution for each county
health_ins_dat%>%
ggplot(aes(y=perc_NonWhite, fill=Geo_County))+
geom_boxplot(outlier.size=0.8,outlier.fill="grey", outlier.alpha=.2) +
scale_fill_brewer(palette = "RdBu")+
theme_minimal()+
labs(title = "Non-White Population %",
subtitle = "at the census tracts level",
caption = "Data source: ACS 2015-2019",
y = "Percentage% Non-White Population", x = "County",
tag = "Summary",
fill ='County')
* Comparing the county distributions suggests a rough inverse relationship between % non-white population and % of population with health insurance. Bronx has the highest median value for non-white population, followed by Queens, Kings, and New York.
Next, I examine the relationship between race and health insurance access through linear regression. The white population share is left out of the model, so each coefficient is read relative to the white population.
#Linear regression: Race and Health Insurance Access
model_race <-lm(perc_with_ins ~ perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races,
#perc_White
data = health_ins_dat)
summary(model_race)
##
## Call:
## lm(formula = perc_with_ins ~ perc_Pop_Black + perc_Pop_Asian +
## perc_Pop_Hispanic_Latino + perc_Pop_More_Races + perc_Pop_Other_Races,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44761 -0.02060 0.00804 0.02998 0.12641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.978158 0.003337 293.104 <2e-16 ***
## perc_Pop_Black -0.041895 0.004537 -9.235 <2e-16 ***
## perc_Pop_Asian -0.127995 0.007468 -17.139 <2e-16 ***
## perc_Pop_Hispanic_Latino -0.120429 0.005298 -22.731 <2e-16 ***
## perc_Pop_More_Races 0.069670 0.056065 1.243 0.214
## perc_Pop_Other_Races -0.023254 0.037849 -0.614 0.539
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04827 on 1970 degrees of freedom
## Multiple R-squared: 0.258, Adjusted R-squared: 0.2561
## F-statistic: 137 on 5 and 1970 DF, p-value: < 2.2e-16
#Linear regression: Race and Public Health Insurance Access
model_race_public <- lm(perc_with_ins_public~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races,
data = health_ins_dat)
summary(model_race_public)
##
## Call:
## lm(formula = perc_with_ins_public ~ perc_Pop_Black + perc_Pop_Asian +
## perc_Pop_Hispanic_Latino + perc_Pop_More_Races + perc_Pop_Other_Races,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34283 -0.08728 -0.01590 0.06859 0.62325
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.279272 0.008919 31.311 < 2e-16 ***
## perc_Pop_Black 0.156312 0.012125 12.892 < 2e-16 ***
## perc_Pop_Asian 0.166915 0.019960 8.362 < 2e-16 ***
## perc_Pop_Hispanic_Latino 0.363896 0.014160 25.700 < 2e-16 ***
## perc_Pop_More_Races -0.821163 0.149842 -5.480 4.79e-08 ***
## perc_Pop_Other_Races 0.127229 0.101157 1.258 0.209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.129 on 1970 degrees of freedom
## Multiple R-squared: 0.3044, Adjusted R-squared: 0.3026
## F-statistic: 172.4 on 5 and 1970 DF, p-value: < 2.2e-16
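The regressions above relate the % of each minority group in a census tract’s population to the % of population with health insurance (first model) and with public health insurance (second model), relative to the % white population, across census tracts in all four counties. In the first model, the coefficients for the Black, Asian, and Hispanic or Latino shares are significant at the 99% confidence level, while those for two or more races and other races are not. Its R-squared is 0.258, so the model explains about 25.8% of the variance in coverage; the public insurance model has an R-squared of 0.3044.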
The coefficients can be read as differences from the white population. Writing the model with one coefficient per group share, where $Y$ is the % population with health care and each variable is that group's population percentage:

$$Y = \beta_1\,\mathrm{white} + \beta_2\,\mathrm{black} + \beta_3\,\mathrm{asian} + \beta_4\,\mathrm{two\_or\_more} + \beta_5\,\mathrm{other} + \beta_6\,\mathrm{hispanic}$$

Treating the shares as summing to 1:

$$\mathrm{white} = 1 - (\mathrm{black} + \mathrm{asian} + \mathrm{two\_or\_more} + \mathrm{other} + \mathrm{hispanic})$$

Substituting for the white share gives the fitted form, with intercept $\beta_1 = 0.978158$:

$$Y = \beta_1 + (\beta_2 - \beta_1)\,\mathrm{black} + (\beta_3 - \beta_1)\,\mathrm{asian} + (\beta_4 - \beta_1)\,\mathrm{two\_or\_more} + (\beta_5 - \beta_1)\,\mathrm{other} + (\beta_6 - \beta_1)\,\mathrm{hispanic}$$

The reported coefficients are therefore:

$$\alpha_{\mathrm{black}} = \beta_2 - \beta_1 = -0.041895$$
$$\alpha_{\mathrm{asian}} = \beta_3 - \beta_1 = -0.127995$$
$$\alpha_{\mathrm{hispanic}} = \beta_6 - \beta_1 = -0.120429$$
$$\alpha_{\mathrm{two\_or\_more}} = \beta_4 - \beta_1 = 0.069670 \quad \text{(not significant)}$$
$$\alpha_{\mathrm{other}} = \beta_5 - \beta_1 = -0.023254 \quad \text{(not significant)}$$
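The link between a model with one coefficient per group share and a model that omits the white share as the reference category can be checked numerically. The sketch below uses simulated shares for three groups that sum to 1 in every tract; the data, the group levels, and the noise level are all made up for illustration.

```r
set.seed(1)
n <- 200

# Simulated shares for three groups that sum to 1 in every tract (toy data).
raw   <- matrix(runif(n * 3), ncol = 3)
share <- raw / rowSums(raw)
white <- share[, 1]
black <- share[, 2]
asian <- share[, 3]

# Simulated outcome with a known level per group plus noise.
y <- 0.98 * white + 0.94 * black + 0.85 * asian + rnorm(n, sd = 0.02)

# Full parameterisation: no intercept, one coefficient per share.
full <- coef(lm(y ~ 0 + white + black + asian))

# Reference-category parameterisation: white share omitted, intercept added.
ref <- coef(lm(y ~ black + asian))

# The intercept equals the white coefficient, and each slope equals
# that group's coefficient minus the white coefficient.
from_full <- c(full["white"],
               full["black"] - full["white"],
               full["asian"] - full["white"])
rbind(from_full = unname(from_full), from_ref = unname(ref))
```

The two rows agree up to numerical precision, which is why each reported slope can be read as a difference from the white population.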
#linear model for each county - insurance
model_ins_race_by_County <- health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins ~ perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races,
#perc_White
data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"))|>
rename('race'='term')
model_ins_race_by_County
## # A tibble: 24 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value significan…¹
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.935 0.0134 9.98e-196 yes
## 2 Bronx perc_Pop_Black -0.00492 0.0157 7.54e- 1 no
## 3 Bronx perc_Pop_Asian 0.0271 0.0522 6.03e- 1 no
## 4 Bronx perc_Pop_Hispanic_Latino -0.0454 0.0154 3.51e- 3 yes
## 5 Bronx perc_Pop_More_Races -0.175 0.215 4.15e- 1 no
## 6 Bronx perc_Pop_Other_Races 0.00719 0.114 9.50e- 1 no
## 7 Kings (Intercept) 0.967 0.00473 0 yes
## 8 Kings perc_Pop_Black -0.0332 0.00591 2.78e- 8 yes
## 9 Kings perc_Pop_Asian -0.0877 0.0124 3.01e- 12 yes
## 10 Kings perc_Pop_Hispanic_Latino -0.120 0.0102 1.67e- 29 yes
## # … with 14 more rows, and abbreviated variable name ¹`significant?`
Significant data to notice:
Thoughts: It looks like Kings, New York, and Queens counties should look at improving insurance access for the Hispanic or Latino population, and Queens needs to pay attention to the Asian population as well.
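The "significant?" flag above compares each p-value to 0.05 on its own, across many terms and four counties. One possible refinement, not used in the original analysis, is to adjust for multiple comparisons with base R's `p.adjust()`. The sketch below uses only the eight non-intercept p-values visible in the printed rows (Bronx and Kings); the full table has more terms, so the adjusted values there would differ.

```r
# Non-intercept p-values copied from the ten rows printed above.
p_raw <- c(
  Bronx_Black    = 7.54e-1,
  Bronx_Asian    = 6.03e-1,
  Bronx_Hispanic = 3.51e-3,
  Bronx_More     = 4.15e-1,
  Bronx_Other    = 9.50e-1,
  Kings_Black    = 2.78e-8,
  Kings_Asian    = 3.01e-12,
  Kings_Hispanic = 1.67e-29
)

# Holm adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(p_raw, p_holm,
           significant = ifelse(p_holm < 0.05, "yes", "no"))
```

With these eight values the same four terms stay below 0.05 after adjustment, so the conclusion about the Hispanic or Latino share in the Bronx and Kings does not change.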
Looking at each race separately at the county level, the relationship between race and health insurance access is not statistically significant in every county:
#####Black Population#####
# % Black population and health insurance access
health_ins_dat %>%
ggplot(aes(y=perc_with_ins, x=perc_Pop_Black)) +
geom_point(col = "blue", size = .8, alpha=.2, na.rm=TRUE)+
facet_wrap(~Geo_County)+
labs(y="Population with Health Insurance",
x= "% of population: black alone",
title = "% black population and health insurance access")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=.5, label.y=.1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# % Black population and public health insurance access
health_ins_dat %>%
ggplot(aes(y=perc_with_ins_public, x=perc_Pop_Black)) +
geom_point(col = "blue", size = .8, alpha = .2, na.rm=TRUE)+
facet_wrap(~Geo_County)+
labs(y="Population with Public Health Insurance",
x= "% of population: black alone",
title = "% black population and public health insurance access")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=.5, label.y=.1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#regression - insured
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins ~ perc_Pop_Black, data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('race'='term')
## # A tibble: 8 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value `significant?`
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.904 0.00424 0 yes
## 2 Bronx perc_Pop_Black 0.0160 0.0124 0.196 no
## 3 Kings (Intercept) 0.925 0.00244 0 yes
## 4 Kings perc_Pop_Black -0.00388 0.00549 0.479 no
## 5 New_York (Intercept) 0.956 0.00314 0 yes
## 6 New_York perc_Pop_Black -0.0830 0.0145 0.0000000261 yes
## 7 Queens (Intercept) 0.894 0.00310 0 yes
## 8 Queens perc_Pop_Black 0.0429 0.00888 0.00000169 yes
#regression - public insurance
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins_public ~ perc_Pop_Black, data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('race'='term')
## # A tibble: 8 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value `significant?`
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.535 0.0138 4.86e-124 yes
## 2 Bronx perc_Pop_Black 0.0282 0.0402 4.84e- 1 no
## 3 Kings (Intercept) 0.433 0.00786 1.83e-264 yes
## 4 Kings perc_Pop_Black 0.0143 0.0177 4.21e- 1 no
## 5 New_York (Intercept) 0.270 0.0112 1.43e- 68 yes
## 6 New_York perc_Pop_Black 0.446 0.0517 6.45e- 16 yes
## 7 Queens (Intercept) 0.393 0.00478 0 yes
## 8 Queens perc_Pop_Black 0.00303 0.0137 8.25e- 1 no
#####Asian Population#####
# % Asian population and health insurance access
health_ins_dat %>%
ggplot(aes(y=perc_with_ins, x=perc_Pop_Asian)) +
geom_point(col = "red", size = 0.8, alpha = 1/5)+
facet_wrap(~Geo_County)+
ylab("Population with Health Insurance")+
xlab("% of population: Asian") +
geom_smooth(method = lm)+
stat_regline_equation(label.x=.5, label.y=.1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# % Asian population and public health insurance access
health_ins_dat %>%
ggplot(aes(y=perc_with_ins_public, x=perc_Pop_Asian)) +
geom_point(col = "red", size = 0.8, alpha = 1/5)+
facet_wrap(~Geo_County)+
ylab("Population with Public Health Insurance")+
xlab("% of population: Asian") +
geom_smooth(method = lm)+
stat_regline_equation(label.x=.5, label.y=.1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#regression - public insurance
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins_public ~ perc_Pop_Asian, data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('race'='term')
## # A tibble: 8 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value `significant?`
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.578 0.00913 8.63e-185 yes
## 2 Bronx perc_Pop_Asian -0.899 0.135 1.22e- 10 yes
## 3 Kings (Intercept) 0.419 0.00728 8.30e-276 yes
## 4 Kings perc_Pop_Asian 0.157 0.0378 3.61e- 5 yes
## 5 New_York (Intercept) 0.350 0.0147 7.71e- 68 yes
## 6 New_York perc_Pop_Asian -0.177 0.0833 3.42e- 2 yes
## 7 Queens (Intercept) 0.379 0.00618 8.09e-269 yes
## 8 Queens perc_Pop_Asian 0.0617 0.0194 1.56e- 3 yes
#####Hispanic Latino Population#####
# % Hispanic Latino population and health insurance access
health_ins_dat %>%
ggplot(aes(y=perc_with_ins, x=perc_Pop_Hispanic_Latino)) +
geom_point(col = "purple", size = 0.8, alpha = 1/5)+
facet_wrap(~Geo_County)+
ylab("Population with Health Insurance")+
xlab("% of population:Hispanic or Latino") +
geom_smooth(method = lm)+
stat_regline_equation(label.x=.5, label.y=.1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# % Hispanic Latino population and public health insurance access
health_ins_dat %>%
ggplot(aes(y=perc_with_ins_public, x=perc_Pop_Hispanic_Latino)) +
geom_point(col = "purple", size = 0.8, alpha = 1/5)+
facet_wrap(~Geo_County)+
ylab("Population with Public Health Insurance")+
xlab("% of population: Hispanic or Latino") +
geom_smooth(method = lm)+
stat_regline_equation(label.x=.5, label.y=.1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#regression - insured
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins ~ perc_Pop_Hispanic_Latino, data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('race'='term')
## # A tibble: 8 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value significant…¹
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.931 0.00688 6.13e-288 yes
## 2 Bronx perc_Pop_Hispanic_Latino -0.0420 0.0118 4.16e- 4 yes
## 3 Kings (Intercept) 0.944 0.00250 0 yes
## 4 Kings perc_Pop_Hispanic_Latino -0.109 0.0102 9.54e- 25 yes
## 5 New_York (Intercept) 0.970 0.00342 0 yes
## 6 New_York perc_Pop_Hispanic_Latino -0.104 0.0108 5.99e- 19 yes
## 7 Queens (Intercept) 0.945 0.00389 0 yes
## 8 Queens perc_Pop_Hispanic_Latino -0.163 0.0119 1.79e- 37 yes
## # … with abbreviated variable name ¹`significant?`
#regression - insured with public insurance
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins_public ~ perc_Pop_Hispanic_Latino, data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('race'='term')
## # A tibble: 8 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value significant…¹
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.287 0.0169 1.83e- 46 yes
## 2 Bronx perc_Pop_Hispanic_Latino 0.469 0.0290 1.78e- 43 yes
## 3 Kings (Intercept) 0.401 0.00845 4.80e-227 yes
## 4 Kings perc_Pop_Hispanic_Latino 0.201 0.0345 9.53e- 9 yes
## 5 New_York (Intercept) 0.199 0.0107 8.54e- 50 yes
## 6 New_York perc_Pop_Hispanic_Latino 0.550 0.0339 1.28e- 41 yes
## 7 Queens (Intercept) 0.346 0.00627 2.48e-244 yes
## 8 Queens perc_Pop_Hispanic_Latino 0.184 0.0192 1.43e- 20 yes
## # … with abbreviated variable name ¹`significant?`
#####White Population#####
# % White population and health insurance access
health_ins_dat %>%
ggplot(aes(x=perc_White, y=perc_with_ins)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
ylab("Population with Health Insurance")+
xlab("% of population: white alone") +
geom_smooth(method = lm)+
theme_minimal()+
stat_regline_equation(label.x=.1, label.y=.1)
## `geom_smooth()` using formula = 'y ~ x'
# % White population and public health insurance access
health_ins_dat %>%
ggplot(aes(x=perc_White, y=perc_with_ins_public, color=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
ylab("Population with Public Health Insurance")+
xlab("% of population: white alone") +
geom_smooth(method = lm)+
theme_minimal()+
stat_regline_equation(label.x=.1, label.y=.1)+
scale_color_brewer(palette = "RdBu")
## `geom_smooth()` using formula = 'y ~ x'
#regression - public insurance
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins_public ~ perc_White, data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('race'='term')
## # A tibble: 8 × 6
## # Groups: Geo_County [4]
## Geo_County race estimate std.error p.value `significant?`
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.602 0.00703 5.46e-225 yes
## 2 Bronx perc_White -0.518 0.0327 2.44e- 42 yes
## 3 Kings (Intercept) 0.477 0.00909 6.46e-252 yes
## 4 Kings perc_White -0.106 0.0190 3.61e- 8 yes
## 5 New_York (Intercept) 0.545 0.0132 5.62e-118 yes
## 6 New_York perc_White -0.453 0.0237 6.22e- 52 yes
## 7 Queens (Intercept) 0.440 0.00526 0 yes
## 8 Queens perc_White -0.180 0.0149 2.62e- 30 yes
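The same `group_by()` / `do(tidy(lm()))` / `mutate()` pattern is repeated for every race variable above. One way to reduce the repetition is a small helper that takes the formula as an argument. The sketch below is written in base R so it runs without packages; `fit_by_group()` is a hypothetical name, and the toy data are simulated only to show the call.

```r
# Hypothetical helper: fit `formula` separately within each level of `group`
# and return one row per coefficient, with a 0.05 significance flag.
fit_by_group <- function(data, formula, group = "Geo_County") {
  pieces <- lapply(split(data, data[[group]]), function(d) {
    est <- coef(summary(lm(formula, data = d)))
    data.frame(
      group       = d[[group]][1],
      term        = rownames(est),
      estimate    = est[, "Estimate"],
      std.error   = est[, "Std. Error"],
      p.value     = est[, "Pr(>|t|)"],
      significant = ifelse(est[, "Pr(>|t|)"] < 0.05, "yes", "no"),
      row.names   = NULL
    )
  })
  do.call(rbind, c(pieces, make.row.names = FALSE))
}

# Toy data (simulated), only to demonstrate the call.
set.seed(42)
toy <- data.frame(
  Geo_County = rep(c("Bronx", "Queens"), each = 50),
  perc_White = runif(100)
)
toy$perc_with_ins_public <- 0.6 - 0.3 * toy$perc_White + rnorm(100, sd = 0.05)

by_county <- fit_by_group(toy, perc_with_ins_public ~ perc_White)
by_county
```

Each regression in this section would then be a one-line call with a different formula, which makes it harder for a label or outcome variable to drift out of sync between blocks.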
#median income distribution per county, by census tract
health_ins_dat_MedianHouseholdIncome <- health_ins_dat%>%
filter(Households>0)
health_ins_dat_MedianHouseholdIncome%>%
ggplot(aes(x=Median_Household_Income_2019, fill=Geo_County))+
facet_wrap(~Geo_County)+
geom_histogram(bins=60)+
scale_fill_brewer(palette = "RdBu")+
theme_minimal()
#identify the outliers / super rich census tracts
health_ins_dat_MedianHouseholdIncome|>
filter(Median_Household_Income_2019>250000)|>
select(Geo_County,Geo_QName,perc_White, Per_Capita_Income)
## # A tibble: 57 × 4
## Geo_County Geo_QName perc_White Per_Capita_Income
## <chr> <chr> <dbl> <dbl>
## 1 Kings Census_Tract_21 0.741 121521
## 2 Kings Census_Tract_41 0.689 94831
## 3 New_York Census_Tract_21 0.721 161757
## 4 New_York Census_Tract_33 0.765 191549
## 5 New_York Census_Tract_37 0.823 162230
## 6 New_York Census_Tract_39 0.776 158186
## 7 New_York Census_Tract_42 0.595 73526
## 8 New_York Census_Tract_52 0.719 116547
## 9 New_York Census_Tract_54 0.760 150168
## 10 New_York Census_Tract_56 0.744 139701
## # … with 47 more rows
#Linear regression: Median Household Income and Health Insurance Access
model_householdIncome <-
lm(perc_with_ins~log(Median_Household_Income_2019),data = health_ins_dat_MedianHouseholdIncome)
summary(model_householdIncome)
##
## Call:
## lm(formula = perc_with_ins ~ log(Median_Household_Income_2019),
## data = health_ins_dat_MedianHouseholdIncome)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41699 -0.01973 0.00795 0.03195 0.12215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.490469 0.024514 20.01 <2e-16 ***
## log(Median_Household_Income_2019) 0.038104 0.002186 17.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05211 on 1974 degrees of freedom
## Multiple R-squared: 0.1334, Adjusted R-squared: 0.133
## F-statistic: 304 on 1 and 1974 DF, p-value: < 2.2e-16
#function to explore each county
#note: uses raw (not log) income, so each slope is the change in coverage share per dollar
lm_income <-function(data){
fit <- lm(perc_with_ins~Median_Household_Income_2019, data=data)
sum.fit <- summary(fit)
data.frame(slope=sum.fit$coefficients[2,"Estimate"],
se = sum.fit$coefficients[2, "Std. Error"],
p_value = sum.fit$coefficients[2, "Pr(>|t|)"])
}
health_ins_dat_MedianHouseholdIncome|>
group_by(Geo_County)|>
do(lm_income(.))
## # A tibble: 4 × 4
## # Groups: Geo_County [4]
## Geo_County slope se p_value
## <chr> <dbl> <dbl> <dbl>
## 1 Bronx 0.000000177 0.0000000863 4.16e- 2
## 2 Kings 0.000000353 0.0000000418 1.55e-16
## 3 New_York 0.000000293 0.0000000262 4.42e-24
## 4 Queens 0.00000139 0.0000000847 7.42e-51
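The pooled model uses log income while the county models use raw dollars, so the two sets of slopes are on different scales. The sketch below shows one way to read each of them, using coefficients copied from the printed output; the choice of a doubling and of a $10,000 step is only for illustration.

```r
# Coefficients copied from the printed output above.
b_log_income <- 0.038104     # pooled model, log(median household income)
slope_queens <- 0.00000139   # Queens model, raw dollars

# Level-log model: doubling income shifts the predicted share by b * log(2).
effect_doubling <- b_log_income * log(2)

# Level-level model: rescale the per-dollar slope to a $10,000 difference.
effect_10k_queens <- slope_queens * 10000

round(c(doubling = effect_doubling, queens_10k = effect_10k_queens), 4)
```

In the pooled model, a doubling of median household income is associated with roughly 2.6 percentage points more coverage; in Queens, a $10,000 higher median income is associated with roughly 1.4 percentage points more.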
#Scatterplot to demonstrate how median income correlates with health insurance access in each county
health_ins_dat_MedianHouseholdIncome%>%
ggplot(aes(x=log(Median_Household_Income_2019), y=perc_with_ins, col=Geo_County, size = Total_Pop ))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation( label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#Scatterplot to demonstrate how median income correlates with public health insurance access in each county
health_ins_dat_MedianHouseholdIncome%>%
ggplot(aes(x=log(Median_Household_Income_2019), y=perc_with_ins_public, col=Geo_County, size = Total_Pop ))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation( label.y=.05)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#Reshape the education shares to long format for plotting
health_ins_dat_Education<-health_ins_dat%>%
pivot_longer(cols = perc_less_HS:perc_Bachelor_More & !perc_HS_or_Less,
#grep("Pop_25yrs_", names(health_ins_dat), value=TRUE)
names_to = "Education_Attainment",
values_to = "Percent_Pop")%>%
select(c("Geo_County","Education_Attainment", "Percent_Pop", "Total_Pop"))
health_ins_dat_Education|>
group_by(Geo_County)|>
ggplot(aes(x=Education_Attainment,y=Percent_Pop, fill=Education_Attainment)) +
facet_wrap(~Geo_County,nrow=4)+
scale_fill_brewer(palette = "Set3")+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
theme_minimal()+
coord_flip()
#scale_x_discrete(guide = guide_axis(angle = 90))
#theme(axis.text.x=element_text(angle=90, hjust=1))
#edu and health insurance
model_edu_attain <- lm(perc_with_ins~
perc_less_HS+
perc_HS+
perc_College,
#perc_Bachelor_More,
data = health_ins_dat)
summary(model_edu_attain)
##
## Call:
## lm(formula = perc_with_ins ~ perc_less_HS + perc_HS + perc_College,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4073 -0.0200 0.0061 0.0273 0.1357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.981758 0.003892 252.266 < 2e-16 ***
## perc_less_HS -0.350121 0.015562 -22.498 < 2e-16 ***
## perc_HS -0.137578 0.018479 -7.445 1.44e-13 ***
## perc_College 0.022096 0.023462 0.942 0.346
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04733 on 1972 degrees of freedom
## Multiple R-squared: 0.2859, Adjusted R-squared: 0.2848
## F-statistic: 263.2 on 3 and 1972 DF, p-value: < 2.2e-16
#linear model for each county
health_ins_dat|>
group_by(Geo_County)|>
do(tidy(lm(perc_with_ins ~
perc_less_HS+
perc_HS+
perc_College,
#perc_Bachelor_More,
data = .)))|>
select(-statistic)|>
mutate(`significant?` = case_when(
`p.value` < 0.05 ~ "yes",
TRUE ~ "no"
))|>
rename('education attainment'='term')
## # A tibble: 16 × 6
## # Groups: Geo_County [4]
## Geo_County `education attainment` estimate std.error p.value `significant?`
## <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 Bronx (Intercept) 0.953 0.0193 9.38e-153 yes
## 2 Bronx perc_less_HS -0.207 0.0454 7.43e- 6 yes
## 3 Bronx perc_HS -0.0763 0.0615 2.16e- 1 no
## 4 Bronx perc_College 0.0299 0.0653 6.47e- 1 no
## 5 Kings (Intercept) 0.980 0.00647 0 yes
## 6 Kings perc_less_HS -0.308 0.0245 4.93e- 33 yes
## 7 Kings perc_HS -0.0421 0.0268 1.17e- 1 no
## 8 Kings perc_College -0.0897 0.0385 1.99e- 2 yes
## 9 New_York (Intercept) 0.981 0.00562 8.30e-275 yes
## 10 New_York perc_less_HS -0.185 0.0353 3.27e- 7 yes
## 11 New_York perc_HS -0.135 0.0642 3.65e- 2 yes
## 12 New_York perc_College -0.0652 0.0722 3.67e- 1 no
## 13 Queens (Intercept) 0.987 0.0109 0 yes
## 14 Queens perc_less_HS -0.668 0.0350 1.56e- 64 yes
## 15 Queens perc_HS -0.0981 0.0374 8.97e- 3 yes
## 16 Queens perc_College 0.115 0.0437 8.82e- 3 yes
#edu and household income
model_edu_and_householdIncome <- lm(log(Median_Household_Income_2019)~
perc_less_HS+
perc_HS+
perc_College,
#perc_Bachelor_More,
data = health_ins_dat)
summary(model_edu_and_householdIncome)
##
## Call:
## lm(formula = log(Median_Household_Income_2019) ~ perc_less_HS +
## perc_HS + perc_College, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20851 -0.20553 0.03738 0.24130 1.17460
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.26327 0.02797 438.453 < 2e-16 ***
## perc_less_HS -4.67991 0.11184 -41.843 < 2e-16 ***
## perc_HS -2.02176 0.13280 -15.224 < 2e-16 ***
## perc_College -0.86633 0.16862 -5.138 3.06e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3402 on 1972 degrees of freedom
## Multiple R-squared: 0.5987, Adjusted R-squared: 0.5981
## F-statistic: 980.7 on 3 and 1972 DF, p-value: < 2.2e-16
#scatterplot to demonstrate how education attainment of high school diploma or less (people over 25 years old) correlates with health insurance access in each county
health_ins_dat%>%
ggplot(aes(x=perc_HS_or_Less, y=perc_with_ins, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#scatterplot to demonstrate how education attainment of bachelor degree or more (people over 25 years old) correlates with health insurance access in each county
health_ins_dat%>%
ggplot(aes(x=perc_Bachelor_More, y=perc_with_ins, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#scatterplot to demonstrate how education attainment of bachelor degree or more (people over 25 years old) correlates with public health insurance access in each county
health_ins_dat%>%
ggplot(aes(x=perc_Bachelor_More, y=perc_with_ins_public, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.05)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
health_ins_dat_Employment<-health_ins_dat%>%
pivot_longer(cols = perc_Management_Professional:perc_Farm_Industrial,
names_to = "Employment_Sector",
values_to = "Employment_Pop")%>%
select(c("Geo_County","Employment_Sector", "Employment_Pop", "Total_Pop"))
health_ins_dat_Employment|>
group_by(Geo_County)|>
ggplot(aes(x=Employment_Sector,y=Employment_Pop, fill=Employment_Sector)) +
facet_wrap(~Geo_County,nrow=4)+
scale_fill_brewer(palette = "Set2")+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
theme_minimal()+
coord_flip()
#scale_x_discrete(guide = guide_axis(angle = 90))
#theme(axis.text.x=element_text(angle=90, hjust=1))
model_employment <- lm(perc_with_ins~perc_Farm_Industrial+
perc_Service+
perc_Sales,
data = health_ins_dat)
summary(model_employment)
##
## Call:
## lm(formula = perc_with_ins ~ perc_Farm_Industrial + perc_Service +
## perc_Sales, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40968 -0.01968 0.00692 0.02913 0.12848
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.967450 0.004997 193.612 < 2e-16 ***
## perc_Farm_Industrial -0.128333 0.010423 -12.313 < 2e-16 ***
## perc_Service -0.149520 0.012912 -11.580 < 2e-16 ***
## perc_Sales 0.072044 0.020771 3.468 0.000535 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04868 on 1972 degrees of freedom
## Multiple R-squared: 0.2445, Adjusted R-squared: 0.2433
## F-statistic: 212.7 on 3 and 1972 DF, p-value: < 2.2e-16
#Impact of employment status on health insurance coverage
lm_employment <-function(data){
fit <- lm(perc_with_ins~perc_employed_16_over, data=data)
sum.fit <- summary(fit)
data.frame(slope=sum.fit$coefficients[2,"Estimate"],
se = sum.fit$coefficients[2, "Std. Error"],
p_value = sum.fit$coefficients[2, "Pr(>|t|)"])
}
health_ins_dat|>
group_by(Geo_County)|>
do(lm_employment(.))
## # A tibble: 4 × 4
## # Groups: Geo_County [4]
## Geo_County slope se p_value
## <chr> <dbl> <dbl> <dbl>
## 1 Bronx 0.108 0.0509 0.0340
## 2 Kings 0.0851 0.0423 0.0446
## 3 New_York 0.420 0.0674 0.00000000183
## 4 Queens -0.0368 0.0790 0.641
#scatterplot to demonstrate how employment sector (people over 16 years old) correlates with health insurance access in each county
###Farm & Industrial###
health_ins_dat%>%
ggplot(aes(x=perc_Farm_Industrial, y=perc_with_ins, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
###Service###
health_ins_dat%>%
ggplot(aes(x=perc_Service, y=perc_with_ins, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
###Sales###
health_ins_dat%>%
ggplot(aes(x=perc_Sales, y=perc_with_ins, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
###Management_Professional###
health_ins_dat%>%
ggplot(aes(x=perc_Management_Professional, y=perc_with_ins, col=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#distribution of Foreign Origin Population with Citizenship
health_ins_dat|>
ggplot(aes(x = Geo_County, y = perc_Origin_FO_Citizen, fill = Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "RdBu")+
theme_minimal()
#distribution of Foreign Origin Population with NO Citizenship
health_ins_dat|>
ggplot(aes(x = Geo_County, y = perc_Origin_FO_NCitizen, fill = Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "RdBu")+
theme_minimal()
# Foreign Origin - US Citizen
model_foreign_citizen <- lm(perc_with_ins~perc_Origin_FO_Citizen, data=health_ins_dat)
summary(model_foreign_citizen)
##
## Call:
## lm(formula = perc_with_ins ~ perc_Origin_FO_Citizen, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.47038 -0.02149 0.01150 0.03751 0.09212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.933020 0.003019 309.035 < 2e-16 ***
## perc_Origin_FO_Citizen -0.070222 0.012330 -5.695 1.42e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05553 on 1974 degrees of freedom
## Multiple R-squared: 0.01617, Adjusted R-squared: 0.01567
## F-statistic: 32.44 on 1 and 1974 DF, p-value: 1.417e-08
# with citizenship
health_ins_dat%>%
ggplot(aes(x=perc_Origin_FO_Citizen, y=perc_with_ins, color=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
##publicly insured?
summary(lm(perc_with_ins_public~perc_Origin_FO_Citizen, data=health_ins_dat))
##
## Call:
## lm(formula = perc_with_ins_public ~ perc_Origin_FO_Citizen, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.37849 -0.10785 -0.01704 0.09902 0.49644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.405032 0.008386 48.299 < 2e-16 ***
## perc_Origin_FO_Citizen 0.094407 0.034248 2.757 0.00589 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1542 on 1974 degrees of freedom
## Multiple R-squared: 0.003835, Adjusted R-squared: 0.00333
## F-statistic: 7.599 on 1 and 1974 DF, p-value: 0.005894
# with citizenship - public insurance
health_ins_dat%>%
ggplot(aes(x=perc_Origin_FO_Citizen, y=perc_with_ins_public, color=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.9)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Foreign Origin - NOT US Citizen
summary(lm(perc_with_ins~perc_Origin_FO_NCitizen, data=health_ins_dat))
##
## Call:
## lm(formula = perc_with_ins ~ perc_Origin_FO_NCitizen, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.48674 -0.02022 0.00590 0.02761 0.11241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.975957 0.002059 474.0 <2e-16 ***
## perc_Origin_FO_NCitizen -0.367429 0.011238 -32.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04509 on 1974 degrees of freedom
## Multiple R-squared: 0.3513, Adjusted R-squared: 0.351
## F-statistic: 1069 on 1 and 1974 DF, p-value: < 2.2e-16
# without citizenship
health_ins_dat%>%
ggplot(aes(x=perc_Origin_FO_NCitizen, y=perc_with_ins, color=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
##publicly insured?
summary(lm(perc_with_ins_public~perc_Origin_FO_NCitizen, data=health_ins_dat))
##
## Call:
## lm(formula = perc_with_ins_public ~ perc_Origin_FO_NCitizen,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39618 -0.09686 -0.01675 0.08509 0.54753
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.364351 0.006874 53.00 <2e-16 ***
## perc_Origin_FO_NCitizen 0.387096 0.037515 10.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1505 on 1974 degrees of freedom
## Multiple R-squared: 0.05118, Adjusted R-squared: 0.05069
## F-statistic: 106.5 on 1 and 1974 DF, p-value: < 2.2e-16
# without citizenship - public insurance
health_ins_dat%>%
ggplot(aes(x=perc_Origin_FO_NCitizen, y=perc_with_ins_public, color=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.99)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
* A larger share of foreign-origin noncitizens is associated with a much
larger decrease in % population with health insurance coverage, and a much
larger increase in % population with public health insurance coverage,
than a larger share of foreign-origin citizens.
+ A 1 percentage point increase in the foreign-origin noncitizen share is
correlated with a 0.367 percentage point decrease in % population with
health insurance coverage, and a 0.387 percentage point increase in %
population publicly insured.
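The percentage-point reading above can be checked with a little arithmetic. This is a minimal sketch, assuming the `perc_*` variables are stored as proportions between 0 and 1 (the intercepts near 0.97 suggest so); the slopes are copied from the two summaries above, and the helper name `pp_change` is hypothetical.

```r
# Slopes copied from the regression summaries above (proportion scale, 0-1).
slope_ins    <- -0.367429  # perc_with_ins ~ perc_Origin_FO_NCitizen
slope_public <-  0.387096  # perc_with_ins_public ~ perc_Origin_FO_NCitizen

# Hypothetical helper: predicted change in the outcome, in percentage points,
# for a given percentage-point change in the predictor.
pp_change <- function(slope, delta_pp = 1) {
  delta_prop <- delta_pp / 100  # percentage points -> proportion
  slope * delta_prop * 100      # predicted change, back in percentage points
}

pp_change(slope_ins)         # change in % insured per +1 point of noncitizens
pp_change(slope_public, 10)  # change in % publicly insured per +10 points
```

Because predictor and outcome share the same scale, the slope itself is the percentage-point change per percentage point.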
Ho1: % population in poverty is not correlated with % of population covered by health insurance
Ha1: % population in poverty is correlated with % of population covered by health insurance
Ho2: % population in poverty is not correlated with % of population covered by public health insurance
Ha2: % population in poverty is correlated with % of population covered by public health insurance
#distribution of % population in poverty in each county
health_ins_dat|>
ggplot(aes(x = Geo_County, y = perc_Poverty, fill = Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "RdBu")+
theme_minimal()
## Warning: Removed 115 rows containing non-finite values (`stat_boxplot()`).
# health insurance enrollment rate vs poverty
model_ins_poverty <- lm(perc_with_ins~
perc_Poverty,
data = health_ins_dat)
summary(model_ins_poverty)
##
## Call:
## lm(formula = perc_with_ins ~ perc_Poverty, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34102 -0.02287 0.00918 0.03594 0.09952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.931238 0.002097 444.086 < 2e-16 ***
## perc_Poverty -0.077137 0.010521 -7.332 3.37e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0527 on 1859 degrees of freedom
## (115 observations deleted due to missingness)
## Multiple R-squared: 0.0281, Adjusted R-squared: 0.02758
## F-statistic: 53.76 on 1 and 1859 DF, p-value: 3.37e-13
# plot
health_ins_dat%>%
ggplot(aes(x=perc_Poverty, y=perc_with_ins, color=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.5)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#public insurance
model_ins_poverty_public <- lm(perc_with_ins_public~
perc_Poverty,
data = health_ins_dat)
summary(model_ins_poverty_public)
##
## Call:
## lm(formula = perc_with_ins_public ~ perc_Poverty, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.37811 -0.06572 0.00469 0.06866 0.53162
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.267372 0.004079 65.54 <2e-16 ***
## perc_Poverty 0.983018 0.020467 48.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1025 on 1859 degrees of freedom
## (115 observations deleted due to missingness)
## Multiple R-squared: 0.5538, Adjusted R-squared: 0.5535
## F-statistic: 2307 on 1 and 1859 DF, p-value: < 2.2e-16
# plot
health_ins_dat%>%
ggplot(aes(x=perc_Poverty, y=perc_with_ins_public, color=Geo_County))+
facet_wrap(~Geo_County)+
geom_point(size = 0.8, alpha=.2)+
scale_color_brewer(palette = "RdBu")+
geom_smooth(method = lm)+
stat_regline_equation(label.x=0, label.y=.0)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
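* We reject Ho1 at the 5% significance level (95% confidence): % population in poverty is
correlated with % of population covered by health insurance.
+ A 1 percentage point increase in % population in poverty is associated with a 0.077
percentage point decrease in % of population covered by health insurance. R-squared: 0.0281
* We also reject Ho2: a 1 percentage point increase in % population in poverty is associated
with a 0.98 percentage point increase in % of population covered by public health insurance. R-squared: 0.5538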
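The decision on Ho1 can be reproduced from the numbers printed in `summary(model_ins_poverty)` above. The sketch below only recomputes the t statistic and two-sided p-value from the reported estimate, standard error, and residual degrees of freedom; the variable names are illustrative.

```r
# Values copied from summary(model_ins_poverty) above.
estimate  <- -0.077137
std_error <-  0.010521
df_resid  <-  1859

# t statistic and two-sided p-value for H0: slope = 0.
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# Reject H0 when the p-value falls below the chosen significance level.
alpha     <- 0.05
reject_h0 <- p_value < alpha

round(t_stat, 3)
p_value
reject_h0
```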
Goal: In this part of the study, I build a model to test whether, once race is controlled for, the factors that are positively correlated with health insurance access (education attainment, employment sector, median household income, and native-born status) still make the population more likely to obtain health insurance.
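One way to check whether a factor still matters once race is in the model is to compare nested models. The following is a rough sketch on simulated data, not the study data: `perc_minority` and `log_income` are hypothetical stand-ins, and the F test via `anova()` is offered as an illustration rather than the method used in this study.

```r
set.seed(1)
n <- 300

# Simulated stand-ins for tract-level variables (not the study data).
sim <- data.frame(
  perc_minority = runif(n),                       # hypothetical race share
  log_income    = rnorm(n, mean = 11, sd = 0.5)   # hypothetical log income
)
sim$perc_with_ins <- 0.5 - 0.10 * sim$perc_minority +
  0.04 * sim$log_income + rnorm(n, sd = 0.02)

# Baseline model with race only, then race plus income.
m_race        <- lm(perc_with_ins ~ perc_minority, data = sim)
m_race_income <- lm(perc_with_ins ~ perc_minority + log_income, data = sim)

# F test: does income add explanatory power once race is in the model?
cmp <- anova(m_race, m_race_income)
cmp
```

The second row of `cmp` reports the F statistic and p-value for the added term.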
###regression model for each category
#first variable: race
tidy(summary(model_race))
## # A tibble: 6 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.978 0.00334 293. 0
## 2 perc_Pop_Black -0.0419 0.00454 -9.23 6.49e- 20
## 3 perc_Pop_Asian -0.128 0.00747 -17.1 1.75e- 61
## 4 perc_Pop_Hispanic_Latino -0.120 0.00530 -22.7 9.03e-102
## 5 perc_Pop_More_Races 0.0697 0.0561 1.24 2.14e- 1
## 6 perc_Pop_Other_Races -0.0233 0.0378 -0.614 5.39e- 1
#second variable: education (bachelor or higher degree)
model_minority_edu <-lm(perc_with_ins ~
perc_Bachelor_More,
data = health_ins_dat)
tidy(summary(model_minority_edu))
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.886 0.00212 417. 0
## 2 perc_Bachelor_More 0.119 0.00674 17.7 4.57e-65
#third variable: employment sector
model_minority_employment <-lm(perc_with_ins ~
perc_Management_Professional+
perc_Sales,
data = health_ins_dat)
tidy(summary(model_minority_employment))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.819 0.00535 153. 0
## 2 perc_Management_Professional 0.153 0.00639 23.9 3.32e-111
## 3 perc_Sales 0.214 0.0214 9.99 5.68e- 23
#fourth variable: median household income
tidy(summary(model_householdIncome))
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.490 0.0245 20.0 3.09e-81
## 2 log(Median_Household_Income_2019) 0.0381 0.00219 17.4 1.99e-63
#fifth variable: native-born population
model_minority_citizen <-lm(perc_with_ins ~
perc_Origin_Native_Born,
data = health_ins_dat)
tidy(summary(model_minority_citizen))
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.814 0.00481 169. 0
## 2 perc_Origin_Native_Born 0.167 0.00757 22.1 6.17e-97
# race
model_m_p <-lm(perc_with_ins_public ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races,
data = health_ins_dat)
# race and median household income
model_m2 <-lm(perc_with_ins ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019),
data = health_ins_dat)
tidy(summary(model_m2))
## # A tibble: 7 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.827 0.0329 25.2 2.51e-121
## 2 perc_Pop_Black -0.0318 0.00501 -6.35 2.64e- 10
## 3 perc_Pop_Asian -0.117 0.00781 -15.0 3.84e- 48
## 4 perc_Pop_Hispanic_Latino -0.100 0.00687 -14.6 9.03e- 46
## 5 perc_Pop_More_Races 0.0506 0.0559 0.905 3.66e- 1
## 6 perc_Pop_Other_Races -0.0296 0.0377 -0.787 4.32e- 1
## 7 log(Median_Household_Income_2019) 0.0126 0.00274 4.61 4.35e- 6
model_m2_p <-lm(perc_with_ins_public ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019),
data = health_ins_dat)
# race and median household income and education attainment(bachelor degree or more)
model_m3 <-lm(perc_with_ins ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More,
data = health_ins_dat)
tidy(summary(model_m3))
## # A tibble: 8 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.903 0.0396 22.8 3.61e-102
## 2 perc_Pop_Black -0.0246 0.00544 -4.53 6.35e- 6
## 3 perc_Pop_Asian -0.111 0.00800 -13.9 1.00e- 41
## 4 perc_Pop_Hispanic_Latino -0.0939 0.00709 -13.2 2.00e- 38
## 5 perc_Pop_More_Races 0.0190 0.0566 0.336 7.37e- 1
## 6 perc_Pop_Other_Races 0.00443 0.0389 0.114 9.09e- 1
## 7 log(Median_Household_Income_2019) 0.00460 0.00362 1.27 2.04e- 1
## 8 perc_Bachelor_More 0.0402 0.0119 3.39 7.11e- 4
model_m3_p <-lm(perc_with_ins_public ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More,
data = health_ins_dat)
# race and median household income and education attainment(bachelor degree or more) and employment sectors
model_m4 <-lm(perc_with_ins ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More+
perc_Management_Professional+
perc_Sales,
data = health_ins_dat)
tidy(summary(model_m4))
## # A tibble: 10 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.868 0.0389 22.3 2.40e-98
## 2 perc_Pop_Black -0.00166 0.00578 -0.287 7.74e- 1
## 3 perc_Pop_Asian -0.0828 0.00840 -9.85 2.15e-22
## 4 perc_Pop_Hispanic_Latino -0.0536 0.00803 -6.68 3.08e-11
## 5 perc_Pop_More_Races 0.0273 0.0552 0.495 6.21e- 1
## 6 perc_Pop_Other_Races 0.0437 0.0383 1.14 2.55e- 1
## 7 log(Median_Household_Income_2019) -0.000155 0.00357 -0.0435 9.65e- 1
## 8 perc_Bachelor_More -0.0197 0.0170 -1.16 2.47e- 1
## 9 perc_Management_Professional 0.130 0.0176 7.39 2.08e-13
## 10 perc_Sales 0.181 0.0221 8.16 5.71e-16
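Observation: In model_m4, after adding the employment sector variables, the coefficients for perc_Pop_Black, log(Median_Household_Income_2019), and perc_Bachelor_More are no longer significant. This suggests multicollinearity between management/professional employment and education level, and between management/professional employment and race.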
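A common way to quantify multicollinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). The sketch below uses simulated predictors, not the study data; `x_mgmt`, `x_bachelor`, `x_other`, and the helper `vif_for` are all hypothetical names.

```r
set.seed(42)
n <- 500

# Simulated predictors (not the study data): x_mgmt and x_bachelor are
# built to move together, x_other is independent of both.
x_mgmt     <- rnorm(n)
x_bachelor <- 0.9 * x_mgmt + rnorm(n, sd = 0.3)
x_other    <- rnorm(n)
sim <- data.frame(x_mgmt, x_bachelor, x_other)

# VIF for one predictor: regress it on the others, then 1 / (1 - R^2).
vif_for <- function(var, data) {
  others <- setdiff(names(data), var)
  r2 <- summary(lm(reformulate(others, response = var), data = data))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(names(sim), vif_for, data = sim)
round(vifs, 2)
```

Values near 1 indicate little overlap with the other predictors; large values flag predictors whose coefficients will be unstable when entered together.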
#% employment sector
#linear regression between bachelor degree or more with management profession and sales
tidy(summary(lm(perc_Bachelor_More~perc_Management_Professional+perc_Sales, data=health_ins_dat)))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0485 0.00738 -6.57 6.39e-11
## 2 perc_Management_Professional 0.891 0.00882 101. 0
## 3 perc_Sales -0.0969 0.0296 -3.28 1.07e- 3
#linear regression between median household income (log transformed) with management profession and sales
tidy(summary(lm(log(Median_Household_Income_2019)~perc_Management_Professional+perc_Sales, data=health_ins_dat)))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 10.2 0.0362 281. 0
## 2 perc_Management_Professional 2.41 0.0432 55.7 0
## 3 perc_Sales 0.690 0.145 4.75 0.00000214
## race and median household income and education attainment(bachelor degree or more) and % population in poverty
model_m5 <-lm(perc_with_ins ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More+
perc_Poverty,
#perc_Management_Professional+
#perc_Sales,
data = health_ins_dat)
tidy(summary(model_m5))
## # A tibble: 9 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.762 0.0530 14.4 1.77e-44
## 2 perc_Pop_Black -0.0221 0.00536 -4.13 3.82e- 5
## 3 perc_Pop_Asian -0.0984 0.00818 -12.0 3.88e-32
## 4 perc_Pop_Hispanic_Latino -0.0876 0.00700 -12.5 1.56e-34
## 5 perc_Pop_More_Races 0.0133 0.0552 0.242 8.09e- 1
## 6 perc_Pop_Other_Races -0.00804 0.0386 -0.208 8.35e- 1
## 7 log(Median_Household_Income_2019) 0.0163 0.00466 3.49 4.89e- 4
## 8 perc_Bachelor_More 0.0311 0.0119 2.62 8.91e- 3
## 9 perc_Poverty 0.0626 0.0149 4.21 2.69e- 5
model_m5_p <-lm(perc_with_ins_public ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More+
perc_Poverty,
#perc_Management_Professional+
#perc_Sales,
data = health_ins_dat)
## race and median household income and education attainment(bachelor degree or more) and % native-born population
model_m6 <-lm(perc_with_ins ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More+
#perc_Poverty+
#perc_Management_Professional+
#perc_Sales,
perc_Origin_Native_Born,
#perc_Origin_FO_Citizen,
data = health_ins_dat)
tidy(summary(model_m6))
## # A tibble: 9 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.774 0.0395 19.6 3.35e-78
## 2 perc_Pop_Black -0.00832 0.00539 -1.54 1.23e- 1
## 3 perc_Pop_Asian -0.0352 0.00977 -3.60 3.28e- 4
## 4 perc_Pop_Hispanic_Latino -0.0730 0.00702 -10.4 1.08e-24
## 5 perc_Pop_More_Races 0.0520 0.0545 0.955 3.40e- 1
## 6 perc_Pop_Other_Races 0.0551 0.0377 1.46 1.43e- 1
## 7 log(Median_Household_Income_2019) 0.00799 0.00349 2.29 2.24e- 2
## 8 perc_Bachelor_More 0.0163 0.0116 1.41 1.58e- 1
## 9 perc_Origin_Native_Born 0.122 0.00972 12.6 6.33e-35
model_m6_p <-lm(perc_with_ins_public ~
perc_Pop_Black +
perc_Pop_Asian+
perc_Pop_Hispanic_Latino+
perc_Pop_More_Races+
perc_Pop_Other_Races+
log(Median_Household_Income_2019)+
perc_Bachelor_More+
#perc_Poverty+
#perc_Management_Professional+
#perc_Sales,
perc_Origin_Native_Born,
#perc_Origin_FO_NCitizen,
data = health_ins_dat)
#create a combined regression table_ insurance
stargazer(model_race, model_m2, model_m3, model_m6,
single.row=TRUE,type="text")
##
## =========================================================================================================================================
## Dependent variable:
## -------------------------------------------------------------------------------------------------------
## perc_with_ins
## (1) (2) (3) (4)
## -----------------------------------------------------------------------------------------------------------------------------------------
## perc_Pop_Black -0.042*** (0.005) -0.032*** (0.005) -0.025*** (0.005) -0.008 (0.005)
## perc_Pop_Asian -0.128*** (0.007) -0.117*** (0.008) -0.111*** (0.008) -0.035*** (0.010)
## perc_Pop_Hispanic_Latino -0.120*** (0.005) -0.100*** (0.007) -0.094*** (0.007) -0.073*** (0.007)
## perc_Pop_More_Races 0.070 (0.056) 0.051 (0.056) 0.019 (0.057) 0.052 (0.054)
## perc_Pop_Other_Races -0.023 (0.038) -0.030 (0.038) 0.004 (0.039) 0.055 (0.038)
## log(Median_Household_Income_2019) 0.013*** (0.003) 0.005 (0.004) 0.008** (0.003)
## perc_Bachelor_More 0.040*** (0.012) 0.016 (0.012)
## perc_Origin_Native_Born 0.122*** (0.010)
## Constant 0.978*** (0.003) 0.827*** (0.033) 0.903*** (0.040) 0.774*** (0.039)
## -----------------------------------------------------------------------------------------------------------------------------------------
## Observations 1,976 1,976 1,976 1,976
## R2 0.258 0.266 0.270 0.324
## Adjusted R2 0.256 0.264 0.268 0.322
## Residual Std. Error 0.048 (df = 1970) 0.048 (df = 1969) 0.048 (df = 1968) 0.046 (df = 1967)
## F Statistic 137.008*** (df = 5; 1970) 118.882*** (df = 6; 1969) 104.085*** (df = 7; 1968) 118.102*** (df = 8; 1967)
## =========================================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
#create a combined regression table_ public insurance
stargazer(model_m_p, model_m2_p, model_m3_p,model_m6_p,
single.row=TRUE,type="text")
##
## =========================================================================================================================================
## Dependent variable:
## -------------------------------------------------------------------------------------------------------
## perc_with_ins_public
## (1) (2) (3) (4)
## -----------------------------------------------------------------------------------------------------------------------------------------
## perc_Pop_Black 0.156*** (0.012) -0.039*** (0.009) -0.075*** (0.009) -0.072*** (0.010)
## perc_Pop_Asian 0.167*** (0.020) -0.048*** (0.014) -0.078*** (0.014) -0.068*** (0.018)
## perc_Pop_Hispanic_Latino 0.364*** (0.014) -0.031** (0.012) -0.061*** (0.012) -0.059*** (0.013)
## perc_Pop_More_Races -0.821*** (0.150) -0.450*** (0.099) -0.296*** (0.098) -0.292*** (0.098)
## perc_Pop_Other_Races 0.127 (0.101) 0.252*** (0.067) 0.086 (0.068) 0.092 (0.068)
## log(Median_Household_Income_2019) -0.246*** (0.005) -0.207*** (0.006) -0.206*** (0.006)
## perc_Bachelor_More -0.196*** (0.021) -0.199*** (0.021)
## perc_Origin_Native_Born 0.015 (0.018)
## Constant 0.279*** (0.009) 3.213*** (0.058) 2.846*** (0.069) 2.829*** (0.071)
## -----------------------------------------------------------------------------------------------------------------------------------------
## Observations 1,976 1,976 1,976 1,976
## R2 0.304 0.698 0.711 0.711
## Adjusted R2 0.303 0.697 0.710 0.710
## Residual Std. Error 0.129 (df = 1970) 0.085 (df = 1969) 0.083 (df = 1968) 0.083 (df = 1967)
## F Statistic 172.424*** (df = 5; 1970) 757.295*** (df = 6; 1969) 691.492*** (df = 7; 1968) 605.083*** (df = 8; 1967)
## =========================================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
This part of the study investigates the correlations between children’s (under 18) access to health coverage and socioeconomic factors in NYC Bronx, Queens, Kings, and Manhattan at the census tract level.
* Lack of access to health insurance may be positively related to lower household income/poverty, minority races, school enrollment, employment status (if parents are employed), and citizenship status for the foreign origin population. The study also looks into those factors’ correlations with public insurance.
* Research into school enrollment and other household factors in relation to children's coverage:
Healthcare_coverage for children under 18 (y) = constant + b1 * % school enrollment + b2 * % single parents + b3 * % employed + b4 * % foreign origin non citizens, where b1 to b4 are the regression coefficients
Ho: healthcare coverage is not related to poverty status / single parenting / employment status / foreign origin non citizens
Ha: healthcare coverage is related to poverty status / single parenting / employment status / foreign origin non citizens
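The model equation above maps directly onto an `lm()` call with several predictors. This is a minimal sketch on simulated data, not the study data: `perc_single_parent` is a hypothetical column name, and the simulated outcome is made to depend only on the noncitizen share so the output is easy to read.

```r
set.seed(7)
n <- 400

# Simulated tract-level proportions (not the study data).
# perc_single_parent is a hypothetical column name.
sim <- data.frame(
  perc_children_in_school = runif(n, 0.5, 1),
  perc_single_parent      = runif(n, 0, 0.6),
  perc_employed_16_over   = runif(n, 0.4, 0.9),
  perc_Origin_FO_NCitizen = runif(n, 0, 0.4)
)
sim$perc_children_with_ins <- 0.95 - 0.15 * sim$perc_Origin_FO_NCitizen +
  rnorm(n, sd = 0.02)

# y = constant + b1 * school + b2 * single parents + b3 * employed + b4 * noncitizens
fit <- lm(perc_children_with_ins ~ perc_children_in_school +
            perc_single_parent + perc_employed_16_over +
            perc_Origin_FO_NCitizen,
          data = sim)

# One row per term: estimate, standard error, t value, p-value.
coefs <- coef(summary(fit))
coefs
```

Each row's p-value tests the null hypothesis that the corresponding coefficient is zero, holding the other predictors fixed.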
# children with health insurance in each county
health_ins_dat|>
ggplot(aes(x = Geo_County, y = perc_children_with_ins, fill = Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "Dark2")+
theme_minimal()+
labs(title = "Children's Health Insurance Access for Each County",
subtitle = "% of children under 18 with health insurance",
caption = "Data source: ACS 2015-2019",
y = "% of children with health insurance", x = "County",
tag = "Summary",
fill ='County')
#public insurance
health_ins_dat|>
ggplot(aes(x = Geo_County, y = perc_children_with_ins_public, fill = Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "Dark2")+
theme_minimal()+
labs(title = "Children's Health Insurance Access for Each County - Public Insurance",
subtitle = "% of children under 18 with public health insurance",
caption = "Data source: ACS 2015-2019",
y = "% of children with public health insurance", x = "County",
tag = "Summary",
fill ='County')
#Is children's insurance coverage correlated with adults' insurance coverage?
#children with insurance vs adult population with insurance
summary(lm(perc_children_with_ins~perc_adult_with_ins, data=health_ins_dat))
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_adult_with_ins, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31433 -0.01096 0.01364 0.02112 0.08230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.87123 0.01058 82.324 <2e-16 ***
## perc_adult_with_ins 0.11562 0.01172 9.863 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03595 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.04698, Adjusted R-squared: 0.0465
## F-statistic: 97.27 on 1 and 1973 DF, p-value: < 2.2e-16
#public insurance - adult without insurance
summary(lm(perc_children_with_ins_public~perc_adult_without_ins, data=health_ins_dat))
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_adult_without_ins,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.05933 -0.17987 -0.00451 0.17527 0.61918
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.314300 0.009175 34.25 <2e-16 ***
## perc_adult_without_ins 1.607871 0.075560 21.28 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2317 on 1972 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.1867, Adjusted R-squared: 0.1863
## F-statistic: 452.8 on 1 and 1972 DF, p-value: < 2.2e-16
There is a statistically significant but moderate positive association between % of adults without insurance and % of children covered by public insurance.
A 1 percentage point increase in % of adults with insurance is associated with a 0.12 percentage point increase in % of children covered by health insurance. R-squared: 0.04698.
A 1 percentage point increase in % of adults without insurance is associated with a 1.6 percentage point increase in % of children covered by public insurance. R-squared: 0.1867.
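The slope readings above can be reproduced from any fitted model. The following is a minimal sketch on simulated data: the data frame `sim` and its columns are hypothetical stand-ins, not `health_ins_dat`. It pulls the slope and R-squared from `lm()` and converts the slope into percentage points, assuming both variables are stored as proportions between 0 and 1, as the intercepts in the output above suggest.

```r
# Simulated stand-in for tract-level data; all names here are hypothetical.
set.seed(1)
n <- 500
sim <- data.frame(adult_without_ins = runif(n, 0, 0.3))
sim$children_public <- 0.3 + 1.6 * sim$adult_without_ins + rnorm(n, sd = 0.2)

fit <- lm(children_public ~ adult_without_ins, data = sim)

slope <- unname(coef(fit)["adult_without_ins"])
r_squared <- summary(fit)$r.squared

# Both variables are proportions (0 to 1), so a change of 0.01 in x is one
# percentage point; the associated change in y is slope * 0.01.
change_per_point <- slope * 0.01

cat(sprintf("slope = %.3f, R-squared = %.3f\n", slope, r_squared))
cat(sprintf("+1 percentage point in x ~ %+.2f percentage points in y\n",
            100 * change_per_point))
```

A slope describes an association in the fitted line, not a causal effect, and a low R-squared means most tract-to-tract variation is left unexplained even when the slope is significant.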
Ho1: % of children enrolled in school is not correlated with % of children covered by health insurance
Ha1: % of children enrolled in school is correlated with % of children covered by health insurance
Ho2: % of children enrolled in school is not correlated with % of children covered by public health insurance
Ha2: % of children enrolled in school is correlated with % of children covered by public health insurance
health_ins_dat|>
ggplot(aes(x=Geo_County, y=perc_children_in_school, fill=Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "Dark2")+
theme_minimal()
## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
# health insurance enrollment rate children under 18 vs school enrollment
model_child_with_ins_school_enrollment <- lm(perc_children_with_ins ~
perc_children_in_school,
data = health_ins_dat)
summary(model_child_with_ins_school_enrollment)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_children_in_school,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35254 -0.01214 0.01407 0.02464 0.02660
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.977747 0.005533 176.705 <2e-16 ***
## perc_children_in_school -0.003281 0.007337 -0.447 0.655
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03682 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0001013, Adjusted R-squared: -0.0004055
## F-statistic: 0.1999 on 1 and 1973 DF, p-value: 0.6548
# public insurance
model_child_with_ins_school_enrollment_public <- lm(perc_children_with_ins_public ~
perc_children_in_school,
data = health_ins_dat)
summary(model_child_with_ins_school_enrollment_public)
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_children_in_school,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57550 -0.20285 0.01096 0.20856 0.55584
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.18399 0.03803 4.837 1.42e-06 ***
## perc_children_in_school 0.39025 0.05044 7.737 1.61e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2531 on 1972 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.02946, Adjusted R-squared: 0.02897
## F-statistic: 59.87 on 1 and 1972 DF, p-value: 1.608e-14
#insurance
health_ins_dat|>
ggplot(aes(y=perc_children_with_ins,x=perc_children_in_school, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#public insurance
health_ins_dat|>
ggplot(aes(y=perc_children_with_ins_public,x=perc_children_in_school, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=-.4)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Ho1: % of adults employed is not correlated with % of children covered by health insurance
Ha1: % of adults employed is correlated with % of children covered by health insurance
Ho2: % of adults unemployed is not correlated with % of children covered by public health insurance
Ha2: % of adults unemployed is correlated with % of children covered by public health insurance
health_ins_dat|>
ggplot(aes(x=Geo_County, y=perc_unemployed_16_over, fill=Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "Dark2")+
theme_minimal()
# health insurance enrollment rate children under 18 vs % of population over 16 employed
model_child_with_ins_employ <- lm(perc_children_with_ins~
perc_employed_16_over,
data = health_ins_dat)
summary(model_child_with_ins_employ)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_employed_16_over,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35164 -0.01224 0.01373 0.02478 0.02555
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.98746 0.01809 54.579 <2e-16 ***
## perc_employed_16_over -0.01301 0.01934 -0.673 0.501
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03682 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0002295, Adjusted R-squared: -0.0002772
## F-statistic: 0.453 on 1 and 1973 DF, p-value: 0.501
#public insurance
model_child_with_ins_employ_public <- lm(perc_children_with_ins_public~
perc_unemployed_16_over,
data = health_ins_dat)
summary(model_child_with_ins_employ_public)
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_unemployed_16_over,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62923 -0.18439 0.00284 0.18037 0.59965
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.309568 0.009548 32.42 <2e-16 ***
## perc_unemployed_16_over 2.533655 0.122278 20.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2328 on 1972 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.1788, Adjusted R-squared: 0.1784
## F-statistic: 429.3 on 1 and 1972 DF, p-value: < 2.2e-16
# plot
health_ins_dat|>
ggplot(aes(y=perc_children_with_ins,x=perc_unemployed_16_over, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation()+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# plot
health_ins_dat|>
ggplot(aes(y=perc_children_with_ins_public,x=perc_unemployed_16_over, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=-.4)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Ho1: % of children with single parents is not correlated with % of children covered by health insurance
Ha1: % of children with single parents is correlated with % of children covered by health insurance
Ho2: % of children with single parents is not correlated with % of children covered by public health insurance
Ha2: % of children with single parents is correlated with % of children covered by public health insurance
# health insurance enrollment rate children under 18 vs percentage of children with single parents
model_child_with_ins_single_parent <- lm(perc_children_with_ins~
perc_children_with_single_parents,
data = health_ins_dat)
summary(model_child_with_ins_single_parent)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_children_with_single_parents,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35229 -0.01202 0.01396 0.02463 0.02527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9755028 0.0014886 655.306 <2e-16 ***
## perc_children_with_single_parents -0.0007703 0.0047165 -0.163 0.87
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03682 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 1.352e-05, Adjusted R-squared: -0.0004933
## F-statistic: 0.02667 on 1 and 1973 DF, p-value: 0.8703
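# public insurance
# NOTE: the response below is perc_children_with_ins, so this fit duplicates the
# model above; testing Ho2 requires perc_children_with_ins_public as the response.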
model_child_with_ins_single_parent_public <- lm(perc_children_with_ins~
perc_children_with_single_parents,
data = health_ins_dat)
summary(model_child_with_ins_single_parent_public)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_children_with_single_parents,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35229 -0.01202 0.01396 0.02463 0.02527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9755028 0.0014886 655.306 <2e-16 ***
## perc_children_with_single_parents -0.0007703 0.0047165 -0.163 0.87
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03682 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 1.352e-05, Adjusted R-squared: -0.0004933
## F-statistic: 0.02667 on 1 and 1973 DF, p-value: 0.8703
Ho1: Median household income is not correlated with % of children covered by health insurance
Ha1: Median household income is correlated with % of children covered by health insurance
Ho2: Median household income is not correlated with % of children covered by public health insurance
Ha2: Median household income is correlated with % of children covered by public health insurance
# health insurance enrollment rate children under 18 vs median household income
model_child_with_ins_income <- lm(perc_children_with_ins~
log(Median_Household_Income_2019),
data = health_ins_dat)
summary(model_child_with_ins_income)
##
## Call:
## lm(formula = perc_children_with_ins ~ log(Median_Household_Income_2019),
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34951 -0.01168 0.01429 0.02394 0.03007
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.931673 0.017299 53.859 <2e-16 ***
## log(Median_Household_Income_2019) 0.003894 0.001542 2.525 0.0116 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03676 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.003221, Adjusted R-squared: 0.002716
## F-statistic: 6.375 on 1 and 1973 DF, p-value: 0.01165
#public insurance
model_child_with_ins_income_public <- lm(perc_children_with_ins_public~
log(Median_Household_Income_2019),
data = health_ins_dat)
summary(model_child_with_ins_income_public)
##
## Call:
## lm(formula = perc_children_with_ins_public ~ log(Median_Household_Income_2019),
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74547 -0.09625 0.00917 0.09724 0.64170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.899916 0.068275 71.77 <2e-16 ***
## log(Median_Household_Income_2019) -0.394963 0.006087 -64.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1451 on 1972 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.681, Adjusted R-squared: 0.6808
## F-statistic: 4210 on 1 and 1972 DF, p-value: < 2.2e-16
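Because income enters the model on a log scale, the slope is not a per-dollar effect. The sketch below works through the arithmetic for the public-insurance slope reported above. It assumes the response is stored as a proportion between 0 and 1, and the helper name `effect_of_income_rise` is hypothetical.

```r
# Slope on log(Median_Household_Income_2019) copied from the public-insurance
# output above.
beta_log_income <- -0.394963

# For a linear-log model y = a + b * log(x), multiplying x by (1 + p)
# changes the fitted y by b * log(1 + p).
effect_of_income_rise <- function(beta, pct_increase) {
  beta * log(1 + pct_increase)
}

delta_10pct <- effect_of_income_rise(beta_log_income, 0.10)

# The response is a proportion, so multiply by 100 for percentage points.
cat(sprintf("10%% higher income ~ %+.1f percentage points\n", 100 * delta_10pct))
```

This reads the coefficient as an association across census tracts, holding nothing else fixed, since the model has a single predictor.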
# plot
health_ins_dat|>
ggplot(aes(y=perc_children_with_ins,x=log(Median_Household_Income_2019), col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# plot
health_ins_dat|>
ggplot(aes(y=perc_children_with_ins_public,x=log(Median_Household_Income_2019), col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=-.4)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Ho1: Race is not correlated with % of children covered by health insurance
Ha1: Race is correlated with % of children covered by health insurance
Ho2: Race is not correlated with % of children covered by public health insurance
Ha2: Race is correlated with % of children covered by public health insurance
# health insurance enrollment rate children under 18 vs percentage population non-white
model_child_with_ins_race <- lm(perc_children_with_ins~
perc_Pop_Black +
perc_Pop_Hispanic_Latino +
perc_Pop_Asian+
perc_Pop_More_Races+
perc_Pop_Other_Races,
data = health_ins_dat)
summary(model_child_with_ins_race)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino +
## perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32015 -0.01270 0.01359 0.02257 0.05356
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.986585 0.002508 393.379 < 2e-16 ***
## perc_Pop_Black -0.012016 0.003405 -3.529 0.000427 ***
## perc_Pop_Hispanic_Latino -0.008109 0.003977 -2.039 0.041613 *
## perc_Pop_Asian -0.046934 0.005605 -8.374 < 2e-16 ***
## perc_Pop_More_Races 0.031586 0.042066 0.751 0.452820
## perc_Pop_Other_Races 0.005837 0.028383 0.206 0.837075
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0362 on 1969 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.03555, Adjusted R-squared: 0.0331
## F-statistic: 14.52 on 5 and 1969 DF, p-value: 5.421e-14
#public insurance
model_child_with_ins_race_public <- lm(perc_children_with_ins_public~
perc_Pop_Black +
perc_Pop_Hispanic_Latino +
perc_Pop_Asian+
perc_Pop_More_Races+
perc_Pop_Other_Races,
data = health_ins_dat)
summary(model_child_with_ins_race_public)
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black +
## perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races +
## perc_Pop_Other_Races, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59144 -0.14445 -0.01207 0.11774 0.77964
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.14084 0.01353 10.409 <2e-16 ***
## perc_Pop_Black 0.32929 0.01838 17.916 <2e-16 ***
## perc_Pop_Hispanic_Latino 0.75512 0.02146 35.190 <2e-16 ***
## perc_Pop_Asian 0.36933 0.03024 12.214 <2e-16 ***
## perc_Pop_More_Races -0.44778 0.22695 -1.973 0.0486 *
## perc_Pop_Other_Races 0.21951 0.15314 1.433 0.1519
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1953 on 1968 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.4232, Adjusted R-squared: 0.4217
## F-statistic: 288.8 on 5 and 1968 DF, p-value: < 2.2e-16
##### children with insurance
# black
health_ins_dat|>
ggplot(aes(x=perc_Pop_Black,y=perc_children_with_ins, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=0.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# asian
health_ins_dat|>
ggplot(aes(x=perc_Pop_Asian,y=perc_children_with_ins, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=0.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Hispanic/Latino
health_ins_dat|>
ggplot(aes(x=perc_Pop_Hispanic_Latino,y=perc_children_with_ins, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=0.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# white
health_ins_dat|>
ggplot(aes(x=perc_White,y=perc_children_with_ins, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=0.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
##### children with public insurance
# black
health_ins_dat|>
ggplot(aes(x=perc_Pop_Black,y=perc_children_with_ins_public, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
# asian
health_ins_dat|>
ggplot(aes(x=perc_Pop_Asian,y=perc_children_with_ins_public, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
# Hispanic/Latino
health_ins_dat|>
ggplot(aes(x=perc_Pop_Hispanic_Latino,y=perc_children_with_ins_public, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
# white
health_ins_dat|>
ggplot(aes(x=perc_White,y=perc_children_with_ins_public, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing non-finite values
## (`stat_regline_equation()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
Ho1: % of population in poverty is not correlated with % of children covered by health insurance
Ha1: % of population in poverty is correlated with % of children covered by health insurance
Ho2: % of population in poverty is not correlated with % of children covered by public health insurance
Ha2: % of population in poverty is correlated with % of children covered by public health insurance
health_ins_dat|>
ggplot(aes(x=Geo_County, y=perc_Poverty, fill=Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "Dark2")+
theme_minimal()
## Warning: Removed 115 rows containing non-finite values (`stat_boxplot()`).
#### Linear Regression
# health insurance enrollment rate children under 18 vs poverty
model_child_with_ins_poverty <- lm(perc_children_with_ins~
perc_Poverty,
data = health_ins_dat)
summary(model_child_with_ins_poverty)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Poverty, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35215 -0.01241 0.01395 0.02469 0.02499
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.975009 0.001457 668.995 <2e-16 ***
## perc_Poverty 0.002184 0.007311 0.299 0.765
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03662 on 1858 degrees of freedom
## (116 observations deleted due to missingness)
## Multiple R-squared: 4.805e-05, Adjusted R-squared: -0.0004901
## F-statistic: 0.08928 on 1 and 1858 DF, p-value: 0.7651
#public insurance
model_child_with_ins_poverty_public <- lm(perc_children_with_ins_public~
perc_Poverty,
data = health_ins_dat)
summary(model_child_with_ins_poverty_public)
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Poverty, data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.63382 -0.13657 0.00423 0.12767 0.60505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.226600 0.007292 31.07 <2e-16 ***
## perc_Poverty 1.522890 0.036570 41.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1831 on 1857 degrees of freedom
## (117 observations deleted due to missingness)
## Multiple R-squared: 0.4829, Adjusted R-squared: 0.4826
## F-statistic: 1734 on 1 and 1857 DF, p-value: < 2.2e-16
We fail to reject Ho1 at the 5% significance level (p = 0.765): there is no evidence that poverty is correlated with % of children covered by health insurance.
We reject Ho2 at the 5% significance level (p < 2e-16): poverty is positively correlated with % of children covered by public health insurance.
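The same decision can be read from a confidence interval for the slope: if the 95% interval excludes zero, the null hypothesis of no linear association is rejected at the 5% level. The sketch below illustrates this with `confint()` on simulated data. The data frame `sim` and its columns are hypothetical, not the study data.

```r
# Simulated stand-in data; names are hypothetical, not from health_ins_dat.
set.seed(3)
n <- 400
sim <- data.frame(poverty = runif(n, 0, 0.5))
sim$children_public <- 0.2 + 1.5 * sim$poverty + rnorm(n, sd = 0.18)

fit <- lm(children_public ~ poverty, data = sim)

# 95% confidence interval for the slope (a 1 x 2 matrix: lower, upper).
ci <- confint(fit, "poverty", level = 0.95)

# An interval that excludes zero matches a two-sided t-test with p < 0.05.
excludes_zero <- ci[1, 1] > 0 || ci[1, 2] < 0
decision <- if (excludes_zero) "reject Ho" else "fail to reject Ho"

print(ci)
cat("Decision at the 5% level:", decision, "\n")
```

Reporting the interval alongside the p-value also shows how precisely the slope is estimated, which the reject/fail-to-reject wording alone does not.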
# plot
health_ins_dat|>
ggplot(aes(x=perc_Poverty,y=perc_children_with_ins, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# plot
health_ins_dat|>
ggplot(aes(x=perc_Poverty,y=perc_children_with_ins_public, col=Geo_County)) +
geom_point(size = 0.8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=-.2)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Ho1: % of population that is foreign-origin non-citizen is not correlated with % of children covered by health insurance
Ha1: % of population that is foreign-origin non-citizen is correlated with % of children covered by health insurance
Ho2: % of population that is foreign-origin non-citizen is not correlated with % of children covered by public health insurance
Ha2: % of population that is foreign-origin non-citizen is correlated with % of children covered by public health insurance
health_ins_dat|>
ggplot(aes(x=Geo_County, y=perc_Origin_FO_NCitizen, fill=Geo_County))+
geom_boxplot(outlier.size=0.5,outlier.fill="grey", outlier.alpha=.2)+
scale_fill_brewer(palette = "Dark2")+
theme_minimal()
# health insurance enrollment rate children under 18 vs percentage population foreign origin non citizen
model_child_with_ins_foreign <- lm(perc_children_with_ins~
perc_Origin_FO_NCitizen,
data = health_ins_dat)
summary(model_child_with_ins_foreign)
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Origin_FO_NCitizen,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31919 -0.01120 0.01427 0.02217 0.04821
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.985933 0.001661 593.675 < 2e-16 ***
## perc_Origin_FO_NCitizen -0.066641 0.009061 -7.355 2.79e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03633 on 1973 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.02668, Adjusted R-squared: 0.02619
## F-statistic: 54.09 on 1 and 1973 DF, p-value: 2.792e-13
#public insurance
model_child_with_ins_foreign_public <- lm(perc_children_with_ins_public~
perc_Origin_FO_NCitizen,
data = health_ins_dat)
summary(model_child_with_ins_foreign_public)
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Origin_FO_NCitizen,
## data = health_ins_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68068 -0.17455 0.00012 0.16725 0.64056
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.29330 0.01077 27.24 <2e-16 ***
## perc_Origin_FO_NCitizen 1.13834 0.05874 19.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2354 on 1972 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.16, Adjusted R-squared: 0.1596
## F-statistic: 375.6 on 1 and 1972 DF, p-value: < 2.2e-16
# plot
health_ins_dat|>
ggplot(aes(x=perc_Origin_FO_NCitizen,y=perc_children_with_ins, col=Geo_County)) +
geom_point(size = .8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=0.7)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# plot
health_ins_dat|>
ggplot(aes(x=perc_Origin_FO_NCitizen,y=perc_children_with_ins_public, col=Geo_County)) +
geom_point(size = .8, alpha=.2)+
facet_wrap(~Geo_County)+
geom_smooth(method = lm)+
scale_color_brewer(palette = "Dark2")+
stat_regline_equation(label.y=1)+
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#children with insurance
func_children_ins <- function(data){
model_children <- lm(perc_children_with_ins~
perc_Pop_Black +
perc_Pop_Hispanic_Latino +
perc_Pop_Asian+
perc_Pop_More_Races+
perc_Pop_Other_Races+
perc_Poverty+
perc_unemployed_16_over+
log(Median_Household_Income_2019)+
perc_Origin_FO_NCitizen+
perc_Origin_FO_Citizen+
perc_children_in_school,
data = data)
summary(model_children)
}
#children with public insurance
func_children_public_ins <- function(data){
model_children <- lm(perc_children_with_ins_public~
perc_Pop_Black +
perc_Pop_Hispanic_Latino +
perc_Pop_Asian+
perc_Pop_More_Races+
perc_Pop_Other_Races+
perc_Poverty+
perc_unemployed_16_over+
log(Median_Household_Income_2019)+
perc_Origin_FO_NCitizen+
perc_Origin_FO_Citizen+
perc_children_in_school,
data = data)
summary(model_children)
}
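The two functions above differ only in the response variable. One possible refactor, shown here as a sketch rather than the author's method, passes the response and predictors as strings and builds the formula with `reformulate()`. The function name `fit_children_model`, the data frame `sim`, and its columns are all hypothetical.

```r
# Hypothetical refactor: one function, with the response passed as a string.
fit_children_model <- function(data, response, predictors) {
  model <- lm(reformulate(predictors, response = response), data = data)
  summary(model)
}

# Simulated stand-in data; column names are illustrative only.
set.seed(4)
n <- 200
sim <- data.frame(poverty = runif(n, 0, 0.5),
                  income = exp(rnorm(n, mean = 11, sd = 0.5)))
sim$children_ins <- 0.97 + rnorm(n, sd = 0.03)
sim$children_public <- 1.5 - 0.1 * log(sim$income) + 0.5 * sim$poverty +
  rnorm(n, sd = 0.1)

# Term labels may contain transformations such as log().
predictors <- c("poverty", "log(income)")

s_any <- fit_children_model(sim, "children_ins", predictors)
s_public <- fit_children_model(sim, "children_public", predictors)

print(rownames(coef(s_public)))
```

Keeping the predictor list in one place means both models stay in sync when a variable is added or removed.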
# Four counties - insurance
health_ins_dat|>func_children_ins()
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino +
## perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races +
## perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) +
## perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29720 -0.01129 0.01291 0.02184 0.05539
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9228962 0.0387034 23.845 < 2e-16 ***
## perc_Pop_Black -0.0069802 0.0041856 -1.668 0.095555 .
## perc_Pop_Hispanic_Latino 0.0081523 0.0060823 1.340 0.180304
## perc_Pop_Asian -0.0301485 0.0079114 -3.811 0.000143 ***
## perc_Pop_More_Races 0.0286227 0.0425697 0.672 0.501431
## perc_Pop_Other_Races 0.0040546 0.0299192 0.136 0.892217
## perc_Poverty 0.0189494 0.0123799 1.531 0.126026
## perc_unemployed_16_over -0.0003649 0.0241608 -0.015 0.987951
## log(Median_Household_Income_2019) 0.0057899 0.0030786 1.881 0.060172 .
## perc_Origin_FO_NCitizen -0.0546899 0.0133485 -4.097 4.36e-05 ***
## perc_Origin_FO_Citizen 0.0119858 0.0107762 1.112 0.266179
## perc_children_in_school -0.0084999 0.0078701 -1.080 0.280274
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03582 on 1848 degrees of freedom
## (116 observations deleted due to missingness)
## Multiple R-squared: 0.04815, Adjusted R-squared: 0.04249
## F-statistic: 8.499 on 11 and 1848 DF, p-value: 8.778e-15
# Four counties - public insurance
health_ins_dat|>func_children_public_ins()
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black +
## perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races +
## perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over +
## log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen +
## perc_Origin_FO_Citizen + perc_children_in_school, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.48120 -0.08679 -0.00095 0.08772 0.45877
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.26111 0.14320 22.774 < 2e-16 ***
## perc_Pop_Black 0.04862 0.01547 3.143 0.0017 **
## perc_Pop_Hispanic_Latino 0.09360 0.02246 4.167 3.23e-05 ***
## perc_Pop_Asian -0.05991 0.02923 -2.050 0.0405 *
## perc_Pop_More_Races 0.12196 0.15721 0.776 0.4380
## perc_Pop_Other_Races 0.47476 0.11049 4.297 1.82e-05 ***
## perc_Poverty 0.46096 0.04582 10.061 < 2e-16 ***
## perc_unemployed_16_over -0.07603 0.08930 -0.851 0.3946
## log(Median_Household_Income_2019) -0.26765 0.01139 -23.499 < 2e-16 ***
## perc_Origin_FO_NCitizen 0.51417 0.04931 10.428 < 2e-16 ***
## perc_Origin_FO_Citizen 0.05078 0.03990 1.273 0.2033
## perc_children_in_school 0.01613 0.02907 0.555 0.5791
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1323 on 1847 degrees of freedom
## (117 observations deleted due to missingness)
## Multiple R-squared: 0.7316, Adjusted R-squared: 0.73
## F-statistic: 457.7 on 11 and 1847 DF, p-value: < 2.2e-16
# Bronx - insurance
Bronx|>func_children_ins()
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino +
## perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races +
## perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) +
## perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.157556 -0.009653 0.008102 0.018660 0.038357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9474711 0.1051397 9.012 < 2e-16 ***
## perc_Pop_Black 0.0095971 0.0154850 0.620 0.53588
## perc_Pop_Hispanic_Latino 0.0514580 0.0185091 2.780 0.00578 **
## perc_Pop_Asian 0.0157602 0.0388106 0.406 0.68497
## perc_Pop_More_Races 0.0807487 0.1549610 0.521 0.60269
## perc_Pop_Other_Races -0.0296361 0.1216399 -0.244 0.80768
## perc_Poverty -0.0234814 0.0257986 -0.910 0.36346
## perc_unemployed_16_over -0.0359757 0.0453086 -0.794 0.42782
## log(Median_Household_Income_2019) -0.0006071 0.0086905 -0.070 0.94435
## perc_Origin_FO_NCitizen -0.0500972 0.0266910 -1.877 0.06150 .
## perc_Origin_FO_Citizen 0.0011430 0.0350271 0.033 0.97399
## perc_children_in_school 0.0276166 0.0195829 1.410 0.15951
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03047 on 299 degrees of freedom
## (16 observations deleted due to missingness)
## Multiple R-squared: 0.05984, Adjusted R-squared: 0.02525
## F-statistic: 1.73 on 11 and 299 DF, p-value: 0.06621
# Bronx - public insurance
Bronx |>func_children_public_ins()
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black +
## perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races +
## perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over +
## log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen +
## perc_Origin_FO_Citizen + perc_children_in_school, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34179 -0.05413 0.00941 0.07094 0.24437
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.67042 0.37415 4.465 1.14e-05 ***
## perc_Pop_Black 0.33387 0.05559 6.006 5.54e-09 ***
## perc_Pop_Hispanic_Latino 0.51538 0.06562 7.854 7.34e-14 ***
## perc_Pop_Asian 0.26808 0.13756 1.949 0.052250 .
## perc_Pop_More_Races 1.24195 0.54914 2.262 0.024442 *
## perc_Pop_Other_Races -0.41341 0.43106 -0.959 0.338311
## perc_Poverty 0.30829 0.09274 3.324 0.000997 ***
## perc_unemployed_16_over 0.34596 0.16184 2.138 0.033358 *
## log(Median_Household_Income_2019) -0.15136 0.03092 -4.895 1.62e-06 ***
## perc_Origin_FO_NCitizen 0.39145 0.09461 4.138 4.57e-05 ***
## perc_Origin_FO_Citizen -0.09353 0.12734 -0.734 0.463232
## perc_children_in_school 0.04455 0.06954 0.641 0.522285
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.108 on 298 degrees of freedom
## (17 observations deleted due to missingness)
## Multiple R-squared: 0.7975, Adjusted R-squared: 0.79
## F-statistic: 106.7 on 11 and 298 DF, p-value: < 2.2e-16
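In Bronx County (at the 5% significance level):
% of children with health insurance is positively correlated with the Hispanic and Latino population share (p = 0.006). The coefficients for children in school (positive) and population in poverty (negative) are not statistically significant, and the overall model is only marginally significant (F-test p = 0.066).
% of children with public health insurance is positively correlated with race (Black, Hispanic and Latino, Two or More Races), poverty, unemployment, and the foreign-origin non-citizen population; the Asian coefficient is marginal (p = 0.052). It is negatively correlated with log median household income. The negative coefficient for foreign-origin citizens is not significant (p = 0.463).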
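Reading signs and significance off a long coefficient table by eye is error-prone. As a sketch, the coefficient matrix returned by `coef(summary(...))` can be filtered programmatically. The example uses simulated data with hypothetical column names, and the 5% threshold is an assumption about the intended cutoff.

```r
# Simulated stand-in data; all names are hypothetical.
set.seed(5)
n <- 300
sim <- data.frame(x_strong = rnorm(n), x_negative = rnorm(n), x_noise = rnorm(n))
sim$y <- 0.5 + 0.4 * sim$x_strong - 0.3 * sim$x_negative + rnorm(n, sd = 0.5)

fit <- lm(y ~ x_strong + x_negative + x_noise, data = sim)

# Turn the coefficient matrix into a data frame and drop the intercept.
tab <- as.data.frame(coef(summary(fit)))
tab$term <- rownames(tab)
tab <- tab[tab$term != "(Intercept)", ]

# Keep terms significant at the 5% level and label the direction.
alpha <- 0.05
sig <- tab[tab[["Pr(>|t|)"]] < alpha, c("term", "Estimate", "Pr(>|t|)")]
sig$direction <- ifelse(sig$Estimate > 0, "positive", "negative")

print(sig, row.names = FALSE)
```

Applying the same filter to each county's model would give a consistent basis for the per-county summaries.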
# Queens - insurance
Queens |> func_children_ins()
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino +
## perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races +
## perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) +
## perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.27238 -0.01417 0.01284 0.02449 0.06428
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.729186 0.109129 6.682 5.55e-11 ***
## perc_Pop_Black -0.005516 0.009521 -0.579 0.5626
## perc_Pop_Hispanic_Latino 0.019566 0.016434 1.191 0.2343
## perc_Pop_Asian -0.037846 0.017994 -2.103 0.0359 *
## perc_Pop_More_Races 0.095971 0.082162 1.168 0.2433
## perc_Pop_Other_Races -0.020209 0.044564 -0.453 0.6504
## perc_Poverty 0.016535 0.034523 0.479 0.6322
## perc_unemployed_16_over 0.090220 0.061701 1.462 0.1442
## log(Median_Household_Income_2019) 0.021896 0.009067 2.415 0.0161 *
## perc_Origin_FO_NCitizen -0.051683 0.030192 -1.712 0.0875 .
## perc_Origin_FO_Citizen 0.026543 0.030560 0.869 0.3855
## perc_children_in_school -0.009745 0.017984 -0.542 0.5881
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04202 on 579 degrees of freedom
## (46 observations deleted due to missingness)
## Multiple R-squared: 0.09921, Adjusted R-squared: 0.0821
## F-statistic: 5.797 on 11 and 579 DF, p-value: 6.137e-09
# Queens - public insurance
Queens |> func_children_public_ins()
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black +
## perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races +
## perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over +
## log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen +
## perc_Origin_FO_Citizen + perc_children_in_school, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38269 -0.09251 -0.00449 0.09296 0.42711
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.43436 0.35380 6.881 1.55e-11 ***
## perc_Pop_Black 0.16139 0.03087 5.229 2.39e-07 ***
## perc_Pop_Hispanic_Latino 0.39143 0.05328 7.347 6.94e-13 ***
## perc_Pop_Asian 0.13088 0.05834 2.244 0.025239 *
## perc_Pop_More_Races 0.10862 0.26637 0.408 0.683577
## perc_Pop_Other_Races 0.72201 0.14448 4.997 7.71e-07 ***
## perc_Poverty 0.25288 0.11193 2.259 0.024235 *
## perc_unemployed_16_over 0.69938 0.20004 3.496 0.000508 ***
## log(Median_Household_Income_2019) -0.19757 0.02940 -6.721 4.33e-11 ***
## perc_Origin_FO_NCitizen 0.41863 0.09788 4.277 2.22e-05 ***
## perc_Origin_FO_Citizen -0.21507 0.09908 -2.171 0.030357 *
## perc_children_in_school -0.05767 0.05831 -0.989 0.323013
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1362 on 579 degrees of freedom
## (46 observations deleted due to missingness)
## Multiple R-squared: 0.5743, Adjusted R-squared: 0.5662
## F-statistic: 71.02 on 11 and 579 DF, p-value: < 2.2e-16
In Queens County:
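% of children with health insurance is negatively correlated with race (Asian) and positively correlated with median household income (log). Foreign origin with no citizenship population has a negative coefficient but is only marginally significant (p = 0.088). The model explains little of the variation (R-squared = 0.099).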
% of children with public health insurance is positively correlated with race (Black, Hispanic and Latino, Asian, Other Races*), poverty, unemployment, and foreign origin with no citizenship population. Negatively correlated with median household income (log) and foreign origin with citizenship.
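Because median household income enters the models in logs, its coefficient is easier to read as the effect of a proportional change in income. The sketch below uses the Queens public-insurance estimate printed above and assumes the response is a proportion between 0 and 1 (the residual range suggests this, but it is an assumption); `effect_of_income_ratio()` is a hypothetical helper.

```r
# Coefficient on log(Median_Household_Income_2019), copied from the Queens
# public-insurance summary.
beta_log_income <- -0.19757

# Change in the response associated with multiplying income by `ratio`,
# holding the other predictors fixed: beta * log(ratio).
effect_of_income_ratio <- function(beta, ratio) beta * log(ratio)

effect_10pct  <- effect_of_income_ratio(beta_log_income, 1.10)
effect_double <- effect_of_income_ratio(beta_log_income, 2)

# Expressed in percentage points of children with public insurance
# (under the assumption that the response is a 0-1 proportion).
round(100 * effect_10pct, 2)
round(100 * effect_double, 2)
```

Under these assumptions, a tract with 10% higher median household income would be associated with roughly 1.9 percentage points fewer children on public insurance, other predictors held fixed.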
# Kings - insurance
Kings |> func_children_ins()
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino +
## perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races +
## perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) +
## perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.19678 -0.01168 0.01181 0.02183 0.03333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.961082 0.061604 15.601 <2e-16 ***
## perc_Pop_Black -0.006970 0.005415 -1.287 0.198
## perc_Pop_Hispanic_Latino -0.006706 0.010107 -0.664 0.507
## perc_Pop_Asian -0.007399 0.012985 -0.570 0.569
## perc_Pop_More_Races -0.055430 0.061215 -0.906 0.366
## perc_Pop_Other_Races 0.010134 0.104762 0.097 0.923
## perc_Poverty 0.020634 0.018636 1.107 0.269
## perc_unemployed_16_over -0.027731 0.034349 -0.807 0.420
## log(Median_Household_Income_2019) 0.002604 0.004933 0.528 0.598
## perc_Origin_FO_NCitizen -0.041715 0.024183 -1.725 0.085 .
## perc_Origin_FO_Citizen 0.001635 0.014479 0.113 0.910
## perc_children_in_school -0.006152 0.013388 -0.460 0.646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03324 on 694 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.02274, Adjusted R-squared: 0.007254
## F-statistic: 1.468 on 11 and 694 DF, p-value: 0.1386
# Kings - public insurance
Kings |> func_children_public_ins()
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black +
## perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races +
## perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over +
## log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen +
## perc_Origin_FO_Citizen + perc_children_in_school, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44404 -0.07860 -0.00335 0.08334 0.34380
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.59747 0.22976 15.658 < 2e-16 ***
## perc_Pop_Black 0.01501 0.02020 0.743 0.45770
## perc_Pop_Hispanic_Latino 0.09701 0.03769 2.574 0.01027 *
## perc_Pop_Asian -0.01379 0.04843 -0.285 0.77591
## perc_Pop_More_Races -0.30710 0.22831 -1.345 0.17903
## perc_Pop_Other_Races 0.08702 0.39072 0.223 0.82382
## perc_Poverty 0.52209 0.06951 7.512 1.80e-13 ***
## perc_unemployed_16_over -0.37619 0.12811 -2.937 0.00343 **
## log(Median_Household_Income_2019) -0.28863 0.01840 -15.689 < 2e-16 ***
## perc_Origin_FO_NCitizen 0.64350 0.09019 7.135 2.45e-12 ***
## perc_Origin_FO_Citizen -0.01382 0.05400 -0.256 0.79808
## perc_children_in_school -0.05085 0.04993 -1.018 0.30885
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.124 on 694 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.7357, Adjusted R-squared: 0.7315
## F-statistic: 175.6 on 11 and 694 DF, p-value: < 2.2e-16
In Kings County:
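% of children with health insurance is not significantly correlated with any independent variable at the 0.05 level (foreign origin with no citizenship population is marginal, p = 0.085), and the overall F-test is not significant (p = 0.14).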
% of children with public health insurance is positively correlated with race (Hispanic and Latino), poverty, and foreign origin population with no citizenship. Negatively correlated with unemployment and median household income (log).
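The last lines of each summary (overall F-test and adjusted R-squared) indicate whether a model as a whole explains anything. As a minimal sketch, both can be recomputed from the printed values of the Kings insurance model; the numbers are copied from the output above, so small rounding differences are expected.

```r
# Values copied from the Kings "insurance" summary.
f_stat    <- 1.468
df_model  <- 11
df_resid  <- 694
r_squared <- 0.02274

# Overall F-test p-value: upper tail of the F distribution.
p_overall <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n - p - 1 equals the residual degrees of freedom.
n_obs <- df_resid + df_model + 1
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / df_resid

round(p_overall, 3)
round(adj_r_squared, 4)
```

A p-value well above 0.05 together with an adjusted R-squared near zero means the eleven predictors jointly explain almost none of the tract-level variation in children's overall coverage in Kings County.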
# New York - insurance
NewYork |> func_children_ins()
##
## Call:
## lm(formula = perc_children_with_ins ~ perc_Pop_Black + perc_Pop_Hispanic_Latino +
## perc_Pop_Asian + perc_Pop_More_Races + perc_Pop_Other_Races +
## perc_Poverty + perc_unemployed_16_over + log(Median_Household_Income_2019) +
## perc_Origin_FO_NCitizen + perc_Origin_FO_Citizen + perc_children_in_school,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.155441 -0.008007 0.012520 0.018116 0.047810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.087364 0.095897 11.339 <2e-16 ***
## perc_Pop_Black 0.002682 0.018869 0.142 0.8871
## perc_Pop_Hispanic_Latino -0.003370 0.026973 -0.125 0.9007
## perc_Pop_Asian -0.038511 0.033624 -1.145 0.2532
## perc_Pop_More_Races 0.117972 0.131918 0.894 0.3721
## perc_Pop_Other_Races 0.125349 0.235599 0.532 0.5952
## perc_Poverty -0.028161 0.036967 -0.762 0.4469
## perc_unemployed_16_over -0.072222 0.068924 -1.048 0.2958
## log(Median_Household_Income_2019) -0.006632 0.007466 -0.888 0.3753
## perc_Origin_FO_NCitizen -0.006105 0.040653 -0.150 0.8808
## perc_Origin_FO_Citizen -0.005474 0.048265 -0.113 0.9098
## perc_children_in_school -0.023417 0.013027 -1.798 0.0735 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03138 on 240 degrees of freedom
## (16 observations deleted due to missingness)
## Multiple R-squared: 0.04463, Adjusted R-squared: 0.0008428
## F-statistic: 1.019 on 11 and 240 DF, p-value: 0.4298
# New York - public insurance
NewYork |> func_children_public_ins()
##
## Call:
## lm(formula = perc_children_with_ins_public ~ perc_Pop_Black +
## perc_Pop_Hispanic_Latino + perc_Pop_Asian + perc_Pop_More_Races +
## perc_Pop_Other_Races + perc_Poverty + perc_unemployed_16_over +
## log(Median_Household_Income_2019) + perc_Origin_FO_NCitizen +
## perc_Origin_FO_Citizen + perc_children_in_school, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38528 -0.06018 -0.00953 0.06292 0.51727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.05137 0.34790 5.896 1.25e-08 ***
## perc_Pop_Black 0.30594 0.06845 4.469 1.21e-05 ***
## perc_Pop_Hispanic_Latino 0.47821 0.09786 4.887 1.87e-06 ***
## perc_Pop_Asian 0.25045 0.12198 2.053 0.0411 *
## perc_Pop_More_Races 0.20020 0.47858 0.418 0.6761
## perc_Pop_Other_Races -0.03926 0.85472 -0.046 0.9634
## perc_Poverty 0.25936 0.13411 1.934 0.0543 .
## perc_unemployed_16_over -0.15790 0.25005 -0.631 0.5283
## log(Median_Household_Income_2019) -0.17451 0.02709 -6.443 6.35e-10 ***
## perc_Origin_FO_NCitizen -0.11829 0.14748 -0.802 0.4233
## perc_Origin_FO_Citizen 0.15602 0.17510 0.891 0.3738
## perc_children_in_school 0.08653 0.04726 1.831 0.0684 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1138 on 240 degrees of freedom
## (16 observations deleted due to missingness)
## Multiple R-squared: 0.8595, Adjusted R-squared: 0.853
## F-statistic: 133.4 on 11 and 240 DF, p-value: < 2.2e-16
In New York County:
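% of children with health insurance is not significantly correlated with any independent variable at the 0.05 level (children in school is marginal, p = 0.074), and the overall F-test is not significant (p = 0.43).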
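% of children with public health insurance is positively correlated with race (Black, Hispanic and Latino, Asian) and negatively correlated with median household income (log). Poverty is positive but only marginally significant (p = 0.054).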
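Whether a term counts as significant depends on the cutoff, which matters for borderline cases such as poverty here. The sketch below maps a few p-values copied from the New York public-insurance table to the significance codes shown in the summary legend; `signif_code()` is a hypothetical helper.

```r
# p-values copied from the New York public-insurance summary.
p_values <- c(
  perc_Pop_Black           = 1.21e-05,
  perc_Pop_Hispanic_Latino = 1.87e-06,
  perc_Pop_Asian           = 0.0411,
  perc_Poverty             = 0.0543,
  perc_children_in_school  = 0.0684
)

# Hypothetical helper: apply the cutpoints from the "Signif. codes" legend.
signif_code <- function(p) {
  codes <- cut(
    p,
    breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
    labels = c("***", "**", "*", ".", " "),
    include.lowest = TRUE
  )
  setNames(as.character(codes), names(p))
}

codes <- signif_code(p_values)
print(codes)

# Terms that pass the conventional 0.05 threshold.
names(p_values)[p_values < 0.05]
```

With a 0.05 cutoff, poverty and children in school fall just outside the significant set, so they are better described as marginal.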
Almost all children have health coverage thanks to Medicaid, which covers about 50% of all births in New York State, and Child Health Plus (CHIP), which covers about 35% of New York's children under 19 and supports families regardless of income and immigration status. Coverage for their parents is much less certain.
When analyzing the linear relationship with each independent variable separately:
When analyzing with multivariate linear regression: