The assignment involves testing the relationship between Race and Health Insurance in the United States.
First, let us load all the packages we may need.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ stringr 1.4.0
## ✓ tidyr 1.1.3 ✓ forcats 0.5.1
## ✓ readr 1.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
We then load the data and inspect its first rows, last rows, structure, and column names.
## Sex Age Married Income HoursWk Race USCitizen HealthInsurance Language
## 1 0 38 1 64 40 white 1 1 1
## 2 1 18 0 0 0 black 1 1 1
## 3 0 21 0 4 20 white 1 1 1
## 4 1 17 0 0 0 other 1 1 0
## 5 0 55 1 34 40 other 0 0 0
## 6 1 51 0 30 40 black 1 1 1
## Sex Age Married Income HoursWk Race USCitizen HealthInsurance Language
## 1995 1 28 0 70 40 white 1 1 1
## 1996 1 20 0 30 43 white 1 1 1
## 1997 0 28 0 100 45 white 1 1 1
## 1998 1 56 1 125 40 asian 0 1 0
## 1999 1 49 1 50 40 black 1 1 0
## 2000 0 64 1 0 40 black 1 1 0
## 'data.frame': 2000 obs. of 9 variables:
## $ Sex : int 0 1 0 1 0 1 1 0 1 0 ...
## $ Age : int 38 18 21 17 55 51 28 46 17 80 ...
## $ Married : int 1 0 0 0 1 0 0 0 0 0 ...
## $ Income : num 64 0 4 0 34 30 13.7 114 0 0 ...
## $ HoursWk : int 40 0 20 0 40 40 40 60 0 0 ...
## $ Race : chr "white" "black" "white" "other" ...
## $ USCitizen : int 1 1 1 1 0 1 1 1 1 1 ...
## $ HealthInsurance: int 1 1 1 1 0 1 0 1 1 1 ...
## $ Language : int 1 1 1 0 0 1 0 0 1 0 ...
## Length Class Mode
## Sex 2000 -none- numeric
## Age 2000 -none- numeric
## Married 2000 -none- numeric
## Income 2000 -none- numeric
## HoursWk 2000 -none- numeric
## Race 2000 -none- character
## USCitizen 2000 -none- numeric
## HealthInsurance 2000 -none- numeric
## Language 2000 -none- numeric
## [1] "Sex" "Age" "Married" "Income"
## [5] "HoursWk" "Race" "USCitizen" "HealthInsurance"
## [9] "Language"
##
## asian black other white
## 0 11 21 30 113
## 1 118 178 122 1407
## `summarise()` has grouped output by 'Race'. You can override using the `.groups` argument.
| Race | HealthInsurance | Count | Percent within Race (%) |
|---|---|---|---|
| asian | 0 | 11 | 8.53 |
| asian | 1 | 118 | 91.47 |
| black | 0 | 21 | 10.55 |
| black | 1 | 178 | 89.45 |
| other | 0 | 30 | 19.74 |
| other | 1 | 122 | 80.26 |
| white | 0 | 113 | 7.43 |
| white | 1 | 1407 | 92.57 |
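
The last column of the table gives, for each race, the share of respondents in each insurance category, in percent. A minimal sketch of how those shares can be recomputed from the counts alone is shown below. The object name `ins_counts` is an assumption made for illustration; the counts are copied from the contingency table above.

```r
# Counts copied from the contingency table above
# (rows: HealthInsurance 0/1, columns: Race).
ins_counts <- matrix(
  c(11, 21, 30, 113,
    118, 178, 122, 1407),
  nrow = 2, byrow = TRUE,
  dimnames = list(HealthInsurance = c("0", "1"),
                  Race = c("asian", "black", "other", "white"))
)

# margin = 2 divides each cell by its column (Race) total,
# so every column of the result sums to 100.
pct <- round(100 * prop.table(ins_counts, margin = 2), 2)
print(pct)
```

Working by column keeps the comparison fair across groups of very different sizes (129 asian respondents against 1520 white respondents).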
##
## Pearson's Chi-squared test
##
## data: ins_data
## X-squared = 27.094, df = 3, p-value = 5.627e-06
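Since the p-value (5.627e-06) is well below 0.05, we reject the null hypothesis that race is not associated with health insurance status. The table above shows where the groups differ most: the share with HealthInsurance = 0 is 19.74% in the "other" group, against 7.43% among white respondents.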
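
The test result above can be checked directly from the eight counts in the contingency table. The sketch below rebuilds the table by hand and passes it to `chisq.test()`; the object names `ins_counts` and `res` are assumptions for illustration, not necessarily the names used in the original analysis.

```r
# Counts copied from the contingency table above
# (rows: HealthInsurance 0/1, columns: Race).
ins_counts <- matrix(
  c(11, 21, 30, 113,
    118, 178, 122, 1407),
  nrow = 2, byrow = TRUE,
  dimnames = list(HealthInsurance = c("0", "1"),
                  Race = c("asian", "black", "other", "white"))
)

# Pearson's chi-squared test of independence.
# No continuity correction is applied because the table is larger than 2 x 2.
res <- chisq.test(ins_counts)

res$statistic  # X-squared
res$parameter  # degrees of freedom: (2 - 1) * (4 - 1) = 3
res$p.value

# Expected counts under independence; all should be comfortably above 5
# for the chi-squared approximation to be reasonable.
res$expected
```

Comparing `res$expected` with the observed counts shows which cells drive the statistic; here the largest gap is in the "other" column.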
## [1] 2000 10
## Warning: glm.fit: algorithm did not converge
##
## Call:
## glm(formula = HealthInsurance ~ ., family = binomial(), data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.409e-06 2.409e-06 2.409e-06 2.409e-06 2.409e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.657e+01 4.955e+04 -0.001 1.000
## Sex -1.290e-09 1.636e+04 0.000 1.000
## Age 8.309e-11 4.594e+02 0.000 1.000
## Married -3.609e-10 1.708e+04 0.000 1.000
## Income -6.482e-12 1.954e+02 0.000 1.000
## HoursWk -2.820e-11 4.783e+02 0.000 1.000
## Raceblack -1.164e-09 4.390e+04 0.000 1.000
## Raceother -1.179e-09 4.357e+04 0.000 1.000
## Racewhite -4.487e-10 3.637e+04 0.000 1.000
## USCitizen 5.698e-11 3.589e+04 0.000 1.000
## Language -1.009e-09 2.450e+04 0.000 1.000
## HealthInsurance_cat1 5.313e+01 2.959e+04 0.002 0.999
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.1869e+03 on 1999 degrees of freedom
## Residual deviance: 1.1603e-08 on 1988 degrees of freedom
## AIC: 24
##
## Number of Fisher Scoring iterations: 25
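
The convergence warning, the residual deviance of essentially zero, and the standard errors in the tens of thousands are what one would expect when a predictor reproduces the outcome. The coefficient table includes `HealthInsurance_cat1`, which appears to be a factor copy of `HealthInsurance` that the `.` in the formula picked up as a predictor. A minimal sketch of one way to exclude such a column follows. It runs on a small synthetic data frame (`toy`, with made-up values), not on the survey data, and the column set is reduced for brevity.

```r
# Synthetic stand-in for the survey data: made-up values, for illustration only.
set.seed(1)
n <- 1000
toy <- data.frame(
  Age = sample(18:80, n, replace = TRUE),
  Race = sample(c("asian", "black", "other", "white"), n, replace = TRUE),
  HealthInsurance = rbinom(n, 1, 0.8)
)

# A factor copy of the outcome, mirroring the HealthInsurance_cat column
# that appears in the model output above (assumed to be such a copy).
toy$HealthInsurance_cat <- factor(toy$HealthInsurance)

# "- HealthInsurance_cat" removes the copy from the predictors that "." expands to.
fit <- glm(HealthInsurance ~ . - HealthInsurance_cat,
           family = binomial(), data = toy)

fit$converged     # convergence flag of the fitting algorithm
names(coef(fit))  # the outcome copy is no longer among the terms
```

On the real data the same idea would be to refit with the copied column removed from the formula and then read the `Race` coefficients from that model.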