This assignment tests whether race is associated with health insurance status in the United States.

First, we load the packages we need:
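
The chunk that produced the messages below is hidden in the rendered report. Judging from those messages, it attached dplyr and then the tidyverse. A minimal sketch of that chunk, assuming these are the only two packages used, is:

```r
# Attach dplyr (data manipulation) and the tidyverse meta-package.
# The tidyverse already attaches dplyr, so the first call is redundant but harmless.
library(dplyr)
library(tidyverse)
```

Wrapping each call in `suppressPackageStartupMessages()` would keep the attach and conflict messages out of the knitted output.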

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ readr   1.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

We then load the data and inspect its first and last rows, its structure, a summary, and its column names:
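
The file name and path are not given in the report, so the sketch below wraps the loading step in a hypothetical helper, `load_and_inspect()`, that takes the path as an argument. It assumes the data are stored as a CSV file and prints the first rows, last rows, structure, and column names, as in the output below.

```r
# Hypothetical helper: read the survey CSV and print quick inspection views.
# The path is an argument because the original file location is not stated.
load_and_inspect <- function(path) {
  survey <- utils::read.csv(path, stringsAsFactors = FALSE)
  print(head(survey))   # first six rows
  print(tail(survey))   # last six rows
  str(survey)           # column types and first values
  print(names(survey))  # column names
  invisible(survey)     # return the data without printing it again
}
```

Keeping `stringsAsFactors = FALSE` leaves Race as a character column, matching the `chr` type reported by `str()`.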

##   Sex Age Married Income HoursWk  Race USCitizen HealthInsurance Language
## 1   0  38       1     64      40 white         1               1        1
## 2   1  18       0      0       0 black         1               1        1
## 3   0  21       0      4      20 white         1               1        1
## 4   1  17       0      0       0 other         1               1        0
## 5   0  55       1     34      40 other         0               0        0
## 6   1  51       0     30      40 black         1               1        1
##      Sex Age Married Income HoursWk  Race USCitizen HealthInsurance Language
## 1995   1  28       0     70      40 white         1               1        1
## 1996   1  20       0     30      43 white         1               1        1
## 1997   0  28       0    100      45 white         1               1        1
## 1998   1  56       1    125      40 asian         0               1        0
## 1999   1  49       1     50      40 black         1               1        0
## 2000   0  64       1      0      40 black         1               1        0
## 'data.frame':    2000 obs. of  9 variables:
##  $ Sex            : int  0 1 0 1 0 1 1 0 1 0 ...
##  $ Age            : int  38 18 21 17 55 51 28 46 17 80 ...
##  $ Married        : int  1 0 0 0 1 0 0 0 0 0 ...
##  $ Income         : num  64 0 4 0 34 30 13.7 114 0 0 ...
##  $ HoursWk        : int  40 0 20 0 40 40 40 60 0 0 ...
##  $ Race           : chr  "white" "black" "white" "other" ...
##  $ USCitizen      : int  1 1 1 1 0 1 1 1 1 1 ...
##  $ HealthInsurance: int  1 1 1 1 0 1 0 1 1 1 ...
##  $ Language       : int  1 1 1 0 0 1 0 0 1 0 ...
##                 Length Class  Mode     
## Sex             2000   -none- numeric  
## Age             2000   -none- numeric  
## Married         2000   -none- numeric  
## Income          2000   -none- numeric  
## HoursWk         2000   -none- numeric  
## Race            2000   -none- character
## USCitizen       2000   -none- numeric  
## HealthInsurance 2000   -none- numeric  
## Language        2000   -none- numeric
## [1] "Sex"             "Age"             "Married"         "Income"         
## [5] "HoursWk"         "Race"            "USCitizen"       "HealthInsurance"
## [9] "Language"

##    
##     asian black other white
##   0    11    21    30   113
##   1   118   178   122  1407
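
The cross-tabulation above has HealthInsurance (0 or 1) in the rows and Race in the columns. Because the raw file is not available here, the sketch below rebuilds one row per respondent for just these two variables from the counts in that table and tabulates them with base R's `table()`. The names `counts` and `ins` are illustrative; `ins_data` follows the name that appears in the chi-square output.

```r
# Rebuild one row per respondent from the Race x HealthInsurance counts.
counts <- data.frame(
  Race = rep(c("asian", "black", "other", "white"), times = 2),
  HealthInsurance = rep(c(0, 1), each = 4),
  n = c(11, 21, 30, 113, 118, 178, 122, 1407)
)
ins <- counts[rep(seq_len(nrow(counts)), counts$n), c("Race", "HealthInsurance")]

# Two-way table: rows are HealthInsurance (0, 1), columns are Race.
ins_data <- table(ins$HealthInsurance, ins$Race)
print(ins_data)

# Column totals give the number of respondents in each race group.
print(colSums(ins_data))
```
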
## `summarise()` has grouped output by 'Race'. You can override using the `.groups` argument.
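
| Race  | HealthInsurance | Count | Proportions (%) |
|-------|-----------------|-------|-----------------|
| asian | 0               | 11    | 8.53            |
| asian | 1               | 118   | 91.47           |
| black | 0               | 21    | 10.55           |
| black | 1               | 178   | 89.45           |
| other | 0               | 30    | 19.74           |
| other | 1               | 122   | 80.26           |
| white | 0               | 113   | 7.43            |
| white | 1               | 1407  | 92.57           |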
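
The `summarise()` message suggests this table was built by grouping on Race and HealthInsurance; each Proportions value is a cell count expressed as a percentage of its race group (for example, 30 / 152 = 19.74% for other, HealthInsurance = 0). A dplyr sketch that reproduces the table from data rebuilt out of the counts, and passes `.groups` explicitly so the message is not printed, might look like this. Apart from Race, HealthInsurance, Count, and Proportions, the object names are assumptions.

```r
library(dplyr)

# Rebuild one row per respondent from the Race x HealthInsurance counts.
counts <- data.frame(
  Race = rep(c("asian", "black", "other", "white"), times = 2),
  HealthInsurance = rep(c(0, 1), each = 4),
  n = c(11, 21, 30, 113, 118, 178, 122, 1407)
)
ins <- counts[rep(seq_len(nrow(counts)), counts$n), c("Race", "HealthInsurance")]

# Count each cell, then express it as a percentage of its race group.
# .groups = "drop_last" keeps the grouping by Race (the default here)
# but suppresses the summarise() message.
prop_table <- ins %>%
  group_by(Race, HealthInsurance) %>%
  summarise(Count = n(), .groups = "drop_last") %>%
  mutate(Proportions = round(100 * Count / sum(Count), 2)) %>%
  ungroup()
print(prop_table)
```

Because `mutate()` runs while the data are still grouped by Race, `sum(Count)` is the race-group total, which is why each pair of Proportions values adds up to 100.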
## 
##  Pearson's Chi-squared test
## 
## data:  ins_data
## X-squared = 27.094, df = 3, p-value = 5.627e-06
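
The test can be reproduced from the counts alone. The sketch below enters the 2 × 4 table as a matrix and runs `chisq.test()`, which applies no continuity correction to tables larger than 2 × 2; it should give X-squared = 27.094 on df = 3. It also prints the expected counts, because the chi-square approximation is usually considered reliable when every expected count is at least 5 (the smallest here, for asian with HealthInsurance = 0, is 129 × 175 / 2000 ≈ 11.29).

```r
# Observed counts: rows are HealthInsurance (0, 1), columns are Race.
ins_data <- matrix(
  c(11, 21, 30, 113,
    118, 178, 122, 1407),
  nrow = 2, byrow = TRUE,
  dimnames = list(HealthInsurance = c("0", "1"),
                  Race = c("asian", "black", "other", "white"))
)

res <- chisq.test(ins_data)
print(res)

# Check the approximation: every expected count should be at least 5.
print(round(res$expected, 2))
all(res$expected >= 5)
```
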

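Since the p-value (5.627e-06) is well below 0.05, we reject the null hypothesis that race and health insurance status are independent, and conclude that the two are associated. The proportions table shows where the difference lies: 19.74% of respondents in the "other" group have HealthInsurance = 0, compared with 7.43% to 10.55% in the asian, black, and white groups.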
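
The decision rule, and a look at which cells drive the result, can be sketched from the same test object. Pearson residuals, (observed - expected) / sqrt(expected), are the signed square roots of each cell's contribution to X-squared; in these counts the largest is the other / HealthInsurance = 0 cell (30 observed against 13.3 expected). The 0.05 threshold is the one used in the text; `alpha` and `reject_null` are illustrative names.

```r
# Observed counts: rows are HealthInsurance (0, 1), columns are Race.
ins_data <- matrix(
  c(11, 21, 30, 113,
    118, 178, 122, 1407),
  nrow = 2, byrow = TRUE,
  dimnames = list(HealthInsurance = c("0", "1"),
                  Race = c("asian", "black", "other", "white"))
)
res <- chisq.test(ins_data)

# Reject the null of independence when the p-value falls below alpha.
alpha <- 0.05
reject_null <- res$p.value < alpha
reject_null

# Pearson residuals: large absolute values mark the cells that
# contribute most to the X-squared statistic.
round(res$residuals, 2)
```
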

## [1] 2000   10
## Warning: glm.fit: algorithm did not converge
## 
## Call:
## glm(formula = HealthInsurance ~ ., family = binomial(), data = data)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -2.409e-06   2.409e-06   2.409e-06   2.409e-06   2.409e-06  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)
## (Intercept)          -2.657e+01  4.955e+04  -0.001    1.000
## Sex                  -1.290e-09  1.636e+04   0.000    1.000
## Age                   8.309e-11  4.594e+02   0.000    1.000
## Married              -3.609e-10  1.708e+04   0.000    1.000
## Income               -6.482e-12  1.954e+02   0.000    1.000
## HoursWk              -2.820e-11  4.783e+02   0.000    1.000
## Raceblack            -1.164e-09  4.390e+04   0.000    1.000
## Raceother            -1.179e-09  4.357e+04   0.000    1.000
## Racewhite            -4.487e-10  3.637e+04   0.000    1.000
## USCitizen             5.698e-11  3.589e+04   0.000    1.000
## Language             -1.009e-09  2.450e+04   0.000    1.000
## HealthInsurance_cat1  5.313e+01  2.959e+04   0.002    0.999
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.1869e+03  on 1999  degrees of freedom
## Residual deviance: 1.1603e-08  on 1988  degrees of freedom
## AIC: 24
## 
## Number of Fisher Scoring iterations: 25
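
This fit should not be interpreted. The `[1] 2000   10` line reports 10 columns, one more than the nine names listed earlier, and the formula `HealthInsurance ~ .` picked up that extra column as a predictor; it appears as `HealthInsurance_cat1` and looks like a recoded copy of the response. A predictor that duplicates the outcome separates the two classes perfectly, which is why `glm.fit` did not converge, the residual deviance is essentially 0, the standard errors are enormous, and every p-value is at least 0.999. The sketch below assumes the extra column is a factor named `HealthInsurance_cat` (inferred from the coefficient label) and drops it before fitting. Because only the race and insurance counts are available here, it fits a race-only model on data rebuilt from those counts; with the full data set, the same `subset()` step would keep Sex, Age, and the other covariates in the model.

```r
# Rebuild one row per respondent from the Race x HealthInsurance counts.
counts <- data.frame(
  Race = rep(c("asian", "black", "other", "white"), times = 2),
  HealthInsurance = rep(c(0, 1), each = 4),
  n = c(11, 21, 30, 113, 118, 178, 122, 1407)
)
ins <- counts[rep(seq_len(nrow(counts)), counts$n), c("Race", "HealthInsurance")]

# Reproduce the leak: a factor copy of the response stored as an extra column.
# The name HealthInsurance_cat is an assumption based on the coefficient label.
ins$HealthInsurance_cat <- factor(ins$HealthInsurance)

# Drop the derived column so `HealthInsurance ~ .` sees only genuine predictors.
model_data <- subset(ins, select = -HealthInsurance_cat)

fit <- glm(HealthInsurance ~ ., family = binomial(), data = model_data)
summary(fit)

# Race coefficients are log odds ratios against the reference level "asian";
# exponentiating them gives odds ratios.
exp(coef(fit))
```

With Race as the only predictor the model is saturated, so the intercept equals the observed log odds for the asian group, log(118 / 11), and each Race coefficient is the log of that group's odds divided by the asian odds.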