# Testing the Relationship Between Race and Health Insurance in the United States

This assignment tests whether race is associated with health insurance status in the United States.

First, we load the packages we may need.

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ readr   1.4.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

We then load the data and inspect its first and last rows, its structure, and its column names.

##   Sex Age Married Income HoursWk  Race USCitizen HealthInsurance Language
## 1   0  38       1     64      40 white         1               1        1
## 2   1  18       0      0       0 black         1               1        1
## 3   0  21       0      4      20 white         1               1        1
## 4   1  17       0      0       0 other         1               1        0
## 5   0  55       1     34      40 other         0               0        0
## 6   1  51       0     30      40 black         1               1        1

##      Sex Age Married Income HoursWk  Race USCitizen HealthInsurance Language
## 1995   1  28       0     70      40 white         1               1        1
## 1996   1  20       0     30      43 white         1               1        1
## 1997   0  28       0    100      45 white         1               1        1
## 1998   1  56       1    125      40 asian         0               1        0
## 1999   1  49       1     50      40 black         1               1        0
## 2000   0  64       1      0      40 black         1               1        0

## 'data.frame':    2000 obs. of  9 variables:
##  $ Sex            : int  0 1 0 1 0 1 1 0 1 0 ...
##  $ Age            : int  38 18 21 17 55 51 28 46 17 80 ...
##  $ Married        : int  1 0 0 0 1 0 0 0 0 0 ...
##  $ Income         : num  64 0 4 0 34 30 13.7 114 0 0 ...
##  $ HoursWk        : int  40 0 20 0 40 40 40 60 0 0 ...
##  $ Race           : chr  "white" "black" "white" "other" ...
##  $ USCitizen      : int  1 1 1 1 0 1 1 1 1 1 ...
##  $ HealthInsurance: int  1 1 1 1 0 1 0 1 1 1 ...
##  $ Language       : int  1 1 1 0 0 1 0 0 1 0 ...

##                 Length Class  Mode     
## Sex             2000   -none- numeric  
## Age             2000   -none- numeric  
## Married         2000   -none- numeric  
## Income          2000   -none- numeric  
## HoursWk         2000   -none- numeric  
## Race            2000   -none- character
## USCitizen       2000   -none- numeric  
## HealthInsurance 2000   -none- numeric  
## Language        2000   -none- numeric

## [1] "Sex"             "Age"             "Married"         "Income"         
## [5] "HoursWk"         "Race"            "USCitizen"       "HealthInsurance"
## [9] "Language"

##    
##     asian black other white
##   0    11    21    30   113
##   1   118   178   122  1407

## `summarise()` has grouped output by 'Race'. You can override using the `.groups` argument.

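| Race  | HealthInsurance | Count | Proportion (%) |
|-------|-----------------|-------|----------------|
| asian | 0               | 11    | 8.53           |
| asian | 1               | 118   | 91.47          |
| black | 0               | 21    | 10.55          |
| black | 1               | 178   | 89.45          |
| other | 0               | 30    | 19.74          |
| other | 1               | 122   | 80.26          |
| white | 0               | 113   | 7.43           |
| white | 1               | 1407  | 92.57          |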
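The within-race percentages can be reproduced directly from the cross-tabulated counts. The sketch below is an assumption about one way to do it: it rebuilds the 2 x 4 table by hand from the counts printed above (the name `ins_tab` is illustrative) and uses base R's `prop.table()` in place of the `group_by()`/`summarise()` pipeline that the `summarise()` message suggests was used originally.

```r
# Counts copied from the cross-tabulation above
# (rows: HealthInsurance 0/1, columns: Race).
ins_tab <- matrix(
  c(11, 118,     # asian
    21, 178,     # black
    30, 122,     # other
    113, 1407),  # white
  nrow = 2,
  dimnames = list(
    HealthInsurance = c("0", "1"),
    Race = c("asian", "black", "other", "white")
  )
)

# margin = 2 divides each cell by its column (race) total,
# giving the share of insured and uninsured within each race.
pct_within_race <- round(100 * prop.table(ins_tab, margin = 2), 2)
print(pct_within_race)
```

Because the percentages are computed within each race, every column sums to 100, which makes the groups comparable despite their very different sizes.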

## 
##  Pearson's Chi-squared test
## 
## data:  ins_data
## X-squared = 27.094, df = 3, p-value = 5.627e-06

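Since the p-value (5.627e-06) is well below 0.05, we reject the null hypothesis that race is independent of health insurance status among the respondents. The "other" group has the highest uninsured share (19.74%) and the white group the lowest (7.43%).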
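As a check on the test above, the following sketch reruns Pearson's chi-squared test on the counts from the cross-tabulation and also prints the expected counts under independence. The table is rebuilt by hand here, so the object name `ins_tab` and the construction are assumptions rather than the original code.

```r
# Observed counts from the cross-tabulation
# (rows: HealthInsurance 0/1, columns: Race).
ins_tab <- matrix(
  c(11, 118, 21, 178, 30, 122, 113, 1407),
  nrow = 2,
  dimnames = list(
    HealthInsurance = c("0", "1"),
    Race = c("asian", "black", "other", "white")
  )
)

# Pearson's chi-squared test of independence.
# Yates' continuity correction only applies to 2 x 2 tables,
# so it is not used for this 2 x 4 table.
chi_res <- chisq.test(ins_tab)

# Expected counts under independence: (row total * column total) / n.
print(round(chi_res$expected, 2))
print(chi_res)
```

The chi-squared approximation is usually considered adequate when all expected counts are at least 5, which is the case for this table.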

## [1] 2000   10

## Warning: glm.fit: algorithm did not converge

## 
## Call:
## glm(formula = HealthInsurance ~ ., family = binomial(), data = data)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -2.409e-06   2.409e-06   2.409e-06   2.409e-06   2.409e-06  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)
## (Intercept)          -2.657e+01  4.955e+04  -0.001    1.000
## Sex                  -1.290e-09  1.636e+04   0.000    1.000
## Age                   8.309e-11  4.594e+02   0.000    1.000
## Married              -3.609e-10  1.708e+04   0.000    1.000
## Income               -6.482e-12  1.954e+02   0.000    1.000
## HoursWk              -2.820e-11  4.783e+02   0.000    1.000
## Raceblack            -1.164e-09  4.390e+04   0.000    1.000
## Raceother            -1.179e-09  4.357e+04   0.000    1.000
## Racewhite            -4.487e-10  3.637e+04   0.000    1.000
## USCitizen             5.698e-11  3.589e+04   0.000    1.000
## Language             -1.009e-09  2.450e+04   0.000    1.000
## HealthInsurance_cat1  5.313e+01  2.959e+04   0.002    0.999
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.1869e+03  on 1999  degrees of freedom
## Residual deviance: 1.1603e-08  on 1988  degrees of freedom
## AIC: 24
## 
## Number of Fisher Scoring iterations: 25
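The convergence warning, the standard errors in the tens of thousands, and the residual deviance of essentially zero all point to perfect separation. The data now has 10 columns instead of 9, and the coefficient table includes `HealthInsurance_cat1`, which suggests that a categorical copy of the outcome was added to the data and then picked up as a predictor by the `~ .` formula. A predictor that duplicates the outcome predicts it perfectly, so the estimates above are not interpretable.

A minimal sketch of a corrected fit is shown below. It is an assumption-laden illustration, not the original analysis: the individual-level data is rebuilt from the cross-tabulated counts, so only `Race` is available as a predictor, and the names `ins_df` and `fit` are hypothetical.

```r
# Rebuild one row per respondent from the cross-tabulated counts.
race_levels <- c("asian", "black", "other", "white")
uninsured   <- c(11, 21, 30, 113)
insured     <- c(118, 178, 122, 1407)

ins_df <- data.frame(
  Race = factor(
    rep(rep(race_levels, times = 2), times = c(uninsured, insured)),
    levels = race_levels
  ),
  HealthInsurance = rep(c(0, 1), times = c(sum(uninsured), sum(insured)))
)

# The outcome must not appear among the predictors in any form,
# so the formula names the predictor explicitly instead of using `~ .`.
fit <- glm(HealthInsurance ~ Race, family = binomial(), data = ins_df)

# Log-odds coefficients, with "asian" as the reference level.
print(summary(fit)$coefficients)

# Exponentiated coefficients: the intercept is the odds of being insured
# in the reference group; the others are odds ratios relative to it.
print(exp(coef(fit)))
```

With the full data set, the same idea would apply by dropping the derived column before fitting, for example with a formula such as `HealthInsurance ~ . - HealthInsurance_cat`, assuming that is the column's name.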