Health Disparities

* Before delving into infant mortality, I first need to explain health disparities. Health disparities are preventable differences in the burden of disease, and they are most prevalent across racial and ethnic groups. These disparities, and in turn infant mortality, are rooted in years of racism and inequality.

* These inequalities breed inequity among races. For instance, in neighborhoods that are predominantly Black, you see:

  • higher rates of unemployment,
  • higher incarceration rates,
  • inadequate healthcare,
  • a lack of maternal resources and support, and
  • a lack of educational services.

* All of these factors place minorities, especially African Americans, at a disadvantage and increase their risk of many adverse events, such as preterm labor and infant mortality.

Infant Mortality

* Because health disparities are a very broad topic, I chose to focus on infant mortality.

* Infant mortality is defined as the number of babies who died before reaching one year of age.

* To compare this number across different groups, we must first convert infant mortality to a rate:

\(\mathrm{IMR} = \dfrac{\text{number of infant deaths}}{\text{number of live births}} \times 1000\)
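As a quick check of the formula, the 2016 counts used in the chi-squared table later in this analysis (62 deaths out of 21,566 live births for Asian/Pacific Islander mothers, 105 out of 40,633 for White Non-H mothers) can be converted to rates. This is a minimal sketch; `imr_rate` is a hypothetical helper name, not something from the dataset.

```r
# Infant mortality rate: deaths per 1,000 live births.
# `imr_rate` is a hypothetical helper name used only for illustration.
imr_rate <- function(deaths, births) {
  deaths / births * 1000
}

# 2016 counts as they appear in the chi-squared table in this analysis
asian_2016 <- imr_rate(62, 21566)
white_2016 <- imr_rate(105, 40633)

round(c(Asian = asian_2016, White = white_2016), 1)
# Asian White
#   2.9   2.6
```

Both values agree with the 2016 rates listed in the dataset.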

The Data

* The dataset that I used is from the NYC government.

* It lists mortality rates for multiple races/ethnicities, along with the accompanying numbers of births and deaths, from 2007 to 2016.

head(infantmort)
## # A tibble: 6 x 9
##    year maternalraceore… infantmortality… neonatalmortali… postneonatalmor…
##   <dbl> <chr>                       <dbl>            <dbl>            <dbl>
## 1  2016 Asian/Pacific I…              2.9              2                0.9
## 2  2015 Asian/Pacific I…              2.6              1.6              1  
## 3  2014 Asian/Pacific I…              2.6              1.8              0.8
## 4  2013 Asian/Pacific I…              3.1              2.5              0.6
## 5  2012 Asian/Pacific I…              3.3              2.1              1.2
## 6  2011 Asian/Pacific I…              2.9              1.8              1.2
## # … with 4 more variables: infantdeaths <dbl>, neonatalinfantdeaths <dbl>,
## #   postneonatalinfantdeaths <dbl>, numberoflivebirths <dbl>

The first thing I wanted to do was see the variation between the different groups.

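library(ggplot2)  # ggplot(), geom_boxplot(), theme()
library(dplyr)    # %>%
# Boxplot of infant mortality rate for each maternal race/ethnicity
p1 <- infantmort %>%
  ggplot(aes(x = maternalraceorethnicity, y = infantmortalityrate,
             fill = maternalraceorethnicity)) +
  geom_boxplot() +
  ggtitle("Infant Mortality Rate by Ethnicity") +
  xlab("Maternal Race or Ethnicity") +
  ylab("Infant Mortality Rate") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
p1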

* The most obvious observation I made was the large difference in infant mortality rates between African Americans and Whites.

* Another observation I made was between Asians and Whites. I was not expecting their IMRs to be so comparable.

* To see whether the difference between Asians and Whites was statistically significant, I ran a chi-squared test comparing the number of infant deaths and births from 2016.

Asian <- c(62,21566)
White <- c(105,40633)
imr <- data.frame(Asian,White)
rownames(imr) <- c('Number of Infant Deaths', 'Number of Live Births')
imr
##                         Asian White
## Number of Infant Deaths    62   105
## Number of Live Births   21566 40633
chisq.test(imr)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  imr
## X-squared = 0.3408, df = 1, p-value = 0.5594

* With a p-value of 0.559, I concluded that the difference between the Asian and White infant mortality rates in 2016 was not statistically significant.
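One caveat about the table above: its second row is live births, which already includes the infants who died, so the two rows are not mutually exclusive. A chi-squared test of independence is usually set up with deaths against survivors. The following is a sketch of that alternative, not the analysis reported above; the names `deaths`, `births`, and `tab` are illustrative. Because deaths are a tiny fraction of births, the result should be close to the one above.

```r
# 2016 counts taken from the table above
deaths <- c(Asian = 62, White = 105)
births <- c(Asian = 21566, White = 40633)

# Rows are mutually exclusive outcomes: died vs. survived the first year
tab <- rbind(Died = deaths, Survived = births - deaths)
tab

# Yates' continuity correction is applied by default for 2x2 tables
res <- chisq.test(tab)
res$p.value
```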

* Next, I wanted to evaluate the differences between African Americans and Whites.

* Before I started any analyses, I first had to check for normality.

imr_b <- infantmort[c(11:20),c(1,3)]
imr_w <- infantmort[c(41:50),c(1,3)]
par(mfrow=c(1,2))
hist(imr_b$infantmortalityrate, cex.main=1)
hist(imr_w$infantmortalityrate, cex.main=1)
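Histograms of ten values are hard to judge by eye. As a supplement, a Shapiro-Wilk test can be run on the same values. The sketch below uses the ten Black Non-H rates as they are typed out later in this analysis; the variable name `black_imr` is illustrative. With n = 10 the test has little power, so a large p-value would not demonstrate normality.

```r
# Black Non-H infant mortality rates (per 1,000 live births), one per year
black_imr <- c(9.8, 10.2, 9.5, 8.6, 8.1, 8.5, 8.3, 7.5, 8, 8)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw <- shapiro.test(black_imr)
sw$statistic  # W statistic (values near 1 are consistent with normality)
sw$p.value
```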

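* Because the samples are small (n = 10 per group), the histograms are too coarse to show that the rates are normally distributed.

* Since normality could not be established, I performed a Wilcoxon rank-sum test, which does not assume normality, to compare the two distributions.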

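bimr <- c(9.8,10.2,9.5,8.6,8.1,8.5,8.3,7.5,8,8)
# Ninth value corrected from `2,7` (parsed as two values, 2 and 7) to 2.7, presumably the intended rate.
# The output below is from the run with the original 11 values, hence W = 110 = 10 * 11;
# with ten values per group W is 100, since every Black rate exceeds every White rate.
wimr <- c(3.9,3.3,3.4,2.8,3.1,2.7,3,2.6,2.7,2.6)
wilcox.test(bimr,wimr,conf.int = TRUE,conf.level = .95)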
## Warning in wilcox.test.default(bimr, wimr, conf.int = TRUE, conf.level =
## 0.95): cannot compute exact p-value with ties
## Warning in wilcox.test.default(bimr, wimr, conf.int = TRUE, conf.level =
## 0.95): cannot compute exact confidence intervals with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  bimr and wimr
## W = 110, p-value = 0.0001229
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  4.700032 6.399998
## sample estimates:
## difference in location 
##               5.400028

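* This tells me that the distribution of infant mortality rates for African Americans is shifted to the right of that for Whites (p = 0.0001). The estimated shift is about 5.4 deaths per 1,000 live births, with a 95% confidence interval of 4.70 to 6.40. This interval describes the uncertainty in the shift; it is not a range that contains 95% of the data.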
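To make the statistic concrete: W counts the (Black, White) pairs in which the Black rate is larger, with ties counting one half. The "difference in location" is based on the Hodges-Lehmann estimate, the median of all pairwise differences, which `wilcox.test` approximates numerically when ties are present. The sketch below computes both by hand. It assumes ten White values, with the ninth read as 2.7, so W here is out of 10 x 10 = 100 pairs; the variable names are illustrative.

```r
black_imr <- c(9.8, 10.2, 9.5, 8.6, 8.1, 8.5, 8.3, 7.5, 8, 8)
# Assumption: the ninth value is 2.7, giving ten values (one per year)
white_imr <- c(3.9, 3.3, 3.4, 2.8, 3.1, 2.7, 3, 2.6, 2.7, 2.6)

# All 10 x 10 pairwise differences (Black minus White)
diffs <- outer(black_imr, white_imr, "-")

# W = number of pairs with a positive difference, plus half of any tied pairs
w_by_hand <- sum(diffs > 0) + 0.5 * sum(diffs == 0)
w_by_hand  # 100: every Black rate exceeds every White rate

# Hodges-Lehmann estimate of the location shift
hl_estimate <- median(diffs)
hl_estimate
```

Because the two groups do not overlap at all, W takes its maximum possible value, which is why the p-value is so small despite the small samples.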

* After this I wanted to see how the two groups compared on a scatterplot.

# Black Non-H rates set up the axes; White Non-H rates are added to the same plot
plot(imr_b, ylim = c(0, 11), xlim = c(2007, 2016), col = 'red', pch = 8,
     main = "Infant Mortality Rates in NYC from 2007-2016")
points(imr_w$year, imr_w$infantmortalityrate, col = 'blue', pch = 8)
legend(x = 2014, y = 11, legend = c('Black Non-H', 'White Non-H'), col = c('red', 'blue'), pch = c(8, 8))
# Dashed reference lines: NYC average (purple) and US average (black)
abline(h = 4.1, lty = 2, col = 'purple')
text(x = 2011, y = 4.5, labels = c('Average NYC Infant Mortality Rate (4.1)'), col = 'purple')
abline(h = 5.7, lty = 2)
text(x = 2011, y = 6.1, labels = c('Average US Infant Mortality Rate (5.7)'))

* IMRs for African Americans are higher than those for Whites, the NYC average infant mortality rate, and the United States average infant mortality rate.

* Since my dataset had a lot of variables, I wanted to know if certain variables such as year, maternal race, and number of births were strong predictors of infant mortality rate.

imr_fit <- lm(infantmort$infantmortalityrate~as.factor(infantmort$year)+as.factor(infantmort$maternalraceorethnicity)+infantmort$numberoflivebirths)
summary(imr_fit)
## 
## Call:
## lm(formula = infantmort$infantmortalityrate ~ as.factor(infantmort$year) + 
##     as.factor(infantmort$maternalraceorethnicity) + infantmort$numberoflivebirths)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.67133 -0.25852 -0.08323  0.33196  1.73812 
## 
## Coefficients:
##                                                               Estimate
## (Intercept)                                                  6.707e-01
## as.factor(infantmort$year)2008                               2.390e-01
## as.factor(infantmort$year)2009                               6.157e-03
## as.factor(infantmort$year)2010                              -1.995e-01
## as.factor(infantmort$year)2011                              -2.495e-01
## as.factor(infantmort$year)2012                              -1.157e-01
## as.factor(infantmort$year)2013                              -5.260e-01
## as.factor(infantmort$year)2014                              -3.854e-01
## as.factor(infantmort$year)2015                              -5.067e-01
## as.factor(infantmort$year)2016                              -1.078e+00
## as.factor(infantmort$maternalraceorethnicity)Black Non-H     4.882e+00
## as.factor(infantmort$maternalraceorethnicity)Other Hispanic  1.842e-01
## as.factor(infantmort$maternalraceorethnicity)Puerto Rican    4.531e+00
## as.factor(infantmort$maternalraceorethnicity)White Non-H    -2.581e+00
## infantmort$numberoflivebirths                                1.324e-04
##                                                             Std. Error
## (Intercept)                                                  1.435e+00
## as.factor(infantmort$year)2008                               3.859e-01
## as.factor(infantmort$year)2009                               3.880e-01
## as.factor(infantmort$year)2010                               3.934e-01
## as.factor(infantmort$year)2011                               3.946e-01
## as.factor(infantmort$year)2012                               3.938e-01
## as.factor(infantmort$year)2013                               4.036e-01
## as.factor(infantmort$year)2014                               3.977e-01
## as.factor(infantmort$year)2015                               4.004e-01
## as.factor(infantmort$year)2016                               4.048e-01
## as.factor(infantmort$maternalraceorethnicity)Black Non-H     4.764e-01
## as.factor(infantmort$maternalraceorethnicity)Other Hispanic  6.653e-01
## as.factor(infantmort$maternalraceorethnicity)Puerto Rican    7.704e-01
## as.factor(infantmort$maternalraceorethnicity)White Non-H     1.339e+00
## infantmort$numberoflivebirths                                6.674e-05
##                                                             t value
## (Intercept)                                                   0.467
## as.factor(infantmort$year)2008                                0.619
## as.factor(infantmort$year)2009                                0.016
## as.factor(infantmort$year)2010                               -0.507
## as.factor(infantmort$year)2011                               -0.632
## as.factor(infantmort$year)2012                               -0.294
## as.factor(infantmort$year)2013                               -1.303
## as.factor(infantmort$year)2014                               -0.969
## as.factor(infantmort$year)2015                               -1.266
## as.factor(infantmort$year)2016                               -2.664
## as.factor(infantmort$maternalraceorethnicity)Black Non-H     10.248
## as.factor(infantmort$maternalraceorethnicity)Other Hispanic   0.277
## as.factor(infantmort$maternalraceorethnicity)Puerto Rican     5.881
## as.factor(infantmort$maternalraceorethnicity)White Non-H     -1.928
## infantmort$numberoflivebirths                                 1.984
##                                                             Pr(>|t|)    
## (Intercept)                                                   0.6431    
## as.factor(infantmort$year)2008                                0.5398    
## as.factor(infantmort$year)2009                                0.9874    
## as.factor(infantmort$year)2010                                0.6152    
## as.factor(infantmort$year)2011                                0.5313    
## as.factor(infantmort$year)2012                                0.7707    
## as.factor(infantmort$year)2013                                0.2010    
## as.factor(infantmort$year)2014                                0.3391    
## as.factor(infantmort$year)2015                                0.2140    
## as.factor(infantmort$year)2016                                0.0116 *  
## as.factor(infantmort$maternalraceorethnicity)Black Non-H    4.44e-12 ***
## as.factor(infantmort$maternalraceorethnicity)Other Hispanic   0.7835    
## as.factor(infantmort$maternalraceorethnicity)Puerto Rican   1.11e-06 ***
## as.factor(infantmort$maternalraceorethnicity)White Non-H      0.0620 .  
## infantmort$numberoflivebirths                                 0.0552 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6051 on 35 degrees of freedom
## Multiple R-squared:  0.9491, Adjusted R-squared:  0.9287 
## F-statistic: 46.59 on 14 and 35 DF,  p-value: < 2.2e-16
plot(imr_fit)

* With a p-value of < 2.2e-16 and an adjusted R-squared of 0.9287, the model as a whole appears to fit well. However, it includes many predictors, and only a few of the coefficients (the year 2016 and the Black Non-H and Puerto Rican groups) are significant at the 0.05 level.

* With this in mind, I refit the model focusing on the variables that stood out in the previous model: I kept only the 2016 rows and used the number of live births as the single predictor.

imr_2016 <- infantmort[c(1,11,21,31,41),c(2:3,9)]
imr_2016
## # A tibble: 5 x 3
##   maternalraceorethnicity infantmortalityrate numberoflivebirths
##   <chr>                                 <dbl>              <dbl>
## 1 Asian/Pacific Islander                  2.9              21566
## 2 Black Non-H                             8                22465
## 3 Other Hispanic                          3.8              26915
## 4 Puerto Rican                            3.4               7159
## 5 White Non-H                             2.6              40633
imr_fit_2016 <- lm(imr_2016$infantmortalityrate~imr_2016$numberoflivebirths)
summary(imr_fit_2016)
## 
## Call:
## lm(formula = imr_2016$infantmortalityrate ~ imr_2016$numberoflivebirths)
## 
## Residuals:
##       1       2       3       4       5 
## -1.3045  3.8221 -0.2464 -1.2304 -1.0408 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  4.842e+00  2.729e+00   1.774    0.174
## imr_2016$numberoflivebirths -2.956e-05  1.047e-04  -0.282    0.796
## 
## Residual standard error: 2.514 on 3 degrees of freedom
## Multiple R-squared:  0.02589,    Adjusted R-squared:  -0.2988 
## F-statistic: 0.07973 on 1 and 3 DF,  p-value: 0.796

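* In this new model, the number of live births was not significant (p = 0.796). With only five observations, however, the model has very little power, so it remains unclear whether these variables are truly important predictors of infant mortality rate.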
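For reference, the 2016 model can be reproduced from the five rows printed above, which also makes the sample-size problem visible: five observations and two estimated coefficients leave only three residual degrees of freedom. This sketch passes a data frame through the `data` argument of `lm()` instead of `$` references; the names `imr_2016_df`, `rate`, and `births` are illustrative.

```r
# 2016 rows as printed above
imr_2016_df <- data.frame(
  group  = c("Asian/Pacific Islander", "Black Non-H", "Other Hispanic",
             "Puerto Rican", "White Non-H"),
  rate   = c(2.9, 8, 3.8, 3.4, 2.6),
  births = c(21566, 22465, 26915, 7159, 40633)
)

# Same model as above, written with a data argument
fit_2016 <- lm(rate ~ births, data = imr_2016_df)

df.residual(fit_2016)    # 3
coef(summary(fit_2016))  # same estimates as the summary above
```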

To Conclude

  • As a society, we have come a long way in reducing infant mortality; however, more work needs to be done because these disparities still exist.
  • My analysis showed an alarming difference in IMRs between African Americans and Whites, but it is still unclear which of the variables I examined are truly important.
  • More resources need to be directed to predominantly Black communities.