Analysis of Infant Mortality Rates Using Gamma Regression

Introduction

Infant mortality is a crucial indicator of a population’s overall health and development. Understanding the factors that contribute to infant mortality rates (IMR) can inform policy decisions aimed at improving maternal and child healthcare. This study investigates how maternal race/ethnicity and year influence infant mortality rates using a Gamma regression model. The research questions guiding this analysis are:

  1. How has infant mortality evolved over time?

  2. Does maternal race/ethnicity significantly impact infant mortality rates?

These questions are important as they provide insights into disparities in healthcare access and the effectiveness of policies aimed at reducing IMR over time.

Data Overview

The dataset consists of 87 observations and 9 variables. Fifteen observations have a missing infant mortality rate, so the model below is fitted to 72 of them. The variables are:

  Year: The year of data collection.
  Maternal Race or Ethnicity: The racial or ethnic group of the mother.
  Infant Mortality Rate: The number of infant deaths per 1,000 live births.
  Neonatal Mortality Rate: The number of neonatal deaths (deaths in the first 28 days of life) per 1,000 live births.
  Postneonatal Mortality Rate: The number of postneonatal deaths (deaths after the first 28 days but before the first birthday) per 1,000 live births.
  Infant Deaths: The total number of infant deaths.
  Neonatal Infant Deaths: The total number of neonatal deaths.
  Postneonatal Infant Deaths: The total number of postneonatal deaths.
  Number of Live Births: The total number of live births during the year.

I obtained this dataset from the NYC Open Data website. Data of this kind are collected by government agencies and public health organizations, including the Centers for Disease Control and Prevention (CDC) in the United States, and are usually compiled from vital statistics registries, which track births and deaths. The dataset contains annual infant mortality figures, broken down by maternal race or ethnicity, for several years.

The unit of measurement for the infant mortality rate is typically expressed as the number of deaths per 1,000 live births. This is the most common unit used for infant mortality because it allows for comparison across populations of different sizes.
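As a quick check of this definition, the rate can be recomputed from the death and birth counts. The sketch below uses the six rows printed by head(data) further down; the vector names are illustrative and not part of the original analysis.

```r
# Counts copied from the first six rows shown by head(data)
infant_deaths <- c(287, 120, 201, 125, 259, 230)
live_births   <- c(29268, 27621, 24108, 38383, 27405, 26635)

# Infant mortality rate: deaths per 1,000 live births, rounded to one decimal
imr <- round(1000 * infant_deaths / live_births, 1)
print(imr)
```

The recomputed values agree with the Infant.Mortality.Rate column for those rows (9.8, 4.3, 8.3, 3.3, 9.5, 8.6).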

Methodology

To explore the relationship between the infant mortality rate and the predictors, a Gamma regression model was applied. The Gamma distribution is suitable for modeling strictly positive, continuous, right-skewed data such as infant mortality rates. A log link was chosen, so the logarithm of the expected infant mortality rate is modeled as a linear function of the predictors. As a result, each coefficient describes a multiplicative (percentage) change in the expected rate rather than an additive one.
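For reference, the model described above can be written out as follows. The notation is my own shorthand, not taken from the source: Y_i is the infant mortality rate for observation i, mu_i its expected value, phi the dispersion parameter, and gamma_k the coefficient for race/ethnicity group k relative to the reference group.

```latex
\begin{aligned}
Y_i &\sim \mathrm{Gamma}(\mu_i, \phi), \qquad \operatorname{Var}(Y_i) = \phi\,\mu_i^{2},\\
\log \mu_i &= \beta_0 + \beta_1\,\mathrm{Year}_i + \sum_{k} \gamma_k\,\mathbf{1}[\mathrm{Group}_i = k],\\
\mu_i &= \exp(\beta_0)\,\exp(\beta_1\,\mathrm{Year}_i)\,\prod_{k} \exp\!\bigl(\gamma_k\,\mathbf{1}[\mathrm{Group}_i = k]\bigr).
\end{aligned}
```

The last line shows why exponentiated coefficients act as multipliers on the expected rate.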

install.packages("betareg")
install.packages("modelsummary")

library(betareg)
library(modelsummary)
library(tidyverse)
library(clarify)
# clean up
rm(list=ls())
# Load the data
data <- read.csv("C:/Users/susha/OneDrive/Documents/Infant_Mortality_20250331 (1).csv")
# Check the data
head(data)
##   Year Materal.Race.or.Ethnicity Infant.Mortality.Rate Neonatal.Mortality.Rate
## 1 2007        Black Non-Hispanic                   9.8                     6.0
## 2 2013            Other Hispanic                   4.3                     2.6
## 3 2013        Black Non-Hispanic                   8.3                     5.5
## 4 2008        White Non-Hispanic                   3.3                     2.1
## 5 2009        Black Non-Hispanic                   9.5                     5.8
## 6 2010        Black Non-Hispanic                   8.6                     5.6
##   Postneonatal.Mortality.Rate Infant.Deaths Neonatal.Infant.Deaths
## 1                         3.8           287                    177
## 2                         1.7           120                     72
## 3                         2.9           201                    132
## 4                         1.1           125                     82
## 5                         3.7           259                    158
## 6                         3.1           230                    148
##   Postneonatal.Infant.Deaths Number.of.Live.Births
## 1                        110                 29268
## 2                         48                 27621
## 3                         69                 24108
## 4                         43                 38383
## 5                        101                 27405
## 6                         82                 26635
# Check the structure of the data
str(data)
## 'data.frame':    87 obs. of  9 variables:
##  $ Year                       : int  2007 2013 2013 2008 2009 2010 2010 2011 2008 2007 ...
##  $ Materal.Race.or.Ethnicity  : chr  "Black Non-Hispanic" "Other Hispanic" "Black Non-Hispanic" "White Non-Hispanic" ...
##  $ Infant.Mortality.Rate      : num  9.8 4.3 8.3 3.3 9.5 8.6 2.8 8.1 NA NA ...
##  $ Neonatal.Mortality.Rate    : num  6 2.6 5.5 2.1 5.8 5.6 2 5.3 NA NA ...
##  $ Postneonatal.Mortality.Rate: num  3.8 1.7 2.9 1.1 3.7 3.1 0.8 2.9 NA NA ...
##  $ Infant.Deaths              : int  287 120 201 125 259 230 104 210 NA NA ...
##  $ Neonatal.Infant.Deaths     : int  177 72 132 82 158 148 75 136 NA NA ...
##  $ Postneonatal.Infant.Deaths : int  110 48 69 43 101 82 29 74 NA NA ...
##  $ Number.of.Live.Births      : int  29268 27621 24108 38383 27405 26635 37780 25825 2548 230 ...
# Gamma regression model for Infant.Mortality.Rate
gamma_model <- glm(Infant.Mortality.Rate ~ Year + Materal.Race.or.Ethnicity, family = Gamma(link = "log"), data = data)
# Summarize the model
summary(gamma_model)
## 
## Call:
## glm(formula = Infant.Mortality.Rate ~ Year + Materal.Race.or.Ethnicity, 
##     family = Gamma(link = "log"), data = data)
## 
## Coefficients:
##                                                      Estimate Std. Error
## (Intercept)                                         36.488023   7.698194
## Year                                                -0.017554   0.003811
## Materal.Race.or.EthnicityAsian and Pacific Islander -0.063485   0.119877
## Materal.Race.or.EthnicityBlack NH                    0.901902   0.159537
## Materal.Race.or.EthnicityBlack Non-Hispanic          0.976977   0.125009
## Materal.Race.or.EthnicityNon-Hispanic Black          1.049784   0.123915
## Materal.Race.or.EthnicityNon-Hispanic White         -0.191476   0.123915
## Materal.Race.or.EthnicityOther Hispanic              0.316065   0.119201
## Materal.Race.or.EthnicityOther/Two or More           0.068993   0.159537
## Materal.Race.or.EthnicityPuerto Rican                0.630951   0.119201
## Materal.Race.or.EthnicityTotal                       0.374229   0.159583
## Materal.Race.or.EthnicityWhite NH                   -0.241162   0.159537
## Materal.Race.or.EthnicityWhite Non-Hispanic         -0.068540   0.125009
##                                                     t value Pr(>|t|)    
## (Intercept)                                           4.740 1.39e-05 ***
## Year                                                 -4.607 2.23e-05 ***
## Materal.Race.or.EthnicityAsian and Pacific Islander  -0.530   0.5984    
## Materal.Race.or.EthnicityBlack NH                     5.653 4.83e-07 ***
## Materal.Race.or.EthnicityBlack Non-Hispanic           7.815 1.12e-10 ***
## Materal.Race.or.EthnicityNon-Hispanic Black           8.472 8.76e-12 ***
## Materal.Race.or.EthnicityNon-Hispanic White          -1.545   0.1276    
## Materal.Race.or.EthnicityOther Hispanic               2.652   0.0103 *  
## Materal.Race.or.EthnicityOther/Two or More            0.432   0.6670    
## Materal.Race.or.EthnicityPuerto Rican                 5.293 1.86e-06 ***
## Materal.Race.or.EthnicityTotal                        2.345   0.0224 *  
## Materal.Race.or.EthnicityWhite NH                    -1.512   0.1360    
## Materal.Race.or.EthnicityWhite Non-Hispanic          -0.548   0.5856    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.01272609)
## 
##     Null deviance: 14.26148  on 71  degrees of freedom
## Residual deviance:  0.79894  on 59  degrees of freedom
##   (15 observations deleted due to missingness)
## AIC: 121.45
## 
## Number of Fisher Scoring iterations: 4
# Display results
modelsummary(gamma_model)
Term                                                  (1) Estimate   (Std. Error)
(Intercept)                                                 36.488   (7.698)
Year                                                        -0.018   (0.004)
Materal.Race.or.EthnicityAsian and Pacific Islander         -0.063   (0.120)
Materal.Race.or.EthnicityBlack NH                            0.902   (0.160)
Materal.Race.or.EthnicityBlack Non-Hispanic                  0.977   (0.125)
Materal.Race.or.EthnicityNon-Hispanic Black                  1.050   (0.124)
Materal.Race.or.EthnicityNon-Hispanic White                 -0.191   (0.124)
Materal.Race.or.EthnicityOther Hispanic                      0.316   (0.119)
Materal.Race.or.EthnicityOther/Two or More                   0.069   (0.160)
Materal.Race.or.EthnicityPuerto Rican                        0.631   (0.119)
Materal.Race.or.EthnicityTotal                               0.374   (0.160)
Materal.Race.or.EthnicityWhite NH                           -0.241   (0.160)
Materal.Race.or.EthnicityWhite Non-Hispanic                 -0.069   (0.125)

Num.Obs.    72
AIC         121.4
BIC         153.3
Log.Lik.    -46.724
F           86.588
RMSE        0.54
# Simulate the model
sim_results <- sim(gamma_model, n = 1000)
summary(sim_results$sim.coefs)
##   (Intercept)         Year         
##  Min.   :13.72   Min.   :-0.03035  
##  1st Qu.:31.20   1st Qu.:-0.02014  
##  Median :36.60   Median :-0.01760  
##  Mean   :36.51   Mean   :-0.01757  
##  3rd Qu.:41.70   3rd Qu.:-0.01491  
##  Max.   :62.34   Max.   :-0.00629  
##  Materal.Race.or.EthnicityAsian and Pacific Islander
##  Min.   :-0.50764                                   
##  1st Qu.:-0.13379                                   
##  Median :-0.05796                                   
##  Mean   :-0.06431                                   
##  3rd Qu.: 0.01256                                   
##  Max.   : 0.36652                                   
##  Materal.Race.or.EthnicityBlack NH Materal.Race.or.EthnicityBlack Non-Hispanic
##  Min.   :0.3483                    Min.   :0.5463                             
##  1st Qu.:0.8023                    1st Qu.:0.9029                             
##  Median :0.8994                    Median :0.9810                             
##  Mean   :0.9006                    Mean   :0.9794                             
##  3rd Qu.:1.0078                    3rd Qu.:1.0635                             
##  Max.   :1.3210                    Max.   :1.3311                             
##  Materal.Race.or.EthnicityNon-Hispanic Black
##  Min.   :0.6482                             
##  1st Qu.:0.9778                             
##  Median :1.0516                             
##  Mean   :1.0509                             
##  3rd Qu.:1.1327                             
##  Max.   :1.3982                             
##  Materal.Race.or.EthnicityNon-Hispanic White
##  Min.   :-0.6389                            
##  1st Qu.:-0.2643                            
##  Median :-0.1837                            
##  Mean   :-0.1878                            
##  3rd Qu.:-0.1078                            
##  Max.   : 0.2616                            
##  Materal.Race.or.EthnicityOther Hispanic
##  Min.   :-0.09784                       
##  1st Qu.: 0.24344                       
##  Median : 0.32327                       
##  Mean   : 0.31779                       
##  3rd Qu.: 0.39376                       
##  Max.   : 0.71449                       
##  Materal.Race.or.EthnicityOther/Two or More
##  Min.   :-0.45856                          
##  1st Qu.:-0.03168                          
##  Median : 0.07086                          
##  Mean   : 0.07316                          
##  3rd Qu.: 0.17417                          
##  Max.   : 0.51619                          
##  Materal.Race.or.EthnicityPuerto Rican Materal.Race.or.EthnicityTotal
##  Min.   :0.1839                        Min.   :-0.1279               
##  1st Qu.:0.5598                        1st Qu.: 0.2736               
##  Median :0.6356                        Median : 0.3820               
##  Mean   :0.6324                        Mean   : 0.3774               
##  3rd Qu.:0.7103                        3rd Qu.: 0.4821               
##  Max.   :1.0141                        Max.   : 0.8581               
##  Materal.Race.or.EthnicityWhite NH Materal.Race.or.EthnicityWhite Non-Hispanic
##  Min.   :-0.7594                   Min.   :-0.53890                           
##  1st Qu.:-0.3443                   1st Qu.:-0.13961                           
##  Median :-0.2451                   Median :-0.06690                           
##  Mean   :-0.2425                   Mean   :-0.06906                           
##  3rd Qu.:-0.1428                   3rd Qu.: 0.01280                           
##  Max.   : 0.3254                   Max.   : 0.34332

The results of the Gamma regression model showed the following key findings:

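Intercept: The intercept of 36.49 is on the log scale and corresponds to the reference race/ethnicity group at Year = 0. Because year 0 lies far outside the observed data, the intercept is an extrapolation and should not be read as a baseline infant mortality rate.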

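Year: The coefficient for the year variable is -0.018 on the log scale. Because the model uses a log link, each additional year multiplies the expected infant mortality rate by exp(-0.018), or about 0.98, a decline of roughly 1.7% per year holding race/ethnicity fixed. The relationship is statistically significant (p < 0.001).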
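The arithmetic behind the Year effect can be sketched as follows, using the estimate and standard error from the summary output above. The Wald interval is a rough normal approximation that I have added for illustration; it is not part of the original analysis.

```r
# Estimate and standard error for Year, copied from summary(gamma_model)
beta_year <- -0.017554
se_year   <- 0.003811

# Multiplicative change in the expected rate per additional year
ratio_per_year <- exp(beta_year)
pct_per_year   <- 100 * (ratio_per_year - 1)

# Cumulative change implied over ten years
ratio_per_decade <- exp(10 * beta_year)

# Approximate 95% Wald interval for the yearly ratio (normal approximation)
wald_ci <- exp(beta_year + c(-1.96, 1.96) * se_year)

print(c(ratio = ratio_per_year, pct = pct_per_year, decade = ratio_per_decade))
print(wald_ci)
```

Under these numbers the expected rate falls by about 1.7% per year, which compounds to roughly 16% over a decade.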

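Maternal Race or Ethnicity: The coefficients vary across groups. Each is measured on the log scale relative to the reference category, which is the first factor level and therefore does not appear in the coefficient table. For example:

  Black Non-Hispanic: The coefficient is 0.977 (p < 0.001), indicating a substantially higher infant mortality rate than the reference group.
  Other Hispanic: The coefficient is 0.316 (p = 0.010), a smaller but statistically significant increase.
  White Non-Hispanic: The coefficient is -0.069 (p = 0.586). The separately labeled Non-Hispanic White category has a coefficient of -0.191 (p = 0.128). Both point to slightly lower rates than the reference group, but neither difference is statistically significant.
  Puerto Rican: The coefficient is 0.631 (p < 0.001), indicating a higher infant mortality rate than the reference group.

Note that the data label some groups inconsistently (for example "Black NH", "Black Non-Hispanic", and "Non-Hispanic Black"), so each label receives its own coefficient, and that a "Total" row is treated as if it were a separate group.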
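Since the coefficients are on the log scale, exponentiating them gives rate ratios relative to the reference group. The sketch below does this for a few estimates copied from the summary output; the vector name is illustrative only.

```r
# Log-scale estimates copied from summary(gamma_model)
log_coefs <- c(
  "Black Non-Hispanic" = 0.976977,
  "Puerto Rican"       = 0.630951,
  "Other Hispanic"     = 0.316065,
  "White Non-Hispanic" = -0.068540
)

# Rate ratios: expected IMR in each group divided by that of the reference group
rate_ratios <- exp(log_coefs)
print(round(rate_ratios, 2))
```

A ratio above 1 means a higher expected rate than the reference group. For instance, the Black Non-Hispanic estimate corresponds to an expected rate more than two and a half times the reference group's.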

These findings suggest significant differences in infant mortality rates across racial and ethnic groups. The infant mortality rate has declined over time, although certain racial and ethnic groups still face higher risks.
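The coefficient table lists several labels that look like variants of the same group (for example "Black NH", "Black Non-Hispanic", and "Non-Hispanic Black"), plus an aggregate "Total" row. One possible way to harmonize them before refitting is sketched below. The mapping is my assumption about which labels are equivalent and would need to be checked against the data documentation; it was not applied in the analysis reported here.

```r
# Example labels as they appear in the coefficient table
labels <- c("Black NH", "Black Non-Hispanic", "Non-Hispanic Black",
            "White NH", "White Non-Hispanic", "Non-Hispanic White",
            "Puerto Rican", "Total")

# Hypothetical mapping from variant labels to one canonical label (an assumption)
recode_map <- c(
  "Black NH"           = "Black Non-Hispanic",
  "Non-Hispanic Black" = "Black Non-Hispanic",
  "White NH"           = "White Non-Hispanic",
  "Non-Hispanic White" = "White Non-Hispanic"
)

# Replace variant labels; leave all other labels unchanged
harmonized <- ifelse(labels %in% names(recode_map), recode_map[labels], labels)
harmonized <- unname(harmonized)

# Drop the aggregate row so it is not modeled as its own group
keep <- harmonized != "Total"
print(unique(harmonized[keep]))
```

With labels merged this way, each group would contribute a single coefficient, which makes comparisons between groups easier to read.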

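Model Diagnostics

The model's fit was assessed using the AIC (121.45) and the RMSE (0.54). The AIC is mainly useful for comparing this model against alternatives, so it says little on its own. The RMSE of 0.54 deaths per 1,000 live births is small relative to the observed rates, which run from roughly 3 to 10 in the rows displayed above, suggesting that the model fits the data reasonably well. The residual deviance (0.799) is also far below the null deviance (14.261). The statistical significance of Year and of several race/ethnicity coefficients suggests that these variables are relevant predictors of infant mortality.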
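One further summary that can be read off the model output is the proportion of deviance explained, computed from the null and residual deviances. This is an added illustration rather than a statistic reported in the original analysis.

```r
# Deviances copied from summary(gamma_model)
null_deviance     <- 14.26148
residual_deviance <- 0.79894

# Proportion of the null deviance accounted for by the predictors
deviance_explained <- 1 - residual_deviance / null_deviance
print(round(deviance_explained, 3))
```

By this measure, year and maternal race/ethnicity together account for about 94% of the null deviance.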

Additionally, the 1,000 simulated coefficient draws support the stability of the results. These draws are taken from the estimated sampling distribution of the coefficients rather than from new datasets. For instance, the coefficient for Year ranged from -0.030 to -0.006 across the draws, so every draw shows a negative relationship with infant mortality.
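To give a sense of what such coefficient draws look like, the sketch below draws the Year coefficient alone from a normal distribution centered on its estimate. This is a simplified, one-coefficient stand-in that I have added for illustration. The analysis itself draws all coefficients jointly with sim(), and the seed here is arbitrary.

```r
set.seed(123)  # arbitrary seed, only so the sketch is reproducible

# Estimate and standard error for Year, copied from summary(gamma_model)
beta_year <- -0.017554
se_year   <- 0.003811

# Draw 1,000 plausible values of the Year coefficient
year_draws <- rnorm(1000, mean = beta_year, sd = se_year)

# Central 95% range of the draws and share of draws below zero
print(quantile(year_draws, c(0.025, 0.975)))
print(mean(year_draws < 0))
```

Because the estimate lies more than four standard errors below zero, essentially all draws are negative.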

The analysis highlights the importance of maternal race and ethnicity in explaining infant mortality rates. The model suggests that differences in infant mortality rates reflect not only the year but also substantial differences by maternal background. Programs targeting health interventions for specific groups, particularly Black Non-Hispanic and Puerto Rican mothers, may help reduce these disparities.

Moreover, the downward trend in infant mortality over the years is promising, but the decline has not eliminated the racial and ethnic inequalities in infant health outcomes.

Conclusion

The Gamma regression model provides valuable insights. Although the trend suggests improvement over time, disparities across racial and ethnic groups persist. Policymakers and healthcare providers should continue to address these disparities through targeted health interventions and policies.

The findings of this study contribute to the ongoing efforts to reduce infant mortality rates and improve maternal and child health outcomes for all communities.