1 Introduction

Using data from the United Nations Population Division, Department of Economic and Social Affairs, this study investigates global infant mortality rates (IMR) from 1950 to 2010. Our goals are to identify the continents with the highest rates of infant mortality (per 1000 births) and to determine whether infant mortality has improved over time, ideally as a result of medical advancements and greater access to care. Decreasing infant mortality globally is of critical importance, as every child deserves the right to live, regardless of the resources available in their birthplace. By comparing infant mortality rates both between continents and across time, we can determine not only which continents have the lowest mortality rates, but also which continents have made the most progress in reducing them. While these data reveal global trends in infant mortality by considering the effects of continent and year, we cannot attribute increases or decreases in IMR to specific medical advancements or healthcare initiatives without additional qualitative research.

Throughout this study, the infant mortality rate is measured as the number of infant deaths per 1000 births.


2 Exploratory data analysis

The exploratory scatter plot suggests that each continent started at a different infant mortality rate in 1950, and that all of these rates decreased over time. However, they decreased at different rates: Asia and Africa had the largest overall decreases in infant mortality rate, while North America (NA) had the smallest. Additionally, some continents, such as Europe, North America (NA), Oceania, and Latin America and the Caribbean (LAC), appear to follow a more curved, polynomial trend.

In terms of observational units, each row in our dataset represents the infant mortality rate for one continent in one year. Likewise, each point on the exploratory scatter plot represents the infant mortality rate for an individual continent a given number of years after 1950.

Note that Latin America and the Caribbean and North America are abbreviated as LAC and NA, respectively, to keep the data trends in our visualizations legible.

imr_per_1000 years_since_1950 year continent years_since_squared
188.3688 0 1950 AFRICA 0
171.3920 5 1955 AFRICA 25
156.9131 10 1960 AFRICA 100
144.7107 15 1965 AFRICA 225
133.6601 20 1970 AFRICA 400
120.9642 25 1975 AFRICA 625
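The two time columns in the table above are derived directly from year. As a minimal Python sketch (not the authors' code; the helper name add_time_terms is hypothetical), they can be rebuilt from the six Africa rows shown:

```python
# Africa rows copied from the table above (observed IMR per 1000 births).
rows = [
    {"imr_per_1000": 188.3688, "year": 1950, "continent": "AFRICA"},
    {"imr_per_1000": 171.3920, "year": 1955, "continent": "AFRICA"},
    {"imr_per_1000": 156.9131, "year": 1960, "continent": "AFRICA"},
    {"imr_per_1000": 144.7107, "year": 1965, "continent": "AFRICA"},
    {"imr_per_1000": 133.6601, "year": 1970, "continent": "AFRICA"},
    {"imr_per_1000": 120.9642, "year": 1975, "continent": "AFRICA"},
]


def add_time_terms(row):
    """Return a copy of row with the linear and squared time predictors added."""
    t = row["year"] - 1950
    return {**row, "years_since_1950": t, "years_since_squared": t ** 2}


prepared = [add_time_terms(r) for r in rows]
for r in prepared:
    print(r["year"], r["years_since_1950"], r["years_since_squared"])
```

Centering year at 1950 keeps the intercept interpretable as the fitted rate in 1950 and keeps the squared term on a manageable scale.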


3 Multiple regression

In this regression model, we examine how infant mortality rate depends on the categorical variable continent and the numerical variable year. Year enters the model through two terms: years_since_1950 (linear) and years_since_squared (quadratic). Both time terms are interacted with continent, so each continent receives its own intercept, linear coefficient, and quadratic coefficient.
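For reference, the interaction model can be written out as follows. The symbols are our own notation, not part of the original output: t denotes years_since_1950, D_c is an indicator equal to 1 for rows from continent c, and Africa is the baseline, so it has no indicator.

```latex
\widehat{\mathrm{IMR}} = \beta_0 + \beta_1 t + \beta_2 t^2
  + \sum_{c \in \{\mathrm{ASIA},\,\mathrm{EUROPE},\,\mathrm{LAC},\,\mathrm{NA},\,\mathrm{OCEANIA}\}}
    D_c \left( \gamma_c + \delta_c t + \eta_c t^2 \right)

% For a row from continent c (D_c = 1, all other indicators 0):
\widehat{\mathrm{IMR}}_c = (\beta_0 + \gamma_c) + (\beta_1 + \delta_c)\, t + (\beta_2 + \eta_c)\, t^2
```

In this notation, the β terms correspond to intercept, years_since_1950, and years_since_squared in the table, while γ, δ, and η correspond to the continent terms, the years_since_1950:continent interactions, and the continent:years_since_squared interactions. That gives 3 + 5 × 3 = 18 coefficients, matching the 18 rows of the table.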

term estimate std_error statistic p_value conf_low conf_high
intercept 183.980 2.186 84.171 0.000 179.608 188.353
years_since_1950 -2.607 0.169 -15.400 0.000 -2.945 -2.268
continentASIA -23.336 3.091 -7.549 0.000 -29.519 -17.153
continentEUROPE -120.905 3.091 -39.113 0.000 -127.088 -114.721
continentLAC -55.461 3.091 -17.942 0.000 -61.644 -49.278
continentNA -151.919 3.091 -49.146 0.000 -158.102 -145.735
continentOCEANIA -126.012 3.091 -40.765 0.000 -132.196 -119.829
years_since_squared 0.010 0.003 3.553 0.001 0.004 0.015
years_since_1950:continentASIA -0.626 0.239 -2.617 0.011 -1.105 -0.148
years_since_1950:continentEUROPE 0.392 0.239 1.640 0.106 -0.086 0.871
years_since_1950:continentLAC -0.193 0.239 -0.807 0.423 -0.672 0.286
years_since_1950:continentNA 1.724 0.239 7.204 0.000 1.246 2.203
years_since_1950:continentOCEANIA 1.554 0.239 6.494 0.000 1.076 2.033
continentASIA:years_since_squared 0.008 0.004 2.193 0.032 0.001 0.016
continentEUROPE:years_since_squared 0.012 0.004 3.180 0.002 0.005 0.020
continentLAC:years_since_squared 0.006 0.004 1.499 0.139 -0.002 0.013
continentNA:years_since_squared -0.002 0.004 -0.574 0.568 -0.010 0.005
continentOCEANIA:years_since_squared -0.002 0.004 -0.590 0.558 -0.010 0.005

3.1 Statistical interpretation

Africa serves as the baseline (reference) continent in the table: each continent term and each interaction term is the difference between that continent's coefficient and Africa's. Our exploratory data analysis and initial residual analysis suggested a nonlinear relationship between years since 1950 and infant mortality rate for several continents. We therefore fit a polynomial regression, where years_since_1950 is the number of years since 1950 and years_since_squared is its square.

The table shows that the associated effect of year on the infant mortality rate differs depending on the continent. Furthermore, the regression indicates that in each continent, infant mortality rate has decreased since 1950.

The results of the table mirror the visual results of our exploratory data analysis. When interpreting the estimates for a given continent, we combine Africa's baseline coefficients with that continent's adjustment terms; the values reported below are these combined estimates.

The regression for continent = Africa has the highest intercept, 183.980, since Africa had the highest infant mortality rate of all the continents in 1950. Its years_since_squared coefficient (0.010) is larger only than those of North America and Oceania, and its years_since_1950 coefficient is strongly negative at -2.607.

As shown by our exploratory data analysis and confirmed by the regression table, the regression for continent = NA has the smallest y-intercept, 32.061, since North America had the lowest infant mortality rate in 1950. The regression for continent = Oceania also has a small y-intercept, 57.968. Oceania and North America share the same years_since_squared coefficient, 0.008, and their years_since_1950 coefficients (-1.053 and -0.883, respectively) are less negative than those of the other continents.

The regression for continent = Asia has an intercept of 160.644, lower than Africa's, since Asia had the second highest infant mortality rate in 1950. Its years_since_1950 coefficient is the most negative of all the continents at -3.233, and its years_since_squared coefficient is the second largest at 0.018.

The regression for continent = Europe appears to have the most curved trend in our visualizations. It has the largest years_since_squared coefficient, 0.022, while its years_since_1950 coefficient, -2.215, is similar to Africa's. Its y-intercept is 63.075, close to Europe's infant mortality rate in 1950.

For continent = LAC, the y-intercept (the fitted infant mortality rate in 1950) is fairly high at 128.519, and the years_since_squared coefficient is 0.016, larger than those of Africa, North America, and Oceania. The years_since_1950 coefficient, -2.800, is more negative than every other continent's except Asia's.
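Each continent's combined coefficients come from adding its adjustment terms to Africa's baseline values. A minimal Python sketch of that arithmetic, using the rounded estimates from the regression table (the names BASE, ADJUST, and continent_coefficients are illustrative, not from the original analysis):

```python
# Baseline (Africa) estimates from the regression table.
BASE = {"intercept": 183.980, "linear": -2.607, "quadratic": 0.010}

# Adjustments relative to Africa for each continent:
# (continent term, years_since_1950 interaction, years_since_squared interaction)
ADJUST = {
    "AFRICA": (0.0, 0.0, 0.0),
    "ASIA": (-23.336, -0.626, 0.008),
    "EUROPE": (-120.905, 0.392, 0.012),
    "LAC": (-55.461, -0.193, 0.006),
    "NA": (-151.919, 1.724, -0.002),
    "OCEANIA": (-126.012, 1.554, -0.002),
}


def continent_coefficients(continent):
    """Return (intercept, linear, quadratic) for one continent's fitted curve."""
    d_int, d_lin, d_quad = ADJUST[continent]
    return (
        round(BASE["intercept"] + d_int, 3),
        round(BASE["linear"] + d_lin, 3),
        round(BASE["quadratic"] + d_quad, 3),
    )


for name in ADJUST:
    print(name, continent_coefficients(name))
```

This reproduces the combined intercepts and slopes quoted in the paragraphs above (for example, 160.644, -3.233, and 0.018 for Asia).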

While we can associate decreases in global infant mortality with the passage of time, this analysis does not allow us to link changes to specific medical advancements or healthcare initiatives developed in particular years. Instead, it provides an initial quantitative analysis that can guide qualitative research connecting particular years to the implementation of specific initiatives.

3.2 Non-statistical interpretation

Our preliminary results show that infant mortality rates have decreased since 1950 in every continent. Infant mortality has been decreasing most rapidly in Asia, and least rapidly in North America and Oceania. Although Africa remains the continent with the highest infant mortality rate, as it was in 1950, it has seen a very rapid decrease across this period: the fitted regression puts Africa's rate at about 184 deaths per 1000 births in 1950 (roughly 18% of infants) and at roughly 6% by 2010. Latin America and the Caribbean's infant mortality rate also decreased substantially, followed by Europe, Oceania, and North America. These data suggest that significant progress has been made in reducing infant mortality globally; however, gaps in IMR between continents persisted in 2010, and greater efforts are needed to ensure the survival of infants on every continent.
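To translate a continent's fitted curve into a rate for a specific year, plug t = year - 1950 into its combined coefficients. A minimal sketch for Africa, assuming the rounded coefficients printed in the regression table (the unrounded squared coefficient is slightly below 0.010, so later-year predictions here run about one death per 1000 high):

```python
# Africa's combined coefficients from the regression table (rounded).
AFRICA = (183.980, -2.607, 0.010)  # (intercept, years_since_1950, years_since_squared)


def predicted_imr(coefs, year):
    """Fitted infant deaths per 1000 births in a given calendar year."""
    b0, b1, b2 = coefs
    t = year - 1950
    return b0 + b1 * t + b2 * t ** 2


for year in (1950, 1980, 2010):
    imr = predicted_imr(AFRICA, year)
    # Dividing a per-1000 rate by 10 converts it to a percentage of births.
    print(year, round(imr, 1), f"{imr / 10:.1f}% of births")
```

The same function works for any continent once its combined coefficients are substituted.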


4 Inference for multiple regression

For our multiple regression, we chose an interaction model (interacting continent with both the linear and quadratic year terms) because the visualizations in our exploratory data analysis suggest that the associated effect of year on infant mortality rate depends on the continent.

Confidence Intervals

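In general, our confidence intervals show that for some continents the trend in infant mortality rate differs from Africa's in slope, curvature, or both, while for others we cannot distinguish it from Africa's. Because each continent-specific term is measured relative to Africa, the baseline, an interaction interval that contains 0 means that continent's coefficient is not detectably different from Africa's, not that the coefficient itself is 0.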

A 95% confidence interval for the intercept (b0), corresponding to the infant mortality rate in Africa in 1950, is [179.608, 188.353]. This interval does not contain 0, and a p-value below 0.001 (reported as 0.000) indicates that we can reject the null hypothesis that the intercept equals 0. Intuitively this makes sense, since there is always some level of infant mortality. Therefore, we can be 95% confident that the model-estimated infant mortality rate in Africa in 1950 was between 179.608 and 188.353 deaths per 1000 births.
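Each interval in the regression table has the form estimate ± t* × SE. A minimal sketch that approximately reproduces the table's intervals from the estimates and standard errors, under a labeled assumption about the degrees of freedom (13 five-year time points × 6 continents = 78 rows, minus 18 coefficients, giving about 60 residual df):

```python
# ASSUMPTION: about 60 residual degrees of freedom, for which the two-sided
# 95% critical value t(0.975, 60) is approximately 2.000.
T_CRIT = 2.000


def conf_int(estimate, std_error, t_crit=T_CRIT):
    """Two-sided confidence interval: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return (round(estimate - margin, 3), round(estimate + margin, 3))


print(conf_int(183.980, 2.186))  # intercept
print(conf_int(-2.607, 0.169))   # years_since_1950
print(conf_int(0.010, 0.003))    # years_since_squared
```

Results differ from the table in the third decimal because the printed estimates and standard errors are themselves rounded. If SciPy is available, `scipy.stats.t.ppf(0.975, df)` gives the critical value for the actual residual df.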

A 95% confidence interval for the years_since_squared coefficient, which describes the curvature of Africa's regression line, is [0.004, 0.015], with a p-value of 0.001. This interval does not contain 0, so we can reject the null hypothesis that Africa's trend has no curvature. Because the coefficient is positive, the decline in infant mortality in Africa has slowed over time: the downward slope becomes less steep with each additional year since 1950.

The 95% confidence interval for the years_since_1950 coefficient is also statistically significant. This interval, [-2.945, -2.268], does not contain 0, and its p-value is below 0.001. Because the model also contains a squared term, this coefficient is the slope of Africa's regression line in 1950 (years_since_1950 = 0). We are therefore 95% confident that in 1950, infant mortality in Africa was decreasing by between 2.268 and 2.945 deaths per 1000 births per year, a rate that the positive squared term gradually reduces over time.
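Because the model includes a squared term, the slope of each fitted curve is not constant: for IMR = b0 + b1·t + b2·t², the rate of change is b1 + 2·b2·t. A minimal sketch using Africa's rounded estimates (the function name slope_at is illustrative):

```python
def slope_at(b1, b2, years_since_1950):
    """Instantaneous rate of change of the fitted IMR, in deaths per 1000 births per year.

    For IMR = b0 + b1*t + b2*t**2, the derivative with respect to t is b1 + 2*b2*t.
    """
    return b1 + 2 * b2 * years_since_1950


# Africa's estimates from the regression table (baseline continent).
for t in (0, 30, 60):
    print(1950 + t, round(slope_at(-2.607, 0.010, t), 3))
```

At t = 0 the slope equals the years_since_1950 coefficient itself; with these rounded inputs it shrinks in magnitude to about -2.0 by 1980 and about -1.4 by 2010.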

Because both the linear and the squared terms for Africa have p-values below 0.05, the evidence supports keeping the quadratic term: Africa's decline is better described as a curve than as a straight line.

The 95% confidence interval for the years_since_squared interaction for North America is [-0.010, 0.005], which contains 0, with a p-value of 0.568. We therefore have insufficient evidence that North America's curvature differs from Africa's. This does not show that North America's quadratic coefficient is 0; its combined estimate is 0.008.

The 95% confidence interval for the years_since_1950 interaction for North America is [1.246, 2.203], with a p-value below 0.001. Because this interval is positive and excludes 0, North America's linear coefficient is significantly less negative than Africa's (a combined estimate of -0.883 versus -2.607): infant mortality in North America declined, but at a significantly slower rate than in Africa.

For Latin America and the Caribbean, the 95% confidence interval for the years_since_squared interaction is [-0.002, 0.013] (p = 0.139), and the interval for the years_since_1950 interaction is [-0.672, 0.286] (p = 0.423). Both intervals contain 0, so we have insufficient evidence that either the slope or the curvature of LAC's trend differs from Africa's. In other words, LAC's infant mortality rate declined along a path whose shape is statistically indistinguishable from Africa's, starting from a lower level.

For Asia, the 95% confidence interval for the years_since_squared interaction is [0.001, 0.016] (p = 0.032), and the interval for the years_since_1950 interaction is [-1.105, -0.148] (p = 0.011). Both intervals exclude 0. These results suggest that in 1950 Asia's infant mortality rate was falling faster than Africa's, by between 0.148 and 1.105 additional deaths per 1000 births per year, and that Asia's decline has also flattened more quickly than Africa's.

For Europe, the 95% confidence interval for the years_since_squared interaction is [0.005, 0.020], with a p-value of 0.002. Since p = 0.002 < 0.05, we have sufficient evidence that Europe's trend is more strongly curved than Africa's, meaning its decline has flattened faster over time. The years_since_1950 interaction for Europe, [-0.086, 0.871] (p = 0.106), contains 0, so Europe's initial rate of decline is not significantly different from Africa's.
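The table gives intervals for each interaction (a continent's difference from Africa), not for each continent's own slope. Getting the latter requires the covariance between the baseline and interaction estimates, which the table does not report. As a hedged sketch: if every continent is observed at the same years (a balanced design, which the identical standard errors across continents suggest), each continent's coefficients equal a separate per-continent fit with a pooled variance, so Cov(baseline, interaction) = -Var(baseline). The critical value 2.000 again assumes about 60 residual df.

```python
import math


def combined_se(se_base, se_diff, cov):
    """SE of (baseline coefficient + interaction) from their variances and covariance."""
    return math.sqrt(se_base ** 2 + se_diff ** 2 + 2 * cov)


SE_LINEAR_BASE = 0.169  # years_since_1950 (Africa)
SE_LINEAR_DIFF = 0.239  # each years_since_1950:continent interaction

# ASSUMPTION: balanced design, so Cov(baseline, interaction) = -Var(baseline).
cov = -SE_LINEAR_BASE ** 2
se_continent = combined_se(SE_LINEAR_BASE, SE_LINEAR_DIFF, cov)
print(round(se_continent, 3))

T_CRIT = 2.000  # ASSUMPTION: t critical value for about 60 residual df
combined_slopes = {"NA": -0.883, "OCEANIA": -1.053, "ASIA": -3.233}
for name, slope in combined_slopes.items():
    lo, hi = slope - T_CRIT * se_continent, slope + T_CRIT * se_continent
    print(name, round(lo, 3), round(hi, 3))
```

Under this assumption each continent's own linear coefficient has the same standard error as Africa's (about 0.169), which is consistent with the interaction SE being about √2 times larger. If the model was fit in R, the exact covariance is available from `vcov()` on the fitted `lm` object and should be preferred over this shortcut.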

Residual Analysis

ID imr_per_1000 years_since_1950 continent years_since_squared imr_per_1000_hat residual
1 188.369 0 AFRICA 0 183.980 4.388
2 171.392 5 AFRICA 25 171.189 0.203
3 156.913 10 AFRICA 100 158.881 -1.968
4 144.711 15 AFRICA 225 147.056 -2.345
5 133.660 20 AFRICA 400 135.714 -2.054
6 120.964 25 AFRICA 625 124.855 -3.891
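Each residual in the table is the observed rate minus the fitted rate. A minimal sketch recomputing the six residuals shown from the rounded observed and fitted columns:

```python
# Observed IMR and fitted values for the first six Africa rows in the table above.
observed = [188.369, 171.392, 156.913, 144.711, 133.660, 120.964]
fitted = [183.980, 171.189, 158.881, 147.056, 135.714, 124.855]

# Residual = observed - fitted; a positive residual means the model under-predicts.
residuals = [round(y - y_hat, 3) for y, y_hat in zip(observed, fitted)]
print(residuals)
```

The first value comes out as 4.389 rather than 4.388 because both input columns are rounded to three decimals.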

The histogram of the residuals appears centered at 0, as desired. Apart from a slightly longer right tail, the distribution looks approximately normal.

Since the model has both a linear and a quadratic term, we examined two residual scatter plots: one with years_since_1950 on the x-axis and one with years_since_squared.

The residual scatter plot for x = years_since_1950 shows points randomly distributed above and below the horizontal line y = 0, with no apparent pattern. The residual scatter plot for x = years_since_squared also shows no distinct pattern: the residuals fall randomly above and below y = 0, though points are more tightly clustered at the far ends of the plot.


5 Conclusion

As expected, the infant mortality rate in each of the six continents studied, namely Africa, Asia, Europe, Latin America and the Caribbean, North America, and Oceania, decreased between 1950 and 2010. However, it declined at markedly different rates, revealing global health inequality that depends on where a child is born.

Multiple regression analysis revealed that both the slope and the curvature of the decline differ across continents. In 1950, the intercept indicates that Africa had the highest mortality rate (183.980, SE 2.186), meaning that approximately 18.4% of infants died. In Africa, we are 95% confident that in 1950 the number of infant deaths per 1000 births was falling by between 2.268 and 2.945 per year, and the significantly positive squared term indicates that this decline slowed over time. Asia's initial decline was steeper still (a combined linear coefficient of -3.233): its interaction term shows that in 1950 Asia's infant mortality rate fell by between 0.148 and 1.105 more deaths per 1000 births per year than Africa's.

The 95% confidence intervals for the years_since_squared interaction terms indicated that the curvature of the IMR decline in Latin America and the Caribbean (p = 0.139), North America (p = 0.568), and Oceania (p = 0.558) did not differ significantly from Africa's. In contrast, the years_since_1950 interactions show that North America and Oceania declined significantly more slowly than Africa: we are 95% confident that North America's linear coefficient is between 1.246 and 2.203 deaths per 1000 births per year less negative than Africa's (p < 0.001), and Oceania's is between 1.076 and 2.033 less negative (p < 0.001), giving combined estimates of -0.883 and -1.053, respectively. Oceania's decline was therefore slightly faster than North America's. It should be noted that North America and Oceania had the lowest and second lowest infant mortality rates, respectively, in 1950.

For Latin America and the Caribbean, neither the linear nor the squared interaction term was significant (p = 0.423 and p = 0.139), so we found insufficient evidence that its rate or shape of decline differed from Africa's. This does not mean that LAC's infant mortality rate was unchanged: its combined linear coefficient is -2.800, and our exploratory data analysis shows that the number of infant deaths decreased drastically after 1950, along a path similar in shape to Africa's but starting from a lower rate.

The decline in infant mortality in Europe is best represented by a polynomial model. Europe's years_since_squared interaction is significant (95% CI [0.005, 0.020], p = 0.002), and its combined squared coefficient of 0.022 is the largest of any continent, indicating that Europe's decline flattened the most over time.

While our analysis suggests a global decline in infant mortality rate associated with the number of years since 1950 that varies by continent, it does not consider other factors, such as race, gender, socioeconomic mobility, or specific medical advancements, that may influence differences in IMR between continents. Although we can identify and describe trends over time, we are limited in our ability to explain why infant mortality rates differ among continents.

Additional studies may examine these factors more closely in order to identify the specific contributors to disparities in global infant mortality rates. At the least, this study provides evidence that Africa and Asia are high-priority regions for neonatal healthcare improvements and advancements.


6 Citations and References

Infant Mortality Rate (IMR)


Supplementary Materials

The following NIH link outlines the research activities and scientific advancements made by NIH-funded research on infant mortality. This review places our data within a larger framework of research advancements and introduces factors that explain variability in infant mortality rates internationally, such as poverty, domestic and sexual violence, air pollution, levels of infection, and average maternal age. For example, while birth defects and genetic conditions are the largest contributors to child death in the US, infection, from diseases such as HIV and malaria, is one of the leading causes of infant mortality internationally.

NIH Funded Research to Decrease IMR