1 Introduction

  • Discuss the research question you will be addressing with your multiple regression model.
  • Talk about your data’s context, their sources, and any limitations of the data.

Using data from the United Nations Population Division, Department of Economic and Social Affairs, this study investigates global infant mortality rates from 1950 to 2010. Our goals are to identify the continents with the highest rates of infant mortality and to determine whether those rates have improved over the years, as would be expected given medical advancements and greater access to care. Decreasing infant mortality rates globally is of critical importance, as every child deserves the right to live, regardless of the resources available in their birthplace. By comparing infant mortality rates both between continents and across time, we can determine which continents have the lowest mortality rates and which have made the most progress in decreasing them. While these data will reveal global trends in infant mortality rates by considering the effects of continent and year, we will not be able to associate increases or decreases in IMR with specific medical advancements or healthcare initiatives without additional qualitative research.

It’s important to note that the infant mortality rate (IMR) is the number of infant deaths per 1000 births.


2 Exploratory data analysis

  • In the code block below
    1. Compute relevant summary statistics and tables
    2. Display informative well-polished visualizations
  • Perform all “eye-ball tests” and make preliminary/initial observations here:

It appears that in 1950 each continent had a different infant mortality rate, and that every one of those rates decreased as time progressed. However, the rate of decrease differed by continent: Asia and Africa had the largest average decreases in infant mortality rate over time, while North America had the smallest.

##   imr_per_1000     years_since_1950      year       continent        
##  Min.   :  5.343   Min.   : 0       Min.   :1950   Length:78         
##  1st Qu.: 21.848   1st Qu.:15       1st Qu.:1965   Class :character  
##  Median : 39.331   Median :30       Median :1980   Mode  :character  
##  Mean   : 57.086   Mean   :30       Mean   :1980                     
##  3rd Qu.: 87.627   3rd Qu.:45       3rd Qu.:1995                     
##  Max.   :188.369   Max.   :60       Max.   :2010


3 Multiple regression

  • Describe in words the components of your multiple regression model. It should be a single model involving at least one numerical and one categorical explanatory variable

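In this regression model, the outcome variable is infant mortality rate (per 1000 births). The explanatory variables are the categorical variable continent and the numerical variable year, measured as years since 1950. The model includes an interaction between the two, so each continent has its own intercept and its own slope.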
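One way to write this model out is shown here, assuming it is the interaction model implied by the terms in the regression table. The symbols (b_0, b_1, the offsets, and t for years since 1950) are notation introduced for illustration and do not appear in the original analysis.

```latex
\widehat{\mathrm{IMR}} = b_0 + b_1\, t
  + \sum_{c} b_{2,c}\, \mathbb{1}[\text{continent} = c]
  + \sum_{c} b_{3,c}\, t \cdot \mathbb{1}[\text{continent} = c],
\qquad c \in \{\text{ASIA}, \text{EUROPE}, \text{LAC}, \text{NA}, \text{OCEANIA}\}
```

Africa is the baseline category, so its fitted line is b_0 + b_1 t. For any other continent c, the intercept is b_0 + b_{2,c} and the slope is b_1 + b_{3,c}.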

  • Fit the regression model
  • Compute the regression table
term                               estimate  std_error  statistic  p_value  conf_low  conf_high
intercept                           178.666      2.968     60.192    0.000   172.740    184.593
years_since_1950                     -2.027      0.084    -24.142    0.000    -2.194     -1.859
continentASIA                       -27.974      4.198     -6.664    0.000   -36.355    -19.593
continentEUROPE                    -127.629      4.198    -30.404    0.000  -136.010   -119.248
continentLAC                        -58.632      4.198    -13.967    0.000   -67.013    -50.250
continentNA                        -150.705      4.198    -35.901    0.000  -159.086   -142.324
continentOCEANIA                   -124.765      4.198    -29.722    0.000  -133.146   -116.384
years_since_1950:continentASIA       -0.120      0.119     -1.014    0.314    -0.357      0.117
years_since_1950:continentEUROPE      1.126      0.119      9.484    0.000     0.889      1.363
years_since_1950:continentLAC         0.153      0.119      1.286    0.203    -0.084      0.390
years_since_1950:continentNA          1.592      0.119     13.408    0.000     1.355      1.829
years_since_1950:continentOCEANIA     1.418      0.119     11.946    0.000     1.181      1.655

3.1 Statistical interpretation

  • Interpret the output of your table using statistical language.

The regression table uses Africa as the baseline continent, since it comes first alphabetically among the continent categories. The table shows that the associated effect of year on the infant mortality rate (per 1000 births) differs depending on the continent. Furthermore, the regression indicates that in every continent, the infant mortality rate has decreased since 1950.

  • Tie in the results of the table with the results of your exploratory data analysis.

Since Africa is the point of reference in the regression table, its intercept and slope can be read directly from the table. As the visualizations show, it has the highest intercept, at around 179, and one of the most negative slopes, at approximately -2.03, which is slightly less negative than the slope of Asia’s regression line.

The results of the table mirror the results seen in our exploratory data analysis. The line for continent = NA (North America) has the shallowest slope (approximately -0.44) and the lowest intercept (around 28), which means that its infant mortality rate in 1950 was the lowest of the continents and changed the least over time.

As shown by our exploratory data analysis and verified by the values of the regression table, the line for continent = ASIA has the steepest, most negative slope (approximately -2.15), which means that its infant mortality rate has been decreasing at the fastest rate of any continent. The intercept for continent = ASIA is the closest to that of Africa: around 151 infant deaths per 1000 births, compared with approximately 179 for Africa.

The slopes of the lines for continent = NA (North America) and continent = OCEANIA are quite similar, differing by only 0.17. North America has a lower intercept, however, compared to Oceania.

The intercepts of the regression lines for continent = EUROPE and continent = OCEANIA are very similar: around 51 for Europe and around 54 for Oceania. However, Europe has a more negative slope than Oceania both visually and statistically (-0.90 vs. -0.61).

The intercept and slope of the regression line for continent = LAC (Latin America and the Caribbean) are consistent with the visualizations. The slope is very similar to that of Africa, less negative by just 0.153, and the intercept is around 120.
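The continent-specific intercepts and slopes quoted above can be reproduced from the regression table by adding each continent's offsets to the Africa baseline. The following is a minimal Python sketch of that arithmetic. The coefficients are copied from the table, while the names `OFFSETS` and `continent_lines` are hypothetical and not part of the original analysis.

```python
# Baseline (Africa) coefficients, copied from the regression table.
BASE_INTERCEPT = 178.666
BASE_SLOPE = -2.027

# (intercept offset, slope offset) for each continent relative to Africa,
# also copied from the regression table.
OFFSETS = {
    "AFRICA": (0.0, 0.0),
    "ASIA": (-27.974, -0.120),
    "EUROPE": (-127.629, 1.126),
    "LAC": (-58.632, 0.153),
    "NA": (-150.705, 1.592),
    "OCEANIA": (-124.765, 1.418),
}


def continent_lines():
    """Return {continent: (intercept, slope)} for each fitted line."""
    return {
        name: (round(BASE_INTERCEPT + d_int, 3), round(BASE_SLOPE + d_slope, 3))
        for name, (d_int, d_slope) in OFFSETS.items()
    }


if __name__ == "__main__":
    # Print continents from steepest to shallowest decline.
    ordered = sorted(continent_lines().items(), key=lambda kv: kv[1][1])
    for name, (intercept, slope) in ordered:
        print(f"{name:8s} intercept={intercept:8.3f}  slope={slope:7.3f}")
```

Sorting by slope puts the continents in order from fastest to slowest decline, which is the ordering used in the non-statistical interpretation.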

  • Discuss (any) potential limitations of your analysis.

While we can associate decreases in global infant mortality with the passage of time, this analysis does not allow us to attribute those changes to the specific medical advancements or healthcare initiatives introduced in any given year. Instead, it provides an initial quantitative picture that can guide later qualitative work linking particular years to the implementation of specific initiatives.

3.2 Non-statistical interpretation

  • Explain the preliminary results of your model using language meant for a non-statistically trained audience.

Our preliminary results show that infant mortality rates have decreased since 1950 in all continents. Infant mortality has been decreasing most rapidly in Asia, and least rapidly in North America and Oceania. Although Africa remains the continent with the highest infant mortality rate, as was true in 1950, it has the second most rapid decrease in infant mortality across this span. Our regression suggests that in 1950, approximately 18% of babies born in Africa died as infants (about 179 per 1000 births). By 2010, this percentage had fallen to roughly 6% (about 57 per 1000 births). Latin America and the Caribbean ranked third in rate of decrease, followed by Europe, Oceania, and North America. These results suggest that significant progress has been made in decreasing infant mortality globally. However, gaps in IMR between continents still existed in 2010, and greater efforts are necessary to protect the lives of infants in all continents.
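As a rough check on the fitted values for Africa, the regression line can be evaluated at the start and end of the study period and converted from deaths per 1000 births to a percentage. This is a minimal Python sketch that assumes the baseline intercept and slope from the regression table. The function name `predicted_imr_africa` is hypothetical.

```python
def predicted_imr_africa(year, intercept=178.666, slope=-2.027):
    """Fitted IMR (deaths per 1000 births) for Africa, the baseline continent.

    The intercept and slope are copied from the regression table; the
    explanatory variable is years since 1950.
    """
    return intercept + slope * (year - 1950)


if __name__ == "__main__":
    for year in (1950, 2010):
        imr = predicted_imr_africa(year)
        # Dividing a per-1000 rate by 10 expresses it as a percentage of births.
        print(f"{year}: {imr:.1f} per 1000 births ({imr / 10:.1f}% of births)")
```

These are values on the fitted line, not observed rates, so they can differ from the raw figures for any single period.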


4 Inference for multiple regression

Note: This section is to be skipped for the initial submission and completed for the re-submission.

  • Interpret:
    1. All confidence intervals emphasizing the “practical significance” of the results
    2. All p-values emphasizing the “statistical significance” of the results
  • Get all the regression points and conduct a residual analysis and its implications for the interpretations.

5 Conclusion

Note: This section is to be skipped for the initial submission and completed for the re-submission.

  • Summarize the results of all analyses
  • Emphasize the “take-home message” of your analysis
  • Discuss all limitations of this analysis and caveats to keep in mind.
  • Discuss potential future work.

6 Citations and References

Infant Mortality Rate (IMR)


Supplementary Materials

Optional: If you have any other materials that you think are interesting, but not directly relevant to the project. For example interesting observations or a cool visualization.

The following NIH link outlines the research activities and scientific advancements made by NIH funded research in the area of infant mortality. This review contextualizes our data within a larger framework of research advancements and introduces factors that explain variability in infant mortality rates internationally such as poverty, domestic and sexual violence, air pollution, levels of infection, average maternal age, etc. For example, while birth defects and genetic conditions are the highest contributors to child death in the US, infection is one of the leading causes for infant mortality internationally, due to diseases such as HIV and malaria.

NIH Funded Research to Decrease IMR