I have chosen the UNICEF under-5 mortality data set.

It covers the under-5 mortality rate (deaths of children under five per 1000 live births) of all countries from 1950 to 2015.

We will investigate how the countries stack up against each other and which countries have improved their mortality rate the most.


1) Loading required libraries and downloading data

We load all required libraries:

library(httr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(stringr)
library(tufte)

We download the CSV from my GitHub account, treating empty cells as NA to make the cleanup easier later on:

urlRemote  <- "https://raw.githubusercontent.com/"
pathGithub <- "chilleundso/DATA607/master/Project2/"
fileName   <- "unicef.csv"

#read the CSV as a data frame, turning any empty cells into NA
mortal <- read.csv(paste0(urlRemote, pathGithub, fileName),header = TRUE, na.strings=c(""," ","NA"))
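To illustrate what `na.strings` does, here is a minimal sketch on an inline CSV. The two-row sample and the names `sample_csv` and `sample_df` are made up for illustration and are not part of the UNICEF file.

```r
# Hypothetical two-row sample in the same layout as the raw file
sample_csv <- "CountryName,U5MR.1950,U5MR.1951
Afghanistan,,
Norway,32.8,30.8"

# Empty cells, single spaces and the literal string "NA" are all read as NA
sample_df <- read.csv(text = sample_csv, header = TRUE,
                      na.strings = c("", " ", "NA"))

# Count the missing values per column
colSums(is.na(sample_df))
```

The year columns stay numeric because the empty cells become NA instead of empty strings.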

The raw data looks as follows (due to the large dimensions we will always look at the first and last 5 years of the data set):

kable(head(mortal[,c(1:6, (ncol(mortal)-4):ncol(mortal))]))
CountryName U5MR.1950 U5MR.1951 U5MR.1952 U5MR.1953 U5MR.1954 U5MR.2011 U5MR.2012 U5MR.2013 U5MR.2014 U5MR.2015
Afghanistan NA NA NA NA NA 102.3 99.5 96.7 93.9 91.1
Albania NA NA NA NA NA 16.0 15.5 14.9 14.4 14.0
Algeria NA NA NA NA 251 26.6 26.1 25.8 25.6 25.5
Andorra NA NA NA NA NA 3.2 3.1 3.0 2.9 2.8
Angola NA NA NA NA NA 177.3 172.2 167.1 162.2 156.9
Antigua & Barbuda NA NA NA NA NA 9.5 9.1 8.7 8.4 8.1

We reformat the header so that it shows proper years:

#everything up to and including the . is deleted in the header
names(mortal) <- gsub("^.*\\.","", names(mortal))
kable(head(mortal[,c(1:6, (ncol(mortal)-4):ncol(mortal))]))
CountryName 1950 1951 1952 1953 1954 2011 2012 2013 2014 2015
Afghanistan NA NA NA NA NA 102.3 99.5 96.7 93.9 91.1
Albania NA NA NA NA NA 16.0 15.5 14.9 14.4 14.0
Algeria NA NA NA NA 251 26.6 26.1 25.8 25.6 25.5
Andorra NA NA NA NA NA 3.2 3.1 3.0 2.9 2.8
Angola NA NA NA NA NA 177.3 172.2 167.1 162.2 156.9
Antigua & Barbuda NA NA NA NA NA 9.5 9.1 8.7 8.4 8.1
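As a small check of the regular expression used above, the sketch below applies it to three of the header names. The vector names `raw_names` and `clean_names` are only used for this illustration.

```r
# Header names as they appear in the raw data
raw_names <- c("CountryName", "U5MR.1950", "U5MR.2015")

# "^.*\\." matches everything from the start up to and including the last
# literal dot; names without a dot do not match and are returned unchanged
clean_names <- gsub("^.*\\.", "", raw_names)
clean_names
```

Since names such as 2015 are not syntactic R names, the columns have to be referenced with quotes or backticks afterwards, which is why the code in this report writes `mortal$"2015"`.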

2) Highest and lowest child mortality

Next we look at the 5 countries with the highest child mortality in 2015, which are all African countries:

#sort descending by the 2015 rate and keep the 5 highest
maxmort <- arrange(mortal, desc(`2015`))[1:5,]
kable(maxmort[c(1:6, (ncol(maxmort)-4):ncol(maxmort))])
CountryName 1950 1951 1952 1953 1954 2011 2012 2013 2014 2015
Angola NA NA NA NA NA 177.3 172.2 167.1 162.2 156.9
Chad NA NA NA NA NA 156.0 151.6 147.1 142.9 138.7
Somalia NA NA NA NA NA 155.3 150.6 146.1 141.2 136.8
Central African Republic NA NA NA NA NA 146.2 142.1 138.5 134.0 130.1
Sierra Leone NA NA NA NA NA 150.6 141.6 133.4 126.4 120.4

In contrast, all 5 countries with the lowest child mortality in 2015 are in Europe:

#sort ascending by the 2015 rate and keep the 5 lowest
minmort <- arrange(mortal, `2015`)[1:5,]
kable(minmort[c(1:6, (ncol(minmort)-4):ncol(minmort))])
CountryName 1950 1951 1952 1953 1954 2011 2012 2013 2014 2015
Luxembourg NA NA NA NA NA 2.3 2.1 2.0 2.0 1.9
Iceland NA 29.2 27.8 26.5 25.3 2.3 2.2 2.1 2.1 2.0
Finland NA 42.0 40.5 38.9 37.3 2.9 2.7 2.6 2.4 2.3
Norway 32.8 30.8 29.1 27.7 26.6 3.1 3.0 2.8 2.7 2.6
Slovenia NA NA NA NA NA 3.1 3.0 2.8 2.7 2.6

3) Global Average

Strictly speaking, a simple average of all child mortality rates is problematic for two reasons. First, countries start reporting their rates at different times, so when a new country enters the data set the average can shift and give the impression of a change, even though all that happened is that a value far from the mean was added. Second, an unweighted average implicitly assumes that all countries have the same population, which is clearly not true. However, we still consider the average over all countries' mortality rates to be an interesting indicator:

#use summarise to create global view
avg_mortal <- round(mortal %>%
  summarise_if(is.numeric, mean, na.rm = TRUE),1)
kable(avg_mortal[,c(1:5, (ncol(avg_mortal)-4):ncol(avg_mortal))])
1950 1951 1952 1953 1954 2011 2012 2013 2014 2015
151.7 158.1 160.9 164.4 154.8 36.8 35.4 34.1 32.9 31.8
#create long data in order to plot the global mean
avg_mortal_long <- tidyr::gather(avg_mortal, "year", "AvgMortRate")
avg_mortal_long$year <- as.numeric(avg_mortal_long$year)

colors <- c("Average Mortality Rate" = "black")

ggplot(avg_mortal_long, aes(x = year)) +
    geom_line(aes(y = AvgMortRate, color = "Average Mortality Rate",group = 1), size = 1.2) +
    labs(x = "year",
         y = "average mortality rate",
         color = "Legend") +
    scale_color_manual(values = colors) +
    ggtitle("Average Mortality Rate") +
    theme(plot.title = element_text(hjust = 0.5)) +
    scale_x_continuous(breaks=seq(1950, 2015,5)) 

We can clearly see that the unweighted average under-5 mortality rate has decreased drastically, from around 160 per 1000 in the early 1950s to just over 30 in 2015.
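The two caveats from the start of this section can be illustrated with toy numbers. This is only a sketch: the countries A, B and C, their rates and the birth counts are hypothetical and not taken from the UNICEF data.

```r
# Hypothetical rates for three countries; C only starts reporting in year2
toy <- data.frame(
  country = c("A", "B", "C"),
  year1   = c(50, 100, NA),
  year2   = c(45, 90, 300)
)

# Unweighted mean over whichever countries report in each year
avg_all <- colMeans(toy[, c("year1", "year2")], na.rm = TRUE)

# Mean restricted to countries that report in every year (a fixed panel)
complete  <- complete.cases(toy[, c("year1", "year2")])
avg_panel <- colMeans(toy[complete, c("year1", "year2")])

avg_all    # year1 = 75, year2 = 145
avg_panel  # year1 = 75, year2 = 67.5

# Second caveat: weighting by (hypothetical) births instead of
# treating both countries as equally large
births <- c(A = 1000, B = 9000)
avg_weighted <- weighted.mean(c(45, 90), w = births)
avg_weighted  # 85.5
```

In this toy case every country improves, yet the unweighted average rises from 75 to 145 purely because C enters the data set.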

4) Biggest movers

We want to investigate the largest movers between 1990 and 2015. We chose this time frame since all countries have complete data from 1990 onward.

#make long data with gather in order to investigate
mortal_long <- tidyr::gather(mortal, "Year", "Mortality", -CountryName)
#kable(head(mortal_long))
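The tidyr documentation describes `gather()` as superseded by `pivot_longer()`. As a sketch of the same reshaping step with the newer function, the example below uses a two-country slice with values from the tables in this report; the names `wide` and `long` are only used for this illustration.

```r
library(tidyr)

# Two-country slice in the same wide layout as `mortal`
wide <- data.frame(
  CountryName = c("Albania", "Andorra"),
  `1990` = c(40.6, 8.5),
  `2015` = c(14.0, 2.8),
  check.names = FALSE  # keep the year column names as they are
)

# One row per country-year pair
long <- pivot_longer(wide, cols = -CountryName,
                     names_to = "Year", values_to = "Mortality")
long
```

The content is the same as with `gather()`, but the row order differs: `pivot_longer()` keeps the rows of each country together, while `gather()` stacks one year column after the other.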

We create a data frame with countries in the rows and one column for each of the two years:

#choose 1990 to 2015 as span (since complete)
mortal_span <- mortal_long %>% 
    dplyr::filter(Year == '1990'| Year == '2015')
#move the years into 2 columns
mortal_span <- tidyr::spread(mortal_span,Year,Mortality)
kable(head(mortal_span))
CountryName 1990 2015
Afghanistan 181.0 91.1
Albania 40.6 14.0
Algeria 46.8 25.5
Andorra 8.5 2.8
Angola 226.0 156.9
Antigua & Barbuda 25.5 8.1

We then add 2 columns: one for percentage and one for absolute change.

#create percentage change column
mortal_span$percchange <- (mortal_span$'1990' - mortal_span$'2015') / mortal_span$'1990'
#create absolute change column
mortal_span$abschange <- (mortal_span$'1990' - mortal_span$'2015')
kable(head(mortal_span))
CountryName 1990 2015 percchange abschange
Afghanistan 181.0 91.1 0.4966851 89.9
Albania 40.6 14.0 0.6551724 26.6
Algeria 46.8 25.5 0.4551282 21.3
Andorra 8.5 2.8 0.6705882 5.7
Angola 226.0 156.9 0.3057522 69.1
Antigua & Barbuda 25.5 8.1 0.6823529 17.4
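As a worked example of the two new columns, the sketch below recomputes the Afghanistan row by hand. The variable names `rate_1990` and `rate_2015` are only used for this illustration.

```r
# Afghanistan, values from the table above
rate_1990 <- 181.0
rate_2015 <- 91.1

# Absolute change: how much the rate per 1000 dropped since 1990
abschange <- rate_1990 - rate_2015

# Relative change: the share of the 1990 rate that was eliminated
percchange <- (rate_1990 - rate_2015) / rate_1990

round(abschange, 1)   # 89.9
round(percchange, 7)  # 0.4966851
```

Both columns are positive when mortality fell, and `percchange` is a fraction, so 0.4966851 corresponds to a reduction of roughly 50%.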
#show biggest absolute change
kable(head(arrange(mortal_span, desc(abschange))))
CountryName 1990 2015 percchange abschange
Niger 328.2 95.5 0.7090189 232.7
Liberia 255.0 69.9 0.7258824 185.1
Malawi 242.4 64.0 0.7359736 178.4
Mozambique 239.7 78.5 0.6725073 161.2
South Sudan 253.2 92.6 0.6342812 160.6
Ethiopia 204.6 59.2 0.7106549 145.4

We can see that the largest absolute changes happened in Africa, which is to be expected given that most African countries had a high child mortality rate in our start year 1990 compared to the rest of the world.

#show smallest absolute change
kable(head(arrange(mortal_span, abschange)))
CountryName 1990 2015 percchange abschange
Niue 13.8 23.0 -0.6666667 -9.2
Dominica 17.1 21.2 -0.2397661 -4.1
Lesotho 88.1 90.2 -0.0238365 -2.1
Brunei 12.2 10.2 0.1639344 2.0
Seychelles 16.5 13.6 0.1757576 2.9
Canada 8.3 4.9 0.4096386 3.4

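Among the countries with the smallest change we see a large proportion of island countries. For Niue, Dominica and Lesotho the change is negative, meaning the mortality rate was higher in 2015 than in 1990.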
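The rankings in this section are sorted by absolute change. As a sketch of how the order shifts when sorting by relative change instead, the example below re-ranks only the six countries with the largest absolute change listed earlier in this section. It says nothing about the ranking across all countries, and the name `top_abs` is only used for this illustration.

```r
# The six countries with the largest absolute change, values from this report
top_abs <- data.frame(
  CountryName = c("Niger", "Liberia", "Malawi",
                  "Mozambique", "South Sudan", "Ethiopia"),
  rate_1990   = c(328.2, 255.0, 242.4, 239.7, 253.2, 204.6),
  rate_2015   = c(95.5, 69.9, 64.0, 78.5, 92.6, 59.2)
)

# Relative reduction compared to the 1990 rate
top_abs$percchange <- (top_abs$rate_1990 - top_abs$rate_2015) / top_abs$rate_1990

# Sort by relative instead of absolute change
by_perc <- top_abs[order(top_abs$percchange, decreasing = TRUE), ]
by_perc$CountryName
```

Within this subset Malawi moves to the top, while Niger, which leads in absolute terms, falls to fourth place.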


5) Niue and Dominica

Niue and Dominica are particularly interesting, so we plot their entire history below:

mortal_ND <- mortal_long %>% 
    dplyr::filter(CountryName == 'Niue'| CountryName == 'Dominica')
mortal_ND2 <- tidyr::spread(mortal_ND,CountryName,Mortality)
mortal_ND2$Year <- as.numeric(mortal_ND2$Year)
#defining our color scheme and legend names:
colors <- c("Niue" = "blue", "Dominica" = "green")

#using ggplot with two geom_lines:
ggplot(mortal_ND2, aes(x = Year)) +
    geom_line(aes(y = Niue, color = "Niue"), size = 1.2) +
    geom_line(aes(y = Dominica, color = "Dominica"), size = 1.2) +
    labs(x = "Date",
         y = "Under 5 year Mortality",
         color = "Legend") +
    scale_color_manual(values = colors) +
    ggtitle("Under 5 year Mortality for Niue and Dominica") +
    theme(plot.title = element_text(hjust = 0.5))

Dominica has a long reporting history, and we can see that the uptick in recent years is relatively muted compared to the large reduction before it.

For Niue the picture is less clear, so I did some research on this country of about 1,400 people. I found a detailed statistical report which attributes the pattern to uncertainty in the data, which can have a large impact in a very small population:

“When aggregated over 5 years, under 5 mortality in Niue is shown to have increased slightly, since the earliest period as shown in the graph, although there are no clear trends. This however primarily reflects a growing level of uncertainty in the figures (95% confidence intervals are shown as the upright bars) due to a substantial decline in the overall number of births resulting in smaller denominator when calculating IMR and U5M rather than a true increase in childhood deaths. These figures clearly demonstrate the potential for uncertainty due to small numbers even when aggregated over several years, and the need for data interpretation when reporting mortality measures for policy.”

Niue Vital Statistics Report: 1987-2011
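The small-denominator effect described in the quote can be sketched with made-up counts. The number of births and deaths below is hypothetical and not taken from the Niue report.

```r
# Hypothetical count: very few live births in an aggregation period
births <- 100

# Rate per 1000 live births for 1, 2 and 3 deaths in that period
deaths <- c(1, 2, 3)
rate_per_1000 <- deaths * 1000 / births
rate_per_1000  # 10 20 30

# Exact 95% confidence interval for 2 deaths out of 100 births,
# scaled to deaths per 1000 live births
ci_per_1000 <- binom.test(x = 2, n = births)$conf.int * 1000
ci_per_1000
```

With so few births, a single additional death moves the rate by 10 per 1000, and the confidence interval is wide compared to the point estimate of 20.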

We have seen that Africa shows both the highest child mortality and the largest absolute improvement since 1990.

We stumbled upon some interesting irregularities in the case of Niue, which can partially be explained by data uncertainty.

source of the Niue report: https://prism.spc.int/images/VitalStatistics/Niue_VITAL_STATISTICS_REPORT-1.pdf