I have chosen to analyze UNICEF's under-5 mortality data.
It reports the under-5 mortality rate (deaths per 1,000 live births) for all countries from 1950 to 2015.
We will investigate how the countries stack up against each other and which countries have improved their mortality rate the most.
We load all required libraries:
library(httr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(stringr)
library(tufte)
We download the CSV from my GitHub account, converting empty and blank cells to NA to make the cleanup easier later on:
urlRemote <- "https://raw.githubusercontent.com/"
pathGithub <- "chilleundso/DATA607/master/Project2/"
fileName <- "unicef.csv"
#reading csv as a data frame making any potential empty cells into N/A
mortal <- read.csv(paste0(urlRemote, pathGithub, fileName),header = TRUE, na.strings=c(""," ","NA"))
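The effect of na.strings can be checked on a small inline CSV. The following is only a minimal sketch: the two-row table and the names csv_text and toy are made up for illustration and are not part of the UNICEF file.

```r
# Hypothetical two-row CSV, passed inline through the text argument
csv_text <- "CountryName,U5MR.1950,U5MR.1951
Algeria,,251
Norway,32.8,30.8"

toy <- read.csv(text = csv_text, header = TRUE, na.strings = c("", " ", "NA"))

# The empty 1950 cell for Algeria is read as NA, and the column stays numeric
is.na(toy$U5MR.1950)
```

Because the missing cells become NA rather than empty strings, the year columns are read as numeric and can be averaged later with na.rm = TRUE.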
The raw data looks as follows (because the table is very wide, we will always show only the first and last 5 years of the data set):
kable(head(mortal[,c(1:6, (ncol(mortal)-4):ncol(mortal))]))
| CountryName | U5MR.1950 | U5MR.1951 | U5MR.1952 | U5MR.1953 | U5MR.1954 | U5MR.2011 | U5MR.2012 | U5MR.2013 | U5MR.2014 | U5MR.2015 |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | NA | NA | NA | NA | NA | 102.3 | 99.5 | 96.7 | 93.9 | 91.1 |
| Albania | NA | NA | NA | NA | NA | 16.0 | 15.5 | 14.9 | 14.4 | 14.0 |
| Algeria | NA | NA | NA | NA | 251 | 26.6 | 26.1 | 25.8 | 25.6 | 25.5 |
| Andorra | NA | NA | NA | NA | NA | 3.2 | 3.1 | 3.0 | 2.9 | 2.8 |
| Angola | NA | NA | NA | NA | NA | 177.3 | 172.2 | 167.1 | 162.2 | 156.9 |
| Antigua & Barbuda | NA | NA | NA | NA | NA | 9.5 | 9.1 | 8.7 | 8.4 | 8.1 |
We reformat the header so that it shows proper years:
#delete everything up to and including the last . in each column name
names(mortal) <- gsub("^.*\\.","", names(mortal))
kable(head(mortal[,c(1:6, (ncol(mortal)-4):ncol(mortal))]))
| CountryName | 1950 | 1951 | 1952 | 1953 | 1954 | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | NA | NA | NA | NA | NA | 102.3 | 99.5 | 96.7 | 93.9 | 91.1 |
| Albania | NA | NA | NA | NA | NA | 16.0 | 15.5 | 14.9 | 14.4 | 14.0 |
| Algeria | NA | NA | NA | NA | 251 | 26.6 | 26.1 | 25.8 | 25.6 | 25.5 |
| Andorra | NA | NA | NA | NA | NA | 3.2 | 3.1 | 3.0 | 2.9 | 2.8 |
| Angola | NA | NA | NA | NA | NA | 177.3 | 172.2 | 167.1 | 162.2 | 156.9 |
| Antigua & Barbuda | NA | NA | NA | NA | NA | 9.5 | 9.1 | 8.7 | 8.4 | 8.1 |
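To see what the regular expression does to individual column names, here is a small sketch on a hand-written vector. The names raw_names and clean_names are introduced only for this illustration.

```r
# Column names as they appear in the raw header
raw_names <- c("CountryName", "U5MR.1950", "U5MR.2015")

# "^.*\\." greedily matches from the start of the string up to the last literal dot;
# names without a dot, such as CountryName, do not match and are left unchanged
clean_names <- gsub("^.*\\.", "", raw_names)
clean_names
```

Note that the cleaned year names are still character strings, which is why they are converted with as.numeric before plotting later on.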
Next we look at the 5 countries with the highest child mortality in 2015, all of which are African countries:
#sort descending by the 2015 rate and keep the 5 highest
maxmort <- arrange(mortal, desc(`2015`))[1:5,]
kable(maxmort[c(1:6, (ncol(maxmort)-4):ncol(maxmort))])
| CountryName | 1950 | 1951 | 1952 | 1953 | 1954 | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|
| Angola | NA | NA | NA | NA | NA | 177.3 | 172.2 | 167.1 | 162.2 | 156.9 |
| Chad | NA | NA | NA | NA | NA | 156.0 | 151.6 | 147.1 | 142.9 | 138.7 |
| Somalia | NA | NA | NA | NA | NA | 155.3 | 150.6 | 146.1 | 141.2 | 136.8 |
| Central African Republic | NA | NA | NA | NA | NA | 146.2 | 142.1 | 138.5 | 134.0 | 130.1 |
| Sierra Leone | NA | NA | NA | NA | NA | 150.6 | 141.6 | 133.4 | 126.4 | 120.4 |
In contrast, the 5 countries with the lowest child mortality in 2015 are all in Europe:
#sort ascending by the 2015 rate and keep the 5 lowest
minmort <- arrange(mortal, `2015`)[1:5,]
kable(minmort[c(1:6, (ncol(minmort)-4):ncol(minmort))])
| CountryName | 1950 | 1951 | 1952 | 1953 | 1954 | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|
| Luxembourg | NA | NA | NA | NA | NA | 2.3 | 2.1 | 2.0 | 2.0 | 1.9 |
| Iceland | NA | 29.2 | 27.8 | 26.5 | 25.3 | 2.3 | 2.2 | 2.1 | 2.1 | 2.0 |
| Finland | NA | 42.0 | 40.5 | 38.9 | 37.3 | 2.9 | 2.7 | 2.6 | 2.4 | 2.3 |
| Norway | 32.8 | 30.8 | 29.1 | 27.7 | 26.6 | 3.1 | 3.0 | 2.8 | 2.7 | 2.6 |
| Slovenia | NA | NA | NA | NA | NA | 3.1 | 3.0 | 2.8 | 2.7 | 2.6 |
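The two rankings above could also be written with dplyr's slice_max and slice_min (available from dplyr 1.0.0), which combine the sorting and the row selection in one call. This is a sketch on a made-up toy table, not the code used for the tables above; the countries A to D and their rates are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the rates are made up
toy <- data.frame(
  CountryName = c("A", "B", "C", "D"),
  `2015` = c(120.4, 2.6, 31.8, 91.1),
  check.names = FALSE  # keep "2015" as a column name instead of "X2015"
)

# Two highest and two lowest rates; backticks are needed because the
# column name starts with a digit
highest <- slice_max(toy, order_by = `2015`, n = 2)
lowest  <- slice_min(toy, order_by = `2015`, n = 2)
```

Here highest contains A and D (in that order) and lowest contains B and C.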
Strictly speaking, a simple average of all child mortality rates is problematic for two reasons. First, countries start reporting their rates at different times, so a newly reporting country whose rate is far from the current mean shifts the average and gives the impression of a change where none occurred. Second, an unweighted average implicitly assumes that all countries have the same population, which is clearly not true. Nevertheless, we still consider the average over all countries' mortality rates to be an interesting indicator:
#use summarise to create global view
avg_mortal <- round(mortal %>%
summarise_if(is.numeric, mean, na.rm = TRUE),1)
kable(avg_mortal[,c(1:5, (ncol(avg_mortal)-4):ncol(avg_mortal))])
| 1950 | 1951 | 1952 | 1953 | 1954 | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|---|---|
| 151.7 | 158.1 | 160.9 | 164.4 | 154.8 | 36.8 | 35.4 | 34.1 | 32.9 | 31.8 |
#create long data in order to plot the global mean
avg_mortal_long <- tidyr::gather(avg_mortal, "year", "AvgMortRate")
avg_mortal_long$year <- as.numeric(avg_mortal_long$year)
colors <- c("Average Mortality Rate" = "black")
ggplot(avg_mortal_long, aes(x = year)) +
geom_line(aes(y = AvgMortRate, color = "Average Mortality Rate",group = 1), size = 1.2) +
labs(x = "year",
y = "average mortality rate",
color = "Legend") +
scale_color_manual(values = colors) +
ggtitle("Average Mortality Rate") +
theme(plot.title = element_text(hjust = 0.5)) +
scale_x_continuous(breaks=seq(1950, 2015,5))
We can clearly see that the unweighted average under-5 mortality rate has fallen drastically since the early 1950s, from around 160 per 1,000 to just above 30 in 2015.
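The two caveats about the unweighted average can be made concrete with a toy example. All numbers below are made up, and the population weights are purely hypothetical (the UNICEF file used here contains no population data), so this is only a sketch of the effect and not a corrected global estimate.

```r
# Hypothetical rates (per 1,000) and populations; all numbers are made up
rates <- c(A = 20, B = 30, C = 200)
pop   <- c(A = 50, B = 40, C = 10)   # e.g. millions of children

# Caveat 1: C starts reporting later, so the mean jumps although no rate changed
mean_before <- mean(rates[c("A", "B")])   # only A and B report: 25
mean_after  <- mean(rates)                # C joins the sample: about 83.3

# Caveat 2: weighting by population gives a different picture: 42
mean_weighted <- weighted.mean(rates, w = pop)
```

In this example the unweighted mean more than triples purely because a high-mortality country enters the sample, while the population-weighted mean stays much closer to the rates of the two large countries.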
Next we investigate the largest movers between 1990 and 2015. We chose this time frame because all countries have complete data from 1990 onwards.
#make long data with gather in order to investigate
mortal_long <- tidyr::gather(mortal, "Year", "Mortality", -CountryName)
#kable(head(mortal_long))
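The statement that the data are complete from 1990 onwards can be checked by counting missing values per year column. The sketch below does this on a hypothetical wide table with the same layout as mortal; the values and the names toy, year_cols and complete_years are assumptions made for the example.

```r
# Hypothetical wide toy table with the same layout as mortal
toy <- data.frame(
  CountryName = c("A", "B"),
  `1989` = c(NA, 50.0),
  `1990` = c(40.6, 48.2),
  `2015` = c(14.0, 20.1),
  check.names = FALSE
)

# Year columns that contain no missing value for any country
year_cols <- setdiff(names(toy), "CountryName")
complete_years <- year_cols[colSums(is.na(toy[year_cols])) == 0]
complete_years
```

Applied to the real data, the first element of such a vector would show the earliest year for which every country reports a rate.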
We create a data frame with one row per country and one column for each of the two years:
#choose 1990 to 2015 as span (since complete)
mortal_span <- mortal_long %>%
dplyr::filter(Year == '1990'| Year == '2015')
#move the years into 2 columns
mortal_span <- tidyr::spread(mortal_span,Year,Mortality)
kable(head(mortal_span))
| CountryName | 1990 | 2015 |
|---|---|---|
| Afghanistan | 181.0 | 91.1 |
| Albania | 40.6 | 14.0 |
| Algeria | 46.8 | 25.5 |
| Andorra | 8.5 | 2.8 |
| Angola | 226.0 | 156.9 |
| Antigua & Barbuda | 25.5 | 8.1 |
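Since tidyr 1.0.0, gather and spread are superseded by pivot_longer and pivot_wider. The same reshaping could be sketched with the newer functions as follows; the toy table is hypothetical, and its 2000 column contains made-up values that are included only to show the filtering step.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table; the 2000 values are made up
toy <- data.frame(
  CountryName = c("Albania", "Andorra"),
  `1990` = c(40.6, 8.5),
  `2000` = c(30.0, 5.0),
  `2015` = c(14.0, 2.8),
  check.names = FALSE
)

toy_span <- toy %>%
  # wide to long: one row per country and year
  pivot_longer(cols = -CountryName, names_to = "Year", values_to = "Mortality") %>%
  # keep only the two years that define the span
  filter(Year %in% c("1990", "2015")) %>%
  # long to wide: one column per remaining year
  pivot_wider(names_from = Year, values_from = Mortality)
```

The result has the columns CountryName, 1990 and 2015, matching the layout of mortal_span.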
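We then add 2 columns: one for the relative change (expressed as a fraction of the 1990 rate) and one for the absolute change. In both columns, positive values indicate a reduction in mortality.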
#create percentage change column
mortal_span$percchange <- (mortal_span$'1990' - mortal_span$'2015') / mortal_span$'1990'
#create absolute change column
mortal_span$abschange <- (mortal_span$'1990' - mortal_span$'2015')
kable(head(mortal_span))
| CountryName | 1990 | 2015 | percchange | abschange |
|---|---|---|---|---|
| Afghanistan | 181.0 | 91.1 | 0.4966851 | 89.9 |
| Albania | 40.6 | 14.0 | 0.6551724 | 26.6 |
| Algeria | 46.8 | 25.5 | 0.4551282 | 21.3 |
| Andorra | 8.5 | 2.8 | 0.6705882 | 5.7 |
| Angola | 226.0 | 156.9 | 0.3057522 | 69.1 |
| Antigua & Barbuda | 25.5 | 8.1 | 0.6823529 | 17.4 |
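The two change columns can also be computed in a single dplyr mutate call. The sketch below uses the Afghanistan and Albania values from the table above; the extra percent column is an addition for readability and is not part of the original analysis.

```r
library(dplyr)

# Values for Afghanistan and Albania taken from the table above
span <- data.frame(
  CountryName = c("Afghanistan", "Albania"),
  `1990` = c(181.0, 40.6),
  `2015` = c(91.1, 14.0),
  check.names = FALSE
)

span <- span %>%
  mutate(
    abschange  = `1990` - `2015`,            # reduction in deaths per 1,000
    percchange = abschange / `1990`,         # reduction as a fraction of the 1990 rate
    percent    = round(100 * percchange, 1)  # same value expressed in percent
  )
```

For Afghanistan this gives (181.0 - 91.1) / 181.0, or roughly a 49.7% reduction, which agrees with the percchange value of 0.4966851 shown above.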
#show biggest absolute change
kable(head(arrange(mortal_span, desc(abschange))))
| CountryName | 1990 | 2015 | percchange | abschange |
|---|---|---|---|---|
| Niger | 328.2 | 95.5 | 0.7090189 | 232.7 |
| Liberia | 255.0 | 69.9 | 0.7258824 | 185.1 |
| Malawi | 242.4 | 64.0 | 0.7359736 | 178.4 |
| Mozambique | 239.7 | 78.5 | 0.6725073 | 161.2 |
| South Sudan | 253.2 | 92.6 | 0.6342812 | 160.6 |
| Ethiopia | 204.6 | 59.2 | 0.7106549 | 145.4 |
We can see that the largest absolute changes have happened in Africa, which is to be expected given that most African countries had a very high child mortality rate in our start year 1990 compared to the rest of the world.
#show smallest absolute change
kable(head(arrange(mortal_span, abschange)))
| CountryName | 1990 | 2015 | percchange | abschange |
|---|---|---|---|---|
| Niue | 13.8 | 23.0 | -0.6666667 | -9.2 |
| Dominica | 17.1 | 21.2 | -0.2397661 | -4.1 |
| Lesotho | 88.1 | 90.2 | -0.0238365 | -2.1 |
| Brunei | 12.2 | 10.2 | 0.1639344 | 2.0 |
| Seychelles | 16.5 | 13.6 | 0.1757576 | 2.9 |
| Canada | 8.3 | 4.9 | 0.4096386 | 3.4 |
Among the countries with the smallest change, a large proportion are island countries, and three countries (Niue, Dominica and Lesotho) even saw their mortality rate increase.
Niue and Dominica are especially interesting, so we plot their entire history below:
mortal_ND <- mortal_long %>%
dplyr::filter(CountryName == 'Niue'| CountryName == 'Dominica')
mortal_ND2 <- tidyr::spread(mortal_ND,CountryName,Mortality)
mortal_ND2$Year <- as.numeric(mortal_ND2$Year)
#defining our color scheme and legend names:
colors <- c("Niue" = "blue", "Dominica" = "green")
#using ggplot with two geom_lines:
ggplot(mortal_ND2, aes(x = Year)) +
geom_line(aes(y = Niue, color = "Niue"), size = 1.2) +
geom_line(aes(y = Dominica, color = "Dominica"), size = 1.2) +
labs(x = "Date",
y = "Under 5 year Mortality",
color = "Legend") +
scale_color_manual(values = colors) +
ggtitle("Under 5 year Mortality for Niue and Dominica") +
theme(plot.title = element_text(hjust = 0.5))
Dominica has a long reporting history, and we can see that the uptick in recent years is relatively muted compared to the large reduction before it.
For Niue the picture is less clear, so I did some research on this country of roughly 1,400 people. I found a detailed statistical report that attributes the pattern to uncertainty in the data, which can have a large impact when the overall population is very small:
“When aggregated over 5 years, under 5 mortality in Niue is shown to have increased slightly, since the earliest period as shown in the graph, although there are no clear trends. This however primarily reflects a growing level of uncertainty in the figures (95% confidence intervals are shown as the upright bars) due to a substantial decline in the overall number of births resulting in smaller denominator when calculating IMR and U5M rather than a true increase in childhood deaths. These figures clearly demonstrate the potential for uncertainty due to small numbers even when aggregated over several years, and the need for data interpretation when reporting mortality measures for policy.”
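The small-denominator effect described in the quote can be illustrated with exact binomial confidence intervals from base R's binom.test. The counts below are hypothetical and the method is an assumption chosen for illustration; it is not necessarily the one used in the Niue report.

```r
# Hypothetical counts: both populations have the same observed rate of 20 per 1,000
ci_small <- binom.test(x = 2,   n = 100)$conf.int * 1000    # 2 deaths in 100 births
ci_large <- binom.test(x = 200, n = 10000)$conf.int * 1000  # 200 deaths in 10,000 births

# Width of each 95% confidence interval, in deaths per 1,000
width_small <- diff(ci_small)
width_large <- diff(ci_large)
```

Although the observed rate is identical, the interval for the population with only 100 births is far wider, so a single additional death can move the estimated rate substantially.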
We have seen that Africa shows both the highest child mortality and the largest absolute improvement since 1990.
We also stumbled upon some interesting irregularities in the case of Niue, which can partially be explained by data uncertainty.
Source of the Niue report: https://prism.spc.int/images/VitalStatistics/Niue_VITAL_STATISTICS_REPORT-1.pdf