This data set was posted by Raj Kumar. Childmortality.org is an organization that publishes Child Mortality estimates for all the countries around the world. They provide all available data and the latest child mortality estimates for each country based on the research of the UN Inter-agency Group for Child Mortality Estimation.
The data is in a very wide format and contains six variables with values of interest. Each of these variables is concatenated with each year from 1950 to 2016, resulting in 405 columns. These key variables are:
(-) Under-5 (0-4 years) mortality
(-) Infant (0-1 years) mortality
(-) Neonatal (0-1 month) mortality
(-) Number of under-5 deaths
(-) Number of infant deaths
(-) Number of neonatal deaths
As suggested by Raj, we can read this data into a data frame and subset the data set to the median estimate for each country. We also need to handle null values in the data. Then we can convert the data into long format with four variables: country, year, category, and the respective value. This will make it easier to analyze the data.
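The subsetting and null handling described above can be sketched on a toy table. This is a minimal illustration, not the code used later in this post: the data frame `toy` and its values are invented, and only the column names Uncertainty.bounds. and IMR.2016 follow the layout of the real file.

```r
library(dplyr)

# Hypothetical stand-in for the wide file: one Lower/Median/Upper row per country.
toy <- data.frame(
  CountryName = rep(c("A", "B"), each = 3),
  Uncertainty.bounds. = rep(c("Lower", "Median", "Upper"), times = 2),
  IMR.2016 = c(10, 12, 15, NA, NA, NA),   # made-up values
  stringsAsFactors = FALSE
)

# Keep only the median estimate, then drop rows with no value.
median_only <- toy %>%
  filter(Uncertainty.bounds. == "Median") %>%
  filter(!is.na(IMR.2016))

median_only
```

Country B disappears here because its only median value is missing; whether to drop or keep such rows depends on the analysis.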
My goals with this data set are as follows:
(-) Load, Tidy, and transform the data.
(-) Map Infant Mortality Rate across the globe.
(-) Map and compare the change in Infant Mortality Rates.
(-) Create time series plots of Infant Mortality for the extremes.
library("tidyverse")
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.8.0 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("stringr")
library("DT")
library("rworldmap")
## Loading required package: sp
## ### Welcome to rworldmap ###
## For a short introduction type : vignette('rworldmap')
data <- read.csv("data/ChildMortality.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
str(data)
## 'data.frame': 585 obs. of 405 variables:
## $ ISO.Code : chr "AFG" "AFG" "AFG" "AGO" ...
## $ CountryName : chr "Afghanistan" "Afghanistan" "Afghanistan" "Angola" ...
## $ Uncertainty.bounds. : chr "Lower" "Median" "Upper" "Lower" ...
## $ U5MR.1950 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1951 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1952 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1953 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1954 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1955 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1956 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1957 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1958 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1959 : num NA NA NA NA NA NA NA NA NA NA ...
## $ U5MR.1960 : num 308 364 427 NA NA ...
## $ U5MR.1961 : num 307 358 414 NA NA ...
## $ U5MR.1962 : num 306 352 404 NA NA ...
## $ U5MR.1963 : num 303 346 394 NA NA ...
## $ U5MR.1964 : num 299 340 386 NA NA ...
## $ U5MR.1965 : num 295 334 378 NA NA ...
## $ U5MR.1966 : num 290 328 372 NA NA ...
## $ U5MR.1967 : num 286 323 365 NA NA ...
## $ U5MR.1968 : num 281 317 359 NA NA ...
## $ U5MR.1969 : num 276 312 353 NA NA ...
## $ U5MR.1970 : num 270 306 348 NA NA ...
## $ U5MR.1971 : num 265 300 342 NA NA ...
## $ U5MR.1972 : num 261 295 336 NA NA ...
## $ U5MR.1973 : num 255 289 330 NA NA ...
## $ U5MR.1974 : num 250 283 323 NA NA ...
## $ U5MR.1975 : num 245 277 316 NA NA ...
## $ U5MR.1976 : num 240 271 308 NA NA ...
## $ U5MR.1977 : num 235 264 301 NA NA ...
## $ U5MR.1978 : num 230 258 293 NA NA ...
## $ U5MR.1979 : num 224 252 285 NA NA ...
## $ U5MR.1980 : num 218 245 277 181 236 ...
## $ U5MR.1981 : num 213 238 269 185 234 ...
## $ U5MR.1982 : num 207 232 262 188 230 ...
## $ U5MR.1983 : num 201 225 253 190 228 ...
## $ U5MR.1984 : num 196 218 245 192 225 ...
## $ U5MR.1985 : num 190 211 237 193 224 ...
## $ U5MR.1986 : num 184 204 228 194 223 ...
## $ U5MR.1987 : num 179 197 219 195 222 ...
## $ U5MR.1988 : num 173 190 210 196 221 ...
## $ U5MR.1989 : num 168 184 202 197 221 ...
## $ U5MR.1990 : num 162 177 194 198 221 ...
## $ U5MR.1991 : num 157 171 186 198 222 ...
## $ U5MR.1992 : num 152 165 180 199 223 ...
## $ U5MR.1993 : num 147 160 173 200 224 ...
## $ U5MR.1994 : num 142 154 168 201 224 ...
## $ U5MR.1995 : num 138 150 162 200 223 ...
## $ U5MR.1996 : num 134 145 157 199 222 ...
## $ U5MR.1997 : num 130 141 152 196 220 ...
## $ U5MR.1998 : num 127 137 148 192 216 ...
## $ U5MR.1999 : num 124 133 144 188 212 ...
## $ U5MR.2000 : num 120 130 140 183 207 ...
## $ U5MR.2001 : num 117 126 136 176 201 ...
## $ U5MR.2002 : num 113 122 132 170 194 ...
## $ U5MR.2003 : num 110 118 128 161 186 ...
## $ U5MR.2004 : num 106 114 124 152 177 ...
## $ U5MR.2005 : num 102 110 120 141 167 ...
## $ U5MR.2006 : num 98.4 106.3 115 129.2 157.5 ...
## $ U5MR.2007 : num 94.4 102.2 110.8 117.2 147.6 ...
## $ U5MR.2008 : num 90.3 98.2 106.6 105.6 137.9 ...
## $ U5MR.2009 : num 86.1 94.1 102.9 94.3 128.3 ...
## $ U5MR.2010 : num 81.7 90.2 99.4 83.1 119.4 ...
## $ U5MR.2011 : num 77.3 86.4 96.3 73 111 ...
## $ U5MR.2012 : num 72.7 82.8 93.4 63.8 103.5 ...
## $ U5MR.2013 : num 68.2 79.3 90.7 56.1 96.8 ...
## $ U5MR.2014 : num 64 76.1 88.1 50 91.2 ...
## $ U5MR.2015 : num 60.2 73.2 86.1 45 86.5 ...
## $ U5MR.2016 : num 56.6 70.4 84.7 41.2 82.5 147 7.3 13.5 24.8 1.6 ...
## $ IMR.1950 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1951 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1952 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1953 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1954 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1955 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1956 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1957 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1958 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1959 : num NA NA NA NA NA NA NA NA NA NA ...
## $ IMR.1960 : num 206 246 292 NA NA ...
## $ IMR.1961 : num 206 241 283 NA NA ...
## $ IMR.1962 : num 205 237 275 NA NA ...
## $ IMR.1963 : num 203 233 268 NA NA ...
## $ IMR.1964 : num 200 228 262 NA NA ...
## $ IMR.1965 : num 197 224 256 NA NA ...
## $ IMR.1966 : num 194 220 252 NA NA ...
## $ IMR.1967 : num 191 216 247 NA NA ...
## $ IMR.1968 : num 187 213 242 NA NA ...
## $ IMR.1969 : num 184 209 238 NA NA ...
## $ IMR.1970 : num 180 205 234 NA NA ...
## $ IMR.1971 : num 177 201 230 NA NA ...
## $ IMR.1972 : num 174 197 226 NA NA ...
## $ IMR.1973 : num 170 193 221 NA NA ...
## $ IMR.1974 : num 167 189 217 NA NA ...
## $ IMR.1975 : num 164 185 212 NA NA ...
## $ IMR.1976 : num 160 181 207 NA NA ...
## $ IMR.1977 : num 157 176 201 NA NA ...
## $ IMR.1978 : num 153 172 196 NA NA ...
## [list output truncated]
The structure reveals a data set with 405 variables. Variables of the form Under.five.Deaths.1955 will pose a problem for analysis because each one encodes two variables, category and year.
Use the gather function to convert variables of the form variable.name.year to rows. This should result in a variable called YearCat (preserving the current format) and a variable called Rate.
data <- gather(data, "YearCat", "Rate", 4:405)
Split the YearCat variable into Category and Year.
# Category is everything before the trailing ".YYYY"; Year is the last four characters
data$Category <- str_sub(data$YearCat,1,-6)
data$Year <- strtoi(str_sub(data$YearCat, -4))
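The same reshaping can also be sketched in a single step with pivot_longer, which replaces gather in newer tidyr releases. This is an alternative, not the approach used in this post, and it assumes tidyr 1.1.0 or later (the session above loads 0.8.0). The data frame `wide` and its values are invented for illustration. The regular expression splits each column name at its last dot, which matters because names such as Under.five.Deaths.1955 contain several dots.

```r
library(tidyr)

# Hypothetical miniature of the wide file; the numbers are made up.
wide <- data.frame(
  ISO.Code = c("AAA", "BBB"),
  IMR.1980 = c(100, 50),
  IMR.2016 = c(40, 20),
  Under.five.Deaths.1980 = c(900, 300),
  Under.five.Deaths.2016 = c(400, 100),
  stringsAsFactors = FALSE
)

# Reshape to long format and split "<category>.<year>" at the final dot.
long <- pivot_longer(
  wide,
  cols = -ISO.Code,
  names_to = c("Category", "Year"),
  names_pattern = "^(.*)\\.(\\d{4})$",
  names_transform = list(Year = as.integer),
  values_to = "Rate"
)

long
```

Each country contributes one row per category and year, so the two toy rows become eight.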
datatable( tail(data), options = list(filter = FALSE),filter="none" )
# filter the data to match criteria
mdata <- filter(data, `Category` == "IMR", `Uncertainty.bounds.` == "Median", Year == 1980)
# match country polygon to country code in data set
sPDF <- joinCountryData2Map( mdata,joinCode = "ISO3", nameJoinColumn = "ISO.Code")
## 195 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 48 codes from the map weren't represented in your data
# draw the map without a legend, then add a custom legend from the returned parameters
mapParams <- mapCountryData(sPDF,nameColumnToPlot='Rate',
                            missingCountryCol = NA,
                            addLegend = FALSE,
                            mapTitle = "Infant Mortality Rate Per 1000 \n Year: 1980")
do.call( addMapLegend, c( mapParams, legendLabels = "all", legendWidth = 1.5 ) )
# filter the data to match criteria
mdata <- filter(data, `Category` == "IMR", `Uncertainty.bounds.` == "Median", Year == 2016)
# match country polygon to country code in data set
sPDF <- joinCountryData2Map( mdata,joinCode = "ISO3", nameJoinColumn = "ISO.Code")
## 195 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 48 codes from the map weren't represented in your data
mapParams <- mapCountryData(sPDF,nameColumnToPlot='Rate',
                            missingCountryCol = NA,
                            addLegend = FALSE,
                            mapTitle = "Infant Mortality Rate Per 1000 \n Year: 2016")
do.call( addMapLegend, c( mapParams, legendLabels = "all", legendWidth = 1.5 ) )
The maps show a stunning decrease in Infant Mortality Rates from 1980 to 2016. This is evident from the legend scales: the minimum fell from 7.1 to 1.6 and the maximum from 177 to 88.5. Africa is still plagued by relatively high infant mortality rates after all those years.
Europe, North America (excluding Mexico), Russia, and Australia have the lowest rates. China is the only “world power” with questionable Infant Mortality Rates.
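The scale extremes read off the map legends could also be computed directly from the long-format data. The sketch below shows one way to do that with dplyr. The data frame `toy` and its rates are invented, so the printed numbers are not the real 1980 and 2016 extremes.

```r
library(dplyr)

# Hypothetical long-format rows; the rates are made up for illustration.
toy <- data.frame(
  ISO.Code = c("AAA", "BBB", "CCC", "AAA", "BBB", "CCC"),
  Year = c(1980L, 1980L, 1980L, 2016L, 2016L, 2016L),
  Rate = c(150, 40, NA, 60, 10, 5)
)

# Minimum and maximum rate per year, ignoring missing values.
extremes <- toy %>%
  group_by(Year) %>%
  summarise(min_rate = min(Rate, na.rm = TRUE),
            max_rate = max(Rate, na.rm = TRUE))

extremes
```

Applied to the real data, the same summary would be run on the rows where Category is "IMR" and Uncertainty.bounds. is "Median".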
# calculate the difference in IMR between 1980 and 2016
# a positive Rate means the IMR fell between 1980 and 2016
mData <- select(data, -YearCat) %>%
  filter( `Category` == "IMR", `Uncertainty.bounds.` == "Median", Year == 1980 | Year == 2016 ) %>%
  spread( Year, Rate ) %>%
  mutate( Rate = `1980` - `2016` )
# match country polygon to country code in data set
sPDF <- joinCountryData2Map( data.frame(mData),joinCode = "ISO3", nameJoinColumn = "ISO.Code" )
## 195 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 48 codes from the map weren't represented in your data
mapParams <- mapCountryData(sPDF,nameColumnToPlot='Rate',
                            numCats = 50,
                            missingCountryCol = NA,
                            colourPalette = 'diverging',
                            addLegend = FALSE,
                            mapTitle = "Change in Infant Mortality Rate Per 1000 \n Year: 1980 - 2016")
do.call( addMapLegend, c( mapParams, legendLabels = "all", legendWidth = 1.5 ) )
The map shows that the largest decreases in Infant Mortality Rates were in countries with previously very high rates. These are concentrated in Africa and the Middle East. However, the map does a poor job of displaying countries with increases in Infant Mortality Rates. The map legend suggests that at least one country had an increase in Infant Mortality of 16.3.
datatable(filter(mData, Rate < 0 ), options = list(filter = FALSE),filter="none")
Dominica is the only country that had a higher Infant Mortality Rate in 2016 than in 1980!
pData <- filter(data,
`ISO.Code` == arrange( mData, `Rate` )$`ISO.Code`[1] | `ISO.Code` == arrange( mData, desc( `Rate` ) )$`ISO.Code`[1],
`Category` == "IMR",
`Uncertainty.bounds.` == "Median",
!is.na(`Rate`) )
ggplot( pData, aes(x = Year, y = Rate) ) +
labs( title = "Infant Mortality Rate" ) +
geom_line( aes(color = `ISO.Code`), size = 1 )
The plots are almost the inverse of each other. Dominica’s rate stabilized in the 1980s but began increasing at an alarming rate around 2005. Mozambique’s rate trended downwards throughout. Mozambique probably had the largest decrease because of its very high initial rate.
Much more can be done with this data set. It would be interesting to bring in related financial, economic, and management data to gauge the effectiveness of the methodologies used to combat Infant Mortality.
Tools in the Tidyverse were particularly useful when this data set ballooned from 585 to over 200,000 observations.
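The size of the long data set follows from the dimensions reported by str(data) and the column range 4:405 passed to gather. The short calculation below checks that arithmetic. The variable names are only illustrative, and the split into six variables per year is taken from the description of the data at the top of this post.

```r
n_rows_wide <- 585   # observations reported by str(data)
n_cols_wide <- 405   # variables reported by str(data)
n_id_cols   <- 3     # ISO.Code, CountryName, Uncertainty.bounds.

# Columns 4:405 are the value columns that gather turns into rows.
n_value_cols <- n_cols_wide - n_id_cols
n_rows_long  <- n_rows_wide * n_value_cols

# Six variables of interest, so this is the number of years covered (1950 to 2016).
n_years <- n_value_cols / 6

c(value_cols = n_value_cols, years = n_years, long_rows = n_rows_long)
```

This gives 402 value columns, 67 years, and 235,170 rows before any missing values are removed.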