UNICEF dataset on Under 5 Mortality

Discussion thread created by: Samuel Bellows

1. Introduction

This UNICEF dataset gives the under-5 mortality for many countries over the years 1950-2015. The problem is that the year variable is spread across 65 different columns, one per year, which need to be gathered into a single column. To make the dataset tidy, we gather the year columns into one column until we have a three-column dataset of country name, year, and mortality.

2. Load library

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

3. Data load and cleaning

The data is stored on GitHub and loaded from there into RStudio with read.csv().
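The original chunk is not shown in this rendering, so the following is only a minimal sketch of the loading step. The inline CSV text, the column names (`CountryName`, `U5MR.1950`, ...) and the values are assumptions made up for illustration; for the real file, the raw GitHub URL would be passed to read.csv() instead.

```r
# Hypothetical stand-in for the CSV hosted on GitHub.
# Column names and values are invented for illustration only.
csv_text <- "CountryName,U5MR.1950,U5MR.1951
Aland,100.5,98.2
Borland,,45.0"

# read.csv() accepts a URL or file path as its first argument;
# here the `text` argument is used so the sketch runs offline.
mortality_wide <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Empty cells are read as NA, which matters for the later summaries.
str(mortality_wide)
```

Checking the result with str() right after loading is a cheap way to confirm that the year columns came in as numeric rather than character.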

3.1 Gather year from 1950 to 2015

The dataset covers the years 1950 to 2015, with each year stored as its own column. Using tidyr, these columns are converted into a single Year column.
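The reshaping step could look roughly like the sketch below. It assumes the year columns are named like `U5MR.1950` and the country column is `CountryName`; both names and all values are placeholders, not taken from the real file. It uses pivot_longer() from tidyr 1.0.0 or later; the older gather() call `gather(mortality_wide, key = "Year", value = "Mortality", -CountryName)` does the same job.

```r
library(tidyr)

# Toy wide table; names and values are placeholders, not the real data.
mortality_wide <- data.frame(
  CountryName = c("Aland", "Borland"),
  U5MR.1950 = c(100.5, 80.0),
  U5MR.1951 = c(98.2, 78.5)
)

# Gather every column except CountryName into Year/Mortality pairs.
# names_prefix is a regular expression that strips the assumed "U5MR." prefix.
mortality_long <- pivot_longer(
  mortality_wide,
  cols = -CountryName,
  names_to = "Year",
  names_prefix = "U5MR\\.",
  values_to = "Mortality"
)

# Year comes out as character; convert it so it sorts and plots numerically.
mortality_long$Year <- as.integer(mortality_long$Year)

print(mortality_long)
```

The result has one row per country and year, which is the three-column shape described in the introduction.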

4. Analysis

Mortality based on year:

  • Year: 1950, Highest/lowest mortality
  • Year: 2015, Highest/lowest mortality
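One way to pull out the countries with the highest and lowest mortality for the two years is sketched below with dplyr. The long-format table, its column names and its values are invented for illustration and are not the UNICEF figures.

```r
library(dplyr)

# Toy long-format table; countries and values are made up.
mortality_long <- data.frame(
  CountryName = c("Aland", "Borland", "Corland", "Aland", "Borland", "Corland"),
  Year = c(1950L, 1950L, 1950L, 2015L, 2015L, 2015L),
  Mortality = c(300, 120, 45, 90, 20, 5)
)

# For each year of interest, keep the country with the largest and
# the smallest mortality, ignoring missing values.
extremes <- mortality_long %>%
  filter(Year %in% c(1950, 2015), !is.na(Mortality)) %>%
  group_by(Year) %>%
  summarise(
    Highest = CountryName[which.max(Mortality)],
    Lowest = CountryName[which.min(Mortality)]
  )

print(extremes)
```

For a ranked top-10 list rather than a single country per year, the same filter can be followed by arrange(desc(Mortality)) and head(10).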

4.1 Highest mortality in the year 1950

4.3 Highest mortality in 2015

5. Conclusion

Plots 4.2, 4.4, and 4.5 show that mortality goes down year over year.

  • In 1950, African countries had the highest mortality.
  • In 2015, developed countries had lower mortality than underdeveloped or developing countries.
  • Among the 10 developed countries compared, Sweden has the lowest average mortality and Singapore has the highest.
  • The boxplot (plot 4.7) shows the mortality of the United States from 1950 to 2015.
  • The mortality of the United States has gone down year over year.
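The per-country average and the year-over-year decline mentioned above can be checked with a summary like the following sketch. The table, the country names and the values are placeholders invented for illustration; the `Declining` column is a hypothetical helper, not something from the original analysis.

```r
library(dplyr)

# Toy long-format table; values are made up, not the UNICEF figures.
mortality_long <- data.frame(
  CountryName = rep(c("Aland", "Borland"), each = 3),
  Year = rep(c(1950L, 1980L, 2015L), times = 2),
  Mortality = c(30, 15, 3, 60, 30, 6)
)

# Average mortality per country, plus a flag that is TRUE when every
# later observation is lower than the one before it.
avg_by_country <- mortality_long %>%
  group_by(CountryName) %>%
  summarise(
    AvgMortality = mean(Mortality, na.rm = TRUE),
    Declining = all(diff(Mortality[order(Year)]) < 0)
  ) %>%
  arrange(AvgMortality)

print(avg_by_country)
```

Sorting by the average puts the country with the lowest mean mortality first, which is how a "lowest average / highest average" comparison can be read straight off the table.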