Investigation into Infant Mortality

The background

With the arrival of my first son, Aashay, the discussions around the dinner table with the grandparents centered on the massive difference in the way they were brought up. Practices taken for granted by our generation living in the US - sterilizing bottles & feeding equipment, using hypoallergenic washing machine soap, the constant bombardment of hand sanitizer - are in stark contrast to the grandparents’ generation. Furthermore, they said, all the knowledge about baby care, the dos and don’ts, was passed down solely by the previous generation. There was no Google.

And yet, they survived. “Do you really need all this? We made it alright; our parents didn’t have sanitizers… we didn’t even use hand soap most of the time. We’re still alive and well!”, said my son’s grandparents to me.

My counter-argument was to look at average infant mortality: the fact that they survived (and I’m extremely pleased they did, obviously) says little on its own, because the effect of improved infant care must be evaluated by looking at mortality rates averaged across populations.

What a great opportunity to do some data mining, and to attempt my first R Markdown document.

Where can we find this data?

Infant mortality rates can be easily found, among other things, at the World Bank. Raw data in CSV format can be downloaded here: Link.

There are two files of interest:

Metadata_Country_09508e1d-4bb1-419a-a275-77169f9cc773_v2.csv - This file holds references between 3-letter country code, country name, region and income group.
09508e1d-4bb1-419a-a275-77169f9cc773_v2.csv - This file holds the raw data (in an untidy format) for country code, country name, and mortality rates saved in one column for each year between 1960 and 2015.

Infant mortality is defined as the number of infants who die before reaching one year of age, per 1000 live births in a given year.
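As a small illustration of that definition, here is a sketch of the arithmetic. The counts are hypothetical and chosen only to make the calculation easy to follow; they are not World Bank figures.

```r
# Hypothetical counts, for illustration only (not World Bank figures)
infant_deaths <- 190    # deaths before the first birthday in a given year
live_births   <- 5000   # live births in the same year

# Rate expressed per 1000 live births
infant_mortality_rate <- 1000 * infant_deaths / live_births
print(infant_mortality_rate)
```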

Import the data and have a look

First, I include the libraries I love to work with:

library('dplyr')
library('tidyr')
library('ggplot2')
library('ggrepel')
library('Amelia')
library('rworldmap')
library('RColorBrewer')

I import these files and have a quick look at their content:

country <- tbl_df(read.csv(file = '2011-2015/Metadata_Country_09508e1d-4bb1-419a-a275-77169f9cc773_v2.csv'))
glimpse(country)

## Observations: 247
## Variables: 6
## $ Country.Name (fctr) Aruba, Afghanistan, Angola, Albania, Andorra, Ar...
## $ Country.Code (fctr) ABW, AFG, AGO, ALB, AND, ARB, ARE, ARG, ARM, ASM...
## $ Region       (fctr) Latin America & Caribbean, South Asia, Sub-Sahar...
## $ IncomeGroup  (fctr) High income: nonOECD, Low income, Upper middle i...
## $ SpecialNotes (fctr) SNA data for 2000-2011 are updated from official...
## $ X            (lgl) NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

data <- tbl_df(read.csv('2011-2015/09508e1d-4bb1-419a-a275-77169f9cc773_v2.csv',header = T,skip=3))
glimpse(data)

## Observations: 248
## Variables: 61
## $ Country.Name   (fctr) Aruba, Andorra, Afghanistan, Angola, Albania, ...
## $ Country.Code   (fctr) ABW, AND, AFG, AGO, ALB, ARB, ARE, ARG, ARM, A...
## $ Indicator.Name (fctr) Mortality rate, infant (per 1,000 live births)...
## $ Indicator.Code (fctr) SP.DYN.IMRT.IN, SP.DYN.IMRT.IN, SP.DYN.IMRT.IN...
## $ X1960          (dbl) NA, NA, NA, NA, NA, 159.9502, 137.3000, NA, NA,...
## $ X1961          (dbl) NA, NA, 240.5000, NA, NA, 155.3136, 130.5000, N...
## $ X1962          (dbl) NA, NA, 236.300, NA, NA, 152.233, 124.000, NA, ...
## $ X1963          (dbl) NA, NA, 232.300, NA, NA, 156.522, 117.400, NA, ...
## $ X1964          (dbl) NA, NA, 228.5000, NA, NA, 152.1666, 111.0000, N...
## $ X1965          (dbl) NA, NA, 224.6000, NA, NA, 148.0678, 104.4000, N...
## $ X1966          (dbl) NA, NA, 220.7000, NA, NA, 144.1876, 97.5000, NA...
## $ X1967          (dbl) NA, NA, 217.0000, NA, NA, 140.5364, 90.6000, NA...
## $ X1968          (dbl) NA, NA, 213.3000, NA, NA, 137.0936, 83.8000, NA...
## $ X1969          (dbl) NA, NA, 209.8000, NA, NA, 133.7045, 77.2000, 60...
## $ X1970          (dbl) NA, NA, 206.100, NA, NA, 130.551, 70.900, 59.50...
## $ X1971          (dbl) NA, NA, 202.2000, NA, NA, 127.4003, 65.1000, 58...
## $ X1972          (dbl) NA, NA, 198.2000, NA, NA, 123.4264, 59.6000, 58...
## $ X1973          (dbl) NA, NA, 194.3000, NA, NA, 119.8548, 54.7000, 56...
## $ X1974          (dbl) NA, NA, 190.3000, NA, NA, 116.0879, 50.2000, 55...
## $ X1975          (dbl) NA, NA, 186.60, NA, NA, 111.78, 45.90, 53.20, N...
## $ X1976          (dbl) NA, NA, 182.6000, NA, NA, 107.6929, 41.9000, 50...
## $ X1977          (dbl) NA, NA, 178.7000, NA, NA, 103.4666, 38.2000, 47...
## $ X1978          (dbl) NA, NA, 174.50000, NA, 73.00000, 99.17863, 34.8...
## $ X1979          (dbl) NA, NA, 170.40000, NA, 68.40000, 94.93588, 31.7...
## $ X1980          (dbl) NA, NA, 166.10000, 138.30000, 64.00000, 90.7236...
## $ X1981          (dbl) NA, NA, 161.80000, 137.50000, 59.90000, 86.4385...
## $ X1982          (dbl) NA, NA, 157.50000, 136.80000, 56.10000, 82.0634...
## $ X1983          (dbl) NA, NA, 153.20000, 136.00000, 52.40000, 78.9439...
## $ X1984          (dbl) NA, NA, 148.7000, 135.3000, 49.1000, 74.7107, 2...
## $ X1985          (dbl) NA, NA, 144.50000, 134.90000, 45.90000, 70.7542...
## $ X1986          (dbl) NA, NA, 140.20000, 134.40000, 43.20000, 67.3163...
## $ X1987          (dbl) NA, NA, 135.70000, 134.10000, 40.80000, 64.3025...
## $ X1988          (dbl) NA, NA, 131.30000, 133.80000, 38.60000, 61.7870...
## $ X1989          (dbl) NA, NA, 126.80000, 133.60000, 36.70000, 59.6437...
## $ X1990          (dbl) NA, 7.5000, 122.5000, 133.5000, 35.1000, 57.802...
## $ X1991          (dbl) NA, 7.00000, 118.30000, 133.50000, 33.70000, 56...
## $ X1992          (dbl) NA, 6.50000, 114.40000, 133.50000, 32.50000, 54...
## $ X1993          (dbl) NA, 6.10000, 110.90000, 133.40000, 31.40000, 53...
## $ X1994          (dbl) NA, 5.60000, 107.70000, 133.20000, 30.30000, 51...
## $ X1995          (dbl) NA, 5.20000, 105.00000, 132.80000, 29.10000, 50...
## $ X1996          (dbl) NA, 5.00000, 102.70000, 132.30000, 27.90000, 49...
## $ X1997          (dbl) NA, 4.60000, 100.70000, 131.50000, 26.80000, 47...
## $ X1998          (dbl) NA, 4.30000, 98.90000, 130.60000, 25.50000, 46....
## $ X1999          (dbl) NA, 4.10000, 97.20000, 129.50000, 24.40000, 45....
## $ X2000          (dbl) NA, 3.90000, 95.40000, 128.30000, 23.20000, 44....
## $ X2001          (dbl) NA, 3.70000, 93.40000, 126.90000, 22.10000, 42....
## $ X2002          (dbl) NA, 3.5000, 91.2000, 125.5000, 21.0000, 41.6506...
## $ X2003          (dbl) NA, 3.30000, 89.00000, 124.10000, 20.00000, 40....
## $ X2004          (dbl) NA, 3.2000, 86.7000, 122.8000, 19.1000, 39.2494...
## $ X2005          (dbl) NA, 3.10000, 84.40000, 121.20000, 18.30000, 38....
## $ X2006          (dbl) NA, 2.90000, 82.30000, 119.40000, 17.40000, 36....
## $ X2007          (dbl) NA, 2.80000, 80.40000, 117.10000, 16.70000, 35....
## $ X2008          (dbl) NA, 2.70000, 78.60000, 114.70000, 16.00000, 34....
## $ X2009          (dbl) NA, 2.60000, 76.80000, 112.20000, 15.40000, 33....
## $ X2010          (dbl) NA, 2.5000, 75.1000, 109.6000, 14.8000, 32.2817...
## $ X2011          (dbl) NA, 2.40000, 73.40000, 106.80000, 14.30000, 31....
## $ X2012          (dbl) NA, 2.30000, 71.70000, 104.10000, 13.80000, 30....
## $ X2013          (dbl) NA, 2.20000, 69.90000, 101.40000, 13.30000, 29....
## $ X2014          (dbl) NA, 2.10000, 68.10000, 98.80000, 12.90000, 28.6...
## $ X2015          (dbl) NA, 2.10000, 66.30000, 96.00000, 12.50000, 27.9...
## $ X              (lgl) NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...

Clearly,

The two data sets need to be combined at some point
The data set ‘data’ is untidy and must be cleaned up before further processing

Tidy up the data

I need to get rid of all the columns from X1960 to X2015 and replace them with two columns: Year and MortalityRate. This is where the tidyr package truly shines. Its gather() function very quickly converts the original data set into a tidy long format.

data.tidy <- data %>%
    gather(key = 'Year',value = 'MortalityRate',X1960:X2015)
glimpse(data.tidy)

## Observations: 13,888
## Variables: 7
## $ Country.Name   (fctr) Aruba, Andorra, Afghanistan, Angola, Albania, ...
## $ Country.Code   (fctr) ABW, AND, AFG, AGO, ALB, ARB, ARE, ARG, ARM, A...
## $ Indicator.Name (fctr) Mortality rate, infant (per 1,000 live births)...
## $ Indicator.Code (fctr) SP.DYN.IMRT.IN, SP.DYN.IMRT.IN, SP.DYN.IMRT.IN...
## $ X              (lgl) NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ Year           (chr) "X1960", "X1960", "X1960", "X1960", "X1960", "X...
## $ MortalityRate  (dbl) NA, NA, NA, NA, NA, 159.9502, 137.3000, NA, NA,...
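In more recent versions of tidyr, gather() is superseded by pivot_longer(). The sketch below shows the same wide-to-long reshaping on a tiny made-up data frame; the country codes and values are hypothetical, and it assumes a tidyr version that provides pivot_longer(). The names_prefix argument strips the leading 'X' from the year in the same step.

```r
library(tidyr)

# Toy wide table with hypothetical codes and values (not the World Bank data)
wide <- data.frame(
  Country.Code = c("AAA", "BBB"),
  X1960 = c(150.2, NA),
  X1961 = c(148.0, 90.5),
  stringsAsFactors = FALSE
)

# One row per country-year; the 'X' prefix is removed from the year names
long <- pivot_longer(
  wide,
  cols = X1960:X1961,
  names_to = "Year",
  names_prefix = "X",
  values_to = "MortalityRate"
)
print(long)
```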

The Indicator.Name and Indicator.Code columns are not useful, and neither is the empty X column. Let’s get rid of them using select().

data.tidy <- data.tidy %>%
    select(-c(X,Indicator.Name,Indicator.Code))
glimpse(data.tidy)

## Observations: 13,888
## Variables: 4
## $ Country.Name  (fctr) Aruba, Andorra, Afghanistan, Angola, Albania, A...
## $ Country.Code  (fctr) ABW, AND, AFG, AGO, ALB, ARB, ARE, ARG, ARM, AS...
## $ Year          (chr) "X1960", "X1960", "X1960", "X1960", "X1960", "X1...
## $ MortalityRate (dbl) NA, NA, NA, NA, NA, 159.9502, 137.3000, NA, NA, ...

The Year column is of class character and each numeric year value is preceded by an ‘X’. Let’s clean up this column, using substr() to eliminate the ‘X’, followed by as.factor() to cast it as a factor variable.

data.tidy$Year <- substr(data.tidy$Year,start = 2,stop = 5)
data.tidy$Year <- as.factor(data.tidy$Year)
glimpse(data.tidy)

## Observations: 13,888
## Variables: 4
## $ Country.Name  (fctr) Aruba, Andorra, Afghanistan, Angola, Albania, A...
## $ Country.Code  (fctr) ABW, AND, AFG, AGO, ALB, ARB, ARE, ARG, ARM, AS...
## $ Year          (fctr) 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960,...
## $ MortalityRate (dbl) NA, NA, NA, NA, NA, 159.9502, 137.3000, NA, NA, ...
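One thing to keep in mind once Year is a factor: as.numeric() on a factor returns the internal level codes, not the years themselves. A minimal sketch with a made-up three-level factor shows the difference, and why the plots later on go through as.character() first.

```r
# Toy factor of years (hypothetical subset of the real levels)
year <- factor(c("1960", "1975", "2015"))

# Direct conversion returns the level codes
codes <- as.numeric(year)

# Converting through character recovers the actual years
years <- as.numeric(as.character(year))

print(codes)
print(years)
```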

Combine the two datasets

I am interested in bringing the Region and IncomeGroup variables to my data.tidy data set. First, I trim the country data set down to Country.Code and these two columns using select(). Then, I use left_join() to merge the two data sets using Country.Code as the key.

country <- country %>%
    select(Country.Code,Region,IncomeGroup)
data.tidy <- data.tidy %>% 
    left_join(country,by = 'Country.Code')

## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector

data.tidy$Country.Code <- as.factor(data.tidy$Country.Code)
str(data.tidy)

## Classes 'tbl_df', 'tbl' and 'data.frame':    13888 obs. of  6 variables:
##  $ Country.Name : Factor w/ 248 levels "Afghanistan",..: 11 5 1 6 2 8 234 9 10 4 ...
##  $ Country.Code : Factor w/ 248 levels "ABW","AFG","AGO",..: 1 5 2 3 4 6 7 8 9 10 ...
##  $ Year         : Factor w/ 56 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MortalityRate: num  NA NA NA NA NA ...
##  $ Region       : Factor w/ 8 levels "","East Asia & Pacific",..: 4 3 7 8 3 1 5 4 3 2 ...
##  $ IncomeGroup  : Factor w/ 6 levels "","High income: nonOECD",..: 2 2 4 6 6 1 2 2 5 6 ...
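A left join keeps every row of the left-hand table, so any key without a match on the right ends up with NA in the joined columns. The sketch below uses two tiny made-up tables (the codes and regions are hypothetical) with character keys, and uses dplyr's anti_join() to list the keys that found no match.

```r
library(dplyr)

# Toy tables with hypothetical codes; keys are character, not factor
rates <- data.frame(
  Country.Code = c("AAA", "BBB", "CCC"),
  MortalityRate = c(10.5, 20.1, 30.7),
  stringsAsFactors = FALSE
)
meta <- data.frame(
  Country.Code = c("AAA", "BBB"),
  Region = c("Region 1", "Region 2"),
  stringsAsFactors = FALSE
)

# Every row of 'rates' is kept; 'CCC' gets NA for Region
joined <- left_join(rates, meta, by = "Country.Code")

# Rows of 'rates' whose key has no partner in 'meta'
unmatched <- anti_join(rates, meta, by = "Country.Code")

print(joined)
print(unmatched)
```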

#global data.. regions
ggplot(data.tidy,aes(x=Region))+geom_bar(stat='count')+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,size = 10))

#global data.. IncomeGroup
ggplot(data.tidy,aes(x=IncomeGroup))+geom_bar(stat='count')+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,size = 10))

Both Region and IncomeGroup have a few NA values. What are these NA values?

data.tidy[data.tidy$Region=='<NA>',]

## Source: local data frame [56 x 6]
## 
##    Country.Name Country.Code   Year MortalityRate Region IncomeGroup
##          (fctr)       (fctr) (fctr)         (dbl) (fctr)      (fctr)
## 1            NA           NA     NA            NA     NA          NA
## 2            NA           NA     NA            NA     NA          NA
## 3            NA           NA     NA            NA     NA          NA
## 4            NA           NA     NA            NA     NA          NA
## 5            NA           NA     NA            NA     NA          NA
## 6            NA           NA     NA            NA     NA          NA
## 7            NA           NA     NA            NA     NA          NA
## 8            NA           NA     NA            NA     NA          NA
## 9            NA           NA     NA            NA     NA          NA
## 10           NA           NA     NA            NA     NA          NA
## ..          ...          ...    ...           ...    ...         ...

data.tidy[data.tidy$IncomeGroup=='<NA>',]

## Source: local data frame [56 x 6]
## 
##    Country.Name Country.Code   Year MortalityRate Region IncomeGroup
##          (fctr)       (fctr) (fctr)         (dbl) (fctr)      (fctr)
## 1            NA           NA     NA            NA     NA          NA
## 2            NA           NA     NA            NA     NA          NA
## 3            NA           NA     NA            NA     NA          NA
## 4            NA           NA     NA            NA     NA          NA
## 5            NA           NA     NA            NA     NA          NA
## 6            NA           NA     NA            NA     NA          NA
## 7            NA           NA     NA            NA     NA          NA
## 8            NA           NA     NA            NA     NA          NA
## 9            NA           NA     NA            NA     NA          NA
## 10           NA           NA     NA            NA     NA          NA
## ..          ...          ...    ...           ...    ...         ...

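Both subsets come back as 56 rows of NAs. The all-NA rows are a side effect of the test itself: comparing a missing value with the string ‘<NA>’ yields NA rather than TRUE, and indexing with NA returns a row of NAs. The count is still meaningful: 56 rows is one entry across all 56 years, which suggests a Country.Code in data that has no match in country. I’ll get rid of these rows. Since the NAs are common between Region & IncomeGroup, I only need to use one of these variables in the filter.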

data.tidy <- data.tidy %>%
    filter(!is.na(IncomeGroup))
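A short sketch of how missing values behave in comparisons, using a made-up vector: testing against the string '<NA>' never returns TRUE for a missing value, whereas is.na() does, so is.na() is the test to use when the missing rows themselves need to be inspected.

```r
# Toy vector with one missing value (hypothetical regions)
region <- c("South Asia", NA, "North America")

# Comparing with a string gives NA for the missing entry, not TRUE
cmp <- region == "<NA>"

# is.na() identifies the missing entry directly
missing <- is.na(region)

# Subsetting with is.na() returns the actual row that is missing a region
df <- data.frame(id = 1:3, region = region, stringsAsFactors = FALSE)
missing_rows <- df[is.na(df$region), ]

print(cmp)
print(missing)
print(missing_rows)
```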

Investigating the blanks further, I find that these are not countries, but aggregate results of groups of Countries / Regions / IncomeGroups. unique() helps identify this. I’ll filter this data out.

unknown <- data.tidy %>% filter(IncomeGroup=='')
unique(unknown$Country.Name)

##  [1] Arab World                                    
##  [2] Central Europe and the Baltics                
##  [3] Caribbean small states                        
##  [4] East Asia & Pacific (developing only)         
##  [5] East Asia & Pacific (all income levels)       
##  [6] Europe & Central Asia (developing only)       
##  [7] Europe & Central Asia (all income levels)     
##  [8] Euro area                                     
##  [9] European Union                                
## [10] Fragile and conflict affected situations      
## [11] High income                                   
## [12] Heavily indebted poor countries (HIPC)        
## [13] Latin America & Caribbean (developing only)   
## [14] Latin America & Caribbean (all income levels) 
## [15] Least developed countries: UN classification  
## [16] Low income                                    
## [17] Lower middle income                           
## [18] Low & middle income                           
## [19] Middle East & North Africa (all income levels)
## [20] Middle income                                 
## [21] Middle East & North Africa (developing only)  
## [22] North America                                 
## [23] High income: nonOECD                          
## [24] High income: OECD                             
## [25] OECD members                                  
## [26] Other small states                            
## [27] Pacific island small states                   
## [28] South Asia                                    
## [29] Sub-Saharan Africa (developing only)          
## [30] Sub-Saharan Africa (all income levels)        
## [31] Small states                                  
## [32] Upper middle income                           
## [33] World                                         
## 248 Levels: Afghanistan Albania Algeria American Samoa Andorra ... Zimbabwe

data.tidy <- data.tidy %>% 
    filter(IncomeGroup!='')

Are there any missing values? Let’s have a look using missmap() from the Amelia package.

data.tidy <- data.tidy %>%
    arrange(Year,Country.Code)
missmap(data.tidy, main="Missings Map", col=c("yellow", "black"), legend=FALSE,y.cex = .01)

Clearly, MortalityRate has quite a few missing values. Since the data is sorted by Year, it’s clear that more countries are missing data in the past (1960s…) than in the present (2015). To find out which Regions do not have data, let’s plot a Pareto-style bar chart of the missing MortalityRate values.

#Filter data which have NA in the MortalityRate, group by Region and arrange in descending order
data.isna <- data.tidy %>%
    group_by(Region) %>% 
    filter(is.na(MortalityRate)) %>%
    tally(sort = T) %>%
    ungroup()  %>% 
    arrange(desc(n))

data.isna$Region <- factor(data.isna$Region,
                           levels=data.isna$Region[order(data.isna$n,decreasing = T)])

ggplot(data.isna,aes(x=Region,y=n))+geom_bar(stat='identity')+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,size = 9))+
    ggtitle('Missing values for Mortality Rate by Region')+
    ylab('')+xlab('')
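Raw counts of missing values depend on how many countries a region contains, so it can help to look at the share of missing values as well. The sketch below is one way to compute both per group with dplyr; the toy data frame and its values are hypothetical.

```r
library(dplyr)

# Toy data with hypothetical regions and values
toy <- data.frame(
  Region = c("A", "A", "B", "B", "B"),
  MortalityRate = c(NA, 12.0, NA, NA, 30.5),
  stringsAsFactors = FALSE
)

# Count and share of missing values within each region
missing_by_region <- toy %>%
  group_by(Region) %>%
  summarise(
    n_missing = sum(is.na(MortalityRate)),
    share_missing = mean(is.na(MortalityRate))
  )

print(missing_by_region)
```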

Europe & Central Asia, East Asia & Pacific, and Latin America & Caribbean have the highest number of missing values. One plausible explanation is that these regions include many countries and territories that may not have submitted data to The World Bank. Filtering and sorting by Country.Name shows which entries are affected; those missing all 56 years are mostly small territories and dependencies:

#for countryname...
data.isna <- data.tidy %>%
    group_by(Country.Name) %>% 
    filter(is.na(MortalityRate)) %>%
    tally(sort = T) %>%
    ungroup()  %>% 
    arrange(desc(n))

data.isna$Country.Name <- factor(data.isna$Country.Name,
                                 levels=data.isna$Country.Name[order(data.isna$n,decreasing = T)])

data.isna %>% print(n=20)

## Source: local data frame [116 x 2]
## 
##                 Country.Name     n
##                       (fctr) (int)
## 1             American Samoa    56
## 2                      Aruba    56
## 3                    Bermuda    56
## 4             Cayman Islands    56
## 5            Channel Islands    56
## 6                    Curacao    56
## 7             Faeroe Islands    56
## 8           French Polynesia    56
## 9                  Greenland    56
## 10                      Guam    56
## 11      Hong Kong SAR, China    56
## 12               Isle of Man    56
## 13                    Kosovo    56
## 14             Liechtenstein    56
## 15          Macao SAR, China    56
## 16             New Caledonia    56
## 17  Northern Mariana Islands    56
## 18               Puerto Rico    56
## 19 Sint Maarten (Dutch part)    56
## 20  St. Martin (French part)    56
## ..                       ...   ...

To process the data further, I’ll get rid of the missing MortalityRate rows.

data.tidy <- data.tidy %>%
    filter(!is.na(MortalityRate))

What does the data tell us?

The data’s scrubbed and ready for some dissecting! My favourite part!

First, let’s look at how MortalityRate has changed since 1960, separated by IncomeGroup. IncomeGroup is a factor defined by The World Bank for each country. The following plot isn’t very useful, even with alpha set to 1/4th for geom_point(). I get the idea that higher income countries have lower mortality rates as compared to lower income countries.

ggplot(data.tidy,aes(x=as.numeric(as.character(Year)),y=MortalityRate,color=IncomeGroup))+
    geom_point(alpha=.25)+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,size = 9))+
    scale_x_continuous(breaks = seq(1960, 2015, by = 5))

Changing to a line plot and using facet_grid() to dissect the data according to IncomeGroup gives a much better plot.

ggplot(data.tidy,aes(as.numeric(as.character(Year)),MortalityRate))+
    geom_line(aes(color=Country.Name))+
    facet_grid(.~IncomeGroup)+ 
    theme(legend.position="none")+
    theme(strip.text.x = element_text(size = 7))+
    labs(x='Year',y='Mortality Rate',title='Change in mortality rate over time')

This graph is most striking! Each line represents a country.

Even in 1960, the mean and spread of infant mortality in the low, lower-middle and upper-middle income groups are considerably higher than in the high income OECD countries and some of the nonOECD countries.
The high income countries made significant improvements within a 20-year timeframe: 1960-1980. Notice the steeper slopes in this period. This is followed by a leveling off, almost asymptotic in nature.
Lower income countries have huge spreads in mortality rates, even today.
For some countries, mortality rates drop and then peak again. Reasons could vary from epidemics to droughts to wars or regime changes.
Mortality rates are strongly associated with income group: the higher the income group, the lower the average rate. The discrepancy in average infant mortality rates is alarming.

data.tidy %>% filter(Year==2015) %>% group_by(IncomeGroup) %>% summarise(AvgMR=mean(MortalityRate)) %>% arrange(AvgMR)

## Source: local data frame [5 x 2]
## 
##            IncomeGroup     AvgMR
##                 (fctr)     (dbl)
## 1    High income: OECD  3.334375
## 2 High income: nonOECD  9.718519
## 3  Upper middle income 17.376923
## 4  Lower middle income 33.028000
## 5           Low income 54.125806

Another great way to look at this data is using the rworldmap package. Here are the mortality rates compared for year 2015.

data.2015 <- data.tidy %>% filter(Year==2015)
barplotCountryData(data.2015,'MortalityRate','Country.Name',scaleSameInPanels = T,numPanels = 5,cex = .9,main='Mortality Rates by Country for Year 2015')

Next, we look at the year 2015 on a world map. Countries with missing data are shown in gray.

sPDF <- joinCountryData2Map(data.2015, joinCode = "ISO3",nameJoinColumn = "Country.Code")

## 192 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 51 codes from the map weren't represented in your data

par(mai=c(0,0,0.4,0),xaxs="i",yaxs="i")
palette <-brewer.pal(7,'OrRd') 
mapParams <- mapCountryData(sPDF,
                             nameColumnToPlot="MortalityRate",
                             addLegend=FALSE,
                             colourPalette=palette,
                             oceanCol = 'lightblue',
                             missingCountryCol = 'gray')
do.call(addMapLegend, c(mapParams, 
                        legendWidth=0.4, 
                        legendShrink=.45,
                        legendMar = 22,
                        labelFontSize=.8, 
                        tcl=-.4))

How does my home country fare?

From the bar graph, I can see India is the 45th worst country in the world. How has my country done since 1960?

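india <- data.tidy %>%
    filter(Country.Name=='India')
# Year is a factor, so convert via character to recover the actual years
india.years <- as.numeric(as.character(india$Year))
# Average drop per decade, from the highest and lowest rates over the full span of years
# (plain numeric values avoid masking the base min() and max() functions)
aver10 <- (max(india$MortalityRate)-min(india$MortalityRate))/(max(india.years)-min(india.years))*10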

ggplot(india,aes(as.numeric(as.character(Year)),MortalityRate))+
    geom_point(color = 'red')+
    geom_text_repel(data=india[1,],label=india$MortalityRate[1],size=4,nudge_x = 4)+
    geom_text_repel(data=india[length(india$MortalityRate),],label=india$MortalityRate[length(india$MortalityRate)],size=4,nudge_x = -4)+
    labs(x='Years',y='Mortality Rate',title='Mortality Rate for India')+
    scale_x_continuous(breaks=seq(1960,2015,by=5))+
    annotate('text',x=2002,y=165,label=paste('Change in mortality rate per decade:',round(aver10,2)),size=4,color='red')

We have a reduction in mortality rate of ~23 deaths per 1000 births every 10 years. While this is encouraging, it seems like awfully slow progress to me. We are still at 38 deaths per 1000 births in my home country.
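The per-decade figure comes from the highest and lowest values only. As a rough cross-check, a least-squares slope from lm() uses every year in the series. The sketch below compares the two estimates on made-up numbers; the values are hypothetical and are not the India series.

```r
# Hypothetical series for illustration only (not the World Bank data)
years <- c(1960, 1970, 1980, 1990, 2000, 2010)
rate  <- c(160, 140, 115, 90, 65, 45)

# Endpoint estimate: (highest - lowest) spread over the span, per 10 years
endpoint_change <- (max(rate) - min(rate)) / (max(years) - min(years)) * 10

# Least-squares estimate: slope of a straight-line fit, per 10 years
fit <- lm(rate ~ years)
trend_change <- -coef(fit)[["years"]] * 10

print(endpoint_change)
print(trend_change)
```

When the decline is close to linear the two numbers are similar; a large gap between them would hint that the rate of improvement changed over time.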

How does India compare against its geographical neighbours? Hopefully we are doing better. Let’s find out.

neighbors <- data.tidy %>% 
    filter(Country.Name %in% c('India','Pakistan','Bangladesh','China','Sri Lanka','Nepal','Bhutan','Maldives'))
ggplot(neighbors,aes(as.numeric(as.character(Year)),MortalityRate))+
    geom_line(aes(color=Country.Name,lty=Country.Name))+
    geom_point(aes(color=Country.Name,shape=Country.Name))+
    labs(x='Years',y='Mortality Rate',title='Mortality Rate for India\'s Neighbors')+
    scale_x_continuous(breaks = seq(1960, 2015, by = 15))

This plot shows India’s performance against our seven neighbours. We are clearly not the best performing nation, not even average. As of 2015, India has the second-highest mortality rate in this group, ahead of only Pakistan. Surprisingly, Maldives has done wonderfully since 1960; so has Sri Lanka. Perhaps unsurprisingly, China has a low mortality rate; one possible factor is the 1-child policy, since one takes care of one’s only offspring, though this data alone cannot confirm that. As a corollary, does that mean India & Pakistan don’t value their offspring, since we produce so many?

Which income groups do my neighbouring countries fall into?

table(as.character(neighbors$Country.Name),as.character(neighbors$IncomeGroup))

##             
##              Low income Lower middle income Upper middle income
##   Bangladesh          0                  56                   0
##   Bhutan              0                  47                   0
##   China               0                   0                  47
##   India               0                  56                   0
##   Maldives            0                   0                  53
##   Nepal              56                   0                   0
##   Pakistan            0                  56                   0
##   Sri Lanka           0                  56                   0

Although Sri Lanka is also in the Lower middle income group, and Nepal is in the Low income group, both countries are doing better than India.

neighbors %>%
    filter(Year==2015) %>%
    arrange(MortalityRate)

## Source: local data frame [8 x 6]
## 
##   Country.Name Country.Code   Year MortalityRate              Region
##         (fctr)       (fctr) (fctr)         (dbl)              (fctr)
## 1     Maldives          MDV   2015           7.4          South Asia
## 2    Sri Lanka          LKA   2015           8.4          South Asia
## 3        China          CHN   2015           9.2 East Asia & Pacific
## 4       Bhutan          BTN   2015          27.2          South Asia
## 5        Nepal          NPL   2015          29.4          South Asia
## 6   Bangladesh          BGD   2015          30.7          South Asia
## 7        India          IND   2015          37.9          South Asia
## 8     Pakistan          PAK   2015          65.8          South Asia
## Variables not shown: IncomeGroup (fctr)

par(mai=c(0,0,0.4,0),xaxs="i",yaxs="i")
mapParams <- mapCountryData( sPDF, 
                             nameColumnToPlot="MortalityRate",
                             addLegend=FALSE,
                             colourPalette=palette,
                             oceanCol = 'lightblue',
                             missingCountryCol = 'darkgray',
                             mapRegion = 'asia')
do.call(addMapLegend, c(mapParams, 
                       horizontal=FALSE,
                       legendWidth=0.4, 
                       legendShrink=.6,
                       legendMar = 4,
                       labelFontSize=.8, tcl=-.4))

Certainly, my home country has a ways to go before we catch up to our neighbours, let alone developed high-income countries to the west.